[jira] [Work logged] (GRIFFIN-352) Running measure fails with NoSuchMethodError

2022-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-352?focusedWorklogId=791703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-791703
 ]

ASF GitHub Bot logged work on GRIFFIN-352:
--

Author: ASF GitHub Bot
Created on: 17/Jul/22 00:07
Start Date: 17/Jul/22 00:07
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #601:
URL: https://github.com/apache/griffin/pull/601#issuecomment-1186345010

   Automated Message: We're closing this PR because it hasn't been updated in a 
while. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the 'no-pr-activity' tag!




Issue Time Tracking
---

Worklog Id: (was: 791703)
Time Spent: 0.5h  (was: 20m)

> Running measure fails with NoSuchMethodError
> --------------------------------------------
>
>                 Key: GRIFFIN-352
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-352
>             Project: Griffin
>          Issue Type: Bug
>          Components: Measure Module
>    Affects Versions: 0.6.0
>         Environment: Spark 2.4, Hive 3.1
>            Reporter: Vijay Kiran
>            Assignee: William Guo
>            Priority: Major
>         Attachments: env_batch.json, env_streaming.json
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With 0.6.0 and 0.7.0-SNAPSHOT, running measure with the console sink fails with a
> NoSuchMethodError (NSME); please see the log below.
> {code}
> 20/11/17 17:21:23 INFO transform.SparkSqlTransformStep: main begin transform step :
> accu
> |   |---__missCount
> |   |   |---__missRecords
> |   |---__totalCount
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.sameThreadExecutor()Lcom/google/common/util/concurrent/ListeningExecutorService;
>     at org.apache.griffin.measure.utils.ThreadUtils$.<init>(ThreadUtils.scala:35)
>     at org.apache.griffin.measure.utils.ThreadUtils$.<clinit>(ThreadUtils.scala)
>     at org.apache.griffin.measure.step.transform.TransformStep$.<init>(TransformStep.scala:118)
>     at org.apache.griffin.measure.step.transform.TransformStep$.<clinit>(TransformStep.scala)
>     at org.apache.griffin.measure.step.transform.TransformStep$$anonfun$3.apply(TransformStep.scala:61)
>     at org.apache.griffin.measure.step.transform.TransformStep$$anonfun$3.apply(TransformStep.scala:51)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.mutable.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:46)
>     at scala.collection.SetLike$class.map(SetLike.scala:92)
>     at scala.collection.mutable.AbstractSet.map(Set.scala:46)
>     at org.apache.griffin.measure.step.transform.TransformStep$class.execute(TransformStep.scala:51)
>     at org.apache.griffin.measure.step.transform.SparkSqlTransformStep.execute(SparkSqlTransformStep.scala:28)
>     at org.apache.griffin.measure.job.DQJob$$anonfun$execute$2.apply(DQJob.scala:29)
>     at org.apache.griffin.measure.job.DQJob$$anonfun$execute$2.apply(DQJob.scala:29)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.immutable.List.map(List.scala:296)
>     at org.apache.griffin.measure.job.DQJob.execute(DQJob.scala:29)
>     at org.apache.griffin.measure.launch.batch.BatchDQApp$$anonfun$1.apply(BatchDQApp.scala:85)
>     at org.apache.griffin.measure.launch.batch.BatchDQApp$$anonfun$1.apply(BatchDQApp.scala:64)
>     at org.apache.griffin.measure.utils.CommonUtils$.timeThis(CommonUtils.scala:36)
>     at org.apache.griffin.measure.launch.batch.BatchDQApp.run(BatchDQApp.scala:64)
>     at org.apache.griffin.measure.Application$.main(Application.scala:92)
>     at org.apache.griffin.measure.Application.main(Application.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>  
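A NoSuchMethodError like the one in the log above usually means a class was compiled against one version of a library and is running against another. MoreExecutors.sameThreadExecutor() was removed from Guava in release 21.0 (its replacements are directExecutor() and newDirectExecutorService()), so a newer Guava jar that wins on the runtime classpath would produce this kind of failure. The following is only a diagnostic sketch and not part of Griffin: the class ClasspathProbe and its helper methods are hypothetical names, and it uses nothing beyond JDK reflection. It reports where a class was loaded from and whether a public no-argument method with a given name exists.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or "bootstrap" when no code source is known. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "bootstrap"
                : src.getLocation().toString();
    }

    /** True if the class exposes a public method with this name that takes no arguments. */
    static boolean hasNoArgMethod(Class<?> cls, String name) {
        try {
            cls.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults match the class and method named in the stack trace; both can be overridden.
        String className = args.length > 0 ? args[0] : "com.google.common.util.concurrent.MoreExecutors";
        String method = args.length > 1 ? args[1] : "sameThreadExecutor";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println(className + " loaded from " + locationOf(cls));
            System.out.println(method + "() present: " + hasNoArgMethod(cls, method));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Run with the same classpath as the failing job, the output should show which Guava jar is actually picked up and whether it still contains the method the measure module calls.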

[jira] [Work logged] (GRIFFIN-352) Running measure fails with NoSuchMethodError

2022-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-352?focusedWorklogId=791704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-791704
 ]

ASF GitHub Bot logged work on GRIFFIN-352:
--

Author: ASF GitHub Bot
Created on: 17/Jul/22 00:07
Start Date: 17/Jul/22 00:07
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #601: 
[GRIFFIN-352] Resolve conflicts between Guava versions
URL: https://github.com/apache/griffin/pull/601




Issue Time Tracking
---

Worklog Id: (was: 791704)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (GRIFFIN-352) Running measure fails with NoSuchMethodError

2022-07-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-352?focusedWorklogId=787261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-787261
 ]

ASF GitHub Bot logged work on GRIFFIN-352:
--

Author: ASF GitHub Bot
Created on: 02/Jul/22 00:04
Start Date: 02/Jul/22 00:04
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #601:
URL: https://github.com/apache/griffin/pull/601#issuecomment-1172791039

   Automated Message: This PR is being labelled as stale and will be closed in 
the next 15 days due to lack of activity. To avoid this, push new commits or ask 
the committers for a review/resolution.




Issue Time Tracking
---

Worklog Id: (was: 787261)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2022-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=749023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-749023
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 29/Mar/22 00:03
Start Date: 29/Mar/22 00:03
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #599:
URL: https://github.com/apache/griffin/pull/599#issuecomment-1081265665


   Automated Message: We're closing this PR because it hasn't been updated in a 
while. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the 'no-pr-activity' tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@griffin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 749023)
Remaining Estimate: 19h 50m  (was: 20h)
Time Spent: 4h 10m  (was: 4h)

> Hive kerberos for Griffin error
> -------------------------------
>
>                 Key: GRIFFIN-363
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-363
>             Project: Griffin
>          Issue Type: Bug
>          Components: Service Module
>    Affects Versions: 0.6.0
>         Environment: CentOS 7
>            Reporter: MenghuiWan
>            Priority: Minor
>              Labels: kerberos
>   Original Estimate: 24h
>          Time Spent: 4h 10m
>  Remaining Estimate: 19h 50m
>
> I am sorry that my English is not good.
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration by 
> following the user guide, but the Hive connection fails.
> 
> The error log is:
> {quote}
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154) ~[spring-beans-5.1.9.RELEASE.jar:5.1.9.RELEASE]
> 
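"GSS initiate failed" during the SASL handshake often indicates that the client process held no usable Kerberos credentials when it opened the metastore connection. As a rough illustration of what a non-interactive keytab login involves, the sketch below builds a programmatic JAAS configuration for the JDK's Krb5LoginModule. It is an assumption-laden example, not Griffin's code and not the change proposed in PR #599: the class name KeytabJaasConfig, the principal, and the keytab path are all hypothetical, and Hadoop-based clients normally perform this login through UserGroupInformation.loginUserFromKeytab rather than raw JAAS.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Programmatic JAAS configuration for a keytab-based Kerberos login (illustrative only). */
public class KeytabJaasConfig extends Configuration {
    private final String principal;   // hypothetical, e.g. "griffin@EXAMPLE.COM"
    private final String keytabPath;  // hypothetical, e.g. "/etc/security/griffin.keytab"

    public KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    /** Krb5LoginModule options for logging in from a keytab without prompting. */
    static Map<String, String> keytabOptions(String principal, String keytabPath) {
        Map<String, String> opts = new HashMap<>();
        opts.put("useKeyTab", "true");    // read the key from a keytab file
        opts.put("keyTab", keytabPath);
        opts.put("principal", principal);
        opts.put("storeKey", "true");     // keep the key in the Subject's private credentials
        opts.put("doNotPrompt", "true");  // fail instead of asking for a password
        return opts;
    }

    /** Returns the same single-module entry for every application name. */
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                keytabOptions(principal, keytabPath))
        };
    }
}
```

An instance of this class could be passed to a javax.security.auth.login.LoginContext, whose login() call would then attempt the keytab login against the KDC configured in krb5.conf.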

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2022-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=749022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-749022
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 29/Mar/22 00:03
Start Date: 29/Mar/22 00:03
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #599:
URL: https://github.com/apache/griffin/pull/599


   




Issue Time Tracking
---

Worklog Id: (was: 749022)
Remaining Estimate: 20h  (was: 20h 10m)
Time Spent: 4h  (was: 3h 50m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2022-03-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=740486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-740486
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 13/Mar/22 00:04
Start Date: 13/Mar/22 00:04
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #599:
URL: https://github.com/apache/griffin/pull/599#issuecomment-1065988107


   Automated Message: This PR is being labelled as stale and will be closed in 
the next 15 days due to lack of activity. To avoid this, push new commits or ask 
the committers for a review/resolution.




Issue Time Tracking
---

Worklog Id: (was: 740486)
Remaining Estimate: 20h 10m  (was: 20h 20m)
Time Spent: 3h 50m  (was: 3h 40m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2022-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=724205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724205
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 10/Feb/22 00:59
Start Date: 10/Feb/22 00:59
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #599:
URL: https://github.com/apache/griffin/pull/599#issuecomment-1034371536


   LGTM.
   
   @chitralverma WDYT?




Issue Time Tracking
---

Worklog Id: (was: 724205)
Remaining Estimate: 20h 20m  (was: 20.5h)
Time Spent: 3h 40m  (was: 3.5h)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2022-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=713554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713554
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 07:59
Start Date: 24/Jan/22 07:59
Worklog Time Spent: 10m 
  Work Description: lipzhu opened a new pull request #599:
URL: https://github.com/apache/griffin/pull/599


   **What changes were proposed in this pull request?**
   Support the Hive Metastore client authentication method via **kerberos**
   
   **Does this PR introduce any user-facing change?**
   No.
   
   **How was this patch tested?**
   Unit Tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@griffin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 713554)
Remaining Estimate: 20.5h  (was: 20h 40m)
Time Spent: 3.5h  (was: 3h 20m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 3.5h
>  Remaining Estimate: 20.5h
>
> I apologize for my limited English.
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration by 
> following the user guide.
> But the Hive connection failed.
> 
> The error log is:
> {quote}
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at

[jira] [Work logged] (GRIFFIN-362) Oracle connection for Apache Griffin

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-362?focusedWorklogId=713495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713495
 ]

ASF GitHub Bot logged work on GRIFFIN-362:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 05:21
Start Date: 24/Jan/22 05:21
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #597:
URL: https://github.com/apache/griffin/pull/597


   




Issue Time Tracking
---

Worklog Id: (was: 713495)
Time Spent: 50m  (was: 40m)

> Oracle connection for Apache Griffin
> 
>
> Key: GRIFFIN-362
> URL: https://issues.apache.org/jira/browse/GRIFFIN-362
> Project: Griffin
>  Issue Type: Bug
>  Components: accuracy-batch
>Affects Versions: 0.6.0
> Environment: Dev
>Reporter: Praveen Kurup
>Priority: Blocker
> Attachments: image-2021-04-27-23-07-33-681.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hello Team,
> We are doing a POC using Apache Griffin for data quality projects.
> We have a requirement to check data quality between Oracle and Hive tables.
> The Hive connection is working fine, but we are not able to establish an 
> Oracle connection.
> I would like to understand whether Apache Griffin supports a JDBC Oracle 
> connection using the oracle.jdbc.driver.OracleDriver driver. I tried using the 
> MySQL JDBC connection template to pass the Oracle connection details, but it 
> didn't work. I am getting the error below:
> ERROR griffin: JDBC driver oracle.jdbc.driver.OracleDriver provided is not 
> found in class path
> java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
> !image-2021-04-27-23-07-33-681.png|width=385,height=154!
> Please let me know if there is any way to establish Oracle database 
> connectivity from Griffin.
> Also, please share any documentation available to achieve this.
>  
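
The ClassNotFoundException in the report above means the JVM could not locate oracle.jdbc.driver.OracleDriver on its class path. A minimal Java sketch of that check follows. The class and method names (JdbcDriverCheck, isOnClasspath) are hypothetical and not part of Griffin; the sketch only illustrates the lookup that fails, not how Griffin loads drivers internally.

```java
// Hypothetical helper, not part of Griffin: reports whether a JDBC driver
// class can be loaded from the current class path.
final class JdbcDriverCheck {

    /** Returns true if the named class can be located, false otherwise. */
    static boolean isOnClasspath(String driverClassName) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(driverClassName, false, JdbcDriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the driver class named in the report.
        String driver = args.length > 0 ? args[0] : "oracle.jdbc.driver.OracleDriver";
        System.out.println(driver + " on class path: " + isOnClasspath(driver));
    }
}
```

If the check reports false, the driver jar is likely missing from the job's class path. For a Spark job, one common way to supply it is the --jars option of spark-submit; whether that alone is sufficient for Griffin is an assumption this report does not confirm.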



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (GRIFFIN-362) Oracle connection for Apache Griffin

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-362?focusedWorklogId=713494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713494
 ]

ASF GitHub Bot logged work on GRIFFIN-362:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 05:19
Start Date: 24/Jan/22 05:19
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #597:
URL: https://github.com/apache/griffin/pull/597#issuecomment-1019731142


   LGTM, merging.




Issue Time Tracking
---

Worklog Id: (was: 713494)
Time Spent: 40m  (was: 0.5h)

> Oracle connection for Apache Griffin
> 
>
> Key: GRIFFIN-362
> URL: https://issues.apache.org/jira/browse/GRIFFIN-362
> Project: Griffin
>  Issue Type: Bug
>  Components: accuracy-batch
>Affects Versions: 0.6.0
> Environment: Dev
>Reporter: Praveen Kurup
>Priority: Blocker
> Attachments: image-2021-04-27-23-07-33-681.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=713493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713493
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 05:10
Start Date: 24/Jan/22 05:10
Worklog Time Spent: 10m 
  Work Description: chitralverma closed pull request #598:
URL: https://github.com/apache/griffin/pull/598


   




Issue Time Tracking
---

Worklog Id: (was: 713493)
Time Spent: 1h  (was: 50m)

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The built-in Avro data source implementation was released in Spark 2.4.0. 
> In Spark 2.3.x, we need to use 
> com.databricks.spark.avro.





[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=713492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713492
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 05:10
Start Date: 24/Jan/22 05:10
Worklog Time Spent: 10m 
  Work Description: lipzhu opened a new pull request #598:
URL: https://github.com/apache/griffin/pull/598


   **What changes were proposed in this pull request?**
   The built-in Avro data source was released in Spark 2.4.0 
(https://issues.apache.org/jira/browse/SPARK-24768).
   In a Spark 2.3.x environment, Griffin still needs to map the Avro format to 
com.databricks.spark.avro.
   
   **Does this PR introduce any user-facing change?**
   No.
   
   **How was this patch tested?**
   Unit Tests
   




Issue Time Tracking
---

Worklog Id: (was: 713492)
Time Spent: 50m  (was: 40m)

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=713491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713491
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 05:06
Start Date: 24/Jan/22 05:06
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #598:
URL: https://github.com/apache/griffin/pull/598


   




Issue Time Tracking
---

Worklog Id: (was: 713491)
Time Spent: 40m  (was: 0.5h)

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=713489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713489
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 04:50
Start Date: 24/Jan/22 04:50
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #598:
URL: https://github.com/apache/griffin/pull/598#discussion_r790412145



##
File path: 
measure/src/main/scala/org/apache/griffin/measure/datasource/connector/batch/FileBasedDataConnector.scala
##
@@ -79,7 +79,8 @@ case class FileBasedDataConnector(
 SupportedFormats.contains(format),
 s"Invalid format '$format' specified. Must be one of ${SupportedFormats.mkString("['", "', '", "']")}")
 
-  if (format.equalsIgnoreCase("avro") && sparkSession.version < "2.3.0") {
+  // Use old implementation for AVRO format if current spark version is not 2.4.x and above
+  if (format.equalsIgnoreCase("avro") && sparkSession.version < "2.4.0") {

Review comment:
   ```suggestion
 if ("avro".equalsIgnoreCase(format) && sparkSession.version < "2.4.0") {
   ```
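
   The condition in this diff compares version strings with `<`, which for strings is a lexicographic comparison. That gives the intended answer for 2.3.x against "2.4.0", but it would misorder a version such as "2.10.0" or "10.0.0". A minimal Java sketch of a numeric, component-wise comparison follows. The names (SparkVersionCheck, compareVersions, needsLegacyAvro) are hypothetical and not part of Griffin or of this PR.

```java
// Hypothetical helper, not Griffin code: numeric comparison of dotted version strings.
final class SparkVersionCheck {

    /** Compares versions such as "2.3.4" and "2.4.0" component by component. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.4" equals "2.4.0".
            int x = i < pa.length ? leadingInt(pa[i]) : 0;
            int y = i < pb.length ? leadingInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Parses the leading digits of a component, so "0-SNAPSHOT" yields 0. */
    private static int leadingInt(String part) {
        int end = 0;
        while (end < part.length() && Character.isDigit(part.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(part.substring(0, end));
    }

    /** Mirrors the condition in the diff: legacy Avro source below Spark 2.4.0. */
    static boolean needsLegacyAvro(String format, String sparkVersion) {
        return "avro".equalsIgnoreCase(format) && compareVersions(sparkVersion, "2.4.0") < 0;
    }

    public static void main(String[] args) {
        System.out.println(needsLegacyAvro("avro", "2.3.4"));   // true
        System.out.println(needsLegacyAvro("AVRO", "2.4.8"));   // false
        // Lexicographic comparison places "2.10.0" before "2.4.0":
        System.out.println("2.10.0".compareTo("2.4.0") < 0);    // true
    }
}
```

   For the Spark versions discussed in this PR (2.3.x and 2.4.x) both approaches agree, so this is an optional hardening rather than a correction.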






Issue Time Tracking
---

Worklog Id: (was: 713489)
Time Spent: 0.5h  (was: 20m)

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=713488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713488
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 24/Jan/22 04:48
Start Date: 24/Jan/22 04:48
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #598:
URL: https://github.com/apache/griffin/pull/598#discussion_r790411689



##
File path: 
measure/src/main/scala/org/apache/griffin/measure/datasource/connector/batch/FileBasedDataConnector.scala
##
@@ -79,7 +79,8 @@ case class FileBasedDataConnector(
 SupportedFormats.contains(format),
 s"Invalid format '$format' specified. Must be one of ${SupportedFormats.mkString("['", "', '", "']")}")
 
-  if (format.equalsIgnoreCase("avro") && sparkSession.version < "2.3.0") {
+  // built-in AVRO data source implementation is released in spark 2.4.0

Review comment:
   Suggestion:
   
   ```suggestion
 // Use old implementation for AVRO format if current spark version is not 2.4.x and above
   ```






Issue Time Tracking
---

Worklog Id: (was: 713488)
Time Spent: 20m  (was: 10m)

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=711893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711893
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 20/Jan/22 06:53
Start Date: 20/Jan/22 06:53
Worklog Time Spent: 10m 
  Work Description: whhe merged pull request #596:
URL: https://github.com/apache/griffin/pull/596


   




Issue Time Tracking
---

Worklog Id: (was: 711893)
Time Spent: 1h  (was: 50m)

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Update deployment guide.
>  # Move the service-${version}.tar.gz from parent target folder to service 
> module target.
>  # Remove the diff change history for hive-site.xml.
>  





[jira] [Work logged] (GRIFFIN-369) Bug fix for avro format in data connector

2022-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-369?focusedWorklogId=710482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-710482
 ]

ASF GitHub Bot logged work on GRIFFIN-369:
--

Author: ASF GitHub Bot
Created on: 18/Jan/22 12:58
Start Date: 18/Jan/22 12:58
Worklog Time Spent: 10m 
  Work Description: lipzhu opened a new pull request #598:
URL: https://github.com/apache/griffin/pull/598


   **What changes were proposed in this pull request?**
   The built-in Avro data source was released in Spark 2.4.0 
(https://issues.apache.org/jira/browse/SPARK-24768).
   In a Spark 2.3.x environment, Griffin still needs to map the Avro format to 
com.databricks.spark.avro.
   
   **Does this PR introduce any user-facing change?**
   No.
   
   **How was this patch tested?**
   Unit Tests
   




Issue Time Tracking
---

Worklog Id: (was: 710482)
Remaining Estimate: 0h
Time Spent: 10m

> Bug fix for avro format in data connector
> -
>
> Key: GRIFFIN-369
> URL: https://issues.apache.org/jira/browse/GRIFFIN-369
> Project: Griffin
>  Issue Type: Bug
>Reporter: Zhu, Lipeng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-362) Oracle connection for Apache Griffin

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-362?focusedWorklogId=710228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-710228
 ]

ASF GitHub Bot logged work on GRIFFIN-362:
--

Author: ASF GitHub Bot
Created on: 18/Jan/22 03:35
Start Date: 18/Jan/22 03:35
Worklog Time Spent: 10m 
  Work Description: lipzhu commented on a change in pull request #597:
URL: https://github.com/apache/griffin/pull/597#discussion_r786393569



##
File path: measure/pom.xml
##
@@ -53,6 +53,8 @@ under the License.
 2.3.0
 2.1.0
 2.10.0
+9.4.1212.jre7

Review comment:
   > @lipzhu is there a valid JIRA ticket for this feature? If not then 
please create one, link it to this PR and also add a valid description to the 
PR.
   > 
   > Most of the PRs follow this process. Ref: #593
   
   @chitralverma Thanks for your review. This PR tries to resolve 
https://issues.apache.org/jira/browse/GRIFFIN-362.






Issue Time Tracking
---

Worklog Id: (was: 710228)
Time Spent: 0.5h  (was: 20m)

> Oracle connection for Apache Griffin
> 
>
> Key: GRIFFIN-362
> URL: https://issues.apache.org/jira/browse/GRIFFIN-362
> Project: Griffin
>  Issue Type: Bug
>  Components: accuracy-batch
>Affects Versions: 0.6.0
> Environment: Dev
>Reporter: Praveen Kurup
>Priority: Blocker
> Attachments: image-2021-04-27-23-07-33-681.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=709952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709952
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 14:11
Start Date: 17/Jan/22 14:11
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #596:
URL: https://github.com/apache/griffin/pull/596#discussion_r786042592



##
File path: service/pom.xml
##
@@ -385,7 +385,6 @@ under the License.
 
 false
 false
-../target

Review comment:
   @lipzhu please update the docs instead






Issue Time Tracking
---

Worklog Id: (was: 709952)
Time Spent: 50m  (was: 40m)

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-362) Oracle connection for Apache Griffin

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-362?focusedWorklogId=709951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709951
 ]

ASF GitHub Bot logged work on GRIFFIN-362:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 14:07
Start Date: 17/Jan/22 14:07
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #597:
URL: https://github.com/apache/griffin/pull/597#discussion_r786035047



##
File path: measure/pom.xml
##
@@ -53,6 +53,8 @@ under the License.
 2.3.0
 2.1.0
 2.10.0
+9.4.1212.jre7

Review comment:
   This version is compiled with java7 and is from 2016. Please use a more 
recent and stable version.






Issue Time Tracking
---

Worklog Id: (was: 709951)
Time Spent: 20m  (was: 10m)

> Oracle connection for Apache Griffin
> 
>
> Key: GRIFFIN-362
> URL: https://issues.apache.org/jira/browse/GRIFFIN-362
> Project: Griffin
>  Issue Type: Bug
>  Components: accuracy-batch
>Affects Versions: 0.6.0
> Environment: Dev
>Reporter: Praveen Kurup
>Priority: Blocker
> Attachments: image-2021-04-27-23-07-33-681.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-362) Oracle connection for Apache Griffin

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-362?focusedWorklogId=709948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709948
 ]

ASF GitHub Bot logged work on GRIFFIN-362:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 14:03
Start Date: 17/Jan/22 14:03
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #597:
URL: https://github.com/apache/griffin/pull/597#issuecomment-1014578475


   @lipzhu  is there a valid JIRA ticket for this feature?
   If not then please create one, link it to this PR and also add a valid 
description to the PR.
   
   Most of the PRs follow this process.
   Ref: https://github.com/apache/griffin/pull/593
   
   




Issue Time Tracking
---

Worklog Id: (was: 709948)
Remaining Estimate: 0h
Time Spent: 10m

> Oracle connection for Apache Griffin
> 
>
> Key: GRIFFIN-362
> URL: https://issues.apache.org/jira/browse/GRIFFIN-362
> Project: Griffin
>  Issue Type: Bug
>  Components: accuracy-batch
>Affects Versions: 0.6.0
> Environment: Dev
>Reporter: Praveen Kurup
>Priority: Blocker
> Attachments: image-2021-04-27-23-07-33-681.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=709887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709887
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 12:00
Start Date: 17/Jan/22 12:00
Worklog Time Spent: 10m 
  Work Description: whhe commented on a change in pull request #596:
URL: https://github.com/apache/griffin/pull/596#discussion_r785921305



##
File path: service/pom.xml
##
@@ -385,7 +385,6 @@ under the License.
 
 false
 false
-../target

Review comment:
   > The service-${version}.tar.gz location is not aligned with the 
documentation in 
https://github.com/apache/griffin/blob/master/griffin-doc/deploy/deploy-guide.md,
 or we can update the documentation to remove the confusion.
   > 
   > > > > It's easy to build Griffin, just run maven command mvn clean 
install. Successfully building, you can get service-${version}.tar.gz and 
measure-${version}.jar from target folder in **service** and **measure** module.
   
   You're right. I think it's better to update the doc, and the following 
commands should also be modified at the same time.
   
   ```bash
   cd $GRIFFIN_INSTALL_DIR
   tar -zxvf service-${version}.tar.gz
   ```
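
As a rough illustration of the layout discussed above (a sketch, not part of the PR): assuming the tarball now lands in `service/target`, as the issue description says, a small Python helper could locate and unpack it. The function names and the `install_dir` argument are hypothetical and exist only for this example.

```python
import tarfile
from pathlib import Path


def find_service_tarball(repo_root: Path) -> Path:
    """Return the single service-*.tar.gz under <repo_root>/service/target.

    The service/target location is an assumption based on the issue text.
    """
    matches = sorted((repo_root / "service" / "target").glob("service-*.tar.gz"))
    if len(matches) != 1:
        raise FileNotFoundError(
            "expected exactly one service tarball, found {}".format(len(matches))
        )
    return matches[0]


def unpack_service(repo_root: Path, install_dir: Path) -> Path:
    """Extract the service tarball into install_dir and return the tarball path."""
    tarball = find_service_tarball(repo_root)
    install_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(install_dir)
    return tarball
```

This mirrors the `cd $GRIFFIN_INSTALL_DIR` and `tar -zxvf` steps; only the place where the tarball is looked up differs.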






Issue Time Tracking
---

Worklog Id: (was: 709887)
Time Spent: 40m  (was: 0.5h)

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Update deployment guide.
>  # Move the service-${version}.tar.gz from parent target folder to service 
> module target.
>  # Remove the diff change history for hive-site.xml.
>  





[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=709852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709852
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 11:09
Start Date: 17/Jan/22 11:09
Worklog Time Spent: 10m 
  Work Description: lipzhu commented on a change in pull request #596:
URL: https://github.com/apache/griffin/pull/596#discussion_r785865424



##
File path: service/pom.xml
##
@@ -385,7 +385,6 @@ under the License.
 
 false
 false
-../target

Review comment:
   The service-${version}.tar.gz location is not aligned with the documentation 
in 
https://github.com/apache/griffin/blob/master/griffin-doc/deploy/deploy-guide.md,
 or we can update the documentation to remove the confusion.  
   
   >>>It's easy to build Griffin, just run maven command mvn clean install. 
Successfully building, you can get service-${version}.tar.gz and 
measure-${version}.jar from target folder in **service** and **measure** module.
   






Issue Time Tracking
---

Worklog Id: (was: 709852)
Time Spent: 0.5h  (was: 20m)

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update deployment guide.
>  # Move the service-${version}.tar.gz from parent target folder to service 
> module target.
>  # Remove the diff change history for hive-site.xml.
>  





[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=709847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709847
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 10:57
Start Date: 17/Jan/22 10:57
Worklog Time Spent: 10m 
  Work Description: whhe commented on a change in pull request #596:
URL: https://github.com/apache/griffin/pull/596#discussion_r785853228



##
File path: service/pom.xml
##
@@ -385,7 +385,6 @@ under the License.
 
 false
 false
-../target

Review comment:
   Why move it? I think it's appropriate to put the deployment package in the 
root directory. 






Issue Time Tracking
---

Worklog Id: (was: 709847)
Time Spent: 20m  (was: 10m)

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update deployment guide.
>  # Move the service-${version}.tar.gz from parent target folder to service 
> module target.
>  # Remove the diff change history for hive-site.xml.
>  





[jira] [Work logged] (GRIFFIN-367) Deployment guide doc update

2022-01-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-367?focusedWorklogId=709739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-709739
 ]

ASF GitHub Bot logged work on GRIFFIN-367:
--

Author: ASF GitHub Bot
Created on: 17/Jan/22 07:27
Start Date: 17/Jan/22 07:27
Worklog Time Spent: 10m 
  Work Description: lipzhu opened a new pull request #596:
URL: https://github.com/apache/griffin/pull/596


   1. Move the service-${version}.tar.gz from parent target folder to service 
module target.
   2. Remove the diff change history for hive-site.xml.




Issue Time Tracking
---

Worklog Id: (was: 709739)
Remaining Estimate: 0h
Time Spent: 10m

> Deployment guide doc update
> ---
>
> Key: GRIFFIN-367
> URL: https://issues.apache.org/jira/browse/GRIFFIN-367
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Zhu, Lipeng
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update deployment guide.





[jira] [Work logged] (GRIFFIN-365) Measure Enhancements and Stability fixes

2021-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-365?focusedWorklogId=659633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-659633
 ]

ASF GitHub Bot logged work on GRIFFIN-365:
--

Author: ASF GitHub Bot
Created on: 04/Oct/21 15:12
Start Date: 04/Oct/21 15:12
Worklog Time Spent: 10m 
  Work Description: whhe merged pull request #593:
URL: https://github.com/apache/griffin/pull/593


   




Issue Time Tracking
---

Worklog Id: (was: 659633)
Time Spent: 40m  (was: 0.5h)

> Measure Enhancements and Stability fixes
> 
>
> Key: GRIFFIN-365
> URL: https://issues.apache.org/jira/browse/GRIFFIN-365
> Project: Griffin
>  Issue Type: Improvement
>  Components: Measure Module
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> General updates and fixes to the new measures added as part of 
> [GRIFFIN-358|https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-358]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-365) Measure Enhancements and Stability fixes

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-365?focusedWorklogId=658112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658112
 ]

ASF GitHub Bot logged work on GRIFFIN-365:
--

Author: ASF GitHub Bot
Created on: 30/Sep/21 06:37
Start Date: 30/Sep/21 06:37
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #593:
URL: https://github.com/apache/griffin/pull/593#issuecomment-930854008


   LGTM.




Issue Time Tracking
---

Worklog Id: (was: 658112)
Time Spent: 0.5h  (was: 20m)

> Measure Enhancements and Stability fixes
> 
>
> Key: GRIFFIN-365
> URL: https://issues.apache.org/jira/browse/GRIFFIN-365
> Project: Griffin
>  Issue Type: Improvement
>  Components: Measure Module
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> General updates and fixes to the new measures added as part of 
> [GRIFFIN-358|https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-358]





[jira] [Work logged] (GRIFFIN-365) Measure Enhancements and Stability fixes

2021-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-365?focusedWorklogId=655018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-655018
 ]

ASF GitHub Bot logged work on GRIFFIN-365:
--

Author: ASF GitHub Bot
Created on: 24/Sep/21 16:26
Start Date: 24/Sep/21 16:26
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #593:
URL: https://github.com/apache/griffin/pull/593#issuecomment-926762937


   @wankunde @guoyuepeng Can you please review this? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 655018)
Time Spent: 20m  (was: 10m)

> Measure Enhancements and Stability fixes
> 
>
> Key: GRIFFIN-365
> URL: https://issues.apache.org/jira/browse/GRIFFIN-365
> Project: Griffin
>  Issue Type: Improvement
>  Components: Measure Module
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> General updates and fixes to the new measures added as part of 
> [GRIFFIN-358|https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-358]





[jira] [Work logged] (GRIFFIN-365) Measure Enhancements and Stability fixes

2021-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-365?focusedWorklogId=655015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-655015
 ]

ASF GitHub Bot logged work on GRIFFIN-365:
--

Author: ASF GitHub Bot
Created on: 24/Sep/21 16:21
Start Date: 24/Sep/21 16:21
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #593:
URL: https://github.com/apache/griffin/pull/593


   **What changes were proposed in this pull request?**
   General updates and fixes to the new measures added as part of GRIFFIN-358
   
   Key changes:
   - Scapegoat code analysis and other minor changes to `pom.xml`
   - Handling of corner cases in measures
   - Better exception handling and logging for measures
   - Minor Updates to documentation and tests
   
   **Does this PR introduce any user-facing change?**
   Yes. Expression for completeness measure checks for complete data.
   
   **How was this patch tested?**
   Unit Tests




Issue Time Tracking
---

Worklog Id: (was: 655015)
Remaining Estimate: 0h
Time Spent: 10m

> Measure Enhancements and Stability fixes
> 
>
> Key: GRIFFIN-365
> URL: https://issues.apache.org/jira/browse/GRIFFIN-365
> Project: Griffin
>  Issue Type: Improvement
>  Components: Measure Module
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> General updates and fixes to the new measures added as part of 
> [GRIFFIN-358|https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-358]





[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=637969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637969
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 15/Aug/21 00:02
Start Date: 15/Aug/21 00:02
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-898972968


   Automated Message: We're closing this PR because it hasn't been updated in a 
while. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the 'no-pr-activity' tag!




Issue Time Tracking
---

Worklog Id: (was: 637969)
Remaining Estimate: 20h 50m  (was: 21h)
Time Spent: 3h 10m  (was: 3h)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 3h 10m
>  Remaining Estimate: 20h 50m
>
> I am sorry that my English is not good. 
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration as 
> described in the user guide.
> However, I found that the Hive connection failed.
> 
> The error log is:
> {quote}2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at 
> 

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=637970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637970
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 15/Aug/21 00:02
Start Date: 15/Aug/21 00:02
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #592:
URL: https://github.com/apache/griffin/pull/592


   




Issue Time Tracking
---

Worklog Id: (was: 637970)
Remaining Estimate: 20h 40m  (was: 20h 50m)
Time Spent: 3h 20m  (was: 3h 10m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 3h 20m
>  Remaining Estimate: 20h 40m
>
> I am sorry that my English is not good. 
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration as 
> described in the user guide.
> However, I found that the Hive connection failed.
> 
> The error log is:
> {quote}2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>     at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
>  

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-07-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=631916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631916
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 31/Jul/21 00:02
Start Date: 31/Jul/21 00:02
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-890259023


   Automated Message: This PR is being labelled as stale and will be closed in 
the next 15 days due to lack of activity. To avoid this, push new commits or ask 
the committers for a review/resolution.




Issue Time Tracking
---

Worklog Id: (was: 631916)
Remaining Estimate: 21h  (was: 21h 10m)
Time Spent: 3h  (was: 2h 50m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 3h
>  Remaining Estimate: 21h
>
> I am sorry that my English is not good. 
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration as 
> described in the user guide.
> However, I found that the Hive connection failed.
> 
> The error log is:
> {quote}2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>     at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>     at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>     at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>     at 

[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=619921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619921
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 07/Jul/21 11:29
Start Date: 07/Jul/21 11:29
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #590:
URL: https://github.com/apache/griffin/pull/590#issuecomment-875526716


   LGTM




Issue Time Tracking
---

Worklog Id: (was: 619921)
Remaining Estimate: 0h  (was: 10m)
Time Spent: 1h  (was: 50m)

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The merge_pr.py script can be improved with several good-to-have changes, 
> such as the following:
>  * Python 3 compatibility
>  * a better check for the Jira dependency
>  * updating Jira with more details, such as assignee and contributor details
>  * upgraded dependencies
> Also added a requirements.txt file for installation of script dependencies.
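
One way such a dependency check might look is sketched below. This is not the code from merge_pr.py; the helper names are hypothetical, and the import name `jira` for the Jira client is an assumption. The sketch only shows the general idea of verifying the interpreter version and optional dependencies up front, before the script does any work.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def check_environment(required=("jira",)):
    """Return a list of human-readable problems; an empty list means all good.

    `required` holds import names; "jira" is assumed here for illustration.
    """
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    for name in required:
        if not module_available(name):
            problems.append(
                "missing dependency: {} (try: pip install -r requirements.txt)".format(name)
            )
    return problems
```

A script could call `check_environment()` at startup and exit with the collected messages if the list is non-empty, which gives a clearer failure than an `ImportError` halfway through a merge.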





[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=619920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619920
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 07/Jul/21 11:29
Start Date: 07/Jul/21 11:29
Worklog Time Spent: 10m 
  Work Description: guoyuepeng merged pull request #590:
URL: https://github.com/apache/griffin/pull/590


   




Issue Time Tracking
---

Worklog Id: (was: 619920)
Remaining Estimate: 10m  (was: 20m)
Time Spent: 50m  (was: 40m)

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 50m
>  Remaining Estimate: 10m
>
> The merge_pr.py script can be improved with several good-to-have changes, 
> such as the following:
>  * Python 3 compatibility
>  * a better check for the Jira dependency
>  * updating Jira with more details, such as assignee and contributor details
>  * upgraded dependencies
> Also added a requirements.txt file for installation of script dependencies.





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=619918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619918
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 07/Jul/21 11:29
Start Date: 07/Jul/21 11:29
Worklog Time Spent: 10m 
  Work Description: guoyuepeng merged pull request #583:
URL: https://github.com/apache/griffin/pull/583


   




Issue Time Tracking
---

Worklog Id: (was: 619918)
Time Spent: 2h  (was: 1h 50m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This ticket aims to set up the following two automations in GitHub. 
> *Check for Stale PRs/Issues*
>  Add a GitHub Workflow that automatically checks for stale PRs and tags or 
> closes them when they have been inactive for a long time.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues or 
> PRs that have been inactive for over 30 days and will close them after another 15 
> days.
>  Additionally, the stale PRs will be marked with a {{no-pr-activity}} label. 
> PRs that have the {{awaiting-approval}}, {{work-in-progress}} or {{wip}} label are 
> excluded from this check.
> {quote}
>  
> *Greet new users*
> {quote}Add a GitHub Workflow that automatically greets new users on their 
> first PR.
> {quote}
>  
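
The stale-PR policy described in the ticket can be sketched as a small decision function. This is not the workflow's actual implementation; the function name `classify` and the state names are hypothetical, and for simplicity the closing window is counted from the last activity rather than from the moment the label is applied.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)   # inactive for over 30 days -> stale
CLOSE_AFTER = timedelta(days=15)   # closed after another 15 days
STALE_LABEL = "no-pr-activity"
EXEMPT_LABELS = {"awaiting-approval", "work-in-progress", "wip"}


def classify(last_activity, labels, now):
    """Return 'exempt', 'active', 'stale' or 'close' for a single PR."""
    if EXEMPT_LABELS & set(labels):
        return "exempt"            # excluded from the check entirely
    idle = now - last_activity
    if idle > STALE_AFTER + CLOSE_AFTER:
        return "close"
    if idle > STALE_AFTER:
        return "stale"             # would receive STALE_LABEL
    return "active"


if __name__ == "__main__":
    now = datetime(2021, 7, 7, tzinfo=timezone.utc)
    state = classify(now - timedelta(days=31), ["bug"], now)
    if state == "stale":
        print(f"add label {STALE_LABEL}")
```

Checking the exempt labels first means a PR marked `wip` is never tagged or closed, however long it has been idle.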





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=619305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619305
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 06/Jul/21 11:46
Start Date: 06/Jul/21 11:46
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #591:
URL: https://github.com/apache/griffin/pull/591#issuecomment-874684760


   Thanks for the merge! :)




Issue Time Tracking
---

Worklog Id: (was: 619305)
Time Spent: 1h 40m  (was: 1.5h)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Current `RuleParams` can be of the following 3 DSL types:
>  * Data Ops (for source preprocessing)
>  * Griffin DSL
>  * SparkSQL
> Griffin DSL allows the implementation of measures (DQ Types) like 
> Completeness, Accuracy, etc.
> To enable such measures there is an extensive implementation of expressions, 
> task hierarchies and parsing, most of which depends heavily on 
> scala-parser-combinators.
> In the end, Griffin DSL tries to mimic a SparkSQL-like 
> query with user-defined constraints substituted in.
> This approach has some drawbacks:
>  * Suboptimal processing. While the transformation steps execute in parallel 
> on the driver, the data set is still scanned multiple times in parallel, which 
> can cause inefficiencies on the SparkSession side, and the internal task 
> scheduler was single-threaded. Even though the data set can be cached, it is 
> still branched, and crucial memory is spent holding the data set rather 
> than processing it.
>  * Internal functions of Spark are not used. Data preprocessing currently has a 
> very limited scope even though hundreds of Spark SQL functions are 
> available for use.
>  * This blocks structured streaming. The manually constructed SQL queries 
> cause multiple aggregations in the same query on a streaming data set, which 
> is not supported by Spark's Structured Streaming. There are workarounds for 
> this, but they all require rewriting the *Expr2DQSteps classes.
>  * Griffin DSL is SparkSQL-like but not 100% compatible. The Profiling measure 
> and SparkSQL are redundant functionalities.
> The proposed solution involves SparkSQL DSL based measures and some changes 
> to Rule Params. This will enhance the data preprocessing flows and the measures 
> themselves.
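
The repeated-scan drawback can be pictured with plain SQL: when several counts are written as aggregates of one query, the data needs only one pass. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for SparkSQL; the table `source`, the column `email` and the function `completeness` are hypothetical and are not part of Griffin's API.

```python
import sqlite3


def completeness(rows):
    """Return (total, complete, incomplete) counts for the email column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO source VALUES (?, ?)", rows)
    # One query, one scan: all three aggregates are computed together.
    result = conn.execute(
        """
        SELECT COUNT(*),
               COUNT(email),
               SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)
        FROM source
        """
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)]
    print(completeness(sample))
```

Running the same three counts as three separate queries would read the table three times, which is the kind of overhead the ticket describes.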





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=619307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619307
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 06/Jul/21 11:46
Start Date: 06/Jul/21 11:46
Worklog Time Spent: 10m 
  Work Description: chitralverma edited a comment on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-874686893


   > @chitralverma
   > since we have [GRIFFIN-360], do we still need this PR?
   
   Yes @guoyuepeng. This PR is about automatically closing stale PRs on GitHub 
when they have had no activity for a long time.
   I see that you recently closed a lot of PRs because they were very old. 
   
   This PR does the same thing: it tags PRs as stale and closes them if there 
is no activity.
   GRIFFIN-360 is about enhancements to the PR merge script itself. 
   
   So these are 2 separate issues.




Issue Time Tracking
---

Worklog Id: (was: 619307)
Time Spent: 1h 50m  (was: 1h 40m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=619302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619302
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 06/Jul/21 11:45
Start Date: 06/Jul/21 11:45
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-874686893


   > @chitralverma
   > since we have [GRIFFIN-360], do we still need this PR?
   
   Yes @guoyuepeng. This PR is about automatically closing stale PRs on GitHub 
when they have had no activity for a long time.
   I see that you recently closed a lot of PRs because they were very old. 
   
   This PR does the same thing: it tags PRs as stale and closes them if there 
is no activity.




Issue Time Tracking
---

Worklog Id: (was: 619302)
Time Spent: 1h 40m  (was: 1.5h)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=619011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619011
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 06/Jul/21 11:00
Start Date: 06/Jul/21 11:00
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-874089188


   @chitralverma 
   since we have [GRIFFIN-360], do we still need this PR?




Issue Time Tracking
---

Worklog Id: (was: 619011)
Time Spent: 1.5h  (was: 1h 20m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=618998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618998
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 06/Jul/21 10:59
Start Date: 06/Jul/21 10:59
Worklog Time Spent: 10m 
  Work Description: guoyuepeng merged pull request #591:
URL: https://github.com/apache/griffin/pull/591


   




Issue Time Tracking
---

Worklog Id: (was: 618998)
Time Spent: 1.5h  (was: 1h 20m)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=618638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618638
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 05/Jul/21 12:58
Start Date: 05/Jul/21 12:58
Worklog Time Spent: 10m 
  Work Description: guoyuepeng merged pull request #591:
URL: https://github.com/apache/griffin/pull/591


   




Issue Time Tracking
---

Worklog Id: (was: 618638)
Time Spent: 1h 20m  (was: 1h 10m)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=618636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618636
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 05/Jul/21 12:53
Start Date: 05/Jul/21 12:53
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-874089188


   @chitralverma 
   since we have [GRIFFIN-360], do we still need this PR?




Issue Time Tracking
---

Worklog Id: (was: 618636)
Time Spent: 1h 20m  (was: 1h 10m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=618508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618508
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 05/Jul/21 04:17
Start Date: 05/Jul/21 04:17
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #591:
URL: https://github.com/apache/griffin/pull/591#issuecomment-873771974


   LGTM.
   Will merge it




Issue Time Tracking
---

Worklog Id: (was: 618508)
Time Spent: 1h 10m  (was: 1h)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=618497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618497
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 05/Jul/21 03:37
Start Date: 05/Jul/21 03:37
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #591:
URL: https://github.com/apache/griffin/pull/591#issuecomment-873759586


   > big patch.
   > let me go through it today.
   > 
   > Thanks.
   
   Thanks.




Issue Time Tracking
---

Worklog Id: (was: 618497)
Time Spent: 1h  (was: 50m)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-07-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=617949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617949
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 02/Jul/21 02:35
Start Date: 02/Jul/21 02:35
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #591:
URL: https://github.com/apache/griffin/pull/591#issuecomment-872669275


   big patch.
   let me go through it today.
   
   Thanks.




Issue Time Tracking
---

Worklog Id: (was: 617949)
Time Spent: 50m  (was: 40m)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=616938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616938
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 06:57
Start Date: 30/Jun/21 06:57
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #591:
URL: https://github.com/apache/griffin/pull/591#discussion_r661182329



##
File path: measure/src/main/scala/org/apache/griffin/measure/datasource/connector/batch/ElasticSearchGriffinDataConnector.scala
##
@@ -19,13 +19,13 @@ package org.apache.griffin.measure.datasource.connector.batch
 
 import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
 import java.net.URI
-import java.util
 
 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
 
 import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
 import com.fasterxml.jackson.module.scala.DefaultScalaModule
+import java.util

Review comment:
   done.






Issue Time Tracking
---

Worklog Id: (was: 616938)
Time Spent: 40m  (was: 0.5h)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current `RuleParams` can be of the following 3 DSL types:
>  * Data Ops (for source preprocessing)
>  * Griffin DSL
>  * SparkSQL
> Griffin DSL allows the implementation of measures (DQ types) like 
> Completeness, Accuracy, etc.
> To enable such measures there is an extensive implementation of expressions, 
> task hierarchies and parsing, most of which depends heavily on 
> scala-parser-combinators.
> In the end, Griffin DSL tries to mimic a SparkSQL-like query with 
> user-defined constraints substituted in.
> This approach has some drawbacks:
>  * Suboptimal processing. While the transformation steps execute in parallel 
> on the driver, the data set is still scanned multiple times in parallel, which 
> can cause inefficiencies on the SparkSession side, and the internal task 
> scheduler was single-threaded. Even though the data set can be cached, it is 
> still branched, and crucial memory is spent holding the data set rather 
> than processing it.
>  * Internal functions of Spark are not used. Data preprocessing currently has 
> a very limited scope even though hundreds of Spark SQL functions are 
> available for use.
>  * This blocks Structured Streaming. The manually constructed SQL queries 
> cause multiple aggregations in the same query on a streaming data set, which 
> is not supported by Spark's Structured Streaming. There are workarounds for 
> this but they all require rewriting the *Expr2DQSteps classes.
>  * Griffin DSL is SparkSQL-like but not 100% compatible. The Profiling measure 
> and SparkSQL are redundant functionalities.
> The proposed solution involves SparkSQL DSL based measures and some changes 
> to Rule Params. This will enhance the data preprocessing flows and the 
> measures themselves.
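
The "suboptimal processing" drawback in the description above comes down to scanning the same data once per measure. As a rough illustration only, the plain-Java sketch below contrasts one scan per measure with a single pass that accumulates a total count and a completeness-style count together. It does not use Spark or any Griffin code; the class name, the `Counts` holder and the notion of "incomplete" (null or empty) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Griffin code.
final class SinglePassMeasures {

    /** Two counts gathered for one column. */
    static final class Counts {
        final long total;       // every record seen
        final long incomplete;  // records whose value is null or empty

        Counts(long total, long incomplete) {
            this.total = total;
            this.incomplete = incomplete;
        }
    }

    // Assumed definition of "missing" for this sketch.
    static boolean isMissing(String value) {
        return value == null || value.isEmpty();
    }

    // One scan per measure: the column is traversed twice.
    static Counts scanPerMeasure(List<String> column) {
        long total = column.stream().count();
        long incomplete = column.stream().filter(SinglePassMeasures::isMissing).count();
        return new Counts(total, incomplete);
    }

    // Both measures accumulated during a single traversal.
    static Counts singlePass(List<String> column) {
        long total = 0;
        long incomplete = 0;
        for (String value : column) {
            total++;
            if (isMissing(value)) {
                incomplete++;
            }
        }
        return new Counts(total, incomplete);
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList("a@x.org", null, "", "b@x.org");
        Counts c = singlePass(emails);
        // prints "4 records, 2 incomplete"
        System.out.println(c.total + " records, " + c.incomplete + " incomplete");
    }
}
```

Both methods return the same counts; the difference is only in how often the input is read, which is the cost the description attributes to the current rule execution.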



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=616883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616883
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 02:06
Start Date: 30/Jun/21 02:06
Worklog Time Spent: 10m 
  Work Description: wankunde removed a comment on pull request #590:
URL: https://github.com/apache/griffin/pull/590#issuecomment-871041023


   LGTM


Issue Time Tracking
---

Worklog Id: (was: 616883)
Remaining Estimate: 20m  (was: 0.5h)
Time Spent: 40m  (was: 0.5h)

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 40m
>  Remaining Estimate: 20m
>
> The merge_pr.py script can be improved with several good-to-have changes, 
> such as:
>  * Python 3 compatibility
>  * a better check for the Jira dependency
>  * updating Jira with more details, such as assignee and contributor
>  * upgrading dependencies
> Also adds a requirements.txt file for installing the script's dependencies.




[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=616881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616881
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 02:03
Start Date: 30/Jun/21 02:03
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #590:
URL: https://github.com/apache/griffin/pull/590#issuecomment-871041023


   LGTM


Issue Time Tracking
---

Worklog Id: (was: 616881)
Remaining Estimate: 0.5h  (was: 40m)
Time Spent: 0.5h  (was: 20m)

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 0.5h
>  Remaining Estimate: 0.5h
>




[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=616878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616878
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 02:00
Start Date: 30/Jun/21 02:00
Worklog Time Spent: 10m 
  Work Description: wankunde commented on a change in pull request #591:
URL: https://github.com/apache/griffin/pull/591#discussion_r661074917



##
File path: 
measure/src/main/scala/org/apache/griffin/measure/datasource/connector/batch/ElasticSearchGriffinDataConnector.scala
##
@@ -19,13 +19,13 @@ package 
org.apache.griffin.measure.datasource.connector.batch
 
 import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
 import java.net.URI
-import java.util
 
 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
 
 import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
 import com.fasterxml.jackson.module.scala.DefaultScalaModule
+import java.util

Review comment:
   reorder this import plz
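
The hunk above suggests the grouping the reviewer has in mind: `java.*` imports first, then `scala.*`, then third-party packages. The sketch below reproduces that ordering for a list of import lines. It is an assumption read off this one diff, not the project's actual style-checker rule, and `ImportOrder` is a hypothetical helper written in Java purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; the group ranks are inferred from the diff, not from a style config.
final class ImportOrder {

    // 0 = java/javax, 1 = scala, 2 = everything else (third-party).
    static int group(String importLine) {
        String name = importLine.substring("import ".length()).trim();
        if (name.equals("java") || name.startsWith("java.") || name.startsWith("javax.")) {
            return 0;
        }
        if (name.equals("scala") || name.startsWith("scala.")) {
            return 1;
        }
        return 2;
    }

    // Returns a new list sorted by group first, then alphabetically within a group.
    static List<String> sorted(List<String> importLines) {
        List<String> result = new ArrayList<>(importLines);
        Comparator<String> byGroup = Comparator.comparingInt(ImportOrder::group);
        result.sort(byGroup.thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }
}
```

Applied to the imports in the hunk, `sorted` moves `import java.util` back up next to `import java.net.URI`, which matches the order the file had before the change.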




Issue Time Tracking
---

Worklog Id: (was: 616878)
Time Spent: 0.5h  (was: 20m)

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616872
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:52
Start Date: 30/Jun/21 01:52
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661073537



##
File path: service/src/main/resources/application.properties
##
@@ -22,6 +22,9 @@ spring.datasource.password=123456
 spring.jpa.generate-ddl=true
 spring.datasource.driver-class-name=org.postgresql.Driver
 spring.jpa.show-sql=true
+# kerberos
+# add new configuration for kerberos file
+krb5conf.path=/path/to/krb5conf/file

Review comment:
   > Should we comment out this configuration by default?
   
   Excuse me, I am a little confused. Could you explain this for me?
   




Issue Time Tracking
---

Worklog Id: (was: 616872)
Remaining Estimate: 21h 10m  (was: 21h 20m)
Time Spent: 2h 50m  (was: 2h 40m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2h 50m
>  Remaining Estimate: 21h 10m
>
> I am sorry that my English is not good.
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration 
> according to the user guide,
> but I found that the Hive connection failed.
> 
> The error log is:
> {quote}2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>   at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>   at 

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616871
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:52
Start Date: 30/Jun/21 01:52
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661073465



##
File path: service/src/main/resources/application.properties
##
@@ -22,6 +22,9 @@ spring.datasource.password=123456
 spring.jpa.generate-ddl=true
 spring.datasource.driver-class-name=org.postgresql.Driver
 spring.jpa.show-sql=true
+# kerberos
+# add new configuration for kerberos file
+krb5conf.path=/path/to/krb5conf/file

Review comment:
   Excuse me, I am a little confused. Could you explain this for me?




Issue Time Tracking
---

Worklog Id: (was: 616871)
Remaining Estimate: 21h 20m  (was: 21.5h)
Time Spent: 2h 40m  (was: 2.5h)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2h 40m
>  Remaining Estimate: 21h 20m
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616864
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:44
Start Date: 30/Jun/21 01:44
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661071240



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")
+private String keytabPath;
+
+@Value("${hive.keytab.user}")
+private String keytabUser;
+
+@Value("${hive.need.kerberos}")
+private String needKerberos;

Review comment:
   @wankunde 
   
   > Agree with @chitralverma
   > 
   > @lovelyqincai Could we change the `needKerberos` variable in 
`org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceJdbcImpl` to a 
boolean type in another PR?
   
   OK, I agree as well.




Issue Time Tracking
---

Worklog Id: (was: 616864)
Remaining Estimate: 21.5h  (was: 21h 40m)
Time Spent: 2.5h  (was: 2h 20m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2.5h
>  Remaining Estimate: 21.5h
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616863
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:44
Start Date: 30/Jun/21 01:44
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661071078



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")
+private String keytabPath;
+
+@Value("${hive.keytab.user}")
+private String keytabUser;
+
+@Value("${hive.need.kerberos}")
+private String needKerberos;

Review comment:
   OK, I agree as well.




Issue Time Tracking
---

Worklog Id: (was: 616863)
Remaining Estimate: 21h 40m  (was: 21h 50m)
Time Spent: 2h 20m  (was: 2h 10m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2h 20m
>  Remaining Estimate: 21h 40m
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616861
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:41
Start Date: 30/Jun/21 01:41
Worklog Time Spent: 10m 
  Work Description: wankunde commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661070386



##
File path: service/src/main/resources/application.properties
##
@@ -22,6 +22,9 @@ spring.datasource.password=123456
 spring.jpa.generate-ddl=true
 spring.datasource.driver-class-name=org.postgresql.Driver
 spring.jpa.show-sql=true
+# kerberos
+# add new configuration for kerberos file
+krb5conf.path=/path/to/krb5conf/file

Review comment:
   Should we comment out this configuration by default?
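
To make the question concrete: if `krb5conf.path` ships commented out, the key is simply absent when the properties file is loaded, so the service needs to treat "absent" as "Kerberos file not configured". The sketch below shows that behaviour with plain `java.util.Properties`. How the PR actually consumes the property is not shown in this thread, so the `Krb5ConfSetting` class and the step of forwarding the value to the JDK's `java.security.krb5.conf` system property are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper, not part of the pull request.
final class Krb5ConfSetting {

    // Returns the configured path, or empty when the key is absent, blank or commented out.
    static Optional<String> krb5ConfPath(String propertiesText) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are comments and produce no key.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("krb5conf.path");
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    // Assumed usage: point the JDK at the krb5 file only when a path is configured.
    static void apply(Optional<String> path) {
        path.ifPresent(p -> System.setProperty("java.security.krb5.conf", p));
    }
}
```

With the placeholder line commented out, `krb5ConfPath` returns an empty `Optional` and `apply` leaves the JVM's Kerberos settings untouched, so non-Kerberos deployments would not pick up `/path/to/krb5conf/file` by accident.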




Issue Time Tracking
---

Worklog Id: (was: 616861)
Remaining Estimate: 21h 50m  (was: 22h)
Time Spent: 2h 10m  (was: 2h)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2h 10m
>  Remaining Estimate: 21h 50m
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=616860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616860
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 30/Jun/21 01:39
Start Date: 30/Jun/21 01:39
Worklog Time Spent: 10m 
  Work Description: wankunde commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r661069716



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")
+private String keytabPath;
+
+@Value("${hive.keytab.user}")
+private String keytabUser;
+
+@Value("${hive.need.kerberos}")
+private String needKerberos;

Review comment:
   Agreed with @chitralverma.
   
   @lovelyqincai Could we change the `needKerberos` variable in 
`org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceJdbcImpl` to a 
boolean type in another PR?
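
   As a rough illustration of the boolean-typed flag suggested here, the 
sketch below converts the raw setting once instead of comparing strings at 
every use. `KerberosFlag` and `parse` are hypothetical names, not part of 
Griffin. With Spring, binding `@Value` directly to a `boolean` field would 
likely achieve the same result; the sketch stays on the JDK so that it runs on 
its own.

```java
// Hypothetical helper, not Griffin code: turns the raw "hive.need.kerberos"
// string into a boolean once, so callers no longer compare strings.
final class KerberosFlag {
    private KerberosFlag() {
    }

    // Boolean.parseBoolean returns true only for "true" (case-insensitive)
    // and false for everything else, including null, so no separate
    // null check is needed.
    static boolean parse(String raw) {
        return Boolean.parseBoolean(raw);
    }
}
```

   Under this assumption, a check such as `needKerberos != null && 
"true".equalsIgnoreCase(needKerberos)` reduces to a plain `needKerberos` test.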
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@griffin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 616860)
Remaining Estimate: 22h  (was: 22h 10m)
Time Spent: 2h  (was: 1h 50m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 2h
>  Remaining Estimate: 22h
>
> I am sorry that my English is not good.
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration by 
> following the user guide.
> However, the Hive connection failed.
> 
> The error log is:
> {quote}
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore                             [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore                             [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore                             [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport                  [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>   at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
> 

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615383
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 11:09
Start Date: 27/Jun/21 11:09
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r659304138



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")

Review comment:
   > Since Kerberos doesn't apply to just Hive but to Hadoop services in 
general, the "hive." prefix can be removed from all these configs.
   
   Hmm, Kerberos has many keytabs for different users, such as hive, hdfs, livy, 
and so on.
   I can see many keytab files with names like hive.keytab, hdfs.keytab...
   How about keeping the prefix?
   






Issue Time Tracking
---

Worklog Id: (was: 615383)
Remaining Estimate: 22h 20m  (was: 22.5h)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615384
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 11:09
Start Date: 27/Jun/21 11:09
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r659304138



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")

Review comment:
   > Since Kerberos doesn't apply to just Hive but to Hadoop services in 
general, the "hive." prefix can be removed from all these configs.
   
   Hmm, Kerberos has many keytabs for different users, such as hive, hdfs, livy, 
and so on.
   I can see many keytab files with names like hive.keytab, hdfs.keytab...
   How about keeping the prefix?
   






Issue Time Tracking
---

Worklog Id: (was: 615384)
Remaining Estimate: 22h 10m  (was: 22h 20m)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615382=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615382
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 10:47
Start Date: 27/Jun/21 10:47
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-869141142


   > @amazingSaltFish can you add some test cases and add more details to this 
JIRA and this PR?
   
   I no longer have the environment where the failure occurred, but this is how 
I solved the problem...




Issue Time Tracking
---

Worklog Id: (was: 615382)
Remaining Estimate: 22.5h  (was: 22h 40m)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615381=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615381
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 10:45
Start Date: 27/Jun/21 10:45
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r659301139



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")

Review comment:
   OK, I accept your suggestion.






Issue Time Tracking
---

Worklog Id: (was: 615381)
Remaining Estimate: 22h 40m  (was: 22h 50m)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615379
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 10:28
Start Date: 27/Jun/21 10:28
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-869138900


   @amazingSaltFish can you add some test cases and add more details to this 
JIRA and this PR?




Issue Time Tracking
---

Worklog Id: (was: 615379)
Remaining Estimate: 23h  (was: 23h 10m)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615378
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 10:27
Start Date: 27/Jun/21 10:27
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r659298650



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")
+private String keytabPath;
+
+@Value("${hive.keytab.user}")
+private String keytabUser;
+
+@Value("${hive.need.kerberos}")
+private String needKerberos;
+
+@PostConstruct
+public void init() throws IOException {
+if ( needKerberos != null && "true".equalsIgnoreCase(needKerberos) && hiveKrb5confPath != null) {
+System.setProperty("java.security.krb5.conf", hiveKrb5confPath);

Review comment:
   This can be a constant.
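
   One way to read this suggestion, as a minimal sketch only: the property 
name `java.security.krb5.conf` is the standard JDK system property for the 
Kerberos configuration file, while `Krb5Settings`, `KRB5_CONF_PROPERTY`, and 
`applyKrb5Conf` are hypothetical names chosen for illustration and do not 
come from the PR.

```java
// Hypothetical sketch, not the PR's code: the property name is pulled into
// a named constant instead of being repeated as a string literal.
final class Krb5Settings {
    // JDK system property that points the Kerberos implementation at a krb5.conf file.
    static final String KRB5_CONF_PROPERTY = "java.security.krb5.conf";

    private Krb5Settings() {
    }

    // Sets the property only when Kerberos is enabled and a path is configured.
    // Returns true if the property was set, false otherwise.
    static boolean applyKrb5Conf(boolean needKerberos, String krb5ConfPath) {
        if (!needKerberos || krb5ConfPath == null) {
            return false;
        }
        System.setProperty(KRB5_CONF_PROPERTY, krb5ConfPath);
        return true;
    }
}
```

   The constant gives the literal a single definition, so a typo in the 
property name cannot silently diverge between call sites.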

##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")

Review comment:
   Since Kerberos doesn't apply to just Hive but to Hadoop services in general, 
the "hive." prefix can be removed from all these configs.






Issue Time Tracking
---

Worklog Id: (was: 615378)
Remaining Estimate: 23h 10m  (was: 23h 20m)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=615364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615364
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 27/Jun/21 08:35
Start Date: 27/Jun/21 08:35
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #592:
URL: https://github.com/apache/griffin/pull/592#discussion_r659285098



##
File path: 
service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java
##
@@ -55,6 +59,28 @@ Licensed to the Apache Software Foundation (ASF) under one
 
 private IMetaStoreClient client = null;
 
+@Value("${hive.krb5conf.path}")
+private String hiveKrb5confPath;
+
+@Value("${hive.keytab.path}")
+private String keytabPath;
+
+@Value("${hive.keytab.user}")
+private String keytabUser;
+
+@Value("${hive.need.kerberos}")
+private String needKerberos;

Review comment:
   Can be a boolean.
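
The suggestion above can be sketched as follows. This is a minimal, self-contained illustration that uses `java.util.Properties` in place of Spring's property injection; the class `KerberosSettings` is hypothetical and not part of the pull request. In the Spring service itself, declaring the field as `boolean` under the same `@Value` placeholder should be enough, since Spring converts the property string to the field type.

```java
import java.util.Properties;

// Hypothetical holder for the Kerberos settings discussed in this review.
// The property keys are the ones shown in the diff above.
final class KerberosSettings {
    final boolean needKerberos;
    final String keytabUser;
    final String keytabPath;

    KerberosSettings(Properties props) {
        // Boolean.parseBoolean returns true only for "true" (case-insensitive);
        // a missing or malformed value falls back to false.
        this.needKerberos =
                Boolean.parseBoolean(props.getProperty("hive.need.kerberos", "false"));
        this.keytabUser = props.getProperty("hive.keytab.user", "");
        this.keytabPath = props.getProperty("hive.keytab.path", "");
    }
}
```

With a typed flag, callers can branch on `settings.needKerberos` directly instead of comparing strings.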






Issue Time Tracking
---

Worklog Id: (was: 615364)
Remaining Estimate: 23h 20m  (was: 23.5h)
Time Spent: 40m  (was: 0.5h)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 40m
>  Remaining Estimate: 23h 20m
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=614639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614639
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 24/Jun/21 17:45
Start Date: 24/Jun/21 17:45
Worklog Time Spent: 10m 
  Work Description: lovelyqincai removed a comment on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-864589726


   Why has my PR failed? Can anyone tell me what I should do? 




Issue Time Tracking
---

Worklog Id: (was: 614639)
Remaining Estimate: 23.5h  (was: 23h 40m)
Time Spent: 0.5h  (was: 20m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 0.5h
>  Remaining Estimate: 23.5h
>
> I am sorry that my English is not good. 
> 
> This is my problem:
> I am trying to use Griffin for data quality projects.
> Our environment is a CDH 6.1.1 cluster with Kerberos.
> I want to connect to Hive, so I set up all the Kerberos configuration 
> following the user guide.
> But the Hive connection failed.
> 
> The error log is:
> {quote}
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  WARN 102938 --- [           main] h.metastore [487]  : Failed to connect to the MetaStore Server...
> 2021-06-15 13:33:01.418  INFO 102938 --- [           main] h.metastore [518]  : Waiting 1 seconds before next connection attempt.
> 2021-06-15 13:33:02.419  INFO 102938 --- [           main] h.metastore [434]  : Trying to connect to metastore with URI thrift://..com:9083
> 2021-06-15 13:33:02.422 ERROR 102938 --- [           main] o.a.t.t.TSaslTransport [315]  : SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_131]
>   at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:?]
>   at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) ~[hive-shims-common-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:286) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:211) ~[hive-metastore-2.2.0.jar:2.2.0]
>   at org.apache.griffin.core.metastore.hive.HiveMetaStoreProxy.initHiveMetastoreClient(HiveMetaStoreProxy.java:68) ~[service-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>   at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
>  

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=612388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-612388
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 20/Jun/21 17:54
Start Date: 20/Jun/21 17:54
Worklog Time Spent: 10m 
  Work Description: lovelyqincai commented on pull request #592:
URL: https://github.com/apache/griffin/pull/592#issuecomment-864589726


   Why has my PR failed? Can anyone tell me what I should do? 




Issue Time Tracking
---

Worklog Id: (was: 612388)
Remaining Estimate: 23h 40m  (was: 23h 50m)
Time Spent: 20m  (was: 10m)

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 20m
>  Remaining Estimate: 23h 40m
>

[jira] [Work logged] (GRIFFIN-363) Hive kerberos for Griffin error

2021-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-363?focusedWorklogId=611460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611460
 ]

ASF GitHub Bot logged work on GRIFFIN-363:
--

Author: ASF GitHub Bot
Created on: 15/Jun/21 17:00
Start Date: 15/Jun/21 17:00
Worklog Time Spent: 10m 
  Work Description: amazingSaltFish opened a new pull request #592:
URL: https://github.com/apache/griffin/pull/592


   
   url: https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-363?filter=allopenissues
   1. Add the new property hive.krb5conf.path.
   2. Fix the Hive metastore Kerberos error.
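
How the pull request uses `hive.krb5conf.path` is not shown in this message. One plausible use, sketched below as an assumption, is to point the JVM at a specific `krb5.conf` through the standard `java.security.krb5.conf` system property, typically before the first Kerberos login attempt. The class `Krb5ConfSetup` and its method are hypothetical names for illustration only.

```java
// Hypothetical helper; not taken from the pull request.
final class Krb5ConfSetup {
    // Standard JVM system property that names the krb5.conf file to use.
    static final String KRB5_CONF_PROPERTY = "java.security.krb5.conf";

    // Applies the configured path when it is non-blank.
    // Returns true if the system property was set, false otherwise.
    static boolean applyKrb5Conf(String configuredPath) {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            // Nothing configured: leave the JVM default lookup untouched.
            return false;
        }
        System.setProperty(KRB5_CONF_PROPERTY, configuredPath.trim());
        return true;
    }
}
```

A service would call `Krb5ConfSetup.applyKrb5Conf(hiveKrb5confPath)` during startup and only then attempt a keytab login.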




Issue Time Tracking
---

Worklog Id: (was: 611460)
Remaining Estimate: 23h 50m  (was: 24h)
Time Spent: 10m

> Hive kerberos for Griffin error
> ---
>
> Key: GRIFFIN-363
> URL: https://issues.apache.org/jira/browse/GRIFFIN-363
> Project: Griffin
>  Issue Type: Bug
>  Components: Service Module
>Affects Versions: 0.6.0
> Environment: CentOS 7 
>Reporter: MenghuiWan
>Priority: Minor
>  Labels: kerberos
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>

[jira] [Work logged] (GRIFFIN-358) Rewrite the Rule/Measure implementations

2021-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-358?focusedWorklogId=603701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603701
 ]

ASF GitHub Bot logged work on GRIFFIN-358:
--

Author: ASF GitHub Bot
Created on: 28/May/21 20:51
Start Date: 28/May/21 20:51
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #591:
URL: https://github.com/apache/griffin/pull/591


   **What changes were proposed in this pull request?**
   
   
   Current `RuleParams` can be of the following 3 DSL types,
   
   - Data Ops (for source preprocessing)
   - Griffin DSL
   - SparkSQL
   
   GriffinDSL allows the implementation of measures (DQ types) such as 
Completeness, Accuracy, etc.
   
   To enable such measures there is an extensive implementation of expressions, 
task hierarchies and parsing, most of which depends heavily on 
scala-parser-combinators.
   
   In the end, Griffin DSL tries to mimic a SparkSQL-like query with 
user-defined constraints substituted in.
   
   This approach has some drawbacks:
   
   - Suboptimal processing. While the transformation steps execute in parallel 
on the driver, the data set is still scanned multiple times in parallel, which 
can cause inefficiencies on the SparkSession side, and the internal task 
scheduler was single-threaded. Even though the data set can be cached, it is 
still branched, and crucial memory is spent holding the data set rather than 
processing it.
   - Internal functions of Spark are not used. Data preprocessing currently has 
a very limited scope even though hundreds of Spark SQL functions are available 
for use.
   - This blocks Structured Streaming. The manually constructed SQL queries 
cause multiple aggregations in the same query on a streaming data set, which is 
not supported by Spark's Structured Streaming. There are workarounds for this, 
but they all require rewriting the *Expr2DQSteps classes.
   - Griffin DSL is SparkSQL-like but not 100% compatible. The Profiling measure 
and SparkSQL are redundant functionalities.
   
   The proposed solution involves SparkSQL DSL based measures and some changes 
to Rule Params. This will enhance the data preprocessing flows and the measures 
themselves.
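
To illustrate the "scanned multiple times" drawback, here is a minimal sketch in plain Java (not Spark and not Griffin code) of a completeness-style measure that derives the total and incomplete counts from a single pass over the data. In SparkSQL the analogous idea would be one aggregation that computes both counts together. The names `CompletenessSketch`, `Counts` and `measure` are hypothetical, and treating empty strings as incomplete is an assumption made for this example.

```java
import java.util.List;

// Hypothetical illustration; not part of the Griffin code base.
final class CompletenessSketch {

    // Result of one scan: total records and records missing a value.
    static final class Counts {
        final long total;
        final long incomplete;

        Counts(long total, long incomplete) {
            this.total = total;
            this.incomplete = incomplete;
        }

        // Fraction of records that carry a value; 1.0 for an empty input.
        double completeRatio() {
            return total == 0 ? 1.0 : (double) (total - incomplete) / total;
        }
    }

    // Computes both counts in a single pass instead of one scan per count.
    static Counts measure(List<String> values) {
        long total = 0;
        long incomplete = 0;
        for (String value : values) {
            total++;
            // Assumption: null and empty strings both count as incomplete.
            if (value == null || value.isEmpty()) {
                incomplete++;
            }
        }
        return new Counts(total, incomplete);
    }
}
```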
   
   
   **Does this PR introduce any user-facing change?**
   Yes. Users can use the new measures as a separate configuration and there is 
scope for more data pre-processing.
   
   **How was this patch tested?**
   Unit Tests




Issue Time Tracking
---

Worklog Id: (was: 603701)
Remaining Estimate: 0h
Time Spent: 10m

> Rewrite the Rule/Measure implementations
> 
>
> Key: GRIFFIN-358
> URL: https://issues.apache.org/jira/browse/GRIFFIN-358
> Project: Griffin
>  Issue Type: New Feature
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current `RuleParams` can be of the following 3 DSL types,
>  * Data Ops (for source preprocessing)
>  * Griffin DSL
>  * SparkSQL
> GriffinDSL allows the implementation of measures (DQ types) such as 
> Completeness, Accuracy, etc.
> To enable such measures there is an extensive implementation of expressions, 
> task hierarchies and parsing, most of which depends heavily on 
> scala-parser-combinators.
> In the end, Griffin DSL tries to mimic a SparkSQL-like query with 
> user-defined constraints substituted in.
> This approach has some drawbacks:
>  * Suboptimal processing. While the transformation steps execute in parallel 
> on the driver, the data set is still scanned multiple times in parallel, which 
> can cause inefficiencies on the SparkSession side, and the internal task 
> scheduler was single-threaded. Even though the data set can be cached, it is 
> still branched, and crucial memory is spent holding the data set rather 
> than processing it.
>  * Internal functions of Spark are not used. Data preprocessing currently has 
> a very limited scope even though hundreds of Spark SQL functions are 
> available for use.
>  * This blocks Structured Streaming. The manually constructed SQL queries 
> cause multiple aggregations in the same query on a streaming data set, which 
> is not supported by Spark's Structured Streaming. There are workarounds for 
> this, but they all require rewriting the *Expr2DQSteps classes.
>  * Griffin DSL is SparkSQL-like but not 100% compatible. The Profiling measure 
> and SparkSQL are redundant functionalities.
> The proposed 

[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=586754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586754
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 21/Apr/21 17:16
Start Date: 21/Apr/21 17:16
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-824225116


   @guoyuepeng @wankunde can you review this?
   I have updated the scope of this ticket. The automation now focuses only on 
PRs, since Jira handles all the issues for Apache projects.
   
   Ideally we could close the Jira tickets from this automation as well, but that 
would involve creating tokens and storing them as secrets in the repo 
settings. Also, that automation should be set up directly in Jira, as GitHub 
won't be able to track the activity on a Jira ticket from here.




Issue Time Tracking
---

Worklog Id: (was: 586754)
Time Spent: 1h 10m  (was: 1h)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This ticket aims to set up the following two automations in GitHub. 
> *Check for Stale PRs/ Issues*
>  Add a GitHub Workflow that automatically checks for stale PRs and tags/ 
> closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them after another 
> 15 days.
>  Additionally, the stale PRs will be marked with a {{no-pr-activity}} label. 
> PRs having an {{awaiting-approval}}, {{work-in-progress}} or {{wip}} label are 
> excluded from this check.
> {quote}
>  
> *Greet new users*
> {quote}Add a GitHub Workflow that automatically greets new users on their 
> first PR.
> {quote}
>  
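
The stale-PR rules described in the ticket can be sketched as a small decision function. This is an illustration in plain Java, not the GitHub workflow itself; the class `StalePrPolicy` and its members are hypothetical, and the exact boundaries (strictly more than 30 days to mark, strictly more than 45 days to close) are assumptions read from "over 30" and "another 15 days".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the stale-PR policy; not the actual workflow definition.
final class StalePrPolicy {
    enum Action { NONE, MARK_STALE, CLOSE }

    static final int DAYS_BEFORE_STALE = 30;
    static final int DAYS_BEFORE_CLOSE = 15;

    // Label applied to a PR when the action is MARK_STALE.
    static final String STALE_LABEL = "no-pr-activity";

    // PRs carrying any of these labels are excluded from the check.
    static final Set<String> EXEMPT_LABELS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("awaiting-approval", "work-in-progress", "wip")));

    // Decides what to do with a PR given its days of inactivity and its labels.
    static Action decide(long daysInactive, Set<String> labels) {
        if (!Collections.disjoint(labels, EXEMPT_LABELS)) {
            return Action.NONE;
        }
        if (daysInactive > DAYS_BEFORE_STALE + DAYS_BEFORE_CLOSE) {
            return Action.CLOSE;
        }
        if (daysInactive > DAYS_BEFORE_STALE) {
            return Action.MARK_STALE;
        }
        return Action.NONE;
    }
}
```

For example, a PR with no exempt label that has been idle for 31 days would be labelled `no-pr-activity`, while one labelled `wip` is left alone however long it has been idle.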



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=586629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586629
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 21/Apr/21 14:46
Start Date: 21/Apr/21 14:46
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #590:
URL: https://github.com/apache/griffin/pull/590#issuecomment-824120086


   @guoyuepeng @wankunde can you review this?




Issue Time Tracking
---

Worklog Id: (was: 586629)
Remaining Estimate: 40m  (was: 50m)
Time Spent: 20m  (was: 10m)

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 20m
>  Remaining Estimate: 40m
>
> The merge_pr.py script can be improved with many good-to-have changes like 
> below,
>  * allow python 3 compatibility
>  * better check for Jira dependency
>  * Updating Jira with more details like assignee and contributor details
>  * upgrading dependencies
> Also added a requirements.txt file for installation of script dependencies.





[jira] [Work logged] (GRIFFIN-360) Improve merge_pr.py script

2021-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-360?focusedWorklogId=586627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-586627
 ]

ASF GitHub Bot logged work on GRIFFIN-360:
--

Author: ASF GitHub Bot
Created on: 21/Apr/21 14:45
Start Date: 21/Apr/21 14:45
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #590:
URL: https://github.com/apache/griffin/pull/590


   **What changes were proposed in this pull request?**
   
   The merge_pr.py script can be improved with many good-to-have changes like 
below,
   
   - allow python 3 compatibility
   - better check for Jira dependency
   - Updating Jira with more details like assignee and contributor details
   - upgrading dependencies
   
   Also added a requirements.txt file for installation of script dependencies.
   
   **Does this PR introduce any user-facing change?**
   No. Committers will use this script to merge changes.
   
   **How was this patch tested?**
   In sync with the Spark merge script. Used this to merge previous PRs.




Issue Time Tracking
---

Worklog Id: (was: 586627)
Remaining Estimate: 50m  (was: 1h)
Time Spent: 10m

> Improve merge_pr.py script
> --
>
> Key: GRIFFIN-360
> URL: https://issues.apache.org/jira/browse/GRIFFIN-360
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Priority: Major
>   Original Estimate: 1h
>  Time Spent: 10m
>  Remaining Estimate: 50m
>
> The merge_pr.py script can be improved with many good-to-have changes like 
> below,
>  * allow python 3 compatibility
>  * better check for Jira dependency
>  * Updating Jira with more details like assignee and contributor details
>  * upgrading dependencies
> Also added a requirements.txt file for installation of script dependencies.





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=562925=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562925
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 09/Mar/21 09:03
Start Date: 09/Mar/21 09:03
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #589:
URL: https://github.com/apache/griffin/pull/589


   





Issue Time Tracking
---

Worklog Id: (was: 562925)
Time Spent: 1.5h  (was: 1h 20m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=560537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560537
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 03/Mar/21 16:03
Start Date: 03/Mar/21 16:03
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-789824308


   Great, I'll merge this then when I'm back working next.





Issue Time Tracking
---

Worklog Id: (was: 560537)
Time Spent: 1h 20m  (was: 1h 10m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=560347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560347
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 03/Mar/21 08:31
Start Date: 03/Mar/21 08:31
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-789536973


   LGTM!
   Thanks guys.
   @chitralverma @wankunde 





Issue Time Tracking
---

Worklog Id: (was: 560347)
Time Spent: 1h 10m  (was: 1h)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=560303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560303
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 03/Mar/21 06:33
Start Date: 03/Mar/21 06:33
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-789474905


   LGTM
   





Issue Time Tracking
---

Worklog Id: (was: 560303)
Time Spent: 1h  (was: 50m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=560070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560070
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 02/Mar/21 17:52
Start Date: 02/Mar/21 17:52
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-789093549


   @wankunde @guoyuepeng I just added support for cross-version builds against 
Spark 3.0.2 as well!





Issue Time Tracking
---

Worklog Id: (was: 560070)
Time Spent: 50m  (was: 40m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=560010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560010
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 02/Mar/21 15:39
Start Date: 02/Mar/21 15:39
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-788999767


   @wankunde Spark 3 has some interface changes, but let me check.





Issue Time Tracking
---

Worklog Id: (was: 560010)
Time Spent: 40m  (was: 0.5h)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=559984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559984
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 02/Mar/21 14:37
Start Date: 02/Mar/21 14:37
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-788953867


   @chitralverma Hi chitralverma, I think it's very useful to support 
cross-version compilation for Scala and Spark dependencies. Since Spark 3 has 
been released for some time, can we support it at the same time?
   





Issue Time Tracking
---

Worklog Id: (was: 559984)
Time Spent: 0.5h  (was: 20m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=558549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-558549
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 26/Feb/21 12:33
Start Date: 26/Feb/21 12:33
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #589:
URL: https://github.com/apache/griffin/pull/589#issuecomment-786620924


   @wankunde @guoyuepeng Please review this.





Issue Time Tracking
---

Worklog Id: (was: 558549)
Time Spent: 20m  (was: 10m)

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-345) Support cross-version compilation for Scala and other dependencies

2021-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-345?focusedWorklogId=557740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-557740
 ]

ASF GitHub Bot logged work on GRIFFIN-345:
--

Author: ASF GitHub Bot
Created on: 25/Feb/21 06:11
Start Date: 25/Feb/21 06:11
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #589:
URL: https://github.com/apache/griffin/pull/589


   **What changes were proposed in this pull request?**
   
   _This PR affects only the measure module._
   
   In newer environments, especially clouds, the Griffin measure module may face 
compatibility issues due to the old Scala and Spark versions. To remedy this, 
the following topics are covered in this ticket:
   
   - Cross-compilation across Scala major versions (2.11, 2.12)
   - Update Spark version (2.4+)
   - Create Maven profiles to build different Scala and Spark versions
   - Changes to build strategy
   
   This process is also used in Apache Spark to build for different versions of 
Scala and Hadoop.
   
   **Does this PR introduce any user-facing change?**
   No
   
   **How was this patch tested?**
   Via maven build process.
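
   As a rough illustration of the cross-version build idea, the sketch below enumerates Scala/Spark combinations and prints one Maven invocation per valid pair. The profile names (`scala-2.11`, `spark-2.4`, and so on) are hypothetical placeholders, not the profile ids defined in this PR, and the compatibility table is an assumption covering only the versions mentioned in this thread.

```python
from itertools import product

# Assumption: Scala binary versions available for each Spark line.
# Spark 2.4 is published for Scala 2.11 and 2.12; Spark 3.0 only for 2.12.
SUPPORTED_SCALA = {
    "2.4": {"2.11", "2.12"},
    "3.0": {"2.12"},
}


def build_matrix(scala_versions, spark_versions):
    """Return the (scala, spark) pairs that can be built together."""
    return [
        (scala, spark)
        for scala, spark in product(scala_versions, spark_versions)
        if scala in SUPPORTED_SCALA.get(spark, set())
    ]


def maven_command(scala, spark):
    """Build a Maven command line; the profile ids are hypothetical."""
    return ["mvn", "clean", "package", "-Pscala-{},spark-{}".format(scala, spark)]


if __name__ == "__main__":
    for scala, spark in build_matrix(["2.11", "2.12"], ["2.4", "3.0"]):
        print(" ".join(maven_command(scala, spark)))
```

   Filtering the matrix up front keeps unsupported pairs, such as Scala 2.11 with Spark 3.0, from ever reaching the build.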





Issue Time Tracking
---

Worklog Id: (was: 557740)
Remaining Estimate: 0h
Time Spent: 10m

> Support cross-version compilation for Scala and other dependencies
> --
>
> Key: GRIFFIN-345
> URL: https://issues.apache.org/jira/browse/GRIFFIN-345
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Tushar
>Assignee: Chitral Verma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following topics are covered in this ticket:
>  * Cross-compilation across Scala major versions (2.11, 2.12 and 2.13)
>  * Update Spark version (2.4+)
>  * Explore Maven profiles to execute on both vanilla HDFS and AWS EMR (these 
> profiles can be extended to support GCP etc.)





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2020-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=491777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-491777
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 28/Sep/20 00:55
Start Date: 28/Sep/20 00:55
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-699715572


   > > @chitralverma
   > > Good to know this feature, have you tested it on our jira?
   > > Thanks,
   > > William
   > 
   > Hi William, the bot only closes the issues and PRs on GitHub. Do you want 
me to update this to close the tickets automatically as well?
   
   Cool, please close the ticket at the same time.
   BTW, when you say close the issue, do you mean close JIRA tickets? Where are 
those issues documented?
   
   Thanks,
   William





Issue Time Tracking
---

Worklog Id: (was: 491777)
Time Spent: 1h  (was: 50m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Check for Stale PRs/ Issues*
> Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
> tags/ closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them in another 
> 15 days.
> Additionally, stale issues will be marked with the `stale-issue` label and 
> stale PRs with the `stale-pr` label. Issues/PRs with the 
> `awaiting-approval` or `work-in-progress` labels are excluded from this check.
> {quote}
> *Greet new users*
> Add a GitHub Workflow that automatically greets new users on their first PR/ 
> Issue.
>  
>  





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2020-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=489536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-489536
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 23/Sep/20 11:38
Start Date: 23/Sep/20 11:38
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-697307471


   > @chitralverma
   > 
   > Good to know this feature, have you tested it on our jira?
   > 
   > Thanks,
   > William
   
   Hi William, the bot only closes the issues and PRs on GitHub. Do you want me 
to update this to close the tickets automatically as well?





Issue Time Tracking
---

Worklog Id: (was: 489536)
Time Spent: 50m  (was: 40m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Check for Stale PRs/ Issues*
> Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
> tags/ closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them in another 
> 15 days.
> Additionally, stale issues will be marked with the `stale-issue` label and 
> stale PRs with the `stale-pr` label. Issues/PRs with the 
> `awaiting-approval` or `work-in-progress` labels are excluded from this check.
> {quote}
> *Greet new users*
> Add a GitHub Workflow that automatically greets new users on their first PR/ 
> Issue.
>  
>  





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=488805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-488805
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 23/Sep/20 04:11
Start Date: 23/Sep/20 04:11
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #583:
URL: https://github.com/apache/griffin/pull/583#issuecomment-697044695


   @chitralverma 
   
   Good to know this feature, have you tested it on our jira?
   
   Thanks,
   William





Issue Time Tracking
---

Worklog Id: (was: 488805)
Time Spent: 40m  (was: 0.5h)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> *Check for Stale PRs/ Issues*
> Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
> tags/ closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them in another 
> 15 days.
> Additionally, stale issues will be marked with the `stale-issue` label and 
> stale PRs with the `stale-pr` label. Issues/PRs with the 
> `awaiting-approval` or `work-in-progress` labels are excluded from this check.
> {quote}
> *Greet new users*
> Add a GitHub Workflow that automatically greets new users on their first PR/ 
> Issue.
>  
>  





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=487429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487429
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 22/Sep/20 03:00
Start Date: 22/Sep/20 03:00
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #583:
URL: https://github.com/apache/griffin/pull/583


   **What changes were proposed in this pull request?**
   
   Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
tags/ closes them when inactive for a long duration.
   
   > This workflow will run every day at 00:00 UTC to check for any issues/PRs 
that have been inactive for over 30 days and will close them in another 15 days.
   > Additionally, stale issues will be marked with the `stale-issue` label and 
stale PRs with the `stale-pr` label. Issues/PRs with the 
`awaiting-approval` or `work-in-progress` labels are excluded from this check.
   
   Greet new users
   
   Add a GitHub Workflow that automatically greets new users on their first PR/ 
Issue.
   
   **Does this PR introduce any user-facing change?**
   Yes. Users will see comments on PRs and Issues
   
   **How was this patch tested?**
   No test required
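
   The stale-check rules described above can be sketched as a small decision function. This is an illustrative model only, not the workflow added in the PR: the function name `next_action` and its return values are hypothetical, the 30-day boundary is treated as inclusive, and the sketch assumes that applying the stale label counts as activity, so the 15-day close window is measured from the last recorded activity.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)   # inactivity before an item is labelled stale
CLOSE_AFTER = timedelta(days=15)   # further inactivity before a stale item is closed
EXEMPT_LABELS = {"awaiting-approval", "work-in-progress"}


def next_action(kind, labels, last_activity, now):
    """Decide what a daily stale check would do with one issue or PR.

    kind is "pr" or "issue"; labels is an iterable of label names.
    Returns "skip", "wait", "close", or "label:<stale label>".
    """
    labels = set(labels)
    stale_label = "stale-pr" if kind == "pr" else "stale-issue"
    if labels & EXEMPT_LABELS:
        return "skip"
    idle = now - last_activity
    if stale_label in labels:
        return "close" if idle >= CLOSE_AFTER else "wait"
    return "label:" + stale_label if idle >= STALE_AFTER else "wait"


if __name__ == "__main__":
    now = datetime(2020, 10, 31, tzinfo=timezone.utc)
    print(next_action("pr", [], now - timedelta(days=31), now))
    print(next_action("issue", ["stale-issue"], now - timedelta(days=16), now))
```

   Keeping the exempt-label check first means that items marked `awaiting-approval` or `work-in-progress` are never labelled or closed, regardless of age.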





Issue Time Tracking
---

Worklog Id: (was: 487429)
Time Spent: 20m  (was: 10m)

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Check for Stale PRs/ Issues*
> Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
> tags/ closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them in another 
> 15 days.
> Additionally, stale issues will be marked with the `stale-issue` label and 
> stale PRs with the `stale-pr` label. Issues/PRs with the 
> `awaiting-approval` or `work-in-progress` labels are excluded from this check.
> {quote}
> *Greet new users*
> Add a GitHub Workflow that automatically greets new users on their first PR/ 
> Issue.
>  
>  





[jira] [Work logged] (GRIFFIN-347) Setup automated workflows

2020-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-347?focusedWorklogId=486764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-486764
 ]

ASF GitHub Bot logged work on GRIFFIN-347:
--

Author: ASF GitHub Bot
Created on: 21/Sep/20 05:11
Start Date: 21/Sep/20 05:11
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #583:
URL: https://github.com/apache/griffin/pull/583


   **What changes were proposed in this pull request?**
   
   Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
tags/ closes them when inactive for a long duration.
   
   > This workflow will run every day at 00:00 UTC to check for any issues/PRs 
that have been inactive for over 30 days and will close them in another 15 days.
   > Additionally, stale issues will be marked with the `stale-issue` label and 
stale PRs with the `stale-pr` label. Issues/PRs with the 
`awaiting-approval` or `work-in-progress` labels are excluded from this check.
   
   Greet new users
   
   Add a GitHub Workflow that automatically greets new users on their first PR/ 
Issue.
   
   **Does this PR introduce any user-facing change?**
   Yes. Users will see comments on PRs and Issues
   
   **How was this patch tested?**
   No test required





Issue Time Tracking
---

Worklog Id: (was: 486764)
Remaining Estimate: 0h  (was: 10m)
Time Spent: 10m

> Setup automated workflows
> -
>
> Key: GRIFFIN-347
> URL: https://issues.apache.org/jira/browse/GRIFFIN-347
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Trivial
>   Original Estimate: 10m
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Check for Stale PRs/ Issues*
> Add a GitHub Workflow that automatically checks for stale PRs and Issues and 
> tags/ closes them when inactive for a long duration.
> {quote}This workflow will run every day at 00:00 UTC to check for any issues/ 
> PRs that have been inactive for over 30 days and will close them in another 
> 15 days.
> Additionally, stale issues will be marked with the `stale-issue` label and 
> stale PRs with the `stale-pr` label. Issues/PRs with the 
> `awaiting-approval` or `work-in-progress` labels are excluded from this check.
> {quote}
> *Greet new users*
> Add a GitHub Workflow that automatically greets new users on their first PR/ 
> Issue.
>  
>  





[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=468374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468374
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 10/Aug/20 02:53
Start Date: 10/Aug/20 02:53
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-671141520


   @chitralverma 
   
   The merge process is as follows,
   using Python 2.7:
   
   - run ./merge_pr.py
   - Which pull request would you like to merge? (e.g. 34): 575 - select 575
   - Proceed with merging pull request #575? (y/n): y
   - Merge complete (local ref PR_TOOL_MERGE_PR_575_MASTER). Push to 
apache-git? (y/n): y
   - Would you like to pick 1aa8995a into another branch? (y/n): n
   - Would you like to update an associated JIRA? (y/n): y
   - Enter comma-separated fix version(s) [0.6.0]:
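
   Two details of the dialog above can be sketched in code: the local merge ref name and the fix-version prompt with its bracketed default. The helper names below are hypothetical and are not taken from merge_pr.py. The ref pattern is inferred only from the `PR_TOOL_MERGE_PR_575_MASTER` example shown in the prompt, and treating the bracketed value as the default for empty input is an assumption.

```python
def merge_ref_name(pr_number, target_branch):
    """Name of the temporary local ref, following the pattern in the prompt."""
    return "PR_TOOL_MERGE_PR_{}_{}".format(pr_number, target_branch.upper())


def parse_fix_versions(answer, default):
    """Split a comma-separated answer; fall back to the default when empty."""
    text = answer.strip() or default
    return [part.strip() for part in text.split(",") if part.strip()]


if __name__ == "__main__":
    print(merge_ref_name(575, "master"))
    print(parse_fix_versions("", default="0.6.0"))
```

   Pressing Enter at the fix-version prompt would then accept the suggested version, while an explicit answer such as `0.6.0, 0.7.0` is split into separate versions.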
   
   You should have permission for this. Tell me if you encounter any problems.
   
   Thanks,
   William





Issue Time Tracking
---

Worklog Id: (was: 468374)
Time Spent: 2h 10m  (was: 2h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=468373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468373
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 10/Aug/20 02:50
Start Date: 10/Aug/20 02:50
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #575:
URL: https://github.com/apache/griffin/pull/575


   





Issue Time Tracking
---

Worklog Id: (was: 468373)
Time Spent: 2h  (was: 1h 50m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=467097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467097
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 06/Aug/20 06:41
Start Date: 06/Aug/20 06:41
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-669736481


   Absolutely, I'm all in for Griffin. :)
   @wankunde can you please merge this?
   
   Also, can you tell me how pull requests are merged for this project, so that I 
can help close some of the open PRs? Thanks.
   
   
   





Issue Time Tracking
---

Worklog Id: (was: 467097)
Time Spent: 1h 50m  (was: 1h 40m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=467088=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467088
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 06/Aug/20 06:20
Start Date: 06/Aug/20 06:20
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-669727067


   LGTM, @chitralverma Many thanks for your work.





Issue Time Tracking
---

Worklog Id: (was: 467088)
Time Spent: 1h 40m  (was: 1.5h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-323) Refactor configuration for Data Source and Data Source Connector

2020-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-323?focusedWorklogId=464011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464011
 ]

ASF GitHub Bot logged work on GRIFFIN-323:
--

Author: ASF GitHub Bot
Created on: 29/Jul/20 07:53
Start Date: 29/Jul/20 07:53
Worklog Time Spent: 10m 
  Work Description: zgdong1987 commented on pull request #568:
URL: https://github.com/apache/griffin/pull/568#issuecomment-665456611


   I switched to version 0.5.0, but the UI failed to compile.





Issue Time Tracking
---

Worklog Id: (was: 464011)
Time Spent: 1h 50m  (was: 1h 40m)

> Refactor configuration for Data Source and Data Source Connector
> 
>
> Key: GRIFFIN-323
> URL: https://issues.apache.org/jira/browse/GRIFFIN-323
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Current config structure for Data Source is as follows,
> {noformat}
>  
> "data.sources": [
>   {
>     "name": "src",
>     "connectors": [
>       {
>         "type": "AVRO",
>         "version": "1.7",
>         "config": {
>           "file.path": "/",
>           "file.name": ".avro"
>         }
>       }
>     ]
>   },
>   {
>     "name": "tgt",
>     "connectors": [
>       {
>         "type": "AVRO",
>         "version": "1.7",
>         "config": {
>           "file.path": "/",
>           "file.name": ".avro"
>         }
>       }
>     ]
>   }
> ]
> {noformat}
>  
> This ticket proposes the following changes,
>  * remove 'version' from 'DataConnectorParam' as it is not being used 
> anywhere in the codebase.
>  * change 'connectors' from array type to a single JSON object. Since a data 
> source named X may only be of one type (hive, file etc), the connector field 
> should not be an array.
>  
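
The two proposed changes could be modelled roughly as follows. This is only a sketch: `ConnectorParam` and `SourceParam` are hypothetical, simplified stand-ins for Griffin's `DataConnectorParam` and `DataSourceParam`, and `isValid` is an assumed check, not the project's actual validation.

```scala
// Hypothetical, simplified stand-ins for Griffin's config classes; the real
// DataSourceParam and DataConnectorParam carry more fields.
object ConnectorConfigSketch {

  // 'version' is dropped, as the ticket proposes.
  final case class ConnectorParam(conType: String, config: Map[String, Any])

  // 'connector' is a single optional object rather than an array.
  final case class SourceParam(name: String, connector: Option[ConnectorParam]) {
    // A source is usable only if it has a name and a connector with a type.
    def isValid: Boolean =
      name.nonEmpty && connector.exists(_.conType.nonEmpty)
  }

  // Mirrors the "src" entry above, with the path values copied as they appear.
  val src: SourceParam = SourceParam(
    name = "src",
    connector = Some(ConnectorParam("AVRO", Map("file.path" -> "/", "file.name" -> ".avro")))
  )

  // A source without a connector, which the assumed check rejects.
  val missing: SourceParam = SourceParam(name = "tgt", connector = None)

  def main(args: Array[String]): Unit = {
    println(s"${src.name} valid: ${src.isValid}")
    println(s"${missing.name} valid: ${missing.isValid}")
  }
}
```

Using an `Option` rather than a `Seq` makes "exactly one connector per data source" visible in the type, which is the constraint the ticket argues for.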





[jira] [Work logged] (GRIFFIN-323) Refactor configuration for Data Source and Data Source Connector

2020-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-323?focusedWorklogId=459845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459845
 ]

ASF GitHub Bot logged work on GRIFFIN-323:
--

Author: ASF GitHub Bot
Created on: 16/Jul/20 15:04
Start Date: 16/Jul/20 15:04
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #568:
URL: https://github.com/apache/griffin/pull/568#issuecomment-659472645


   Hi everyone, please note that these changes were made in the `measure` module 
only. 
   
   For a more stable release, use version 0.5.0, available on Maven 
Central.
   
   Otherwise, PRs that fix the `services` and `ui` modules are most welcome.





Issue Time Tracking
---

Worklog Id: (was: 459845)
Time Spent: 1.5h  (was: 1h 20m)

> Refactor configuration for Data Source and Data Source Connector
> 
>
> Key: GRIFFIN-323
> URL: https://issues.apache.org/jira/browse/GRIFFIN-323
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-323) Refactor configuration for Data Source and Data Source Connector

2020-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-323?focusedWorklogId=459832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459832
 ]

ASF GitHub Bot logged work on GRIFFIN-323:
--

Author: ASF GitHub Bot
Created on: 16/Jul/20 14:03
Start Date: 16/Jul/20 14:03
Worklog Time Spent: 10m 
  Work Description: FaizanSh commented on pull request #568:
URL: https://github.com/apache/griffin/pull/568#issuecomment-659433895


   > Yes these changes are not supported by the current ui and services
   
   Hi, this problem still persists in the code. What workaround do you suggest?





Issue Time Tracking
---

Worklog Id: (was: 459832)
Time Spent: 1h 20m  (was: 1h 10m)

> Refactor configuration for Data Source and Data Source Connector
> 
>
> Key: GRIFFIN-323
> URL: https://issues.apache.org/jira/browse/GRIFFIN-323
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-323) Refactor configuration for Data Source and Data Source Connector

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-323?focusedWorklogId=452464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452464
 ]

ASF GitHub Bot logged work on GRIFFIN-323:
--

Author: ASF GitHub Bot
Created on: 29/Jun/20 16:31
Start Date: 29/Jun/20 16:31
Worklog Time Spent: 10m 
  Work Description: Rayleigh0727 commented on pull request #568:
URL: https://github.com/apache/griffin/pull/568#issuecomment-651229496


   > I do not see a change in the service code to accept this change in 
contract. When I submit with 'connectors' from the UI, the request goes through 
but fails while executing the job (log snippet below). Submitting with 'connector' 
does not work in the service API call, since it expects 'connectors' to submit a 
measure. Am I missing something here?
   > `20/03/19 18:27:26 ERROR Application$: assertion failed: Connector is 
undefined or invalid java.lang.AssertionError: assertion failed: Connector is 
undefined or invalid at scala.Predef$.assert(Predef.scala:170) at 
org.apache.griffin.measure.configuration.dqdefinition.DataSourceParam.validate(DQConfig.scala:100)`
   
   Hi, I encountered the same problem. Did you solve it?





Issue Time Tracking
---

Worklog Id: (was: 452464)
Time Spent: 1h 10m  (was: 1h)

> Refactor configuration for Data Source and Data Source Connector
> 
>
> Key: GRIFFIN-323
> URL: https://issues.apache.org/jira/browse/GRIFFIN-323
> Project: Griffin
>  Issue Type: Improvement
>Reporter: Chitral Verma
>Assignee: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=452026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452026
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 28/Jun/20 12:25
Start Date: 28/Jun/20 12:25
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-650745454


   Hi @chitralverma, could you provide an implementation example of the 
`open` and `close` methods?
   
   





Issue Time Tracking
---

Worklog Id: (was: 452026)
Time Spent: 1h 10m  (was: 1h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450431
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:17
Start Date: 24/Jun/20 13:17
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-648813731


   @wankunde the `open` and `close` methods are for future custom sink 
implementations (for example, Redis or JDBC) that do not rely on the Spark 
DataSource v1/v2 APIs. Such data stores require a one-time initialization of a 
connection or connection pool, which can then be serialized to all executors each 
time the write operation is called.
   
   This PR also acts as a basic cleanup for the structured streaming sinks that 
I'm working on.
   I'm also planning to rewrite HDFSSink as FileBasedSink, much like 
FileBasedDataConnector, and to include many other sinks.
   
   Griffin is going to get really exciting.
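   
   A minimal sketch of the lifecycle described above, under several assumptions: `LifecycleSink` is a simplified stand-in for the `Sink` trait (no Spark or `Loggable`), `InMemorySink` and its `write` method are hypothetical, and an in-memory buffer stands in for a real connection or pool.

```scala
import scala.collection.mutable.ArrayBuffer

object SinkLifecycleSketch {

  // Simplified stand-in for the Sink trait in the PR (no Spark, no Loggable).
  trait LifecycleSink extends Serializable {
    def validate(): Boolean
    def open(applicationId: String): Unit
    def write(record: String): Unit
    def close(): Unit
  }

  // Hypothetical sink whose "connection" is opened once and reused by every write.
  class InMemorySink extends LifecycleSink {
    private var connection: Option[ArrayBuffer[String]] = None

    def validate(): Boolean = true

    def open(applicationId: String): Unit = {
      // One-time initialization; a real sink might create a connection pool here.
      connection = Some(ArrayBuffer(s"opened by $applicationId"))
    }

    def write(record: String): Unit = connection match {
      case Some(buffer) =>
        buffer += record
        ()
      case None =>
        throw new IllegalStateException("sink is not open")
    }

    def close(): Unit = {
      connection = None
    }

    def isOpen: Boolean = connection.isDefined

    def written: Seq[String] = connection.map(_.toList).getOrElse(Nil)
  }

  // Runs the full lifecycle and returns what the sink held just before closing.
  def run(records: Seq[String]): Seq[String] = {
    val sink = new InMemorySink
    require(sink.validate(), "sink prerequisites not met")
    sink.open("app-0001") // placeholder application ID
    try {
      records.foreach(sink.write)
      sink.written
    } finally sink.close()
  }
}
```

   The `try`/`finally` in `run` is one way to make sure `close` is called even when a write fails; whether Griffin wires the calls this way is not stated in this thread.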





Issue Time Tracking
---

Worklog Id: (was: 450431)
Time Spent: 1h  (was: 50m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450428
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:11
Start Date: 24/Jun/20 13:11
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444881499



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String

Review comment:
   Absolutely, I had the same in mind, but I was planning to do it as part 
of a separate config refactoring in the near future.
   
   Do you suggest I do this right now or later? 







Issue Time Tracking
---

Worklog Id: (was: 450428)
Time Spent: 50m  (was: 40m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450426
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:10
Start Date: 24/Jun/20 13:10
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444880634



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String
   val timeStamp: Long
 
   val config: Map[String, Any]
 
   val block: Boolean
 
-  def available(): Boolean
+  /**
+   * Ensures that the pre-requisites (if any) of the Sink are met before opening it.
+   */
+  def validate(): Boolean
 
-  def start(msg: String): Unit
-  def finish(): Unit
+  /**
+   * Allows initialization of the connection to the sink (if required).
+   *
+   * @param applicationId Spark Application ID
+   */
+  def open(applicationId: String): Unit

Review comment:
   @wankunde this is as per the existing implementation. I just changed the 
variable names to remove ambiguity and made no functional change. This has been 
done in many other places as well.
   
   I'll rename `applicationId` to something more descriptive soon.







Issue Time Tracking
---

Worklog Id: (was: 450426)
Time Spent: 40m  (was: 0.5h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450388
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 11:05
Start Date: 24/Jun/20 11:05
Worklog Time Spent: 10m 
  Work Description: wankunde commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444802163



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String
   val timeStamp: Long
 
   val config: Map[String, Any]
 
   val block: Boolean
 
-  def available(): Boolean
+  /**
+   * Ensures that the pre-requisites (if any) of the Sink are met before opening it.
+   */
+  def validate(): Boolean
 
-  def start(msg: String): Unit
-  def finish(): Unit
+  /**
+   * Allows initialization of the connection to the sink (if required).
+   *
+   * @param applicationId Spark Application ID
+   */
+  def open(applicationId: String): Unit

Review comment:
   What's the use of `applicationId`? Can we use `jobName` instead?

##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String

Review comment:
   It would be better to unify the variable names so that they are easier to 
understand.
   
   It is `name` in `DQConfig`, `metricName` in `BatchDQApp`, `name` in 
`DQContext`, and `jobName` in `SinkFactory`.







Issue Time Tracking
---

Worklog Id: (was: 450388)
Time Spent: 0.5h  (was: 20m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=447980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447980
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 18/Jun/20 18:48
Start Date: 18/Jun/20 18:48
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-646243010


   @guoyuepeng @wankunde Can you please review this? Thanks.





Issue Time Tracking
---

Worklog Id: (was: 447980)
Time Spent: 20m  (was: 10m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=447342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447342
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 17/Jun/20 15:06
Start Date: 17/Jun/20 15:06
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #575:
URL: https://github.com/apache/griffin/pull/575


   **What changes were proposed in this pull request?**
   
   Currently, the implementation of `Sinks` in Griffin has the issues listed 
below. This PR aims to fix them.
   - `Sinks` are based on the recursive `MultiSink` class, which is a sink itself 
but is implemented as a `Seq`. This causes ambiguity and is not very useful, 
so it has been removed.
   - Some unused code, such as `SinkContext`, has been removed.
   - Data is converted from the performant DataFrame to RDD while persisting in 
both streaming and batch pipelines. A new method `sinkBatchRecords` has been 
added to allow operations directly on DataFrame for batch pipelines. Streaming 
will still use the old implementation, which will be replaced with structured 
streaming.
   - Refactored the methods of `Sink`: `start`/`finish` are now `open`/`close`, 
and `jobName` is no longer incorrectly passed as `metricName`.
   - Presently, only one instance of a sink of a given type can be defined in 
the env config. This rules out cases where you want to configure multiple 
sinks of the same type, like HDFS or JDBC. Added a sink `name` to the env 
config, which the job config also uses to select the sinks it writes to.
   - Updated all sinks as per the changes above, with some additional changes 
to ConsoleSink.
   
   **Does this PR introduce any user-facing change?**
   Yes. As mentioned above, the sink config has changed in env and job configs.
   
   **How was this patch tested?**
   Griffin test suite and additional unit test cases.
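   
   The name-based sink selection described in the list above could look roughly like this. It is a sketch under assumptions: `SinkParam`, `resolve`, the sink names and the paths are all hypothetical and are not taken from the PR.

```scala
object NamedSinkSketch {

  // Hypothetical shape of one sink entry in the env config.
  final case class SinkParam(name: String, sinkType: String, config: Map[String, Any])

  // Picks the env sinks that a job refers to by name, keeping the job's order.
  // Names without a matching env entry are reported rather than dropped silently.
  def resolve(
      envSinks: Seq[SinkParam],
      jobSinkNames: Seq[String]): Either[String, Seq[SinkParam]] = {
    val byName = envSinks.map(s => s.name -> s).toMap
    val unknown = jobSinkNames.filterNot(byName.contains)
    if (unknown.nonEmpty) {
      val list = unknown.mkString(", ")
      Left(s"unknown sinks: $list")
    } else {
      Right(jobSinkNames.map(byName))
    }
  }

  // Two sinks of the same type can coexist because they are told apart by name.
  val envSinks: Seq[SinkParam] = Seq(
    SinkParam("hdfsRaw", "HDFS", Map("path" -> "/tmp/raw")),
    SinkParam("hdfsAudit", "HDFS", Map("path" -> "/tmp/audit")),
    SinkParam("console", "CONSOLE", Map.empty)
  )
}
```

   For example, `NamedSinkSketch.resolve(NamedSinkSketch.envSinks, Seq("hdfsAudit", "console"))` selects the second HDFS sink and the console sink, while a name missing from the env config yields a `Left` with an error message.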





Issue Time Tracking
---

Worklog Id: (was: 447342)
Remaining Estimate: 0h
Time Spent: 10m

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (GRIFFIN-326) New implementation for Elasticsearch Data Connector (Batch)

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-326?focusedWorklogId=441228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441228
 ]

ASF GitHub Bot logged work on GRIFFIN-326:
--

Author: ASF GitHub Bot
Created on: 04/Jun/20 11:09
Start Date: 04/Jun/20 11:09
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #569:
URL: https://github.com/apache/griffin/pull/569#issuecomment-638782290


   @guoyuepeng It seems a build has been running for this for a month now. Can 
you check this?
   
   
https://travis-ci.org/github/apache/griffin/builds/694599100?utm_source=github_status_medium=notification





Issue Time Tracking
---

Worklog Id: (was: 441228)
Time Spent: 4h 20m  (was: 4h 10m)

> New implementation for Elasticsearch Data Connector (Batch)
> ---
>
> Key: GRIFFIN-326
> URL: https://issues.apache.org/jira/browse/GRIFFIN-326
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The current Elasticsearch implementation relies on sending POST requests 
> from the driver, using either SQL or search mode for query filtering.
> This implementation has the following potential issues:
>  * Data is fetched for indexes (database scopes in ES) in bulk via a single 
> call on the driver. If the index holds a lot of data, the large response 
> payload creates a bottleneck on the driver.
>  * The driver then needs to parse this response payload and parallelize it. 
> This is another driver-side bottleneck, as each JSON record needs to be 
> mapped to a set schema in a type-safe manner.
>  * Only _host_, _port_ and _version_ are available as options to configure 
> the connection to the ES node or cluster.
>  * Source partitioning logic is not carried forward when parallelizing 
> records; the records will be randomized by Spark's default 
> partitioning.
>  * Although this implementation is a first-class member of Apache Griffin, 
> it is based on the _custom_ connector trait.
> The proposed implementation aims to:
>  * Deprecate the current implementation in favor of the official 
> [elasticsearch-hadoop|[https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/sql-20]]
>  library.
>  * This library is built on the DataSource API for Spark 2.2.x+ and thus 
> brings support for filter pushdowns, column pruning, unified read and write, 
> and additional optimizations.
>  * Many configuration options are available for ES connectivity, [check 
> here|[https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/cfg/ConfigurationOptions.java]]
>  * Any filters can be applied as expressions directly on the data frame and 
> are pushed automatically to the source.
> The new implementation will look something like,
> {code:java}
> sparkSession.read.format("es").options( ??? ).load(""){code}
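
One way the options passed to the reader might be assembled is sketched below. This is an illustration only: `esOptions` and its default port are assumptions, not part of the ticket. The keys `es.nodes` and `es.port` are elasticsearch-hadoop configuration options. The Spark call appears only in a comment so that the snippet runs without Spark on the classpath.

```scala
object EsReadOptionsSketch {

  // Builds connector options from a host list and a port.
  // "es.nodes" and "es.port" are elasticsearch-hadoop configuration keys;
  // the default port below is a placeholder, not a Griffin default.
  def esOptions(
      nodes: Seq[String],
      port: Int = 9200,
      extra: Map[String, String] = Map.empty): Map[String, String] =
    Map("es.nodes" -> nodes.mkString(","), "es.port" -> port.toString) ++ extra

  // With Spark and elasticsearch-hadoop on the classpath, the map could be passed as:
  //   sparkSession.read.format("es").options(esOptions(Seq("localhost"))).load(indexName)
  // where indexName is whichever index the data source is configured with.
}
```

Entries in `extra` are applied last, so any other connector setting can be added or can override the two defaults.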





[jira] [Work logged] (GRIFFIN-326) New implementation for Elasticsearch Data Connector (Batch)

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-326?focusedWorklogId=441227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441227
 ]

ASF GitHub Bot logged work on GRIFFIN-326:
--

Author: ASF GitHub Bot
Created on: 04/Jun/20 11:01
Start Date: 04/Jun/20 11:01
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #569:
URL: https://github.com/apache/griffin/pull/569


   





Issue Time Tracking
---

Worklog Id: (was: 441227)
Time Spent: 4h 10m  (was: 4h)

> New implementation for Elasticsearch Data Connector (Batch)
> ---
>
> Key: GRIFFIN-326
> URL: https://issues.apache.org/jira/browse/GRIFFIN-326
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




