[jira] [Commented] (GRIFFIN-210) [Measure] need to integrate with upstream/downstream nodes when bad records are founded

2018-11-18 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691165#comment-16691165
 ] 

Eugene commented on GRIFFIN-210:


I think we should allow users to define DQ rules, thresholds, and remedy 
actions from their own perspective, and add pre-measure and post-measure 
stages to the whole pipeline.

Considering integration with different upstream/downstream nodes, offering 
several options is better than a single mode; leave the choice to users.
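
As a rough illustration of the idea, here is a minimal Java sketch of a user-defined rule with a threshold and a remedy action, wrapped in pre-measure and post-measure stages. Every name in it (`RemedyAction`, `DqRule`, `MeasurePipeline`, the bad-record ratio threshold) is hypothetical and not part of Griffin; the measure step itself is reduced to two counts passed in by the caller.

```java
import java.util.function.Consumer;

// Hypothetical remedy actions a user could choose from.
enum RemedyAction { NONE, RETRY_UPSTREAM, BLOCK_DOWNSTREAM }

// Hypothetical user-defined rule: a threshold on the bad-record ratio
// plus the action to take when that threshold is exceeded.
class DqRule {
    final String name;
    final double maxBadRatio;
    final RemedyAction onViolation;

    DqRule(String name, double maxBadRatio, RemedyAction onViolation) {
        this.name = name;
        this.maxBadRatio = maxBadRatio;
        this.onViolation = onViolation;
    }

    // Returns the configured action if the ratio exceeds the threshold.
    RemedyAction evaluate(long badRecords, long totalRecords) {
        if (totalRecords <= 0) {
            return RemedyAction.NONE;
        }
        double badRatio = (double) badRecords / totalRecords;
        return badRatio > maxBadRatio ? onViolation : RemedyAction.NONE;
    }
}

// Hypothetical pipeline with a stage before and a stage after the measure.
class MeasurePipeline {
    private final Runnable preMeasure;
    private final Consumer<RemedyAction> postMeasure;

    MeasurePipeline(Runnable preMeasure, Consumer<RemedyAction> postMeasure) {
        this.preMeasure = preMeasure;
        this.postMeasure = postMeasure;
    }

    // The counts stand in for the result of a real measure job.
    RemedyAction run(DqRule rule, long badRecords, long totalRecords) {
        preMeasure.run();
        RemedyAction action = rule.evaluate(badRecords, totalRecords);
        postMeasure.accept(action); // e.g. notify an upstream or downstream node
        return action;
    }
}
```

In this sketch the post-measure stage is where an integration with an upstream or downstream node would plug in, and the choice of action stays with the user who defines the rule.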

> [Measure] need to integrate with upstream/downstream nodes when bad records 
> are founded
> ---
>
> Key: GRIFFIN-210
> URL: https://issues.apache.org/jira/browse/GRIFFIN-210
> Project: Griffin (Incubating)
> Issue Type: Wish
> Reporter: William Guo
> Assignee: William Guo
> Priority: Major
>
> In a typical data quality project, when Apache Griffin finds a data quality 
> issue, it usually needs to integrate with upstream or downstream nodes, 
> so that the corresponding systems have an opportunity to automatically take 
> a remedy action, such as a retry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GRIFFIN-213) Support pluggable datasource connectors

2018-11-18 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691151#comment-16691151
 ] 

Eugene commented on GRIFFIN-213:


Two comments:

1. A custom connector allows users to connect to any data source and involves 
third-party plugins. How can we guarantee the safety and security of the 
Griffin pipeline? Is there a policy or permission check?

2. 'custom' does not seem a suitable connector type, since the other types 
describe the data source, like 'kafka', 'hive', 'text'. What do you think?

> Support pluggable datasource connectors
> ---
>
> Key: GRIFFIN-213
> URL: https://issues.apache.org/jira/browse/GRIFFIN-213
> Project: Griffin (Incubating)
> Issue Type: Improvement
> Reporter: Nikolay Sokolov
> Priority: Minor
>
> As of Griffin 0.3, adding a new data connector requires modifying the code.
> The proposal is to add a new data connector type, CUSTOM, that allows 
> specifying the class name of the data connector implementation to use. 
> Additional jars with custom connector implementations would be provided in 
> the Spark configuration template.
> The class name would be specified in the "class" config of the data 
> connector. For example:
> {code:json}
> "connectors": [
>   {
>     "type": "CUSTOM",
>     "config": {
>       "class": "org.example.griffin.JDBCConnector"
>       // extra connector-specific parameters
>     }
>   }
> ]
> {code}
> The proposed contract for implementations is based on the current convention:
>  - for batch
>  ** the class should be a subclass of BatchDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static BatchDataConnector apply(BatchDataConnectorContext ctx)
> {code}
>  - for streaming
>  ** the class should be a subclass of StreamingDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static StreamingDataConnector apply(StreamingDataConnectorContext ctx)
> {code}
> Signatures of context objects:
> {code:scala}
> case class BatchDataConnectorContext(@transient sparkSession: SparkSession,
>                                      dcParam: DataConnectorParam,
>                                      timestampStorage: TimestampStorage)
> case class StreamingDataConnectorContext(@transient sparkSession: SparkSession,
>                                          @transient ssc: StreamingContext,
>                                          dcParam: DataConnectorParam,
>                                          timestampStorage: TimestampStorage,
>                                          streamingCacheClientOpt: Option[StreamingCacheClient])
> {code}
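
The proposed contract (a class name in config plus a static `apply` factory) could be resolved with plain reflection. The following is a minimal Java sketch under stated assumptions: `BatchDataConnectorContext` and `BatchDataConnector` are simplified stand-ins for the Griffin types named above, and `EchoConnector` and `CustomConnectorLoader` are hypothetical names introduced only for illustration. It is not how Griffin implements the lookup.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified stand-in for the context type named in the proposal;
// the real one carries a SparkSession and connector parameters.
class BatchDataConnectorContext {
    final String name;
    BatchDataConnectorContext(String name) { this.name = name; }
}

// Simplified stand-in for the connector base class.
abstract class BatchDataConnector {
    abstract String describe();
}

// Hypothetical custom connector that follows the proposed contract.
class EchoConnector extends BatchDataConnector {
    private final BatchDataConnectorContext ctx;
    private EchoConnector(BatchDataConnectorContext ctx) { this.ctx = ctx; }

    public static BatchDataConnector apply(BatchDataConnectorContext ctx) {
        return new EchoConnector(ctx);
    }

    @Override
    String describe() { return "echo:" + ctx.name; }
}

// Hypothetical loader: resolves the class named in the "class" config
// entry and calls its static apply(ctx) factory via reflection.
class CustomConnectorLoader {
    static BatchDataConnector load(String className, BatchDataConnectorContext ctx) {
        try {
            Class<?> cls = Class.forName(className);
            if (!BatchDataConnector.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " is not a BatchDataConnector");
            }
            Method factory = cls.getMethod("apply", BatchDataConnectorContext.class);
            if (!Modifier.isStatic(factory.getModifiers())) {
                throw new IllegalArgumentException(className + ".apply must be static");
            }
            return (BatchDataConnector) factory.invoke(null, ctx);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load connector " + className, e);
        }
    }
}
```

The subclass check runs before the factory is invoked, so a class that does not extend the expected base type is rejected without running any of its code beyond class loading.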





[jira] [Assigned] (GRIFFIN-200) Lifecycle hooks support

2018-11-06 Thread Eugene (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene reassigned GRIFFIN-200:
--

Assignee: Eugene  (was: William Guo)

> Lifecycle hooks support
> ---
>
> Key: GRIFFIN-200
> URL: https://issues.apache.org/jira/browse/GRIFFIN-200
> Project: Griffin (Incubating)
> Issue Type: New Feature
> Reporter: Nikolay Sokolov
> Assignee: Eugene
> Priority: Minor
>
> In some environments, users might want to perform certain actions 
> before/after a job is created, before/after a job is activated, 
> before/after a job is deleted, and so on.
> To fulfill that need, a hook plugin mechanism can be provided, similar to 
> what Hive does. Users would place the respective jar files on the Service 
> module classpath at deployment time, and would specify class names using an 
> annotation or a property listing class names (the particular mechanism is 
> yet to be determined).
> Proposed signature:
> {code:none}
> public interface GriffinHook {
>     void onEvent(GriffinHookEvent event) throws Exception;
> }
> public interface GriffinHookEvent { ... }
> public interface JobEvent extends GriffinHookEvent { ... }
> public class BeforeJobCreated implements JobEvent { ... }
> public class AfterJobCreated implements JobEvent { ... }
> public class BeforeJobDeleted implements JobEvent { ... }
> public class AfterJobDeleted implements JobEvent { ... }
> public interface JobInstanceEvent extends GriffinHookEvent { ... }
> public class BeforeJobInstanceStart implements JobInstanceEvent { ... }
> public class AfterJobInstanceEnd implements JobInstanceEvent { ... }
> {code}
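
A minimal Java sketch of how a hook written against the proposed signature might look. The interface and event names come from the proposal above; the `jobName()` accessor, the `AuditHook` class, and the `HookRunner` dispatcher are assumptions made for illustration, since the proposal leaves the event contents and the registration mechanism open.

```java
import java.util.ArrayList;
import java.util.List;

interface GriffinHook {
    void onEvent(GriffinHookEvent event) throws Exception;
}

interface GriffinHookEvent { }

interface JobEvent extends GriffinHookEvent {
    String jobName(); // assumed accessor, not part of the proposal
}

class BeforeJobCreated implements JobEvent {
    private final String jobName;
    BeforeJobCreated(String jobName) { this.jobName = jobName; }
    public String jobName() { return jobName; }
}

class AfterJobCreated implements JobEvent {
    private final String jobName;
    AfterJobCreated(String jobName) { this.jobName = jobName; }
    public String jobName() { return jobName; }
}

// Hypothetical hook: one universal method that branches on the event subtype.
class AuditHook implements GriffinHook {
    final List<String> log = new ArrayList<>();

    @Override
    public void onEvent(GriffinHookEvent event) {
        if (event instanceof BeforeJobCreated) {
            log.add("creating " + ((JobEvent) event).jobName());
        } else if (event instanceof AfterJobCreated) {
            log.add("created " + ((JobEvent) event).jobName());
        }
        // Other event subtypes are ignored by this hook.
    }
}

// Hypothetical dispatcher: passes each event to every registered hook.
class HookRunner {
    private final List<GriffinHook> hooks = new ArrayList<>();

    void register(GriffinHook hook) { hooks.add(hook); }

    void fire(GriffinHookEvent event) {
        for (GriffinHook hook : hooks) {
            try {
                hook.onEvent(event);
            } catch (Exception e) {
                throw new IllegalStateException("hook failed", e);
            }
        }
    }
}
```

Whether a failing hook should abort the operation, as it does in this sketch, or only be logged is a design choice the proposal does not settle.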





[jira] [Commented] (GRIFFIN-200) Lifecycle hooks support

2018-10-16 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652822#comment-16652822
 ] 

Eugene commented on GRIFFIN-200:


[~chemikadze]

Thanks for your clarification here. I understand the use case you mentioned.

In the previous comment you proposed: 'Updated potential interface of the 
hook to have single universal method with different event subtypes'. I agree 
with you. So if we use the event as a type flag, do we need more listeners? 
Maybe one general interface could handle all listeners externally.

About the synchronous versus asynchronous interface: I based my view on a 
common rule in distributed systems, where asynchronous calls are often used 
more than synchronous ones.

> Lifecycle hooks support
> ---
>
> Key: GRIFFIN-200
> URL: https://issues.apache.org/jira/browse/GRIFFIN-200
> Project: Griffin (Incubating)
> Issue Type: New Feature
> Reporter: Nikolay Sokolov
> Assignee: William Guo
> Priority: Minor
>
> In some environments, users might want to perform certain actions 
> before/after a job is created, before/after a job is activated, 
> before/after a job is deleted, and so on.
> To fulfill that need, a hook plugin mechanism can be provided, similar to 
> what Hive does. Users would place the respective jar files on the Service 
> module classpath at deployment time, and would specify class names using an 
> annotation or a property listing class names (the particular mechanism is 
> yet to be determined).
> Proposed signature:
> {code:none}
> public interface JobLifecycleHook {
>     void onJobEvent(JobLifecycleEvent event) throws Exception;
> }
> public class BeforeJobCreated implements JobLifecycleEvent { ... }
> public class AfterJobCreated implements JobLifecycleEvent { ... }
> public class BeforeJobDeleted implements JobLifecycleEvent { ... }
> public class AfterJobDeleted implements JobLifecycleEvent { ... }
> {code}





[jira] [Commented] (GRIFFIN-200) Lifecycle hooks support

2018-10-16 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651241#comment-16651241
 ] 

Eugene commented on GRIFFIN-200:


[~guoyp] [~chemikadze] 

I think it's better to take advantage of the existing life cycle instead of 
defining a new interface.

I don't know whether the Livy job handle listener is enough to fulfill this 
requirement, but it looks good.

[https://livy.incubator.apache.org/docs/latest/api/java/org/apache/livy/JobHandle.Listener.html]

About the life cycle listener: do we need multiple listeners? What is the use 
case for multiple listeners? My gut feeling is that one listener per job is 
enough.

About synchronous versus asynchronous: I prefer asynchronous, on the grounds 
that in a distributed system we cannot be sure a synchronous call will 
return, given network or partition problems.
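
To make the trade-off concrete, here is a minimal Java sketch that runs a hook on a separate thread and bounds the wait with a timeout, so a hook that never returns cannot block the caller. `AsyncHookCaller`, `runWithTimeout`, and the use of a plain `Runnable` as the hook are assumptions for illustration; this is neither Griffin nor Livy API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AsyncHookCaller {
    // Daemon threads, so a stuck hook does not keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "hook-worker");
        t.setDaemon(true);
        return t;
    });

    // Returns true if the hook finished within the timeout, false otherwise.
    boolean runWithTimeout(Runnable hook, long timeoutMillis) {
        Future<?> result = pool.submit(hook);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the hook and move on
            return false;
        } catch (ExecutionException e) {
            return false; // the hook itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A purely synchronous call corresponds to waiting without the timeout, which is the case the comment above is worried about.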







[jira] [Resolved] (GRIFFIN-204) postman collection is not up to date, some APIs don't work well, like create measure

2018-10-14 Thread Eugene (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene resolved GRIFFIN-204.

   Resolution: Fixed
Fix Version/s: 0.3.1-incubating (was: 0.2.0-incubating)

Refreshed the Postman script, which resolves this issue.

> postman collection is not up to date, some APIs don't work well, like create 
> measure
> 
>
> Key: GRIFFIN-204
> URL: https://issues.apache.org/jira/browse/GRIFFIN-204
> Project: Griffin (Incubating)
> Issue Type: Bug
> Affects Versions: 0.2.0-incubating
> Reporter: Eugene
> Assignee: Eugene
> Priority: Minor
> Fix For: 0.3.1-incubating
>
>
> Import 
> [https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/postman/griffin.json]
> and call the create measure API; it returns an error. The JSON does not 
> match the latest implementation:
> {
>     "timestamp": 1539073909006,
>     "status": 400,
>     "error": "Bad Request",
>     "exception": 
> "org.springframework.http.converter.HttpMessageNotReadableException",
>     "message": "Could not read document: Can not construct instance of 
> org.apache.griffin.core.measure.entity.DqType from String value 'accuracy': 
> value not one of declared Enum instance names: [COMPLETENESS, TIMELINESS, 
> PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at [Source: 
> java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through reference 
> chain: org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"]); 
> nested exception is 
> com.fasterxml.jackson.databind.exc.InvalidFormatException: Can not construct 
> instance of org.apache.griffin.core.measure.entity.DqType from String value 
> 'accuracy': value not one of declared Enum instance names: [COMPLETENESS, 
> TIMELINESS, PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at [Source: 
> java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through reference 
> chain: org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"])",
>     "path": "/api/v1/measures"
> }
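
The error message points at a case mismatch: the request sends 'accuracy' while the declared enum names are upper case. A minimal Java sketch of that behaviour, using a stand-in `DqType` enum with the names listed in the message; the `DqTypeLookup` class and its `parseLenient` helper are hypothetical and not part of Griffin.

```java
import java.util.Locale;

// Stand-in for org.apache.griffin.core.measure.entity.DqType, using the
// names listed in the error message. Not the real Griffin class.
enum DqType { COMPLETENESS, TIMELINESS, PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS }

class DqTypeLookup {
    // Strict lookup: Enum.valueOf matches the declared name exactly,
    // so "accuracy" is rejected while "ACCURACY" is accepted.
    static boolean isDeclaredName(String value) {
        try {
            DqType.valueOf(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Hypothetical lenient lookup that upper-cases the input first.
    static DqType parseLenient(String value) {
        return DqType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Under that reading, a request body that sends the upper-case name for `dq.type` would pass this check, which is consistent with the issue being a stale Postman collection rather than a service bug.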





[jira] [Closed] (GRIFFIN-204) postman collection is not up to date, some APIs don't work well, like create measure

2018-10-14 Thread Eugene (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene closed GRIFFIN-204.
--






[jira] [Work started] (GRIFFIN-204) postman collection is not up to date, some APIs don't work well, like create measure

2018-10-09 Thread Eugene (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on GRIFFIN-204 started by Eugene.
--
> postman collection is not up to date, some APIs don't work well, like create 
> measure
> 
>
> Key: GRIFFIN-204
> URL: https://issues.apache.org/jira/browse/GRIFFIN-204
> Project: Griffin (Incubating)
> Issue Type: Bug
> Affects Versions: 0.2.0-incubating
> Reporter: Eugene
> Assignee: Eugene
> Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> Import 
> [https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/postman/griffin.json]
> and call the create measure API; it returns an error. The JSON does not 
> match the latest implementation:
> {
>     "timestamp": 1539073909006,
>     "status": 400,
>     "error": "Bad Request",
>     "exception": 
> "org.springframework.http.converter.HttpMessageNotReadableException",
>     "message": "Could not read document: Can not construct instance of 
> org.apache.griffin.core.measure.entity.DqType from String value 'accuracy': 
> value not one of declared Enum instance names: [COMPLETENESS, TIMELINESS, 
> PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at [Source: 
> java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through reference 
> chain: org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"]); 
> nested exception is 
> com.fasterxml.jackson.databind.exc.InvalidFormatException: Can not construct 
> instance of org.apache.griffin.core.measure.entity.DqType from String value 
> 'accuracy': value not one of declared Enum instance names: [COMPLETENESS, 
> TIMELINESS, PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at [Source: 
> java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through reference 
> chain: org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"])",
>     "path": "/api/v1/measures"
> }





[jira] [Created] (GRIFFIN-204) postman collection is not up to date, some APIs don't work well, like create measure

2018-10-09 Thread Eugene (JIRA)
Eugene created GRIFFIN-204:
--

 Summary: postman collection is not up to date, some APIs don't 
work well, like create measure
 Key: GRIFFIN-204
 URL: https://issues.apache.org/jira/browse/GRIFFIN-204
 Project: Griffin (Incubating)
  Issue Type: Bug
Affects Versions: 0.2.0-incubating
Reporter: Eugene
Assignee: Eugene
 Fix For: 1.0.0-incubating


Import 
[https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/postman/griffin.json]
and call the create measure API; it returns an error. The JSON does not 
match the latest implementation:

{
    "timestamp": 1539073909006,
    "status": 400,
    "error": "Bad Request",
    "exception": 
"org.springframework.http.converter.HttpMessageNotReadableException",
    "message": "Could not read document: Can not construct instance of 
org.apache.griffin.core.measure.entity.DqType from String value 'accuracy': 
value not one of declared Enum instance names: [COMPLETENESS, TIMELINESS, 
PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at [Source: 
java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through reference 
chain: org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"]); 
nested exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: 
Can not construct instance of org.apache.griffin.core.measure.entity.DqType 
from String value 'accuracy': value not one of declared Enum instance names: 
[COMPLETENESS, TIMELINESS, PROFILING, CONSISTENCY, ACCURACY, UNIQUENESS]\n at 
[Source: java.io.PushbackInputStream@565f5e70; line: 4, column: 15] (through 
reference chain: 
org.apache.griffin.core.measure.entity.GriffinMeasure[\"dq.type\"])",
    "path": "/api/v1/measures"
}


