[GitHub] incubator-griffin issue #457: merge_pr.py python 3 support

2018-11-18 Thread ludongfang
Github user ludongfang commented on the issue:

https://github.com/apache/incubator-griffin/pull/457
  
LGTM


---


[GitHub] incubator-griffin pull request #457: merge_pr.py python 3 support

2018-11-18 Thread IAmFQQ
GitHub user IAmFQQ opened a pull request:

https://github.com/apache/incubator-griffin/pull/457

merge_pr.py python 3 support

1. Import Request/urlopen/HTTPError from urllib.request to support Python 3.
2. The print statement has been replaced with the print() function.
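The two changes described above typically take the shape below. This is a hedged sketch of the usual Python 2/3 compatibility pattern, not the exact diff from the pull request (note that in Python 3 HTTPError canonically lives in urllib.error, which urllib.request re-exports):

```python
# Keep the print() function working on Python 2 as well; a __future__
# import must be the first statement in the file.
from __future__ import print_function

try:
    # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError
except ImportError:
    # Python 2 fallback
    from urllib2 import Request, urlopen, HTTPError

print("urlopen is callable:", callable(urlopen))
```

With this shim in place, the rest of the script can use Request/urlopen/HTTPError and print() unchanged on either interpreter.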

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/IAmFQQ/incubator-griffin support_python3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-griffin/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit 4f1f4ecf6e18b9dfefb267ecc92079add36548b9
Author: Fan 
Date:   2018-11-18T13:58:07Z

url request support in python 3
print -> print()




---


Re: [RESULT][VOTE] Graduate Apache Griffin (incubating) as a TLP

2018-11-18 Thread William Guo
Hi Justin,

Could you have a look at whether Apache Griffin is missing anything in the
agenda before the board meeting?

Henry should have updated the resolution part for the Apache Griffin graduation.

Please tell us if we have missed anything, so we can update it accordingly
before the meeting.


Thanks,
William




On Wed, Nov 14, 2018 at 4:23 PM William Guo  wrote:

> Henry, Thanks!
>
> On Wed, Nov 14, 2018 at 2:03 PM Henry Saputra 
> wrote:
>
>> Hi Justin,
>>
>> I can add it to the resolution.
>>
>> Should I add it as New Resolution item?
>>
>> - Henry
>>
>> On Tue, Nov 13, 2018 at 12:45 PM Justin Mclean 
>> wrote:
>>
>> > Hi,
>> >
>> > I had you down as graduating this month but don't see the resolution on
>> > the board agenda yet. [1] Is someone on top of it?
>> >
>> > Thanks,
>> > Justin
>> >
>> > 1. https://whimsy.apache.org/board/agenda/2018-11-21/
>> >
>>
>


[jira] [Commented] (GRIFFIN-210) [Measure] need to integrate with upstream/downstream nodes when bad records are founded

2018-11-18 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691165#comment-16691165
 ] 

Eugene commented on GRIFFIN-210:


I think we should allow users to define DQ rules, thresholds, and remedy
actions from their own perspective, and add pre-measure/post-measure stages
to the whole pipeline.

Considering integration with different upstream/downstream nodes, more
options are better than a single mode; leave the choice to the users.

> [Measure] need to integrate with upstream/downstream nodes when bad records 
> are founded
> ---
>
> Key: GRIFFIN-210
> URL: https://issues.apache.org/jira/browse/GRIFFIN-210
> Project: Griffin (Incubating)
>  Issue Type: Wish
>Reporter: William Guo
>Assignee: William Guo
>Priority: Major
>
> In a typical data quality project, when Apache Griffin finds a data quality 
> issue, it usually needs to integrate with upstream or downstream nodes,
> so the corresponding systems have the opportunity to automatically take some 
> remedy action, such as retry...  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GRIFFIN-213) Support pluggable datasource connectors

2018-11-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691153#comment-16691153
 ] 

ASF GitHub Bot commented on GRIFFIN-213:


Github user toyboxman commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/456#discussion_r234475362
  
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
 - **sinks**: Whitelisted sink types for this job. Note: no sinks will be 
used, if empty or omitted. 
 
 ### Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch 
mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for 
batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --

What do you think about 'ANY' as a replacement for 'CUSTOM'?

**type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, 
"KAFKA" for streaming mode, "ANY" for both.


> Support pluggable datasource connectors
> ---
>
> Key: GRIFFIN-213
> URL: https://issues.apache.org/jira/browse/GRIFFIN-213
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Nikolay Sokolov
>Priority: Minor
>
> As of Griffin 0.3, code modification is required in order to add new data 
> connectors.
> The proposal is to add a new data connector type, CUSTOM, that would allow 
> specifying the class name of the data connector implementation to use. 
> Additional jars with custom connector implementations would be provided in 
> the Spark configuration template.
> The class name would be specified in the "class" config of the data 
> connector. For example:
> {code:json}
> "connectors": [
> {
>   "type": "CUSTOM",
>   "config": {
> "class": "org.example.griffin.JDBCConnector"
> // extra connector-specific parameters
>   }
> }
>   ]
> {code}
> The proposed contract for implementations is based on the current convention:
>  - for batch
>  ** the class should be a subclass of BatchDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static BatchDataConnector apply(ctx: BatchDataConnectorContext)
> {code}
>  - for streaming
>  ** the class should be a subclass of StreamingDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static StreamingDataConnector apply(ctx: StreamingDataConnectorContext)
> {code}
> Signatures of context objects:
> {code:scala}
> case class BatchDataConnectorContext(@transient sparkSession: SparkSession,
>  dcParam: DataConnectorParam,
>  timestampStorage: TimestampStorage)
> case class StreamingDataConnectorContext(@transient sparkSession: 
> SparkSession,
>  @transient ssc: StreamingContext,
>  dcParam: DataConnectorParam,
>  timestampStorage: TimestampStorage,
>  streamingCacheClientOpt: 
> Option[StreamingCacheClient])
> {code}
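The core of the CUSTOM connector proposal is resolving a class by the fully qualified name given in the "class" config entry. The general reflective-loading pattern can be sketched in Python (names here are illustrative, not Griffin code; on the JVM side this would use reflection against the contracts above):

```python
import importlib

def load_class(fqcn):
    """Resolve a class object from its fully qualified name.

    Illustrative helper mirroring what a CUSTOM connector loader does:
    split "package.module.ClassName", import the module, fetch the class.
    """
    module_name, _, class_name = fqcn.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# A stdlib class stands in for a hypothetical connector implementation
# such as "org.example.griffin.JDBCConnector" from the config above.
config = {"type": "CUSTOM", "config": {"class": "collections.OrderedDict"}}
connector_cls = load_class(config["config"]["class"])
```

The loader would then invoke the connector's apply(ctx) factory with the appropriate context object, per the batch/streaming contracts described above.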





[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support

2018-11-18 Thread toyboxman
Github user toyboxman commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/456#discussion_r234475362
  
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
 - **sinks**: Whitelisted sink types for this job. Note: no sinks will be 
used, if empty or omitted. 
 
 ### Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch 
mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for 
batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --

What do you think about 'ANY' as a replacement for 'CUSTOM'?

**type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, 
"KAFKA" for streaming mode, "ANY" for both.


---


[jira] [Commented] (GRIFFIN-213) Support pluggable datasource connectors

2018-11-18 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/GRIFFIN-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691151#comment-16691151
 ] 

Eugene commented on GRIFFIN-213:


Two comments:

1. A customization connector allows users to connect to any data source and 
involves third-party plugins. How can we guarantee the safety and security of 
the Griffin pipeline? Is there a policy or permission check?

2. 'CUSTOM' does not seem to be a suitable connector type alongside types that 
describe a data source, like 'KAFKA', 'HIVE', or 'TEXT-DIR'. What do you think?

> Support pluggable datasource connectors
> ---
>
> Key: GRIFFIN-213
> URL: https://issues.apache.org/jira/browse/GRIFFIN-213
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Nikolay Sokolov
>Priority: Minor
>
> As of Griffin 0.3, code modification is required in order to add new data 
> connectors.
> The proposal is to add a new data connector type, CUSTOM, that would allow 
> specifying the class name of the data connector implementation to use. 
> Additional jars with custom connector implementations would be provided in 
> the Spark configuration template.
> The class name would be specified in the "class" config of the data 
> connector. For example:
> {code:json}
> "connectors": [
> {
>   "type": "CUSTOM",
>   "config": {
> "class": "org.example.griffin.JDBCConnector"
> // extra connector-specific parameters
>   }
> }
>   ]
> {code}
> The proposed contract for implementations is based on the current convention:
>  - for batch
>  ** the class should be a subclass of BatchDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static BatchDataConnector apply(ctx: BatchDataConnectorContext)
> {code}
>  - for streaming
>  ** the class should be a subclass of StreamingDataConnector
>  ** it should have a method with the signature:
> {code:java}
> public static StreamingDataConnector apply(ctx: StreamingDataConnectorContext)
> {code}
> Signatures of context objects:
> {code:scala}
> case class BatchDataConnectorContext(@transient sparkSession: SparkSession,
>  dcParam: DataConnectorParam,
>  timestampStorage: TimestampStorage)
> case class StreamingDataConnectorContext(@transient sparkSession: 
> SparkSession,
>  @transient ssc: StreamingContext,
>  dcParam: DataConnectorParam,
>  timestampStorage: TimestampStorage,
>  streamingCacheClientOpt: 
> Option[StreamingCacheClient])
> {code}


