[GitHub] incubator-griffin issue #457: merge_pr.py python 3 support
Github user ludongfang commented on the issue: https://github.com/apache/incubator-griffin/pull/457 LGTM ---
[GitHub] incubator-griffin pull request #457: merge_pr.py python 3 support
GitHub user IAmFQQ opened a pull request: https://github.com/apache/incubator-griffin/pull/457 merge_pr.py python 3 support 1. Import Request/urlopen/HTTPError from urllib.request to support python 3 2. The print statement has been replaced with the print() function You can merge this pull request into a Git repository by running: $ git pull https://github.com/IAmFQQ/incubator-griffin support_python3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-griffin/pull/457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #457 commit 4f1f4ecf6e18b9dfefb267ecc92079add36548b9 Author: Fan Date: 2018-11-18T13:58:07Z url request support in python 3; print -> print() ---
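The two changes the PR describes (relocated urllib imports and print as a function) are commonly handled with a shim like the following. This is a minimal sketch, not the actual merge_pr.py diff, and `get_json` is a hypothetical helper for illustration:

```python
try:
    # Python 3: Request/urlopen moved to urllib.request, HTTPError to urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError
except ImportError:
    # Python 2 fallback: all three lived in urllib2
    from urllib2 import Request, urlopen, HTTPError

def get_json(url):
    """Fetch a URL and return the raw response body, or None on HTTP error."""
    try:
        response = urlopen(Request(url))
        return response.read()
    except HTTPError as e:
        # print() is a function in Python 3; this form also works in Python 2
        print("Request failed with status %s" % e.code)
        return None
```

With this shim the same script runs unmodified under both interpreters, which is the usual approach when a tool like merge_pr.py must serve contributors on either version.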
Re: [RESULT][VOTE] Graduate Apache Griffin (incubating) as a TLP
Hi Justin, Could you have a look at whether Apache Griffin is missing anything in the agenda before the board meeting? Henry should have updated the resolution part for the Apache Griffin graduation. Please tell us if we have missed anything, so we can update it accordingly before the meeting. Thanks, William On Wed, Nov 14, 2018 at 4:23 PM William Guo wrote: > Henry, Thanks! > > On Wed, Nov 14, 2018 at 2:03 PM Henry Saputra > wrote: > >> Hi Justin, >> >> I can add it to the resolution. >> >> Should I add it as New Resolution item? >> >> - Henry >> >> On Tue, Nov 13, 2018 at 12:45 PM Justin Mclean >> wrote: >> >> > Hi, >> > >> > I had you down as graduating this month but don't see the resolution on >> > the board agenda yet. [1] Is someone on top of it? >> > >> > Thanks, >> > Justin >> > >> > 1. https://whimsy.apache.org/board/agenda/2018-11-21/ >> > >> >
[jira] [Commented] (GRIFFIN-210) [Measure] need to integrate with upstream/downstream nodes when bad records are found
[ https://issues.apache.org/jira/browse/GRIFFIN-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691165#comment-16691165 ] Eugene commented on GRIFFIN-210: I think we should allow users to define DQ rules/thresholds/remedy actions from their own perspective, and add pre-measure/post-measure stages to the whole pipeline. Considering integration with different upstream/downstream nodes, more options are better than a single mode; leave the choice to users. > [Measure] need to integrate with upstream/downstream nodes when bad records > are found > --- > > Key: GRIFFIN-210 > URL: https://issues.apache.org/jira/browse/GRIFFIN-210 > Project: Griffin (Incubating) > Issue Type: Wish >Reporter: William Guo >Assignee: William Guo >Priority: Major > > In a typical data quality project, when Apache Griffin finds a data quality > issue, it usually needs to integrate with upstream or downstream nodes, > so the corresponding systems have an opportunity to automatically take some > remedy action, such as a retry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
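The pre-measure/post-measure idea from the comment above can be sketched roughly as follows. All names here are hypothetical illustrations of the pattern, not Griffin APIs: the user supplies a rule, a threshold, and a remedy action, and optional hooks run before and after the measure:

```python
def run_measure(records, rule, threshold, remedy, pre=None, post=None):
    """Hypothetical DQ measure stage with user-defined threshold and remedy."""
    if pre:
        pre(records)                      # e.g. notify the upstream node a check started
    bad = [r for r in records if not rule(r)]
    bad_ratio = len(bad) / len(records) if records else 0.0
    if bad_ratio > threshold:
        remedy(bad)                       # e.g. trigger a retry in the downstream node
    if post:
        post(bad_ratio)                   # e.g. publish the metric to a sink
    return bad_ratio

# Usage: flag records whose "value" field is missing.
bad_records = []
ratio = run_measure(
    records=[{"value": 1}, {"value": None}],
    rule=lambda r: r["value"] is not None,
    threshold=0.1,
    remedy=bad_records.extend,
)
```

Keeping the rule, threshold, and remedy as user-supplied callables is one way to "leave the choice to users" rather than fixing a single integration mode.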
[jira] [Commented] (GRIFFIN-213) Support pluggable datasource connectors
[ https://issues.apache.org/jira/browse/GRIFFIN-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691153#comment-16691153 ] ASF GitHub Bot commented on GRIFFIN-213: Github user toyboxman commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/456#discussion_r234475362 --- Diff: griffin-doc/measure/measure-configuration-guide.md --- @@ -188,7 +188,7 @@ Above lists DQ job configure parameters. - **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted. ### Data Connector -- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode. +- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode. --- End diff -- what do you think about 'ANY' as a replacement for 'CUSTOM'? **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode, "ANY" for both. > Support pluggable datasource connectors > --- > > Key: GRIFFIN-213 > URL: https://issues.apache.org/jira/browse/GRIFFIN-213 > Project: Griffin (Incubating) > Issue Type: Improvement >Reporter: Nikolay Sokolov >Priority: Minor > > As of Griffin 0.3, code modification is required in order to add new data > connectors. > The proposal is to add a new data connector type, CUSTOM, that would allow > specifying the class name of the data connector implementation to use. Additional jars > with custom connector implementations would be provided in the spark > configuration template. > The class name would be specified in the "class" config of the data connector. 
For > example: > {code:json} > "connectors": [ > { > "type": "CUSTOM", > "config": { > "class": "org.example.griffin.JDBCConnector" > // extra connector-specific parameters > } > } > ] > {code} > The proposed contract for implementations is based on the current convention: > - for batch > ** the class should be a subclass of BatchDataConnector > ** it should have a method with the signature: > {code:java} > public static BatchDataConnector apply(ctx: BatchDataConnectorContext) > {code} > - for streaming > ** the class should be a subclass of StreamingDataConnector > ** it should have a method with the signature: > {code:java} > public static StreamingDataConnector apply(ctx: StreamingDataConnectorContext) > {code} > Signatures of the context objects: > {code:scala} > case class BatchDataConnectorContext(@transient sparkSession: SparkSession, > dcParam: DataConnectorParam, > timestampStorage: TimestampStorage) > case class StreamingDataConnectorContext(@transient sparkSession: > SparkSession, > @transient ssc: StreamingContext, > dcParam: DataConnectorParam, > timestampStorage: TimestampStorage, > streamingCacheClientOpt: > Option[StreamingCacheClient]) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
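The "class" config in the proposal amounts to resolving a fully qualified class name at runtime. In Griffin this would be done with Scala/Java reflection; the mechanism can be sketched in Python, where `load_connector` is a hypothetical helper and `collections.OrderedDict` merely stands in for a real connector class:

```python
import importlib

def load_connector(config):
    """Resolve the 'class' entry of a CUSTOM connector config to a class object."""
    full_name = config["class"]                       # e.g. "org.example.griffin.JDBCConnector"
    module_name, _, class_name = full_name.rpartition(".")
    module = importlib.import_module(module_name)     # load the containing module/package
    return getattr(module, class_name)                # look up the class by its simple name

# Usage with a stand-in class from the standard library:
cls = load_connector({"class": "collections.OrderedDict"})
instance = cls()
```

The same shape (split the dotted name, load the container, look up the member, then call a known factory method on it) is what the proposed `apply(ctx)` contract would enable on the JVM side, with the extra jars supplied via the spark configuration template.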
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Github user toyboxman commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/456#discussion_r234475362 --- Diff: griffin-doc/measure/measure-configuration-guide.md --- @@ -188,7 +188,7 @@ Above lists DQ job configure parameters. - **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted. ### Data Connector -- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode. +- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode. --- End diff -- what do you think about 'ANY' as a replacement for 'CUSTOM'? **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode, "ANY" for both. ---
[jira] [Commented] (GRIFFIN-213) Support pluggable datasource connectors
[ https://issues.apache.org/jira/browse/GRIFFIN-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691151#comment-16691151 ] Eugene commented on GRIFFIN-213: two comments: 1. A custom connector allows users to connect to any data source and involves third-party plugins; how could we guarantee the safety and security of the Griffin pipeline? Is there a policy or permission check? 2. 'custom' does not seem a suitable connector type, since the other types describe data sources like 'kafka', 'hive', 'text'; what do you think? > Support pluggable datasource connectors > --- > > Key: GRIFFIN-213 > URL: https://issues.apache.org/jira/browse/GRIFFIN-213 > Project: Griffin (Incubating) > Issue Type: Improvement >Reporter: Nikolay Sokolov >Priority: Minor > > As of Griffin 0.3, code modification is required in order to add new data > connectors. > The proposal is to add a new data connector type, CUSTOM, that would allow > specifying the class name of the data connector implementation to use. Additional jars > with custom connector implementations would be provided in the spark > configuration template. > The class name would be specified in the "class" config of the data connector. 
For > example: > {code:json} > "connectors": [ > { > "type": "CUSTOM", > "config": { > "class": "org.example.griffin.JDBCConnector" > // extra connector-specific parameters > } > } > ] > {code} > The proposed contract for implementations is based on the current convention: > - for batch > ** the class should be a subclass of BatchDataConnector > ** it should have a method with the signature: > {code:java} > public static BatchDataConnector apply(ctx: BatchDataConnectorContext) > {code} > - for streaming > ** the class should be a subclass of StreamingDataConnector > ** it should have a method with the signature: > {code:java} > public static StreamingDataConnector apply(ctx: StreamingDataConnectorContext) > {code} > Signatures of the context objects: > {code:scala} > case class BatchDataConnectorContext(@transient sparkSession: SparkSession, > dcParam: DataConnectorParam, > timestampStorage: TimestampStorage) > case class StreamingDataConnectorContext(@transient sparkSession: > SparkSession, > @transient ssc: StreamingContext, > dcParam: DataConnectorParam, > timestampStorage: TimestampStorage, > streamingCacheClientOpt: > Option[StreamingCacheClient]) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)