Repository: incubator-griffin

Updated Branches:
  refs/heads/master 95e45dca4 -> 4e0f25d2c
[GRIFFIN-138] update readme.md, highlighted docker guide

update readme.md, describe docker guide, debug guide and deploy guide in order for specific users

Author: Lionel Liu <bhlx3l...@163.com>

Closes #248 from bhlx3lyx7/tmst.

Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/4e0f25d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/4e0f25d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/4e0f25d2

Branch: refs/heads/master
Commit: 4e0f25d2c9fd64c56a128e3ddde7c5c7addd916c
Parents: 95e45dc
Author: Lionel Liu <bhlx3l...@163.com>
Authored: Sun Apr 8 12:52:21 2018 +0800
Committer: Lionel Liu <bhlx3l...@163.com>
Committed: Sun Apr 8 12:52:21 2018 +0800

----------------------------------------------------------------------
 README.md                          | 174 ++++----------------------
 griffin-doc/deploy/deploy-guide.md | 160 +++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+), 155 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/4e0f25d2/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 5bc0e1c..37987d0 100644
--- a/README.md
+++ b/README.md
@@ -27,176 +27,40 @@ Apache Griffin is a model driven data quality solution for modern data systems.
 ## Getting Started
+### First Try of Griffin
-You can try Griffin in docker following the [docker guide](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md).
-
-To run Griffin at local, you can follow instructions below.
-
-### Prerequisites
-You need to install following items
-- jdk (1.8 or later versions).
-- mysql.
-- Postgresql.
-- npm (version 6.0.0+).
-- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
-- [Spark](http://spark.apache.org/downloads.html) (version 1.6.x, griffin does not support 2.0.x at current), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
-- [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
-  You need to make sure that your spark cluster could access your HiveContext.
-- [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
-  Griffin need to schedule spark jobs by server, we use livy to submit our jobs.
-  For some issues of Livy for HiveContext, we need to download 3 files, and put them into HDFS.
-  ```
-  datanucleus-api-jdo-3.2.6.jar
-  datanucleus-core-3.2.10.jar
-  datanucleus-rdbms-3.2.9.jar
-  ```
-- ElasticSearch.
-  ElasticSearch works as a metrics collector, Griffin produces metrics to it, and our default UI get metrics from it, you can use your own way as well.
-
-### Configuration
-
-Create database 'quartz' in mysql
-```
-mysql -u username -e "create database quartz" -p
-```
-Init quartz tables in mysql by service/src/main/resources/Init_quartz.sql
-```
-mysql -u username -p quartz < service/src/main/resources/Init_quartz.sql
-```
-
-
-You should also modify some configurations of Griffin for your environment.
-
-- <b>service/src/main/resources/application.properties</b>
-
-  ```
-  # jpa
-  spring.datasource.url = jdbc:postgresql://<your IP>:5432/quartz?autoReconnect=true&useSSL=false
-  spring.datasource.username = <user name>
-  spring.datasource.password = <password>
-  spring.jpa.generate-ddl=true
-  spring.datasource.driverClassName = org.postgresql.Driver
-  spring.jpa.show-sql = true
-
-  # hive metastore
-  hive.metastore.uris = thrift://<your IP>:9083
-  hive.metastore.dbname = <hive database name>    # default is "default"
-
-  # external properties directory location, ignore it if not required
-  external.config.location =
-
-  # login strategy, default is "default"
-  login.strategy = <default or ldap>
-
-  # ldap properties, ignore them if ldap is not enabled
-  ldap.url = ldap://hostname:port
-  ldap.email = @example.com
-  ldap.searchBase = DC=org,DC=example
-  ldap.searchPattern = (sAMAccountName={0})
-
-  # hdfs, ignore it if you do not need predicate job
-  fs.defaultFS = hdfs://<hdfs-default-name>
-
-  # elasticsearch
-  elasticsearch.host = <your IP>
-  elasticsearch.port = <your elasticsearch rest port>
-  # authentication properties, uncomment if basic authentication is enabled
-  # elasticsearch.user = user
-  # elasticsearch.password = password
-  ```
-
-- <b>measure/src/main/resources/env.json</b>
-  ```
-  "persist": [
-    ...
-    {
-      "type": "http",
-      "config": {
-        "method": "post",
-        "api": "http://<your ES IP>:<ES rest port>/griffin/accuracy"
-      }
-    }
-  ]
-  ```
-  Put the modified env.json file into HDFS.
-
-- <b>service/src/main/resources/sparkJob.properties</b>
-  ```
-  sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
-  sparkJob.args_1 = hdfs://<griffin env path>/env.json
-
-  sparkJob.jars = hdfs://<datanucleus path>/spark-avro_2.11-2.0.1.jar\
-                  hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar\
-                  hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar\
-                  hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar
-
-  spark.yarn.dist.files = hdfs:///<spark conf path>/hive-site.xml
-
-  livy.uri = http://<your IP>:8998/batches
-  spark.uri = http://<your IP>:8088
-  ```
-  - \<griffin measure path> is the location you should put the jar file of measure module.
-  - \<griffin env path> is the location you should put the env.json file.
-  - \<datanucleus path> is the location you should put the 3 jar files of livy, and the spark avro jar file if you need.
-  - \<spark conf path> is the location of spark conf directory.
-
-### Build and Run
-
-Build the whole project and deploy. (NPM should be installed)
-
-  ```
-  mvn clean install
-  ```
-
-Put jar file of measure module into \<griffin measure path> in HDFS
-
-```
-cp measure/target/measure-<version>-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
-hdfs dfs -put measure/target/griffin-measure.jar <griffin measure path>/
-```
-
-After all environment services startup, we can start our server.
-
-  ```
-  java -jar service/target/service.jar
-  ```
-
-After a few seconds, we can visit our default UI of Griffin (by default the port of spring boot is 8080).
-
-  ```
-  http://<your IP>:8080
-  ```
-
-You can use UI following the steps [here](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/ui/user-guide.md).
-
-**Note**: The front-end UI is still under development, you can only access some basic features currently.
-
-
-### Build and Debug
+You can try Griffin in Docker by following the [docker guide](griffin-doc/docker/griffin-docker-guide.md).
+
+### Environment for Dev
 If you want to develop Griffin, please follow [this document](griffin-doc/dev/dev-env-build.md), to skip complex environment building work.
+### Local Deployment
-## Community
+If you want to deploy Griffin in your local environment, please follow [this document](griffin-doc/deploy/deploy-guide.md).
-You can contact us via email: <a href="mailto:d...@griffin.incubator.apache.org">d...@griffin.incubator.apache.org</a>
+## Community
-You can also subscribe this mail by sending a email to [here](mailto:dev-subscr...@griffin.incubator.apache.org).
+You can visit the [Griffin home page](http://griffin.apache.org).
-You can access our issues jira page [here](https://issues.apache.org/jira/browse/GRIFFIN)
+You can contact us via email:
+- dev-list: <a href="mailto:d...@griffin.incubator.apache.org">d...@griffin.incubator.apache.org</a>
+- user-list: <a href="mailto:u...@griffin.incubator.apache.org">u...@griffin.incubator.apache.org</a>
+You can also subscribe to these lists by sending an email to [subscribe dev-list](mailto:dev-subscr...@griffin.incubator.apache.org) and [subscribe user-list](mailto:user-subscr...@griffin.incubator.apache.org).
+You can browse our issues on the [JIRA page](https://issues.apache.org/jira/browse/GRIFFIN).
 ## Contributing
-See [Contributing Guide](./CONTRIBUTING.md) for details on how to contribute code, documentation, etc.
+See [How to Contribute](http://griffin.apache.org/2017/03/04/community) for details on how to contribute code, documentation, etc.
 ## References
 - [Home Page](http://griffin.incubator.apache.org/)
 - [Wiki](https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin)
 - Documents:
-  - [Measure](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/measure)
-  - [Service](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service)
-  - [UI](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/ui)
-  - [Docker usage](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/docker)
-  - [Postman API](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman)
\ No newline at end of file
+  - [Measure](griffin-doc/measure)
+  - [Service](griffin-doc/service)
+  - [UI](griffin-doc/ui)
+  - [Docker usage](griffin-doc/docker)
+  - [Postman API](griffin-doc/service/postman)
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/4e0f25d2/griffin-doc/deploy/deploy-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/deploy/deploy-guide.md b/griffin-doc/deploy/deploy-guide.md
new file mode 100644
index 0000000..0693c25
--- /dev/null
+++ b/griffin-doc/deploy/deploy-guide.md
@@ -0,0 +1,160 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Apache Griffin Deployment Guide
+As a Griffin user, you can deploy Griffin together with its dependencies in your own environment by following the instructions below.
+
+### Prerequisites
+You need to install the following items:
+- jdk (1.8 or later versions).
+- MySQL or PostgreSQL.
+- npm (version 6.0.0+).
+- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+- [Spark](http://spark.apache.org/downloads.html) (version 1.6.x, Griffin does not support 2.0.x currently), if you want to install a Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
+- [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
+  You need to make sure that your Spark cluster can access your HiveContext.
+- [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
+  Griffin needs to schedule Spark jobs from the server, so we use Livy to submit our jobs.
+  To work around some Livy issues with HiveContext, we need to download the following 3 files, or copy them from the Spark lib `$SPARK_HOME/lib/`, and put them into HDFS.
+  ```
+  datanucleus-api-jdo-3.2.6.jar
+  datanucleus-core-3.2.10.jar
+  datanucleus-rdbms-3.2.9.jar
+  ```
+- ElasticSearch.
+  ElasticSearch works as a metrics collector: Griffin publishes metrics to it, and the default UI reads metrics from it; you can also collect metrics in your own way.
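The jar upload above can be scripted. A minimal dry-run sketch, assuming the jars sit in `$SPARK_HOME/lib/` and an HDFS target directory of `/griffin/jars` (both placeholders, not values mandated by Griffin); it only prints the upload commands so you can review them before running against a real cluster:

```shell
# Dry-run sketch: print the HDFS upload command for each datanucleus jar
# that Livy needs. SPARK_HOME and /griffin/jars are assumed placeholders.
SPARK_HOME=${SPARK_HOME:-/opt/spark}

for jar in datanucleus-api-jdo-3.2.6.jar \
           datanucleus-core-3.2.10.jar \
           datanucleus-rdbms-3.2.9.jar; do
  echo "hdfs dfs -put ${SPARK_HOME}/lib/${jar} /griffin/jars/"
done
```

Once the printed paths look right for your environment, run the commands directly.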
+
+### Configuration
+
+Create database 'quartz' in MySQL:
+```
+mysql -u username -e "create database quartz" -p
+```
+Initialize the quartz tables in MySQL with service/src/main/resources/Init_quartz.sql:
+```
+mysql -u username -p quartz < service/src/main/resources/Init_quartz.sql
+```
+
+
+You should also modify some configurations of Griffin for your environment.
+
+- <b>service/src/main/resources/application.properties</b>
+
+  ```
+  # jpa
+  spring.datasource.url = jdbc:postgresql://<your IP>:5432/quartz?autoReconnect=true&useSSL=false
+  spring.datasource.username = <user name>
+  spring.datasource.password = <password>
+  spring.jpa.generate-ddl=true
+  spring.datasource.driverClassName = org.postgresql.Driver
+  spring.jpa.show-sql = true
+
+  # hive metastore
+  hive.metastore.uris = thrift://<your IP>:9083
+  hive.metastore.dbname = <hive database name>    # default is "default"
+
+  # external properties directory location, ignore it if not required
+  external.config.location =
+
+  # login strategy, default is "default"
+  login.strategy = <default or ldap>
+
+  # ldap properties, ignore them if ldap is not enabled
+  ldap.url = ldap://hostname:port
+  ldap.email = @example.com
+  ldap.searchBase = DC=org,DC=example
+  ldap.searchPattern = (sAMAccountName={0})
+
+  # hdfs, ignore it if you do not need predicate job
+  fs.defaultFS = hdfs://<hdfs-default-name>
+
+  # elasticsearch
+  elasticsearch.host = <your IP>
+  elasticsearch.port = <your elasticsearch rest port>
+  # authentication properties, uncomment if basic authentication is enabled
+  # elasticsearch.user = user
+  # elasticsearch.password = password
+  ```
+
+- <b>measure/src/main/resources/env.json</b>
+  ```
+  "persist": [
+    ...
+    {
+      "type": "http",
+      "config": {
+        "method": "post",
+        "api": "http://<your ES IP>:<ES rest port>/griffin/accuracy"
+      }
+    }
+  ]
+  ```
+  Put the modified env.json file into HDFS.
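Rather than hand-editing env.json, the ES persist entry above can be rendered from shell variables. A minimal sketch, where `10.0.0.5` and `9200` are placeholder values for your ES host and REST port, and the rest of env.json is omitted:

```shell
# Sketch: render the ElasticSearch "persist" entry of env.json from variables.
# ES_HOST and ES_PORT are placeholders; merge this into your full env.json.
ES_HOST=10.0.0.5
ES_PORT=9200

cat > env-persist.json <<EOF
{
  "persist": [
    {
      "type": "http",
      "config": {
        "method": "post",
        "api": "http://${ES_HOST}:${ES_PORT}/griffin/accuracy"
      }
    }
  ]
}
EOF

grep '"api"' env-persist.json
```

Upload the resulting file to HDFS afterwards, as described above.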
+
+- <b>service/src/main/resources/sparkJob.properties</b>
+  ```
+  sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
+  sparkJob.args_1 = hdfs://<griffin env path>/env.json
+
+  sparkJob.jars = hdfs://<datanucleus path>/spark-avro_2.11-2.0.1.jar\
+                  hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar\
+                  hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar\
+                  hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar
+
+  spark.yarn.dist.files = hdfs:///<spark conf path>/hive-site.xml
+
+  livy.uri = http://<your IP>:8998/batches
+  spark.uri = http://<your IP>:8088
+  ```
+  - \<griffin measure path> is the HDFS location where you should put the measure module's jar file.
+  - \<griffin env path> is the HDFS location where you should put the env.json file.
+  - \<datanucleus path> is the HDFS location where you should put the 3 jar files required by Livy, plus the spark-avro jar file if you need to support Avro data.
+  - \<spark conf path> is the location of the Spark conf directory.
+
+### Build and Run
+
+Build the whole project and deploy (npm must be installed):
+
+  ```
+  mvn clean install
+  ```
+
+Put the measure module's jar file into \<griffin measure path> in HDFS:
+
+```
+cp measure/target/measure-<version>-incubating-SNAPSHOT.jar measure/target/griffin-measure.jar
+hdfs dfs -put measure/target/griffin-measure.jar <griffin measure path>/
+```
+
+After all the environment services have started up, we can start the Griffin server.
+
+  ```
+  java -jar service/target/service.jar
+  ```
+
+After a few seconds, we can visit the default UI of Griffin (by default, Spring Boot serves on port 8080).
+
+  ```
+  http://<your IP>:8080
+  ```
+
+You can use the UI by following the steps [here](../ui/user-guide.md).
+
+**Note**: The front-end UI is still under development; only some basic features are accessible currently.
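Put together, the build-and-run steps above can be sketched as one script. This is a dry run: it echoes each command instead of executing it, and both the version number and the `<griffin measure path>` value are placeholders you must adjust:

```shell
# Dry-run sketch of the deploy steps above; swap "echo" for real execution
# once the paths match your cluster.
VERSION=0.2.0                          # placeholder project version
MEASURE_PATH=hdfs:///griffin/measure   # placeholder <griffin measure path>

run() { echo "$@"; }

run mvn clean install
run cp measure/target/measure-${VERSION}-incubating-SNAPSHOT.jar \
       measure/target/griffin-measure.jar
run hdfs dfs -put measure/target/griffin-measure.jar ${MEASURE_PATH}/
run java -jar service/target/service.jar
```

Redefining `run` as plain execution turns the reviewed sequence into the actual deployment.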