http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/Configuration.html ---------------------------------------------------------------------- diff --git a/content/0.11/Configuration.html b/content/0.11/Configuration.html new file mode 100644 index 0000000..21991f7 --- /dev/null +++ b/content/0.11/Configuration.html @@ -0,0 +1,531 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Configuring Falcon</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + Falcon</a> + </li> + <li class="divider ">/</li> + <li class="">Configuring Falcon</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > 
+ + <div class="section"> +<h2>Configuring Falcon<a name="Configuring_Falcon"></a></h2> +<p>By default, the config directory used by Falcon is {package dir}/conf. To override this (for example, to reuse the same conf across multiple Falcon upgrades), set the environment variable FALCON_CONF to the path of the conf dir.</p> +<p>falcon-env.sh has been added to the Falcon conf. This file can be used to set the environment variables that you need for your services. In addition, you can set any other environment variables you might need. This file is sourced by the Falcon scripts before any commands are executed. The following environment variables are available to set.</p> +<div class="source"> +<pre> +# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path +#export JAVA_HOME= + +# any additional java opts you want to set. This will apply to both client and server operations +#export FALCON_OPTS= + +# any additional java opts that you want to set for client only +#export FALCON_CLIENT_OPTS= + +# java heap size we want to set for the client. Default is 1024MB +#export FALCON_CLIENT_HEAP= + +# any additional opts you want to set for prism service. +#export FALCON_PRISM_OPTS= + +# java heap size we want to set for the prism service. Default is 1024MB +#export FALCON_PRISM_HEAP= + +# any additional opts you want to set for falcon service. +#export FALCON_SERVER_OPTS= + +# java heap size we want to set for the falcon server. Default is 1024MB +#export FALCON_SERVER_HEAP= + +# What is considered as the falcon home dir. Default is the base location of the installed software +#export FALCON_HOME_DIR= + +# Where log files are stored. Default is logs directory under the base install location +#export FALCON_LOG_DIR= + +# Where pid files are stored. Default is logs directory under the base install location +#export FALCON_PID_DIR= + +# where the falcon active mq data is stored. 
Default is logs/data directory under the base install location +#export FALCON_DATA_DIR= + +# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir. +#export FALCON_EXPANDED_WEBAPP_DIR= + +# Any additional classpath elements to be added to the Falcon server/client classpath +#export FALCON_EXTRA_CLASS_PATH= + +</pre></div></div> +<div class="section"> +<h3>Advanced Configurations<a name="Advanced_Configurations"></a></h3></div> +<div class="section"> +<h4>Configuring Monitoring plugin to register catalog partitions<a name="Configuring_Monitoring_plugin_to_register_catalog_partitions"></a></h4> +<p>Falcon comes with a monitoring plugin that registers catalog partition. This comes in really handy during migration from filesystem based feeds to hcatalog based feeds. This plugin enables the user to de-couple the partition registration and assume that all partitions are already on hcatalog even before the migration, simplifying the hcatalog migration.</p> +<p>By default this plugin is disabled. To enable this plugin and leverage the feature, there are 3 pre-requisites:</p> +<div class="source"> +<pre> +In {package dir}/conf/startup.properties, add +*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler + +In the cluster definition, ensure registry endpoint is defined. +Ex: +<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/> + +In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties +Ex: +<properties> + <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR}; + minute={MINUTE}"/> +</properties> + +</pre></div> +<p><b>NOTE : for Mac OS users</b></p> +<div class="source"> +<pre> +If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above). 
+ +In {package dir}/conf/falcon-env.sh uncomment the following line +#export FALCON_SERVER_OPTS= + +and change it to look as below +export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=" + +</pre></div></div> +<div class="section"> +<h4>Activemq<a name="Activemq"></a></h4> +<p>By default, the Falcon server starts an embedded ActiveMQ. To control this behaviour, set the following system properties using the -D option in the environment variable FALCON_OPTS:</p> +<ul> +<li>falcon.embeddedmq=<true/false> - Whether the server should start the embedded ActiveMQ, default true</li> +<li>falcon.embeddedmq.port=<port> - Port for the embedded ActiveMQ, default 61616</li> +<li>falcon.embeddedmq.data=<path> - Data path for the embedded ActiveMQ, default {package dir}/logs/data</li></ul></div> +<div class="section"> +<h4>Falcon System Notifications<a name="Falcon_System_Notifications"></a></h4> +<p>Some Falcon features, such as late data handling, retries and the metadata service, depend on JMS notifications sent when the Oozie workflow completes. Falcon listens to Oozie notifications via JMS. You need to enable Oozie JMS notification as explained below. 
Falcon post processing feature continues to only send user notifications so enabling Oozie JMS notification is important.</p> +<p><b>NOTE : If Oozie JMS notification is not enabled, the Falcon features such as failure retry, late data handling and metadata service will be disabled for all entities on the server.</b></p></div> +<div class="section"> +<h4>Enable Oozie JMS notification<a name="Enable_Oozie_JMS_notification"></a></h4> +<p></p> +<ul> +<li>Please add/change the following properties in oozie-site.xml in the oozie installation dir.</li></ul> +<div class="source"> +<pre> + <property> + <name>oozie.jms.producer.connection.properties</name> + <value>java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://<activemq-host>:<port></value> + </property> + + <property> + <name>oozie.service.EventHandlerService.event.listeners</name> + <value>org.apache.oozie.jms.JMSJobEventListener</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.name</name> + <value>WORKFLOW=ENTITY.TOPIC,COORDINATOR=ENTITY.TOPIC</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.prefix</name> + <value>FALCON.</value> + </property> + + <!-- add org.apache.oozie.service.JMSAccessorService to the other existing services if any --> + <property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value> + </property> + +</pre></div> +<p></p> +<ul> +<li>In falcon startup.properties, set JMS broker url to be the same as the one set in oozie-site.xml property</li></ul>oozie.jms.producer.connection.properties (see above) +<div class="source"> +<pre> + *.broker.url=tcp://<activemq-host>:<port> + +</pre></div></div> +<div class="section"> +<h4>Configuring Oozie for Falcon<a name="Configuring_Oozie_for_Falcon"></a></h4> +<p>Falcon uses HCatalog for data 
availability notification when Hive tables are replicated. Make the following configuration changes to Oozie to ensure Hive table replication in Falcon:</p> +<p></p> +<ul> +<li>Stop the Oozie service on all Falcon clusters. Run the following commands on the Oozie host machine.</li></ul> +<div class="source"> +<pre> +su - $OOZIE_USER + +<oozie-install-dir>/bin/oozie-stop.sh + +where $OOZIE_USER is the Oozie user. For example, oozie. + +</pre></div> +<p></p> +<ul> +<li>Copy each cluster's hadoop conf directory to a different location. For example, if you have two clusters, copy one to /etc/hadoop/conf-1 and the other to /etc/hadoop/conf-2.</li></ul> +<p></p> +<ul> +<li>For each oozie-site.xml file, modify the oozie.service.HadoopAccessorService.hadoop.configurations property, specifying clusters, the RPC ports of the NameNodes, and HostManagers accordingly. For example, if Falcon connects to three clusters, specify:</li></ul> +<div class="source"> +<pre> + +<property> + <name>oozie.service.HadoopAccessorService.hadoop.configurations</name> + <value>*=/etc/hadoop/conf,$NameNode:$rpcPortNN=$hadoopConfDir1,$ResourceManager1:$rpcPortRM=$hadoopConfDir1,$NameNode2=$hadoopConfDir2,$ResourceManager2:$rpcPortRM=$hadoopConfDir2,$NameNode3 :$rpcPortNN =$hadoopConfDir3,$ResourceManager3 :$rpcPortRM =$hadoopConfDir3</value> + <description> + Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of + the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is + used when there is no exact match for an authority. The HADOOP_CONF_DIR contains + the relevant Hadoop *-site.xml files. If the path is relative is looked within + the Oozie configuration directory; though the path can be absolute (i.e. to point + to Hadoop client conf/ directories in the local filesystem. 
+ </description> +</property> + + +</pre></div> +<p></p> +<ul> +<li>Add the following properties to the /etc/oozie/conf/oozie-site.xml file:</li></ul> +<div class="source"> +<pre> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.hosts</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.groups</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.URIHandlerService.uri.handlers</name> + <value>org.apache.oozie.dependency.FSURIHandler, org.apache.oozie.dependency.HCatURIHandler</value> +</property> + +<property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService, org.apache.oozie.service.PartitionDependencyManagerService, + org.apache.oozie.service.HCatAccessorService</value> +</property> + +<!-- Coord EL Functions Properties --> + +<property> + <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo, + today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo, + yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo, + lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-action-create-inst</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph2_now_inst, + today=org.apache.oozie.extensions.OozieELExtensions#ph2_today_inst, + 
yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday_inst, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth_inst, + lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth_inst, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear_inst, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear_inst, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> +<name>oozie.service.ELService.ext.functions.coord-action-start</name> +<value> +now=org.apache.oozie.extensions.OozieELExtensions#ph2_now, +today=org.apache.oozie.extensions.OozieELExtensions#ph2_today, +yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday, +currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth, +lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth, +currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear, +lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear, +latest=org.apache.oozie.coord.CoordELFunctions#ph3_coord_latest, +future=org.apache.oozie.coord.CoordELFunctions#ph3_coord_future, +dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn, +instanceTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime, +dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3_coord_dateOffset, +formatTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_formatTime, +user=org.apache.oozie.coord.CoordELFunctions#coord_user +</value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-submit</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_fixed, + 
user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-create</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_nominalTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + + +</pre></div> +<p></p> +<ul> +<li>Copy the existing Oozie WAR file to <oozie-install-dir>/oozie.war. This will ensure that all existing items in the WAR file are still present after the current update.</li></ul> +<div class="source"> +<pre> +su - root +cp $CATALINA_BASE/webapps/oozie.war <oozie-install-dir>/oozie.war + +where $CATALINA_BASE is the path for the Oozie web app. By default, $CATALINA_BASE is: <oozie-install-dir> + +</pre></div> +<p></p> +<ul> +<li>Add the Falcon EL extensions to Oozie.</li></ul> +<p>Copy the extension JAR files provided with the Falcon Server to a temporary directory on the Oozie server. For example, if your standalone Falcon Server is on the same machine as your Oozie server, you can just copy the JAR files.</p> +<div class="source"> +<pre> + +mkdir /tmp/falcon-oozie-jars +cp <falcon-install-dir>/oozie/ext/falcon-oozie-el-extension-<$version>.jar /tmp/falcon-oozie-jars +cp /tmp/falcon-oozie-jars/falcon-oozie-el-extension-<$version>.jar <oozie-install-dir>/libext + + +</pre></div> +<p></p> +<ul> +<li>Package the Oozie WAR file as the Oozie user</li></ul> +<div class="source"> +<pre> +su - $OOZIE_USER +cd <oozie-install-dir>/bin +./oozie-setup.sh prepare-war + +Where $OOZIE_USER is the Oozie user. For example, oozie. + +</pre></div> +<p></p> +<ul> +<li>Start the Oozie service on all Falcon clusters. Run these commands on the Oozie host machine.</li></ul> +<div class="source"> +<pre> +su - $OOZIE_USER +<oozie-install-dir>/bin/oozie-start.sh + +Where $OOZIE_USER is the Oozie user. For example, oozie. 
+ +</pre></div></div> +<div class="section"> +<h4>Disabling Falcon Post Processing<a name="Disabling_Falcon_Post_Processing"></a></h4> +<p>Falcon post processing performs two tasks: it sends user notifications to ActiveMQ, and it moves Oozie executor logs once the workflow finishes.</p> +<p>If post processing fails for any reason, the user might end up with a backlog in the pipeline, which is why it has been made optional.</p> +<p>To disable post processing, set *.falcon.postprocessing.enable to false and register the LogMoverService listener in startup.properties:</p> +<div class="source"> +<pre> +*.falcon.postprocessing.enable=false +*.workflow.execution.listeners=org.apache.falcon.service.LogMoverService + +</pre></div> +<p><b>NOTE : Please make sure Oozie JMS notifications are enabled, as LogMoverService depends on them.</b></p></div> +<div class="section"> +<h4>Enabling Falcon Native Scheduler<a name="Enabling_Falcon_Native_Scheudler"></a></h4> +<p>To enable the native scheduler, add the required properties to <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the Falcon Server. For details, refer to <a href="./FalconNativeScheduler.html">Falcon Native Scheduler</a>.</p></div> +<div class="section"> +<h4>Titan GraphDB backend<a name="Titan_GraphDB_backend"></a></h4> +<p>A GraphDB backend needs to be configured for the Falcon server to start properly. You can use either BerkeleyDB version 5.0.73 (the default for Falcon for the last few releases) or HBase version 1.1.x or later as the backend database. Falcon release distributions include the Titan storage plugins for both BerkeleyDB and HBase.</p> +<p><b>Using BerkeleyDB backend</b></p> +<p>Falcon distributions may not package the Berkeley DB artifacts (je-5.0.73.jar), depending on the build profile. 
If Berkeley DB is not packaged, you can download the Berkeley DB jar file from the URL:</p> +<div class="source"> +<pre>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip +</pre></div> +<p>The following properties describe an example Berkeley DB graph storage backend that can be specified in the configuration file:</p> +<div class="source"> +<pre>$FALCON_HOME/conf/startup.properties +</pre></div> +<div class="source"> +<pre> +# Graph Storage +*.falcon.graph.storage.directory=${user.dir}/target/graphdb +*.falcon.graph.storage.backend=berkeleyje +*.falcon.graph.serialize.path=${user.dir}/target/graphdb + +</pre></div></div> +<div class="section"> +<h5>Using HBase backend<a name="Using_HBase_backend"></a></h5> +<p><verbatim>hbase-site.xml</verbatim> is provided, which can be used to start the standalone mode HBase environment for development/testing purposes.</p></div> +<div class="section"> +<h5>Basic configuration<a name="Basic_configuration"></a></h5> +<div class="source"> +<pre> +##### Falcon startup.properties +*.falcon.graph.storage.backend=hbase +#For standalone mode , specify localhost +#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2 +*.falcon.graph.storage.hostname=<ZooKeeper Quorum> + +</pre></div> +<p>Set <verbatim>FALCON_EXTRA_CLASS_PATH</verbatim> in <verbatim>$FALCON_HOME/bin/falcon-env.sh</verbatim>. Additionally, the correct HBase client libraries need to be added. For example,</p> +<div class="source"> +<pre> +export FALCON_EXTRA_CLASS_PATH=`${HBASE_HOME}/bin/hbase classpath` + +</pre></div> +<p><b>Table name:</b> We recommend that, in the startup config, the table for Titan storage be named <verbatim>falcon_titan</verbatim> so that multiple applications using Titan can share the same HBase cluster. The table name is set using the startup property given below. 
The default value is shown.</p> +<div class="source"> +<pre> +*.falcon.graph.storage.hbase.table=falcon_titan + +</pre></div></div> +<div class="section"> +<h5>Starting standalone HBase for testing<a name="Starting_standalone_HBase_for_testing"></a></h5> +<p>HBase can be started in stand alone mode for testing as a backend for Titan. The following steps outline the config changes required:</p> +<div class="source"> +<pre> +1. Build Falcon as below to package hbase binaries + $ export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean assembly:assembly -Ppackage-standalone-hbase +2. Configure HBase + a. When falcon tar file is expanded, HBase binaries are under ${FALCON_HOME}/hbase + b. Copy ${FALCON_HOME}/conf/hbase-site.xml.template into hbase conf dir in ${FALCON_HOME}/hbase/conf/hbase-site.xml + c. Set {hbase_home} property to point to a local dir + d. Standalone HBase starts zookeeper on the default port (2181). This port can be changed by adding the following to hbase-site.xml + <property> + <name>hbase.zookeeper.property.clientPort</name> + <value>2223</value> + </property> + + <property> + <name>hbase.zookeeper.quorum</name> + <value>localhost</value> + </property> + e. set JAVA_HOME to point to Java 1.7 or above + f. Start hbase as ${FALCON_HOME}/hbase/bin/start-hbase.sh +3. Configure Falcon + a. In ${FALCON_HOME}/conf/startup.properties, uncomment the following to enable HBase as the backend + *.falcon.graph.storage.backend=hbase + ### specify the zookeeper host and port name with which standalone hbase is started (see step 2) + ### by default, it will be localhost and port 2181 + *.falcon.graph.storage.hostname=<zookeeper-host-name>:<zookeeper-host-port> + *.falcon.graph.serialize.path=${user.dir}/target/graphdb + *.falcon.graph.storage.hbase.table=falcon_titan + *.falcon.graph.storage.transactions=false +4. 
Add HBase jars to Falcon classpath in ${FALCON_HOME}/conf/falcon-env.sh as: + FALCON_EXTRA_CLASS_PATH=`${FALCON_HOME}/hbase/bin/hbase classpath` +5. Set the following in ${FALCON_HOME}/conf/startup.properties to disable SSL if needed + *.falcon.enableTLS=false +6. Start Falcon + +</pre></div></div> +<div class="section"> +<h5>Permissions<a name="Permissions"></a></h5> +<p><verbatim>falcon</verbatim> user for the <verbatim>falcon_titan</verbatim> table (or whatever table name was specified for the property <verbatim>*.falcon.graph.storage.hbase.table</verbatim>)</p> +<p><verbatim>falcon_titan</verbatim>.</p> +<p>Without Ranger, the HBase shell can be used to set the permissions.</p> +<div class="source"> +<pre> + su hbase + kinit -k -t <hbase keytab> <hbase principal> + echo "grant 'falcon', 'RWXCA', 'falcon_titan'" | hbase shell + +</pre></div></div> +<div class="section"> +<h5>Advanced configuration<a name="Advanced_configuration"></a></h5> +<p><verbatim>$FALCON_HOME/conf/startup.properties</verbatim>, by prefixing the Titan property with <verbatim>*.falcon.graph</verbatim> prefix.</p> +<p>See <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage</verbatim> for generic storage properties, <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_berkeleydb</verbatim> for Berkeley DB properties and <verbatim><a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase">http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase</a></verbatim> for HBase storage backend properties.</p></div> +<div class="section"> +<h4>Adding Extension Libraries<a name="Adding_Extension_Libraries"></a></h4> +<p>Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication and process execution. This is useful for use cases such as adding filesystem extensions. 
To enable this, add the following configs to startup.properties:</p> +<p>*.libext.paths=<paths to be added to all entity lifecycles></p> +<p>*.libext.feed.paths=<paths to be added to all feed lifecycles></p> +<p>*.libext.feed.retentions.paths=<paths to be added to feed retention workflow></p> +<p>*.libext.feed.replication.paths=<paths to be added to feed replication workflow></p> +<p>*.libext.process.paths=<paths to be added to process workflow></p> +<p>The configured jars are added to the Falcon classpath and the corresponding workflows.</p></div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. + + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html>
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/DataReplicationAzure.html ---------------------------------------------------------------------- diff --git a/content/0.11/DataReplicationAzure.html b/content/0.11/DataReplicationAzure.html new file mode 100644 index 0000000..f8264c2 --- /dev/null +++ b/content/0.11/DataReplicationAzure.html @@ -0,0 +1,143 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Data Replication between On-premise Hadoop Clusters and Azure Cloud</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + Falcon</a> + </li> + <li class="divider ">/</li> + <li class="">Data Replication between On-premise Hadoop Clusters and Azure Cloud</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider 
pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > + + <div class="section"> +<h2>Data Replication between On-premise Hadoop Clusters and Azure Cloud<a name="Data_Replication_between_On-premise_Hadoop_Clusters_and_Azure_Cloud"></a></h2></div> +<div class="section"> +<h3>Overview<a name="Overview"></a></h3> +<p>Falcon provides an easy way to replicate data between on-premise Hadoop clusters and Azure cloud. With this feature, users can build a hybrid data pipeline, e.g. processing sensitive data on-premises for privacy and compliance reasons while leveraging the cloud for elastic scale and online services (e.g. Azure machine learning) with non-sensitive data.</p></div> +<div class="section"> +<h3>Use Case<a name="Use_Case"></a></h3> +<ol> +<li>Copy data from on-premise Hadoop clusters to Azure cloud.</li> +<li>Copy data from Azure cloud to on-premise Hadoop clusters.</li> +<li>Copy data within Azure cloud (i.e. from one Azure location to another).</li></ol></div> +<div class="section"> +<h3>Usage<a name="Usage"></a></h3></div> +<div class="section"> +<h4>Set Up Azure Blob Credentials<a name="Set_Up_Azure_Blob_Credentials"></a></h4> +<p>To move data to/from Azure blobs, we need to add Azure blob credentials in HDFS. This can be done by adding the credential property through the Ambari HDFS configs; HDFS needs to be restarted after adding the credential. You can also add the credential property to core-site.xml directly, but make sure you restart HDFS from the command line instead of Ambari. 
Otherwise, Ambari will use the previous HDFS configuration without your Azure blob credentials.</p> +<div class="source"> +<pre> +<property> + <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name> + <value>{AZURE_BLOB_ACCOUNT_KEY}</value> +</property> + +</pre></div> +<p>To verify that the Azure credential is set up properly, check that you can access the Azure blob through HDFS, e.g.</p> +<div class="source"> +<pre> +hadoop fs -ls wasb://{AZURE_BLOB_CONTAINER}@{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net/ + +</pre></div></div> +<div class="section"> +<h4>Replication Feed<a name="Replication_Feed"></a></h4> +<p>A <a href="./EntitySpecification.html">Falcon replication feed</a> can be used for data replication to/from Azure cloud. You can specify a WASB (i.e. Windows Azure Storage Blob) URL in the source or target locations. See below for an example of data replication from a Hadoop cluster to Azure blob. Note that the clusters for the source and the target need to be different. 
Analogously, if you want to copy data from Azure blob, you can add Azure blob location to the source.</p> +<div class="source"> +<pre> +<?xml version="1.0" encoding="UTF-8"?> +<feed name="AzureReplication" xmlns="uri:falcon:feed:0.1"> + <frequency>months(1)</frequency> + <clusters> + <cluster name="SampleCluster1" type="source"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + </cluster> + <cluster name="SampleCluster2" type="target"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + <locations> + <location type="data" path="wasb://replication-t...@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/> + </locations> + </cluster> + </clusters> + <locations> + <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}" /> + </locations> + <ACL owner="ambari-qa" group="users" permission="0755"/> + <schema location="hcat" provider="hcat"/> +</feed> + +</pre></div></div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. 
+ + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/Distributed-mode.html ---------------------------------------------------------------------- diff --git a/content/0.11/Distributed-mode.html b/content/0.11/Distributed-mode.html new file mode 100644 index 0000000..d813731 --- /dev/null +++ b/content/0.11/Distributed-mode.html @@ -0,0 +1,250 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Distributed Mode</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + Falcon</a> + </li> + <li 
class="divider ">/</li> + <li class="">Distributed Mode</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > + + <div class="section"> +<h2>Distributed Mode<a name="Distributed_Mode"></a></h2> +<p>The following steps package and deploy Falcon in Distributed Mode. You need to complete Steps 1-3 mentioned <a href="./InstallationSteps.html">here</a> before proceeding further.</p></div> +<div class="section"> +<h3>Package Falcon<a name="Package_Falcon"></a></h3> +<p>Ensure that you are in the base directory (where you cloned Falcon). Let’s call it {project dir}</p> +<div class="source"> +<pre> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2 + +</pre></div> +<div class="source"> +<pre> +$ls {project dir}/distro/target/ + +</pre></div> +<p>The output should look like this:</p> +<div class="source"> +<pre> +apache-falcon-distributed-${project.version}-server.tar.gz +apache-falcon-distributed-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources + +</pre></div> +<p></p> +<ul> +<li>apache-falcon-distributed-${project.version}-sources.tar.gz contains the source files of the Falcon repo.</li></ul> +<p></p> +<ul> +<li>apache-falcon-distributed-${project.version}-server.tar.gz contains the project artifacts along with their dependencies, configuration files and scripts required to deploy Falcon.</li></ul> +<p>The tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz . This is the tar used for installing Falcon. 
Let's call it {falcon package}</p> +<p>The tar is structured as follows.</p> +<div class="source"> +<pre> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh + |- prism-stop + |- prism-start + |- prism-status +|- conf + |- startup.properties + |- runtime.properties + |- client.properties + |- prism.keystore + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war + |- prism.war +|- oozie + |- conf + |- libext +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt + +</pre></div></div> +<div class="section"> +<h3>Installing & running Falcon<a name="Installing__running_Falcon"></a></h3></div> +<div class="section"> +<h4>Installing Falcon<a name="Installing_Falcon"></a></h4> +<p>Running Falcon in distributed mode requires bringing up both prism and server. As the name suggests, Falcon prism splits each request it receives across the Falcon servers. It is good practice to start prism and server separately, each with its own configuration. Create separate directories for prism and server. Let's call them {falcon-prism-dir} and {falcon-server-dir} respectively.</p> +<p><b>For prism</b></p> +<div class="source"> +<pre> +$mkdir {falcon-prism-dir} +$tar -xzvf {falcon package} + +</pre></div> +<p><b>For server</b></p> +<div class="source"> +<pre> +$mkdir {falcon-server-dir} +$tar -xzvf {falcon package} + +</pre></div></div> +<div class="section"> +<h4>Starting Prism<a name="Starting_Prism"></a></h4> +<div class="source"> +<pre> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/prism-start [-port <port>] + +</pre></div> +<p>* By default, the prism server starts at port 16443. 
To change the port, use the -port option.</p> +<p>* falcon.enableTLS can be set to true or false explicitly to enable or disable SSL. If it is not set, a port that ends with 443 automatically puts prism on <a class="externalLink" href="https://">https://</a></p> +<p>* Prism starts with the conf from {falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this (for example, to use the same conf across multiple prism upgrades), set the environment variable FALCON_CONF to the path of the conf dir. You can find the instructions for configuring Falcon <a href="./Configuration.html">here</a>.</p> +<p><b>Enabling prism-client</b> * If prism is not started on the default port 16443, edit the following property in {falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties: falcon.url=http://{machine-ip}:{prism-port}/</p></div> +<div class="section"> +<h4>Starting Falcon Server<a name="Starting_Falcon_Server"></a></h4> +<div class="source"> +<pre> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-start [-port <port>] + +</pre></div> +<p>* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on <a class="externalLink" href="https://">https://</a> by default.</p> +<p>* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on <a class="externalLink" href="http://.">http://.</a></p> +<p>* To change the port, use the -port option.</p> +<p>* If falcon.enableTLS is not set explicitly, a port that ends with 443 automatically puts Falcon on <a class="externalLink" href="https://.">https://.</a> Any other port puts Falcon on <a class="externalLink" href="http://.">http://.</a></p> +<p>* The server starts with the conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. To override this (for example, to use the same conf across multiple server upgrades), set the environment variable FALCON_CONF to the path of the conf dir. 
You can find the instructions for configuring Falcon <a href="./Configuration.html">here</a>.</p> +<p><b>Enabling server-client</b> * If the server is not started on the default port 15443, edit the following property in {falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties: falcon.url=http://{machine-ip}:{server-port}/</p> +<p><b>NOTE</b>: HTTPS is the secure version of HTTP, the protocol over which data is sent between your browser and the website that you are connected to. By default Falcon runs in https mode, but you can configure it to use http.</p></div> +<div class="section"> +<h4>Using Falcon<a name="Using_Falcon"></a></h4> +<div class="source"> +<pre> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7", +Mode:"embedded"} + +$bin/falcon help +(for more details about Falcon cli usage) + +</pre></div></div> +<div class="section"> +<h4>Dashboard<a name="Dashboard"></a></h4> +<p>Once Falcon / prism is started, you can view the status of Falcon entities using the web-based dashboard. Open your browser at the corresponding port to use the web UI.</p> +<p>The Falcon dashboard makes its REST API calls as the user "falcon-dashboard". If this user does not exist on your Falcon and Oozie servers, please create the user.</p> +<div class="source"> +<pre> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. 
+[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] + +</pre></div></div> +<div class="section"> +<h4>Stopping Falcon Server<a name="Stopping_Falcon_Server"></a></h4> +<div class="source"> +<pre> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-stop + +</pre></div></div> +<div class="section"> +<h4>Stopping Falcon Prism<a name="Stopping_Falcon_Prism"></a></h4> +<div class="source"> +<pre> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/prism-stop + +</pre></div></div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. + + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/Embedded-mode.html ---------------------------------------------------------------------- diff --git a/content/0.11/Embedded-mode.html b/content/0.11/Embedded-mode.html new file mode 100644 index 0000000..c4bc9b7 --- /dev/null +++ b/content/0.11/Embedded-mode.html @@ -0,0 +1,277 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Embedded Mode</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> 
+ <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + Falcon</a> + </li> + <li class="divider ">/</li> + <li class="">Embedded Mode</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > + + <div class="section"> +<h2>Embedded Mode<a name="Embedded_Mode"></a></h2> +<p>Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned <a href="./InstallationSteps.html">here</a> before proceeding further.</p></div> +<div class="section"> +<h3>Package Falcon<a name="Package_Falcon"></a></h3> +<p>Ensure that you are in the base directory (where you cloned Falcon). 
Let’s call it {project dir}</p> +<div class="source"> +<pre> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true + +</pre></div> +<div class="source"> +<pre> +$ls {project dir}/distro/target/ + +</pre></div> +<p>The output should look like this:</p> +<div class="source"> +<pre> +apache-falcon-${project.version}-bin.tar.gz +apache-falcon-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources + +</pre></div> +<p>* apache-falcon-${project.version}-sources.tar.gz contains the source files of the Falcon repo.</p> +<p>* apache-falcon-${project.version}-bin.tar.gz contains the project artifacts along with their dependencies, configuration files and scripts required to deploy Falcon.</p> +<p>The tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz</p> +<p>The tar is structured as follows:</p> +<div class="source"> +<pre> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh +|- conf + |- startup.properties + |- runtime.properties + |- prism.keystore + |- client.properties + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war +|- data + |- falcon-store + |- graphdb + |- localhost +|- examples + |- app + |- hive + |- oozie-mr + |- pig + |- data + |- entity + |- filesystem + |- hcat +|- oozie + |- conf + |- libext +|- logs +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt + +</pre></div></div> +<div class="section"> +<h3>Installing & running Falcon<a name="Installing__running_Falcon"></a></h3> +<p>Running Falcon in embedded mode requires bringing up the server.</p> +<div class="source"> +<pre> +$tar -xzvf {falcon package} +$cd falcon-${project.version} + +</pre></div></div> +<div class="section"> +<h4>Starting Falcon Server<a name="Starting_Falcon_Server"></a></h4> +<div class="source"> +<pre> +$cd 
falcon-${project.version} +$bin/falcon-start [-port <port>] + +</pre></div> +<p>* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on <a class="externalLink" href="https://">https://</a> by default.</p> +<p>* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on <a class="externalLink" href="http://.">http://.</a></p> +<p>* To change the port, use the -port option.</p> +<p>* If falcon.enableTLS is not set explicitly, a port that ends with 443 automatically puts Falcon on <a class="externalLink" href="https://.">https://.</a> Any other port puts Falcon on <a class="externalLink" href="http://.">http://.</a></p> +<p>* The server starts with the conf from {falcon-server-dir}/falcon-${project.version}/conf. To override this (for example, to use the same conf across multiple server upgrades), set the environment variable FALCON_CONF to the path of the conf dir. You can find the instructions for configuring Falcon <a href="./Configuration.html">here</a>.</p></div> +<div class="section"> +<h4>Enabling server-client<a name="Enabling_server-client"></a></h4> +<p>If the server is not started on the default port 15443, edit the following property in {falcon-server-dir}/falcon-${project.version}/conf/client.properties:</p> +<p>falcon.url=http://{machine-ip}:{server-port}/</p></div> +<div class="section"> +<h4>Using Falcon<a name="Using_Falcon"></a></h4> +<div class="source"> +<pre> +$cd falcon-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode: +"embedded",Hadoop:"${hadoop.version}"} + +$bin/falcon help +(for more details about Falcon cli usage) + +</pre></div> +<p><b>Note</b>: HTTPS is the secure version of HTTP, the protocol over which data is sent between your browser and the website that you are connected to. By default Falcon runs in https mode. 
However, you can configure it to use http.</p></div> +<div class="section"> +<h4>Dashboard<a name="Dashboard"></a></h4> +<p>Once the Falcon server is started, you can view the status of Falcon entities using the web-based dashboard. Open your browser at the corresponding port to use the web UI.</p> +<p>The Falcon dashboard makes its REST API calls as the user "falcon-dashboard". If this user does not exist on your Falcon and Oozie servers, please create the user.</p> +<div class="source"> +<pre> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. +[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] + +</pre></div></div> +<div class="section"> +<h3>Running Examples using embedded package<a name="Running_Examples_using_embedded_package"></a></h3> +<div class="source"> +<pre> +$cd falcon-${project.version} +$bin/falcon-start + +</pre></div> +<p>Make sure the Hadoop and Oozie endpoints in examples/entity/filesystem/standalone-cluster.xml match your setup. The cluster locations, that is the staging and working dirs, MUST be created prior to submitting a cluster entity to Falcon. 
<b>staging</b> must have 777 permissions and its parent dirs must have execute permissions. <b>working</b> must have 755 permissions and its parent dirs must have execute permissions.</p> +<div class="source"> +<pre> +$bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml + +</pre></div> +<p>Submit input and output feeds:</p> +<div class="source"> +<pre> +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml + +</pre></div> +<p>Set up the workflow for the process:</p> +<div class="source"> +<pre> +$hadoop fs -put examples/app / + +</pre></div> +<p>Submit and schedule the process:</p> +<div class="source"> +<pre> +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/spark/spark-process.xml + +</pre></div> +<p>Generate input data:</p> +<div class="source"> +<pre> +$examples/data/generate.sh <<hdfs endpoint>> + +</pre></div> +<p>Get status of instances:</p> +<div class="source"> +<pre> +$bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z + +</pre></div> +<p>HCat based example entities are in examples/entity/hcat. Spark based example entities are in examples/entity/spark.</p></div> +<div class="section"> +<h4>Stopping Falcon Server<a name="Stopping_Falcon_Server"></a></h4> +<div class="source"> +<pre> +$cd falcon-${project.version} +$bin/falcon-stop + +</pre></div></div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. 
+ + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/EntityDependency.png ---------------------------------------------------------------------- diff --git a/content/0.11/EntityDependency.png b/content/0.11/EntityDependency.png new file mode 100644 index 0000000..9f11870 Binary files /dev/null and b/content/0.11/EntityDependency.png differ http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/EntitySLAAlerting.html ---------------------------------------------------------------------- diff --git a/content/0.11/EntitySLAAlerting.html b/content/0.11/EntitySLAAlerting.html new file mode 100644 index 0000000..ae2add0 --- /dev/null +++ b/content/0.11/EntitySLAAlerting.html @@ -0,0 +1,134 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Entity SLA Alerting</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div 
class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + Falcon</a> + </li> + <li class="divider ">/</li> + <li class="">Entity SLA Alerting</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > + + <div class="section"> +<h3>Entity SLA Alerting<a name="Entity_SLA_Alerting"></a></h3> +<p>Falcon supports SLAs for feeds and processes.</p> +<p>Types of SLA supported for feed:</p> +<p></p> +<ol style="list-style-type: decimal"> +<li>slaLow</li> +<li>slaHigh</li></ol> +<p>To learn more about feedSla, see <a href="./EntitySpecification.html">Feed Specification</a></p> +<p>Types of SLA supported for process:</p> +<p></p> +<ol style="list-style-type: decimal"> +<li>shouldStartIn</li> +<li>shouldEndIn</li></ol> +<p>To learn more about processSla, see <a href="./EntitySpecification.html">Process Specification</a></p> +<p>The Falcon Entity SLA Alerting service does the following:</p> +<p></p> +<ol style="list-style-type: decimal"> +<li>It monitors instances of feeds and processes and sends notifications to all the listeners attached to it.</li> +<li>For feeds, it notifies when an <b>slaHigh</b> miss happens. slaLow is not supported.</li> +<li>For processes, it notifies when an SLA miss for <b>shouldEndIn</b> happens. 
shouldStartIn is not supported.</li></ol> +<p>The Entity SLA Alert service depends upon <a href="./EntitySLAMonitoring.html">Falcon Entity SLA Monitoring</a> to know which process and feed instances are to be monitored.</p> +<p><b>How to attach listeners:</b></p> +<p>You can write custom listeners that take some action whenever a process or feed instance misses its SLA. To attach listeners, add the following property in startup.properties:</p> +<div class="source"> +<pre> + +*.entityAlert.listeners=org.apache.customPath.customListener + + +</pre></div> +<p>Currently Falcon natively supports <a href="./BacklogMetricEmitterService.html">Back Log Emitter Service</a> as a listener to the EntitySLAAlert service.</p></div> +<div class="section"> +<h3>Dependencies:<a name="Dependencies:"></a></h3> +<p><b>Other Services:</b></p> +<p>To enable the Entity SLA Alerting service you need to enable <a href="./EntitySLAMonitoring.html">Falcon Entity SLA Monitoring</a></p> +<p>The following properties are needed in startup.properties:</p> +<div class="source"> +<pre> + +*.application.services=org.apache.falcon.service.EntitySLAAlertService + +*.entity.sla.statusCheck.frequency.seconds=600 + +</pre></div> +<p><b>Falcon Database:</b></p> +<p>The Entity SLA Alerting service maintains its state in the database. It needs one table, <b>ENTITY_SLA_ALERTS</b>. See <a href="./FalconDatabase.html">FalconDatabase</a> to learn how to create it.</p></div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. 
+ + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/content/0.11/EntitySLAMonitoring.html ---------------------------------------------------------------------- diff --git a/content/0.11/EntitySLAMonitoring.html b/content/0.11/EntitySLAMonitoring.html new file mode 100644 index 0000000..f8600e5 --- /dev/null +++ b/content/0.11/EntitySLAMonitoring.html @@ -0,0 +1,106 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2018-03-12 + | Rendered using Apache Maven Fluido Skin 1.3.0 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20180312" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Falcon - Falcon Entity SLA Monitoring</title> + <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> + <link rel="stylesheet" href="./css/site.css" /> + <link rel="stylesheet" href="./css/print.css" media="print" /> + + + <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> + + + +<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> + + </head> + <body class="topBarDisabled"> + + + + + <div class="container"> + <div id="banner"> + <div class="pull-left"> + <div id="bannerLeft"> + <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> + </div> + </div> + <div class="pull-right"> </div> + <div class="clear"><hr/></div> + </div> + + <div id="breadcrumbs"> + <ul class="breadcrumb"> + + + <li class=""> + <a href="index.html" title="Falcon"> + 
Falcon</a> + </li> + <li class="divider ">/</li> + <li class="">Falcon Entity SLA Monitoring</li> + + + + <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> + <li id="projectVersion" class="pull-right">Version: 0.11</li> + + </ul> + </div> + + + + <div id="bodyColumn" > + + <div class="section"> +<h3>Falcon Entity SLA Monitoring<a name="Falcon_Entity_SLA_Monitoring"></a></h3> +<p>Entity SLA monitoring allows you to monitor entities (processes and feeds). It keeps track of the instances of the entity that are running and stores them in the database.</p></div> +<div class="section"> +<h3>Dependencies:<a name="Dependencies:"></a></h3> +<p><b>Other Services:</b></p> +<p>The Entity SLA monitoring service requires FalconJPAService to be up. To run EntitySLAMonitoring, set the following values in startup.properties:</p> +<div class="source"> +<pre> +*.application.services= org.apache.falcon.state.store.service.FalconJPAService, + org.apache.falcon.service.EntitySLAMonitoringService + +</pre></div> +<p><b>Falcon Database:</b></p> +<p>The Entity SLA monitoring service maintains its state in the database. It needs two tables:</p> +<p></p> +<ol style="list-style-type: decimal"> +<li>MONITORED_ENTITY</li> +<li>PENDING_INSTANCES</li></ol>See <a href="./FalconDatabase.html">FalconDatabase</a> to learn how to create them.</div> + </div> + </div> + + <hr/> + + <footer> + <div class="container"> + <div class="row span12">Copyright © 2013-2018 + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. + + </div> + + + <p id="poweredBy" class="pull-right"> + <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> + <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> + </a> + </p> + + </div> + </footer> + </body> +</html>