kramasamy closed pull request #2838: [Documentation] Improve JavaDocs and Documentation
URL: https://github.com/apache/incubator-heron/pull/2838
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/heron/common/src/java/com/twitter/heron/common/network/HeronClient.java b/heron/common/src/java/com/twitter/heron/common/network/HeronClient.java
index a4760edfc6..adb7714755 100644
--- a/heron/common/src/java/com/twitter/heron/common/network/HeronClient.java
+++ b/heron/common/src/java/com/twitter/heron/common/network/HeronClient.java
@@ -33,7 +33,7 @@
 /**
  * Implements this class could handle some following socket related behaviors:
  * 1. handleRead(SelectableChannel), which read data from a socket and convert into incomingPacket.
- * It could handle the conditions of closedConnection, normal Reading and partial Reading. When a
+ * It could handle the conditions of closedConnection, normal Reading and partial Reading. When an
  * incomingPacket is read, it will be pass to handlePacket(), which will convert incomingPackets to
  * messages and call onIncomingMessage(message), which should be implemented by its child class.
  * <p>
diff --git a/heron/downloaders/src/java/com/twitter/heron/downloader/FileDownloader.java b/heron/downloaders/src/java/com/twitter/heron/downloader/FileDownloader.java
index 272c242768..bcbd7fbec4 100644
--- a/heron/downloaders/src/java/com/twitter/heron/downloader/FileDownloader.java
+++ b/heron/downloaders/src/java/com/twitter/heron/downloader/FileDownloader.java
@@ -19,7 +19,7 @@
 /**
  * Used to download files via heron_downloader that has a URI prefix of "file://"
- * E.g. ./heron_dowloader file:///foo/bar /path/location
+ * E.g. ./heron_downloader file:///foo/bar /path/location
  */
 public class FileDownloader implements Downloader {

   @Override
diff --git a/heron/instance/src/java/com/twitter/heron/instance/HeronInstance.java b/heron/instance/src/java/com/twitter/heron/instance/HeronInstance.java
index 15ee2040e1..df3bf70330 100644
--- a/heron/instance/src/java/com/twitter/heron/instance/HeronInstance.java
+++ b/heron/instance/src/java/com/twitter/heron/instance/HeronInstance.java
@@ -302,7 +302,7 @@ public static void main(String[] args) throws IOException {
     String logMsg = "\nStarting instance " + instanceId + " for topology " + topologyName
         + " and topologyId " + topologyId + " for component " + componentName
         + " with taskId " + taskId + " and componentIndex " + componentIndex
-        + " and stmgrId " + streamId + " and stmgrPort " + streamPort
+        + " and streamManagerId " + streamId + " and streamManagerPort " + streamPort
         + " and metricsManagerPort " + metricsPort;

     if (remoteDebuggerPort != null) {
diff --git a/heron/packing/src/java/com/twitter/heron/packing/roundrobin/ResourceCompliantRRPacking.java b/heron/packing/src/java/com/twitter/heron/packing/roundrobin/ResourceCompliantRRPacking.java
index 778e0cc230..f07dd86b7c 100644
--- a/heron/packing/src/java/com/twitter/heron/packing/roundrobin/ResourceCompliantRRPacking.java
+++ b/heron/packing/src/java/com/twitter/heron/packing/roundrobin/ResourceCompliantRRPacking.java
@@ -122,7 +122,7 @@ public void initialize(Config config, TopologyAPI.Topology inputTopology) {
         Context.instanceDisk(config));
     resetToFirstContainer();

-    LOG.info(String.format("Initalizing ResourceCompliantRRPacking. "
+    LOG.info(String.format("Initializing ResourceCompliantRRPacking. "
         + "CPU default: %f, RAM default: %s, DISK default: %s.",
         this.defaultInstanceResources.getCpu(),
         this.defaultInstanceResources.getRam().toString(),
@@ -308,7 +308,7 @@ private void assignInstancesToContainers(PackingPlanBuilder planBuilder,
    * Attempts to place the instance the current containerId.
    *
    * @param planBuilder packing plan builder
-   * @param componentName the componet name of the instance that needs to be placed in the container
+   * @param componentName the component name of the instance that needs to be placed in the container
    * @throws ResourceExceededException if there is no room on the current container for the instance
    */
   private void strictRRpolicy(PackingPlanBuilder planBuilder,
diff --git a/heron/schedulers/src/java/com/twitter/heron/scheduler/nomad/NomadScheduler.java b/heron/schedulers/src/java/com/twitter/heron/scheduler/nomad/NomadScheduler.java
index 2bdac1e0e8..482aaa97d5 100644
--- a/heron/schedulers/src/java/com/twitter/heron/scheduler/nomad/NomadScheduler.java
+++ b/heron/schedulers/src/java/com/twitter/heron/scheduler/nomad/NomadScheduler.java
@@ -275,7 +275,7 @@ Task getTask(String taskName, int containerIndex, Resource containerResource) {
   }

   /**
-   * Get the task spec for using the docker driver in Noad
+   * Get the task spec for using the docker driver in Nomad
    * In docker mode, Heron will be use in docker containers
    */
   Task getTaskSpecDockerDriver(Task task, String taskName, int containerIndex) {
diff --git a/website/content/docs/concepts/architecture.md b/website/content/docs/concepts/architecture.md
index a145dc0113..49184315ea 100644
--- a/website/content/docs/concepts/architecture.md
+++ b/website/content/docs/concepts/architecture.md
@@ -78,7 +78,7 @@ In a Heron cluster

 ### Uploaders

-## Topology Components
+# Topology Components

 From an architectural standpoint, Heron was built as an interconnected set of
 modular components.
@@ -99,9 +99,9 @@ the sections below:
 The **Topology Master** \(TM) manages a topology throughout its entire
 lifecycle, from the time it's submitted until it's ultimately killed. When
 `heron` deploys a topology it starts a single TM and multiple [containers]({{< ref "#container" >}}).
-The TM creates an ephemeral [ZooKeeper](http://zookeeper.apache.org) node to
-ensure that there's only one TM for the topology and that the TM is easily
-discoverable by any process in the topology. The TM also constructs the [physical
+The **TM** creates an ephemeral [ZooKeeper](http://zookeeper.apache.org) node to
+ensure that there's only one **TM** for the topology and that the **TM** is easily
+discoverable by any process in the topology. The **TM** also constructs the [physical
 plan](../topologies#physical-plan) for a topology which it relays to different
 components.
@@ -118,7 +118,7 @@ phase of a topology's [lifecycle](../topologies#topology-lifecycle).
 Each Heron topology consists of multiple **containers**, each of which houses
 multiple [Heron Instances]({{< ref "#heron-instance" >}}), a [Stream Manager]({{< ref "#stream-manager" >}}),
 and a [Metrics Manager]({{< ref "#metrics-manager" >}}). Containers
-communicate with the topology's TM to ensure that the topology forms a fully
+communicate with the topology's **TM** to ensure that the topology forms a fully
 connected graph.

 For an illustration, see the figure in the [Topology Master]({{< ref "#topology-master" >}})
@@ -130,12 +130,12 @@ section above.
 The **Stream Manager** (SM) manages the routing of tuples between topology
 components. Each [Heron Instance]({{< ref "#heron-instance" >}}) in a topology connects to its
-local SM, while all of the SMs in a given topology connect to one another to
-form a network. Below is a visual illustration of a network of SMs:
+local **SM**, while all of the **SMs** in a given topology connect to one another to
+form a network. Below is a visual illustration of a network of **SMs**:

 ![Heron Data Flow](/img/data-flow.png)

-In addition to being a routing engine for data streams, SMs are responsible for
+In addition to being a routing engine for data streams, **SMs** are responsible for
 propagating [back pressure](https://en.wikipedia.org/wiki/Back_pressure) within
 the topology when necessary. Below is an illustration of back pressure:
@@ -147,21 +147,21 @@ components. In response, the SM for container **A** will refuse input from the
 SMs in containers **C** and **D**, which will lead to the socket buffers in
 those containers filling up, which could lead to throughput collapse.

-In a situation like this, Heron's back pressure mechanism will kick in. The SM
-in container **A** will send a message to all the other SMs. In response, the
-other SMs will examine the container's [physical
+In a situation like this, Heron's back pressure mechanism will kick in. The **SM**
+in container **A** will send a message to all the other **SMs**. In response, the
+other **SMs** will examine the container's [physical
 plan](../topologies#physical-plan) and cut off inputs from spouts that feed
 bolt **B3** (in this case spout **S1**).

 ![Back Pressure 2](/img/backpressure2.png)

-Once the lagging bolt (**B3**) begins functioning normally, the SM in container
-**A** will notify the other SMs and stream routing within the topology will
+Once the lagging bolt (**B3**) begins functioning normally, the **SM** in container
+**A** will notify the other **SMs** and stream routing within the topology will
 return to normal.

-#### Stream Manger Configuration
+#### Stream Manager Configuration

-SMs have a variety of [configurable
+**SMs** have a variety of [configurable
 parameters](../../operators/configuration/stmgr) that you can adjust at each
 phase of a topology's [lifecycle](../topologies#topology-lifecycle).
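[Editor's note] The back pressure behavior documented above — a slow bolt causing upstream input to be cut off rather than letting buffers grow until throughput collapses — can be illustrated with a plain-Java analogy. This is not Heron's Stream Manager mechanism (which works at the socket/protocol level between SMs); it is only the same principle expressed with a bounded queue, and all names here are invented:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy for back pressure: a bounded buffer between a fast producer (a spout
// like S1) and a deliberately slow consumer (a lagging bolt like B3). Once the
// buffer is full, put() blocks, throttling the producer to the consumer's pace
// instead of letting in-flight data grow without bound.
public class BackPressureSketch {

  // Runs one producer and one slow consumer over a bounded buffer and
  // returns the elapsed wall-clock time in milliseconds.
  public static long runOnce(int items, int capacity, long consumerDelayMillis)
      throws InterruptedException {
    BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
    long start = System.nanoTime();

    Thread consumer = new Thread(() -> {
      try {
        for (int i = 0; i < items; i++) {
          buffer.take();                      // consume one tuple
          Thread.sleep(consumerDelayMillis);  // simulate a slow bolt
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    for (int i = 0; i < items; i++) {
      buffer.put(i);  // blocks once the buffer is full -> back pressure
    }
    consumer.join();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws InterruptedException {
    // With a buffer of 4 and a 5 ms/tuple consumer, the producer finishes its
    // loop only at roughly the consumer's pace, not instantly.
    System.out.println("finished in ~" + runOnce(20, 4, 5) + " ms");
  }
}
```

The unbounded alternative (an ever-growing queue) is precisely the "socket buffers filling up" failure mode the documentation warns about.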
@@ -172,18 +172,18 @@ A **Heron Instance** (HI) is a process that handles a single task of a
 for easy debugging and profiling.

 Currently, Heron only supports Java, so all
-HIs are [JVM](https://en.wikipedia.org/wiki/Java_virtual_machine) processes, but
+**HIs** are [JVM](https://en.wikipedia.org/wiki/Java_virtual_machine) processes, but
 this will change in the future.

 #### Heron Instance Configuration

-HIs have a variety of [configurable
+**HIs** have a variety of [configurable
 parameters](../../operators/configuration/instance) that you can adjust at each
 phase of a topology's [lifecycle](../topologies#topology-lifecycle).

 ### Metrics Manager

-Each topology runs a Metrics Manager (MM) that collects and exports metrics from
+Each topology runs a **Metrics Manager** (MM) that collects and exports metrics from
 all components in a [container]({{< ref "#container" >}}). It then routes those metrics to
 both the [Topology Master]({{< ref "#topology-master" >}}) and to external collectors, such as
 [Scribe](https://github.com/facebookarchive/scribe),
@@ -192,7 +192,7 @@ both the [Topology Master]({{< ref "#topology-master" >}}) and to external colle
 You can adapt Heron to support additional systems by implementing your own
 [custom metrics sink](../../contributors/custom-metrics-sink).

-## Cluster-level Components
+# Cluster-level Components

 All of the components listed in the sections above can be found in each
 topology. The components listed below are cluster-level components that function
@@ -200,7 +200,7 @@ outside of particular topologies.

 ### Heron CLI

-Heron has a CLI tool called `heron` that is used to manage topologies.
+Heron has a **CLI** tool called `heron` that is used to manage topologies.
 Documentation can be found in [Managing Topologies](../../operators/heron-cli).
@@ -229,7 +229,7 @@ Tracker](../../operators/heron-tracker).

 ### Heron UI

 **Heron UI** is a rich visual interface that you can use to interact with
-topologies. Through Heron UI you can see color-coded visual representations of
+topologies. Through **Heron UI** you can see color-coded visual representations of
 the [logical](../topologies#logical-plan) and
 [physical](../topologies#physical-plan) plan of each topology in your cluster.
@@ -264,13 +264,11 @@ Heron API server | When the [Heron API server](../../operators/heron-api-server)
 Heron scheduler | When the Heron CLI (client) submits a topology to the Heron API server, the API server notifies the Heron scheduler and also provides the scheduler with the topology's [logical plan](../../concepts/topologies#logical-plan), [physical plan](../../concepts/topologies#physical-plan), and some other artifacts. The scheduler, be it [Mesos](../../operators/deployment/schedulers/mesos), [Aurora](../../operators/deployment/schedulers/aurora), the [local filesystem](../../operators/deployment/schedulers/localfs), or something else, then deploys the topology using containers.
 Storage | When the topology is deployed to containers by the scheduler, the code running in those containers then downloads the remaining necessary topology artifacts (essentially the code that will run in those containers) from the storage system.

-
-<!--
 * Shared Services

 When the main scheduler (`com.twitter.heron.scheduler.SchedulerMain`) is
 invoked by the launcher, it fetches the submitted topology artifact from the
-  topology storage, initializes the State Manager, and prepares a physical plan that
+  topology storage, initializes the **State Manager**, and prepares a physical plan that
 specifies how multiple instances should be packed into containers. Then, it
 starts the specified scheduler, such as
 `com.twitter.heron.scheduler.local.LocalScheduler`, which invokes the
 `heron-executor` for each container.
@@ -278,30 +276,30 @@ Storage | When the topology is deployed to containers by the scheduler, the code
 * Topologies

 `heron-executor` process is started for each container and is responsible for
-  executing the Topology Master or Heron Instances (Bolt/Spout) that are
-  assigned to the container. Note that the Topology Master is always executed
-  on container 0. When `heron-executor` executes normal Heron Instances
+  executing the **Topology Master** or **Heron Instances** (Bolt/Spout) that are
+  assigned to the container. Note that the **Topology Master** is always executed
+  on container 0. When `heron-executor` executes normal **Heron Instances**
   (i.e. except for container 0), it first prepares
-  the Stream Manager and the Metrics Manager before starting
+  the **Stream Manager** and the **Metrics Manager** before starting
   `com.twitter.heron.instance.HeronInstance` for each instance that is assigned
   to the container.

-  Heron Instance has two threads: the gateway thread and the slave thread.
-  The gateway thread is mainly responsible for communicating with the Stream Manager
-  and the Metrics Manager using `StreamManagerClient` and `MetricsManagerClient`
+  **Heron Instance** has two threads: the gateway thread and the slave thread.
+  The gateway thread is mainly responsible for communicating with the **Stream Manager**
+  and the **Metrics Manager** using `StreamManagerClient` and `MetricsManagerClient`
   respectively, as well as sending/receiving tuples to/from the slave
-  thread. On the other hand, the slave thread runs either spout or bolt
+  thread. On the other hand, the slave thread runs either Spout or Bolt
   of the topology based on the physical plan.

-  When a new Heron Instance is started, its `StreamManagerClient` establishes
-  a connection and registers itself with the stream manager.
+  When a new **Heron Instance** is started, its `StreamManagerClient` establishes
+  a connection and registers itself with the **Stream Manager**.
   After the successful registration, the gateway thread sends its physical plan to
   the slave thread, which then executes the assigned instance accordingly.

 ## Codebase

-Heron is primarily written in Java, C++, and Python.
+Heron is primarily written in **Java**, **C++**, and **Python**.

 A detailed guide to the Heron codebase can be found
 [here](../../contributors/codebase).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services