This is an automated email from the ASF dual-hosted git repository.

mwalch pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/fluo-website.git

The following commit(s) were added to refs/heads/gh-pages by this push:
     new 16b6f7f  Added troubleshooting documentation (#142)
16b6f7f is described below

commit 16b6f7f71ea0acd03dac6001a99c717bd2f2e78f
Author: Mike Walch <mwa...@apache.org>
AuthorDate: Tue Mar 13 17:07:54 2018 -0400

    Added troubleshooting documentation (#142)
---
 _fluo-1-2/administration/troubleshooting.md | 56 +++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/_fluo-1-2/administration/troubleshooting.md b/_fluo-1-2/administration/troubleshooting.md
new file mode 100644
index 0000000..47dd5e4
--- /dev/null
+++ b/_fluo-1-2/administration/troubleshooting.md
@@ -0,0 +1,56 @@
+---
+title: Troubleshooting
+category: administration
+order: 7
+---
+
+Steps for troubleshooting problems with Fluo applications.
+
+## Fluo application stops processing data
+
+1. Confirm that your application is running with the expected number of workers.
+    ```bash
+    $ fluo list
+    Fluo instance (localhost/fluo) contains 1 application(s)
+
+    Application     Status     # Workers
+    -----------     ------     ---------
+    webindex        RUNNING    3
+    ```
+    Look for errors in the logs of any oracle or worker that has died.
+
+1. Run the `fluo wait` command to see if your application is processing notifications.
+    ```bash
+    $ fluo wait -a webindex
+    [command.FluoWait] INFO : The wait command will exit when all notifications are processed
+    [command.FluoWait] INFO : 140 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : 96 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : 70 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : 31 notifications are still outstanding. Will try again in 10 seconds...
+    [command.FluoWait] INFO : All processing has finished!
+    ```
+    The number of notifications will increase as data is added to the application, but it should eventually decrease
+    to zero and processing should finish.
+
+1. Look for errors or exceptions in the logs of all oracle and worker processes. Processing can stop if all threads
+    in a worker process are consumed by exceptions thrown in the Fluo application's observer code. These exceptions
+    are often due to parsing issues or corner cases not seen during development or when using small data sets.
+
+1. If you are using a cluster manager (e.g. Marathon, YARN, etc.) to run your Fluo application, look for errors in the logs of
+    your cluster manager or application manager. Below are some common errors:
+
+    * Cluster managers sometimes fail to start all processes of a Fluo application due to a lack of container slots or resources (CPU, memory, etc.).
+      This can be fixed by giving more resources to your cluster manager or by decreasing the number or resources of Fluo workers.
+    * Cluster managers can kill Fluo processes if they use too much memory. This can be fixed by allocating more memory to your workers.
+
+1. Run [jstack] to get stack traces of threads in your Fluo application processes and look for any stuck threads.
+
+1. Consider configuring your Fluo application to [report metrics][metrics] so that they are viewable in Grafana/InfluxDB. Metrics
+    can be helpful in debugging performance issues.
+
+If you are still having trouble, feel free to email `d...@fluo.apache.org` for help.
+
+[jstack]: https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstack.html
+[metrics]: {{ page.docs_base }}/administration/metrics

--
To stop receiving notification emails like this one, please contact
mwa...@apache.org.
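
The `fluo wait` step in the diff above relies on reading the log by eye to see whether the outstanding count is going down. As a rough sketch of how that check could be scripted, the helper below reads `fluo wait` log lines on standard input and prints one word describing the trend. The function name `check_wait_progress` and its four status words are hypothetical and not part of Fluo; the only thing taken from the documentation is the log line format shown in the example output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Fluo): classify `fluo wait` output read from stdin.
# Assumes the log format shown in the troubleshooting doc, e.g.
#   [command.FluoWait] INFO : 140 notifications are still outstanding. ...
check_wait_progress() {
  awk '
    /notifications are still outstanding/ {
      # The count is the field immediately before the word "notifications".
      for (i = 2; i <= NF; i++) {
        if ($i == "notifications") { count = $(i - 1) + 0; break }
      }
      if (n == 0) first = count   # remember the first count seen
      last = count                # and the most recent one
      n++
    }
    /All processing has finished/ { finished = 1 }
    END {
      if (finished)          print "finished"
      else if (n == 0)       print "no-data"
      else if (last < first) print "progressing"
      else                   print "stalled"
    }
  '
}
```

One possible way to use it is to sample the command for a fixed window with coreutils `timeout`, for example `timeout 60 fluo wait -a webindex 2>&1 | check_wait_progress`. The `2>&1` is there only because this sketch does not assume which stream the log lines are written to. A result of `stalled` over a long enough window would suggest moving on to the log and jstack steps.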