[GitHub] keith-turner opened a new issue #80: Improve Fluo's front page and short description.

git Fri, 11 Aug 2017 08:48:18 -0700

keith-turner opened a new issue #80: Improve Fluo's front page and short description.
URL: https://github.com/apache/fluo-website/issues/80
 
 
   Recent events have prompted discussion in the community about how best to describe Fluo briefly. As a result, several different short descriptions are floating around.
   
   Description from the GitHub README:
   
   ```
   Apache Fluo is an open source implementation of Percolator (which populates
   Google's search index) for Apache Accumulo. Fluo makes it possible to update
   the results of a large-scale computation, index, or analytic as new data is
   discovered. Check out the Fluo project website for news and general
   information.
   ```
   
   Description on the website:
   
   ```
   Apache Fluo is an open source implementation of Percolator (which populates
   Google's search index) for Apache Accumulo. With Fluo, users can continuously
   join new data into large existing data sets without reprocessing all data.
   Unlike batch and streaming frameworks, Fluo offers much lower latency and can
   operate on extremely large data sets.
   ```
   
   Description in the August 2017 board report:
   
   ```
    - Apache Fluo is a distributed processing system built on Apache Accumulo.  Fluo
      users can easily setup workflows that execute cross node transactions when data
      changes.  These workflows enable users to continuously join new data into large
      existing data sets with low latency while avoiding reprocessing all data.
   ```
   
   Below are some of the concepts these short descriptions are trying to communicate. What else belongs in this outline? Can we make the website's front page more informative and succinct? The front page does not have to cover every aspect; it could link out to more detail or omit some aspects entirely.
   
   * History
     * Based on the Percolator design.
   * Capabilities it offers users
     * Continuously join new data into large existing data sets without reprocessing all data.
     * Maintain multiple dependent derived data sets (similar to materialized views).
     * Emit changes in derived data sets to external systems, for example:
       * Continually keep a large index up to date as new data arrives.
       * Update external analytic systems.
   * How it works
     * Cross-node transactions
     * Notifications
     * Observers that execute in response to notifications
   * Context: how does it compare to other technologies? Explaining Fluo relative to other technologies may help people understand it more quickly.
     * Lower latency than batch processing
     * Handles larger data sets than streaming
   
 
