Hi Joe,

First, thank you very much for your extensive answer, clarification, and 
resources.
After digging through your references and reading your text, it becomes 
clearer and clearer to me what NiFi is about and what problem it solves.
And indeed, I get the impression that what NiFi does and what we plan to do is 
somewhat orthogonal, as we do not care that much about all that "where does a 
value come from" or "how does it get from here to there" but more about the 
"what does it tell us" or "how do we have to react to this".
So in the language of NiFi this should be a processor, as I understand it.

And regarding your comment about MiNiFi, I was thinking the same. From my 
reading about the small footprint, it seems ideal as an "edge" runtime and as 
a first level of ingestion.
But reading your comment, another question came to my mind. You talk about 
async command and control (which is an ideal pattern for communication with 
edge devices). Are these features built into MiNiFi, or are there some 
patterns to achieve this, e.g., by sending specialized messages?

Thank you!
Julian

Am 01.10.18, 16:29 schrieb "Joe Witt" <[email protected]>:

    Julian,
    
    It is a great question.  If you have not already read them, the
    overview [1] and life of a flow file [2] docs will probably help
    orient things.  One is really high level and the other is much lower
    level so they might not provide enough clarity.
    
    For your question about what happens to data while it is in NiFi's
    control, the in-depth doc [2] is the key one to follow.
    
    I'm going to provide a sort of 'architecture stack' for IoT systems
    from a sort of end-to-end view and explain where NiFi fits in there in
    terms of sweet spot and overlap and will try to do so from Edge
    through Core/Cloud.
    
    Edge:
    - Device/Sensor
    - Data Processing (simple, complex *)
    - Data Routing
    - Gateway
    
    * When I say simple/complex here I mean it in the sort of classical
    sense of simple event processing versus complex event processing.
    These terms have become problematic and seem to have fallen out of
    favor.  But for my purposes in this email, the difference really is
    whether a discrete object/event can be operated on in its own right
    based on its data, the context/configuration of the system, and the
    configuration of the flow itself (think of a flow like a finite
    state machine where being at some state implies prior context).
    Complex means a given object/data/event is operated on against some
    sort of rolling window of knowledge/state/observation to support
    cool things like temporal/spatial correlation, etc.  There are
    probably better definitions out there and indeed the above is more
    of a continuum than a yes/no thing.
    
    In your examples you ask:
    "how long was this bit set"
    "notify me when this signal is below a certain threshold for more than 30s"
    Both of those require retaining and tracking state apart from the
    event itself.  If we consider that a sensor reading is an event,
    then what we want to know, across a series of events, is how long a
    given reading was consistent, when it changed, etc.  So we have some
    sort of database where we keep this information, and indeed when the
    state changes we want to fire/detect that as an event itself.  For
    the 30s case we want to have events generated based on that same
    sort of knowledge but with an added time-specific trigger.  They're
    both of the same variety.
    
    Regional DataCenter (Cloud, On-Prem, etc..)
    - Device Management
    - Messaging
    - Data Flow Integration
    
    Core Data Center (Cloud, On-Prem, etc..)
    - Messaging
    - Data Flow Integration
    - Global/Centralized Command and Control of DataFlows
    - Data Processing (stream/batch/etc..)
    - System Orchestration
    
    NiFi as a project was fundamentally designed to tackle the end
    (where data is created) to end (all points of consumption) flow
    management of data.  The fundamental understanding is that systems
    which generate data and systems which consume data are often not
    designed to talk to each other in advance, and even if they were,
    there are important separations of concerns to consider, plan for,
    and enforce to have a healthy architecture.  I talk about this part
    quite a bit in [3].  In short though, for two systems to talk
    reliably there are many things which have to agree - always.  Some
    of them are format, schema, relevance, priority, size, rate, etc.
    A related set of problems is how to manage that for end-to-end
    systems where the components within them come and go, get moved
    around, get upgraded, use different security, etc.  I talk about
    messaging versus data flow management in the OSCON talk.
    
    Now, having sort of established (this is a much longer topic) the
    point of dataflow management, let me distinguish this from other
    systems like Camel/etc..  and then processing systems.
    
    NiFi differs from most systems in this dataflow management space in
    several important ways.  One key way is that it takes responsibility
    for the safety of data and is designed for handling tiny objects (a
    few bytes) and large objects (several GB) at once.  It exposes a
    streaming API to extension writers that never forces byte[] usage,
    which is usually what makes other systems difficult to use and scale
    in this space.  It provides built-in data provenance capabilities
    which make tracking the origin and attribution of data in an
    end-to-end sense almost a solvable problem :) (among other cool
    things).  It provides the ability to see in real time what is
    flowing in the system and to interactively modify a running flow,
    impacting only the parts of the flow under change.  It supports
    complex directed graphs of processing.  Now with the NiFi Registry
    it supports a powerful SDLC model for CI/CD style work across
    environments and still maintains the previous point about how it
    efficiently impacts live flows/change-sets.
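    To illustrate the streaming-API point: the idea is that an extension
    processes content as a bounded stream of chunks instead of loading a
    full byte[] into memory, so memory use stays constant whether the
    object is a few bytes or several GB.  A language-neutral sketch of
    that pattern (plain Python for illustration, not NiFi's actual
    extension API):

```python
import io

def stream_copy(src, dst, chunk_size=8192):
    """Copy/transform data chunk by chunk; memory stays O(chunk_size)
    no matter how large the object is."""
    while True:
        buf = src.read(chunk_size)
        if not buf:
            break  # end of stream
        dst.write(buf)  # a real processor would transform here
```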
    
    Also, NiFi strongly differs in that you don't put NiFi 'into your
    application'.  People do that with Camel all the time, as it helps
    them give their applications dataflow capabilities.  NiFi is a
    central data broker in and of itself.  You run NiFi as an
    application/cluster/etc. and use it to capture/source data from
    producers using the protocols those systems were built for, in
    either a push or pull sourcing mode.  You route/transform data as
    needed, both within NiFi or by making service calls.  And finally
    you use NiFi to get data to consuming systems, again using their
    protocol of choice and again in either a push or pull fashion.  NiFi
    handles the safety of the data using its repositories and
    transactional behavior at each exchange point.  Tons of power in
    that.
    
    NiFi is used all the time for 'data processing'.  This is often
    referred to as 'transformations'.  If you talk to stream processing
    people they'll say processing is about windowing, and any system
    that doesn't have that isn't a stream processing system.  Well, NiFi
    doesn't have that, but it's used for all kinds of
    processing/transformations all the time.  If you talk to people that
    do 'ETL' they'll say transformations are all about relational
    transforms, and if NiFi doesn't have those or make those easy then
    it isn't an ETL system.  Well, NiFi is used in ETL all the time and
    it doesn't have those things.  The point I'm trying to make is that
    NiFi is designed for routing, transformation, and mediation of data
    between systems.  Routing/mediation are obviously easy, so let's
    stay on transformation.  What types of things is NiFi often used for
    in this realm?  These would be things like enrichment of events,
    format/schema transformation, filtering, splitting where we'd take a
    series of events and split them apart, and aggregation where the
    intent is to combine a series of events together into a larger event
    - not so much aggregation where the intent is to look at a series of
    events and produce a single 'aggregate/summary', though on occasion
    that too.  That comment about aggregation is a key one.  For those
    things you want what some of the batch/streaming processing systems
    do.  Why not have all that in a single system?  The actors, needs,
    and resulting APIs and user experiences tend to be skewed toward
    data processing and data flow management as different things.  I see
    projects/companies all the time trying to use one for the other.
    So, my advice there is: be careful.  Make sure you use the one you
    need, or pick a tool for each category, use them both together, and
    let each do its part well.
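    To make that aggregation distinction concrete, here is a hedged
    sketch (illustrative names, not NiFi processors) of the two kinds of
    aggregation mentioned: merging a series of events into larger events
    versus reducing them to a single summary record:

```python
def merge_events(events, max_batch=10):
    """Data-flow-style aggregation: combine small events into fewer,
    larger events, preserving every payload."""
    batches = []
    for i in range(0, len(events), max_batch):
        chunk = events[i:i + max_batch]
        batches.append({"count": len(chunk), "payload": chunk})
    return batches

def summarize_events(events):
    """Contrast: reduce the series to one summary record - the kind of
    windowed aggregate a stream processing system computes."""
    values = [e["value"] for e in events]
    return {"count": len(values), "min": min(values), "max": max(values)}
```

    The first kind (payload-preserving batching) is the sweet spot for a
    flow management tool; the second (rolling summaries over windows) is
    where the stream processing systems shine.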
    
    So, after all that rambling, let me answer your fundamental ask,
    which was:
    "What I am looking for is a framework which does some analysis of data
    streams coming from controllers"
    
    NiFi is less about the 'analysis' of data streams and more about the
    management (capture, transformation, mediation) of those data
    streams.  Have NiFi be what talks with the devices/gateways,
    messaging systems, etc., and feeds data to/through the
    analytic/processing system, whatever that may be.  You can also use
    NiFi for the analytic execution itself, but it won't be 'as good' at
    that.
    
    Some quick comments to round this out:
    1) MiNiFi is for the first-mile edge collection problem.  It is
    designed to live directly on a sensor/device or, failing that, on
    the nearby gateway to act as a data flow system.  It would relay
    data to NiFi, or via MQTT, or to a Kafka topic in some other
    location.  It's about having a vast distribution of data flow
    agents, each operating independently but under some centralized
    asynchronous command and control model.
    2) NiFi is for regional/core datacenter data flow management.  It
    supports clustering, live/interactive flow management, etc..
    3) The NiFi Registry is a centralized registry to store version
    controlled flows which can be instantiated as many times as needed in
    a single cluster, environment, etc..  It serves as a store for well
    designed/tested/certified/compliance approved flows or as a tool to
    migrate from dev to staging to prod or to replicate prod in another
    environment for quick troubleshooting, etc..
    
    NiFi can be used as a service activator.  If that is your primary
    interest you might also want to look at Apache Airflow.  That
    project seems to focus on system orchestration more so than data
    flow management.  At first blush they appear similar, but in reality
    they solve really different problems.
    
    [1] https://nifi.apache.org/docs/nifi-docs/html/overview.html
    [2] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
    [3] https://www.youtube.com/watch?v=sQCgtCoZyFQ
    
    Thanks
    Joe
    On Mon, Oct 1, 2018 at 9:06 AM Otto Fowler <[email protected]> wrote:
    >
    > I think you might want to look at Apache Metron, and its profile and 
alerting capabilities, if you are going to look for SIEM-like things.
    >
    >
    >
    > On September 30, 2018 at 07:58:10, Julian Feinauer 
([email protected]) wrote:
    >
    > Hi Nifi-team,
    >
    >
    >
    > I’m from the incubating plc4x project [1] and I am looking for a 
framework which is suitable for the management of IoT data streams and for 
doing some edge computing.
    >
    > As NiFi is oftentimes mentioned in relation with IoT, I tried to find out 
what NiFi really does and how it would fit with our ideas (and the MiNiFi 
project also seems to fit into this).
    >
    >
    >
    > From what I understood from the docs and some videos, NiFi looks to me a 
bit like Apache Camel [2], as it is able to (dynamically) integrate different 
systems and manage the dataflow between them. What I did not get exactly is 
how the payloads are managed between these endpoints, and how much of the 
processing NiFi does itself versus how much it delegates to other components 
(like, e.g., a Service Activator in EIP).
    >
    >
    >
    > What I am looking for is a framework which does some analysis of data 
streams coming from controllers that, e.g., control machines or robots. 
chrisdutz already prepared the first version of a NiFi endpoint in the Plc4x 
repo [3], so we are already able to stream these datasets to NiFi. What's 
unclear to me is how we could tackle some of the questions like “how long was 
this bit set” or “notify me when this signal is below a certain threshold for 
more than 30s” or so.
    >
    > Is this in the scope of NiFi or is NiFi more of an integration / 
data-flow layer which is absolutely agnostic of these processing blocks?
    >
    >
    >
    > I hope my questions are not too dumb and that I’m not missing NiFi's core 
too much with my current knowledge.
    >
    > I would be happy about some answers, or some ideas from experienced users 
about how to approach the questions stated above.
    >
    >
    >
    > Best
    >
    > Julian
    >
    >
    >
    > [1] http://plc4x.incubator.apache.org/
    >
    > [2] https://camel.apache.org/
    >
    > [3] 
https://github.com/apache/incubator-plc4x/tree/master/integrations/apache-nifi
    
