Re: Using nifi in separated networks

2021-08-03 Thread Marc
Hi Daniel,

We tested two different diodes (from two different manufacturers) over a long
period. Both diodes count the UDP packets and log any loss. Additionally, we use
hashes to detect any form of content error. We have tested the two diodes for
over a year and have not seen any packet loss during this time. Both diodes
were shipped as appliances, so the manufacturers know the hardware exactly.
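
As a rough illustration of the hash check described above, here is a minimal Python sketch. It is not the mechanism of either diode appliance; the choice of SHA-256, the trailing-digest layout, and the names wrap/unwrap are assumptions made only for this example.

```python
import hashlib

# Assumption for this sketch: a SHA-256 digest is appended to every message.
DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def wrap(payload: bytes) -> bytes:
    """Sender side: append a digest so the receiver can verify the content."""
    return payload + hashlib.sha256(payload).digest()


def unwrap(message: bytes) -> bytes:
    """Receiver side: return the payload or raise ValueError on a mismatch."""
    if len(message) < DIGEST_LEN:
        raise ValueError("message shorter than digest")
    payload, digest = message[:-DIGEST_LEN], message[-DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("content hash mismatch")
    return payload
```

Because the sender never hears back over a diode, a mismatch can only be logged on the receiving side; it cannot trigger an automatic retransmission.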

This is also the reason why I initially decided against using Reed-Solomon
codes (or OpenRQ, as another example). The effort seems greater than the
benefit. But maybe I'll have more time to look at this later, especially
because I want to implement a diode based on NiFi, where I would not know
the hardware in use.

If you are interested in a very reliable diode, another concept may be
interesting for you. A German manufacturer uses an L4 microkernel (similar to
the seL4 microkernel) to prevent data from being sent back. Only ACKs without
payload are allowed. This works like a charm and is very reliable.

Thanks again for your interest in my work. I appreciate that.

Marc


Re: Using nifi in separated networks

2021-08-03 Thread Daniel Chaffelson
This is a very interesting area of integration investigation Marc, thank
you for sharing your work!

I looked into this a little after conversations with folks in security
applications, and I wonder if you investigated approaches to tracking and
reporting/handling packet loss and error rates in this?
The interest was in reasoning about loss rates, and the completeness of
received data - something with a simple merge>put>diode>get>unpack would
not manage I think.

I was looking at Longhair, and similar
Reed-Solomon approaches, as a method of breaking down arbitrary files and
transmitting them for reconstitution over diodes that may have lossy behavior
in field scenarios.
I also looked a little into transmitting manifests for downstream
reconciliation, but this unravelled to be more complex an operation than
would suit a pure NiFi implementation, so I started on the path of
Kafka/Flink as a streaming-reconciliation service but quickly realised I
was creating a monster without commercial interest :)
Both approaches are easier for fewer larger files than millions of tiny
messages in terms of practicality, and if you had very reliable diode
transmission the overhead of ECC/reconciliation may not be worthwhile.
Other implementations I had seen (like ZeroMQ radio/dish or blindFTP)
seemed to talk about provable delivery as a potential requirement, but I
only found the more simplistic 'my network is reliable and any packet loss
is negligible anyway' approaches. I suspect the implementations of these
more robust approaches are reserved for commercial offerings...
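
To make the erasure-coding idea a little more concrete, here is a deliberately simplified Python sketch. It uses a single XOR parity block, not Reed-Solomon and not Longhair's API, so it can repair at most one lost block per group; the function names are hypothetical and all blocks are assumed to have equal length.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(blocks):
    """Sender side: append one parity block to the data blocks."""
    return list(blocks) + [xor_blocks(blocks)]


def recover(received):
    """Receiver side: 'received' holds the data blocks plus the parity block,
    with None in place of a lost block. Returns the data blocks."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost block")
    received = list(received)
    if missing:
        present = [block for block in received if block is not None]
        received[missing[0]] = xor_blocks(present)
    return received[:-1]  # drop the parity block
```

A real Reed-Solomon scheme generalises this to several recovery blocks, at the cost of the extra complexity discussed above.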

Anyway, I appreciate that you may not be able to share more details on
this, but you reminded me of enjoying the investigation when I looked at it
so I thought I'd say thanks for that.


Re: Using nifi in separated networks

2021-08-02 Thread Phil H
Adam, that's true, although if your data size is larger than network
MTU there can be some disconnect there.

Connection per flow file is pretty slow for sustained high traffic
flows though (can't recall the establishment times off the top of my
head, but they are non-trivial).


Re: Using nifi in separated networks

2021-08-02 Thread Adam Taft
Just spitballing a little here. If you set the configuration of the PutTCP
processor property "Connection per Flowfile" to 'true' and you leave the
"Outgoing Message Delimiter" as blank (none), then I don't think you have
the delimiter problem that you both are describing. I could be wrong though?

I would consider it a bug if you couldn't send a "raw" connection-oriented
object over PutTCP.  With that processor, the goal would be to: a) open a
socket, b) dump whatever binary you have prepared over it, c) close the
socket to signal completion of transfer. If PutTCP doesn't work this way
(byte-for-byte), it should probably be flagged as a bug (its original
intention was exactly this use case).
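
The open/dump/close pattern described above can be sketched in a few lines of Python. This is not PutTCP's implementation, only an illustration of the idea that closing the connection marks the end of the object, so no delimiter is needed; the helper names are hypothetical.

```python
import socket


def send_raw(sock: socket.socket, data: bytes) -> None:
    """Write the bytes unchanged, then close to signal end of transfer."""
    sock.sendall(data)
    sock.close()


def receive_until_close(sock: socket.socket, chunk_size: int = 65536) -> bytes:
    """Read until the peer closes the connection; no delimiter is involved."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # an empty read means the sender closed the socket
            break
        chunks.append(chunk)
    sock.close()
    return b"".join(chunks)
```

With this approach binary content containing newline bytes arrives intact, but every object pays the cost of a new connection, which is the throughput concern raised elsewhere in this thread.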

That being said, I still think custom FlowFile serialization might be
something that is outside of the concern of the transport. I personally
think serializing/deserializing is a different concern from transport.
Arguably, sometimes the semantics of the transport protocol require you to
prepare the message itself in a protocol-accommodating way (HTTP being an
obvious example of this, or packet ordering in Marc's UDP example). But a
new JSON flowfile serialization seems like it could be a separate
processor, not commingled into an existing one.

MergeContent / UnpackContent work in tandem and have a "FlowFile Stream v3"
format that can serialize/deserialize multiple flowfiles together into a
single byte stream. This allows transport over any protocol, including
file-based, socket-based, etc.

Marc: Your mention of performance is, of course, appropriate for the scale
that you're talking about (Gbps). Maybe there are some performance
improvements that could be garnered from your work and applied to the
"standard" processors I mentioned. And I definitely didn't mean to imply
you were doing "anything wrong". Just legitimately curious as to your
thought process and design approach.

OK, I'll step off a little, because I might be probing too hard here. But I
was legitimately curious about the intention of the proposed processor as
it relates to the mentioned Diode device.

Thanks,

Adam



Re: Using nifi in separated networks

2021-08-02 Thread Phil H
Hi Marc,

Thanks for the additional info.  Just so you know you’re not the only
one, I’ve also had to re-implement a ListenTCP alternative to get
around the byte delimiter issue for binary and multiline text data.

Phil




Re: Using nifi in separated networks

2021-08-02 Thread Marc
Hi Adam,

More or less it is a 'merge', PutTCP, ListenTCP and unpack. I hope that I am
not wrong, but the NiFi ListenTCP processor uses a delimiter (\n as default?).
If you are transferring binary data, the processor splits the flow into
'pieces'. And the attributes are not transferred to the destination.

But your idea describes what the processor is doing.

1. It converts the attributes to a JSON string
2. It transfers the JSON string and the payload (there is a header that tells
the destination how long the JSON header and the payload are)
3. The listener gets the flow and decodes the header (to get the size of the
JSON header and the payload)
4. It writes the payload to a flow
5. It parses the JSON string and sets the attributes on the flow
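
The five steps above amount to length-prefixed framing, which can be sketched in Python as follows. The actual header layout of the processors is not given in this thread; the fixed header used here (a 4-byte JSON length followed by an 8-byte payload length, both big-endian) and the names encode/decode are assumptions made purely for illustration.

```python
import json
import struct

# Hypothetical header: 4-byte JSON length, 8-byte payload length (big-endian).
HEADER = struct.Struct(">IQ")


def encode(attributes: dict, payload: bytes) -> bytes:
    """Steps 1-2: serialise the attributes and prefix both lengths."""
    meta = json.dumps(attributes).encode("utf-8")
    return HEADER.pack(len(meta), len(payload)) + meta + payload


def decode(frame: bytes):
    """Steps 3-5: read the lengths, then split attributes and payload."""
    meta_len, payload_len = HEADER.unpack_from(frame, 0)
    start = HEADER.size
    meta = frame[start:start + meta_len]
    payload = frame[start + meta_len:start + meta_len + payload_len]
    if len(meta) != meta_len or len(payload) != payload_len:
        raise ValueError("truncated frame")
    return json.loads(meta.decode("utf-8")), payload
```

Because the receiver knows both lengths up front, the payload may contain any byte values, including the newline that a delimiter-based listener would split on.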

If you do not want to transfer attributes, you can configure a different
decoder. In this case you can just 'netcat' a binary file to NiFi.

The UDP version is far more complex. There must be a counter to tell the
destination what part of the flow file was received (even in a diode
environment packets are not received in the right order!). And you must be
fast, very fast. It is a multithreaded architecture because one thread cannot
receive, decode, and write a gigabit per second. I used the Disruptor library.
One thread receives a packet, another thread decodes it. A third thread
takes the decoded packet and writes the content in the right order to a flow.
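
The reordering step can be illustrated with a small single-threaded Python sketch. It leaves out the multithreaded Disruptor pipeline entirely and only shows how a per-packet counter lets the receiver write chunks in the right order; the function name and the (sequence_number, chunk) packet shape are assumptions for this example.

```python
def reassemble(packets, total):
    """Rebuild a flow from (sequence_number, chunk) pairs arriving in any order.

    'total' is the number of packets the sender announced for this flow.
    """
    pending = {}   # chunks that arrived ahead of their turn
    next_seq = 0   # next sequence number that may be written out
    ordered = []
    for seq, chunk in packets:
        pending[seq] = chunk
        # Flush every chunk that is now contiguous with what was written.
        while next_seq in pending:
            ordered.append(pending.pop(next_seq))
            next_seq += 1
    if next_seq != total:
        raise ValueError(f"missing packet {next_seq}")
    return b"".join(ordered)
```

Buffering only the out-of-order chunks keeps memory use low when packets arrive mostly in sequence, which matters at gigabit rates.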

I am still learning (and I am not a professional software developer). If I did
something wrong or overlooked something, please tell me.

Marc 




Re: Using nifi in separated networks

2021-08-02 Thread Adam Taft
Marc,

How would this differ from a more generic use of the existing processors,
PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is being
added above these existing processors, but I'm sure I'm missing something.

There's already an ability to serialize flowfiles via MergeContent. And
there's the deserialize side in UnpackContent. So a dataflow that looks
like the following would seem a reasonable approach to the problem:

MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent

I'm actually very interested in this topic, having a project that has a use
case for a "diode". So I'm legitimately asking here, not trying to derail
your work.

Thanks in advance,

Adam

On Sun, Aug 1, 2021 at 12:26 PM Marc  wrote:

> Greetings,
>
> There are companies and organizations that strictly separate their
> networks for security reasons. Such companies often use diodes to achieve
> this. But of course they still have to exchange data between the networks
> (e.g. transfer data from 'low' to 'high'). There are at least two kinds of
> diodes. Some hardware-based ones use only a single optical fiber to send data
> (UDP based). Others use TCP, but prevent sending in the reverse direction.
>
> NiFi is an amazing tool that allows data to be transferred between two
> separate networks in a very flexible but also secure way. I have
> implemented two processors. The first one 'merges' the attributes and the
> content of a flowfile and sends it to the destination. The second one
> listens on a TCP port, splits attributes and content, and creates a new
> flowfile containing all attributes of the original flow. You can send the
> flow without attributes as well. In this case you can easily netcat a
> binary file to NiFi.
>
> These two processors are useful if you do NOT have bidirectional
> communication between two NiFi instances and therefore the site-to-site
> mechanism or http(s) cannot be used.
>
> We have been using these processors for quite some time (specifically
> the version for 1.13.2) and would like to share these processors with
> others. So the question to you all is: Is someone interested in these
> processors or is this use case too special?
>
> The current source code can be found on GitHub
> (https://github.com/nerdfunk-net/diode/).
>
> I have also implemented a UDP based version of the processor. Due to the
> nature of UDP, this is more complex and these processors are now being
> tested.
>
> Best regards
> Marc


Re: Using nifi in separated networks

2021-08-02 Thread Marc
Hi,

No errors can be detected on the sender side (even the NIC will not detect
whether the other side is down). If a UDP packet is lost, the receiver side
will detect and log it. Some diodes use an archive, so if any transmission
is lost you can easily resend the data manually.

In practice, one proceeds as follows:
 - Know your hardware very (very) well. Know how many packets can be sent 
without loss (that is the most important point).
 - Only one process sends data, so that no overload (congestion) is 
generated.
 - Counters are used to detect packet loss (some diodes use something like ECC).
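
The counter idea in the last point can be sketched as follows. This is only
an illustration under assumptions: the 8-byte header layout and the names
make_datagram and LossDetector are hypothetical and do not describe any
particular diode or the NiFi processors.

```python
import struct

# Assumed header: one 8-byte big-endian sequence counter per datagram.
HEADER = struct.Struct("!Q")


def make_datagram(seq, payload):
    """Sender side: prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


class LossDetector:
    """Receiver side: record every counter value that was skipped."""

    def __init__(self):
        self.expected = 0
        self.missing = []

    def receive(self, datagram):
        (seq,) = HEADER.unpack_from(datagram)
        if seq > self.expected:
            # Each skipped counter value is a lost packet to log.
            self.missing.extend(range(self.expected, seq))
        self.expected = max(self.expected, seq + 1)
        return datagram[HEADER.size:]


if __name__ == "__main__":
    detector = LossDetector()
    for seq in (0, 1, 3, 4, 7):
        detector.receive(make_datagram(seq, b"data"))
    print("lost packets:", detector.missing)  # lost packets: [2, 5, 6]
```

The sender learns nothing in this scheme; only the receiver's log shows the
gaps, which matches the manual resend described above. The sketch does not
handle packets that arrive late or out of order.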

I know of diodes that have transferred millions of data items without any 
loss. It is not as difficult as it sounds. But of course there is always the 
possibility that a packet is lost without this being noticed immediately. It 
is always a tradeoff between security and convenience.

Using two diodes always means having two separate NiFi systems. We generate 
a unique ID that is always part of the flow (like the uuid). This ID is 
transferred across all systems and appears in all logs, so you can follow the 
flow across the network border.
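
As a rough sketch of this ID handling (the attribute name "diode.trace.id"
and both helper functions are hypothetical, not what we actually use): the ID
is assigned once on the sending side, travels with the attributes, and is
written into every log line on both sides.

```python
import uuid

# Hypothetical attribute name used only for this illustration.
TRACE_KEY = "diode.trace.id"


def with_trace_id(attributes):
    """Return a copy of the attributes that carries a trace ID.

    An existing ID is kept, so the value assigned on the sending
    side survives on the receiving side.
    """
    tagged = dict(attributes)
    tagged.setdefault(TRACE_KEY, str(uuid.uuid4()))
    return tagged


def log_line(side, event, attributes):
    """Format a log entry that always contains the trace ID."""
    return f"{side} {event} id={attributes[TRACE_KEY]}"


if __name__ == "__main__":
    low = with_trace_id({"filename": "report.bin"})
    high = with_trace_id(low)  # receiving side keeps the same ID
    print(log_line("low", "sent", low))
    print(log_line("high", "received", high))
```

Searching both logs for the same ID then shows one flow on either side of the
network border.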

We use NiFi to synchronize a lot of data across separated networks, and I 
don't know of any system that can do this better.

Regards
Marc


> On 01.08.2021 at 23:15, Phil H wrote:
> 
> That is interesting stuff - out of interest, if it was sent over that UDP
> diode, how would you know whether or not it got to the other side? I
> haven’t looked into the site-to-site functionality much yet but I assume it
> maintains the provenance info?
> 



Re: Using nifi in separated networks

2021-08-01 Thread Phil H
That is interesting stuff - out of interest, if it was sent over that UDP
diode, how would you know whether or not it got to the other side? I
haven’t looked into the site-to-site functionality much yet but I assume it
maintains the provenance info?
