Hello Lars,
You are correct that the WAL is different from swapping.
Swapping is used when a single connection queue grows very large. A chunk
of the FlowFiles is then swapped out of JVM memory and written to disk, where
it is stored until it is swapped back in for processing. The WAL is
almost solely for persisting information when a NiFi instance is stopped
for some reason (e.g. a restart or a hardware failure).
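
A minimal sketch of the swapping idea described above, written as a toy Python model rather than NiFi's actual code. The class name `SwappingQueue`, the threshold of 4, and the pickle-based chunk files are all assumptions made for illustration.

```python
import os
import pickle
import tempfile
from collections import deque


class SwappingQueue:
    """Toy connection queue that swaps chunks of records to disk (not NiFi code)."""

    def __init__(self, swap_threshold=4, swap_dir=None):
        # swap_threshold is a hypothetical value chosen for the demo.
        self.swap_threshold = swap_threshold
        self.swap_dir = swap_dir or tempfile.mkdtemp(prefix="swap-")
        self.active = deque()      # oldest records, held in memory
        self.swap_files = deque()  # paths of swapped-out chunks, oldest first
        self.overflow = []         # newest records, waiting to be swapped out
        self._next_chunk = 0

    def put(self, record):
        # Only append to the active queue if nothing newer is parked elsewhere;
        # otherwise FIFO order would be broken.
        if (not self.swap_files and not self.overflow
                and len(self.active) < self.swap_threshold):
            self.active.append(record)
            return
        self.overflow.append(record)
        if len(self.overflow) >= self.swap_threshold:
            self._swap_out()

    def _swap_out(self):
        # Write one chunk of records to disk and drop it from memory.
        path = os.path.join(self.swap_dir, f"chunk-{self._next_chunk}.swap")
        self._next_chunk += 1
        with open(path, "wb") as f:
            pickle.dump(self.overflow, f)
        self.swap_files.append(path)
        self.overflow = []

    def _swap_in(self):
        # Read the oldest chunk back into memory for processing.
        path = self.swap_files.popleft()
        with open(path, "rb") as f:
            self.active.extend(pickle.load(f))
        os.remove(path)

    def poll(self):
        if not self.active:
            if self.swap_files:
                self._swap_in()
            elif self.overflow:
                self.active.extend(self.overflow)
                self.overflow = []
        return self.active.popleft() if self.active else None

    def in_memory(self):
        return len(self.active) + len(self.overflow)
```

With a threshold of 4, putting ten records leaves six in memory and one chunk of four on disk; polling returns all ten in their original order.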
I am currently finishing a document that will explain these and
many other concepts used by the underlying system, so look out for that in
the relatively near future.
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]
 

    On Wednesday, February 17, 2016 6:48 PM, Lars Francke 
<[email protected]> wrote:
 

 Thanks a lot for confirming my suspicions.
One last clarification: The WAL is different from the swapping concept,
correct? I guess it's way faster to swap in a dedicated "dump" than to replay a
WAL.
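
For comparison, here is a minimal sketch of what replaying a write-ahead log involves, again as a toy model and not NiFi's implementation. The class `TinyWal`, the JSON-lines record format, and the `put`/`delete` operations are assumptions made for illustration. State is rebuilt by reading every logged record in order, so replay cost grows with the length of the log, whereas a swapped-out chunk is read back in a single pass.

```python
import json
import os
import tempfile


class TinyWal:
    """Toy write-ahead log: every change is appended before it counts as done."""

    def __init__(self, path):
        self.path = path

    def append(self, op, flowfile_id, attrs=None):
        record = {"op": op, "id": flowfile_id, "attrs": attrs}
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk

    def replay(self):
        # Rebuild the in-memory state by applying every record in order.
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "put":
                    state[record["id"]] = record["attrs"]
                elif record["op"] == "delete":
                    state.pop(record["id"], None)
        return state


def demo():
    path = os.path.join(tempfile.mkdtemp(prefix="wal-"), "flowfile.wal")
    wal = TinyWal(path)
    wal.append("put", "ff-1", {"filename": "a.json"})
    wal.append("put", "ff-2", {"filename": "b.json"})
    wal.append("delete", "ff-1")
    # Simulate a restart: a fresh object recovers state from the log alone.
    return TinyWal(path).replay()
```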
On Wed, Feb 17, 2016 at 7:53 PM, Joe Witt <[email protected]> wrote:

Lars,

You are right about the thought process.  We've never provided solid
guidance here but we should.  It is definitely the case that flow file
content is streamed to and from the underlying repository and the only
way to access it is through that API.  Thus well-behaved extensions
and the framework itself can handle data basically as large as the
underlying repository has space for.  The flow file attributes,
though, are held in memory in a map with each flowfile object.
So it is important to avoid having vast (undefined) quantities of
attributes or attributes with really large (undefined) values.
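
The contrast can be sketched in plain Python (a toy illustration, not the NiFi API). The function names and the `json.body` attribute key are hypothetical. Streamed content is processed one chunk at a time, so memory use is bounded by the chunk size, while an attribute map holds every value in memory in full.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=8192):
    """Hash content of any size while reading at most chunk_size bytes per call."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


def attribute_bytes(attributes):
    """Rough size of an attribute map: UTF-8 bytes of keys and values only."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in attributes.items()
    )


# Content: a 200 KB payload is consumed 4 KB at a time.
payload = b"x" * 200_000
content_hash, content_size = stream_digest(io.BytesIO(payload), chunk_size=4096)

# Attribute: the same payload stored under a hypothetical key stays fully in memory.
per_flowfile = attribute_bytes({"json.body": "x" * 200_000})
```

As a rough estimate under these assumptions, 10,000 queued flowfiles each carrying a 200 KB attribute would hold about 2 GB of attribute data in memory, not counting object overhead.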

There are things we can and should do to make even this relatively
transparent to the users, and it is actually why we support swapping
flowfiles to disk when there are large queues, because even those in-memory
attributes can really add up.

Thanks
Joe

On Wed, Feb 17, 2016 at 11:06 AM, Lars Francke <[email protected]> wrote:
> Hi and sorry for all these questions.
>
> I know that FlowFile content is persisted to the content_repository and can
> handle reasonably large amounts of data. Is the same true for attributes?
>
> I download JSON files (up to 200 KB I'd say) and I want to insert them as
> they are into a PostgreSQL JSONB column. I'd love to use the PutSQL
> processor for that but it requires parameters in attributes.
>
> I have a feeling that putting large objects in attributes is a bad idea?




  
