Martin,
I agree with Andrew on the point of a single virtual core not being much. I've
not really dealt with Google Compute Cloud personally, but on the AWS t1.micro
instances, which offer a similar VM, I don't expect much out of it.
That being said, let's look a bit deeper at some of the
1 vcore, which is not even a full core (a shared and oversubscribed CPU
core). I'm not sure what you expected to see when you raised concurrency to
10 :)
There are a lot of things NiFi is doing behind the scenes, especially around
provenance recording. I don't recommend anything below 4 cores to
Thanks Andrew,
I have added UpdateAttribute processors to update the file names like you
suggested. Now it works, writing out 1MB files at a time (I updated the
MergeContent MaxNumberOfEntries to 1 to achieve that, since each line in
my CSV is 100 bytes).
The current flow is:
ListHDFS -> FetchHDFS
It looks like your max bin size is 1000 entries and 10MB. Every time you hit
one of those limits, it will write out a merged file. Update the filename
attribute to be unique before writing via PutHDFS.
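One common way to do that in UpdateAttribute is to append a UUID to the existing filename with NiFi Expression Language (a sketch; adapt the naming pattern to your flow):

```
filename  =>  ${filename}-${UUID()}
```

Since `${UUID()}` generates a fresh value per FlowFile, each merged file gets a distinct name and PutHDFS won't collide on existing paths.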
Andrew
On Thu, Jun 1, 2017, 2:24 AM Martin Eden wrote:
> Hi Joe,
>
> Thanks for the
Hi Joe,
Thanks for the explanations. Really useful in understanding how it works.
Good to know that in the future this will be improved.
About the appending to HDFS issue let me recap. My flow is:
ListHDFS -> FetchHDFS -> UnpackContent -> SplitText(5000) -> SplitText(1)
-> RouteOnContent ->
Split failed before even with backpressure:
- yes that backpressure kicks in when destination queues for a given
processor have reached their target size (in count of flowfiles or
total size represented). However, to clarify why the OOM happened it
is important to realize that it is not about
Hi Koji,
Good to know that it can handle large files. I thought that was the case but
I was just not seeing it in practice.
Yes I am using 'Line Split Count' as 1 at SplitText.
I added the extra SplitText processor exactly as you suggested and the OOM
went away. So, big thanks!!!
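For anyone following along, the reason the two-stage split helps can be sketched in plain Python (purely illustrative, not NiFi code): splitting a huge file straight into single lines materializes millions of items at once, while splitting into 5000-line chunks first, then splitting each chunk into lines, bounds how many items exist at any moment.

```python
def split_chunks(lines, chunk_size=5000):
    """First stage: yield chunks of at most chunk_size lines."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def split_lines(chunks):
    """Second stage: split one chunk at a time into single lines."""
    for chunk in chunks:
        for line in chunk:
            yield line

# Both stages are lazy generators, so only ~chunk_size items are
# held in memory at once, regardless of the total input size.
lines = (f"line-{i}" for i in range(20_000))
out = list(split_lines(split_chunks(lines, chunk_size=5000)))
print(len(out))  # 20000
```

The analogy to the flow is loose (NiFi queues FlowFiles rather than streaming generators), but the memory-bounding idea is the same.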
However I have 2
Hi Martin,
Generally, a NiFi processor doesn't load the entire content of a file, so it
is capable of handling huge files.
However, having a massive number of FlowFiles can cause OOM issues, as
FlowFiles and their Attributes reside on the heap.
I assume you are using 'Line Split Count' as 1 at SplitText.
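A rough back-of-the-envelope sketch of why a per-line split can blow the heap, using the numbers from this thread (the ~1 KB per-FlowFile heap overhead is an illustrative assumption, not a measured NiFi figure):

```python
# Illustrative estimate only: the per-FlowFile heap cost is assumed,
# not taken from NiFi internals.
uncompressed_bytes = int(2.5 * 1024**3)  # ~2.5 GB uncompressed input
line_bytes = 100                         # 100-byte CSV lines
per_flowfile_heap = 1024                 # assumed heap per FlowFile + attrs

flowfiles = uncompressed_bytes // line_bytes
heap_needed = flowfiles * per_flowfile_heap
print(f"{flowfiles:,} FlowFiles, ~{heap_needed / 1024**3:.1f} GB of heap")
```

Even if the real per-FlowFile overhead is much smaller than the assumed 1 KB, tens of millions of simultaneous FlowFiles will dwarf a 1 GB heap.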
We
Hi all,
I have a vanilla NiFi 1.2.0 node with 1GB of heap.
The flow I am trying to run is:
ListHDFS -> FetchHDFS -> SplitText -> RouteOnContent -> MergeContent ->
PutHDFS
When I give it a 300MB input zip file (2.5GB uncompressed) I am getting
Java OutOfMemoryError as below.
Does NiFi read in