Re: NiFi ram usage

2017-08-31 Thread Adam Lamar
Hi All,

Thanks for the messages, and especially for those doc links, Andy! I
think it would definitely be an improvement to add some memory requirements
to the docs, even just a ballpark minimum figure. NiFi on my Mac uses about
1GB at startup too (with no flows).

Mike, Jeff,

On a fresh Linux instance with 1GB of memory, `free -m` reports about 650MB
completely free and about 800MB available. No databases or other services
are running, since the instance is completely new at that point. So while
there is some OS overhead, there is a decent amount available to work with too.

Although the JVM heap size is set to 512MB, the JVM still uses about 1.1GB
of OS memory at startup. In addition to the heap, the JVM needs memory for
loaded classes, thread stacks, and other native allocations, so a Java
process always uses more than the configured heap size alone, and all of
NiFi is just a lot of code :)
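
As a rough illustration of the heap vs. total memory point, here is a
minimal Python sketch that pulls the resident set size out of the text of
/proc/<pid>/status (Linux only) and subtracts the configured heap. The
function names and the sample text are hypothetical; the 1100MB value just
mirrors the ~1.1GB figure above.

```python
import re


def parse_vm_rss_mb(status_text: str) -> float:
    """Return the resident set size in MB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1)) / 1024


def non_heap_mb(rss_mb: float, heap_max_mb: float) -> float:
    """Lower bound on resident memory that cannot be heap (classes, stacks, native)."""
    return max(0.0, rss_mb - heap_max_mb)


# Hypothetical sample; on a real system read /proc/<nifi-pid>/status instead.
SAMPLE_STATUS = "Name:\tjava\nVmRSS:\t 1126400 kB\nThreads:\t60\n"

rss = parse_vm_rss_mb(SAMPLE_STATUS)
print(f"RSS {rss:.0f}MB, at least {non_heap_mb(rss, 512):.0f}MB outside the heap")
```

The subtraction is only a lower bound: the heap may not be fully committed,
in which case even more of the resident memory is non-heap.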

The good news: I did a little experiment by removing all the NARs in the
lib/ directory except for these:

- nifi-framework
- nifi-jetty-bundle
- nifi-provenance-repository
- nifi-standard-nar
- nifi-standard-services
- nifi-aws-nar (since I was using aws processors)
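
For anyone who wants to repeat this, a minimal Python sketch of picking
which NAR files would be removed is shown here. It assumes the names above
are filename prefixes; the helper name and the versioned filenames in the
example are hypothetical, and the function only returns a list rather than
deleting anything.

```python
from typing import Iterable, List, Tuple

# Prefixes taken from the list above.
KEEP_PREFIXES: Tuple[str, ...] = (
    "nifi-framework",
    "nifi-jetty-bundle",
    "nifi-provenance-repository",
    "nifi-standard-nar",
    "nifi-standard-services",
    "nifi-aws-nar",
)


def nars_to_remove(filenames: Iterable[str],
                   keep_prefixes: Tuple[str, ...] = KEEP_PREFIXES) -> List[str]:
    """Return the .nar files that do not start with any kept prefix."""
    return sorted(
        name for name in filenames
        if name.endswith(".nar") and not name.startswith(keep_prefixes)
    )


# Hypothetical lib/ listing; real filenames depend on the NiFi version.
example = [
    "nifi-framework-nar-1.3.0.nar",
    "nifi-kafka-0-10-nar-1.3.0.nar",
    "nifi-aws-nar-1.3.0.nar",
    "nifi-api-1.3.0.jar",
]
print(nars_to_remove(example))
```

Non-NAR files such as plain jars are left alone, and it is worth backing up
lib/ before removing anything.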

I also changed bootstrap.conf to use `java.arg.2=-Xms100m` to avoid NiFi
gobbling up too much RAM initially.
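
A minimal Python sketch of making that change to the text of bootstrap.conf
follows. The helper name is hypothetical, and the sample content (including
the `java.arg.3=-Xmx512m` line) is an assumption about what the stock file
looks like, so check your own copy before relying on it.

```python
import re


def set_java_arg(conf_text: str, index: int, value: str) -> str:
    """Replace the java.arg.<index> line, or append it if it is missing."""
    pattern = re.compile(rf"^java\.arg\.{index}=.*$", re.MULTILINE)
    new_line = f"java.arg.{index}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids any backslash handling in the value.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


# Assumed default heap lines; not copied from an actual bootstrap.conf.
SAMPLE_CONF = "java.arg.2=-Xms512m\njava.arg.3=-Xmx512m\n"

print(set_java_arg(SAMPLE_CONF, 2, "-Xms100m"))
```

Only the initial heap size changes here; the maximum heap line is left as it
was.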

Not only did NiFi start, but after moving 100GB of content, NiFi was only
using about 575MB of total OS memory! That is about half of what I had seen
previously, and NiFi had no trouble operating on the 1GB Linux instance.

Hopefully this is useful to someone else who wants to run NiFi in a more
constrained environment (and save a few dollars on cloud resources too).

Thanks all for the discussion.

Adam


On Thu, Aug 31, 2017 at 10:20 AM, Andy LoPresto wrote:

> Adam,
>
> Hopefully the team is able to get your specific issue resolved here, but
> to answer a question you asked that I think may have been missed, we do
> have “System Requirements” [1] and “Configuration Best Practices” [2]
> (which is really “Additional Requirements” now) both documented. Neither
> give an explicit value for memory, so that’s an opportunity for us to
> improve the documentation.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#system-requirements
> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Aug 31, 2017, at 5:38 AM, Jeff wrote:
>
> Adam,
>
> Mike brings up a good point...  When your VM has started, and you haven't
> started NiFi yet, how much memory is free in the system?  An instance of
> NiFi with an empty flow should have no trouble running in 512mb of heap
> space.  I have a flow with a few processors on it and the heap usage
> averages around 250mb for me, a default bootstrap.conf.
>
> On Thu, Aug 31, 2017 at 7:59 AM Pierre Villard <
> pierre.villard...@gmail.com> wrote:
>
>> As Jeff, I'm a bit surprised by what you are experiencing. I've never
>> changed the default values of 512MB when working with NiFi on my laptop and
>> never hit OOM errors. Are you sure that 1GB is available on the VM before
>> starting NiFi?
>>
>> Pierre
>>
>> 2017-08-31 13:52 GMT+02:00 Mike Thomsen:
>>
>>> Adam,
>>>
>>> I cannot say exactly why the default settings won't work for you on a
>>> clean installation, but it likely has to do with how small the VM is. The
>>> OS overhead alone is probably a few hundred MB of RAM. If you have anything
>>> else running, even just MySQL or MongoDB it's entirely possible that you
>>> actually don't have enough memory to give even 512MB to NiFi.
>>>
>>> My recommendation would be 4GB of RAM for the VM with Xms1G and Xmx2G
>>> for the heap sizes. That's very reasonable for experimenting with something
>>> like NiFi. The ram usage is very difficult to calculate in advance because
>>> it's based entirely on what you're doing with NiFi.
>>>
>>> Mike
>>>
>>> On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar wrote:
>>>
>>>> Jeff,
>>>>
>>>> This was a new installation so I actually hadn't set up any flows yet.
>>>> NiFi wouldn't start immediately after installation (before I could
>>>> configure any flows) because the system had too little ram. The 1.1GB
>>>> figure is private (RSS) memory usage, which exceeded the 1GB instance limit
>>>> (and the instance had no swap configured).
>>>>
>>>> Is there any system requirements documentation? I couldn't find any
>>>> docs on minimum system specs, so I guess I'm wondering if the ram usage is
>>>> known and expected, and if there are any ways to get the ram usage down.
>>>>
>>>> Thanks in advance,
>>>> Adam


Re: NiFi ram usage

2017-08-31 Thread Andy LoPresto
Adam,

Hopefully the team is able to get your specific issue resolved here, but to 
answer a question you asked that I think may have been missed, we do have 
“System Requirements” [1] and “Configuration Best Practices” [2] (which is 
really “Additional Requirements” now) both documented. Neither give an explicit 
value for memory, so that’s an opportunity for us to improve the documentation.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#system-requirements
[2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 31, 2017, at 5:38 AM, Jeff wrote:
> 
> Adam,
> 
> Mike brings up a good point...  When your VM has started, and you haven't 
> started NiFi yet, how much memory is free in the system?  An instance of NiFi 
> with an empty flow should have no trouble running in 512mb of heap space.  I 
> have a flow with a few processors on it and the heap usage averages around 
> 250mb for me, a default bootstrap.conf.
> 
> On Thu, Aug 31, 2017 at 7:59 AM Pierre Villard wrote:
> As Jeff, I'm a bit surprised by what you are experiencing. I've never changed 
> the default values of 512MB when working with NiFi on my laptop and never hit 
> OOM errors. Are you sure that 1GB is available on the VM before starting NiFi?
> 
> Pierre
> 
> 2017-08-31 13:52 GMT+02:00 Mike Thomsen:
> Adam,
> 
> I cannot say exactly why the default settings won't work for you on a clean 
> installation, but it likely has to do with how small the VM is. The OS 
> overhead alone is probably a few hundred MB of RAM. If you have anything else 
> running, even just MySQL or MongoDB it's entirely possible that you actually 
> don't have enough memory to give even 512MB to NiFi.
> 
> My recommendation would be 4GB of RAM for the VM with Xms1G and Xmx2G for the 
> heap sizes. That's very reasonable for experimenting with something like 
> NiFi. The ram usage is very difficult to calculate in advance because it's 
> based entirely on what you're doing with NiFi.
> 
> Mike
> 
> On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar wrote:
> Jeff,
> 
> This was a new installation so I actually hadn't set up any flows yet. NiFi 
> wouldn't start immediately after installation (before I could configure any 
> flows) because the system had too little ram. The 1.1GB figure is private 
> (RSS) memory usage, which exceeded the 1GB instance limit (and the instance 
> had no swap configured).
> 
> Is there any system requirements documentation? I couldn't find any docs on 
> minimum system specs, so I guess I'm wondering if the ram usage is known and 
> expected, and if there are any ways to get the ram usage down.
> 
> Thanks in advance,
> Adam


Re: JSON array chunking

2017-08-31 Thread Bryan Bende
Neil,

I'm a little confused as to what format your initial data is in... You
showed an example payload as JSON, but then mentioned using an
AvroReader, so it wasn't clear to me if your starting point is JSON or
Avro.

Assuming it is JSON, I put together a template that shows how to split
your sample data:

https://gist.github.com/bbende/f73d06c0d35ed1aeb2603a8f87276ed7

I used the second schema you have (the one where the top-level element
is a record) and then SplitRecord with a JsonTreeReader and
JsonRecordSetWriter.
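
To make the schema point concrete: the "Not a record" error quoted below
shows up because the inferred schema has an array at the top level, while
the record writer appears to want a record there. A minimal Python sketch of
unwrapping such a schema follows. The function name is hypothetical, the
schema literal is abbreviated from Neil's message, and this only illustrates
the idea; it is not what NiFi does internally.

```python
import json


def record_schema_from(schema: dict) -> dict:
    """Return a record schema, unwrapping a top-level array of records if needed."""
    if schema.get("type") == "record":
        return schema
    items = schema.get("items")
    if (schema.get("type") == "array"
            and isinstance(items, dict)
            and items.get("type") == "record"):
        return items
    raise ValueError("schema is neither a record nor an array of records")


# Abbreviated version of the inferred schema from the original message.
inferred = {
    "type": "array",
    "items": {
        "type": "record",
        "name": "root",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "date_modified", "type": "long"},
        ],
    },
}

print(json.dumps(record_schema_from(inferred), indent=2))
```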

The sample data I sent in was your example data, and it produced two
flow files coming out of SplitRecord, one for each element of the
array.
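
For reference, the effect of the split can be sketched outside NiFi in a few
lines of Python. This is only an illustration of chunking a JSON array, not
SplitRecord's implementation; the function name is hypothetical, and writing
each chunk back out as a JSON array is an assumption about the writer's
output. The two ids come from the sample payload.

```python
import json
from typing import Iterator


def split_records(json_text: str, records_per_split: int) -> Iterator[str]:
    """Yield JSON arrays holding at most records_per_split elements each."""
    if records_per_split < 1:
        raise ValueError("records_per_split must be at least 1")
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    for start in range(0, len(records), records_per_split):
        yield json.dumps(records[start:start + records_per_split])


payload = json.dumps([
    {"id": "56740f4b-48de-0502-afdc-59a463b3f6dc", "deleted": 0},
    {"id": "1ac80e25-7f28-f5c6-bac0-59a4636ef31f", "deleted": 0},
])

for chunk in split_records(payload, 1):
    print(chunk)
```

With one record per split, the two-element payload yields two chunks, which
matches the two flow files described above.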

Let us know if this is not what you are trying to do.

Thanks,

Bryan


On Wed, Aug 30, 2017 at 8:31 PM, Neil Derraugh
 wrote:
> I should have mentioned I tried starting with a JsonPathReader before the
> AvroReader.  I had a property I was calling root with a value of $.  I can
> post details about that too if it would be helpful.
>
> On Wed, Aug 30, 2017 at 8:08 PM, Neil Derraugh
>  wrote:
>>
>> I have arbitrary JSON arrays that I want to split into chunks.  I've been
>> (unsuccessfully) trying to figure this out with InferAvroSchema ->
>> SplitRecord(AvroReader, JsonRecordSetWriter).
>>
>> Here's an example payload:
>> [
>>   {
>>     "id": "56740f4b-48de-0502-afdc-59a463b3f6dc",
>>     "account_id": "b0dad7e2-7bb9-4ca9-b9fd-134870656eb2",
>>     "contact_id": "a0ebd53a-77c5-e2ea-4787-59a463053b1b",
>>     "date_modified": 1503959931000,
>>     "deleted": 0
>>   },
>>   {
>>     "id": "1ac80e25-7f28-f5c6-bac0-59a4636ef31f",
>>     "account_id": "71d4904e-f8f1-4209-bff9-4d080057ea84",
>>     "contact_id": "e429bfe6-9c89-8b81-9ee6-59a463fc7fd8",
>>     "date_modified": 1503959873000,
>>     "deleted": 0
>>   }
>> ]
>>
>> Here's the schema that gets inferred (the AvroReader's Avro Record Name is
>> "root"):
>> {
>>   "type": "array",
>>   "items": {
>>     "type": "record",
>>     "name": "root",
>>     "fields": [
>>       {
>>         "name": "id",
>>         "type": "string",
>>         "doc": "Type inferred from '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"
>>       },
>>       {
>>         "name": "account_id",
>>         "type": "string",
>>         "doc": "Type inferred from '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"
>>       },
>>       {
>>         "name": "contact_id",
>>         "type": "string",
>>         "doc": "Type inferred from '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"
>>       },
>>       {
>>         "name": "date_modified",
>>         "type": "long",
>>         "doc": "Type inferred from '1503959931000'"
>>       },
>>       {
>>         "name": "deleted",
>>         "type": "int",
>>         "doc": "Type inferred from '0'"
>>       }
>>     ]
>>   }
>> }
>>
>> When I use ${inferred.avro.schema} for both the AvroReader and the
>> JsonRecordSetWriter I get:
>> SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5] Failed to create
>> Record Writer for
>> StandardFlowFileRecord[uuid=45d7a0d2-258a-4f40-b5f9-4886eb2c2a76,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1504118228480-325,
>> container=default, section=325], offset=0,
>> length=86462199],offset=0,name=accounts-contacts.json.avro,size=86462199];
>> routing to failure: org.apache.nifi.schema.access.SchemaNotFoundException:
>> org.apache.avro.AvroRuntimeException: Not a record:
>> {"type":"array","items":{"type":"record","name":"root","fields":[{"name":"id","type":"string","doc":"Type
>> inferred from
>> '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"account_id","type":"string","doc":"Type
>> inferred from
>> '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"contact_id","type":"string","doc":"Type
>> inferred from
>> '\"a0ebd53a-77c5-e2ea-4787-59a463053b1b\"'"},{"name":"date_modified","type":"long","doc":"Type
>> inferred from '1503959931000'"},{"name":"deleted","type":"int","doc":"Type
>> inferred from '0'"}]}}.
>>
>> The stack trace:
>> 2017-08-30 19:42:21,692 ERROR [Timer-Driven Process Thread-9]
>> o.a.nifi.processors.standard.SplitRecord
>> SplitRecord[id=b3453515-caaa-1e1f-8bb6-26dec275a0d5] Failed to create Record
>> Writer for
>> StandardFlowFileRecord[uuid=a5f720cf-98a8-4c29-bd91-098c7f25448d,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1504121074997-336,
>> container=default, section=336], offset=1013917,
>> length=454],offset=0,name=626851422080935,size=454]; routing to failure:
>> org.apache.nifi.schema.access.SchemaNotFoundException:
>> org.apache.avro.AvroRuntimeException: Not a record:
>> {"type":"array","items":{"type":"record","name":"root","fields":[{"name":"id","type":"string","doc":"Type
>> inferred from
>> '\"56740f4b-48de-0502-afdc-59a463b3f6dc\"'"},{"name":"account_id","type":"string","doc":"Type
>> inferred from
>> '\"b0dad7e2-7bb9-4ca9-b9fd-134870656eb2\"'"},{"name":"contact_id","type":"string","doc":"Type
>> inferred from

Re: DBCPConnectionPool SqlServer and Kerberos

2017-08-31 Thread Bryan Bende
Hello,

As far as I know there is no specific Kerberos logic in the DBCPConnectionPool.

I would assume something would have to be built into the controller
service like the HiveConnectionPool, unless the driver does something
for you.

That said, I am not that familiar with using Kerberos with relational databases.

-Bryan


On Wed, Aug 30, 2017 at 2:42 PM, Noe Detore  wrote:
> Hello
>
> Does anyone have experience or know if DBCPConnectionPool using SqlServer
> can be configured to authenticate with Kerberos?
>
> Thanks
> Noe


Re: NiFi ram usage

2017-08-31 Thread Jeff
Adam,

Mike brings up a good point...  When your VM has started, and you haven't
started NiFi yet, how much memory is free in the system?  An instance of
NiFi with an empty flow should have no trouble running in 512MB of heap
space.  I have a flow with a few processors on it and the heap usage
averages around 250MB for me, with a default bootstrap.conf.

On Thu, Aug 31, 2017 at 7:59 AM Pierre Villard wrote:

> As Jeff, I'm a bit surprised by what you are experiencing. I've never
> changed the default values of 512MB when working with NiFi on my laptop and
> never hit OOM errors. Are you sure that 1GB is available on the VM before
> starting NiFi?
>
> Pierre
>
> 2017-08-31 13:52 GMT+02:00 Mike Thomsen :
>
>> Adam,
>>
>> I cannot say exactly why the default settings won't work for you on a
>> clean installation, but it likely has to do with how small the VM is. The
>> OS overhead alone is probably a few hundred MB of RAM. If you have anything
>> else running, even just MySQL or MongoDB it's entirely possible that you
>> actually don't have enough memory to give even 512MB to NiFi.
>>
>> My recommendation would be 4GB of RAM for the VM with Xms1G and Xmx2G for
>> the heap sizes. That's very reasonable for experimenting with something
>> like NiFi. The ram usage is very difficult to calculate in advance because
>> it's based entirely on what you're doing with NiFi.
>>
>> Mike
>>
>> On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar 
>> wrote:
>>
>>> Jeff,
>>>
>>> This was a new installation so I actually hadn't set up any flows yet.
>>> NiFi wouldn't start immediately after installation (before I could
>>> configure any flows) because the system had too little ram. The 1.1GB
>>> figure is private (RSS) memory usage, which exceeded the 1GB instance limit
>>> (and the instance had no swap configured).
>>>
>>> Is there any system requirements documentation? I couldn't find any docs
>>> on minimum system specs, so I guess I'm wondering if the ram usage is known
>>> and expected, and if there are any ways to get the ram usage down.
>>>
>>> Thanks in advance,
>>> Adam


Re: NiFi ram usage

2017-08-31 Thread Pierre Villard
Like Jeff, I'm a bit surprised by what you are experiencing. I've never
changed the default value of 512MB when working with NiFi on my laptop and
have never hit OOM errors. Are you sure that 1GB is available on the VM
before starting NiFi?

Pierre

2017-08-31 13:52 GMT+02:00 Mike Thomsen:

> Adam,
>
> I cannot say exactly why the default settings won't work for you on a
> clean installation, but it likely has to do with how small the VM is. The
> OS overhead alone is probably a few hundred MB of RAM. If you have anything
> else running, even just MySQL or MongoDB it's entirely possible that you
> actually don't have enough memory to give even 512MB to NiFi.
>
> My recommendation would be 4GB of RAM for the VM with Xms1G and Xmx2G for
> the heap sizes. That's very reasonable for experimenting with something
> like NiFi. The ram usage is very difficult to calculate in advance because
> it's based entirely on what you're doing with NiFi.
>
> Mike
>
> On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar  wrote:
>
>> Jeff,
>>
>> This was a new installation so I actually hadn't set up any flows yet.
>> NiFi wouldn't start immediately after installation (before I could
>> configure any flows) because the system had too little ram. The 1.1GB
>> figure is private (RSS) memory usage, which exceeded the 1GB instance limit
>> (and the instance had no swap configured).
>>
>> Is there any system requirements documentation? I couldn't find any docs
>> on minimum system specs, so I guess I'm wondering if the ram usage is known
>> and expected, and if there are any ways to get the ram usage down.
>>
>> Thanks in advance,
>> Adam


Re: NiFi ram usage

2017-08-31 Thread Mike Thomsen
Adam,

I cannot say exactly why the default settings won't work for you on a clean
installation, but it likely has to do with how small the VM is. The OS
overhead alone is probably a few hundred MB of RAM. If you have anything
else running, even just MySQL or MongoDB, it's entirely possible that you
don't actually have enough memory to give even 512MB to NiFi.

My recommendation would be 4GB of RAM for the VM with -Xms1G and -Xmx2G for
the heap sizes. That's very reasonable for experimenting with something
like NiFi. The RAM usage is very difficult to calculate in advance because
it depends entirely on what you're doing with NiFi.
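
As a quick sanity check on numbers like these, here is a minimal Python
sketch that converts JVM-style size strings to bytes and reports what share
of the VM's RAM the maximum heap would take. The function names are
hypothetical, and the sketch says nothing about how much non-heap memory the
JVM will need on top.

```python
import re

# Binary multipliers for the k/m/g suffixes used in JVM size options.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(text: str) -> int:
    """Convert a size such as '512m' or '2G' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def heap_fraction(xmx: str, vm_ram: str) -> float:
    """Fraction of the VM's RAM taken by the maximum heap size."""
    return parse_jvm_size(xmx) / parse_jvm_size(vm_ram)


print(f"Xmx2G on a 4G VM uses {heap_fraction('2G', '4G'):.0%} of RAM for heap")
```

A 2G maximum heap on a 4GB VM leaves half of the RAM for the OS and for the
JVM's non-heap memory.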

Mike

On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar  wrote:

> Jeff,
>
> This was a new installation so I actually hadn't set up any flows yet.
> NiFi wouldn't start immediately after installation (before I could
> configure any flows) because the system had too little ram. The 1.1GB
> figure is private (RSS) memory usage, which exceeded the 1GB instance limit
> (and the instance had no swap configured).
>
> Is there any system requirements documentation? I couldn't find any docs
> on minimum system specs, so I guess I'm wondering if the ram usage is known
> and expected, and if there are any ways to get the ram usage down.
>
> Thanks in advance,
> Adam