Re: QueryRecord and NULLs

2019-03-07 Thread Koji Kawamura
Using NULLIF can be a workaround. I was able to populate new columns with null.

SELECT
*
,NULLIF(5, 5) as unit_cerner_alias
,NULLIF(5, 5) as room_cerner_alias
,NULLIF(5, 5) as bed_cerner_alias
FROM FLOWFILE
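
A typed NULL may also work, if QueryRecord needs an explicit type to plan the
query. This is an assumption based on Calcite's general CAST support, not
something verified in this thread; the column names are taken from Boris's example.

SELECT
*
-- CAST(NULL AS VARCHAR) produces a typed NULL; whether the record writer
-- then emits a proper null field was not verified here.
,CAST(NULL AS VARCHAR) as unit_cerner_alias
,CAST(NULL AS VARCHAR) as room_cerner_alias
,CAST(NULL AS VARCHAR) as bed_cerner_alias
FROM FLOWFILE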

On Fri, Mar 8, 2019 at 7:57 AM Boris Tyukin  wrote:
>
> I have been struggling for an hour now with a very simple thing.
>
> I need to add 3 new fields to a record and set them to NULL, but it does not
> work.
>
> I tried null instead - same thing. I checked Calcite docs and I do not see 
> anything special about NULL. And I know you can do it in SQL.
>
> This works:
>
> SELECT
> *
> ,'' as unit_cerner_alias
> ,'' as room_cerner_alias
> ,'' as bed_cerner_alias
> FROM FLOWFILE
>
> But this does not:
>
> SELECT
> *
> ,NULL as unit_cerner_alias
> ,NULL as room_cerner_alias
> ,NULL as bed_cerner_alias
> FROM FLOWFILE
>
> Then I use the LookupRecord processor to populate them or leave them as NULL.


Re: Convert Avro to ORC or JSON processor - retaining the data type

2019-03-07 Thread Koji Kawamura
Hi Ravi,

I looked at the following links; Hive does support some logical types like
timestamp-millis, but I'm not sure if decimal is supported.
https://issues.apache.org/jira/browse/HIVE-8131
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-AvrotoHivetypeconversion

If treating the number as a String works in your use case, then I'd
recommend disabling "Use Avro Logical Types" on ExecuteSQL.
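
If the values are always whole numbers, another possible sketch (an assumption
on my part, not something discussed above) is to cast the column to text in the
ExecuteSQL query itself, so the Avro field is written as a plain string instead
of decimal-logical-type bytes. The column name is from Ravi's message; the
table name and TO_CHAR (Oracle syntax) are placeholders.

-- Hypothetical Oracle-side cast, not from the thread:
SELECT
TO_CHAR(cpyKey) AS cpyKey
-- , other columns ...
FROM source_table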

Thanks,
Koji

On Fri, Mar 8, 2019 at 4:48 AM Ravi Papisetti (rpapiset)
 wrote:
>
> Hi,
>
> NiFi version 1.7
>
> We have a dataflow that gets data from an Oracle database and loads it into
> Hive tables.
>
> The data flow is something like below:
>
> GenerateTableFetch > ExecuteSQL > ConvertAvroToJSON/ORC (we tried both) > PutHDFS >
> ListHDFS > ReplaceText (to build the load data query from the file) > PutHiveQL.
>
> Data at the source (e.g. column "cpyKey" NUMBER) in Number/INT format is being
> written as
>
> {"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"cpyKey","type":["null",{"type":"bytes","logicalType":"decimal","precision":10,"scale":0}]}
>
> When this is inserted into a Hive table, whether the data is loaded from the ORC
> (ConvertAvroToORC) file or the JSON (ConvertAvroToJSON) file, querying the data
> from Hive throws a parsing exception with incompatible data types.
>
> Error: java.io.IOException: java.lang.RuntimeException: ORC split generation
> failed with exception: java.lang.IllegalArgumentException: ORC does not
> support type conversion from file type binary (1) to reader type bigint (1)
> (state=,code=0)
>
> Appreciate any help on this.
>
> Thanks,
>
> Ravi Papisetti


Re: Errors when attempting to use timestamp-millis fields with QueryRecord

2019-03-07 Thread Koji Kawamura
Hello,

I believe this is a known issue. Unfortunately, querying against a
timestamp column is not supported.
https://issues.apache.org/jira/browse/NIFI-5888

I'm working on fixing this in the Calcite project, the SQL execution
engine underneath QueryRecord.
https://issues.apache.org/jira/browse/CALCITE-1703
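
Until that is fixed, one possible workaround (an assumption, not verified in
this thread) is to keep the timestamp out of the SQL type system entirely,
e.g. by having the record reader schema declare dt as a plain long and
comparing epoch milliseconds directly:

-- Sketch only: assumes the reader schema uses "long" for dt rather than
-- timestamp-millis, so Calcite never sees a java.sql.Timestamp value.
-- 441763200000 is 1984-01-01 00:00:00 UTC in epoch milliseconds.
SELECT * FROM FLOWFILE
WHERE dt > 441763200000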

Thanks,
Koji

On Thu, Mar 7, 2019 at 11:11 PM Edward George  wrote:
>
> I have some input Avro with some fields using the timestamp-millis
> logicalType. I've been attempting to use them with QueryRecord to filter, or
> otherwise operate on the fields, using timestamp operations, and I get errors.
>
> For instance the following SQL queries:
>
> SELECT * FROM FLOWFILE WHERE dt > TIMESTAMP '1984-01-01 00:00:00'
>
> SELECT * FROM FLOWFILE WHERE CAST(dt AS TIMESTAMP) > TIMESTAMP '1984-01-01 
> 00:00:00'
>
> SELECT YEAR(dt) FROM FLOWFILE
>
> All fail with the following error:
>
>  java.lang.RuntimeException: Cannot convert 2019-02-19 01:01:01.0 to long
>
> Where the date `2019-02-19 01:01:01` is from the first row in the flowfile.
>
> Is this a bug with the implementation of QueryRecord or is there something 
> wrong with my queries / expectations here?
>
> Tested on nifi v1.9.0 using the official docker image.
>
> If I instead try the following SQL:
>
> SELECT * FROM FLOWFILE WHERE dt > 1
>
> I can see that the timestamp-millis column is represented as a 
> java.sql.Timestamp object:
>
>  org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '>' to 
> arguments of type ' > '.
>
> This was reproduced using this test avro file:
>
> $ avro-utils getmeta nifi-data/out2.avro
> avro.codec  deflate
> avro.schema {"type": "record", "name": "x", "fields": [{"name": "dt", 
> "type": {"logicalType": "timestamp-millis", "type": "long"}}, {"name": "v", 
> "type": "long"}], "__fastavro_parsed": true}
>
> $ avro-utils tojson nifi-data/out2.avro
> {"dt":1550538061000,"v":1}
> {"dt":-220894171,"v":2}
> {"dt":323687349000,"v":3}
>
> $ base64 nifi-data/out2.avro
> T2JqAQQUYXZyby5jb2RlYw5kZWZsYXRlFmF2cm8uc2NoZW1h5AJ7InR5cGUiOiAicmVjb3JkIiwg
> Im5hbWUiOiAieCIsICJmaWVsZHMiOiBbeyJuYW1lIjogImR0IiwgInR5cGUiOiB7ImxvZ2ljYWxU
> eXBlIjogInRpbWVzdGFtcC1taWxsaXMiLCAidHlwZSI6ICJsb25nIn19LCB7Im5hbWUiOiAidiIs
> ICJ0eXBlIjogImxvbmcifV0sICJfX2Zhc3RhdnJvX3BhcnNlZCI6IHRydWV9AHN0YWNraHV0bnN0
> YWNrMTUGPAEWAOn/kNOptKBaAt/q/PLJgAEEkOyn1OsSBpxyDHN0YWNraHV0bnN0YWNrMTU=
>
> And more context for the stacktrace for the error above is:
> nifi_1_63870dd343fd   | java.lang.RuntimeException: Cannot convert 
> 2019-02-19 01:01:01.0 to long
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.runtime.SqlFunctions.toLong(SqlFunctions.java:1616)
> nifi_1_63870dd343fd   | at Baz$1$1.moveNext(Unknown Source)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.(Linq4j.java:676)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.linq4j.Linq4j.enumeratorIterator(Linq4j.java:96)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:90)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:184)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:64)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:43)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:573)
> nifi_1_63870dd343fd   | at 
> org.apache.calcite.avatica.AvaticaPreparedStatement.executeQuery(AvaticaPreparedStatement.java:137)
> nifi_1_63870dd343fd   | at 
> org.apache.nifi.processors.standard.QueryRecord.query(QueryRecord.java:465)
> nifi_1_63870dd343fd   | at 
> org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:320)


Re: Different NiFi Node sizes within same cluster

2019-03-07 Thread Koji Kawamura
> The last thing I'm looking to understand is what Bryan B brought up: do load
> balanced connections take into consideration the load of each node?

No, load balanced connections don't currently use the load of each node
to calculate the destination.

As future improvement ideas:
We could implement another FlowFilePartitioner that uses QueuePartition.size(),
or add a nifi.properties entry to specify the number of partitions each node
has. This may be helpful if the cluster consists of nodes having
different specs.

The rest is a note on some important lines of code for understanding how
load balancing and partitioning work.

None of the FlowFilePartitioner implementations takes the load of each
node into consideration.
- PARTITION_BY_ATTRIBUTE: Calculates a hash from the FlowFile attribute
value, then calculates the target partition using consistent hashing. If
the attribute values don't distribute well, some nodes get a higher
number of FlowFiles.
- ROUND_ROBIN: We could implement another round-robin strategy that
uses QueuePartition.size() to pick the destination with the fewest queued
FlowFiles.
- SINGLE_NODE: Always uses partitions[0], meaning the first node
in node identifier order.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/partition/FlowFilePartitioner.java

For example, let's use a 5-node cluster.

Partitions are created using sorted node identifiers.
The number of partitions = the number of nodes.
Each node will have 5 partitions: 1 LocalPartition and 4 RemoteQueuePartitions.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/SocketLoadBalancedFlowFileQueue.java#L140,L162

Each RemoteQueuePartition registers itself with the clientRegistry.
In this case, there are 4 clients for this loop.
Each node executes this task periodically.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/queue/clustered/client/async/nio/NioAsyncLoadBalanceClientTask.java#L50,L76

Interestingly, the task is created N times. N is configured by
nifi.cluster.load.balance.max.thread.count, 8 by default.
So, 8 threads loop through 4 clients?
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/FlowController.java#L652

Thanks,
Koji

On Thu, Mar 7, 2019 at 9:19 PM Chad Woodhead  wrote:
>
> Thanks all for the input. So from what I'm gathering, storage differences of 
> around 5 GB (125 GB vs 130 GB) should not cause any problems/load impacts. 
> Larger storage differences could have load impacts. Differences in CPU and 
> RAM could definitely have load impacts. Luckily my older nodes have the same 
> CPU and RAM counts/specs as my new nodes.
>
> The last thing I'm looking to understand is what Bryan B brought up: do load
> balanced connections take into consideration the load of each node?
>
> Thanks,
> Chad
>
> On Wed, Mar 6, 2019 at 4:50 PM Bryan Bende  wrote:
>>
>> Yea ListenTCP also doesn't handle the back-pressure with the client
>> the way it really should.
>>
>> Regarding the load balancing, I believe traditional s2s does factor in
>> the load of each node when deciding how to load balance, but I don't
>> know if this is part of load balanced connections or not. Mark P would
>> know for sure.
>>
>> On Wed, Mar 6, 2019 at 4:47 PM James Srinivasan
>>  wrote:
>> >
>> > Yup, but because of the unfortunate way the source (outside NiFi)
>> > works, it doesn't buffer for long when the connection doesn't pull or
>> > drops. It behaves far more like a 5 Mbps UDP stream really :-(
>> >
>> > On Wed, 6 Mar 2019 at 21:44, Bryan Bende  wrote:
>> > >
>> > > James, just curious, what was your source processor in this case? 
>> > > ListenTCP?
>> > >
>> > > On Wed, Mar 6, 2019 at 4:26 PM Jon Logan  wrote:
>> > > >
>> > > > What really would resolve some of these issues is backpressure on CPU 
>> > > > -- ie. let Nifi throttle itself down to not choke the machine until it 
>> > > > dies if constrained on CPU. Easier said than done unfortunately.
>> > > >
>> > > > On Wed, Mar 6, 2019 at 4:23 PM James Srinivasan 
>> > > >  wrote:
>> > > >>
>> > > >> In our case, backpressure applied all the way up to the TCP network
>> > > >> source which meant we lost data. AIUI, current load balancing is round
>> > > >> robin (and two other options prob not relevant). Would actual load
>> > > >> balancing (e.g. send to node with lowest OS load, or number of active
>> > > >> threads) be a reasonable request?
>> > > >>
>> > > >> On Wed, 6 Mar 2019 at 20:51, Joe Witt  wrote:
>> > > >> >
>> > > >> > This is generally workable (heterogeneous node capabilities) in NiFi 
>> > > >> > clustering.  But you do want to leverage back-pressure and load 
>> > > >> 

QueryRecord and NULLs

2019-03-07 Thread Boris Tyukin
I have been struggling for an hour now with a very simple thing.

I need to add 3 new fields to a record and set them to NULL, but it does not
work.

I tried null instead - same thing. I checked Calcite docs and I do not see
anything special about NULL. And I know you can do it in SQL.

This works:

SELECT
*
,'' as unit_cerner_alias
,'' as room_cerner_alias
,'' as bed_cerner_alias
FROM FLOWFILE

But this does not:

SELECT
*
,NULL as unit_cerner_alias
,NULL as room_cerner_alias
,NULL as bed_cerner_alias
FROM FLOWFILE

Then I use the LookupRecord processor to populate them or leave them as NULL.


Convert Avro to ORC or JSON processor - retaining the data type

2019-03-07 Thread Ravi Papisetti (rpapiset)
Hi,

Nifi version 1.7

We have a dataflow that gets data from an Oracle database and loads it into Hive
tables.

The data flow is something like below:
GenerateTableFetch > ExecuteSQL > ConvertAvroToJSON/ORC (we tried both) > PutHDFS >
ListHDFS > ReplaceText (to build the load data query from the file) > PutHiveQL.

Data at the source (e.g. column "cpyKey" NUMBER) in Number/INT format is being
written as
{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"cpyKey","type":["null",{"type":"bytes","logicalType":"decimal","precision":10,"scale":0}]}

When this is inserted into a Hive table, whether the data is loaded from the ORC
(ConvertAvroToORC) file or the JSON (ConvertAvroToJSON) file, querying the data
from Hive throws a parsing exception with incompatible data types.


Error: java.io.IOException: java.lang.RuntimeException: ORC split generation 
failed with exception: java.lang.IllegalArgumentException: ORC does not support 
type conversion from file type binary (1) to reader type bigint (1) 
(state=,code=0)

Appreciate any help on this.

Thanks,
Ravi Papisetti


ExecuteSQLRecord and timestamps

2019-03-07 Thread Boris Tyukin
Hi guys,

We just upgraded to 1.9 and I was excited to start using the new
ExecuteSQLRecord processor.

While I was migrating an older flow that uses the ExecuteSQL processor, I
noticed that timestamp/date types are coming through as integers, not strings
like before.

Also, the Avro schema inferred from the database is identical for both processors,
e.g.
{"name":"update_dt_tm","type":["null","string"]}]}

but the actual values in the flowfiles are very different, e.g.

ExecuteSQL:
2014-12-23 11:04:56.664

but ExecuteSQLRecord:
1419350696664 (it is not from the same record, but you get the idea).

I now wonder what happens with other data types...

Note, I used the Avro writer with the inferred schema. Pretty much all the
properties were left at their default values, including logical Avro data types
(set to no).
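
In case it helps, one possible stopgap (untested, and it assumes an Oracle-style
source, which this flow may not be) is to cast the timestamp to text in the
query itself, so the value matches the inferred string schema regardless of the
logical-type settings:

-- Sketch only; the column name is from the example above, the table name is
-- a placeholder. If the column is a DATE rather than a TIMESTAMP, drop .FF3.
SELECT
TO_CHAR(update_dt_tm, 'YYYY-MM-DD HH24:MI:SS.FF3') AS update_dt_tm
-- , other columns ...
FROM source_table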

Is this a known issue, or do I need to submit a new JIRA?

Thanks.
Boris


Re: PutS3Object failing when using non-Latin characters in filename

2019-03-07 Thread Andy LoPresto
The fact that the signatures don’t match may indicate some kind of character 
normalization or encoding difference with the way AWS handles the input. There 
is an existing Jira for handling filenames with orthographic marks in FetchFile 
[1]. 

[1] https://issues.apache.org/jira/browse/NIFI-6051


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 7, 2019, at 8:04 AM, Mike Thomsen  wrote:
> 
> I kept the default for the object key, which is ${filename}, and some of our 
> files have non-Latin characters. The error from AWS is:
> 
> > The request signature we calculated does not match the signature you 
> > provided. Check your key and signing method. (Service: Amazon S3; Status 
> > Code: 403; Error Code: SignatureDoesNotMatch; Request ID: ; S3 
> > Extended Request ID: )
> 
> There are no obvious encoding issues on the NiFi end. It renders the 
> characters just fine in the flowfile viewer. Is it something with UTF8 
> characters being problematic here? Any mitigation suggestions?
> 
> Thanks,
> 
> Mike



PutS3Object failing when using non-Latin characters in filename

2019-03-07 Thread Mike Thomsen
I kept the default for the object key, which is ${filename}, and some of our
files have non-Latin characters. The error from AWS is:

> The request signature we calculated does not match the signature you
provided. Check your key and signing method. (Service: Amazon S3; Status
Code: 403; Error Code: SignatureDoesNotMatch; Request ID: ; S3
Extended Request ID: )

There are no obvious encoding issues on the NiFi end. It renders the
characters just fine in the flowfile viewer. Is it something with UTF8
characters being problematic here? Any mitigation suggestions?

Thanks,

Mike


Errors when attempting to use timestamp-millis fields with QueryRecord

2019-03-07 Thread Edward George
I have some input Avro with some fields using the timestamp-millis
logicalType. I've been attempting to use them with QueryRecord to filter,
or otherwise operate on the fields, using timestamp operations, and I get
errors.

For instance the following SQL queries:

SELECT * FROM FLOWFILE WHERE dt > TIMESTAMP '1984-01-01 00:00:00'

SELECT * FROM FLOWFILE WHERE CAST(dt AS TIMESTAMP) > TIMESTAMP '1984-01-01
00:00:00'

SELECT YEAR(dt) FROM FLOWFILE

All fail with the following error:

 java.lang.RuntimeException: Cannot convert 2019-02-19 01:01:01.0 to long

Where the date `2019-02-19 01:01:01` is from the first row in the flowfile.

Is this a bug with the implementation of QueryRecord or is there something
wrong with my queries / expectations here?

Tested on nifi v1.9.0 using the official docker image.

If I instead try the following SQL:

SELECT * FROM FLOWFILE WHERE dt > 1

I can see that the timestamp-millis column is represented as a
java.sql.Timestamp object:

 org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '>' to
arguments of type ' > '.

This was reproduced using this test avro file:

$ avro-utils getmeta nifi-data/out2.avro
avro.codec  deflate
avro.schema {"type": "record", "name": "x", "fields": [{"name": "dt",
"type": {"logicalType": "timestamp-millis", "type": "long"}}, {"name": "v",
"type": "long"}], "__fastavro_parsed": true}

$ avro-utils tojson nifi-data/out2.avro
{"dt":1550538061000,"v":1}
{"dt":-220894171,"v":2}
{"dt":323687349000,"v":3}

$ base64 nifi-data/out2.avro
T2JqAQQUYXZyby5jb2RlYw5kZWZsYXRlFmF2cm8uc2NoZW1h5AJ7InR5cGUiOiAicmVjb3JkIiwg
Im5hbWUiOiAieCIsICJmaWVsZHMiOiBbeyJuYW1lIjogImR0IiwgInR5cGUiOiB7ImxvZ2ljYWxU
eXBlIjogInRpbWVzdGFtcC1taWxsaXMiLCAidHlwZSI6ICJsb25nIn19LCB7Im5hbWUiOiAidiIs
ICJ0eXBlIjogImxvbmcifV0sICJfX2Zhc3RhdnJvX3BhcnNlZCI6IHRydWV9AHN0YWNraHV0bnN0
YWNrMTUGPAEWAOn/kNOptKBaAt/q/PLJgAEEkOyn1OsSBpxyDHN0YWNraHV0bnN0YWNrMTU=

And more context for the stacktrace for the error above is:
nifi_1_63870dd343fd   | java.lang.RuntimeException: Cannot convert
2019-02-19 01:01:01.0 to long
nifi_1_63870dd343fd   | at
org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460)
nifi_1_63870dd343fd   | at
org.apache.calcite.runtime.SqlFunctions.toLong(SqlFunctions.java:1616)
nifi_1_63870dd343fd   | at Baz$1$1.moveNext(Unknown Source)
nifi_1_63870dd343fd   | at
org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.(Linq4j.java:676)
nifi_1_63870dd343fd   | at
org.apache.calcite.linq4j.Linq4j.enumeratorIterator(Linq4j.java:96)
nifi_1_63870dd343fd   | at
org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
nifi_1_63870dd343fd   | at
org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:90)
nifi_1_63870dd343fd   | at
org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:184)
nifi_1_63870dd343fd   | at
org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:64)
nifi_1_63870dd343fd   | at
org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:43)
nifi_1_63870dd343fd   | at
org.apache.calcite.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:573)
nifi_1_63870dd343fd   | at
org.apache.calcite.avatica.AvaticaPreparedStatement.executeQuery(AvaticaPreparedStatement.java:137)
nifi_1_63870dd343fd   | at
org.apache.nifi.processors.standard.QueryRecord.query(QueryRecord.java:465)
nifi_1_63870dd343fd   | at
org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:320)


Re: Different NiFi Node sizes within same cluster

2019-03-07 Thread Chad Woodhead
Thanks all for the input. So from what I'm gathering, storage differences
of around 5 GB (125 GB vs 130 GB) should not cause any problems/load
impacts. Larger storage differences could have load impacts. Differences in
CPU and RAM could definitely have load impacts. Luckily my older nodes have
the same CPU and RAM counts/specs as my new nodes.

The last thing I'm looking to understand is what Bryan B brought up: do
load balanced connections take into consideration the load of each node?

Thanks,
Chad

On Wed, Mar 6, 2019 at 4:50 PM Bryan Bende  wrote:

> Yea ListenTCP also doesn't handle the back-pressure with the client
> the way it really should.
>
> Regarding the load balancing, I believe traditional s2s does factor in
> the load of each node when deciding how to load balance, but I don't
> know if this is part of load balanced connections or not. Mark P would
> know for sure.
>
> On Wed, Mar 6, 2019 at 4:47 PM James Srinivasan
>  wrote:
> >
> > Yup, but because of the unfortunate way the source (outside NiFi)
> > works, it doesn't buffer for long when the connection doesn't pull or
> > drops. It behaves far more like a 5 Mbps UDP stream really :-(
> >
> > On Wed, 6 Mar 2019 at 21:44, Bryan Bende  wrote:
> > >
> > > James, just curious, what was your source processor in this case?
> ListenTCP?
> > >
> > > On Wed, Mar 6, 2019 at 4:26 PM Jon Logan  wrote:
> > > >
> > > > What really would resolve some of these issues is backpressure on
> CPU -- ie. let Nifi throttle itself down to not choke the machine until it
> dies if constrained on CPU. Easier said than done unfortunately.
> > > >
> > > > On Wed, Mar 6, 2019 at 4:23 PM James Srinivasan <
> james.sriniva...@gmail.com> wrote:
> > > >>
> > > >> In our case, backpressure applied all the way up to the TCP network
> > > >> source which meant we lost data. AIUI, current load balancing is
> round
> > > >> robin (and two other options prob not relevant). Would actual load
> > > >> balancing (e.g. send to node with lowest OS load, or number of
> active
> > > >> threads) be a reasonable request?
> > > >>
> > > >> On Wed, 6 Mar 2019 at 20:51, Joe Witt  wrote:
> > > >> >
> > > >> > This is generally workable (heterogeneous node capabilities) in
> NiFi clustering.  But you do want to leverage back-pressure and load
> balanced connections so that faster nodes will have an opportunity to take
> on the workload for slower nodes.
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > On Wed, Mar 6, 2019 at 3:48 PM James Srinivasan <
> james.sriniva...@gmail.com> wrote:
> > > >> >>
> > > >> >> Yes, we hit this with the new load balanced queues (which, to be
> fair, we also had with remote process groups previously). Two "old" nodes
> got saturated and their queues filled while three "new" nodes were fine.
> > > >> >>
> > > >> >> My "solution" was to move everything to new hardware which we
> had inbound anyway.
> > > >> >>
> > > >> >> On Wed, 6 Mar 2019, 20:40 Jon Logan, 
> wrote:
> > > >> >>>
> > > >> >>> You may run into issues with different processing power, as
> some machines may be overwhelmed in order to saturate other machines.
> > > >> >>>
> > > >> >>> On Wed, Mar 6, 2019 at 3:34 PM Mark Payne 
> wrote:
> > > >> 
> > > >>  Chad,
> > > >> 
> > > >>  This should not be a problem, given that all nodes have enough
> storage available to handle the influx of data.
> > > >> 
> > > >>  Thanks
> > > >>  -Mark
> > > >> 
> > > >> 
> > > >>  > On Mar 6, 2019, at 1:44 PM, Chad Woodhead <
> chadwoodh...@gmail.com> wrote:
> > > >>  >
> > > >>  > Are there any negative effects of having filesystem mounts
> (dedicated mounts for each repo) used by the different NiFi repositories
> differ in size on NiFi nodes within the same cluster? For instance, if some
> nodes have a content_repo mount of 130 GB and other nodes have a
> content_repo mount of 125 GB, could that cause any problems or cause one
> node to be used more since it has more space? What about if the difference
> was larger, by say a 100 GB difference?
> > > >>  >
> > > >>  > Trying to repurpose old nodes and add them as NiFi nodes,
> but their mount sizes are different than my current cluster’s nodes and
> I’ve noticed I can’t set the max size limit to use of a particular mount
> for a repo.
> > > >>  >
> > > >>  > -Chad
> > > >> 
>