[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-11-09 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius I did a good bit of testing and wasn't able to break anything. 
Thanks for the improvements! +1, I have merged to master. 


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-11-09 Thread arkadius
Github user arkadius commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@markap14 No problem, looking forward for conclusions from this testing.


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-11-09 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius thanks for the update! Code review looks good. All unit tests and 
contrib-check pass. This does touch some super heavily used processors, though, 
like UpdateAttribute, so I'll want to take my time testing. This could 
certainly be a great improvement though. Thanks again!


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-11-06 Thread arkadius
Github user arkadius commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@markap14 I understand your point, I also don't like poems in comments :-) 
I've switched to Caffeine in all places except 4. Can you look at this now? CI 
report failure in some jobs - looks like non deterministic tests in other 
module. Have no idea how to retrigger them (I suppose that I have no access to 
this action).


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-11-05 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius thanks for compiling that list. Sorry it took so long to reply! 
Looking through the list, I do think you're right - these all appear to be the 
same pattern. I certainly didn't realize that we were making such prolific use 
of this pattern. Reading through the Caffeine docs, it probably does make sense 
to update these as well.


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-30 Thread arkadius
Github user arkadius commented on the issue:

https://github.com/apache/nifi/pull/3111
  
Other places of usage:
1. `org.apache.nifi.record.path.util.RecordPathCache` - record path cache, 
synchronized access
2. `org.apache.nifi.avro.AvroReader` - avro schema cache (this parts is 
almost copy-paste of `AvroRecordSetWriter`), synchronized access
3. `org.apache.nifi.processors.attributes.UpdateAttribute` - canonical 
value lookup cache, synchronized access
4. `org.apache.nifi.atlas.hook.NotificationSender` - `guidToQualifiedName` 
and `typedQualifiedNameToRef` caches, but no synchronization on class level
5. `org.apache.nifi.processors.jolt.record.JoltTransformRecord` - 
tranformations cache, synchronized access
6. `org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy` - avro 
schema cache, synchronized access
7. `org.apache.nifi.processors.standard.PutDatabaseRecord` - table schema 
cache, synchronized access
8. `org.apache.nifi.processors.standard.JoltTransformJSON` - tranformations 
cache, synchronized access
9. `org.apache.nifi.processors.standard.ConvertJSONToSQL` - table schema 
cache, synchronized access
10. 
`org.apache.nifi.confluent.schemaregistry.client.CachingSchemaRegistryClient` - 
avro schema cache, synchronized access

For me in 9/10 case it is the same scenario and would be better to fix it 
there before someone hit the same problem. Especially when it is used for avro 
schema cache. If someone choose, avro it is very high probability that he or 
she will expect high throughputs. WDYT?


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-30 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius I am not familiar with caffeine but it looks promising. A quick 
glance through the LICENSE indicates that it is compatible and has only a 
single dependency on Google Guava, which is also compatible.
I would avoid updating the other references blindly, though. We would need 
to do some evaluation to see where it does and does not make sense. Especially 
if used in a context where thread safety is not required.


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-29 Thread arkadius
Github user arkadius commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@markap14 what do you think about using 
https://github.com/ben-manes/caffeine instead? They have benchmarks showing 
performance difference from current approach (named `LinkedHashMap_Lru`). Take 
a look at probably the most important _Read 100%_ scenario. If you want, I can 
also rewrite other `LinkedHashMap` usages for cache? `grep removeEldestEntry` 
shows 10 other places.


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-29 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius fair enough. I'd be okay with a soft limit.


---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-29 Thread arkadius
Github user arkadius commented on the issue:

https://github.com/apache/nifi/pull/3111
  
Hi @markap14 . Thank you for your reply. We have a quite heavy-loaded 
environment. We are processing about 5 GB / min of messages on 32 core host. 
Avro serialization was our main bottleneck. We had avg 30 threads concurrently 
using this service. CPU usage was on 85%, iowaits near 0, load on 50. 
Increasing of number of threads didn't help. As I wrote in issue, I found out 
that the synchronization is an issue by sampling (doing occasional thread 
dumps). Most of thread was blocked on it. After this modification, we stopped 
to occur such a performance issue.

Regards to removing of limitation of entries in the cache. I agree with you 
that it isn't the best approach. Limitation of 20 schemas wasn't either. In 
systems with multiple flows, commonly using the same service, this limit would 
be constantly hit. What do you think about introducing two parameters to this 
service: Cache Size Limit and Time To Leave? After hitting one of them, entry 
would be evicted. Condition would be checked during each entry get. Similar to 
`cachedSchema.isOlderThan` check in `CachingSchemaRegistryClient`. If we assume 
that these limits would be soft one, it won't need synchronization.



---


[GitHub] nifi issue #3111: NIFI-5757 AvroRecordSetWriter - Fix for slow synchronized ...

2018-10-29 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/3111
  
@arkadius thanks for the contribution. I don't think this is the right 
approach, however, As-is, the code ensures that we don't hold any more than a 
set number of entries in the cache. The proposed change removes this 
limitation, which introduces a memory leak that can quickly cause a system to 
run out of memory if they are using an unbounded number of schemas (as can 
occur if a system is dynamically generating schemas).

The synchronization here is limited to an extremely small scope. Can you 
explain how you arrived at the determination that the existing code is slow? 
What kind of performance are you seeing? What performance do you expect? What 
kind of difference do you see after applying this change?


---