[jira] [Comment Edited] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-12-17 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998538#comment-16998538
 ] 

Bolke de Bruin edited comment on ATLAS-3254 at 12/17/19 8:08 PM:
-

[~mayank_nj], what do you consider to be loading “properly”? What is the time 
taken to show the properties? What is the size of the JSON sent over the 
network (ours is > 27 MB)? What is the load time? What is the render time?

Are you saying that loading 200K objects in a pseudodir takes over 1h? 
That is not “proper”, I think.


was (Author: bolke):
[~mayank_nj], what do you consider to be loading “properly”? What is the time 
taken to show the properties? What is the size of the JSON sent over the 
network (ours is > 27 MB)? What is the load time? What is the render time?

> Atlas entity with large array of refs causes performance issues for lineage
> ---
>
> Key: ATLAS-3254
> URL: https://issues.apache.org/jira/browse/ATLAS-3254
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core, atlas-webui
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Adam Rempter
> Assignee: Mayank Jain
> Priority: Major
> Labels: performance
> Attachments: Screenshot 2019-11-28 at 21.18.44.png, 
> entity_auto_create.sh, example_create_entities.json, 
> rest_entity_get_pseudodir.json
>
>
> We use the “aws_s3_pseudo_dir” type from the 3020-aws_s3_typedefs.json model.
> It has the following property: 
> "name":    "s3Objects",
> "typeName":    "array"
>  
> Now in AWS buckets you can have thousands of objects. As a result, the 
> s3Objects array grows quite quickly, and the aws_s3_pseudo_dir entity JSON 
> easily reaches a few MBs.
>  
> Then we start seeing problems like:
>  * The UI dies when displaying entity properties or lineage
>  * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir, 
> guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576. 
> entity attribute values not stored in audit (EntityAuditListenerV2:234)
>  * Errors when writing to HBase (java.lang.IllegalArgumentException: 
> KeyValue size too large; as a workaround we set the 
> hbase.client.keyvalue.maxsize param to 0)
>  * Kafka consumer errors (we can of course set some parameters on the 
> consumer, but I think that is just a workaround)
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. 
> You can address this either by increasing the session timeout or by 
> reducing the maximum size of batches returned in poll() with 
> max.poll.records.
> …
> Specifying pseudo_dir is required for s3objects:
> "name": "pseudoDirectory",
> "typeName": "aws_s3_pseudo_dir",
> "cardinality": "SINGLE",
> "isIndexable": false,
> *"isOptional": false,*
> "isUnique": false,
>  
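
To give a feel for how quickly such an entity crosses the 1048576-byte audit 
limit quoted in the log above, here is a rough sketch that builds a 
made-up aws_s3_pseudo_dir entity and measures its serialized size. The 
shape of each reference (a guid plus an "aws_s3_object" typeName), the 
qualifiedName value and the helper names are assumptions for illustration 
only; real Atlas payloads carry more fields per reference, so actual sizes 
will be larger.

```python
import json
import uuid

# Limit quoted in the audit error above (maxSize=1048576 bytes).
AUDIT_MAX_SIZE = 1048576


def build_pseudo_dir(num_objects):
    """Build a hypothetical aws_s3_pseudo_dir entity with num_objects refs."""
    # Assumed minimal reference shape: guid + typeName only.
    refs = [
        {"guid": str(uuid.uuid4()), "typeName": "aws_s3_object"}
        for _ in range(num_objects)
    ]
    return {
        "typeName": "aws_s3_pseudo_dir",
        "attributes": {
            "qualifiedName": "s3://example-bucket/some/prefix/",  # made-up value
            "s3Objects": refs,
        },
    }


def serialized_size(entity):
    """Return the size in bytes of the entity's compact UTF-8 JSON form."""
    return len(json.dumps(entity, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    for n in (1_000, 10_000, 20_000, 200_000):
        size = serialized_size(build_pseudo_dir(n))
        flag = "over" if size > AUDIT_MAX_SIZE else "under"
        print(f"{n:>7} refs: {size:>10} bytes ({flag} the audit limit)")
```

Even with this stripped-down reference shape, a few tens of thousands of 
objects are enough to exceed the limit, which fits the symptoms listed in 
the description.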



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-11-28 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984631#comment-16984631
 ] 

Bolke de Bruin edited comment on ATLAS-3254 at 11/28/19 8:24 PM:
-

[~madhan] we now see serious issues with entity deletion. The attached 
screenshot shows 30 min of processing time. So this affects more than just 
the UI.

This is on master btw.


was (Author: bolke):
[~madhan] we now see serious issues with entity deletion. The attached 
screenshot shows 30 min of processing time. So this affects more than just 
the UI.

> Atlas entity with large array of refs causes performance issues for lineage
> ---
>
> Key: ATLAS-3254
> URL: https://issues.apache.org/jira/browse/ATLAS-3254
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core, atlas-webui
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Adam Rempter
> Priority: Major
> Labels: performance
> Attachments: Screenshot 2019-11-28 at 21.18.44.png, 
> example_create_entities.json, rest_entity_get_pseudodir.json


