[
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998538#comment-16998538
]
Bolke de Bruin edited comment on ATLAS-3254 at 12/17/19 8:08 PM:
-
[~mayank_nj], what do you consider loading “properly”? How long does it take to
show the properties? What is the size of the JSON sent over the network (ours
is > 27 MB)? What is the load time? What is the render time?
Are you saying that loading 200K objects in a pseudodir takes over 1h?
I don't think that counts as “proper”.
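
To put numbers on the size and load-time questions above, a minimal Python sketch along these lines could be used. The helper names `measure_fetch` and `fetch_url` are made up for illustration, and the host, port, GUID and authentication for the entity URL are assumptions that depend on the deployment. It measures transfer only, not the render time in the UI.

```python
import time
import urllib.request
from typing import Callable, Tuple


def measure_fetch(fetch: Callable[[], bytes]) -> Tuple[int, float]:
    """Return (payload size in bytes, elapsed seconds) for one call to fetch()."""
    start = time.perf_counter()
    body = fetch()
    elapsed = time.perf_counter() - start
    return len(body), elapsed


def fetch_url(url: str, timeout: float = 300.0) -> bytes:
    """Download the full response body (hypothetical helper, no authentication)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `measure_fetch(lambda: fetch_url(entity_url))`, where `entity_url` points at the entity's REST endpoint (something like `/api/atlas/v2/entity/guid/<guid>` on the Atlas server), returns the JSON size in bytes and the download time in seconds.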
> Atlas entity with large array of refs causes performance issues for lineage
> ---
>
> Key: ATLAS-3254
> URL: https://issues.apache.org/jira/browse/ATLAS-3254
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core, atlas-webui
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Adam Rempter
>Assignee: Mayank Jain
>Priority: Major
> Labels: performance
> Attachments: Screenshot 2019-11-28 at 21.18.44.png,
> entity_auto_create.sh, example_create_entities.json,
> rest_entity_get_pseudodir.json
>
>
> We use the “aws_s3_pseudo_dir” type from the 3020-aws_s3_typedefs.json model.
> It has the following property:
> "name": "s3Objects",
> "typeName": "array"
>
> Now in AWS buckets you can have thousands of objects. As a result, the
> s3Objects array grows quite quickly, and the aws_s3_pseudo_dir entity JSON
> easily reaches a few MB.
>
> Then we start seeing problems like:
> * The UI dies when displaying entity properties or lineage
> * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir,
> guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576.
> entity attribute values not stored in audit (EntityAuditListenerV2:234)
> * Errors on writes to HBase (java.lang.IllegalArgumentException:
> KeyValue size too large; as a workaround we set the
> hbase.client.keyvalue.maxsize param to 0)
> * Kafka consumer errors (we can of course set some parameters on the
> consumer, but I think that is just a workaround)
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the partitions
> to another member. This means that the time between subsequent calls to
> poll() was longer than the configured max.poll.interval.ms, which typically
> implies that the poll loop is spending too much time message processing. You
> can address this either by increasing the session timeout or by reducing the
> maximum size of batches returned in poll() with max.poll.records.
> …
> Specifying pseudo_dir is required for s3objects:
> "name": "pseudoDirectory",
> "typeName": "aws_s3_pseudo_dir",
> "cardinality": "SINGLE",
> "isIndexable": false,
> *"isOptional": false,*
> "isUnique": false,
>
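
As a rough illustration of how quickly the s3Objects array described above outgrows the 1048576-byte audit limit from the log line, here is a minimal Python sketch. It assumes each array element is a bare reference carrying only a `guid` and a `typeName` of `aws_s3_object`; real entity JSON carries more fields per reference, so this should be read as a lower bound rather than a measurement. The function names are hypothetical.

```python
import json
import uuid

# maxSize reported in the "audit record too long" log line of the issue.
AUDIT_MAX_SIZE = 1048576


def pseudo_dir_json_size(num_objects: int) -> int:
    """Size in bytes of a simplified aws_s3_pseudo_dir entity with num_objects refs."""
    # Assumption: each reference is only {"guid", "typeName"}.
    refs = [
        {"guid": str(uuid.uuid4()), "typeName": "aws_s3_object"}
        for _ in range(num_objects)
    ]
    entity = {"typeName": "aws_s3_pseudo_dir", "attributes": {"s3Objects": refs}}
    return len(json.dumps(entity).encode("utf-8"))


def refs_until_limit(limit: int = AUDIT_MAX_SIZE, step: int = 1000) -> int:
    """Smallest multiple of step at which the simplified entity exceeds limit."""
    n = 0
    while pseudo_dir_json_size(n) <= limit:
        n += step
    return n
```

Calling `refs_until_limit()` gives an order-of-magnitude estimate of how many objects a single pseudo-directory can reference before the audit record limit is hit under these assumptions.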