[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986690#comment-16986690 ]

Adam Rempter commented on ATLAS-3305:
-------------------------------------

[~madhan], [~sarath], could you have a look at the patch I attached and suggest whether it needs any further improvements? By default the patch sets one consumer per topic - the same behaviour as today. Maybe we should also add clearer docs, as [~bolke] suggested, to make sure admins are aware of how Atlas will process hook messages? Atlas now supports multiple topics, so that could be another addition to improve consumer performance. By the way, I think this also relaxes consistency... and consistency can be guaranteed on the producer side, e.g.: https://www.javaworld.com/article/3066873/big-data-messaging-with-kafka-part-2.html

Thanks, Adam

> Unable to scale atlas kafka consumers
> -------------------------------------
>
>                 Key: ATLAS-3305
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3305
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core, atlas-intg
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Adam Rempter
>            Priority: Major
>              Labels: performance
>         Attachments: ATLAS-3305_multiple_kafka_consumers.patch, multiple_consumers_perf.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We wanted to scale Kafka consumers for Atlas, as we are getting many lineage messages and processing them with just one consumer is not enough.
>
> There is a parameter, atlas.notification.hook.numthreads, to scale consumers in NotificationHookConsumer. But the method
>
> notificationInterface.createConsumers(NotificationType.HOOK, numThreads)
>
> always returns a one-element list, which effectively always starts a single consumer:
>
> List<NotificationConsumer<T>> consumers = Collections.singletonList(kafkaConsumer);
>
> The log incorrectly reports that the requested number of consumers has been created:
>
> LOG.info("<== KafkaNotification.createConsumers(notificationType={}, numConsumers={}, autoCommitEnabled={})", notificationType, numConsumers, autoCommitEnabled)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
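A minimal sketch of the shape of the fix (hypothetical names - the real patch builds AtlasKafkaConsumer instances, a Supplier stands in for that factory here): honouring numConsumers means building the list in a loop instead of Collections.singletonList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class CreateConsumersSketch {
    // Buggy shape: numConsumers is ignored, a one-element list is always returned.
    static <T> List<T> createConsumersBuggy(Supplier<T> factory, int numConsumers) {
        return Collections.singletonList(factory.get());
    }

    // Patched shape: honour numConsumers by instantiating one consumer per slot.
    static <T> List<T> createConsumersFixed(Supplier<T> factory, int numConsumers) {
        List<T> consumers = new ArrayList<>(numConsumers);
        for (int i = 0; i < numConsumers; i++) {
            consumers.add(factory.get());
        }
        return consumers;
    }

    public static void main(String[] args) {
        // Object::new stands in for creating a real Kafka consumer.
        System.out.println(createConsumersBuggy(Object::new, 4).size());  // 1
        System.out.println(createConsumersFixed(Object::new, 4).size());  // 4
    }
}
```

With the fixed shape, atlas.notification.hook.numthreads actually controls how many consumers join the group and divide the topic's partitions.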
[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985964#comment-16985964 ]

Adam Rempter commented on ATLAS-3305:
-------------------------------------

Updated the patch against latest master. In our case this change really allowed Atlas to swallow the growing backlog of changes. Please see the Grafana screenshot for details. !multiple_consumers_perf.png!
[jira] [Updated] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Rempter updated ATLAS-3305:
--------------------------------
    Attachment: multiple_consumers_perf.png
[jira] [Updated] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Rempter updated ATLAS-3305:
--------------------------------
    Attachment: ATLAS-3305_multiple_kafka_consumers.patch
[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages
[ https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984449#comment-16984449 ]

Adam Rempter commented on ATLAS-3261:
-------------------------------------

[~madhan] patch looks ok to me

> Ranger Authorizer for Atlas is not checked for kafka messages
> -------------------------------------------------------------
>
>                 Key: ATLAS-3261
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3261
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-intg
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Adam Rempter
>            Priority: Major
>              Labels: security
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Atlas can be configured to authorize user actions with Ranger (https://atlas.apache.org/1.1.0/Atlas-Authorization-Ranger-Authorizer.html).
>
> When I access the API as a user via REST, authorization works:
>
> curl -X GET -u testuser:testuser http://localhost:21000/api/atlas/v2/entity/guid/f52151a0-fa08-4eab-b885-ece847a106e0
> {"errorCode":"ATLAS-403-00-001","errorMessage":"testuser is not authorized to perform read entity: guid=f52151a0-fa08-4eab-b885-ece847a106e0"}
>
> When I send lineage to ATLAS_HOOK, I can create the lineage successfully:
>
> 2019-06-04 14:01:38,974 2019-06-04T12:01:23.867Z|testuser|NotificationHookConsumer|POST|api/atlas/v2/entity/|200|15119
>
> In the above, I think the user is taken from the "user" field of the lineage message JSON.
>
> Of course the above only happens if another policy in Ranger (the Kafka plugin) allows putting messages on the ATLAS_HOOK topic.
>
> But if I have one user (a technical account) producing to Kafka and I want to deny access in Atlas based on the user from the message, the Atlas Ranger authorizer doesn't work.
[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages
[ https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984361#comment-16984361 ]

Adam Rempter commented on ATLAS-3261:
-------------------------------------

nice, will do
[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903621#comment-16903621 ]

Adam Rempter commented on ATLAS-3305:
-------------------------------------

Yes, exactly, I was thinking of something like this.
[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875965#comment-16875965 ]

Adam Rempter commented on ATLAS-3305:
-------------------------------------

Yes, it's true, it just spawns multiple consumers. Atlas uses Kafka, which is a distributed message broker, and by definition there is no real way (at least for now?) to guarantee global consistency. One way to mitigate this would be for the producer to set a message key, so that at least ordering is preserved per partition. The key could be either the userId (one user per type of service, e.g. hive) or the entity name. In a stricter mode (a configuration option?), the Atlas consumer could then check whether a message has a key and discard it if not.
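A toy sketch of the keyed-producer idea suggested above (illustrative only - Kafka's default partitioner actually hashes keys with murmur2, not hashCode, and the class and method names here are made up): equal keys always map to the same partition, so per-entity ordering is preserved even with multiple consumers.

```java
public class KeyedPartitionSketch {
    // Simplified stand-in for Kafka's key-based partitioning: the point is
    // only that equal keys deterministically land on one partition.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Hypothetical entity name used as the message key.
        String entity = "aws_s3_pseudo_dir@mybucket/dir1";
        int first  = partitionFor(entity, 4);
        int second = partitionFor(entity, 4);
        // Same key, same partition -> updates to this entity stay ordered.
        System.out.println(first == second);
    }
}
```

A message with no key would be sprayed round-robin across partitions, which is exactly the case a "strict mode" consumer could reject.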
[jira] [Commented] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage
[ https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873225#comment-16873225 ]

Adam Rempter commented on ATLAS-3254:
-------------------------------------

Sorry, I somehow missed your question. I am attaching two files: one is the Kafka message used to create the Atlas lineage, the other shows a REST GET for a pseudoDir containing s3 objects.

[^example_create_entities.json]
[^rest_entity_get_pseudodir.json]

I think the problem is not related to the UI only; we also see problems with message processing (as described above) by the Kafka consumer. I think it can easily be reproduced by creating more than 100 s3 objects in the pseudoDir array "s3Objects": []. The size of the JSON grows with the number of s3 objects added.

> Atlas entity with large array of refs causes performance issues for lineage
> ---------------------------------------------------------------------------
>
>                 Key: ATLAS-3254
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3254
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core, atlas-webui
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Adam Rempter
>            Priority: Major
>              Labels: performance
>         Attachments: example_create_entities.json, rest_entity_get_pseudodir.json
>
> We use the "aws_s3_pseudo_dir" type from the 3020-aws_s3_typedefs.json model. It has the following property:
>
> "name": "s3Objects",
> "typeName": "array"
>
> Now, in AWS buckets you can have thousands of objects. This means the s3Objects array grows quite quickly, causing the aws_s3_pseudo_dir entity JSON to easily reach a few MBs.
>
> Then we start seeing problems like:
> * The UI dies displaying entity properties or lineage
> * Errors in the logs: audit record too long: entityType=aws_s3_pseudo_dir, guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576. entity attribute values not stored in audit (EntityAuditListenerV2:234)
> * Some errors on writes to HBase (java.lang.IllegalArgumentException: KeyValue size too large; as a workaround we set the hbase.client.keyvalue.maxsize param to 0)
> * Kafka consumer errors (we can of course set some parameters on the consumer, but I think that is just a workaround):
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> …
>
> Specifying a pseudo_dir is required for s3 objects:
>
> "name": "pseudoDirectory",
> "typeName": "aws_s3_pseudo_dir",
> "cardinality": "SINGLE",
> "isIndexable": false,
> *"isOptional": false,*
> "isUnique": false,
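As a stopgap for the CommitFailedException quoted above, the two settings the error message itself points at can be tuned. Atlas forwards atlas.kafka.*-prefixed properties to its Kafka consumer; the values below are illustrative assumptions, not recommendations:

```properties
# Illustrative values only - tune for the actual workload.
# Fewer records per poll() means less processing time between polls:
atlas.kafka.max.poll.records=50
# More time allowed between polls before the coordinator rebalances the group:
atlas.kafka.max.poll.interval.ms=600000
```

This only buys headroom for oversized messages; the underlying fix is still keeping entity JSON small.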
[jira] [Updated] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage
[ https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Rempter updated ATLAS-3254:
--------------------------------
    Attachment: example_create_entities.json
[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers
[ https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873114#comment-16873114 ]

Adam Rempter commented on ATLAS-3305:
-------------------------------------

With the change I can see that Atlas is able to spawn consumers correctly:

TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST        CLIENT-ID
ATLAS_HOOK  2          -               0               -    consumer-2-51e593e8-549b-47b7-a6d0-0f543ffaa306  /127.0.0.1  consumer-2
ATLAS_HOOK  3          -               0               -    consumer-2-51e593e8-549b-47b7-a6d0-0f543ffaa306  /127.0.0.1  consumer-2
ATLAS_HOOK  0          -               1               -    consumer-1-542ae27b-98be-4ee6-a206-5445950b94ff  /127.0.0.1  consumer-1
ATLAS_HOOK  1          -               0               -    consumer-1-542ae27b-98be-4ee6-a206-5445950b94ff  /127.0.0.1  consumer-1
[jira] [Created] (ATLAS-3305) Unable to scale atlas kafka consumers
Adam Rempter created ATLAS-3305:
-----------------------------------

             Summary: Unable to scale atlas kafka consumers
                 Key: ATLAS-3305
                 URL: https://issues.apache.org/jira/browse/ATLAS-3305
             Project: Atlas
          Issue Type: Bug
          Components: atlas-core, atlas-intg
    Affects Versions: 2.0.0, 1.1.0
            Reporter: Adam Rempter
[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages
[ https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856722#comment-16856722 ]

Adam Rempter commented on ATLAS-3261:
-------------------------------------

I did some checking on this issue. What happens is that when a new message is received, the security check is done by AtlasAuthorizationUtils before the entity is created:

public static boolean isAccessAllowed(AtlasAdminAccessRequest request) {
    boolean ret = false;

    String userName = getCurrentUserName();      <-- returns ""

    if (StringUtils.isNotEmpty(userName)) {      <-- false
        try {
            AtlasAuthorizer authorizer = AtlasAuthorizerFactory.getAtlasAuthorizer();

            request.setUser(userName, getCurrentUserGroups());
            request.setClientIPAddress(RequestContext.get().getClientIPAddress());

            ret = authorizer.isAccessAllowed(request);
        } catch (AtlasAuthorizationException e) {
            LOG.error("Unable to obtain AtlasAuthorizer", e);
        }
    } else {                                     <-- if there is no user, return true
        ret = true;
    }

Now, when I add the user from the Kafka message as the current security principal (see patch), it properly calls the authorizer:

org.apache.atlas.exception.AtlasBaseException: testuser is not authorized to perform create entity: type=server
        at org.apache.atlas.authorize.AtlasAuthorizationUtils.verifyAccess(AtlasAuthorizationUtils.java:61)
        at org.apache.atlas.repository.store.graph.v2.AtlasEntityStoreV2.createOrUpdate(AtlasEntityStoreV2.java:664)

But I am not sure whether this is the best way to handle this case...
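The bypass described in the comment above can be reduced to a small sketch (hypothetical, simplified types - the real flow goes through AtlasAdminAccessRequest and AtlasAuthorizerFactory): an empty current user short-circuits to "allowed", which is why hook messages skip Ranger unless the message user is first set as the security principal.

```java
public class HookAuthSketch {
    // Stand-in for the Ranger-backed AtlasAuthorizer.
    interface Authorizer {
        boolean isAccessAllowed(String user, String action);
    }

    // Mirrors the decision shape of AtlasAuthorizationUtils.isAccessAllowed:
    // with a non-empty user the authorizer is consulted; with no user the
    // request is allowed unconditionally.
    static boolean isAccessAllowed(String userName, String action, Authorizer authorizer) {
        if (userName != null && !userName.isEmpty()) {
            return authorizer.isAccessAllowed(userName, action);
        }
        return true; // no current user -> access granted, Ranger never sees it
    }

    public static void main(String[] args) {
        // Hypothetical policy that denies everything to testuser.
        Authorizer denyTestuser = (user, action) -> !"testuser".equals(user);

        // Hook path without a principal: check is bypassed.
        System.out.println(isAccessAllowed("", "create entity", denyTestuser));
        // With the message user set as principal: Ranger policy applies.
        System.out.println(isAccessAllowed("testuser", "create entity", denyTestuser));
    }
}
```

Setting the principal from the message (as the attached patch does) is what moves the hook path from the first branch to the second.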
[jira] [Updated] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages
[ https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Rempter updated ATLAS-3261:
--------------------------------
    Attachment: 0001-set-message-user-as-principal.patch
[jira] [Updated] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage
[ https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Rempter updated ATLAS-3254:
--------------------------------
    Issue Type: Bug  (was: Improvement)
[jira] [Created] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages
Adam Rempter created ATLAS-3261:
-----------------------------------

             Summary: Ranger Authorizer for Atlas is not checked for kafka messages
                 Key: ATLAS-3261
                 URL: https://issues.apache.org/jira/browse/ATLAS-3261
             Project: Atlas
          Issue Type: Bug
          Components: atlas-intg
    Affects Versions: 2.0.0, 1.1.0
            Reporter: Adam Rempter
[jira] [Commented] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage
[ https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852840#comment-16852840 ]

Adam Rempter commented on ATLAS-3254:
-------------------------------------

Atlas has by now created a pseudo directory whose JSON representation is 27 MB! Maybe there should be some limit on how large an entity is allowed to grow?
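The size limit suggested above could look something like this sketch (entirely hypothetical - Atlas has no MAX_ENTITY_JSON_BYTES setting; the UTF-8 byte length of the serialized entity stands in for a real measurement, and the 1 MB cap simply echoes the audit-record limit from the error log):

```java
import java.nio.charset.StandardCharsets;

public class EntitySizeGuard {
    // Hypothetical cap; 1 MB mirrors the audit maxSize seen in the logs.
    static final int MAX_ENTITY_JSON_BYTES = 1024 * 1024;

    /** Returns true when the serialized entity fits under the cap. */
    static boolean withinLimit(String entityJson) {
        return entityJson.getBytes(StandardCharsets.UTF_8).length <= MAX_ENTITY_JSON_BYTES;
    }

    public static void main(String[] args) {
        String small = "{\"typeName\":\"aws_s3_pseudo_dir\",\"attributes\":{}}";
        System.out.println(withinLimit(small));

        // Simulate an s3Objects array that has grown out of control (~2 MB).
        StringBuilder big = new StringBuilder("{\"s3Objects\":[");
        for (int i = 0; i < 100_000; i++) {
            big.append("{\"guid\":\"0000-").append(i).append("\"},");
        }
        big.append("]}");
        System.out.println(withinLimit(big.toString()));
    }
}
```

A guard like this would let the server reject (or truncate) an oversized entity up front instead of failing later in the audit writer, HBase, or the Kafka consumer.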
[jira] [Created] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage
Adam Rempter created ATLAS-3254:
-----------------------------------

             Summary: Atlas entity with large array of refs causes performance issues for lineage
                 Key: ATLAS-3254
                 URL: https://issues.apache.org/jira/browse/ATLAS-3254
             Project: Atlas
          Issue Type: Improvement
          Components: atlas-core, atlas-webui
    Affects Versions: 2.0.0, 1.0.0
            Reporter: Adam Rempter