[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-12-03 Thread Adam Rempter (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986690#comment-16986690
 ] 

Adam Rempter commented on ATLAS-3305:
-

[~madhan], [~sarath], could you have a look at the patch I attached and 
suggest whether it needs any further improvements?

By default the patch sets one consumer per topic, which is the same as the 
current behaviour.

Maybe we should also add clearer docs, as [~bolke] suggested, so that admins 
are aware of how Atlas processes hook messages?

Atlas now supports multiple topics, so this could be another way to improve 
the performance of the Atlas consumer. By the way, I think it also relaxes 
consistency...

And consistency can be guaranteed on the producer side, e.g.:

[https://www.javaworld.com/article/3066873/big-data-messaging-with-kafka-part-2.html]

 

Thanks,
Adam

 

> Unable to scale atlas kafka consumers
> -
>
> Key: ATLAS-3305
> URL: https://issues.apache.org/jira/browse/ATLAS-3305
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core, atlas-intg
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Adam Rempter
>Priority: Major
>  Labels: performance
> Attachments: ATLAS-3305_multiple_kafka_consumers.patch, 
> multiple_consumers_perf.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We wanted to scale the Kafka consumers for Atlas, because we receive many 
> lineage messages and processing them with just one consumer is not enough.
>
> The parameter atlas.notification.hook.numthreads is meant to scale the 
> consumers in NotificationHookConsumer. However, the method
>
> notificationInterface.createConsumers(NotificationType.HOOK, numThreads)
>
> always returns a one-element list, which effectively always starts a single 
> consumer:
>
> List<NotificationConsumer<T>> consumers = 
> Collections.singletonList(kafkaConsumer);
>
> The log incorrectly reports that the requested number of consumers has been 
> created:
>
> LOG.info("<== KafkaNotification.createConsumers(notificationType={}, 
> numConsumers={}, autoCommitEnabled={})", notificationType, numConsumers, 
> autoCommitEnabled)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-12-02 Thread Adam Rempter (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985964#comment-16985964
 ] 

Adam Rempter commented on ATLAS-3305:
-

I updated the patch against the latest master. In our case this change really 
allowed Atlas to work through the growing delta of changes.

Please see the Grafana screenshot for details. !multiple_consumers_perf.png!



[jira] [Updated] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-12-02 Thread Adam Rempter (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Rempter updated ATLAS-3305:

Attachment: multiple_consumers_perf.png



[jira] [Updated] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-12-02 Thread Adam Rempter (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Rempter updated ATLAS-3305:

Attachment: ATLAS-3305_multiple_kafka_consumers.patch



[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages

2019-11-28 Thread Adam Rempter (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984449#comment-16984449
 ] 

Adam Rempter commented on ATLAS-3261:
-

[~madhan] patch looks ok to me

> Ranger Authorizer for Atlas is not checked for kafka messages
> -
>
> Key: ATLAS-3261
> URL: https://issues.apache.org/jira/browse/ATLAS-3261
> Project: Atlas
>  Issue Type: Bug
>  Components: atlas-intg
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Adam Rempter
>Priority: Major
>  Labels: security
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Atlas can be configured to authorize user actions with Ranger 
> ([https://atlas.apache.org/1.1.0/Atlas-Authorization-Ranger-Authorizer.html]).
>
> When I access Atlas as a user via REST, authorization works:
> curl -X GET -u testuser:testuser 
> http://localhost:21000/api/atlas/v2/entity/guid/f52151a0-fa08-4eab-b885-ece847a106e0
> {"errorCode":"ATLAS-403-00-001","errorMessage":"testuser is not authorized to 
> perform read entity: guid=f52151a0-fa08-4eab-b885-ece847a106e0"}
>
> When I send lineage to ATLAS_HOOK, I can create lineage successfully:
> 2019-06-04 14:01:38,974 
> 2019-06-04T12:01:23.867Z|testuser|NotificationHookConsumer|POST|api/atlas/v2/entity/|200|15119
> In the above, I think the user is taken from the "user" field of the lineage 
> message JSON.
>
> Of course, the above is only possible if another Ranger policy (Kafka plugin) 
> allows putting messages on the ATLAS_HOOK topic.
>
> But if I have one user (a technical account) producing to Kafka and I want to 
> deny access in Atlas based on the user in the message, the Atlas Ranger 
> authorizer doesn't work.





[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages

2019-11-28 Thread Adam Rempter (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984361#comment-16984361
 ] 

Adam Rempter commented on ATLAS-3261:
-

nice, will do



[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-08-09 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903621#comment-16903621
 ] 

Adam Rempter commented on ATLAS-3305:
-

Yes, exactly, I was thinking of something like this.



[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-07-01 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875965#comment-16875965
 ] 

Adam Rempter commented on ATLAS-3305:
-

Yes, it's true, it just spawns multiple consumers.

Atlas uses Kafka, which is a distributed message broker, and by definition 
there is no real way (at least for now?) to guarantee global consistency.

One way to mitigate this would be for the producer to set a message key, so 
that order is at least preserved within each partition.

The key could be either the userId (one user per type of service, e.g. hive) 
or the entity name.

In a stricter mode (a configuration option?), the Atlas consumer could then 
check whether a message has a key and discard it if it does not.
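
The keying idea above can be illustrated with a minimal sketch. It is not Atlas or Kafka code: HookMessage, partitionFor and acceptInStrictMode are hypothetical names, and partitionFor uses String.hashCode() where Kafka's default partitioner applies murmur2 to the serialized key. It only shows that messages sharing a key always map to the same partition (so their relative order is kept), and how a strict mode might reject unkeyed messages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Atlas or Kafka.
class KeyedOrderingSketch {

    // Simplified stand-in for a hook message: an optional key plus a payload.
    static final class HookMessage {
        final String key;     // e.g. a userId such as "hive", or an entity name; may be null
        final String payload;

        HookMessage(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Maps a key to a partition. Kafka's default partitioner hashes the serialized
    // key with murmur2; String.hashCode() is used here only to stay self-contained.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Strict mode (assumed to be a configuration option): unkeyed messages are rejected.
    static boolean acceptInStrictMode(HookMessage message) {
        return message.key != null && !message.key.isEmpty();
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<HookMessage> messages = new ArrayList<>();
        messages.add(new HookMessage("hive", "create table t1"));
        messages.add(new HookMessage("hive", "alter table t1"));
        messages.add(new HookMessage(null, "unkeyed update"));

        for (HookMessage m : messages) {
            if (!acceptInStrictMode(m)) {
                System.out.println("discarded (no key): " + m.payload);
                continue;
            }
            System.out.println("partition " + partitionFor(m.key, numPartitions) + ": " + m.payload);
        }
    }
}
```

Both "hive" messages resolve to the same partition, so a single consumer would see them in the order they were produced; ordering between different keys remains unspecified.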



[jira] [Commented] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-06-26 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873225#comment-16873225
 ] 

Adam Rempter commented on ATLAS-3254:
-

Sorry, I somehow missed your question.

I am attaching two files: one is the Kafka message used to create the Atlas 
lineage, the other shows the REST GET response for a pseudoDir containing s3 
objects.

[^example_create_entities.json]

[^rest_entity_get_pseudodir.json]

I think the problem is not limited to the UI; we also see problems with the 
Kafka consumer processing messages (as described above).

I think it can easily be reproduced by creating more than 100 s3 objects in 
the pseudoDir array "s3Objects": []

The size of the JSON grows with the number of s3Objects added.

 

> Atlas entity with large array of refs causes performance issues for lineage
> ---
>
> Key: ATLAS-3254
> URL: https://issues.apache.org/jira/browse/ATLAS-3254
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core, atlas-webui
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Adam Rempter
>Priority: Major
>  Labels: performance
> Attachments: example_create_entities.json, 
> rest_entity_get_pseudodir.json
>
>
> We use the “aws_s3_pseudo_dir” type from the 3020-aws_s3_typedefs.json model.
> It has the following property: 
> "name":    "s3Objects",
> "typeName":    "array"
>
> AWS buckets can hold thousands of objects, so the s3Objects array grows 
> quite quickly, and the aws_s3_pseudo_dir entity JSON easily reaches a 
> few MB.
>
> We then start seeing problems like:
>  * The UI dies when displaying entity properties or lineage
>  * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir, 
> guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576. 
> entity attribute values not stored in audit (EntityAuditListenerV2:234)
>  * Some errors on writes to HBase (java.lang.IllegalArgumentException: 
> KeyValue size too large; as a workaround we set the 
> hbase.client.keyvalue.maxsize param to 0)
>  * Kafka consumer errors (we can of course set some parameters on the 
> consumer, but I think that is just a workaround)
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. 
> You can address this either by increasing the session timeout or by reducing 
> the maximum size of batches returned in poll() with max.poll.records.
> …
> Specifying pseudo_dir is required for s3objects:
> "name": "pseudoDirectory",
> "typeName": "aws_s3_pseudo_dir",
> "cardinality": "SINGLE",
> "isIndexable": false,
> *"isOptional": false,*
> "isUnique": false,
>  
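
The consumer-side workaround mentioned in the description can be sketched roughly as follows. max.poll.interval.ms and max.poll.records are the standard Kafka consumer settings named in the exception text; the values and the class name are illustrative assumptions, not recommendations, and how Atlas forwards such settings to its consumer (for example through its application properties) should be checked against the Atlas documentation.

```java
import java.util.Properties;

// Hypothetical helper; only the two property names come from the Kafka error message.
class ConsumerTuningSketch {

    // Builds consumer overrides that give slow message processing more headroom.
    static Properties consumerOverrides() {
        Properties props = new Properties();
        // Allow more time between poll() calls before the group rebalances (assumed value).
        props.setProperty("max.poll.interval.ms", "600000");
        // Return fewer records per poll() so each batch finishes sooner (assumed value).
        props.setProperty("max.poll.records", "10");
        return props;
    }

    public static void main(String[] args) {
        consumerOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

As the description notes, this only hides the symptom: the entity JSON keeps growing with the number of s3Objects, so processing time per message grows with it.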





[jira] [Updated] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-06-26 Thread Adam Rempter (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Rempter updated ATLAS-3254:

Attachment: example_create_entities.json



[jira] [Commented] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-06-26 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873114#comment-16873114
 ] 

Adam Rempter commented on ATLAS-3305:
-

With the change I can see that Atlas is able to spawn consumers correctly:

TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST        CLIENT-ID
ATLAS_HOOK  2          -               0               -    consumer-2-51e593e8-549b-47b7-a6d0-0f543ffaa306  /127.0.0.1  consumer-2
ATLAS_HOOK  3          -               0               -    consumer-2-51e593e8-549b-47b7-a6d0-0f543ffaa306  /127.0.0.1  consumer-2
ATLAS_HOOK  0          -               1               -    consumer-1-542ae27b-98be-4ee6-a206-5445950b94ff  /127.0.0.1  consumer-1
ATLAS_HOOK  1          -               0               -    consumer-1-542ae27b-98be-4ee6-a206-5445950b94ff  /127.0.0.1  consumer-1



[jira] [Created] (ATLAS-3305) Unable to scale atlas kafka consumers

2019-06-25 Thread Adam Rempter (JIRA)
Adam Rempter created ATLAS-3305:
---

 Summary: Unable to scale atlas kafka consumers
 Key: ATLAS-3305
 URL: https://issues.apache.org/jira/browse/ATLAS-3305
 Project: Atlas
  Issue Type: Bug
  Components:  atlas-core, atlas-intg
Affects Versions: 2.0.0, 1.1.0
Reporter: Adam Rempter


We wanted to scale the Kafka consumers for Atlas, because we receive many 
lineage messages and processing them with just one consumer is not enough.

The parameter atlas.notification.hook.numthreads is meant to scale the 
consumers in NotificationHookConsumer. However, the method

notificationInterface.createConsumers(NotificationType.HOOK, numThreads)

always returns a one-element list, which effectively always starts a single 
consumer:

List<NotificationConsumer<T>> consumers = 
Collections.singletonList(kafkaConsumer);

The log incorrectly reports that the requested number of consumers has been 
created:

LOG.info("<== KafkaNotification.createConsumers(notificationType={}, 
numConsumers={}, autoCommitEnabled={})", notificationType, numConsumers, 
autoCommitEnabled)
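
The difference between the reported behaviour and what the parameter suggests can be shown with a small stand-alone sketch. It does not use the Atlas classes and is not the attached patch: SketchConsumer and both createConsumers* methods are hypothetical stand-ins, used only to contrast a singleton list with a list sized by the requested count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; not the Atlas NotificationConsumer or KafkaNotification types.
class CreateConsumersSketch {

    static final class SketchConsumer {
        final int id;

        SketchConsumer(int id) {
            this.id = id;
        }
    }

    // Behaviour described in the report: numConsumers is ignored, one consumer is returned.
    static List<SketchConsumer> createConsumersCurrent(int numConsumers) {
        return Collections.singletonList(new SketchConsumer(0));
    }

    // One possible alternative: create as many consumers as requested (at least one).
    static List<SketchConsumer> createConsumersScaled(int numConsumers) {
        List<SketchConsumer> consumers = new ArrayList<>();
        for (int i = 0; i < Math.max(1, numConsumers); i++) {
            consumers.add(new SketchConsumer(i));
        }
        return consumers;
    }

    public static void main(String[] args) {
        System.out.println("current: " + createConsumersCurrent(4).size());
        System.out.println("scaled:  " + createConsumersScaled(4).size());
    }
}
```

With a real Kafka topic, extra consumers in the same group only help when the topic has at least as many partitions as consumers.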





[jira] [Commented] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages

2019-06-05 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856722#comment-16856722
 ] 

Adam Rempter commented on ATLAS-3261:
-

I did some checking on this issue. When a new message is received, the 
security check is done by AtlasAuthorizationUtils before the entity is created:

public static boolean isAccessAllowed(AtlasAdminAccessRequest request) {
    boolean ret = false;
    String userName = getCurrentUserName(); // <-- returns ""

    if (StringUtils.isNotEmpty(userName)) { // <-- false
        try {
            AtlasAuthorizer authorizer = AtlasAuthorizerFactory.getAtlasAuthorizer();

            request.setUser(userName, getCurrentUserGroups());
            request.setClientIPAddress(RequestContext.get().getClientIPAddress());
            ret = authorizer.isAccessAllowed(request);
        } catch (AtlasAuthorizationException e) {
            LOG.error("Unable to obtain AtlasAuthorizer", e);
        }
    } else { // <-- if there is no user, return true
        ret = true;
    }

    return ret;
}

I then set the user from the Kafka message as the current security principal 
(see patch), and the authorizer is now called properly:

org.apache.atlas.exception.AtlasBaseException: testuser is not authorized to 
perform create entity: type=server
    at org.apache.atlas.authorize.AtlasAuthorizationUtils.verifyAccess(AtlasAuthorizationUtils.java:61)
    at org.apache.atlas.repository.store.graph.v2.AtlasEntityStoreV2.createOrUpdate(AtlasEntityStoreV2.java:664)

But I am not sure if this is the best way to handle this case...
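
The control flow quoted above can be reduced to a stand-alone sketch. It does not use the Atlas or Ranger classes: the Predicate stands in for the authorizer and HookAuthorizationSketch is a hypothetical name. It only demonstrates why an empty current user bypasses the check, and why setting the user from the message makes the policy apply.

```java
import java.util.function.Predicate;

// Hypothetical reduction of the quoted logic; not the Atlas implementation.
class HookAuthorizationSketch {

    // An empty or missing user name skips the authorizer and allows access.
    static boolean isAccessAllowed(String currentUser, Predicate<String> authorizer) {
        if (currentUser != null && !currentUser.isEmpty()) {
            return authorizer.test(currentUser);
        }
        return true; // no user in the request context: access is allowed
    }

    public static void main(String[] args) {
        // Stand-in policy that denies "testuser" and allows everyone else.
        Predicate<String> denyTestuser = user -> !"testuser".equals(user);

        // No user set on the consumer thread: the policy is never consulted.
        System.out.println(isAccessAllowed("", denyTestuser));
        // User taken from the message: the policy is applied and access is denied.
        System.out.println(isAccessAllowed("testuser", denyTestuser));
    }
}
```

The first call prints true and the second prints false, which matches the behaviour before and after the user from the message is set as the principal.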



[jira] [Updated] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages

2019-06-05 Thread Adam Rempter (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Rempter updated ATLAS-3261:

Attachment: 0001-set-message-user-as-principal.patch



[jira] [Updated] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-06-04 Thread Adam Rempter (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Rempter updated ATLAS-3254:

Issue Type: Bug  (was: Improvement)



[jira] [Created] (ATLAS-3261) Ranger Authorizer for Atlas is not checked for kafka messages

2019-06-04 Thread Adam Rempter (JIRA)
Adam Rempter created ATLAS-3261:
---

 Summary: Ranger Authorizer for Atlas is not checked for kafka 
messages
 Key: ATLAS-3261
 URL: https://issues.apache.org/jira/browse/ATLAS-3261
 Project: Atlas
  Issue Type: Bug
  Components: atlas-intg
Affects Versions: 2.0.0, 1.1.0
Reporter: Adam Rempter


Atlas can be configured to authorize user actions with Ranger 
([https://atlas.apache.org/1.1.0/Atlas-Authorization-Ranger-Authorizer.html]).

When I access Atlas as a user via REST, authorization works:

curl -X GET -u testuser:testuser 
http://localhost:21000/api/atlas/v2/entity/guid/f52151a0-fa08-4eab-b885-ece847a106e0
{"errorCode":"ATLAS-403-00-001","errorMessage":"testuser is not authorized to 
perform read entity: guid=f52151a0-fa08-4eab-b885-ece847a106e0"}

When I send lineage to ATLAS_HOOK, I can create lineage successfully:

2019-06-04 14:01:38,974 
2019-06-04T12:01:23.867Z|testuser|NotificationHookConsumer|POST|api/atlas/v2/entity/|200|15119

In the above, I think the user is taken from the "user" field of the lineage 
message JSON.

Of course, the above is only possible if another Ranger policy (Kafka plugin) 
allows putting messages on the ATLAS_HOOK topic.

But if I have one user (a technical account) producing to Kafka and I want to 
deny access in Atlas based on the user in the message, the Atlas Ranger 
authorizer doesn't work.





[jira] [Commented] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-05-31 Thread Adam Rempter (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852840#comment-16852840
 ] 

Adam Rempter commented on ATLAS-3254:
-

So far Atlas has created a pseudo directory whose JSON representation is 27 MB!

Maybe there should be some limit on how large an entity can grow?
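
A rough way to see how quickly such an entity crosses a limit is sketched below in Python. The 1048576 value is the maxSize from the audit error quoted in this issue; the entity layout, the shape of each reference and the helper names are assumptions made for illustration, not the Atlas data model.

```python
import json

# maxSize reported by the audit log error quoted in this issue.
MAX_AUDIT_SIZE = 1048576


def entity_json_size(entity: dict) -> int:
    """Size in bytes of the entity serialized as UTF-8 JSON."""
    return len(json.dumps(entity).encode("utf-8"))


def exceeds_limit(entity: dict, limit: int = MAX_AUDIT_SIZE) -> bool:
    """True when the serialized entity is larger than the given limit."""
    return entity_json_size(entity) > limit


def make_pseudo_dir(num_objects: int) -> dict:
    """Build a hypothetical pseudo directory holding num_objects references."""
    refs = [
        {"guid": f"{i:036d}", "typeName": "aws_s3_object"}  # assumed ref shape
        for i in range(num_objects)
    ]
    return {"typeName": "aws_s3_pseudo_dir", "attributes": {"s3Objects": refs}}
```

In this sketch each reference costs under 100 bytes, so a directory with 20000 objects already exceeds the 1 MB audit limit, while one with 10 objects is far below it.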

> Atlas entity with large array of refs causes performance issues for lineage
> ---
>
> Key: ATLAS-3254
> URL: https://issues.apache.org/jira/browse/ATLAS-3254
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core, atlas-webui
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Adam Rempter
>Priority: Major
>  Labels: performance
>
> We use “aws_s3_pseudo_dir” type from 3020-aws_s3_typedefs.json model.
> It has the following property: 
> "name":    "s3Objects",
> "typeName":    "array"
>  
> AWS buckets can hold thousands of objects, so the s3Objects array grows 
> quickly and the aws_s3_pseudo_dir entity JSON easily reaches a few MB.
>  
> Then we start seeing problems like:
>  * The UI dies when displaying entity properties or lineage
>  * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir, 
> guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576. 
> entity attribute values not stored in audit (EntityAuditListenerV2:234)
>  * Errors when writing to HBase (java.lang.IllegalArgumentException: 
> KeyValue size too large; as a workaround we set hbase.client.keyvalue.maxsize 
> to 0)
>  * Kafka consumer errors (we can of course set some parameters on the consumer, 
> but I think that is just a workaround)
> …
> Exception in NotificationHookConsumer (NotificationHookConsumer:332)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with 
> max.poll.records.
> …
> Specifying pseudo_dir is required for s3objects:
> "name": "pseudoDirectory",
> "typeName": "aws_s3_pseudo_dir",
> "cardinality": "SINGLE",
> "isIndexable": false,
> *"isOptional": false,*
> "isUnique": false,
>  





[jira] [Created] (ATLAS-3254) Atlas entity with large array of refs causes performance issues for lineage

2019-05-31 Thread Adam Rempter (JIRA)
Adam Rempter created ATLAS-3254:
---

 Summary: Atlas entity with large array of refs causes performance 
issues for lineage
 Key: ATLAS-3254
 URL: https://issues.apache.org/jira/browse/ATLAS-3254
 Project: Atlas
  Issue Type: Improvement
  Components:  atlas-core, atlas-webui
Affects Versions: 2.0.0, 1.0.0
Reporter: Adam Rempter


We use “aws_s3_pseudo_dir” type from 3020-aws_s3_typedefs.json model.

It has the following property: 

"name":    "s3Objects",

"typeName":    "array"

 

AWS buckets can hold thousands of objects, so the s3Objects array grows 
quickly and the aws_s3_pseudo_dir entity JSON easily reaches a few MB.

 

Then we start seeing problems like:
 * The UI dies when displaying entity properties or lineage
 * Error in logs: audit record too long: entityType=aws_s3_pseudo_dir, 
guid=24398271-6ba0-4db5-adfa-38e432dc55ce, size=1053931; maxSize=1048576. 
entity attribute values not stored in audit (EntityAuditListenerV2:234)
 * Errors when writing to HBase (java.lang.IllegalArgumentException: 
KeyValue size too large; as a workaround we set hbase.client.keyvalue.maxsize 
to 0)
 * Kafka consumer errors (we can of course set some parameters on the consumer, 
but I think that is just a workaround)

…

Exception in NotificationHookConsumer (NotificationHookConsumer:332)

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address this 
either by increasing the session timeout or by reducing the maximum size of 
batches returned in poll() with 
max.poll.records.
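
The relation between the two settings named in that error can be sketched as simple arithmetic, here in Python. The defaults used (300000 ms for max.poll.interval.ms and 500 for max.poll.records) are the Kafka consumer defaults; the per-record processing time, the safety factor and the helper names are assumptions for illustration only.

```python
import math

# Kafka consumer defaults for the two settings named in the exception.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000
DEFAULT_MAX_POLL_RECORDS = 500


def max_safe_poll_records(per_record_ms: float,
                          max_poll_interval_ms: int = DEFAULT_MAX_POLL_INTERVAL_MS,
                          safety: float = 0.5) -> int:
    """Largest batch that finishes within a fraction of the poll interval."""
    budget_ms = max_poll_interval_ms * safety
    return max(1, math.floor(budget_ms / per_record_ms))


def consumer_overrides(per_record_ms: float) -> dict:
    """Suggest a max.poll.records value, never above the Kafka default."""
    records = min(DEFAULT_MAX_POLL_RECORDS, max_safe_poll_records(per_record_ms))
    return {"max.poll.records": records}
```

For example, if one large entity takes about 2 seconds to process, this gives a batch of 75 records instead of the default 500. As noted above, this only works around the problem; it does not stop the entity from growing.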

…

Specifying pseudo_dir is required for s3objects:

"name": "pseudoDirectory",
"typeName": "aws_s3_pseudo_dir",
"cardinality": "SINGLE",
"isIndexable": false,
*"isOptional": false,*
"isUnique": false,

 


