[jira] [Commented] (ATLAS-4263) KafkaUtils sets invalid dynamic JAAS config

2021-04-27 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333462#comment-17333462
 ] 

Vladislav Glinskiy commented on ATLAS-4263:
---

cc [~jayendrap], [~nixon]

> KafkaUtils sets invalid dynamic JAAS config
> ---
>
> Key: ATLAS-4263
> URL: https://issues.apache.org/jira/browse/ATLAS-4263
> Project: Atlas
>  Issue Type: Task
>  Components:  atlas-core
>Affects Versions: 2.1.0, 3.0.0
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195]
>  doesn't always 
> [enclose|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L316]
>  property values in double quotes, and in some cases this produces an invalid 
> dynamic JAAS config for token auth.
> I ran into this issue with the Spark Atlas Connector while configuring the 
> Atlas client to use delegation tokens. The following configuration is not 
> handled properly:
> {code:java}
> atlas.jaas.KafkaClient.option.username=30CQ4q1hQMy0dB6X0eXfxQ
> atlas.jaas.KafkaClient.option.password=KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==
> {code}
> and results in the following error:
> {code:java}
> java.lang.IllegalArgumentException: Value not specified for key 'null' in 
> JAAS config
>   at 
> org.apache.kafka.common.security.JaasConfig.parseAppConfigurationEntry(JaasConfig.java:116)
>   at 
> org.apache.kafka.common.security.JaasConfig.<init>(JaasConfig.java:63)
>   at 
> org.apache.kafka.common.security.JaasContext.load(JaasContext.java:90)
>   at 
> org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
> {code}
> [KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195]
>  should always enclose property values in double quotes, since unquoted 
> digits and the '+' sign cannot be parsed by Kafka's 
> [JaasConfig|https://github.com/apache/kafka/blob/2.0.0/clients/src/main/java/org/apache/kafka/common/security/JaasConfig.java#L116].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ATLAS-4263) KafkaUtils sets invalid dynamic JAAS config

2021-04-27 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-4263:
-

 Summary: KafkaUtils sets invalid dynamic JAAS config
 Key: ATLAS-4263
 URL: https://issues.apache.org/jira/browse/ATLAS-4263
 Project: Atlas
  Issue Type: Task
  Components:  atlas-core
Affects Versions: 2.1.0, 3.0.0
Reporter: Vladislav Glinskiy


[KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195]
 doesn't always 
[enclose|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L316]
 property values in double quotes, and in some cases this produces an invalid 
dynamic JAAS config for token auth.

I ran into this issue with the Spark Atlas Connector while configuring the Atlas 
client to use delegation tokens. The following configuration is not handled 
properly:
{code:java}
atlas.jaas.KafkaClient.option.username=30CQ4q1hQMy0dB6X0eXfxQ
atlas.jaas.KafkaClient.option.password=KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==
{code}
and results in the following error:
{code:java}
java.lang.IllegalArgumentException: Value not specified for key 'null' in JAAS 
config
at 
org.apache.kafka.common.security.JaasConfig.parseAppConfigurationEntry(JaasConfig.java:116)
at 
org.apache.kafka.common.security.JaasConfig.<init>(JaasConfig.java:63)
at 
org.apache.kafka.common.security.JaasContext.load(JaasContext.java:90)
at 
org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
{code}
[KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195]
 should always enclose property values in double quotes, since unquoted 
digits and the '+' sign cannot be parsed by Kafka's 
[JaasConfig|https://github.com/apache/kafka/blob/2.0.0/clients/src/main/java/org/apache/kafka/common/security/JaasConfig.java#L116].
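As an illustration of the proposed fix, here is a minimal standalone sketch that always wraps option values in double quotes when assembling a dynamic JAAS config line. The class and method names are illustrative only, not the actual KafkaUtils code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch, not Atlas code: build a dynamic JAAS config line,
// always double-quoting option values so tokens containing '+', '=' or
// digits parse as a single value in Kafka's JaasConfig.
public class JaasConfigSketch {

    // Wrap the value in double quotes, escaping backslashes and quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String buildJaasConfig(String loginModule, Map<String, String> options) {
        StringBuilder sb = new StringBuilder(loginModule).append(" required");
        for (Map.Entry<String, String> e : options.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(quote(e.getValue()));
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("username", "30CQ4q1hQMy0dB6X0eXfxQ");
        options.put("password", "KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==");
        // With quoting, the '+' and trailing '==' inside the password no
        // longer confuse the key=value tokenizer.
        System.out.println(buildJaasConfig(
                "org.apache.kafka.common.security.scram.ScramLoginModule", options));
    }
}
```

Without the quotes, JaasConfig tokenizes the password at the '+' and '=' characters and fails with the "Value not specified for key" error shown above.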





[jira] [Updated] (ATLAS-3665) Add 'queryText' attribute to the 'spark_process' type

2020-03-16 Thread Vladislav Glinskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Glinskiy updated ATLAS-3665:
--
Description: Add 'queryText' attribute to the 'spark_process' type in order 
to make `spark_process` more readable by the user. The `queryText` attribute 
stores the exact SQL query that is executed within the Spark session as a Spark 
process.  (was: Add 'recentQueries' attribute to the 'spark_process' type in 
order to make `spark_process` more readable by the user. The `recentQueries` 
attribute stores exact SQL queries that are executed within the Spark session.)

> Add 'queryText' attribute to the 'spark_process' type
> -
>
> Key: ATLAS-3665
> URL: https://issues.apache.org/jira/browse/ATLAS-3665
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add 'queryText' attribute to the 'spark_process' type in order to make 
> `spark_process` more readable by the user. The `queryText` attribute stores 
> the exact SQL query that is executed within the Spark session as a Spark 
> process.





[jira] [Updated] (ATLAS-3665) Add 'queryText' attribute to the 'spark_process' type

2020-03-16 Thread Vladislav Glinskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Glinskiy updated ATLAS-3665:
--
Summary: Add 'queryText' attribute to the 'spark_process' type  (was: Add 
'recentQueries' attribute to the 'spark_process' type)

> Add 'queryText' attribute to the 'spark_process' type
> -
>
> Key: ATLAS-3665
> URL: https://issues.apache.org/jira/browse/ATLAS-3665
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add 'recentQueries' attribute to the 'spark_process' type in order to make 
> `spark_process` more readable by the user. The `recentQueries` attribute 
> stores the exact SQL queries that are executed within the Spark session.





[jira] [Commented] (ATLAS-3665) Add 'recentQueries' attribute to the 'spark_process' type

2020-03-13 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058805#comment-17058805
 ] 

Vladislav Glinskiy commented on ATLAS-3665:
---

cc [~kabhwan] [~sarath] 

> Add 'recentQueries' attribute to the 'spark_process' type
> -
>
> Key: ATLAS-3665
> URL: https://issues.apache.org/jira/browse/ATLAS-3665
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add 'recentQueries' attribute to the 'spark_process' type in order to make 
> `spark_process` more readable by the user. The `recentQueries` attribute 
> stores the exact SQL queries that are executed within the Spark session.





[jira] [Created] (ATLAS-3665) Add 'recentQueries' attribute to the 'spark_process' type

2020-03-13 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-3665:
-

 Summary: Add 'recentQueries' attribute to the 'spark_process' type
 Key: ATLAS-3665
 URL: https://issues.apache.org/jira/browse/ATLAS-3665
 Project: Atlas
  Issue Type: Task
Reporter: Vladislav Glinskiy
 Fix For: 2.1.0, 3.0.0


Add 'recentQueries' attribute to the 'spark_process' type in order to make 
`spark_process` more readable by the user. The `recentQueries` attribute stores 
the exact SQL queries that are executed within the Spark session.





[jira] [Commented] (ATLAS-3661) Create 'spark_column_lineage' type and relationship definition to add support of column level lineage

2020-03-11 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057421#comment-17057421
 ] 

Vladislav Glinskiy commented on ATLAS-3661:
---

cc [~kabhwan] [~sarath] 

> Create 'spark_column_lineage' type and relationship definition to add support 
> of column level lineage
> -
>
> Key: ATLAS-3661
> URL: https://issues.apache.org/jira/browse/ATLAS-3661
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create 'spark_column_lineage' type and corresponding 
> 'spark_process_column_lineage' relationship definition to add support for 
> column-level lineage.
> Column-level lineage refers to lineage created between the input and output 
> columns.
>  For example:
> {code:java}
> hive > create table employee_ctas as select id from employee;{code}
>  For the above query, lineage is created from 'employee' to 'employee_ctas',
>  and also from 'employee.id' to 'employee_ctas.id'.





[jira] [Created] (ATLAS-3661) Create 'spark_column_lineage' type and relationship definition to add support of column level lineage

2020-03-11 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-3661:
-

 Summary: Create 'spark_column_lineage' type and relationship 
definition to add support of column level lineage
 Key: ATLAS-3661
 URL: https://issues.apache.org/jira/browse/ATLAS-3661
 Project: Atlas
  Issue Type: Task
Reporter: Vladislav Glinskiy
 Fix For: 2.1.0, 3.0.0


Create 'spark_column_lineage' type and corresponding 
'spark_process_column_lineage' relationship definition to add support for 
column-level lineage.

Column-level lineage refers to lineage created between the input and output 
columns.
 For example:
{code:java}
hive > create table employee_ctas as select id from employee;{code}

 For the above query, lineage is created from 'employee' to 'employee_ctas',
 and also from 'employee.id' to 'employee_ctas.id'.





[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

2020-03-05 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052471#comment-17052471
 ] 

Vladislav Glinskiy commented on ATLAS-3655:
---

cc [~kabhwan] [~sarath] 

> Create 'spark_application' type to avoid 'spark_process' from being updated 
> for multiple operations
> ---
>
> Key: ATLAS-3655
> URL: https://issues.apache.org/jira/browse/ATLAS-3655
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
> Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create 'spark_application' type to prevent 'spark_process' from being updated 
> across multiple operations. Currently, the Spark Atlas Connector uses 
> 'spark_process' as a top-level type for a Spark session, so it gets updated 
> for multiple operations within the same session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table1 --> spark_process1 ---> table2
> but executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)"); 
> spark.sql("create table table_4 as select * from table_3");
> {code}
> result in the same 'spark_process' being updated, so the lineage now connects 
> all four tables (see the screenshot in the attachments).
>  
> The proposal is to create a 'spark_application' entity and associate all 
> 'spark_process' entities (created within that session) to it.





[jira] [Created] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

2020-03-05 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-3655:
-

 Summary: Create 'spark_application' type to avoid 'spark_process' 
from being updated for multiple operations
 Key: ATLAS-3655
 URL: https://issues.apache.org/jira/browse/ATLAS-3655
 Project: Atlas
  Issue Type: Task
Reporter: Vladislav Glinskiy
 Fix For: 2.1.0, 3.0.0
 Attachments: Screenshot from 2020-03-03 16-09-39.png

Create 'spark_application' type to prevent 'spark_process' from being updated 
across multiple operations. Currently, the Spark Atlas Connector uses 
'spark_process' as a top-level type for a Spark session, so it gets updated for 
multiple operations within the same session.

The following statements:
{code:java}
spark.sql("create table table_1(col1 int,col2 string)");
spark.sql("create table table_2 as select * from table_1");
{code}
result in the following correct lineage:

table1 --> spark_process1 ---> table2

but executing similar statements in the same Spark session:
{code:java}
spark.sql("create table table_3(col1 int,col2 string)"); 
spark.sql("create table table_4 as select * from table_3");
{code}
result in the same 'spark_process' being updated, so the lineage now connects 
all four tables (see the screenshot in the attachments).

 

The proposal is to create a 'spark_application' entity and associate all 
'spark_process' entities (created within that session) to it.





[jira] [Resolved] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions

2020-03-02 Thread Vladislav Glinskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Glinskiy resolved ATLAS-3640.
---
Resolution: Invalid

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions
> --
>
> Key: ATLAS-3640
> URL: https://issues.apache.org/jira/browse/ATLAS-3640
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions to use the 'DataSet' type instead of its child type 
> 'spark_ml_directory'. This is required in order to integrate the Spark Atlas 
> Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
> ML model directory, but this was changed in the scope of 
> [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
> [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
> ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to 
> create the relationship leads to: 
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
> spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
> spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.





[jira] [Commented] (ATLAS-3646) Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions

2020-03-02 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049118#comment-17049118
 ] 

Vladislav Glinskiy commented on ATLAS-3646:
---

cc [~kabhwan] [~sarath] 

> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' 
> relationship definitions
> 
>
> Key: ATLAS-3646
> URL: https://issues.apache.org/jira/browse/ATLAS-3646
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' 
> relationship definitions. This is required in order to integrate the Spark 
> Atlas Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
> ML model directory, together with the 'spark_ml_model_ml_directory' and 
> 'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 
> 'spark_ml_directory' was reverted in the scope of 
> [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
> [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
> ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path', 'fs_path').
> Thus, new relationship definitions must be created, since there is no 
> straightforward way to update the existing ones to use the 'DataSet' type 
> instead of its child type 'spark_ml_directory'.
> See:
>  * ATLAS-3640
>  * [https://github.com/apache/atlas/pull/88#issuecomment-592699723]





[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions

2020-03-02 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049117#comment-17049117
 ] 

Vladislav Glinskiy commented on ATLAS-3640:
---

Closing this Jira since there is no straightforward way to update the 
`spark_ml_model_ml_directory` and `spark_ml_pipeline_ml_directory` relationship 
definitions to use the `DataSet` type instead of its child type 
`spark_ml_directory`.

Filed a new Jira to create new relationship definitions: 
- https://issues.apache.org/jira/browse/ATLAS-3646
- [https://github.com/apache/atlas/pull/89]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions
> --
>
> Key: ATLAS-3640
> URL: https://issues.apache.org/jira/browse/ATLAS-3640
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions to use the 'DataSet' type instead of its child type 
> 'spark_ml_directory'. This is required in order to integrate the Spark Atlas 
> Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
> ML model directory, but this was changed in the scope of 
> [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
> [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
> ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to 
> create the relationship leads to: 
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
> spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
> spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.





[jira] [Created] (ATLAS-3646) Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions

2020-03-02 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-3646:
-

 Summary: Create new 'spark_ml_model_dataset' and 
'spark_ml_pipeline_dataset' relationship definitions
 Key: ATLAS-3646
 URL: https://issues.apache.org/jira/browse/ATLAS-3646
 Project: Atlas
  Issue Type: Task
Reporter: Vladislav Glinskiy
 Fix For: 2.1.0, 3.0.0


Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' 
relationship definitions. This is required in order to integrate the Spark 
Atlas Connector's ML event processor.

Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
ML model directory, together with the 'spark_ml_model_ml_directory' and 
'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 
'spark_ml_directory' was reverted in the scope of 
[https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
[https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path', 'fs_path').

Thus, new relationship definitions must be created, since there is no 
straightforward way to update the existing ones to use the 'DataSet' type 
instead of its child type 'spark_ml_directory'.

See:
 * ATLAS-3640
 * [https://github.com/apache/atlas/pull/88#issuecomment-592699723]
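For illustration, a relationship definition of this shape can be sketched in Atlas typedef JSON roughly as follows. The relationship category, end names, and cardinalities below are assumptions based on the description above, not the actual definition from the patch:

```json
{
  "relationshipDefs": [
    {
      "name": "spark_ml_model_dataset",
      "typeVersion": "1.0",
      "relationshipCategory": "ASSOCIATION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "spark_ml_model",
        "name": "directory",
        "cardinality": "SINGLE"
      },
      "endDef2": {
        "type": "DataSet",
        "name": "model",
        "cardinality": "SET"
      }
    }
  ]
}
```

The key point is that endDef2 references the 'DataSet' supertype, so any of its subtypes ('hdfs_path', 'fs_path', etc.) can appear at that end of the relationship.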





[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions

2020-02-27 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046551#comment-17046551
 ] 

Vladislav Glinskiy commented on ATLAS-3640:
---

cc [~sarath]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions
> --
>
> Key: ATLAS-3640
> URL: https://issues.apache.org/jira/browse/ATLAS-3640
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions to use the 'DataSet' type instead of its child type 
> 'spark_ml_directory'. This is required in order to integrate the Spark Atlas 
> Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
> ML model directory, but this was changed in the scope of 
> [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
> [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
> ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to 
> create the relationship leads to: 
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
> spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
> spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.





[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions

2020-02-27 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046533#comment-17046533
 ] 

Vladislav Glinskiy commented on ATLAS-3640:
---

cc [~kabhwan]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions
> --
>
> Key: ATLAS-3640
> URL: https://issues.apache.org/jira/browse/ATLAS-3640
> Project: Atlas
>  Issue Type: Task
>Reporter: Vladislav Glinskiy
>Priority: Major
> Fix For: 2.1.0, 3.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
> relationship definitions to use the 'DataSet' type instead of its child type 
> 'spark_ml_directory'. This is required in order to integrate the Spark Atlas 
> Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
> ML model directory, but this was changed in the scope of 
> [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
> [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
> ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to 
> create the relationship leads to: 
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
> spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
> spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.





[jira] [Created] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions

2020-02-27 Thread Vladislav Glinskiy (Jira)
Vladislav Glinskiy created ATLAS-3640:
-

 Summary: Update 'spark_ml_model_ml_directory' and 
'spark_ml_pipeline_ml_directory' relationship definitions
 Key: ATLAS-3640
 URL: https://issues.apache.org/jira/browse/ATLAS-3640
 Project: Atlas
  Issue Type: Task
Reporter: Vladislav Glinskiy
 Fix For: 2.1.0, 3.0.0


Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' 
relationship definitions to use the 'DataSet' type instead of its child type 
'spark_ml_directory'. This is required in order to integrate the Spark Atlas 
Connector's ML event processor.

Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the 
ML model directory, but this was changed in the scope of 
[https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and 
[https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the 
ML model directory is now a 'DataSet' entity (e.g. 'hdfs_path').

Thus, the relationship definitions must be updated; otherwise, an attempt to 
create the relationship leads to: 
{code:java}
org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: 
spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: 
spark_ml_model
{code}
since 'COMPOSITION' requires 'spark_ml_directory' to be set.


