[jira] [Commented] (ATLAS-4263) KafkaUtils sets invalid dynamic JAAS config
[ https://issues.apache.org/jira/browse/ATLAS-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333462#comment-17333462 ]

Vladislav Glinskiy commented on ATLAS-4263:
-------------------------------------------

cc [~jayendrap], [~nixon]

> KafkaUtils sets invalid dynamic JAAS config
> -------------------------------------------
>
>                 Key: ATLAS-4263
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4263
>             Project: Atlas
>          Issue Type: Task
>          Components: atlas-core
>    Affects Versions: 2.1.0, 3.0.0
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195] does not always [enclose|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L316] property values in double quotes, and therefore sets an invalid dynamic JAAS config for token authentication in some cases.
> I faced this issue with the Spark Atlas Connector while configuring the Atlas client to use delegation tokens.
> The following configuration is not handled properly:
> {code:java}
> atlas.jaas.KafkaClient.option.username=30CQ4q1hQMy0dB6X0eXfxQ
> atlas.jaas.KafkaClient.option.password=KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==
> {code}
> and results in the following error:
> {code:java}
> java.lang.IllegalArgumentException: Value not specified for key 'null' in JAAS config
>     at org.apache.kafka.common.security.JaasConfig.parseAppConfigurationEntry(JaasConfig.java:116)
>     at org.apache.kafka.common.security.JaasConfig.<init>(JaasConfig.java:63)
>     at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:90)
>     at org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
> {code}
> [KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195] should always enclose property values in double quotes, since unenclosed digits and the '+' sign cannot be parsed by Kafka's [JaasConfig|https://github.com/apache/kafka/blob/2.0.0/clients/src/main/java/org/apache/kafka/common/security/JaasConfig.java#L116].

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
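The quoting fix described in the issue can be sketched as a small helper that always wraps option values in double quotes before assembling the dynamic JAAS entry. This is an illustrative sketch only; the class and method names below are hypothetical and are not the actual KafkaUtils code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JaasEntryBuilder {
    // Always enclose the value in double quotes, escaping backslashes and
    // embedded quotes, so tokens containing digits or '+' stay parseable.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a single JAAS configuration entry of the form:
    //   LoginModule required key="value" key2="value2";
    static String buildEntry(String loginModule, Map<String, String> options) {
        StringBuilder sb = new StringBuilder(loginModule).append(" required");
        for (Map.Entry<String, String> e : options.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(quote(e.getValue()));
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("username", "30CQ4q1hQMy0dB6X0eXfxQ");
        options.put("password", "KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==");
        System.out.println(buildEntry("org.apache.kafka.common.security.scram.ScramLoginModule", options));
    }
}
```

With unconditional quoting, the trailing '==' and embedded '+' in the delegation-token password no longer break Kafka's JaasConfig parser.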
[jira] [Created] (ATLAS-4263) KafkaUtils sets invalid dynamic JAAS config
Vladislav Glinskiy created ATLAS-4263:
------------------------------------------

             Summary: KafkaUtils sets invalid dynamic JAAS config
                 Key: ATLAS-4263
                 URL: https://issues.apache.org/jira/browse/ATLAS-4263
             Project: Atlas
          Issue Type: Task
          Components: atlas-core
    Affects Versions: 2.1.0, 3.0.0
            Reporter: Vladislav Glinskiy

[KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195] does not always [enclose|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L316] property values in double quotes, and therefore sets an invalid dynamic JAAS config for token authentication in some cases.

I faced this issue with the Spark Atlas Connector while configuring the Atlas client to use delegation tokens. The following configuration is not handled properly:

{code:java}
atlas.jaas.KafkaClient.option.username=30CQ4q1hQMy0dB6X0eXfxQ
atlas.jaas.KafkaClient.option.password=KdaUQ4FlKWlDxwQrAeFGUVbb6sR0P+zoqOZDZjtIRP1wseXbSbhiTjz3QI9Ur9o4LTYZSv8TE1QqUC4FSwnoTA==
{code}

and results in the following error:

{code:java}
java.lang.IllegalArgumentException: Value not specified for key 'null' in JAAS config
    at org.apache.kafka.common.security.JaasConfig.parseAppConfigurationEntry(JaasConfig.java:116)
    at org.apache.kafka.common.security.JaasConfig.<init>(JaasConfig.java:63)
    at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:90)
    at org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
{code}

[KafkaUtils|https://github.com/apache/atlas/blob/8d3c4ab0e8844f04e29f66acb3577e9d40de9a16/common/src/main/java/org/apache/atlas/utils/KafkaUtils.java#L195] should always enclose property values in double quotes, since unenclosed digits and the '+' sign cannot be parsed by Kafka's [JaasConfig|https://github.com/apache/kafka/blob/2.0.0/clients/src/main/java/org/apache/kafka/common/security/JaasConfig.java#L116].
[jira] [Updated] (ATLAS-3665) Add 'queryText' attribute to the 'spark_process' type
[ https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Glinskiy updated ATLAS-3665:
--------------------------------------
    Description:
Add a 'queryText' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `queryText` attribute stores the exact SQL query that is executed within the Spark session as a Spark process.

  (was: Add a 'recentQueries' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `recentQueries` attribute stores the exact SQL queries that are executed within the Spark session.)

> Add 'queryText' attribute to the 'spark_process' type
> ------------------------------------------------------
>
>                 Key: ATLAS-3665
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3665
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a 'queryText' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `queryText` attribute stores the exact SQL query that is executed within the Spark session as a Spark process.
[jira] [Updated] (ATLAS-3665) Add 'queryText' attribute to the 'spark_process' type
[ https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Glinskiy updated ATLAS-3665:
--------------------------------------
    Summary: Add 'queryText' attribute to the 'spark_process' type  (was: Add 'recentQueries' attribute to the 'spark_process' type)

> Add 'queryText' attribute to the 'spark_process' type
> ------------------------------------------------------
>
>                 Key: ATLAS-3665
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3665
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a 'recentQueries' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `recentQueries` attribute stores the exact SQL queries that are executed within the Spark session.
[jira] [Commented] (ATLAS-3665) Add 'recentQueries' attribute to the 'spark_process' type
[ https://issues.apache.org/jira/browse/ATLAS-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058805#comment-17058805 ]

Vladislav Glinskiy commented on ATLAS-3665:
-------------------------------------------

cc [~kabhwan] [~sarath]

> Add 'recentQueries' attribute to the 'spark_process' type
> ----------------------------------------------------------
>
>                 Key: ATLAS-3665
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3665
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a 'recentQueries' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `recentQueries` attribute stores the exact SQL queries that are executed within the Spark session.
[jira] [Created] (ATLAS-3665) Add 'recentQueries' attribute to the 'spark_process' type
Vladislav Glinskiy created ATLAS-3665:
------------------------------------------

             Summary: Add 'recentQueries' attribute to the 'spark_process' type
                 Key: ATLAS-3665
                 URL: https://issues.apache.org/jira/browse/ATLAS-3665
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0

Add a 'recentQueries' attribute to the 'spark_process' type in order to make `spark_process` more readable to the user. The `recentQueries` attribute stores the exact SQL queries that are executed within the Spark session.
[jira] [Commented] (ATLAS-3661) Create 'spark_column_lineage' type and relationship definition to add support of column level lineage
[ https://issues.apache.org/jira/browse/ATLAS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057421#comment-17057421 ]

Vladislav Glinskiy commented on ATLAS-3661:
-------------------------------------------

cc [~kabhwan] [~sarath]

> Create 'spark_column_lineage' type and relationship definition to add support of column level lineage
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3661
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3661
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create a 'spark_column_lineage' type and a corresponding 'spark_process_column_lineage' relationship definition to add support for column-level lineage.
> Column-level lineage refers to lineage created between the input and output columns. For example:
> {code:java}
> hive> create table employee_ctas as select id from employee;
> {code}
> For the above query, lineage is created from 'employee' to 'employee_ctas', and also from 'employee.id' to 'employee_ctas.id'.
[jira] [Created] (ATLAS-3661) Create 'spark_column_lineage' type and relationship definition to add support of column level lineage
Vladislav Glinskiy created ATLAS-3661:
------------------------------------------

             Summary: Create 'spark_column_lineage' type and relationship definition to add support of column level lineage
                 Key: ATLAS-3661
                 URL: https://issues.apache.org/jira/browse/ATLAS-3661
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0

Create a 'spark_column_lineage' type and a corresponding 'spark_process_column_lineage' relationship definition to add support for column-level lineage.

Column-level lineage refers to lineage created between the input and output columns. For example:

{code:java}
hive> create table employee_ctas as select id from employee;
{code}

For the above query, lineage is created from 'employee' to 'employee_ctas', and also from 'employee.id' to 'employee_ctas.id'.
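The column-level edges produced by a CTAS like the one above can be illustrated with a small sketch. This is illustrative only: the class and method names are hypothetical, and it assumes the simple case where each selected input column maps to a same-named output column.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnLineage {
    // For "create table <outTable> as select <columns> from <inTable>",
    // emit one lineage edge per column: inTable.col -> outTable.col.
    static List<String[]> ctasColumnLineage(String inTable, String outTable, List<String> columns) {
        List<String[]> edges = new ArrayList<>();
        for (String col : columns) {
            edges.add(new String[] { inTable + "." + col, outTable + "." + col });
        }
        return edges;
    }

    public static void main(String[] args) {
        // hive> create table employee_ctas as select id from employee;
        for (String[] edge : ctasColumnLineage("employee", "employee_ctas", List.of("id"))) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Each such edge would correspond to one 'spark_column_lineage' entity linked to the owning 'spark_process'.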
[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
[ https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052471#comment-17052471 ]

Vladislav Glinskiy commented on ATLAS-3655:
-------------------------------------------

cc [~kabhwan] [~sarath]

> Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3655
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>         Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create a 'spark_application' type to avoid 'spark_process' being updated for multiple operations. Currently, the Spark Atlas Connector uses 'spark_process' as the top-level type for a Spark session, so it is updated for multiple operations within the same session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table1 --> spark_process1 --> table2
> but executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)");
> spark.sql("create table table_4 as select * from table_3");
> {code}
> results in the same 'spark_process' being updated, and the lineage now connects all 4 tables (see the screenshot in the attachments).
> The proposal is to create a 'spark_application' entity and associate all 'spark_process' entities created within that session with it.
[jira] [Created] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
Vladislav Glinskiy created ATLAS-3655:
------------------------------------------

             Summary: Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
                 Key: ATLAS-3655
                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0
         Attachments: Screenshot from 2020-03-03 16-09-39.png

Create a 'spark_application' type to avoid 'spark_process' being updated for multiple operations. Currently, the Spark Atlas Connector uses 'spark_process' as the top-level type for a Spark session, so it is updated for multiple operations within the same session.

The following statements:

{code:java}
spark.sql("create table table_1(col1 int,col2 string)");
spark.sql("create table table_2 as select * from table_1");
{code}

result in the following correct lineage:

table1 --> spark_process1 --> table2

but executing similar statements in the same Spark session:

{code:java}
spark.sql("create table table_3(col1 int,col2 string)");
spark.sql("create table table_4 as select * from table_3");
{code}

results in the same 'spark_process' being updated, and the lineage now connects all 4 tables (see the screenshot in the attachments).

The proposal is to create a 'spark_application' entity and associate all 'spark_process' entities created within that session with it.
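The proposed structure can be sketched as a session-scoped registry where one 'spark_application' owns a separate 'spark_process' per executed query, instead of one process being mutated repeatedly. This is a hypothetical illustration of the data model, not Spark Atlas Connector code; all names below are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparkApplicationRegistry {
    // One entry per 'spark_application' (i.e. per Spark session); the value
    // lists the 'spark_process' entities created within that session.
    private final Map<String, List<String>> processesByApplication = new LinkedHashMap<>();

    // Each executed query registers a NEW process under its application,
    // so lineage from different queries is never merged into one process.
    void registerProcess(String applicationId, String processName) {
        processesByApplication.computeIfAbsent(applicationId, k -> new ArrayList<>()).add(processName);
    }

    List<String> processesOf(String applicationId) {
        return processesByApplication.getOrDefault(applicationId, List.of());
    }
}
```

Under this model, the two CTAS statements in the example each get their own process, keeping table_1/table_2 lineage disjoint from table_3/table_4.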
[jira] [Resolved] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Glinskiy resolved ATLAS-3640.
---------------------------------------
    Resolution: Invalid

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3640
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3640
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions to use the 'DataSet' type instead of its child type 'spark_ml_directory'. This is required in order to integrate the Spark Atlas Connector's ML event processor.
> Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, but this was changed in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to create the relationship leads to:
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.
[jira] [Commented] (ATLAS-3646) Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049118#comment-17049118 ]

Vladislav Glinskiy commented on ATLAS-3646:
-------------------------------------------

cc [~kabhwan] [~sarath]

> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions
> ---------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3646
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3646
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions. This is required in order to integrate the Spark Atlas Connector's ML event processor.
> Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, along with the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 'spark_ml_directory' was reverted in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path', 'fs_path').
> Thus, new relationship definitions must be created, since there is no straightforward way to update the existing ones to use the 'DataSet' type instead of its child type 'spark_ml_directory'.
> See:
> * ATLAS-3640
> * [https://github.com/apache/atlas/pull/88#issuecomment-592699723]
[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049117#comment-17049117 ]

Vladislav Glinskiy commented on ATLAS-3640:
-------------------------------------------

Closing this Jira, since there is no straightforward way to update the `spark_ml_model_ml_directory` and `spark_ml_pipeline_ml_directory` relationship definitions to use the `DataSet` type instead of its child type `spark_ml_directory`. Filed a new Jira to create new relationship definitions:
- https://issues.apache.org/jira/browse/ATLAS-3646
- [https://github.com/apache/atlas/pull/89]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3640
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3640
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions to use the 'DataSet' type instead of its child type 'spark_ml_directory'. This is required in order to integrate the Spark Atlas Connector's ML event processor.
> Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, but this was changed in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to create the relationship leads to:
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.
[jira] [Created] (ATLAS-3646) Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions
Vladislav Glinskiy created ATLAS-3646:
------------------------------------------

             Summary: Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions
                 Key: ATLAS-3646
                 URL: https://issues.apache.org/jira/browse/ATLAS-3646
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0

Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions. This is required in order to integrate the Spark Atlas Connector's ML event processor.

Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, along with the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 'spark_ml_directory' was reverted in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path', 'fs_path').

Thus, new relationship definitions must be created, since there is no straightforward way to update the existing ones to use the 'DataSet' type instead of its child type 'spark_ml_directory'.

See:
* ATLAS-3640
* [https://github.com/apache/atlas/pull/88#issuecomment-592699723]
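A relationship definition along these lines would associate a 'spark_ml_model' with a generic 'DataSet' end. This JSON is a hedged sketch based only on the issue description: the end names, cardinalities, category, and other field values are assumptions, not the definition that was actually committed.

```json
{
  "name": "spark_ml_model_dataset",
  "typeVersion": "1.0",
  "relationshipCategory": "ASSOCIATION",
  "endDef1": { "type": "spark_ml_model", "name": "directory", "cardinality": "SINGLE" },
  "endDef2": { "type": "DataSet", "name": "model", "cardinality": "SET" },
  "propagateTags": "NONE"
}
```

Because 'DataSet' is the parent type, any of its subtypes ('hdfs_path', 'fs_path', or 'spark_ml_directory' itself) can appear at that end, which is what the update to the old definitions could not achieve.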
[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046551#comment-17046551 ]

Vladislav Glinskiy commented on ATLAS-3640:
-------------------------------------------

cc [~sarath]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3640
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3640
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions to use the 'DataSet' type instead of its child type 'spark_ml_directory'. This is required in order to integrate the Spark Atlas Connector's ML event processor.
> Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, but this was changed in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to create the relationship leads to:
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.
[jira] [Commented] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046533#comment-17046533 ]

Vladislav Glinskiy commented on ATLAS-3640:
-------------------------------------------

cc [~kabhwan]

> Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3640
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3640
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions to use the 'DataSet' type instead of its child type 'spark_ml_directory'. This is required in order to integrate the Spark Atlas Connector's ML event processor.
> Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, but this was changed in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path').
> Thus, the relationship definitions must be updated; otherwise, an attempt to create the relationship leads to:
> {code:java}
> org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: spark_ml_model
> {code}
> since 'COMPOSITION' requires 'spark_ml_directory' to be set.
[jira] [Created] (ATLAS-3640) Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
Vladislav Glinskiy created ATLAS-3640:
------------------------------------------

             Summary: Update 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions
                 Key: ATLAS-3640
                 URL: https://issues.apache.org/jira/browse/ATLAS-3640
             Project: Atlas
          Issue Type: Task
            Reporter: Vladislav Glinskiy
             Fix For: 2.1.0, 3.0.0

Update the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions to use the 'DataSet' type instead of its child type 'spark_ml_directory'. This is required in order to integrate the Spark Atlas Connector's ML event processor.

Previously, the Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, but this was changed in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61] and [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path').

Thus, the relationship definitions must be updated; otherwise, an attempt to create the relationship leads to:

{code:java}
org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: spark_ml_model_ml_directory: end type 1: spark_ml_directory, end type 2: spark_ml_model
{code}

since 'COMPOSITION' requires 'spark_ml_directory' to be set.