[jira] [Commented] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
[ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575843#comment-16575843 ]

Chengbing Liu commented on ATLAS-2816:

Thanks [~apoorvnaik] for the review! Uploaded a new patch.

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
>
> Key: ATLAS-2816
> URL: https://issues.apache.org/jira/browse/ATLAS-2816
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Chengbing Liu
> Assignee: Apoorv Naik
> Priority: Major
> Attachments: ATLAS-2816.01.patch, ATLAS-2816.02.patch
>
> We encountered a problem when using the Hive bridge in production. One database
> has 5000+ tables. Importing the first table takes only tens of milliseconds,
> and then it becomes slower with more tables. In the end, it takes 1~2 seconds
> to import one table.
>
> After investigation, we realized that it is not necessary for the
> {{FullTextMapperV2}} to retrieve all the relationships of the database each
> time a table is imported. The time complexity of importing a whole database
> actually goes to O(n^2), where n is the number of tables.
>
> We propose to add a parameter to the constructor of {{EntityGraphRetriever}}:
> {{ignoreRelationship}}. When set to true, {{mapVertexToAtlasEntity}} will
> skip the {{mapRelationshipAttributes}} call. Since {{FullTextMapperV2}} does
> not use the relationship attributes of the entity, this can save plenty of time
> when importing entities with a large number of relations.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
[ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated ATLAS-2816:

Attachment: ATLAS-2816.02.patch
[jira] [Updated] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
[ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated ATLAS-2816:

Attachment: ATLAS-2816.01.patch
[jira] [Commented] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
[ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575686#comment-16575686 ]

Chengbing Liu commented on ATLAS-2816:

[~apoorvnaik], I just found ATLAS-2815 removes {{mapRelationshipAttributes(entityVertex, entity)}} and then adds it back, looks like it's an accidental change? I will provide a patch based on the latest code today.
[jira] [Created] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
Chengbing Liu created ATLAS-2816:

Summary: Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
Key: ATLAS-2816
URL: https://issues.apache.org/jira/browse/ATLAS-2816
Project: Atlas
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Chengbing Liu

We encountered a problem when using the Hive bridge in production. One database has 5000+ tables. Importing the first table takes only tens of milliseconds, and then it becomes slower with more tables. In the end, it takes 1~2 seconds to import one table.

After investigation, we realized that it is not necessary for the {{FullTextMapperV2}} to retrieve all the relationships of the database each time a table is imported. The time complexity of importing a whole database actually goes to O(n^2), where n is the number of tables.

We propose to add a parameter to the constructor of {{EntityGraphRetriever}}: {{ignoreRelationship}}. When set to true, {{mapVertexToAtlasEntity}} will skip the {{mapRelationshipAttributes}} call. Since {{FullTextMapperV2}} does not use the relationship attributes of the entity, this can save plenty of time when importing entities with a large number of relations.
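To make the O(n^2) claim and the proposed flag concrete, here is a toy model; it is not Atlas code. Only the name {{ignoreRelationship}} and the idea that {{mapVertexToAtlasEntity}} skips {{mapRelationshipAttributes}} come from the description above. The classes {{DatabaseEntity}} and {{Retriever}}, the method {{relationshipReadsFor}}, and the read counter are hypothetical stand-ins, and the model assumes the database entity is re-mapped once per imported table.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the proposal; NOT Atlas code. Every class and method here is a
 * hypothetical stand-in; only the ignoreRelationship flag is named in the issue.
 */
class IgnoreRelationshipSketch {

    /** Stand-in for a database entity whose relationship list grows with each imported table. */
    static final class DatabaseEntity {
        final List<String> tables = new ArrayList<>();
    }

    /** Stand-in for EntityGraphRetriever with the proposed constructor parameter. */
    static final class Retriever {
        private final boolean ignoreRelationship;
        long relationshipReads = 0; // number of relationship edges visited so far

        Retriever(boolean ignoreRelationship) {
            this.ignoreRelationship = ignoreRelationship;
        }

        /** Stand-in for mapVertexToAtlasEntity: skips relationship mapping when the flag is set. */
        void mapEntity(DatabaseEntity db) {
            if (!ignoreRelationship) {
                // Models mapRelationshipAttributes walking every table relationship.
                relationshipReads += db.tables.size();
            }
        }
    }

    /** Imports n tables one by one, re-mapping the database entity after each import. */
    static long relationshipReadsFor(int n, boolean ignoreRelationship) {
        DatabaseEntity db = new DatabaseEntity();
        Retriever retriever = new Retriever(ignoreRelationship);
        for (int i = 0; i < n; i++) {
            db.tables.add("table_" + i);
            retriever.mapEntity(db);
        }
        return retriever.relationshipReads;
    }

    public static void main(String[] args) {
        int n = 5000;
        System.out.println("with relationships:     " + relationshipReadsFor(n, false));
        System.out.println("ignoring relationships: " + relationshipReadsFor(n, true));
    }
}
```

Under these assumptions the run without the flag performs n(n+1)/2 relationship reads (12,502,500 for n = 5000), while the run with the flag performs none, which is consistent with the quadratic growth described in the report.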
[jira] [Updated] (ATLAS-2541) Add hbase-server jar for Hive hook packaging
[ https://issues.apache.org/jira/browse/ATLAS-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated ATLAS-2541:

Attachment: ATLAS-2541.01.patch

> Add hbase-server jar for Hive hook packaging
>
> Key: ATLAS-2541
> URL: https://issues.apache.org/jira/browse/ATLAS-2541
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 1.0.0-alpha
> Reporter: Chengbing Liu
> Priority: Major
> Attachments: ATLAS-2541.01.patch
>
> When importing Hive metadata using the Hive bridge, a NoClassDefFoundError
> exception was thrown:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
>     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 23 more
> {code}
> The cause is not having
> {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}} on the classpath.
> Currently only the hbase-common jar is included in the Hive hook packaging.
> Simply adding the hbase-server jar solves the problem.
[jira] [Resolved] (ATLAS-2540) Hive bridge should include hbase-server as runtime dependency
[ https://issues.apache.org/jira/browse/ATLAS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu resolved ATLAS-2540.

Resolution: Duplicate

Duplicated due to network issue.
[jira] [Created] (ATLAS-2541) Add hbase-server jar for Hive hook packaging
Chengbing Liu created ATLAS-2541:

Summary: Add hbase-server jar for Hive hook packaging
Key: ATLAS-2541
URL: https://issues.apache.org/jira/browse/ATLAS-2541
Project: Atlas
Issue Type: Bug
Affects Versions: 1.0.0-alpha
Reporter: Chengbing Liu

When importing Hive metadata using the Hive bridge, a NoClassDefFoundError exception was thrown:
{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more
{code}
The cause is not having {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}} on the classpath. Currently only the hbase-common jar is included in the Hive hook packaging. Simply adding the hbase-server jar solves the problem.
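One way to confirm this diagnosis before repackaging is to probe the classpath for the missing class. The sketch below is a hypothetical helper: {{ClasspathProbe}} and {{isOnClasspath}} are not part of Atlas or the attached patch, and the check is only meaningful if it is launched with the same classpath as the Hive bridge import.

```java
/** Classpath probe; a diagnostic sketch, not part of Atlas or the attached patch. */
class ClasspathProbe {

    /**
     * Returns true if the named class can be loaded by this class's loader.
     * The class is located and linked but not initialized (initialize = false).
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, raised when a class is
            // present but one of its own dependencies (e.g. a superclass) is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hbase.mapreduce.TableInputFormatBase";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

If the probe reports the class as missing with only hbase-common on the classpath and as found once hbase-server is added, that would support the fix proposed in the report.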
[jira] [Created] (ATLAS-2540) Hive bridge should include hbase-server as runtime dependency
Chengbing Liu created ATLAS-2540:

Summary: Hive bridge should include hbase-server as runtime dependency
Key: ATLAS-2540
URL: https://issues.apache.org/jira/browse/ATLAS-2540
Project: Atlas
Issue Type: Bug
Affects Versions: 1.0.0-alpha
Reporter: Chengbing Liu

When importing Hive metadata using Hive bridge, the following exception was thrown:
{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more
{code}
The cause is not having {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}} on the classpath. Currently we have only hbase-common in the dependency list. Simply adding hbase-server to the dependency list solves the problem.
[jira] [Commented] (ATLAS-1175) Type updates should allow removal of optional attributes
[ https://issues.apache.org/jira/browse/ATLAS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411307#comment-16411307 ]

Chengbing Liu commented on ATLAS-1175:

Hi folks, is there any plan to support this? It could be really annoying if you cannot delete attributes in a production environment... The background cleanup thread is reasonable to me.

> Type updates should allow removal of optional attributes
>
> Key: ATLAS-1175
> URL: https://issues.apache.org/jira/browse/ATLAS-1175
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.7-incubating, 0.8-incubating
> Reporter: Suma Shivaprasad
> Assignee: Sarath Subramanian
> Priority: Major
>
> Currently optional attributes are not allowed to be removed from a given
> type. This should be allowed.