Re: Interest in Apache Atlas
Hi,

I thought I would add a couple of ideas.

1) Another way would be to separate customer information into separate Atlas instances. This would make sense if the customers had significantly different policies, for example because they had different laws to comply with in different geographies. Metadata could of course still be exchanged between instances, as an import or an export.

2) In terms of namespaces, I think of them as useful for splitting test and production data and the like for a single customer. I would suggest the multi-tenancy scenario could be managed using tags and Ranger policies to separate who can see and act on what.

All the best,
David.

From: Sandeep Nayak
To: dev@atlas.incubator.apache.org
Cc: Venkatesh Seetharam
Date: 04/12/2016 20:33
Subject: Re: Interest in Apache Atlas

Hi Hemanth,

Thank you for taking the time to respond. I will take a look at ATLAS-51 and will also be interested in hearing from others, as you alluded to in your response.

Cheers,
Sandeep.
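To make the "tags and Ranger policies" idea in David's reply a little more concrete, here is a minimal sketch of a tag-based access check. It is not the Ranger or Atlas API: `TagPolicy`, `is_allowed`, the tag names and the group names are all hypothetical, and a real deployment would express the same rules as Ranger tag-based policies rather than in application code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: members of `groups` may perform `actions`
    on any entity that carries `tag`."""
    tag: str
    groups: FrozenSet[str]
    actions: FrozenSet[str]


def is_allowed(policies: Iterable[TagPolicy],
               user_groups: FrozenSet[str],
               entity_tags: Set[str],
               action: str) -> bool:
    """Deny by default; allow only if some policy matches a tag on the
    entity, at least one of the user's groups, and the requested action."""
    return any(
        p.tag in entity_tags
        and action in p.actions
        and bool(p.groups & user_groups)
        for p in policies
    )


# Assumed example data: one tenant tag, one group, read-only access.
policies = [
    TagPolicy(tag="tenant_acme",
              groups=frozenset({"acme-analysts"}),
              actions=frozenset({"read"})),
]
table_tags = {"tenant_acme", "PII"}

print(is_allowed(policies, frozenset({"acme-analysts"}), table_tags, "read"))
print(is_allowed(policies, frozenset({"globex-analysts"}), table_tags, "read"))
```

The point of the sketch is only the shape of the rule: the tenant is a tag on the data set, and visibility follows from which groups a policy grants that tag to.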
Re: Interest in Apache Atlas
Hi Hemanth,

Thank you for taking the time to respond. I will take a look at ATLAS-51 and will also be interested in hearing from others, as you alluded to in your response.

Cheers,
Sandeep.

On Sun, Dec 4, 2016 at 5:09 AM, Hemanth Yamijala wrote:

> Hi Sandeep,
>
> Responses inline. Hoping others can pitch in with more recent information,
> as mine might be a little dated.
>
> Thanks
> hemanth
Re: Interest in Apache Atlas
Hi Sandeep,

Responses inline. Hoping others can pitch in with more recent information, as mine might be a little dated.

Thanks
hemanth

From: Sandeep Nayak
Sent: Sunday, December 04, 2016 12:00 AM
To: dev@atlas.incubator.apache.org
Cc: Venkatesh Seetharam
Subject: Re: Interest in Apache Atlas

Hi all,

Sending a reminder, I am looking for answers to the questions below. Can someone help?

Thanks in advance for your attention.

- Sandeep

On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak wrote:

> Hi all,
>
> I had asked Venkatesh a couple of questions earlier; please see the email
> below. He recommended that I move the questions to the dev mailing list,
> and thus this mail.
>
> To follow up on the questions asked below in response to my queries:
>
> (a) Multi-tenancy: If I were to bring in data-sets from different
> customers then I need to record, annotate or tag and provide access to
> data-sets only to the relevant owners. Is it possible for me to record and
> manage data-sets for different customers in a single Atlas instance? Does
> Atlas provide me with the necessary constructs to separate recording of
> data-sets by tenant and tracking metadata etc. by tenant?

It is possible to build a solution on top of Atlas to satisfy your requirements. It appears you need a namespacing facility of sorts. While there is no native construct like that in Atlas today (please see ATLAS-51, which is still open), I guess you could rely on the extensibility of the type system to let your objects extend from a base type that defines a tenant attribute. Then use wrapper APIs that filter out objects according to the tenant in question. Of course, one could use the lower-level APIs to get around this, and hence it is cooperative in nature.

> (c) Performance Numbers: I understand it is built to scale given the use
> of HBase but any performance numbers that can be shared will be helpful.
> E.g. Is there a limit to the number of data-sets I can record on Atlas? Are
> there performance numbers on the number of queries?

This is dated information (at least a couple of months old). If someone has updated numbers, we should hear from them. At that time, we tested importing 50K Hive tables and dependent objects (columns etc.), with a total of somewhat fewer than 10M vertices.

From what I remember, I think we could import these in about 20 minutes or so. However, this makes some assumptions about the dependencies among the data sets, which is what allowed us to bump up parallelism for the import. We tested reads with queries from 30 users in parallel. Times vary based on the type of query: simple lookups take seconds, but more complex queries like lineage take longer.

This is a constant source of improvement in the project and there are several JIRAs talking about performance changes, including some that are still open, e.g. ATLAS-711.

> (d) Are there companies using Atlas in production at this stage?
>
> Thanks in advance for your responses.
>
> - Sandeep
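The multi-tenancy approach Hemanth outlines above (a base type that defines a tenant attribute, plus wrapper APIs that filter by tenant) could be sketched roughly as follows. This is only an illustration under stated assumptions: `TenantScopedEntity`, `HiveTable`, `InMemoryStore` and `TenantCatalog` are hypothetical names, and the in-memory store stands in for Atlas rather than using its actual type system or client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TenantScopedEntity:
    """Hypothetical base type: every catalog object carries a tenant attribute."""
    name: str
    tenant: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class HiveTable(TenantScopedEntity):
    """Example subtype that inherits the tenant attribute from the base type."""
    database: str = "default"


class InMemoryStore:
    """Stand-in for the metadata store (an assumption, not the Atlas API)."""

    def __init__(self) -> None:
        self._entities: List[TenantScopedEntity] = []

    def add(self, entity: TenantScopedEntity) -> None:
        self._entities.append(entity)

    def all(self) -> List[TenantScopedEntity]:
        return list(self._entities)


class TenantCatalog:
    """Wrapper API bound to one tenant; it only registers and returns
    entities whose tenant attribute matches."""

    def __init__(self, store: InMemoryStore, tenant: str) -> None:
        self._store = store
        self._tenant = tenant

    def register(self, entity: TenantScopedEntity) -> None:
        if entity.tenant != self._tenant:
            raise PermissionError("entity belongs to another tenant")
        self._store.add(entity)

    def search(self, name_prefix: str = "") -> List[TenantScopedEntity]:
        return [
            e for e in self._store.all()
            if e.tenant == self._tenant and e.name.startswith(name_prefix)
        ]


store = InMemoryStore()
acme = TenantCatalog(store, "acme")
globex = TenantCatalog(store, "globex")
acme.register(HiveTable(name="sales", tenant="acme"))
globex.register(HiveTable(name="sales_eu", tenant="globex"))

print([t.name for t in acme.search("sales")])  # only acme's table is returned
print(len(store.all()))  # the underlying store still holds both tenants' tables
```

The last line shows the caveat from the reply: anyone who goes to the underlying store directly sees every tenant's objects, so the separation holds only as long as callers cooperate and use the wrapper.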
Re: Interest in Apache Atlas
Hi all,

Sending a reminder, I am looking for answers to the questions below. Can someone help?

Thanks in advance for your attention.

- Sandeep
Re: Interest in Apache Atlas
Hi all,

I had asked Venkatesh a couple of questions earlier; please see the email below. He recommended that I move the questions to the dev mailing list, and thus this mail.

To follow up on the questions asked below in response to my queries:

(a) Multi-tenancy: If I were to bring in data-sets from different customers then I need to record, annotate or tag and provide access to data-sets only to the relevant owners. Is it possible for me to record and manage data-sets for different customers in a single Atlas instance? Does Atlas provide me with the necessary constructs to separate recording of data-sets by tenant and tracking metadata etc. by tenant?

(c) Performance Numbers: I understand it is built to scale given the use of HBase but any performance numbers that can be shared will be helpful. E.g. Is there a limit to the number of data-sets I can record on Atlas? Are there performance numbers on the number of queries?

(d) Are there companies using Atlas in production at this stage?

Thanks in advance for your responses.

- Sandeep

On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam wrote:

> Sandeep - please use the dev mailing list for atlas for a prompt response.
>
> (a) How can one achieve multi-tenancy on Apache Atlas?
> Can you pls elaborate? You can always have a package structure for your
> data sets.
>
> (b) Is Atlas ready for production usage?
> It depends, I think it is but needs some scripting around BCP, etc.
>
> (c) Are there published numbers on the volume of data-sets Atlas can
> manage?
> It's built to scale; it uses Titan & HBase as a backend store, which is
> known to scale.
>
> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak wrote:
>
>> Hi Venkatesh,
>>
>> I apologize for the direct email; if there is a better channel to surface
>> my questions I will be happy to go there. I am subscribed to dev@atlas
>> but thought that may not be the right forum for questions potential Atlas
>> users may have.
>>
>> I am looking at Data Catalog solutions and am in early evaluation. From
>> what I have read so far, it appears Apache Atlas provides most of the
>> capabilities I am looking for, namely data-set registration, lineage
>> tracking, access control (via Ranger) and auditing, to name a few.
>>
>> I do have a couple of questions which will help me in my evaluation:
>>
>> (a) How can one achieve multi-tenancy on Apache Atlas?
>> (b) Is Atlas ready for production usage?
>> (c) Are there published numbers on the volume of data-sets Atlas can
>> manage? One of the requirements I pointed out above is data lineage, and
>> if I am ingesting streaming and batch data sets the typical volumes could
>> be very high.
>>
>> Hoping you will point me in the right direction to get answers.
>>
>> Thanks for your time and help.
>>
>> Regards,
>>
>> Sandeep