Re: Interest in Apache Atlas

2016-12-05 Thread David Radley
Hi,
I thought I would add a couple of ideas. 
1) Another way would be to separate customer information into separate 
Atlas instances. This would make sense if the customers had significantly 
different policies, for example because they had different laws to comply 
with in different geographies. Exchange of metadata could of course still 
occur between instances, as an import or export. 
2) In terms of namespaces: I think of namespaces as being useful for 
splitting test and production data and the like for a single customer. I 
would suggest the multi-tenancy scenario could be managed using tags and 
Ranger policies to separate who can see and act on what.
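
To make idea 2) a little more concrete, here is a minimal Python sketch of tag-based separation. It does not use the Atlas or Ranger APIs: the `Dataset` and `Policy` classes, the tag and group names, and the `visible_to` helper are all hypothetical, and a real deployment would presumably express the same rule as Ranger policies on Atlas tags rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a catalogued data set carrying tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: users in `group` may see data sets tagged `tag`."""
    group: str
    tag: str


def visible_to(group, datasets, policies):
    """Return the data sets that at least one policy lets `group` see."""
    allowed_tags = {p.tag for p in policies if p.group == group}
    return [d for d in datasets if d.tags & allowed_tags]


datasets = [
    Dataset("orders", frozenset({"customer_a"})),
    Dataset("clicks", frozenset({"customer_b"})),
    Dataset("reference_rates", frozenset({"customer_a", "customer_b"})),
]
policies = [Policy("team_a", "customer_a"), Policy("team_b", "customer_b")]

names_a = [d.name for d in visible_to("team_a", datasets, policies)]
names_b = [d.name for d in visible_to("team_b", datasets, policies)]
print(names_a)  # ['orders', 'reference_rates']
print(names_b)  # ['clicks', 'reference_rates']
```

A group with no matching policy sees nothing, so the default in this sketch is deny.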

   All the best, David.




From:   Sandeep Nayak 
To: dev@atlas.incubator.apache.org
Cc: Venkatesh Seetharam 
Date:   04/12/2016 20:33
Subject: Re: Interest in Apache Atlas



Hi Hemanth,

Thank you for taking the time to respond. I will take a look at ATLAS-51
and will also be interested in hearing from others, as you alluded to in
your response.

Cheers,

Sandeep.

On Sun, Dec 4, 2016 at 5:09 AM, Hemanth Yamijala wrote:

> Hi Sandeep,
>
> Responses inline. Hoping others can pitch in with more recent information,
> as mine might be a little dated.
>
> Thanks
> hemanth
> 
> From: Sandeep Nayak 
> Sent: Sunday, December 04, 2016 12:00 AM
> To: dev@atlas.incubator.apache.org
> Cc: Venkatesh Seetharam
> Subject: Re: Interest in Apache Atlas
>
> Hi all,
>
> Sending a reminder, I am looking for answers to the questions below. Can
> someone help?
>
> Thanks in advance for your attention.
>
> - Sandeep
>
> On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak 
> wrote:
>
> > Hi all,
> >
> > I had asked Venkatesh a couple of questions earlier; please see the
> > email below. He recommended that I move the questions to the dev
> > mailing list, hence this mail.
> >
> > To follow up on the questions I asked below:
> >
> > (a) Multi-tenancy: If I were to bring in data-sets from different
> > customers then I need to record, annotate or tag and provide access to
> > data-sets only to the relevant owners. Is it possible for me to record
> > and manage data-sets for different customers in a single Atlas instance?
> > Does Atlas provide me with the necessary constructs to separate recording
> > of data-sets by tenant and tracking metadata etc by tenant?
>
> It is possible to build a solution on top of Atlas to satisfy your
> requirements. It appears you need a namespacing facility of sorts. While
> there is no native construct like that in Atlas today (please see
> ATLAS-51, which is still open), I guess you could rely on the
> extensibility of the type system to let your objects extend from a base
> type that defines a tenant attribute. Then use wrapper APIs that filter
> out objects according to the tenant in question. Of course, one could use
> the lower level APIs to get around this, and hence it is cooperative in
> nature.
>
> >
> > (c) Performance Numbers: I understand it is built to scale given the
> > use of HBase but any performance numbers that can be shared will be
> > helpful. E.g. Is there a limit to the number of data-sets I can record
> > on Atlas? Are there performance numbers on the number of queries?
> >
>
> This is dated information (at least a couple of months old). If someone has
> updated numbers, we should hear from them. At that time, we tested
> importing 50K Hive tables and dependent objects (columns etc.) with a total
> of fewer than 10M vertices.
>
> From what I remember, I think we could import these in about 20 minutes or
> so. However, this does make some assumptions about the dependencies on the
> data sets and hence we could bump up parallelism for import. We tested
> reads with queries from 30 users in parallel. Times vary based on type of
> queries - simple lookups take seconds, but more complex queries like
> lineage take longer.
>
> This is a constant source of improvement in the project and there are
> several JIRAs talking about performance changes including some that are
> still open. E.g. ATLAS-711.
>
> > (d) Are there companies using Atlas in production at this stage?
> >
> > Thanks in advance for your responses.
> >
> > - Sandeep
> >
> >
> >
> >
> > On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam
> > <venkat...@apache.org> wrote:
> >
> >> Sandeep - please use the dev mailing list for atlas for a prompt
> >> response.
> >>
> >> (a) How can one achieve multi-tenancy on Apache Atlas?
> >> Can you pls elaborate? You can always have a package structure for 
you

Re: Interest in Apache Atlas

2016-12-04 Thread Sandeep Nayak
Hi Hemanth,

Thank you for taking the time to respond. I will take a look at ATLAS-51
and will also be interested in hearing from others, as you alluded to in
your response.

Cheers,

Sandeep.

On Sun, Dec 4, 2016 at 5:09 AM, Hemanth Yamijala 
wrote:

> Hi Sandeep,
>
> Responses inline. Hoping others can pitch in with more recent information,
> as mine might be a little dated.
>
> Thanks
> hemanth
> 
> From: Sandeep Nayak 
> Sent: Sunday, December 04, 2016 12:00 AM
> To: dev@atlas.incubator.apache.org
> Cc: Venkatesh Seetharam
> Subject: Re: Interest in Apache Atlas
>
> Hi all,
>
> Sending a reminder, I am looking for answers to the questions below. Can
> someone help?
>
> Thanks in advance for your attention.
>
> - Sandeep
>
> On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak 
> wrote:
>
> > Hi all,
> >
> > I had asked Venkatesh a couple of questions earlier; please see the
> > email below. He recommended that I move the questions to the dev
> > mailing list, hence this mail.
> >
> > To follow up on the questions I asked below:
> >
> > (a) Multi-tenancy: If I were to bring in data-sets from different
> > customers then I need to record, annotate or tag and provide access to
> > data-sets only to the relevant owners. Is it possible for me to record
> > and manage data-sets for different customers in a single Atlas instance?
> > Does Atlas provide me with the necessary constructs to separate recording
> > of data-sets by tenant and tracking metadata etc by tenant?
>
> It is possible to build a solution on top of Atlas to satisfy your
> requirements. It appears you need a namespacing facility of sorts. While
> there is no native construct like that in Atlas today (please see ATLAS-51,
> which is still open), I guess you could rely on the extensibility of the
> type system to let your objects extend from a base type that defines a
> tenant attribute. Then use wrapper APIs that filter out objects according
> to the tenant in question. Of course, one could use the lower level APIs to
> get around this, and hence it is cooperative in nature.
>
> >
> > (c) Performance Numbers: I understand it is built to scale given the use
> > of HBase but any performance numbers that can be shared will be helpful.
> > E.g. Is there a limit to the number of data-sets I can record on Atlas?
> > Are there performance numbers on the number of queries?
> >
>
> This is dated information (at least a couple of months old). If someone has
> updated numbers, we should hear from them. At that time, we tested
> importing 50K Hive tables and dependent objects (columns etc.) with a total
> of fewer than 10M vertices.
>
> From what I remember, I think we could import these in about 20 minutes or
> so. However, this does make some assumptions about the dependencies on the
> data sets and hence we could bump up parallelism for import. We tested
> reads with queries from 30 users in parallel. Times vary based on type of
> queries - simple lookups take seconds, but more complex queries like
> lineage take longer.
>
> This is a constant source of improvement in the project and there are
> several JIRAs talking about performance changes including some that are
> still open. E.g. ATLAS-711.
>
> > (d) Are there companies using Atlas in production at this stage?
> >
> > Thanks in advance for your responses.
> >
> > - Sandeep
> >
> >
> >
> >
> > On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam
> > <venkat...@apache.org> wrote:
> >
> >> Sandeep - please use the dev mailing list for atlas for a prompt
> >> response.
> >>
> >> (a) How can one achieve multi-tenancy on Apache Atlas?
> >> Can you pls elaborate? You can always have a package structure for your
> >> data sets.
> >>
> >> (b) Is Atlas ready for production usage?
> >> It depends, I think it is but needs some scripting around BCP, etc.
> >>
> >> (c) Are there published numbers on the volume of data-sets Atlas can
> >> manage?
> >> It's built to scale; it uses Titan & HBase as a backend store, which is
> >> known to scale.
> >>
> >> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak 
> >> wrote:
> >>
> >>> Hi Venkatesh,
> >>>
> >>> I apologize for the direct email; if there is a better channel to
> >>> surface my questions I will be happy to go there. I am subscribed to
> >>> dev@atlas but thought that may not be the right forum for questions
> >>> potential Atlas use

Re: Interest in Apache Atlas

2016-12-04 Thread Hemanth Yamijala
Hi Sandeep,

Responses inline. Hoping others can pitch in with more recent information, as 
mine might be a little dated.

Thanks
hemanth

From: Sandeep Nayak 
Sent: Sunday, December 04, 2016 12:00 AM
To: dev@atlas.incubator.apache.org
Cc: Venkatesh Seetharam
Subject: Re: Interest in Apache Atlas

Hi all,

Sending a reminder, I am looking for answers to the questions below. Can
someone help?

Thanks in advance for your attention.

- Sandeep

On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak 
wrote:

> Hi all,
>
> I had asked Venkatesh a couple of questions earlier; please see the email
> below. He recommended that I move the questions to the dev mailing list,
> hence this mail.
>
> To follow up on the questions I asked below:
>
> (a) Multi-tenancy: If I were to bring in data-sets from different
> customers then I need to record, annotate or tag and provide access to
> data-sets only to the relevant owners. Is it possible for me to record and
> manage data-sets for different customers in a single Atlas instance? Does
> Atlas provide me with the necessary constructs to separate recording of
> data-sets by tenant and tracking metadata etc by tenant?

It is possible to build a solution on top of Atlas to satisfy your 
requirements. It appears you need a namespacing facility of sorts. While there 
is no native construct like that in Atlas today (please see ATLAS-51, which is 
still open), I guess you could rely on the extensibility of the type system to 
let your objects extend from a base type that defines a tenant attribute. Then 
use wrapper APIs that filter out objects according to the tenant in question. 
Of course, one could use the lower level APIs to get around this, and hence it 
is cooperative in nature.
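
A minimal Python sketch of that approach, with no claim that it matches any Atlas API: the `TYPES` table, the `TenantAsset` base type, its `tenant` attribute, and the `TenantScopedCatalog` wrapper are all hypothetical names. Entities live in a plain list, and the wrapper only filters what it returns, so code that reads the list directly still sees everything, which is the cooperative caveat above.

```python
# Hypothetical type table: each type names its supertype (None for a root).
TYPES = {
    "TenantAsset": None,          # assumed base type that defines `tenant`
    "TenantTable": "TenantAsset",
    "TenantColumn": "TenantAsset",
    "SharedGlossary": None,       # does not extend the tenant base type
}


def extends(type_name, base):
    """Walk the supertype chain to see whether `type_name` derives from `base`."""
    while type_name is not None:
        if type_name == base:
            return True
        type_name = TYPES[type_name]
    return False


class TenantScopedCatalog:
    """Wrapper that only returns tenant-aware entities owned by one tenant."""

    def __init__(self, store, tenant):
        self._store = store      # the unfiltered, "lower level" storage
        self._tenant = tenant

    def add(self, type_name, name):
        # Only types derived from the tenant base type may be registered here.
        if not extends(type_name, "TenantAsset"):
            raise ValueError(f"{type_name} does not extend TenantAsset")
        entity = {"type": type_name, "name": name, "tenant": self._tenant}
        self._store.append(entity)
        return entity

    def find(self, type_name):
        # Filter by type and by the tenant this wrapper was created for.
        return [
            e for e in self._store
            if e["type"] == type_name and e.get("tenant") == self._tenant
        ]


store = []
acme = TenantScopedCatalog(store, "acme")
globex = TenantScopedCatalog(store, "globex")
acme.add("TenantTable", "orders")
globex.add("TenantTable", "payments")

acme_tables = [e["name"] for e in acme.find("TenantTable")]
print(acme_tables)   # ['orders']
print(len(store))    # 2: direct access to the store bypasses the filter
```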

>
> (c) Performance Numbers: I understand it is built to scale given the use
> of HBase but any performance numbers that can be shared will be helpful.
> E.g. Is there a limit to the number of data-sets I can record on Atlas? Are
> there performance numbers on the number of queries?
>

This is dated information (at least a couple of months old). If someone has 
updated numbers, we should hear from them. At that time, we tested importing 
50K Hive tables and dependent objects (columns etc.) with a total of fewer 
than 10M vertices.

From what I remember, I think we could import these in about 20 minutes or so.
However, this does make some assumptions about the dependencies on the data
sets and hence we could bump up parallelism for import. We tested reads with
queries from 30 users in parallel. Times vary based on type of queries -
simple lookups take seconds, but more complex queries like lineage take longer.
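
For a rough sense of scale, the figures above imply the following average rates. This is only arithmetic on the numbers quoted in this mail, treating 20 minutes and 10M vertices as round values; since the vertex count was below 10M, the vertex figures are upper bounds, not measured results.

```python
TABLES = 50_000             # Hive tables imported
MAX_VERTICES = 10_000_000   # total was "< 10M", so this is an upper bound
IMPORT_SECONDS = 20 * 60    # "about 20 minutes or so"

tables_per_second = TABLES / IMPORT_SECONDS
max_vertices_per_second = MAX_VERTICES / IMPORT_SECONDS
max_vertices_per_table = MAX_VERTICES / TABLES

print(f"{tables_per_second:.1f} tables/s")                        # 41.7 tables/s
print(f"under {max_vertices_per_second:.0f} vertices/s")          # under 8333 vertices/s
print(f"under {max_vertices_per_table:.0f} vertices per table")   # under 200 vertices per table
```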

This is a constant source of improvement in the project and there are several 
JIRAs talking about performance changes including some that are still open. 
E.g. ATLAS-711.

> (d) Are there companies using Atlas in production at this stage?
>
> Thanks in advance for your responses.
>
> - Sandeep
>
>
>
>
> On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam wrote:
>
>> Sandeep - please use the dev mailing list for atlas for a prompt response.
>>
>> (a) How can one achieve multi-tenancy on Apache Atlas?
>> Can you pls elaborate? You can always have a package structure for your
>> data sets.
>>
>> (b) Is Atlas ready for production usage?
>> It depends, I think it is but needs some scripting around BCP, etc.
>>
>> (c) Are there published numbers on the volume of data-sets Atlas can
>> manage?
>> It's built to scale; it uses Titan & HBase as a backend store, which is
>> known to scale.
>>
>> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak 
>> wrote:
>>
>>> Hi Venkatesh,
>>>
>>> I apologize for the direct email; if there is a better channel to
>>> surface my questions I will be happy to go there. I am subscribed to
>>> dev@atlas but thought that may not be the right forum for questions
>>> potential Atlas users may have.
>>>
>>> I am looking for Data Catalog solutions and am in early evaluation. From
>>> what I have read so far, it appears Apache Atlas provides most of the
>>> capabilities I am looking for, namely data-set registration, lineage
>>> tracking, access control (via Ranger) and auditing, to name a few.
>>>
>>> I do have a couple of questions which will help me in my evaluation:
>>>
>>> (a) How can one achieve multi-tenancy on Apache Atlas?
>>> (b) Is Atlas ready for production usage?
>>> (c) Are there published numbers on the volume of data-sets Atlas can
>>> manage? One of the requirements I pointed out above is data lineage and if
>>> I am ingesting streaming and batch data sets the typical volumes could be
>>> very high.
>>>
>>> Hoping you will point me in the right direction to get answers.
>>>
>>> Thanks for your time and help.
>>>
>>> Regards,
>>>
>>> Sandeep
>>>
>>
>


Re: Interest in Apache Atlas

2016-12-03 Thread Sandeep Nayak
Hi all,

Sending a reminder, I am looking for answers to the questions below. Can
someone help?

Thanks in advance for your attention.

- Sandeep

On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak 
wrote:

> Hi all,
>
> I had asked Venkatesh a couple of questions earlier; please see the email
> below. He recommended that I move the questions to the dev mailing list,
> hence this mail.
>
> To follow up on the questions I asked below:
>
> (a) Multi-tenancy: If I were to bring in data-sets from different
> customers then I need to record, annotate or tag and provide access to
> data-sets only to the relevant owners. Is it possible for me to record and
> manage data-sets for different customers in a single Atlas instance? Does
> Atlas provide me with the necessary constructs to separate recording of
> data-sets by tenant and tracking metadata etc by tenant?
>
> (c) Performance Numbers: I understand it is built to scale given the use
> of HBase but any performance numbers that can be shared will be helpful.
> E.g. Is there a limit to the number of data-sets I can record on Atlas? Are
> there performance numbers on the number of queries?
>
> (d) Are there companies using Atlas in production at this stage?
>
> Thanks in advance for your responses.
>
> - Sandeep
>
>
>
>
> On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam wrote:
>
>> Sandeep - please use the dev mailing list for atlas for a prompt response.
>>
>> (a) How can one achieve multi-tenancy on Apache Atlas?
>> Can you pls elaborate? You can always have a package structure for your
>> data sets.
>>
>> (b) Is Atlas ready for production usage?
>> It depends, I think it is but needs some scripting around BCP, etc.
>>
>> (c) Are there published numbers on the volume of data-sets Atlas can
>> manage?
>> It's built to scale; it uses Titan & HBase as a backend store, which is
>> known to scale.
>>
>> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak 
>> wrote:
>>
>>> Hi Venkatesh,
>>>
>>> I apologize for the direct email; if there is a better channel to
>>> surface my questions I will be happy to go there. I am subscribed to
>>> dev@atlas but thought that may not be the right forum for questions
>>> potential Atlas users may have.
>>>
>>> I am looking for Data Catalog solutions and am in early evaluation. From
>>> what I have read so far, it appears Apache Atlas provides most of the
>>> capabilities I am looking for, namely data-set registration, lineage
>>> tracking, access control (via Ranger) and auditing, to name a few.
>>>
>>> I do have a couple of questions which will help me in my evaluation:
>>>
>>> (a) How can one achieve multi-tenancy on Apache Atlas?
>>> (b) Is Atlas ready for production usage?
>>> (c) Are there published numbers on the volume of data-sets Atlas can
>>> manage? One of the requirements I pointed out above is data lineage and if
>>> I am ingesting streaming and batch data sets the typical volumes could be
>>> very high.
>>>
>>> Hoping you will point me in the right direction to get answers.
>>>
>>> Thanks for your time and help.
>>>
>>> Regards,
>>>
>>> Sandeep
>>>
>>
>


Re: Interest in Apache Atlas

2016-12-01 Thread Sandeep Nayak
Hi all,

I had asked Venkatesh a couple of questions earlier; please see the email
below. He recommended that I move the questions to the dev mailing list,
hence this mail.

To follow up on the questions I asked below:

(a) Multi-tenancy: If I were to bring in data-sets from different customers
then I need to record, annotate or tag and provide access to data-sets only
to the relevant owners. Is it possible for me to record and manage
data-sets for different customers in a single Atlas instance? Does Atlas
provide me with the necessary constructs to separate recording of data-sets
by tenant and tracking metadata etc by tenant?

(c) Performance Numbers: I understand it is built to scale given the use of
HBase but any performance numbers that can be shared will be helpful. E.g.
Is there a limit to the number of data-sets I can record on Atlas? Are
there performance numbers on the number of queries?

(d) Are there companies using Atlas in production at this stage?

Thanks in advance for your responses.

- Sandeep




On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam 
wrote:

> Sandeep - please use the dev mailing list for atlas for a prompt response.
>
> (a) How can one achieve multi-tenancy on Apache Atlas?
> Can you pls elaborate? You can always have a package structure for your
> data sets.
>
> (b) Is Atlas ready for production usage?
> It depends, I think it is but needs some scripting around BCP, etc.
>
> (c) Are there published numbers on the volume of data-sets Atlas can
> manage?
> It's built to scale; it uses Titan & HBase as a backend store, which is
> known to scale.
>
> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak 
> wrote:
>
>> Hi Venkatesh,
>>
>> I apologize for the direct email; if there is a better channel to surface
>> my questions I will be happy to go there. I am subscribed to dev@atlas
>> but thought that may not be the right forum for questions potential Atlas
>> users may have.
>>
>> I am looking for Data Catalog solutions and am in early evaluation. From
>> what I have read so far, it appears Apache Atlas provides most of the
>> capabilities I am looking for, namely data-set registration, lineage
>> tracking, access control (via Ranger) and auditing, to name a few.
>>
>> I do have a couple of questions which will help me in my evaluation:
>>
>> (a) How can one achieve multi-tenancy on Apache Atlas?
>> (b) Is Atlas ready for production usage?
>> (c) Are there published numbers on the volume of data-sets Atlas can
>> manage? One of the requirements I pointed out above is data lineage and if
>> I am ingesting streaming and batch data sets the typical volumes could be
>> very high.
>>
>> Hoping you will point me in the right direction to get answers.
>>
>> Thanks for your time and help.
>>
>> Regards,
>>
>> Sandeep
>>
>