Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-10 Thread Shiyan Xu
Yes, Vinoth, first-class support for this data does go a bit too far.
A global error table can do the job easily. As we discussed yesterday,
parallel local error tables with an `_errors` suffix could also help in
some scenarios, such as different product teams managing their own tables,
or the 2B case where customers manage their own data. These users would
prefer good segregation of errors and other related data. Let me note down
the points in RFC-20 for further discussion. Thanks for the feedback!

On Wed, Jun 3, 2020 at 9:31 PM Vinoth Chandar wrote:

> Hi Raymond,
>
> I am not sure generalizing this to all metadata, like errors and metrics,
> would be a good idea. We can certainly implement logging errors to a common
> errors hudi table, with a certain schema. But these can be just regular
> "hudi" format tables.
>
> Unlike the timeline metadata, these are really external data, not related
> to a given table's core functioning. We don't necessarily want to keep one
> error table per hudi table.
>
> Thoughts?
>
> On Tue, Jun 2, 2020 at 5:34 PM Shiyan Xu wrote:
>
> > I also encountered use cases where I'd like to programmatically query
> > metadata.
> > +1 on the idea of format("hudi-timeline")
> >
> > I also feel that the metadata could be extended further to include more
> > info, such as errors and metrics/write statistics. As with the newly
> > proposed error handling, we could store all metrics or write stats there
> > too, and relate them to the timeline actions.
> >
> > A potential use case: with all this info encapsulated within the
> > metadata, we may be able to derive some insightful results (by checking
> > against some benchmarks) and answer questions like: does table A need
> > more tuning? Does table B exceed its error budget?
> >
> > Programmatic queries on this metadata can help manage many tables in
> > diagnosis and inspection. We may need different read formats like
> > format("hudi-errors") or format("hudi-metrics").
> >
> > Sorry this sidetracked from the original question. These are really
> > rough high-level thoughts, and may show signs of over-engineering. I
> > would like to hear some feedback. Thanks.
> >
> > On Mon, Jun 1, 2020 at 9:28 PM Satish Kotha wrote:
> >
> > > Got it. I'll look into implementation choices for creating a new data
> > > source. Appreciate all the feedback.
> > >
> > > On Mon, Jun 1, 2020 at 7:53 PM Vinoth Chandar wrote:
> > >
> > > > >Is it to separate data and metadata access?
> > > > Correct. We already have modes for querying data using format("hudi").
> > > > I feel it will get very confusing to mix data and metadata in the same
> > > > source. For example, a lot of the options we support for data may not
> > > > even make sense for the TimelineRelation.
> > > >
> > > > >This class seems like a list of static methods, I'm not seeing where
> > > > >these are accessed from
> > > > That's the public API for obtaining this information for Scala/Java
> > > > Spark. If you have a way of calling this from Python without a painful
> > > > bridge (e.g. Jython), it might be a tactical solution that can meet
> > > > your needs.
> > > >
> > > > On Mon, Jun 1, 2020 at 5:07 PM Satish Kotha wrote:
> > > >
> > > > > Thanks for the feedback.
> > > > >
> > > > > What is the advantage of doing
> > > > > spark.read.format("hudi-timeline").load(basepath) as opposed to
> > > > > adding a new relation? Is it to separate data and metadata access?
> > > > >
> > > > > > Are you looking for similar functionality as
> > > > > > HoodieDataSourceHelpers?
> > > > >
> > > > > This class seems like a list of static methods; I'm not seeing
> > > > > where these are accessed from. But I need a way to query metadata
> > > > > details easily in pyspark.
> > > > >
> > > > > On Mon, Jun 1, 2020 at 8:02 AM Vinoth Chandar wrote:
> > > > >
> > > > > > Also please take a look at
> > > > > > https://issues.apache.org/jira/browse/HUDI-309.
> > > > > >
> > > > > > This was an effort to make the timeline more generalized for
> > > > > > querying (for a different purpose), but good to revisit now.
> > > > > >
> > > > > > On Sun, May 31, 2020 at 11:04 PM vbal...@apache.org
> > > > > > <vbal...@apache.org> wrote:
> > > > > >
> > > > > > >
> > > > > > > I strongly recommend using a separate datasource relation
> > > > > > > (option 1) to query the timeline. It is elegant and fits well
> > > > > > > with Spark APIs.
> > > > > > > Thanks.
> > > > > > > Balaji.V
> > > > > > >
> > > > > > > On Saturday, May 30, 2020, 01:18:45 PM PDT, Vinoth Chandar
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hi Satish,
> > > > > > >
> > > > > > > Are you looking for similar 
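
Editor's sketch: the thread above debates exposing the timeline through a dedicated read format because the need is to query commit metadata easily from Python. As a rough illustration of what such a query boils down to, the snippet below lists instants by parsing file names in the table's .hoodie directory. It assumes a local filesystem and a `<timestamp>.<action>[.<state>]` naming scheme; `Instant` and `list_instants` are hypothetical names made up here, not part of Hudi's API, and this is not the data source proposed in the thread.

```python
import os
import tempfile
from typing import List, NamedTuple


class Instant(NamedTuple):
    timestamp: str  # instant time, e.g. "20200601170700"
    action: str     # e.g. "commit", "deltacommit", "clean"
    state: str      # "completed" unless the file name carries a state suffix


def list_instants(base_path: str) -> List[Instant]:
    """Parse instant file names found under <base_path>/.hoodie."""
    meta_dir = os.path.join(base_path, ".hoodie")
    instants = []
    for name in os.listdir(meta_dir):
        parts = name.split(".")
        # Keep only <timestamp>.<action>[.<state>]; this skips files such as
        # hoodie.properties and any sub-directories.
        if len(parts) not in (2, 3) or not parts[0].isdigit():
            continue
        state = parts[2] if len(parts) == 3 else "completed"
        instants.append(Instant(parts[0], parts[1], state))
    return sorted(instants)  # tuples sort by timestamp first


if __name__ == "__main__":
    # Build a throwaway directory that mimics the assumed layout.
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, ".hoodie"))
        for name in ("20200601170700.commit", "20200602093000.commit.inflight"):
            open(os.path.join(base, ".hoodie", name), "w").close()
        for instant in list_instants(base):
            print(instant)
```

A real table on HDFS or cloud storage would need a filesystem client instead of `os.listdir`, which is one reason a Spark-side relation is attractive.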

[VOTE] Release 0.5.3, release candidate #2

2020-06-10 Thread Sivabalan
Hi everyone,

Please review and vote on the release candidate #2 for the version 0.5.3,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 001B66FA2B2543C151872CCC29A4FD82F1508833 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-0.5.3-rc2" [5].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.


Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256

[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.5.3-rc2/

[3] https://dist.apache.org/repos/dist/release/hudi/KEYS

[4] https://repository.apache.org/content/repositories/orgapachehudi-1023/

[5] https://github.com/apache/hudi/tree/release-0.5.3-rc2
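
Editor's sketch: the adoption rule stated in the vote email (majority approval, with at least 3 PMC affirmative votes) can be read as simple arithmetic. The snippet below is one possible reading, assuming that only PMC votes are counted and that "majority approval" means more +1 than -1 among them; `release_vote_passes` and the (value, is_pmc) representation are hypothetical and made up for illustration.

```python
from typing import Iterable, Tuple


def release_vote_passes(votes: Iterable[Tuple[int, bool]], min_binding: int = 3) -> bool:
    """votes holds (value, is_pmc) pairs, where value is +1 or -1."""
    # Assumption: only votes cast by PMC members count towards the result.
    binding = [value for value, is_pmc in votes if is_pmc]
    plus = sum(1 for value in binding if value == 1)
    minus = sum(1 for value in binding if value == -1)
    # At least `min_binding` affirmative PMC votes, and more +1 than -1.
    return plus >= min_binding and plus > minus


if __name__ == "__main__":
    sample = [(1, True), (1, True), (1, True), (1, False), (-1, False)]
    print(release_vote_passes(sample))
```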


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Mario de Sá Vera
Great! Will definitely follow up...


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Bhavani Sudha
Ah okay. Thanks for letting us know. I created a Jira here to capture this
thread - https://issues.apache.org/jira/browse/HUDI-1020. Feel free to add
to the jira.

Thanks,
Sudha


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Mario de Sá Vera
Sure Sudha. Unfortunately, I am afraid I am not allowed to become a Hudi
contributor; I have to restrict myself to being an enthusiast, as my current
employer applies some severe restrictions.

I would be more than happy to contribute by specifying the requirements, but
from a code-development perspective I will have to pass for now...


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Bhavani Sudha
Definitely. I was trying to add you to the Hudi contributors so you can
create a Jira. For that I need a Jira ID. If you have not already signed
up, please sign up for Jira and let me know your Jira ID.

Thanks,
Sudha


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Mario de Sá Vera
Hi Sudha,

Can you or Vinoth help me with this? How can we create a JIRA for that?

I can contribute the description and the definition of done.

Thanks,

Mario.

On Tue, 9 Jun 2020, 23:46 Bhavani Sudha wrote:

> Hi Mario,
>
> Can you please share your Jira ID?
>
> Thanks,
> Sudha
>
> On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera wrote:
>
> > Hey Vinoth, I noticed you added this suggestion to the weekly log; that
> > is great! Just let me know if I am able to create a JIRA, as I tried to
> > go to the HUDI project in Apache and did not find a way to do it. I can
> > bring in a good description of the benefits, etc.
> >
> > Thanks, Mario.
> >
> > On Mon, Jun 8, 2020 at 12:46, Vinoth Chandar wrote:
> >
> > > Hi,
> > >
> > > We can probably make a new JIRA. Not sure if there is an existing JIRA
> > > to re-use.
> > > The following modules are good to look at:
> > >
> > > hudi-timeline-service
> > > packaging/hudi-timeline-server-bundle
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera wrote:
> > >
> > > > Sorry Vinoth for not being clear... If that is a work in progress,
> > > > would you have a Jira I could follow and contribute to? If not, what
> > > > is the module name you suggest I look at?
> > > >
> > > > Regards,
> > > >
> > > > Mario.
> > > >
> > > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar wrote:
> > > >
> > > > > Sorry, did not understand the last part. :) Are you suggesting we
> > > > > create a Jira?
> > > > >
> > > > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <desav...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > That sounds great! Will check that and keep an eye on the
> > > > > > long-running server approach... once it gets a ticket I could
> > > > > > watch, just let me know please.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar wrote:
> > > > > >
> > > > > > > Hi Mario,
> > > > > > >
> > > > > > > We actually started with the idea of making the timeline server
> > > > > > > a long-running service. If you notice, we have a module that
> > > > > > > builds a bundle that you could deploy. Maybe you can play with
> > > > > > > it and see if that sounds interesting to you. It will
> > > > > > > definitely have some rough edges, given it's not been widely
> > > > > > > used.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vinoth
> > > > > > >
> > > > > > > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera
> > > > > > > <desav...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Vinoth, thanks for your comments on this. I spent some
> > > > > > > > time thinking over another possibility, which would be
> > > > > > > > externalising the Hudi timeline service itself to an external
> > > > > > > > server holding both operational (i.e. Hudi) and business
> > > > > > > > metadata.
> > > > > > > >
> > > > > > > > Would you guys have any opinion on that? Would that be easy?
> > > > > > > > I do not see a way yet, except reading about RocksDB, but
> > > > > > > > that is still not quite clear.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Mario.
> > > > > > > >
> > > > > > > > On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar
> > > > > > > > <mail.vinoth.chan...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Mario,
> > > > > > > > >
> > > > > > > > > Thanks for the detailed explanation. Hudi already allows
> > > > > > > > > extra metadata to be written atomically with each commit,
> > > > > > > > > i.e. write operation. In fact, that is how we track
> > > > > > > > > checkpoints for our delta streamer tool. It may not solve
> > > > > > > > > the need for querying the data together with this
> > > > > > > > > information, but it gives you the ability to do some basic
> > > > > > > > > tagging, if that's useful.
> > > > > > > > >
> > > > > > > > > >>If we enable the timeline service metadata model to be
> > > > > > > > > >>extended we could use the service instance itself to
> > > > > > > > > >>support specialised queries that involve business
> > > > > > > > > >>qualifiers in order to return a proper set of metadata
> > > > > > > > > >>pointing to the related commits
> > > > > > > > >
> > > > > > > > > This is a good idea actually. There is another active
> > > > > > > > > discuss thread on making the metadata queryable. There is
> > > > > > > > > also https://issues.apache.org/jira/browse/HUDI-309, which
> > > > > > > > > we paused for now, but that's more in line with what you
> > > > > > > > > are thinking, IIUC.
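
Editor's sketch: the oldest quoted message above notes that extra metadata can be written atomically with each commit and used for basic tagging. As a rough illustration of reading such a tag back, the snippet below loads one completed commit file and returns its extra metadata map. It assumes the commit file is JSON stored at `<base_path>/.hoodie/<instant>.commit` with the map under an "extraMetadata" key; that layout, the function name `read_extra_metadata`, and the sample key "business.batch_id" are assumptions for illustration, not confirmed by the thread.

```python
import json
import os
import tempfile
from typing import Dict


def read_extra_metadata(base_path: str, instant_time: str) -> Dict[str, str]:
    """Return the extra metadata map stored with one completed commit."""
    commit_file = os.path.join(base_path, ".hoodie", instant_time + ".commit")
    with open(commit_file, encoding="utf-8") as handle:
        commit = json.load(handle)
    # Assumed key name; a missing or null entry is treated as "no extra metadata".
    return commit.get("extraMetadata") or {}


if __name__ == "__main__":
    # Write a fake commit file so the sketch runs without a real table.
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, ".hoodie"))
        sample = {"extraMetadata": {"business.batch_id": "hypothetical-42"}}
        path = os.path.join(base, ".hoodie", "20200601160100.commit")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(sample, handle)
        print(read_extra_metadata(base, "20200601160100"))
```

This only reads tags per commit; querying data together with the tags, as discussed above, would still need support in the timeline service or a data source.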