Re: 4.0 documentation - Confluence limitations?

2024-01-08 Thread Ayush Saxena
When we discussed this last time, all of us were in favour of having
per-version documentation as part of our website, but we realised
that isn't really feasible given the number of volunteers we
have, so we decided to improve the existing wiki pages and add
new content to them. I went a step further: rather
than just tweaking those pages, I created a separate space for the 4.0
documentation. I'm not sure we will be able to complete that either,
but it is a work in progress and a stopgap until we get the
documentation properly onto our website. Needless to say, I am +1 on
setting up a proper framework for the documentation and getting it onto
our website.

-Ayush

On Mon, 8 Jan 2024 at 15:52, Simhadri G  wrote:
>
> Hi Zsolt,
>
> The current hive website is built with hugo,  so +1 from me :)
>
> We do have a few doc pages written in hugo, example :
> https://hive.apache.org/developement/quickstart/
>
> To add a new page we will need to add a new markdown file in the correct
> location in the hive-site repo and hugo will render the same in the hive
> website.
> For reference , there is a readme section here on how to add new pages as
> well: https://github.com/apache/hive-site#to-add-new-content
> We can definitely change the formatting/style of docs as needed.
>
>
> Thanks!
> Simhadri G
>
> On Mon, Jan 8, 2024 at 3:04 PM Stamatis Zampetakis 
> wrote:
>
> > Hey Zsolt,
> >
> > There have been a few discussions in the past about moving the
> > documentation from the wiki to the website and from what I recall
> > people were more or less in favor of moving towards this direction.
> > The main thing missing is volunteers that are willing to take on this
> > migration step.
> >
> > Personally, I am very much in favor of going into this direction not
> > only for solving namespacing issues but also for traceability purposes
> > and facilitating doc contributions and reviews.
> >
> > Big +1 from me.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Jan 8, 2024 at 10:15 AM Zsolt Miskolczi
> >  wrote:
> > >
> > > In confluence, page names should be unique in a given space. As I see,
> > > Apache Hive has its own space.
> > > And now comes the tricky part: with 4.0 documentation, we didn't create a
> > > new space, just a 4.0 parent page. We create a copy of existing pages
> > under
> > > the umbrella of this page:
> > > https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
> > >
> > > The problem is the unique naming of pages: it would make sense to keep
> > the
> > > page names the same as in the older documents but unfortunately, we
> > cannot.
> > > So we try to create names that are almost the same, or just delay the
> > > decisions.
> > > Two examples:
> > > - AdminManual Installation
> > > <
> > https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation>
> > > became Manual Installation
> > > 
> > > - Hive Schema Tool
> > > became
> > Copy
> > > of Hive Schema Tool - [TODO: move it under a 4.0 admin manual page, find
> > a
> > > proper name]
> > > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284790216
> > >
> > >
> > > I feel multiple issues with that: Consistency is gone. And also, I'm not
> > > sure how it can support search engines. Also, it can be confusing for
> > > people who want to use the wiki pages.
> > >
> > > I was thinking about different solutions. Creating a Hive 4.0 space in
> > > Confluence can solve the problem of page uniqueness. But doesn't address
> > > the issue of searchability and ease of use.
> > >
> > > We can also keep the current one but in that case, it would be
> > recommended
> > > to figure out a great naming convention about the pages.
> > >
> > > At this point, my best idea is to move to an engine that has better
> > offers
> > > to document a software product. For example, Iceberg uses Hugo. It is a
> > > markup-based engine, it can be kept in source control and pretty fast.
> > > Example page: https://iceberg.apache.org/docs/1.4.1/.
> > >
> > >
> > > What do you think of that?
> > >
> > > Thank you,
> > > Zsolt
> >


Re: Force coding style in hive precommit

2024-01-08 Thread Zoltán Rátkai
+1

I think most of the devs use IntelliJ, where no extra plugin is needed to
apply a code style. It can even import an Eclipse formatting file.

Regards,

RZ

On Mon, Jan 8, 2024 at 10:22 AM Stamatis Zampetakis 
wrote:

> +1 for enforcing style on new code. It will definitely save us from
> additional review cycles.
>
> Although I like checkstyle I tend to prefer tools that can
> automatically apply and fix style violations such as spotless [1].
>
> It seems that the spotless plugin can be configured to enforce
> formatting gradually [2] so I think it is an ideal choice for this
> discussion.
>
> To avoid wasting CI resources for nothing we can employ spotless (or
> other plugins) during the regular build so that detect and fix style
> violations fail early on before raising the PR.
>
> Finally, spotless can be configured easily to apply Eclipse styles so
> making it use our recommended formatting [3] would be trivial.
>
> Best,
> Stamatis
>
> [1] https://github.com/diffplug/spotless
> [2]
> https://github.com/diffplug/spotless/tree/main/plugin-maven#how-can-i-enforce-formatting-gradually-aka-ratchet
> [3]
> https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml
>
> On Mon, Jan 8, 2024 at 11:06 AM Zsolt Miskolczi
>  wrote:
> >
> > I think giving a warning is something that nobody will check. It could
> only
> > make sense if it is formatted in a way that it cannot be overseen. In
> every
> > other case, it is just ignored. And also, we are already full of warnings
> > so I'm afraid it can just hide in the noise.
> > Sorry, I don't know how it works in hadoop/tez, maybe it is easy to use.
> >
> > On Mon, Jan 8, 2024 at 9:53, Ayush Saxena wrote:
> >
> > > +1, to have a checkstyle build. I am strongly against doing that big
> > > refactor to make just checkstyle happy, such a refactor will make
> > > backports to Hive lower branches tough and the life of folks
> > > maintaining downstream forks quite painful.
> > >
> > > We should enforce same kind of stuff like in Tez/Hadoop, where
> > > checkstyle violations are highlighted and the committer before
> > > committing can check that & decide whether that in unavoidable or not
> > >
> > > -Ayush
> > >
> > > On Mon, 8 Jan 2024 at 14:05, László Bodor 
> > > wrote:
> > > >
> > > > thanks for the responses so far!
> > > > I'm a bit against the one-time huge refactor commit as we don't need
> that
> > > > (but I can be convinced of course), because checkstyle can be set up
> to
> > > > warn only on style issues in the new/touched bits in the PR (or at
> least
> > > > that's how it works in tez), that's what we need, so we don't have to
> > > make
> > > > that huge commit to simply introduce this enforcement
> > > >
> > > > On Mon, Jan 8, 2024 at 9:28, Butao Zhang wrote:
> > > >
> > > > > +1
> > > > >
> > > > >
> > > > >
> > > > > BTW, We have a independent checkstyle file under iceberg module
> > > > > https://github.com/apache/hive/tree/master/iceberg/checkstyle . I
> > > think
> > > > > we need to consider unifing the checkstyle in all the sub-module.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Butao Zhang
> > > > >  Replied Message 
> > > > > | From | Zsolt Miskolczi |
> > > > > | Date | 1/8/2024 16:19 |
> > > > > | To |  |
> > > > > | Subject | Re: Force coding style in hive precommit |
> > > > > +1
> > > > >
> > > > > In case there is an agreement about the coding style, we can
> prepare a
> > > tool
> > > > > that enforces that style at compile time. Run a tool one time to
> > > re-format
> > > > > all the existing code once. And turn on a compile time check.
> Iceberg
> > > did
> > > > > the same approach, they had one huge commit with almost 4k files
> > > changed
> > > > > and from that point, it worked well. And there are no issues about
> > > > > formatting.
> > > > > I don't think putting a warning message helps at all. Also, it
> should
> > > be
> > > > > enforced on compile time.
> > > > >
> > > > > Zsolt
> > > > >
> > > > > On Mon, Jan 8, 2024 at 7:20, Kirti Ruge wrote:
> > > > >
> > > > > +1
> > > > > As it would improve maintainability and code reviews. Sometimes
> small
> > > > > indentation/styling issues would kill review cycle time and we can
> > > easily
> > > > > avoid it before requesting review.
> > > > > Enforcing more rules around it definitely boost guaranteeing
> quality.
> > > We
> > > > > can integrate it with git hooks. If we are going for this, I can
> work
> > > on
> > > > > getting it in place .
> > > > >
> > > > > Thanks,
> > > > > Kirti
> > > > >
> > > > > On 08-Jan-2024, at 11:36 AM, Akshat m 
> wrote:
> > > > >
> > > > > +1, We do have a documentation round it as well:
> > > > >
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > > > > so it makes sense to enforce it as well.
> > > > >
> > > > > Right now we have a small section around this in documentation, We
> can

Re: 4.0 documentation - Confluence limitations?

2024-01-08 Thread Simhadri G
Hi Zsolt,

The current Hive website is built with Hugo, so +1 from me :)

We do have a few doc pages written for Hugo, for example:
https://hive.apache.org/developement/quickstart/

To add a new page, we need to add a new Markdown file in the correct
location in the hive-site repo, and Hugo will render it on the Hive
website.
For reference, there is a README section on how to add new pages as
well: https://github.com/apache/hive-site#to-add-new-content
We can definitely change the formatting/style of the docs as needed.
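
As a rough illustration of the "add a Markdown file" step, here is a minimal
Python sketch that writes a page with YAML front matter under a Hugo
content/ directory. The path docs/4.0.0/admin/schema-tool.md, the weight
field value, and the helper names are assumptions made up for this example;
the real layout is whatever the hive-site README prescribes.

```python
from pathlib import Path
import tempfile


def render_page(title: str, body: str, weight: int = 10) -> str:
    """Return Markdown text with YAML front matter, the form Hugo reads.

    Assumes the title contains no double quotes (no YAML escaping is done).
    """
    front_matter = "\n".join([
        "---",
        f'title: "{title}"',
        f"weight: {weight}",
        "---",
    ])
    return f"{front_matter}\n\n{body.rstrip()}\n"


def add_page(site_root: Path, relative_path: str, title: str, body: str) -> Path:
    """Write the page below <site_root>/content, creating parent directories."""
    target = site_root / "content" / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_page(title, body), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical versioned path; a temporary directory stands in for the repo.
    with tempfile.TemporaryDirectory() as tmp:
        page = add_page(Path(tmp), "docs/4.0.0/admin/schema-tool.md",
                        "Hive Schema Tool", "Placeholder text.")
        print(page.read_text(encoding="utf-8"))
```

Keeping the version in the directory path would let each release keep the
original page names, which is the uniqueness problem raised for Confluence.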


Thanks!
Simhadri G

On Mon, Jan 8, 2024 at 3:04 PM Stamatis Zampetakis 
wrote:

> Hey Zsolt,
>
> There have been a few discussions in the past about moving the
> documentation from the wiki to the website and from what I recall
> people were more or less in favor of moving towards this direction.
> The main thing missing is volunteers that are willing to take on this
> migration step.
>
> Personally, I am very much in favor of going into this direction not
> only for solving namespacing issues but also for traceability purposes
> and facilitating doc contributions and reviews.
>
> Big +1 from me.
>
> Best,
> Stamatis
>
> On Mon, Jan 8, 2024 at 10:15 AM Zsolt Miskolczi
>  wrote:
> >
> > In confluence, page names should be unique in a given space. As I see,
> > Apache Hive has its own space.
> > And now comes the tricky part: with 4.0 documentation, we didn't create a
> > new space, just a 4.0 parent page. We create a copy of existing pages
> under
> > the umbrella of this page:
> > https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
> >
> > The problem is the unique naming of pages: it would make sense to keep
> the
> > page names the same as in the older documents but unfortunately, we
> cannot.
> > So we try to create names that are almost the same, or just delay the
> > decisions.
> > Two examples:
> > - AdminManual Installation
> > <
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation>
> > became Manual Installation
> > 
> > - Hive Schema Tool
> > became
> Copy
> > of Hive Schema Tool - [TODO: move it under a 4.0 admin manual page, find
> a
> > proper name]
> > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284790216
> >
> >
> > I feel multiple issues with that: Consistency is gone. And also, I'm not
> > sure how it can support search engines. Also, it can be confusing for
> > people who want to use the wiki pages.
> >
> > I was thinking about different solutions. Creating a Hive 4.0 space in
> > Confluence can solve the problem of page uniqueness. But doesn't address
> > the issue of searchability and ease of use.
> >
> > We can also keep the current one but in that case, it would be
> recommended
> > to figure out a great naming convention about the pages.
> >
> > At this point, my best idea is to move to an engine that has better
> offers
> > to document a software product. For example, Iceberg uses Hugo. It is a
> > markup-based engine, it can be kept in source control and pretty fast.
> > Example page: https://iceberg.apache.org/docs/1.4.1/.
> >
> >
> > What do you think of that?
> >
> > Thank you,
> > Zsolt
>


Re: 4.0 documentation - Confluence limitations?

2024-01-08 Thread Stamatis Zampetakis
Hey Zsolt,

There have been a few discussions in the past about moving the
documentation from the wiki to the website and from what I recall
people were more or less in favor of moving towards this direction.
The main thing missing is volunteers that are willing to take on this
migration step.

Personally, I am very much in favor of going in this direction, not
only to solve the namespacing issues but also for traceability and to
facilitate doc contributions and reviews.

Big +1 from me.

Best,
Stamatis

On Mon, Jan 8, 2024 at 10:15 AM Zsolt Miskolczi
 wrote:
>
> In confluence, page names should be unique in a given space. As I see,
> Apache Hive has its own space.
> And now comes the tricky part: with 4.0 documentation, we didn't create a
> new space, just a 4.0 parent page. We create a copy of existing pages under
> the umbrella of this page:
> https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
>
> The problem is the unique naming of pages: it would make sense to keep the
> page names the same as in the older documents but unfortunately, we cannot.
> So we try to create names that are almost the same, or just delay the
> decisions.
> Two examples:
> - AdminManual Installation
> 
> became Manual Installation
> 
> - Hive Schema Tool
> became Copy
> of Hive Schema Tool - [TODO: move it under a 4.0 admin manual page, find a
> proper name]
> 
>
> I feel multiple issues with that: Consistency is gone. And also, I'm not
> sure how it can support search engines. Also, it can be confusing for
> people who want to use the wiki pages.
>
> I was thinking about different solutions. Creating a Hive 4.0 space in
> Confluence can solve the problem of page uniqueness. But doesn't address
> the issue of searchability and ease of use.
>
> We can also keep the current one but in that case, it would be recommended
> to figure out a great naming convention about the pages.
>
> At this point, my best idea is to move to an engine that has better offers
> to document a software product. For example, Iceberg uses Hugo. It is a
> markup-based engine, it can be kept in source control and pretty fast.
> Example page: https://iceberg.apache.org/docs/1.4.1/.
>
>
> What do you think of that?
>
> Thank you,
> Zsolt


Re: Force coding style in hive precommit

2024-01-08 Thread Stamatis Zampetakis
+1 for enforcing style on new code. It will definitely save us from
additional review cycles.

Although I like checkstyle I tend to prefer tools that can
automatically apply and fix style violations such as spotless [1].

It seems that the spotless plugin can be configured to enforce
formatting gradually [2] so I think it is an ideal choice for this
discussion.

To avoid wasting CI resources for nothing, we can employ spotless (or
other plugins) during the regular build so that style violations are
detected and fixed early on, before raising the PR.

Finally, spotless can be configured easily to apply Eclipse styles so
making it use our recommended formatting [3] would be trivial.
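
To make the "enforce gradually" idea concrete, here is a minimal Python
sketch of the underlying principle: only files that changed relative to a
base branch are handed to the formatter or checker. This is not how spotless
implements its ratchet; the function names and the origin/master default are
assumptions for illustration only.

```python
import subprocess
from typing import Iterable, List


def select_java_sources(paths: Iterable[str]) -> List[str]:
    """Keep only .java paths, de-duplicated and sorted."""
    return sorted({p.strip() for p in paths if p.strip().endswith(".java")})


def changed_files(base_ref: str = "origin/master") -> List[str]:
    """List files added, copied, modified or renamed since the merge base.

    Must be run inside a git work tree; base_ref is an assumed default.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling select_java_sources(changed_files()) inside a checkout yields the
files a gradual check would need to look at; untouched legacy files are left
alone, so no big reformatting commit is required.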

Best,
Stamatis

[1] https://github.com/diffplug/spotless
[2] 
https://github.com/diffplug/spotless/tree/main/plugin-maven#how-can-i-enforce-formatting-gradually-aka-ratchet
[3] https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml

On Mon, Jan 8, 2024 at 11:06 AM Zsolt Miskolczi
 wrote:
>
> I think giving a warning is something that nobody will check. It could only
> make sense if it is formatted in a way that it cannot be overseen. In every
> other case, it is just ignored. And also, we are already full of warnings
> so I'm afraid it can just hide in the noise.
> Sorry, I don't know how it works in hadoop/tez, maybe it is easy to use.
>
> On Mon, Jan 8, 2024 at 9:53, Ayush Saxena wrote:
>
> > +1, to have a checkstyle build. I am strongly against doing that big
> > refactor to make just checkstyle happy, such a refactor will make
> > backports to Hive lower branches tough and the life of folks
> > maintaining downstream forks quite painful.
> >
> > We should enforce same kind of stuff like in Tez/Hadoop, where
> > checkstyle violations are highlighted and the committer before
> > committing can check that & decide whether that in unavoidable or not
> >
> > -Ayush
> >
> > On Mon, 8 Jan 2024 at 14:05, László Bodor 
> > wrote:
> > >
> > > thanks for the responses so far!
> > > I'm a bit against the one-time huge refactor commit as we don't need that
> > > (but I can be convinced of course), because checkstyle can be set up to
> > > warn only on style issues in the new/touched bits in the PR (or at least
> > > that's how it works in tez), that's what we need, so we don't have to
> > make
> > > that huge commit to simply introduce this enforcement
> > >
> > > On Mon, Jan 8, 2024 at 9:28, Butao Zhang wrote:
> > >
> > > > +1
> > > >
> > > >
> > > >
> > > > BTW, We have a independent checkstyle file under iceberg module
> > > > https://github.com/apache/hive/tree/master/iceberg/checkstyle . I
> > think
> > > > we need to consider unifing the checkstyle in all the sub-module.
> > > >
> > > >
> > > > Thanks,
> > > > Butao Zhang
> > > >  Replied Message 
> > > > | From | Zsolt Miskolczi |
> > > > | Date | 1/8/2024 16:19 |
> > > > | To |  |
> > > > | Subject | Re: Force coding style in hive precommit |
> > > > +1
> > > >
> > > > In case there is an agreement about the coding style, we can prepare a
> > tool
> > > > that enforces that style at compile time. Run a tool one time to
> > re-format
> > > > all the existing code once. And turn on a compile time check. Iceberg
> > did
> > > > the same approach, they had one huge commit with almost 4k files
> > changed
> > > > and from that point, it worked well. And there are no issues about
> > > > formatting.
> > > > I don't think putting a warning message helps at all. Also, it should
> > be
> > > > enforced on compile time.
> > > >
> > > > Zsolt
> > > >
> > > > On Mon, Jan 8, 2024 at 7:20, Kirti Ruge wrote:
> > > >
> > > > +1
> > > > As it would improve maintainability and code reviews. Sometimes small
> > > > indentation/styling issues would kill review cycle time and we can
> > easily
> > > > avoid it before requesting review.
> > > > Enforcing more rules around it definitely boost guaranteeing quality.
> > We
> > > > can integrate it with git hooks. If we are going for this, I can work
> > on
> > > > getting it in place .
> > > >
> > > > Thanks,
> > > > Kirti
> > > >
> > > > On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
> > > >
> > > > +1, We do have a documentation round it as well:
> > > >
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > > > so it makes sense to enforce it as well.
> > > >
> > > > Right now we have a small section around this in documentation, We can
> > > > also
> > > > expand this to a new page and add more Java practices to it as well
> > which
> > > > are followed in the project while we are at this, Will be a great
> > > > addition
> > > > to Hive 4 documentation, I can pick it up.
> > > >
> > > > I suggest we add this style check as a pre-commit git hook as well, so
> > it
> > > > is enforced when the author is committing locally as well, this can
> > save
> > > > the wait time for pre-commit failure in the PR for the author 

Re: Force coding style in hive precommit

2024-01-08 Thread Zsolt Miskolczi
I think a warning is something that nobody will check. It only
makes sense if it is presented in a way that cannot be overlooked. In every
other case, it is simply ignored. Also, we already have plenty of warnings,
so I'm afraid it would just get lost in the noise.
Sorry, I don't know how it works in Hadoop/Tez; maybe it is easy to use there.

On Mon, Jan 8, 2024 at 9:53, Ayush Saxena wrote:

> +1, to have a checkstyle build. I am strongly against doing that big
> refactor to make just checkstyle happy, such a refactor will make
> backports to Hive lower branches tough and the life of folks
> maintaining downstream forks quite painful.
>
> We should enforce same kind of stuff like in Tez/Hadoop, where
> checkstyle violations are highlighted and the committer before
> committing can check that & decide whether that in unavoidable or not
>
> -Ayush
>
> On Mon, 8 Jan 2024 at 14:05, László Bodor 
> wrote:
> >
> > thanks for the responses so far!
> > I'm a bit against the one-time huge refactor commit as we don't need that
> > (but I can be convinced of course), because checkstyle can be set up to
> > warn only on style issues in the new/touched bits in the PR (or at least
> > that's how it works in tez), that's what we need, so we don't have to
> make
> > that huge commit to simply introduce this enforcement
> >
> > On Mon, Jan 8, 2024 at 9:28, Butao Zhang wrote:
> >
> > > +1
> > >
> > >
> > >
> > > BTW, We have a independent checkstyle file under iceberg module
> > > https://github.com/apache/hive/tree/master/iceberg/checkstyle . I
> think
> > > we need to consider unifing the checkstyle in all the sub-module.
> > >
> > >
> > > Thanks,
> > > Butao Zhang
> > >  Replied Message 
> > > | From | Zsolt Miskolczi |
> > > | Date | 1/8/2024 16:19 |
> > > | To |  |
> > > | Subject | Re: Force coding style in hive precommit |
> > > +1
> > >
> > > In case there is an agreement about the coding style, we can prepare a
> tool
> > > that enforces that style at compile time. Run a tool one time to
> re-format
> > > all the existing code once. And turn on a compile time check. Iceberg
> did
> > > the same approach, they had one huge commit with almost 4k files
> changed
> > > and from that point, it worked well. And there are no issues about
> > > formatting.
> > > I don't think putting a warning message helps at all. Also, it should
> be
> > > enforced on compile time.
> > >
> > > Zsolt
> > >
> > > On Mon, Jan 8, 2024 at 7:20, Kirti Ruge wrote:
> > >
> > > +1
> > > As it would improve maintainability and code reviews. Sometimes small
> > > indentation/styling issues would kill review cycle time and we can
> easily
> > > avoid it before requesting review.
> > > Enforcing more rules around it definitely boost guaranteeing quality.
> We
> > > can integrate it with git hooks. If we are going for this, I can work
> on
> > > getting it in place .
> > >
> > > Thanks,
> > > Kirti
> > >
> > > On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
> > >
> > > +1, We do have a documentation round it as well:
> > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > > so it makes sense to enforce it as well.
> > >
> > > Right now we have a small section around this in documentation, We can
> > > also
> > > expand this to a new page and add more Java practices to it as well
> which
> > > are followed in the project while we are at this, Will be a great
> > > addition
> > > to Hive 4 documentation, I can pick it up.
> > >
> > > I suggest we add this style check as a pre-commit git hook as well, so
> it
> > > is enforced when the author is committing locally as well, this can
> save
> > > the wait time for pre-commit failure in the PR for the author to
> realise
> > > the styling issues, ideally this should be taken care of with the ide
> > > style
> > > configuration but in case we miss it this would error out while
> > > committing the changes.
> > >
> > > Regards,
> > > Akshat
> > >
> > > On Sat, Jan 6, 2024 at 10:17 AM László Bodor <
> bodorlaszlo0...@gmail.com>
> > > wrote:
> > >
> > > Hi All!
> > >
> > > What do you think about forcing coding style in Hive precommit?
> > >
> > > I remember, back in the old days, precommit printed some warnings in
> > > case
> > > some coding style (formatting, indentation, naming convention, etc.)
> > > problems were found in the patch, now it's simply not used, I guess
> > > since
> > > we're using GitHub PRs.
> > >
> > > For example: I remember I simply approved a PR a few months ago which
> > > LGTM, and later just realized it's full of 4-spaces indentation, which
> > > is
> > > wrong if we assume that code should be formatted according to the style
> > > definition here:
> > >
> > >
> https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml
> > >
> > > I have just attached an example of Tez PR to open minds and start a
> > > conversation.
> > >
> > > Regards,
> > > Laszlo Bodor

Re: Force coding style in hive precommit

2024-01-08 Thread Ayush Saxena
+1 to having a checkstyle build. I am strongly against doing that big
refactor just to make checkstyle happy; such a refactor will make
backports to lower Hive branches tough and the life of folks
maintaining downstream forks quite painful.

We should enforce the same kind of thing as in Tez/Hadoop, where
checkstyle violations are highlighted and the committer can check them
before committing and decide whether each one is unavoidable or not.

-Ayush

On Mon, 8 Jan 2024 at 14:05, László Bodor  wrote:
>
> thanks for the responses so far!
> I'm a bit against the one-time huge refactor commit as we don't need that
> (but I can be convinced of course), because checkstyle can be set up to
> warn only on style issues in the new/touched bits in the PR (or at least
> that's how it works in tez), that's what we need, so we don't have to make
> that huge commit to simply introduce this enforcement
>
> On Mon, Jan 8, 2024 at 9:28, Butao Zhang wrote:
>
> > +1
> >
> >
> >
> > BTW, We have a independent checkstyle file under iceberg module
> > https://github.com/apache/hive/tree/master/iceberg/checkstyle . I think
> > we need to consider unifing the checkstyle in all the sub-module.
> >
> >
> > Thanks,
> > Butao Zhang
> >  Replied Message 
> > | From | Zsolt Miskolczi |
> > | Date | 1/8/2024 16:19 |
> > | To |  |
> > | Subject | Re: Force coding style in hive precommit |
> > +1
> >
> > In case there is an agreement about the coding style, we can prepare a tool
> > that enforces that style at compile time. Run a tool one time to re-format
> > all the existing code once. And turn on a compile time check. Iceberg did
> > the same approach, they had one huge commit with almost 4k files changed
> > and from that point, it worked well. And there are no issues about
> > formatting.
> > I don't think putting a warning message helps at all. Also, it should be
> > enforced on compile time.
> >
> > Zsolt
> >
> > On Mon, Jan 8, 2024 at 7:20, Kirti Ruge wrote:
> >
> > +1
> > As it would improve maintainability and code reviews. Sometimes small
> > indentation/styling issues would kill review cycle time and we can easily
> > avoid it before requesting review.
> > Enforcing more rules around it definitely boost guaranteeing quality. We
> > can integrate it with git hooks. If we are going for this, I can work on
> > getting it in place .
> >
> > Thanks,
> > Kirti
> >
> > On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
> >
> > +1, We do have a documentation round it as well:
> >
> >
> > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > so it makes sense to enforce it as well.
> >
> > Right now we have a small section around this in documentation, We can
> > also
> > expand this to a new page and add more Java practices to it as well which
> > are followed in the project while we are at this, Will be a great
> > addition
> > to Hive 4 documentation, I can pick it up.
> >
> > I suggest we add this style check as a pre-commit git hook as well, so it
> > is enforced when the author is committing locally as well, this can save
> > the wait time for pre-commit failure in the PR for the author to realise
> > the styling issues, ideally this should be taken care of with the ide
> > style
> > configuration but in case we miss it this would error out while
> > committing the changes.
> >
> > Regards,
> > Akshat
> >
> > On Sat, Jan 6, 2024 at 10:17 AM László Bodor 
> > wrote:
> >
> > Hi All!
> >
> > What do you think about forcing coding style in Hive precommit?
> >
> > I remember, back in the old days, precommit printed some warnings in
> > case
> > some coding style (formatting, indentation, naming convention, etc.)
> > problems were found in the patch, now it's simply not used, I guess
> > since
> > we're using GitHub PRs.
> >
> > For example: I remember I simply approved a PR a few months ago which
> > LGTM, and later just realized it's full of 4-spaces indentation, which
> > is
> > wrong if we assume that code should be formatted according to the style
> > definition here:
> >
> > https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml
> >
> > I have just attached an example of Tez PR to open minds and start a
> > conversation.
> >
> > Regards,
> > Laszlo Bodor
> >
> >
> >
> >
> >


Re: Force coding style in hive precommit

2024-01-08 Thread László Bodor
Thanks for the responses so far!
I'm a bit against the one-time huge refactor commit (though I can be
convinced, of course), because we don't need it: checkstyle can be set up to
warn only about style issues in the new/touched parts of a PR (or at least
that's how it works in Tez). That's all we need, so we don't have to make
that huge commit simply to introduce this enforcement.
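
To show what "warn only on the new/touched bits" can mean in practice, here
is a small Python sketch that reads a unified diff, collects the added line
numbers per file, and keeps only the violations that fall on those lines.
It is an illustration under assumptions, not how the Tez precommit does it:
the (path, line, message) violation tuples and both function names are
hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

Violation = Tuple[str, int, str]  # (path, line number, message), hypothetical shape


def touched_lines(diff_text: str) -> Dict[str, Set[int]]:
    """Map each file in a unified diff to the new-file line numbers it adds."""
    touched: Dict[str, Set[int]] = {}
    current: Optional[str] = None
    old_left = new_left = 0  # lines still expected in the current hunk body
    new_line = 0             # line number in the new version of the file
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                if current is not None:
                    touched[current].add(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both old and new versions.
                new_line += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            if path == "/dev/null":
                current = None  # file was deleted
            else:
                current = path[2:] if path.startswith("b/") else path
                touched.setdefault(current, set())
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(2)) if match.group(2) is not None else 1
            new_line = int(match.group(3))
            new_left = int(match.group(4)) if match.group(4) is not None else 1
    return touched


def violations_on_touched_lines(
    violations: Iterable[Violation], touched: Dict[str, Set[int]]
) -> List[Violation]:
    """Keep only violations reported on lines the diff added or changed."""
    return [v for v in violations if v[1] in touched.get(v[0], set())]
```

Violations on untouched lines are dropped, so pre-existing style debt does
not block a PR while new code is still held to the agreed style.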

On Mon, Jan 8, 2024 at 9:28, Butao Zhang wrote:

> +1
>
>
>
> BTW, We have a independent checkstyle file under iceberg module
> https://github.com/apache/hive/tree/master/iceberg/checkstyle . I think
> we need to consider unifing the checkstyle in all the sub-module.
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> | From | Zsolt Miskolczi |
> | Date | 1/8/2024 16:19 |
> | To |  |
> | Subject | Re: Force coding style in hive precommit |
> +1
>
> In case there is an agreement about the coding style, we can prepare a tool
> that enforces that style at compile time. Run a tool one time to re-format
> all the existing code once. And turn on a compile time check. Iceberg did
> the same approach, they had one huge commit with almost 4k files changed
> and from that point, it worked well. And there are no issues about
> formatting.
> I don't think putting a warning message helps at all. Also, it should be
> enforced on compile time.
>
> Zsolt
>
> On Mon, Jan 8, 2024 at 7:20, Kirti Ruge wrote:
>
> +1
> As it would improve maintainability and code reviews. Sometimes small
> indentation/styling issues would kill review cycle time and we can easily
> avoid it before requesting review.
> Enforcing more rules around it definitely boost guaranteeing quality. We
> can integrate it with git hooks. If we are going for this, I can work on
> getting it in place .
>
> Thanks,
> Kirti
>
> On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
>
> +1, We do have a documentation round it as well:
>
>
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> so it makes sense to enforce it as well.
>
> Right now we have a small section around this in documentation, We can
> also
> expand this to a new page and add more Java practices to it as well which
> are followed in the project while we are at this, Will be a great
> addition
> to Hive 4 documentation, I can pick it up.
>
> I suggest we add this style check as a pre-commit git hook as well, so it
> is enforced when the author is committing locally as well, this can save
> the wait time for pre-commit failure in the PR for the author to realise
> the styling issues, ideally this should be taken care of with the ide
> style
> configuration but in case we miss it this would error out while
> committing the changes.
>
> Regards,
> Akshat
>
> On Sat, Jan 6, 2024 at 10:17 AM László Bodor 
> wrote:
>
> Hi All!
>
> What do you think about forcing coding style in Hive precommit?
>
> I remember, back in the old days, precommit printed some warnings in
> case
> some coding style (formatting, indentation, naming convention, etc.)
> problems were found in the patch, now it's simply not used, I guess
> since
> we're using GitHub PRs.
>
> For example: I remember I simply approved a PR a few months ago which
> LGTM, and later just realized it's full of 4-spaces indentation, which
> is
> wrong if we assume that code should be formatted according to the style
> definition here:
>
> https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml
>
> I have just attached an example of Tez PR to open minds and start a
> conversation.
>
> Regards,
> Laszlo Bodor
>
>
>
>
>


Re: Force coding style in hive precommit

2024-01-08 Thread Butao Zhang
+1



BTW, we have an independent checkstyle file under the iceberg module:
https://github.com/apache/hive/tree/master/iceberg/checkstyle . I think we need
to consider unifying the checkstyle across all the sub-modules.


Thanks,
Butao Zhang






Re: Force coding style in hive precommit

2024-01-08 Thread Zsolt Miskolczi
+1

If there is agreement about the coding style, we can prepare a tool
that enforces that style at compile time: run the tool once to re-format
all the existing code, then turn on a compile-time check. Iceberg took
the same approach: they had one huge commit with almost 4k files changed,
and from that point on it worked well, with no more issues about
formatting.
I don't think printing a warning message helps at all; the style should be
enforced at compile time.
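
As a rough illustration of what an automated check could look for (a
heuristic sketch only, not an existing Hive or Iceberg tool; the function
names and the 2-space width are assumptions made for this example), the
indentation step of a file can be guessed from the greatest common divisor
of its leading-space counts:

```python
from functools import reduce
from math import gcd

EXPECTED_INDENT = 2  # assumption: the agreed style uses 2-space indentation


def indent_unit(source: str) -> int:
    """Guess the indentation step of a file as the GCD of its leading-space counts.

    Returns 0 when no line is indented. Tabs are ignored by this sketch.
    """
    widths = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped or stripped.startswith("*"):
            continue  # skip blank lines and Javadoc continuation lines
        widths.append(len(line) - len(stripped))
    return reduce(gcd, widths, 0)


def violates_style(source: str, expected: int = EXPECTED_INDENT) -> bool:
    """True when the file is indented, but its indentation step is not `expected`."""
    unit = indent_unit(source)
    return unit != 0 and unit != expected
```

A real setup would more likely rely on a proper style checker wired into
the build; this only shows that the 4-space case can be detected
mechanically. The heuristic can misjudge files that are only ever indented
at even nesting depths.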

Zsolt

Kirti Ruge wrote (on Mon, Jan 8, 2024, 7:20):

> +1
> It would improve maintainability and code reviews. Sometimes small
> indentation/styling issues kill review cycle time, and we can easily
> avoid that before requesting a review.
> Enforcing more rules around it would definitely help guarantee quality. We
> can integrate it with git hooks. If we are going for this, I can work on
> getting it in place.
>
> Thanks,
> Kirti
>
> > On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
> >
> > +1, we do have documentation around it as well:
> >
> > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > so it makes sense to enforce it.
> >
> > Right now we have a small section about this in the documentation. We can
> > also expand it into a new page and add more Java practices that are
> > followed in the project while we are at it. That would be a great
> > addition to the Hive 4 documentation; I can pick it up.
> >
> > I suggest we also add this style check as a pre-commit git hook, so it
> > is enforced when the author commits locally. This can save
> > the wait for a pre-commit failure in the PR before the author realises
> > the styling issues. Ideally this should be taken care of by the IDE
> > style configuration, but in case we miss it, the hook would error out
> > while committing the changes.
> >
> > Regards,
> > Akshat
> >
> > On Sat, Jan 6, 2024 at 10:17 AM László Bodor 
> > wrote:
> >
> >> Hi All!
> >>
> >> What do you think about forcing coding style in Hive precommit?
> >>
> >> I remember, back in the old days, precommit printed some warnings in
> >> case some coding style (formatting, indentation, naming convention, etc.)
> >> problems were found in the patch. Now it's simply not used, I guess
> >> since we're using GitHub PRs.
> >>
> >> For example: I remember I simply approved a PR a few months ago which
> >> LGTM, and only later realized it's full of 4-space indentation, which
> >> is wrong if we assume that code should be formatted according to the
> >> style definition here:
> >>
> >> https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml
> >>
> >> I have just attached an example of a Tez PR to open minds and start a
> >> conversation.
> >>
> >> Regards,
> >> Laszlo Bodor
> >>
> >>
>
>


4.0 documentation - Confluence limitations?

2024-01-08 Thread Zsolt Miskolczi
In Confluence, page names must be unique within a given space. As far as I
can see, Apache Hive has its own space.
And now comes the tricky part: for the 4.0 documentation, we didn't create a
new space, just a 4.0 parent page. We create copies of existing pages under
the umbrella of this page:
https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0

The problem is the unique naming of pages: it would make sense to keep the
page names the same as in the older documents, but unfortunately we cannot.
So we try to create names that are almost the same, or just delay the
decision.
Two examples:
- "AdminManual Installation" became "Manual Installation"
- "Hive Schema Tool" became "Copy of Hive Schema Tool - [TODO: move it under
  a 4.0 admin manual page, find a proper name]"


I see multiple issues with that. Consistency is gone. I'm also not sure
how well it works with search engines. And it can be confusing for
people who want to use the wiki pages.

I was thinking about different solutions. Creating a Hive 4.0 space in
Confluence can solve the problem of page uniqueness, but it doesn't address
the issue of searchability and ease of use.

We can also keep the current setup, but in that case it would be advisable
to figure out a good naming convention for the pages.
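
One possible convention, purely as an illustration (the suffix format and
both helper names below are hypothetical, and the set only models the
per-space uniqueness rule, it does not call Confluence): keep the old title
and append the version.

```python
def versioned_title(title: str, version: str) -> str:
    """Build a title that stays unique within one space by appending the version."""
    return f"{title} ({version})"


def add_page(space: set, title: str) -> None:
    """Add a title to a space, rejecting duplicates as a per-space uniqueness rule would."""
    if title in space:
        raise ValueError(f"duplicate page title in space: {title!r}")
    space.add(title)


# Existing (pre-4.0) page titles, taken from the two examples above.
space = {"AdminManual Installation", "Hive Schema Tool"}

# Copy every existing page under a versioned name; sorted() iterates over a
# snapshot, so adding to the set inside the loop is safe.
for old_title in sorted(space):
    add_page(space, versioned_title(old_title, "4.0.0"))
```

With this, "Hive Schema Tool" and "Hive Schema Tool (4.0.0)" can live in the
same space, and the old title stays recognizable in the new one.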

At this point, my best idea is to move to an engine that is better suited
to documenting a software product. For example, Iceberg uses Hugo. It is a
markup-based engine, its sources can be kept in source control, and it is
pretty fast.
Example page: https://iceberg.apache.org/docs/1.4.1/.


What do you think of that?

Thank you,
Zsolt