Re: Nutch codebase formatting

2023-11-02 Thread Lewis John McGibbney
Thanks Seb. I'll go ahead and try to build in the google Java format via 
super-linter and see where we get...!
lewismc

On 2023/10/29 17:04:47 Sebastian Nagel wrote:
> Hi Lewis,
> 
>  >> whether we need a Nutch custom code style at all… why don’t we just use
>  >> some other existing style and then enforce it?
> 
> Enforcing: yes!
> 
> However, I would try hard to keep the changes to a reasonable minimum. For
> example, if we change the indentation, almost every code line is affected,
> which means:
> - "git annotate" becomes mostly useless (or more difficult to use because you
>   need to look back)
> - merges of open PRs, custom patches or modifications in custom repositories
>   might get quite painful until the formatting is synchronized.
> 
> 
>  >> * google Java format [1] which offers a GitHub action for easy integration
>  >> into our CI process, or
> 
> +1
> 
> + also available for IntelliJ, Eclipse
> + indentation stays the same
> +/- about 25% of the code lines are changed (might be acceptable)
> 
> 
>  >> * superlinter [3] basically emerging as the industry OSS default, offers a
>  >> GitHub action and could also be configured to lint dockerfile, and other
>  >> artifacts. It can also be configured to use the google Java style as well…
> 
> +1 (with Google Java style)
> 
> 
>  > I’ll submit a PR for superlinter so everyone can see what it would look
>  > like.
> 
> Great! Thanks!
> 
> 
> Best,
> Sebastian
> 
> On 10/29/23 00:38, Lewis John McGibbney wrote:
> > Any thoughts on this, folks?
> > I’ll submit a PR for superlinter so everyone can see what it would look 
> > like.
> > lewismc
> > 
> > On 2023/10/23 19:28:45 lewis john mcgibbney wrote:
> >> Hi dev@,
> >>
> >> For the longest time the Nutch codebase has shipped with an
> >> eclipse-codeformat.xml [0] file.
> >> Whilst this has been largely successful in keeping the codebase uniform, it
> >> cannot be/has not been integrated into continuous integration (CI) and
> >> consequently is not really enforced!
> >>
> >> Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we
> >> should have some CI code formatting checks. Additionally I really question
> >> whether we need a Nutch custom code style at all… why don’t we just use
> >> some other existing style and then enforce it?
> >>
> >> I therefore propose that we replace the legacy code formatter with a
> >> convention such as
> >>
> >> * google Java format [1] which offers a GitHub action for easy integration
> >> into our CI process, or
> >> * check style [2] which offers an Ant task which we could use, though this
> >> is of less utility as we think about the move to Gradle
> >> * superlinter [3] basically emerging as the industry OSS default, offers a
> >> GitHub action and could also be configured to lint dockerfile, and other
> >> artifacts. It can also be configured to use the google Java style as well…
> >>
> >> My preference would be [3] because it offers a more comprehensive linting
> >> package for the entire codebase not just the Java code.
> >>
> >> Thanks for your consideration.
> >> lewismc
> >>
> >> [0]
> >> https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml
> >> [1]
> >> https://github.com/google/google-java-format
> >> [2]
> >> https://checkstyle.sourceforge.io/
> >> [3]
> >> https://github.com/marketplace/actions/super-linter
> >>
> 


Re: Nutch codebase formatting

2023-10-29 Thread Sebastian Nagel

Hi Lewis,

>> whether we need a Nutch custom code style at all… why don’t we just use
>> some other existing style and then enforce it?

Enforcing: yes!

However, I would try hard to keep the changes to a reasonable minimum. For
example, if we change the indentation, almost every code line is affected, which
means:

- "git annotate" becomes mostly useless (or more difficult to use because you
  need to look back)
- merges of open PRs, custom patches or modifications in custom repositories
  might get quite painful until the formatting is synchronized.


>> * google Java format [1] which offers a GitHub action for easy integration
>> into our CI process, or

+1

+ also available for IntelliJ, Eclipse
+ indentation stays the same
+/- about 25% of the code lines are changed (might be acceptable)
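
A figure like the 25% above can be estimated by running the formatter over a scratch checkout and comparing the lines git reports as removed against the total line count. The following is a minimal sketch, not the method used for the number in this thread. The function name `changed_line_share` and the sample input are hypothetical, and the only thing it relies on is the tab-separated `added<TAB>deleted<TAB>path` layout that `git diff --numstat` prints (with `-` in the count columns for binary files).

```python
def changed_line_share(numstat_text: str, total_lines: int) -> float:
    """Fraction of original lines that a reformatting diff touches.

    numstat_text: output of `git diff --numstat` (added<TAB>deleted<TAB>path).
    total_lines:  line count of the same files before formatting.
    """
    touched = 0
    for row in numstat_text.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        _added, deleted, _path = parts
        if deleted == "-":
            continue  # binary files report "-" instead of counts
        # Each deleted line is an original line that the formatter rewrote.
        touched += int(deleted)
    return touched / total_lines if total_lines else 0.0


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = "10\t8\tsrc/java/A.java\n-\t-\tlogo.png\n3\t2\tsrc/java/B.java\n"
    print(f"{changed_line_share(sample, 40):.0%}")
```

The input would come from something like `git diff --numstat -- '*.java'` run after formatting, with the total taken from a line count of the same files before formatting.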


>> * superlinter [3] basically emerging as the industry OSS default, offers a
>> GitHub action and could also be configured to lint dockerfile, and other
>> artifacts. It can also be configured to use the google Java style as well…

+1 (with Google Java style)


> I’ll submit a PR for superlinter so everyone can see what it would look like.

Great! Thanks!


Best,
Sebastian

On 10/29/23 00:38, Lewis John McGibbney wrote:

> Any thoughts on this, folks?
> I’ll submit a PR for superlinter so everyone can see what it would look like.
> lewismc
>
> On 2023/10/23 19:28:45 lewis john mcgibbney wrote:
>> Hi dev@,
>>
>> For the longest time the Nutch codebase has shipped with an
>> eclipse-codeformat.xml [0] file.
>> Whilst this has been largely successful in keeping the codebase uniform, it
>> cannot be/has not been integrated into continuous integration (CI) and
>> consequently is not really enforced!
>>
>> Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we
>> should have some CI code formatting checks. Additionally I really question
>> whether we need a Nutch custom code style at all… why don’t we just use
>> some other existing style and then enforce it?
>>
>> I therefore propose that we replace the legacy code formatter with a
>> convention such as
>>
>> * google Java format [1] which offers a GitHub action for easy integration
>> into our CI process, or
>> * check style [2] which offers an Ant task which we could use, though this
>> is of less utility as we think about the move to Gradle
>> * superlinter [3] basically emerging as the industry OSS default, offers a
>> GitHub action and could also be configured to lint dockerfile, and other
>> artifacts. It can also be configured to use the google Java style as well…
>>
>> My preference would be [3] because it offers a more comprehensive linting
>> package for the entire codebase not just the Java code.
>>
>> Thanks for your consideration.
>> lewismc
>>
>> [0]
>> https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml
>> [1]
>> https://github.com/google/google-java-format
>> [2]
>> https://checkstyle.sourceforge.io/
>> [3]
>> https://github.com/marketplace/actions/super-linter



Re: Nutch codebase formatting

2023-10-28 Thread Lewis John McGibbney
Any thoughts on this, folks?
I’ll submit a PR for superlinter so everyone can see what it would look like.
lewismc 

On 2023/10/23 19:28:45 lewis john mcgibbney wrote:
> Hi dev@,
> 
> For the longest time the Nutch codebase has shipped with an
> eclipse-codeformat.xml [0] file.
> Whilst this has been largely successful in keeping the codebase uniform, it
> cannot be/has not been integrated into continuous integration (CI) and
> consequently is not really enforced!
> 
> Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we
> should have some CI code formatting checks. Additionally I really question
> whether we need a Nutch custom code style at all… why don’t we just use
> some other existing style and then enforce it?
> 
> I therefore propose that we replace the legacy code formatter with a
> convention such as
> 
> * google Java format [1] which offers a GitHub action for easy integration
> into our CI process, or
> * check style [2] which offers an Ant task which we could use, though this
> is of less utility as we think about the move to Gradle
> * superlinter [3] basically emerging as the industry OSS default, offers a
> GitHub action and could also be configured to lint dockerfile, and other
> artifacts. It can also be configured to use the google Java style as well…
> 
> My preference would be [3] because it offers a more comprehensive linting
> package for the entire codebase not just the Java code.
> 
> Thanks for your consideration.
> lewismc
> 
> [0]
> https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml
> [1]
> https://github.com/google/google-java-format
> [2]
> https://checkstyle.sourceforge.io/
> [3]
> https://github.com/marketplace/actions/super-linter
> 


Nutch codebase formatting

2023-10-23 Thread lewis john mcgibbney
Hi dev@,

For the longest time the Nutch codebase has shipped with an
eclipse-codeformat.xml [0] file.
Whilst this has been largely successful in keeping the codebase uniform, it
cannot be/has not been integrated into continuous integration (CI) and
consequently is not really enforced!

Whilst I’m a big fan of “if it ain’t broken don’t fix it”, I think we
should have some CI code formatting checks. Additionally I really question
whether we need a Nutch custom code style at all… why don’t we just use
some other existing style and then enforce it?

I therefore propose that we replace the legacy code formatter with a
convention such as

* google Java format [1] which offers a GitHub action for easy integration
into our CI process, or
* check style [2] which offers an Ant task which we could use, though this
is of less utility as we think about the move to Gradle
* superlinter [3] basically emerging as the industry OSS default, offers a
GitHub action and could also be configured to lint dockerfile, and other
artifacts. It can also be configured to use the google Java style as well…

My preference would be [3] because it offers a more comprehensive linting
package for the entire codebase not just the Java code.

Thanks for your consideration.
lewismc

[0]
https://github.com/apache/nutch/blob/master/eclipse-codeformat.xml
[1]
https://github.com/google/google-java-format
[2]
https://checkstyle.sourceforge.io/
[3]
https://github.com/marketplace/actions/super-linter