Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread McGrath, Max
Hi all,

Thank you all for taking the time to discuss this issue. While the package does 
have a fairly long history with multiple authors, I am currently the only 
active developer on the project. So, if it is preferable from Bioconductor’s 
perspective, I will rewrite the repository's history.

I do have one remaining question regarding the extent of the rewrite. Currently 
the package (not including the pack file) sits at ~1.6MB. After a test run of 
the rewrite I was able to reduce the pack file to ~4MB. So, the total package 
is still over 5MB, but each individual file is under the threshold. Is this 
acceptable? Or will I need to delete more from the history? I ask because I 
imagine it is preferable to limit the extent of the rewrite to a minimal 
acceptable standard.
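
To make the distinction between the per-file limit and the total size concrete, here is a rough Python sketch. It is not how BiocCheck itself is implemented: `size_report` is a hypothetical helper, the 5MB figure is taken from the warning text, and reading it as binary megabytes is an assumption.

```python
import os

# Assumed reading of "5MB"; the real check may count bytes differently.
LIMIT_BYTES = 5 * 1024 * 1024


def size_report(root, limit=LIMIT_BYTES):
    """Return (total_bytes, oversized) for every file under root, .git included.

    oversized is a sorted list of (relative_path, size) for files above limit.
    """
    total = 0
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            size = os.path.getsize(path)
            total += size
            if size > limit:
                oversized.append((os.path.relpath(path, root), size))
    return total, sorted(oversized)
```

Calling `size_report(".")` in the package directory would return the total size alongside any individual files over the limit, which is the situation described above: a total above 5MB with an empty list of oversized files.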

Thanks,
Max


Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread Martin Morgan
Yes, Hervé has made this point too: mucking with the history of the package 
can break historical checkouts (when large files are deleted from the 
history, too).

It's relevant because when a package is added to our repository we do a full 
clone of the master branch; an alternative would be to do a --depth 1 clone of 
the master branch, but to me this doesn't seem ideal at all -- from the 
Bioconductor perspective the git.bioconductor.org package is definitive, and 
all we would have would be 'and then a miracle occurred' for early package 
development. I'm also nervous about side-effects associated with maintaining 
the Bioconductor and non-Bioconductor repositories that have different 
historical starts.

My own feeling is that most of these cases are packages that are still 'new' 
and seldom have clones / forks.

One could take a hybrid approach: if a maintainer insists on the integrity of 
their git repository (or automatically, whenever a repository has large files 
in its history), we do a --depth 1 clone.
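
As an illustration only, the hybrid approach could be sketched in Python as a function that picks between a full and a shallow clone. This is not Bioconductor's actual ingestion code: `clone_command` and its `has_large_history` flag are hypothetical, and the URL in the usage note is a placeholder.

```python
def clone_command(url, dest, has_large_history, branch="master"):
    """Build a git clone argument list: shallow if the history is large, full otherwise."""
    cmd = ["git", "clone", "--branch", branch]
    if has_large_history:
        # --depth 1 keeps only the tip commit of the branch.
        cmd += ["--depth", "1"]
    cmd += [url, dest]
    return cmd
```

For example, `clone_command("https://example.org/pkg.git", "pkg", True)` yields the arguments for `git clone --branch master --depth 1 https://example.org/pkg.git pkg`. The resulting list could be passed to `subprocess.run`.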

Martin

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread Henrik Bengtsson
I understood that it's a submission. Just wanted to make sure that it's
clear there might be side effects, e.g. people clone and collaborate also
before submitting to Bioc and a rewrite might surprise existing
collaborators etc.

/H



Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread Nitesh Turaga
This package isn’t yet a Bioconductor package, Henrik. A rewrite will most 
likely break other forks. The package hasn’t been submitted to Contributions 
for review either. So this is the time to break what needs to be broken, before 
it’s submitted to Bioconductor and gets into the Bioconductor git repository.

Nitesh


Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread Henrik Bengtsson
Doesn't a git rewrite break all existing clones and forks out there? I'm happy
to be corrected if this is not the case.

/Henrik



Re: [Bioc-devel] Git pack file greater than 5MB

2020-10-01 Thread Nitesh Turaga
Hi,

BiocCheck will complain on the build system about a package size over 5MB. 

The rewrite of the history with BFG cleaner 
(https://rtyley.github.io/bfg-repo-cleaner/) is not as severe as you think it 
is, to be honest. It just removes these pack files, which don’t have a place in 
the tree structure. More often than not, these are orphan files.

If you are wary of this solution, I would suggest you make a backup clone of 
your repo and try it there first, before you touch the main repo. Check the 
history (git log) to see if anything important is missing. 
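
One way to make that check less manual is sketched below in Python. It assumes you have saved the commit subject lines (for instance from `git log --format=%s`) of the backup clone and of the rewritten clone. Subjects are compared rather than hashes because a rewrite changes commit hashes. `missing_commits` is a hypothetical helper, not part of BFG or git.

```python
from collections import Counter


def missing_commits(before_subjects, after_subjects):
    """Return subjects that occur more often before the rewrite than after it."""
    remaining = Counter(before_subjects) - Counter(after_subjects)
    missing = []
    # Walk the original log order; report a subject once per missing occurrence.
    for subject in before_subjects:
        if remaining[subject] > 0:
            missing.append(subject)
            remaining[subject] -= 1
    return missing
```

A non-empty result is a prompt to look closer rather than proof of a problem, since some rewrite tools may drop commits that become empty once the large files are gone.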

But usually a software package has to be below 5MB. If you have some data in 
there which is needed for the package, consider Experiment Hub. 

Best,

Nitesh


[Bioc-devel] Git pack file greater than 5MB

2020-09-30 Thread McGrath, Max
Hi all,

We have a package that is ready for submission, but when running BiocCheck a 
warning is generated noting that "The following files are over 5MB in size: 
'.git/objects/pack/pack-xxx...". I've pruned, repacked, and run git gc which 
reduced the file size from 5.2 to 5.1MB, but I have been unable to reduce it 
further.
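
For reference, `git count-objects -v` reports how much space the pack files take, with the `size-pack` field given in KiB. A minimal Python sketch for reading that output follows. `parse_count_objects` and `pack_size_mib` are hypothetical helper names.

```python
def parse_count_objects(text):
    """Parse the 'key: value' lines printed by `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore anything that is not a key: value pair
        value = value.strip()
        if value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def pack_size_mib(stats):
    """Convert the size-pack field (reported in KiB) to MiB."""
    return stats.get("size-pack", 0) / 1024
```

In a real repository the input text could come from `subprocess.run(["git", "count-objects", "-v"], capture_output=True, text=True, check=True).stdout`, run before and after `git gc` to see how much the pack actually shrank.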

I'm reaching out to determine if this is an issue, and if so to ask for 
recommendations for solving it. Currently, the only solution I've come up with 
is to rewrite the repository's history using a tool like "git-filter-repo", but 
this is a more drastic action than I would prefer to take. I would greatly 
appreciate any advice on the matter.
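
Before deciding how much history to rewrite, it can help to see which blobs are actually large. A common recipe is to pipe `git rev-list --objects --all` into `git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`. The Python sketch below only parses that output. `largest_blobs` is a hypothetical helper, and the sizes it reports are uncompressed object sizes, not the space used inside the pack.

```python
def largest_blobs(batch_check_output, top=10):
    """Return up to `top` (size_bytes, path) pairs for blobs, largest first."""
    blobs = []
    for line in batch_check_output.splitlines():
        # Expected fields: type, object name, size, then the path (may contain spaces).
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    blobs.sort(key=lambda item: item[0], reverse=True)
    return blobs[:top]
```

The paths at the top of the list are the natural candidates to hand to a tool such as git-filter-repo or BFG, which keeps the rewrite limited to what is needed.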

Thank you,
Max McGrath
