Re: [Bioc-devel] Remote BigWig file access

2024-05-28 Thread Chris Wilks (gmail)
Thanks Vince, understood about the Core's focus right now.

I think this is something that Leo and I can fix among ourselves for the
time being.

Looking forward, as you brought up, if we were to refresh recount or
produce a recount4 (discussed) we'd certainly consider additional coverage
formats.

I'm aware of tiledb though not duckdb (I'll have to check it out), thanks
for the pointer.

There's also the D4 format from Aaron Quinlan's lab from a few years ago
which was explicitly designed to replace bigwigs:
https://www.nature.com/articles/s43588-021-00085-0

All that said, we're pretty committed to bigwigs at this point given the
~750,000 sequence runs we've encoded using them for recount3.

On Wed, May 22, 2024 at 7:17 AM Vincent Carey wrote:

> Really glad to see this discussion moving forward.  I would say that the
> core is wrangling with some even lower-level technical concerns right
> now, so I can't jump in just now.  I just want to raise the question of
> whether bigWig files are a technologically sound format to continue
> investing in for the use case of targeted remote query resolution on
> genomic coordinates.  A number of new concepts have come into play since
> bigWig was designed and implemented.  I'll naively mention duckdb and
> tiledb, which seem to have very good remote performance.  Maybe these
> are too generic ... are there other concepts in GA4GH that might be
> relevant to leverage for recount-like projects in the future?

Re: [Bioc-devel] Remote BigWig file access

2024-05-25 Thread Håkon Tjeldnes
We have been experimenting with other formats in our package ORFik. The fst
format is also a good candidate, though the problem is that only R and Julia
support it currently. My biggest problems with bigwigs are the slow full-file
access time and the lack of support for multiple score columns (as far as I
know).


From: Bioc-devel on behalf of Vincent Carey
Sent: Friday, May 24, 2024 12:26:53 AM
To: Chris Wilks (gmail)
Cc: Price, Amanda (NIH/NICHD) [E]; Bioc-devel; Nina Rajpurohit; Jaffe, Andrew E.
Subject: Re: [Bioc-devel] Remote BigWig file access

thanks

On Thu, May 23, 2024 at 5:36 PM Chris Wilks (gmail) wrote:

> Thanks Vince, understood about the Core's focus right now.
>
>  I think this is something that Leo and I can fix among ourselves for the
> time being.
>
> Looking forward, as you brought up, if we were to refresh recount or
> produce a recount4 (discussed) we'd certainly consider additional coverage
> formats.
>
> I'm aware of tiledb though not duckdb (I'll have to check it out), thanks
> for the pointer.
>
> There's also the D4 format from Aaron Quinlan's lab from a few years ago
> which was explicitly designed to replace bigwigs:
> https://www.nature.com/articles/s43588-021-00085-0
>
> All that said, we're pretty committed to bigwigs at this point given the
> ~750,000 sequence runs we've encoded using them for recount3.

Re: [Bioc-devel] Remote BigWig file access

2024-05-23 Thread Vincent Carey
thanks

On Thu, May 23, 2024 at 5:36 PM Chris Wilks (gmail) wrote:

> Thanks Vince, understood about the Core's focus right now.
>
>  I think this is something that Leo and I can fix among ourselves for the
> time being.
>
> Looking forward, as you brought up, if we were to refresh recount or
> produce a recount4 (discussed) we'd certainly consider additional coverage
> formats.
>
> I'm aware of tiledb though not duckdb (I'll have to check it out), thanks
> for the pointer.
>
> There's also the D4 format from Aaron Quinlan's lab from a few years ago
> which was explicitly designed to replace bigwigs:
> https://www.nature.com/articles/s43588-021-00085-0
>
> All that said, we're pretty committed to bigwigs at this point given the
> ~750,000 sequence runs we've encoded using them for recount3.

Re: [Bioc-devel] Remote BigWig file access

2024-05-22 Thread Vincent Carey
Really glad to see this discussion moving forward.  I would say that the
core is wrangling with some even lower-level technical concerns right
now, so I can't jump in just now.  I just want to raise the question of
whether bigWig files are a technologically sound format to continue
investing in for the use case of targeted remote query resolution on
genomic coordinates.  A number of new concepts have come into play since
bigWig was designed and implemented.  I'll naively mention duckdb and
tiledb, which seem to have very good remote performance.  Maybe these
are too generic ... are there other concepts in GA4GH that might be
relevant to leverage for recount-like projects in the future?



On Wed, May 22, 2024 at 6:58 AM Chris Wilks (gmail) wrote:

> Thanks for sharing Leo, this does interest me, especially since so much is
> built on BigWig access via rtracklayer at least in the recount2 ecosystem.
>
> As you alluded to, Megadepth currently supports remote access of BigWigs
> (and BAMs) over HTTPS on all platforms (Linux, MacOS, and Windows),
> getting back just the byte ranges overlapping the set of regions requested
> so it should work for at least recount2/recount3 and anything that uses
> HTTP/s.
>
> I'd be open to exploring updates to the Megadepth C/C++ code side to
> support Rle if that makes sense to replace rtracklayer.
> But to do that you'd need to be involved in updating all the R packages if
> you're willing (both megadepth and those that currently rely on rtracklayer
> for this functionality).
>
> Let me know if you want to chat about this over Zoom,
> Chris

Re: [Bioc-devel] Remote BigWig file access

2024-05-22 Thread Chris Wilks (gmail)
Thanks for sharing Leo, this does interest me, especially since so much is
built on BigWig access via rtracklayer at least in the recount2 ecosystem.

As you alluded to, Megadepth currently supports remote access of BigWigs
(and BAMs) over HTTPS on all platforms (Linux, macOS, and Windows),
fetching just the byte ranges that overlap the requested regions, so it
should work for at least recount2/recount3 and anything else served over
HTTP(S).

I'd be open to exploring updates to the Megadepth C/C++ code side to
support Rle if that makes sense to replace rtracklayer.
But to do that you'd need to be involved in updating all the R packages if
you're willing (both megadepth and those that currently rely on rtracklayer
for this functionality).

Let me know if you want to chat about this over Zoom,
Chris

On Tue, May 21, 2024 at 2:41 PM Leonardo Collado Torres <
lcollado...@gmail.com> wrote:

> Hi Bioc-devel,
>
> As some of you are aware, rtracklayer::import() has long provided
> access to import BigWig files. Those files can be shared on servers
> and accessed remotely thanks to all the effort from many of you in
> building and maintaining rtracklayer.
>
> From my side, derfinder::loadCoverage() relies on
> rtracklayer::import.bw(), and recount::expressed_regions() +
> recount::coverage_matrix() use derfinder::loadCoverage().
> recountWorkflow showcases those recount functions on larger datasets.
> brainflowprobes by Amanda Price, Nina Rajpurohit and others also ends
> up relying on rtracklayer::import.bw() through these functions.
>
> At https://github.com/lawremi/rtracklayer/issues/83 I initially
> reported some issues once our recount2/3 data host changed, but
> previously Brian Schilder also reported that one could no longer read
> remote files https://github.com/lawremi/rtracklayer/issues/73.
> https://github.com/lawremi/rtracklayer/issues/63 and/or
> https://github.com/lawremi/rtracklayer/issues/65 might have been
> related.
>
> Yesterday I updated
> https://github.com/lawremi/rtracklayer/issues/83#issuecomment-2121313270
> with a comment showing some small reproducible code, and that the
> workaround of downloading the data first, then using
> rtracklayer::import() on the local data does work. However, this
> workaround does involve a lot of, hmm, wasteful data transfer.
>
> On the recount vignette at some point I access just chrY of a bigWig
> file that is about 1300 MB. On the recountWorkflow vignette I do
> something similar for a 7GB bigWig file. Previously accessing just
> chrY on these files was a small data transfer.
>
> On recountWorkflow version 1.29.2
> https://github.com/LieberInstitute/recountWorkflow, I've included
> pre-computed results (~2 MB) to avoid downloading tons of data, though
> the vignette code shows how to actually fully reproduce the results if
> you don't mind downloading those large files. I also implemented some
> workarounds on recount, though I haven't yet gone the full route of
> including pre-computed results. I have yet to try implementing a
> workaround for brainflowprobes.
>
>
>
> My understanding is that rtracklayer's root issues are elsewhere, and
> that changes in rtracklayer's dependencies have likely created these
> problems.
> These problems are not always in the control of rtracklayer authors to
> resolve, and also create an unexpected burden on them.
>
> If one considers alternatives to rtracklayer, I see that there's a new
> package https://github.com/PoisonAlien/trackplot/tree/master that uses
> bwtool (a system dependency), an older alternative
> https://github.com/andrelmartins/bigWig that hasn't had updates in 4
> years, and a CRAN package
> (https://cran.r-project.org/web/packages/wig/readme/README.html) that
> recommends using rtracklayer for larger files. I guess that I could
> also try using megadepth https://research.libd.org/megadepth/, though
> derfinder::loadCoverage uses rtracklayer::import(as = "RleList") for
> efficiency
> https://github.com/lcolladotor/derfinder/blob/f9cd986e0c1b9ea6551d0d8d2077d4501216a661/R/loadCoverage.R#L401
> and lots of functions in that package were built for that structure
> (RleList objects). I likely missed other alternatives.
>
>
> My current line of thought is to keep implementing workarounds using
> local data (sometimes with pre-computed results) for recount,
> recountWorkflow, and brainflowprobes (derfinder only has tests with
> local bigWig files) without really altering the internals of those
> packages. That is, assume that the remote BigWig file access via
> rtracklayer will be suspended indefinitely, though it could be
> supported again at some point, and when it is, those packages will
> work again with remote BigWig files as if nothing ever happened. But I
> wanted to check in if this is what others who use BigWig files are
> thinking of doing.
>
> Thanks!
>
> Best,
> Leo
>
>
> Leonardo Collado Torres, Ph. D.
> Investigator, LIEBER INSTITUTE for BRAIN DEVELOPMENT
> Assistant Professor, Department of Biostatistics

[Bioc-devel] Remote BigWig file access

2024-05-21 Thread Leonardo Collado Torres
Hi Bioc-devel,

As some of you are aware, rtracklayer::import() has long provided
access to import BigWig files. Those files can be shared on servers
and accessed remotely thanks to all the effort from many of you in
building and maintaining rtracklayer.

From my side, derfinder::loadCoverage() relies on
rtracklayer::import.bw(), and recount::expressed_regions() +
recount::coverage_matrix() use derfinder::loadCoverage().
recountWorkflow showcases those recount functions on larger datasets.
brainflowprobes by Amanda Price, Nina Rajpurohit and others also ends
up relying on rtracklayer::import.bw() through these functions.

At https://github.com/lawremi/rtracklayer/issues/83 I initially
reported some issues once our recount2/3 data host changed, but
previously Brian Schilder also reported that one could no longer read
remote files https://github.com/lawremi/rtracklayer/issues/73.
https://github.com/lawremi/rtracklayer/issues/63 and/or
https://github.com/lawremi/rtracklayer/issues/65 might have been
related.

Yesterday I updated
https://github.com/lawremi/rtracklayer/issues/83#issuecomment-2121313270
with a comment showing some small reproducible code, and that the
workaround of downloading the data first, then using
rtracklayer::import() on the local data does work. However, this
workaround does involve a lot of, hmm, wasteful data transfer.
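
As a rough sketch, the download-then-import workaround described above
could look like the following in R. The function name, its arguments, and
any URL passed to it are hypothetical and only for illustration; this is
not the code posted in the GitHub issue. It assumes rtracklayer,
GenomicRanges, IRanges, and GenomeInfoDb are installed.

```r
# Sketch only: download the whole BigWig file, then import a single
# chromosome from the local copy. The function name and arguments are
# hypothetical and not part of any package.
import_chr_from_local_copy <- function(bw_url, chr = "chrY",
                                       dest_dir = tempdir()) {
    bw_local <- file.path(dest_dir, basename(bw_url))

    # This is the wasteful step: the full file is transferred.
    # mode = "wb" keeps the binary file intact on Windows.
    if (!file.exists(bw_local)) {
        utils::download.file(bw_url, destfile = bw_local, mode = "wb")
    }

    # Read the chromosome length from the BigWig header so that the
    # query range spans the whole chromosome.
    bw_file <- rtracklayer::BigWigFile(bw_local)
    chr_len <- GenomeInfoDb::seqlengths(bw_file)[[chr]]
    which_chr <- GenomicRanges::GRanges(chr, IRanges::IRanges(1, chr_len))

    # as = "RleList" is the return type that derfinder::loadCoverage()
    # requests from rtracklayer::import().
    rtracklayer::import(bw_file, which = which_chr, as = "RleList")
}
```

Calling import_chr_from_local_copy() with the URL of a remote BigWig file
would return coverage for chrY only, but the entire file is still
downloaded first, which is the cost discussed here.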

On the recount vignette at some point I access just chrY of a bigWig
file that is about 1300 MB. On the recountWorkflow vignette I do
something similar for a 7GB bigWig file. Previously accessing just
chrY on these files was a small data transfer.

On recountWorkflow version 1.29.2
https://github.com/LieberInstitute/recountWorkflow, I've included
pre-computed results (~2 MB) to avoid downloading tons of data, though
the vignette code shows how to actually fully reproduce the results if
you don't mind downloading those large files. I also implemented some
workarounds on recount, though I haven't yet gone the full route of
including pre-computed results. I have yet to try implementing a
workaround for brainflowprobes.



My understanding is that rtracklayer's root issues are elsewhere, and
that changes in rtracklayer's dependencies have likely created these
problems.
These problems are not always in the control of rtracklayer authors to
resolve, and also create an unexpected burden on them.

If one considers alternatives to rtracklayer, I see that there's a new
package https://github.com/PoisonAlien/trackplot/tree/master that uses
bwtool (a system dependency), an older alternative
https://github.com/andrelmartins/bigWig that hasn't had updates in 4
years, and a CRAN package
(https://cran.r-project.org/web/packages/wig/readme/README.html) that
recommends using rtracklayer for larger files. I guess that I could
also try using megadepth https://research.libd.org/megadepth/, though
derfinder::loadCoverage uses rtracklayer::import(as = "RleList") for
efficiency 
https://github.com/lcolladotor/derfinder/blob/f9cd986e0c1b9ea6551d0d8d2077d4501216a661/R/loadCoverage.R#L401
and lots of functions in that package were built for that structure
(RleList objects). I likely missed other alternatives.


My current line of thought is to keep implementing workarounds using
local data (sometimes with pre-computed results) for recount,
recountWorkflow, and brainflowprobes (derfinder only has tests with
local bigWig files) without really altering the internals of those
packages. That is, assume that the remote BigWig file access via
rtracklayer will be suspended indefinitely, though it could be
supported again at some point, and when it is, those packages will
work again with remote BigWig files as if nothing ever happened. But I
wanted to check in if this is what others who use BigWig files are
thinking of doing.

Thanks!

Best,
Leo


Leonardo Collado Torres, Ph. D.
Investigator, LIEBER INSTITUTE for BRAIN DEVELOPMENT
Assistant Professor, Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
855 N. Wolfe St., Room 382
Baltimore, MD 21205
lcolladotor.github.io
lcollado...@gmail.com

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel