[Bioc-devel] Merging or renaming a fork, and appropriate journal for package updates

2021-12-01 Thread Pariksheet Nanda

Hi folks,

I'm wrapping up my dissertation and one of the chapters touches on a 
summer of patching a Bioconductor package that currently lives as a 
separate GitHub fork (the list of changes is here [1]).  2 of the 
questions I've been asked by a member of my committee are whether to:


(1) associate a publication with the work, and
(2) republish the code with the original or as a separate package.

For (1) while I appreciate the traditional unit of research output is 
the publication, I'm struggling to think of a suitable journal for what 
essentially is discussion about enhancements and some bugfixes.  I've 
seen R package updates published in the Journal of Statistical Software 
(JSS) which might be appropriate?  I suppose any place that gets indexed 
on pubmed would work (yes, JSS is part of the NLM catalog [2]).  What 
would you suggest?  Perhaps the Bioconductor project collects metrics for 
publication activity about its packages to get more funding and has some 
preference?


For (2) I would prefer to merge back with the original Bioconductor 
package.  I tried upstreaming an early changeset [3], but besides my 
issue being open, there are currently 2 other open GitHub issues with no 
response which makes me wonder if upstream is dead.  If that's the case, 
would someone from the Bioconductor core team be willing to work with me 
to proxy commit to git.bioconductor.org?  I've made some API breaking 
changes, so I expect I would need to create at least 2 branches: one 
that can be committed with a deprecation warning for upcoming API breaking 
changes, and a second branch with API breaking changes to be committed at 
the subsequent Bioconductor release.  Or maybe I would need to create a 
branch for each feature change; honestly I don't know if that would be 
more or less work but certainly it would make the git history easier to read.


Pariksheet

[1] https://github.com/coregenomics/groHMM/blob/1.99.x/NEWS
[2] https://www.ncbi.nlm.nih.gov/nlmcatalog/101307056
[3] https://github.com/Kraus-Lab/groHMM/issues/2

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [R-pkg-devel] How does one install a libtool generated libfoo.so.1 file into ./libs/?

2021-10-19 Thread Pariksheet Nanda

Hi Simon and Vladimir,

>> On Oct 19, 2021, at 4:13 PM, Pariksheet Nanda wrote:
>> The trouble is, R's installation process will only copy compiled
>> files from ./libs/ that have exactly the extension ".so" and files
>> ending with ".so.1" are ignored.
>>
>> --snip--
>> library(tsshmm)
>> ...
>> Error: package or namespace load failed for 'tsshmm' in dyn.load(file, DLLpath = DLLpath, ...):
>> unable to load shared object '/home/omsai/R/x86_64-pc-linux-gnu-library/4.1/tsshmm/libs/tsshmm.so':
>>   libghmm.so.1: cannot open shared object file: No such file or directory

Pariksheet

On 10/19/21 5:00 AM, Simon Urbanek wrote:


> dynamic linking won't work, compile a static version with PIC enabled. If the
> subproject is autoconf-compatible this means using --disable-shared --with-pic.
> Then you only need to add libfoo.a to your PKG_LIBS.
>
> Simon

On 10/19/21 6:39 AM, Vladimir Dergachev wrote:
>
> The simplest thing to try is to compile the library statically and link it
> into your package. No extra files - no trouble.
>
> You can also try renaming the file from *.so.1 to *.so.
>
> Vladimir Dergachev

Thank you both for your suggestions!  I will link the code statically 
with PIC per your consensus.


I found that when linking the R package library, one also has to link the 
dependencies of the static library; in this case libghmm depends on 
libxml-2.0 >= 2.6 and so I have to link libxml2 to my R package library 
after finding libxml2 with pkg-config.



Thanks for the quick replies,

Pariksheet

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] How does one install a libtool generated libfoo.so.1 file into ./libs/?

2021-10-19 Thread Pariksheet Nanda

Hi folks,

On 10/18/21 11:13 PM, Pariksheet Nanda wrote:


The trouble is, R's installation process will only copy compiled files 
from ./libs/ that have exactly the extension ".so" and files ending with 
".so.1" are ignored.

--snip--
So is there some mechanism to copy arbitrary files or symlinks to the 
final install location?  I prefer not to patch upstream's Makefile.am to 
remove their -version-info, but currently that's the only option I can 
think of.


It turns out removing -version-info or setting it to 0.0.0 will still 
try to link against libghmm.so.0 which is still problematic.  I don't 
see how to disable libtool's versioning.


So after playing around, the only way I can think of doing this is 
eliminating the dependency file by compiling it statically and linking 
it with the dynamic library, because when I try merging the 2 dynamic 
libraries with libtool it gives the same error of not finding 
"libghmm.so.1".  I have a patch that works on my Debian machines, but 
not yet on the Ubuntu CI image:


https://gitlab.com/coregenomics/tsshmm/-/commit/e9608f01deb7baa13684d2bd65fe11e93f6c2e08

Also pasting the short diff below for search-ability.



Pariksheet





$ GIT_PAGER=cat git log -1 --patch
commit e9608f01deb7baa13684d2bd65fe11e93f6c2e08 (HEAD -> master, origin/master, origin/HEAD)

Author: Pariksheet Nanda 
Date:   Tue Oct 19 01:43:09 2021 -0400

BLD: Link bundled dependency statically to workaround load errors

diff --git a/configure.ac b/configure.ac
index 87b4d31..4d0be6e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -135,8 +135,6 @@ AS_IF([test x$with_ghmm_strategy = x],
   ) # AS_IF
 ) # AS_IF

-AC_SUBST(GHMM_LIBS, -lghmm)
-
 # If GHMM_ROOT was provided, set the header and library paths.
 #
 # Check for the existance of include/ and lib/ sub-directories and if both are

@@ -180,7 +178,10 @@ AS_IF(test x$found_ghmm_system != xyes &&
   AM_CONDITIONAL(BUNDLED_GHMM, true)
   [AX_SUBDIRS_CONFIGURE([src/ghmm-0.9-rc3],
 [[CFLAGS=$CFLAGS],
- [--enable-gsl=no],
+ [--enable-static],
+ [--disable-shared],
+ [--with-pic],
+ [--enable-gsl=no],
  [--disable-gsltest],
  [--with-rng=mt],
  [--with-python=no],
@@ -191,8 +192,14 @@ AS_IF(test x$found_ghmm_system != xyes &&
   [AS_IF([test -d $GHMM_ROOT], [],
  AC_MSG_FAILURE(Directory of bundled GHMM not found.))]
   [AC_SUBST(GHMM_CPPFLAGS, ["-I$GHMM_ROOT/.."])]
-  # Using -rpath=. prefers the bundled over any system installation.
-  [AC_SUBST(GHMM_LDFLAGS, ["-Wl,-rpath=. -L$GHMM_ROOT/.libs"])]
+  # We don't need GMM_LIBS or GHMM_LDFLAGS because we can directly merge
+  # libraries using tsshmm_la_LIBADD per
+  # https://stackoverflow.com/a/13978856 and
+  # https://www.gnu.org/software/automake/manual/html_node/Libtool-Convenience-Libraries.html
+  #
+  # However we now need to link against libghmm's libxml2 dependency
+  # because we're merging libraries.
+  [PKG_CHECK_MODULES([LIBXML2], [libxml-2.0 >= 2.6])]
   AC_MSG_NOTICE(Applying patches to GHMM to fix errors and warnings from "R CMD check")

   # Patch bug in upstream's configure bug:
   #
@@ -239,7 +246,9 @@ AS_IF(test x$found_ghmm_system != xyes &&
 #include ' src/ghmm-0.9-rc3/tests/mcmc.c
   [touch -r src/ghmm-0.9-rc3/tests/mcmc.c{.bak,}]
   [diff -u src/ghmm-0.9-rc3/tests/mcmc.c{.bak,}]
-  AC_MSG_NOTICE(Finished patching GHMM)
+  AC_MSG_NOTICE(Finished patching GHMM),
+  # Only link if we're not using the static bundled dependency.
+  [AC_SUBST(GHMM_LIBS, -lghmm)]
 ) # AS_IF

 # Variables for Doxygen.
diff --git a/src/Makefile.am b/src/Makefile.am
index 617a4e7..0e38b4a 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -9,18 +9,19 @@ endif
 lib_LTLIBRARIES= tsshmm.la
 tsshmm_la_CFLAGS   = $(PKG_CFLAGS)
 tsshmm_la_CPPFLAGS = $(PKG_CPPFLAGS)
+if BUNDLED_GHMM
+tsshmm_la_LIBADD   = @GHMM_ROOT@/libghmm.la
+tsshmm_la_LDFLAGS  = -module $(PKG_LIBS) @LIBXML2_LIBS@
+else
 tsshmm_la_LDFLAGS  = -module $(PKG_LIBS)
+endif
 tsshmm_la_SOURCES  = R_init_tsshmm.c R_wrap_tsshmm.c models.c \
 simulate.c train.c tss.c viterbi.c

 ACLOCAL_AMFLAGS = -I tools

 # Hook that runs after the default "all" rule.
-if BUNDLED_GHMM
-all-local : tsshmm.so libghmm.so
-else
 all-local : tsshmm.so
-endif

 # One of the limitations with POSIX-compliant `make` is not being able to
 # specify multiple outputs from a single rule.  Therefore, even though libtool

@@ -30,14 +31,8 @@ tsshmm.so : tsshmm.la
cp -av .libs/tsshmm.so.0.0.0 $@
chmod -x $@

-if BUNDLED_GHMM
-libghmm.so : @GHMM_ROOT@/libghmm.la
-   cp -av @GHMM

[R-pkg-devel] How does one install a libtool generated libfoo.so.1 file into ./libs/?

2021-10-19 Thread Pariksheet Nanda

Hi folks,

My package [1] depends on a C library libghmm-dev that's available in 
many GNU/Linux package managers.  However, it's not available on all 
platforms and if this dependency is not installed, my autoconf generated 
configure script defaults to falling back to compiling and installing 
the dependency from my bundled copy of upstream's pristine source 
tarball [2].  Now, because upstream uses automake which in turn uses 
libtool, I also use automake and libtool in my build process to hook 
into their build artifacts using SUBDIRS and *-local automake rules [3].


As you may know libtool appends `-version-info` to its generated shared 
libraries in the form "libfoo.so.1.2.3".  I'm linking against the 
bundled library which only sets the first value, namely libghmm.so.1.


The trouble is, R's installation process will only copy compiled files 
from ./libs/ that have exactly the extension ".so" and files ending with 
".so.1" are ignored.


My current workaround is to set -Wl,-rpath to the location of the 
generated ".so.1" file.  This allows the installation process to 
complete and sneakily pass the 2 canonical tests:



** testing if installed package can be loaded from temporary location
---snip---
** testing if installed package can be loaded from final location


However, not surprisingly, when I try to load the library from the final 
location after the temporary directory has been deleted it fails with:



library(tsshmm)
...
Error: package or namespace load failed for 'tsshmm' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/omsai/R/x86_64-pc-linux-gnu-library/4.1/tsshmm/libs/tsshmm.so':
  libghmm.so.1: cannot open shared object file: No such file or directory


I can rename the dependency from ".so.1" to ".so" to also get the 
dependent library to the final location.  But it still fails with the 
above error because the library links against the ".so.1" file and I 
would need an accompanying symlink.  I tried creating a symlink but 
can't think of how to get the symlink to the final location.  If my 
Makefile writes the symlink into ./inst/libs/libghmm.so.1 during compile 
time it is not actually installed; perhaps because the ./inst/ 
sub-directories are only copied earlier on when staging and are ignored 
later?  If I were to create that dangling symlink inside ./inst/libs/ 
instead of generating it later during compile time, devtools::install() 
complains about the broken symlink with:



cp: cannot stat 'tsshmm/inst/libs/libghmm.so.1': No such file or directory


So is there some mechanism to copy arbitrary files or symlinks to the 
final install location?  I prefer not to patch upstream's Makefile.am to 
remove their -version-info, but currently that's the only option I can 
think of.  I can't find helpful discussion surrounding this in the 
mailing list archives.


Last week when I posted for help with another issue in my package on 
the Bioconductor mailing list, one adventurous soul tried installing the 
package using `remotes::install_gitlab("coregenomics/tsshmm")`.  This 
won't work because I haven't committed the generated autotools files; if 
anyone wants to play with it, you'll have to follow the 2 additional 
steps run by the continuous integration script, namely, unpacking 
./src/ghmm-0.9-rc3.tar.gz into ./src/ and running `autoreconf -ivf` in 
the package's top-level directory where configure.ac is located.


Any help appreciated,

Pariksheet


[1] https://gitlab.com/coregenomics/tsshmm

[2] The only patches I apply to the dependency are to fix 2 bugs for 
compiling, and to remedy a warning severe enough to be flagged by `R CMD 
check`.


[3] You can see my Makefile.am here:
https://gitlab.com/coregenomics/tsshmm/-/blob/master/src/Makefile.am



Re: [Bioc-devel] Strange "internal logical NA value has been modified" error

2021-10-13 Thread Pariksheet Nanda

Hi Hervé,

On 10/13/21 12:43 PM, Hervé Pagès wrote:


On 12/10/2021 15:43, Pariksheet Nanda wrote:


The function in question is:

replace_unstranded <- function (gr) {
 idx <- strand(gr) == "*"
 if (length(idx) == 0L)


    ^
Not related to the "internal logical NA value has been modified" error
but shouldn't you be doing '!any(idx)' instead of 'length(idx) == 0L' here?


Indeed.  Although in a roundabout way the result somehow satisfied the 
unit tests, idx is a poor choice of name because it's really a mask, and 
your suggestion of OR-ing the mask FALSE values with any() is more 
intuitive.  The name is_unstranded might be less cryptic than mask.


Applying your suggestion of the correct condition uncovered a bug where 
return(gr) was returning the unsorted value, which is inconsistent with 
the final statement, which returns a sorted value.  So I changed it 
to return(sort(gr)) for a consistent contract.


Fixed in f6892ea
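
The difference between the two conditions can be seen without 
GenomicRanges.  A minimal base-R sketch, assuming plain logical vectors 
as stand-ins for the result of strand(gr) == "*" (the names mask_none 
and mask_some are made up for illustration, and this is not the code in 
the commit):

```r
# Plain logical vectors standing in for `strand(gr) == "*"`.
mask_none <- c(FALSE, FALSE, FALSE)  # three ranges, none unstranded
mask_some <- c(TRUE, FALSE, TRUE)    # three ranges, two unstranded

# The original condition only detects an empty input ...
length_check <- length(mask_none) == 0L  # FALSE: early return is skipped

# ... whereas any() asks whether any range is unstranded at all.
any_check_none <- !any(mask_none)  # TRUE: nothing to split, return early
any_check_some <- !any(mask_some)  # FALSE: fall through and split

print(c(length_check, any_check_none, any_check_some))
```

With no unstranded ranges the mask is non-empty but all FALSE, so only 
the any() form takes the early return.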



Best,
H.


 return(gr)
 sort(c(
 gr[! idx],
 `strand<-`(gr[idx], value = "+"),
 `strand<-`(gr[idx], value = "-")))
}



Pariksheet



Re: [Bioc-devel] [External] Re: Strange "internal logical NA value has been modified" error

2021-10-13 Thread Pariksheet Nanda
On 10/13/21 8:27 AM, luke-tier...@uiowa.edu wrote:



The most likely culprit is C code that is modifying a logical vector
without checking whether this is legitimate for R semantics
(i.e. making sure MAYBE_REFERENCED or at least MAYBE_SHARED is FALSE).
If that is the case, then this is legitimate for C code to do in
principle, so UBSAN and valgrind won't help. You need to set a gdb
watchpoint on the location, catch where it is modified, and look up
the call stack from there.

The error signaled in the GC is a sanity check for catching that this
sort of misbehavior has happened in C code. But it is a check after
the fact; it can't tell you more than that the problem happened
sometime before it was detected.

Best,

luke

On Wed, 13 Oct 2021, Martin Morgan wrote:

The problem with using gdb is you'd find yourself in the garbage 
collector, but perhaps quite removed from where the corruption 
occurred, e.g., gc() might / will likely be triggered after you've 
returned to the top-level evaluation loop, and the part of your code 
that did the corruption might be off the stack.


The problem with devtools::check() (and R CMD check) is that running 
the unit tests occurs in a separate process, so things like setting a 
global option (and even system variable from within R) may not be 
visible in the process doing the check. Conversely, for the same 
reasons, it seems like the problem can be tickled by running the tests 
alone. So


 R -f /tests/testthat.R

would seem to be a good enough starting point.

Actually, I liked Henrik's UBSAN suggestion, which requires the least 
amount of work. I think I'd then try


 R -d valgrind -f /tests/testthat.R

and then further into the weeds... actually from the section of R-exts 
you mention


 R_C_BOUNDS_CHECK=yes R -f /tests/testthat.R

might also be promising.

Martin

On 10/12/21, 10:30 PM, "Bioc-devel on behalf of Pariksheet Nanda" 
pariksheet.na...@uconn.edu> wrote:


   Hi all,

   On 10/12/21 6:43 PM, Pariksheet Nanda wrote:
   >
   > Error in `...`: internal logical NA value has been modified

   In the R source code, this error is in src/main/memory.c so I was
   thinking one way of investigating might be to run `R --debugger gdb`,
   then running R to load the symbols and either:

   1) set a breakpoint for when it reaches that particular line in
   memory.c:R_gc_internal and then walk up the stack,

   2) or set a watch point on memory.c:R_gc_internal:R_LogicalNAValue
   (somehow; having trouble getting gdb to reach that context).

   3) Then I thought, maybe this is getting far into the weeds and instead
   I could check the most common C related error by enabling bounds
   checking of my C arrays per section 4.4 of the R-exts manual:

   $ R -q
    > options(CBoundsCheck = TRUE)
    > Sys.setenv(R_C_BOUNDS_CHECK = "yes") # Try both ways *shrug*
    > devtools::test()
   ... # All tests still pass.
    > devtools::check()
   ... # No change :(

   Maybe I'm not sure I'm using that option correctly?  Or the option is
   ignored in devtools::check().  Or indeed, the error is not from over
   running C array boundaries.

   It turns out that using the precompiled debug symbols[1] isn't all that
   useful here because I don't get line numbers in gdb without the source
   files and many symbols are optimized out, so it looks like I would need
   to compile R from source with -ggdb first instead of using the Debian
   packages.

   Hopefully this is still the right approach?

   Pariksheet


   [1] After installing r-base-core-dbg on Debian for the debug symbols.



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
    Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu




Re: [Bioc-devel] Strange "internal logical NA value has been modified" error

2021-10-12 Thread Pariksheet Nanda

Thanks, Martin and Henrik!

My previous confusing reply from a few minutes ago was due to my university 
GMail hiding your replies in All Mail.


I'll consider both your suggestions carefully and thank you again for 
the quick and thoughtful replies.


Pariksheet


On 10/12/21 8:03 PM, Henrik Bengtsson wrote:



In addition to checking with Valgrind, the ASan/UBsan and rchk
platforms on R-Hub (https://builder.r-hub.io/) can probably also be
useful;


rhub::check(platform = "linux-x86_64-rocker-gcc-san")
rhub::check(platform = "ubuntu-rchk")


/Henrik



On Tue, Oct 12, 2021 at 4:54 PM Martin Morgan  wrote:


It is from base R

   
https://github.com/wch/r-source/blob/a984cc29b9b8d8821f8eb2a1081d9e0d1d4df56e/src/main/memory.c#L3214

and likely indicates memory corruption, not necessarily in the code that 
triggers the error (this is when the garbage collector is triggered...). 
Probably in *your* C code :) since it's the least tested. Probably writing out 
of bounds.

This could be quite tricky to debug. I'd try to get something close to a 
minimal reproducible example.

I'd try to take devtools out of the picture, maybe running the tests/testthat.R 
script from the command line using Rscript, or worst case creating a shell 
package that adds minimal code and can be checked with R CMD build 
--no-build-vignettes / R CMD check.

You could try inserting gc() before / after the unit test; it might make it 
clear that the unit test isn't the problem. You could also try gctorture(TRUE); 
this will make your code run extremely painfully slowly, which puts a big 
premium on having a minimal reproducible example; you could put this near the 
code chunks that are causing problems.
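
A rough sketch of the gc() and gctorture() suggestions above, assuming a 
hypothetical suspect() function standing in for whatever call is thought 
to corrupt memory (it is not part of tsshmm):

```r
# Hypothetical stand-in for the code under suspicion; replace with the
# real call, ideally the smallest one that still reproduces the error.
suspect <- function() sum(seq_len(10))

invisible(gc())      # collect before: an error here blames earlier code
result <- suspect()
invisible(gc())      # collect after: an error here points at suspect()

# gctorture(TRUE) provokes a collection on (nearly) every allocation,
# which is very slow, so keep it around as little code as possible.
previous <- gctorture(TRUE)
result_tortured <- suspect()
gctorture(previous)  # restore the earlier setting
```

If the error moves from a later, unrelated test to one of these explicit 
collections, that narrows down where the corruption happened.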

You might have success running under valgrind, something like R -d valgrind -f 
minimal_script.R.

Hope those suggestions help!

Martin


On 10/12/21, 6:43 PM, "Bioc-devel on behalf of Pariksheet Nanda" 
 wrote:

 Hi folks,

 I've been told to ask some of my more fun questions on this mailing list
 instead of Slack.  I'm climbing the ladder of submitting my first
 Bioconductor package (https://gitlab.com/coregenomics/tsshmm) and feel
 like there are gremlins that keep adding rungs to the top of the ladder.
   The latest head scratcher from running devtools::check() is a unit
 test for a  trivial 2 line function failing with this gem of an error:


  > test_check("tsshmm")
 ══ Failed tests
 
 ── Error (test-tss.R:11:5): replace_unstranded splits unstranded into +
 and - ──
 Error in `tryCatchOne(expr, names, parentenv, handlers[[1L]])`: internal
 logical NA value has been modified
 Backtrace:
   █
1. ├─testthat::expect_equal(...) test-tss.R:11:4
2. │ └─testthat::quasi_label(enquo(expected), expected.label, arg =
 "expected")
3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
4. └─GenomicRanges::GRanges(c("chr:100:+", "chr:100:-"))
5.   └─methods::as(seqnames, "GRanges")
6. └─GenomicRanges:::asMethod(object)
7.   └─GenomicRanges::GRanges(ans_seqnames, ans_ranges, ans_strand)
8. └─GenomicRanges:::new_GRanges(...)
9.   └─S4Vectors:::normarg_mcols(mcols, Class, ans_len)
   10. └─S4Vectors::make_zero_col_DFrame(x_len)
   11.   └─S4Vectors::new2("DFrame", nrows = nrow, check = 
FALSE)
   12. └─methods::new(...)
   13.   ├─methods::initialize(value, ...)
   14.   └─methods::initialize(value, ...)
   15. └─methods::validObject(.Object)
   16.   └─base::try(...)
   17. └─base::tryCatch(...)
   18.   └─base:::tryCatchList(expr, classes,
 parentenv, handlers)
   19. └─base:::tryCatchOne(expr, names,
 parentenv, handlers[[1L]])
 [ FAIL 1 | WARN 0 | SKIP 0 | PASS 109 ]


 The full continuous integration log is here:
 https://gitlab.com/coregenomics/tsshmm/-/jobs/1673603868

 The function in question is:


 replace_unstranded <- function (gr) {
  idx <- strand(gr) == "*"
  if (length(idx) == 0L)
  return(gr)
  sort(c(
  gr[! idx],
  `strand<-`(gr[idx], value = "+"),
  `strand<-`(gr[idx], value = "-")))
 }


 Also online here:
 
https://gitlab.com/coregenomics/tsshmm/-/blob/ef5e19a0e2f68fca93665bc417afbcfb6d437189/R/hmm.R#L170-178

 ... and the unit test is:


 test_that("replace_unstranded splits unstranded into + and -", {
  expect_equal(replace_unstranded(GRanges(

Re: [Bioc-devel] Strange "internal logical NA value has been modified" error

2021-10-12 Thread Pariksheet Nanda

Hi all,

On 10/12/21 6:43 PM, Pariksheet Nanda wrote:


Error in `...`: internal logical NA value has been modified


In the R source code, this error is in src/main/memory.c so I was 
thinking one way of investigating might be to run `R --debugger gdb`, 
then running R to load the symbols and either:


1) set a breakpoint for when it reaches that particular line in 
memory.c:R_gc_internal and then walk up the stack,


2) or set a watch point on memory.c:R_gc_internal:R_LogicalNAValue 
(somehow; having trouble getting gdb to reach that context).


3) Then I thought, maybe this is getting far into the weeds and instead 
I could check the most common C related error by enabling bounds 
checking of my C arrays per section 4.4 of the R-exts manual:


$ R -q
> options(CBoundsCheck = TRUE)
> Sys.setenv(R_C_BOUNDS_CHECK = "yes") # Try both ways *shrug*
> devtools::test()
... # All tests still pass.
> devtools::check()
... # No change :(

Maybe I'm not using that option correctly?  Or the option is 
ignored in devtools::check().  Or indeed, the error is not from 
overrunning C array boundaries.


It turns out that using the precompiled debug symbols[1] isn't all that 
useful here because I don't get line numbers in gdb without the source 
files and many symbols are optimized out, so it looks like I would need 
to compile R from source with -ggdb first instead of using the Debian 
packages.


Hopefully this is still the right approach?

Pariksheet


[1] After installing r-base-core-dbg on Debian for the debug symbols.



[Bioc-devel] Strange "internal logical NA value has been modified" error

2021-10-12 Thread Pariksheet Nanda

Hi folks,

I've been told to ask some of my more fun questions on this mailing list 
instead of Slack.  I'm climbing the ladder of submitting my first 
Bioconductor package (https://gitlab.com/coregenomics/tsshmm) and feel 
like there are gremlins that keep adding rungs to the top of the ladder. 
The latest head scratcher from running devtools::check() is a unit 
test for a trivial 2-line function failing with this gem of an error:



> test_check("tsshmm")
══ Failed tests
── Error (test-tss.R:11:5): replace_unstranded splits unstranded into + and - ──
Error in `tryCatchOne(expr, names, parentenv, handlers[[1L]])`: internal logical NA value has been modified
Backtrace:
 █
  1. ├─testthat::expect_equal(...) test-tss.R:11:4
  2. │ └─testthat::quasi_label(enquo(expected), expected.label, arg = "expected")

  3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
  4. └─GenomicRanges::GRanges(c("chr:100:+", "chr:100:-"))
  5.   └─methods::as(seqnames, "GRanges")
  6. └─GenomicRanges:::asMethod(object)
  7.   └─GenomicRanges::GRanges(ans_seqnames, ans_ranges, ans_strand)
  8. └─GenomicRanges:::new_GRanges(...)
  9.   └─S4Vectors:::normarg_mcols(mcols, Class, ans_len)
 10. └─S4Vectors::make_zero_col_DFrame(x_len)
 11.   └─S4Vectors::new2("DFrame", nrows = nrow, check = FALSE)
 12. └─methods::new(...)
 13.   ├─methods::initialize(value, ...)
 14.   └─methods::initialize(value, ...)
 15. └─methods::validObject(.Object)
 16.   └─base::try(...)
 17. └─base::tryCatch(...)
 18.   └─base:::tryCatchList(expr, classes, parentenv, handlers)
 19. └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 109 ]


The full continuous integration log is here:
https://gitlab.com/coregenomics/tsshmm/-/jobs/1673603868

The function in question is:


replace_unstranded <- function (gr) {
idx <- strand(gr) == "*"
if (length(idx) == 0L)
return(gr)
sort(c(
gr[! idx],
`strand<-`(gr[idx], value = "+"),
`strand<-`(gr[idx], value = "-")))
}


Also online here:
https://gitlab.com/coregenomics/tsshmm/-/blob/ef5e19a0e2f68fca93665bc417afbcfb6d437189/R/hmm.R#L170-178

... and the unit test is:


test_that("replace_unstranded splits unstranded into + and -", {
    expect_equal(replace_unstranded(GRanges("chr:100")),
                 GRanges(c("chr:100:+", "chr:100:-")))
    expect_equal(replace_unstranded(GRanges(c("chr:100", "chr:200:+"))),
                 sort(GRanges(c("chr:100:+", "chr:100:-", "chr:200:+"))))
})


Also online here:
https://gitlab.com/coregenomics/tsshmm/-/blob/ef5e19a0e2f68fca93665bc417afbcfb6d437189/tests/testthat/test-tss.R#L11-L12

What's interesting is this is *not* reproducible by running 
devtools::test() but only devtools::check() so as far as I know there 
isn't a way to interactively debug this while devtools::check() is going on?


Every few days I've seen that "internal ... value has been modified" error, 
which prevents me from running nearly any R commands.  Originally I 
would restart R, but then I found I could clear that error by running 
gc().  No idea what causes it.  Maybe some S4 magic?


Yes, I have downloaded the mailing lists for bioc-devel, r-devel, 
r-help, and r-package-devel and see no mention of "value has been 
modified" [1].


Any help appreciated.

Pariksheet



[1] Mailing lists downloader:
#!/bin/bash -x

for url in https://stat.ethz.ch/pipermail/{bioc-devel,r-{devel,help,package-devel}}/
do
    dir=$(basename "$url")
    wget \
        --timestamping \
        --no-remove-listing \
        --recursive \
        --level 1 \
        --no-directories \
        --no-host-directories \
        --cut-dirs 2 \
        --directory-prefix "$dir" \
        --accept '*.txt.gz' \
        --relative \
        --no-parent \
        "$url"
done



Re: [Bioc-devel] GLAD quotation

2019-08-22 Thread Pariksheet Nanda
Hi Patricia,

Whoops, my mistake!  GLAD is indeed a Bioconductor package:

$ R -q
> BiocManager::available("glad")
[1] "GLAD"  "GladiaTOX"
>

You don't need to purchase any software license.
You can install the package freely inside R.
See:
https://bioconductor.org/packages/release/bioc/html/GLAD.html

If you're not familiar with using R, a good place to start is the Software
Carpentry lesson:
https://swcarpentry.github.io/r-novice-gapminder/

Pariksheet

On Thu, Aug 22, 2019 at 6:33 AM P q  wrote:

> Dear Support assistant,
>
> I am a doctoral student of Dr. Nicolas Carrels at FIOCRUZ and I am in
> charge to ask for softwares quotations for the lab. I would like a
> quotation for 4 years GLAD software license and It is for non-profit
> research. The quotation document must contain these informations
> below:
>
> 1)Head Researcher/ Scientist: Nicolas Carrels  CPF 84166770500
> 2)Institution: Fundacao Oswaldo Cruz -FIOCRUZ-Centro de Desenvolv.
> Tecn. em Saude Publica-CDTS
> 3)Address: Av.Brasil, 4036 - predio da expansão - 8˚ andar - sala 814
> cep 21040-361 - Rio de Janeiro-RJ - Brasil
>
>
> If you do have any questions, please, contact me by e-mail or by
> telephone: +55 21 965515609
>
>
> Best Regards,
> Patricia Queiroz Monteiro
>


Re: [Bioc-devel] GLAD quotation

2019-08-22 Thread Pariksheet Nanda
Hi Patricia,

You've got the wrong e-mail address.
Bioconductor doesn't sell proprietary software licenses.
This mailing list is for developers to discuss technical matters with the R
or Bioconductor software packages.

I don't know which website or e-mail address you need for purchasing the
GLAD software; my NCBI and web searches are failing me.
You might need to ask Prof. Carrels?

Good luck,
Pariksheet

On Thu, Aug 22, 2019 at 6:33 AM P q  wrote:

> Dear Support assistant,
>
> I am a doctoral student of Dr. Nicolas Carrels at FIOCRUZ and I am in
> charge to ask for softwares quotations for the lab. I would like a
> quotation for 4 years GLAD software license and It is for non-profit
> research. The quotation document must contain these informations
> below:
>
> 1)Head Researcher/ Scientist: Nicolas Carrels  CPF 84166770500
> 2)Institution: Fundacao Oswaldo Cruz -FIOCRUZ-Centro de Desenvolv.
> Tecn. em Saude Publica-CDTS
> 3)Address: Av.Brasil, 4036 - predio da expansão - 8˚ andar - sala 814
> cep 21040-361 - Rio de Janeiro-RJ - Brasil
>
>
> If you do have any questions, please, contact me by e-mail or by
> telephone: +55 21 965515609
>
>
> Best Regards,
> Patricia Queiroz Monteiro
>


Re: [Bioc-devel] IRanges should support long vectors

2019-05-28 Thread Pariksheet Nanda
Hi Hervé,

Indeed, an IRanges with 2^31 elements is 17.1 GB.
The reason I was interested in IRanges is that GRanges are needed to create
BSgenome::BSgenomeViews.
More broadly, my use case is chopping up a large genome into a fixed kmer
size so that repetitive "unmappable" regions can be removed.
https://github.com/coregenomics/kmap
My interest in long vectors is to address issue #8
https://github.com/coregenomics/kmap/issues/8

The workaround I've imagined so far is to have my kmap::kmerize function
return an iterator that creates GRanges less than length 2^31.
Using iterators doesn't even need any additional packages: they're
implemented in the BiocParallel bpiterator unit tests as returning a
function that keeps returning objects until it returns NULL.
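
A minimal sketch of that iterator idea in base R.  The helper name
chunk_iterator and the chunk size are made up for illustration; in kmap each
chunk would presumably be turned into a GRanges rather than a pair of numbers:

```r
# Hypothetical helper: returns a closure that yields successive
# (start, end) chunks covering 1..total, then NULL once exhausted.
chunk_iterator <- function(total, chunk_size) {
  next_start <- 1
  function() {
    if (next_start > total) return(NULL)
    start <- next_start
    end <- min(start + chunk_size - 1, total)
    next_start <<- end + 1
    c(start = start, end = end)
  }
}

# total may exceed .Machine$integer.max because the bookkeeping uses
# doubles; only each individual chunk has to stay below 2^31 elements.
it <- chunk_iterator(total = 2^31 + 10, chunk_size = 2^30)
n_chunks <- 0
covered <- 0
while (!is.null(chunk <- it())) {
  n_chunks <- n_chunks + 1
  covered <- covered + (chunk[["end"]] - chunk[["start"]] + 1)
}
```

A function of this shape (keep calling it until it returns NULL) is the
pattern described above, so no extra iterator package is needed.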

But looking at how much more efficient your GPos, etc functions are,
perhaps BSgenomeViews requiring a GRanges is not so reasonable?
I don't even know of a sane way to mock a BSgenome object for writing
tests.  It's irritating to have to use actual small genomes for tests.

Pariksheet

On Tue, May 28, 2019 at 3:35 AM Pages, Herve  wrote:

> Hi Pariksheet,
>
> On 5/25/19 12:49, Pariksheet Nanda wrote:
>
> Hello,
>
> R 3.0 added support for long vectors, but it's not yet possible to use them
> with IRanges.  Without long vector support it's not possible to construct
> an IRanges object with 2^31 or more elements:
>
>
>
> ir <- IRanges(start = 1:(2^31 - 1), width = 1)
> ir <- IRanges(start = 1:2^31, width = 1)
>
> Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
> :
>   long vectors not supported yet: memory.c:3715
> In addition: Warning message:
> In .normargSEW0(start, "start") :
>   NAs introduced by coercion to integer range
>
> Right. This is a known limitation of IRanges objects and Vector
> derivatives in general.
>
> I wonder what's your use case?
>
> FWIW supporting long Vector derivatives (including long IRanges) has been
> on the TODO list for a while. Unfortunately it seems that we keep getting
> distracted by other things.
>
> Note that even when long IRanges objects are supported, computing on them
> will not be very efficient because the memory footprint of these objects
> will be very big (> 16Gb). It is much more interesting (and fun) to use
> long Vector derivatives that have a **small** memory footprint like long
> Rle's or long StitchedIPos/StitchedGPos objects:
>
>   library(S4Vectors)
>
>   x <- Rle(1:15, 1e9)
>   x
>   # integer-Rle of length 15000000000 with 15 runs
>   #   Lengths: 1000000000 1000000000 1000000000 ... 1000000000 1000000000
>   #   Values :          1          2          3 ...         14         15
>
>   object.size(x)
>   # 1288 bytes
>
>   library(IRanges)
>
>   ipos <- IPos(IRanges(1, 2e9))
>   ipos
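>   # StitchedIPos object with 2000000000 positions and 0 metadata columns:
>   #                       pos
>   #                 <integer>
>   #            [1]          1
>   #            [2]          2
>   #            [3]          3
>   #            [4]          4
>   #            [5]          5
>   #            ...        ...
>   #   [1999999996] 1999999996
>   #   [1999999997] 1999999997
>   #   [1999999998] 1999999998
>   #   [1999999999] 1999999999
>   #   [2000000000] 2000000000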
>
>   object.size(ipos)
>   # 2736 bytes
>
>   library(GenomicRanges)
>
>   gpos <- GPos("chr1:1-5e8")  # not a real organism ;-)
>   gpos
>   # StitchedGPos object with 500000000 positions and 0 metadata columns:
>   #               seqnames       pos strand
>   #                  <Rle> <integer>  <Rle>
>   #           [1]     chr1         1      *
>   #           [2]     chr1         2      *
>   #           [3]     chr1         3      *
>   #           [4]     chr1         4      *
>   #           [5]     chr1         5      *
>   #           ...      ...       ...    ...
>   #   [499999996]     chr1 499999996      *
>   #   [499999997]     chr1 499999997      *
>   #   [499999998]     chr1 499999998      *
>   #   [499999999]     chr1 499999999      *
>   #   [500000000]     chr1 500000000      *
>   #   ---
>   #   seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>   object.size(gpos)
>   # 10552 bytes
>
>
> We're not here yet but the goal would be to have light-weight objects that
> can represent all the genomic positions in the Human genome.
>
> H.
>
>
> This is true when using the latest version from GitHub
>
>
>
> BiocManager::install("Bioconductor/IRanges")
> sessionInfo()
>
> R version 3.6.0 (2019-04-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)
>
> M

[Bioc-devel] IRanges should support long vectors

2019-05-25 Thread Pariksheet Nanda
Hello,

R 3.0 added support for long vectors, but it's not yet possible to use them
with IRanges.  Without long vector support it's not possible to construct
an IRanges object with 2^31 or more elements:


> ir <- IRanges(start = 1:(2^31 - 1), width = 1)
> ir <- IRanges(start = 1:2^31, width = 1)
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
:
  long vectors not supported yet: memory.c:3715
In addition: Warning message:
In .normargSEW0(start, "start") :
  NAs introduced by coercion to integer range
>


This is true when using the latest version from GitHub


> BiocManager::install("Bioconductor/IRanges")
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

Matrix products: default
BLAS:
/home/pan14001/spack/opt/spack/linux-rhel6-x86_64/gcc-7.4.0/r-3.6.0-r7m53dthhqtxyrrdghjuiw2otasowvbl/rlib/R/lib/libRblas.so
LAPACK:
/home/pan14001/spack/opt/spack/linux-rhel6-x86_64/gcc-7.4.0/r-3.6.0-r7m53dthhqtxyrrdghjuiw2otasowvbl/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IRanges_2.19.5      S4Vectors_0.22.0    BiocGenerics_0.30.0

loaded via a namespace (and not attached):
 [1] ps_1.3.0           prettyunits_1.0.2  withr_2.1.2        crayon_1.3.4
 [5] rprojroot_1.3-2    assertthat_0.2.1   R6_2.4.0           backports_1.1.4
 [9] magrittr_1.5       cli_1.1.0          curl_3.3           remotes_2.0.4
[13] callr_3.2.0        tools_3.6.0        compiler_3.6.0     processx_3.3.1
[17] pkgbuild_1.0.3     BiocManager_1.30.4
>


Pariksheet



Re: [Bioc-devel] BiocManager to install Depends/Imports/Suggests

2018-07-09 Thread Pariksheet Nanda
Hi Levi,

Why not use devtools, which already does this?  Setting `dependencies =
TRUE` installs the packages listed in Depends, Imports, LinkingTo and
Suggests, and BiocManager::repositories(), like
BiocInstaller::biocinstallRepos(), returns a list of repositories.  See
inline below:

On Mon, Jul 9, 2018 at 4:51 AM, Levi Waldron 
wrote:

> It would be useful to be able to use BiocManager to install
> the Depends/Imports/Suggests of a source package not on Bioconductor, e.g.:
>
> BiocManager::install("Bioconductor/BiocWorkshops")  #works but only if all
> Depends/Imports are already installed
>

devtools::install_github("Bioconductor/BiocWorkshops", repos =
BiocManager::repositories(), dependencies = TRUE)



> Also from a local package, e.g.:
>
> BiocManager::install("mypackage_0.1.tar.gz")  # or,
> BiocManager::install("mypackage")
>

devtools::install_local("mypackage_0.1.tar.gz", repos =
BiocManager::repositories(), dependencies = TRUE)
devtools::install("mypackage", repos = BiocManager::repositories(),
dependencies = TRUE)
devtools::install(".", repos = BiocManager::repositories(), dependencies =
TRUE)


Pariksheet



Re: [Bioc-devel] BiocInstaller: next generation

2018-05-09 Thread Pariksheet Nanda
Hi Henrik,

On Thu, May 10, 2018 at 1:21 AM, Henrik Bengtsson <
henrik.bengts...@gmail.com> wrote:
>
>
> May I suggest the package name:
>
> * Bioconductor
>
> The potential downside would be possible confusions between the version of
> this package versus the actual Bioconductor repository.  Could the
> Bioconductor *package* have a version x.y.z that reflects the
> *repository* x.y version?

This is a nice suggestion that also crossed my mind, but users new to both
R and Bioconductor might think "but I have 'Bioconductor' installed, why
can't I run this script?", and it might complicate web namespace / presence
by entrapping searches for the Bioconductor system to the single package.


> /Henrik

Pariksheet



Re: [Bioc-devel] Including data for @examples to run

2018-04-26 Thread Pariksheet Nanda
Hi Adam,

On Wed, Apr 25, 2018 at 2:35 PM, Adam Price  wrote:
>
> There are a few reasons why I'm using \dontrun{} for my examples and want
> to know if there is any way to actually run my examples.
>
> My package incorporates some automated data management and requires in
> practice that certain directories exist.

You might consider decoupling the code logic that creates or requires such
a directory structure, and you could have a wrapper function that takes the
input + output paths as function parameters with some defaults.  An R-ish
way of setting package-wide defaults that might be a good fit for your use
case is using options(), so you could have the flexibility of your input
and output paths falling back to getOption(...) if they are not provided.
Although that can be a little fragile if a user sets options() between
related function calls.  Another possibility might be to instantiate a
class to keep a single instance of your path structure.  I imagine there
must be existing bioconductor packages that do this sort of thing,
especially those that lightly wrap around other programs that create
directories, like the Rbowtie2 package (Rbowtie2 checks for files and
directory structures, but doesn't make use of classes).  Maybe others on
the list know of packages that come to mind or can comment on these ideas.
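
To make the options() idea concrete, here is a rough sketch.  The option
name myPackage.output_dir and the function get_output_dir are hypothetical,
not taken from any existing package:

```r
# Hypothetical function: an explicit argument wins, otherwise fall back
# to a package-wide option, and finally to a temporary directory.
get_output_dir <- function(output_dir = getOption("myPackage.output_dir",
                                                  default = tempdir())) {
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }
  normalizePath(output_dir)
}

# An explicit path is used as given.
explicit <- get_output_dir(file.path(tempdir(), "explicit"))

# Without an argument, the option is consulted at call time.
old <- options(myPackage.output_dir = file.path(tempdir(), "from_option"))
from_option <- get_output_dir()
options(old)  # restore the previous (unset) state
```

Because the paths arrive as ordinary arguments, examples and unit tests can
point the function at tempdir() instead of needing a fixed directory layout.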

Are you by any chance writing unit tests for your package?  One of the
really nice benefits of separating out the directory requirement is making
your code more testable.  I know that Bioconductor doesn't formally require
tests for their packages, but even so they are very useful and often you
can answer architectural decisions about how to best structure your code by
how nicely it satisfies tests.


> I am storing some package environmental variables in my package like this:
> myPackage_env <- new.env(parent=emptyenv())

I feel like in general using environmental variables for solving problems
in computing is like reaching for the sledge hammer in your toolbox.
Certainly, it has many legitimate uses, especially in cluster computing
where you have to setup the environment for packages to find each other.
Would it be possible to see a link to some of your package code?  Then we can
comment more about the paths and environmental variables, and be more
specific about alternatives and suggestions.


These are very good questions.  A lot of workflows don't lend themselves to
being done entirely in memory, can rely on lots of existing files, and
require better integration with software outside of the Bioconductor
system, and it's fun to learn about how to tackle them.


> -Adam

Pariksheet



Re: [Bioc-devel] BiocCheck - warning: files are over 5MB

2018-03-10 Thread Pariksheet Nanda
> disk efficient compression algorithm

Whoops, meant to say compression format.

Pariksheet



Re: [Bioc-devel] BiocCheck - warning: files are over 5MB

2018-03-10 Thread Pariksheet Nanda
Hi Claris,

On Sat, Mar 10, 2018 at 2:49 AM, Claris Baby via Bioc-devel <
bioc-devel@r-project.org> wrote:
>
> [1] "The following files are over 5MB in size:
> 'dataset/Caenorhabditis_elegans.WBcel235.dna.chromosome.I.fa'."
> This as well as other data like .gff files, that are being used
> for the reference based assembly are all much more than 5mb.
> But the total package size is less than 500mb.

Assuming that's not a typo, 500 mb is very large and inappropriate for a
package.  It's generally good practice to separate code and data where
possible, not least because it bloats code version control.  If your
package size is close to 500 mb, you should think about stashing the data
and accessing it using something like the AnnotationHub or BiocFileCache
(some others on the mailing list might have better and more specific
suggestions as I've not yet had to deal with this particular problem, if
you confirm that the package is indeed that big).


> Is it essential that each file within the package is less than
> 5mb. If so, it would be very kind if anyone could suggest how
> to reduce the size of the genomic data files.

Can you gzip compress those data files?  Text based files usually compress
quite well and many functions like import() from rtracklayer will
automagically decompress them so you might not even need to change much in
your code.

.gz isn't the most disk efficient compression algorithm out there; .bz2
usually compresses better, and .xz typically yields even better lossless
compression (R's save() gzips by default but also accepts compress =
"bzip2" or "xz").  For the cross-platform compatibility that Bioconductor
strives for, though, .gz might be best to try first.
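
As a quick illustration in base R (the sequence below is made up and much
more repetitive than a real chromosome, so real savings will be smaller):

```r
# Write a small, repetitive FASTA-like text file (made-up sequence).
plain <- tempfile(fileext = ".fa")
writeLines(c(">chrI", rep("GCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAAGCCTAA", 1000)),
           plain)

# Write the same lines through a gzip connection.
gz <- paste0(plain, ".gz")
con <- gzfile(gz, open = "wt")
writeLines(readLines(plain), con)
close(con)

# readLines() detects the gzip format and decompresses transparently.
same_content <- identical(readLines(plain), readLines(gz))
smaller <- file.size(gz) < file.size(plain)
```

Base R readers such as readLines() and read.table() handle the .gz file
without any change to the calling code.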


> Claris Baby

Pariksheet



Re: [Bioc-devel] parallel processing in R?

2017-12-11 Thread Pariksheet Nanda
Hi Bhakti,

On Mon, Dec 11, 2017 at 12:19 PM, Dwivedi, Bhakti 
wrote:

> Is there a way to parallelize ConsensusClusterPlus package?
> https://bioconductor.org/packages/release/bioc/html/
> ConsensusClusterPlus.html
> We are developing a R shiny tool that performs consensus clustering in
> addition to other genome-wide analyses. The consensus clustering step is
> taking the longest.
> Can I do parallel processing in R/R shiny? What parallel package (if any)
> I can implement in the R code to do parallel computing?
>

The canonical package for parallel computing in Bioconductor is
BiocParallel:
https://bioconductor.org/packages/release/bioc/html/BiocParallel.html
One can choose what backend to use for parallelism.  You can switch from
using SerialParam for cheap and cheerful lapply-like functionality to the
default MulticoreParam and SnowParam, which nicely log useful things like
memory usage.

It does not look like ConsensusClusterPlus is importing any parallel
package of its own that you need to fight against, so best case scenario
is you look at the function of interest you want to run many times and run
that function with BiocParallel's bplapply.  Or if there are multiple
levels of parallelism like internal and external looping then you might
have to dive into ConsensusClusterPlus and inject bplapply statements
ideally allowing some bpparam() argument passing for the inner and outer
loops.
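
A rough sketch of the bplapply pattern.  slow_square is a made-up stand-in
for one independent unit of work, not a ConsensusClusterPlus function, and
the lapply() fallback is only there so the snippet runs without BiocParallel:

```r
# Stand-in for one expensive, independent unit of work (hypothetical).
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # Swap SerialParam() for MulticoreParam() or SnowParam() to go parallel.
  param <- BiocParallel::SerialParam()
  res <- BiocParallel::bplapply(1:8, slow_square, BPPARAM = param)
} else {
  # Fallback so the sketch still runs when BiocParallel is not installed.
  res <- lapply(1:8, slow_square)
}
```

Starting with SerialParam() makes it easy to check that the results match a
plain lapply() before switching to a parallel backend.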



> Bhakti
>

Pariksheet



Re: [Bioc-devel] Help with R CMD check NOTEs

2017-10-18 Thread Pariksheet Nanda
Hi Anusha,

On Wed, Oct 18, 2017 at 2:30 PM, Anusha Nagari <
anusha.nag...@utsouthwestern.edu> wrote:
>
> Can you please let me know how to go about the following NOTE. Or if this
> is something that should be really taken care of for a successful package
> build and install:
>
> * checking re-building of vignette outputs ... NOTE
> Warnings in re-building vignettes:
>   Warning: file stem ‘/fig2’ is not portable
>   Warning: file stem ‘/fig3’ is not portable
>
> @Pariksheet: I am working on the groHMM package.
> https://github.com/Kraus-Lab/groHMM

My best guess is you might want to revise your figure label names to drop
the number.  LaTeX command names, such as macro names, cannot contain
digits.  So you could try replacing instances of fig2 and fig3 in the
vignette with something like figTwo and figThree.

I couldn't reproduce the build NOTE to confirm the fault / fix.  The
bioconductor.org 1.10.0 tarball doesn't seem to produce the error, BiocCheck
fails, and I had trouble building the vignette from the GitHub master
branch.

Hope that helps.


> Anusha

 Pariksheet


Re: [Bioc-devel] Help with R CMD check NOTEs

2017-10-18 Thread Pariksheet Nanda
Hi Anusha

On Wed, Oct 18, 2017 at 12:04 PM, Anusha Nagari
<anusha.nag...@utsouthwestern.edu> wrote:
>
> Depends: includes the non-default packages:
>   ‘MASS’ ‘parallel’ ‘S4Vectors’ ‘IRanges’ ‘GenomeInfoDb’
>   ‘GenomicRanges’ ‘GenomicAlignments’ ‘rtracklayer’
> Adding so many packages to the search path is excessive and importing
> selectively is preferable.

Move those to the "Imports" section in your package DESCRIPTION file.


> * checking re-building of vignette outputs ... NOTE
> Warnings in re-building vignettes:
>   Warning: file stem ‘/fig2’ is not portable
>   Warning: file stem ‘/fig3’ is not portable

Hmm... I think we'll have to look at the exact vignette to see what's
going on.  Presumably that's a LaTeX vignette.  Can you advise the
package name you are working on and/or link to the source code?


> Anusha

Pariksheet

---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut


[Bioc-devel] Cannot install Bioconductor packages with biocLite() after loading QuasR

2017-09-03 Thread Pariksheet Nanda
Hi folks,

It looks like loading QuasR breaks biocLite() because it magically
wants to use biocLite() in qAlign():


$ find -not -name '*.Rnw' -exec grep -E '(BiocInstaller|biocLite)' {} + 2>/dev/null
./DESCRIPTION:   S4Vectors (>= 0.9.25), IRanges, BiocInstaller, Biobase,
./R/qAlign.R:  biocLite(genome, suppressUpdates=TRUE, lib=lib.loc)
./NAMESPACE:importFrom(BiocInstaller, biocLite)
$


Here's the error in a fresh R session:

> suppressPackageStartupMessages(library(QuasR))
> BiocInstaller::biocLite("BSgenome.Hsapiens.UCSC.hg38")
Error: failed to update BiocInstaller:
namespace ‘BiocInstaller’ is imported by ‘QuasR’ so cannot be unloaded
>


What would be a good way to fix this?  I think trying to use
biocLite() from inside a package is a bit naughty and installing
packages should be left up to the user instead?
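
One possible alternative, sketched in base R: check for the package and stop
with an instruction rather than installing anything.  The helper name
require_genome is hypothetical and is not how QuasR is written:

```r
# Hypothetical helper: fail with an instruction instead of installing.
require_genome <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required but not installed. ",
         "Please install it yourself and retry.", call. = FALSE)
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this succeeds silently.
ok <- require_genome("stats")

# A made-up package name triggers the informative error.
msg <- tryCatch(require_genome("notARealGenomePackage"),
                error = function(e) conditionMessage(e))
```

With a check like this the package would not need to import BiocInstaller at
all, so the namespace could still be unloaded and updated.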


Reproducible in R 3.4.1 and a daily build:

> sessionInfo()
R Under development (unstable) (2017-08-01 r73012)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: 
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-08-01-jyjbn6hodegfxzvg6aojsdu7fmrdzi3y/rlib/R/lib/libRblas.so
LAPACK: 
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-08-01-jyjbn6hodegfxzvg6aojsdu7fmrdzi3y/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] QuasR_1.17.0          Rbowtie_1.17.0        GenomicRanges_1.29.12
[4] GenomeInfoDb_1.13.4   IRanges_2.11.12       S4Vectors_0.15.6
[7] BiocGenerics_0.23.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12               RColorBrewer_1.1-2
 [3] BiocInstaller_1.27.3       compiler_3.5.0
 [5] XVector_0.17.1             prettyunits_1.0.2
 [7] progress_1.1.2             GenomicFeatures_1.29.8
 [9] bitops_1.0-6               GenomicFiles_1.13.10
[11] tools_3.5.0                zlibbioc_1.23.0
[13] biomaRt_2.33.4             digest_0.6.12
[15] bit_1.1-12                 BSgenome_1.45.1
[17] RSQLite_2.0                memoise_1.1.0
[19] tibble_1.3.4               lattice_0.20-35
[21] rlang_0.1.2                Matrix_1.2-11
[23] DelayedArray_0.3.19        DBI_0.7
[25] GenomeInfoDbData_0.99.1    hwriter_1.3.2
[27] stringr_1.2.0              rtracklayer_1.37.3
[29] Biostrings_2.45.4          bit64_0.9-7
[31] grid_3.5.0                 Biobase_2.37.2
[33] R6_2.2.2                   AnnotationDbi_1.39.2
[35] XML_3.98-1.9               BiocParallel_1.11.6
[37] latticeExtra_0.6-28        magrittr_1.5
[39] blob_1.1.0                 Rsamtools_1.29.1
[41] matrixStats_0.52.2         GenomicAlignments_1.13.5
[43] ShortRead_1.35.1           assertthat_0.2.0
[45] SummarizedExperiment_1.7.5 stringi_1.1.5
[47] RCurl_1.95-4.8             VariantAnnotation_1.23.8
>


Pariksheet

---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut


Re: [Bioc-devel] Iterating over BSgenomeViews returns DNAString instead of BSgenomeViews

2017-04-12 Thread Pariksheet Nanda
On Fri, Apr 7, 2017 at 1:13 AM, Hervé Pagès  wrote:
>
> This is the expected behavior.
>
> Some background: BSgenomeViews are list-like objects where the *list
> elements* (i.e. the elements one extracts with [[) are the DNA
> sequences from the views
--snip--
> The important difference is that with [[ I get a DNAString object
> (the content of the view) and with [ I get a BSgenomeViews object
> of length 1.

Thank you, Hervé!

I was failing to make the connection with the `[[` accessor.
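
For anyone else reading along, the same distinction shows up with a plain
list.  This is only an analogy in base R with made-up data; the BSgenomeViews
behaviour is as Hervé describes above:

```r
# A plain list standing in for a list-like container such as BSgenomeViews.
views <- list(a = "GCTTCAGCCT", b = "GACCCTCCTG")

# lapply() hands each `[[` element to the function: the container is gone.
by_element <- lapply(views, class)

# Iterating over indices and subsetting with `[` keeps the container type,
# so each piece is still a length 1 list that carries its name.
by_index <- lapply(seq_along(views), function(i) views[i])
```

So looping over seq_along(views) and subsetting with `[` is one way to keep
the per-view metadata attached inside the loop.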


On Fri, Apr 7, 2017 at 1:16 AM, Michael Lawrence 
wrote:
>
> I'm curious as to why you are looping over the views in the first
> place. Maybe we could arrive at a vectorized solution, which is often
> but not always simpler and faster.

Hi Michael!

Broad background is I'm acculturating an undergraduate student to writing a
bioconductor package and applying software engineering practices of version
control, unit testing, documenting, dependency setup and validation in a
different environment on our university HPC cluster, etc.  The student also
came along to LibrePlanet to better understand the culture of software
freedom :o)  The package goal is to use Biostrings to look for repeating
DNA sequences of a fixed kmer size and subset to portions of the genome
without repeats (an aligner can do this ofc, but the goal is to teach R and
engineering practices).

I appreciate your thoughtfulness for vectorizing the code to best use
BSgenomeViews, but please don't spend more than 10 minutes as I have to
balance changes to the code with the student's learning and coding "voice"
and may not do proper justice for more of your effort.  My slowness to
reply was getting the project further along to be more understandable.
Here is the line, which I've updated as Hervé suggested to use seq_along():
https://github.com/coregenomics/kmap/blob/4adaed6b8007e8ea39f39ff57a42a821445d3d46/R/BiostringsProjectNEW.R#L185
(I'm having a hard time thinking of how to summarize a small example out
of context).
Although in that line ranges_hits() is only operating on single indices,
ranges_hits() was written to process groups of indices to reduce
multi-processor communication.  Generating such sets of indices would
involve applying width() to the views inside mappable() to break them into
chunks of, say, a million bases for matchPDict().  Again, I'm linking to
the code for anything that stands out to you, but I will feel bad if you
spend a lot of time on it.


> H.

> Michael

Pariksheet


[Bioc-devel] Iterating over BSgenomeViews returns DNAString instead of BSgenomeViews

2017-04-05 Thread Pariksheet Nanda
Hi bioconductor devs,

The BSgenomeViews class has been very useful in efficiently propagating
metadata for running Biostring operations.  I noticed something unexpected
when iterating over views - it seems to return the Biostrings object
instead of a single length Views object, and thus loses the associated view
metadata.  Is this intentional?  Below is some example code, the output and
sessionInfo().  Yes, I also confirmed this happens in the development
version of R / bioconductor 3.5.

On a side note, for unit testing it's been difficult to mock a BSgenome
object due to the link to physical files, and as a workaround I use a
small, arbitrary BSgenome package.  Can one construct a BSgenome from its
package bundled extdata?  The man page examples use packaged genomes.

Code to reproduce the issue:

--
library(BSgenome)
genome <- getBSgenome("BSgenome.Hsapiens.UCSC.hg19")
gr <- GRanges(c("chr1:25001-28000", "chr2:30001-31000"))
views <- Views(genome, gr)
views
lapply(views, class)
--

Result:

--
> views
BSgenomeViews object with 2 views and 0 metadata columns:
      seqnames         ranges strand                       dna
         <Rle>      <IRanges>  <Rle>               <DNAString>
  [1]     chr1 [25001, 28000]      * [GCTTCAGCCT...TTATTTATTG]
  [2]     chr2 [30001, 31000]      * [GACCCTCCTG...AGCAGGTGGT]
  ---
  seqinfo: 93 sequences (1 circular) from hg19 genome
> lapply(views, class)
[[1]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"

[[2]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"

>
--

Tested against these configurations:
1) R 3.3.2 + BSgenome 1.42.0 (stable bioconductor 3.4)
2) R 2017-04-05 installed via llnl/spack + BSgenome 1.43.7 (devel
bioconductor 3.5)

sessionInfo for configuration #2 above:
--
> sessionInfo()
R Under development (unstable) (2017-04-05 r72488)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS:
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRblas.so
LAPACK:
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.43.7
 [3] rtracklayer_1.35.10               Biostrings_2.43.7
 [5] XVector_0.15.2                    GenomicRanges_1.27.23
 [7] GenomeInfoDb_1.11.10              IRanges_2.9.19
 [9] S4Vectors_0.13.15                 BiocGenerics_0.21.3

loaded via a namespace (and not attached):
 [1] zlibbioc_1.21.0         GenomicAlignments_1.11.12
 [3] BiocParallel_1.9.5      lattice_0.20-35
 [5] tools_3.5.0             SummarizedExperiment_1.5.7
 [7] grid_3.5.0              Biobase_2.35.1
 [9] matrixStats_0.52.1      Matrix_1.2-9
[11] GenomeInfoDbData_0.99.0 bitops_1.0-6
[13] RCurl_1.95-4.8          DelayedArray_0.1.7
[15] compiler_3.5.0          Rsamtools_1.27.15
[17] XML_3.98-1.6
> BiocInstaller::biocValid()
[1] TRUE
>

---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut



Re: [Bioc-devel] OrganismDb package for Drosophila.melanogaster

2016-11-17 Thread Pariksheet Nanda
On Tue, Nov 15, 2016 at 7:34 PM, Martin Morgan wrote:
> On 11/15/2016 09:52 AM, Obenchain, Valerie wrote:
>> On 11/15/2016 03:32 AM, Pariksheet Nanda wrote:
>>>
>>> It would be great to have an OrganismDb package for
>>> Drosophila.melanogaster, similar to Homo.sapiens, Mus.musculus and
>>> Rattus.norvegicus.
--snip--
>>> In other words, like Rattus.norvegicus, it might be good to add a UCSC
>>> "refGene" TxDb package for dm6 as "ensGene" doesn't appear to be as
>>> good of a candidate (at least without some ugliness)?  I was looking at
>>> creating a dm6 UCSC "refGene" TxDb.
>>
>> You can use GenomicFeatures::makeTxDbFromUCSC() to create the TxDb. The
>> man page, ?makeTxDbFromUCSC, also has helper functions that display
>> available genomes, tables and tracks.
>
> I'm not completely sure of the result, but
>
> library(OrganismDb)
> odb = makeOrganismDbFromUCSC("dm6", tableName="refGene")
>
> might be most of the way there?

Thanks Valerie and Martin for pointing out the make*() functions!

As my lab uses the same UCSC tables frequently, I used the
make*Package() functions (namely,
GenomicFeatures::makeTxDbPackageFromUCSC and
OrganismDbi::makeOrganismPackage).

For others who run OrganismDbi::makeOrganismPackage, don't forget
to edit the generated DESCRIPTION file and add your new TxDb package
to "Depends".


>> Valerie

> Martin

Pariksheet



[Bioc-devel] OrganismDb package for Drosophila.melanogaster

2016-11-15 Thread Pariksheet Nanda
Hi folks,

It would be great to have an OrganismDb package for
Drosophila.melanogaster, similar to Homo.sapiens, Mus.musculus and
Rattus.norvegicus.

While trying to do this on my own using the Homo.sapiens package as a
starting point, I found the most similar looking keys to relate
org.Dm.eg.db and TxDb.Dmelanogaster.UCSC.dm6.ensGene to be "ENSEMBL" and
"GENEID" though there's a ".1" tacked to the end "GENEID" which makes it
harder to supply the graphInfo object to OrganismDbi:::.loadOrganismDbiPkg:

 > key_ <- function(db, key) sort(as.character(
 +     select(db, keys(db, key), key, key)[[key]]))
 > key_head <- function(db, key) head(key_(db, key))
 > key_head(TxDb.Dmelanogaster.UCSC.dm6.ensGene, "GENEID")
 'select()' returned 1:1 mapping between keys and columns
 [1] "FBgn0000003.1" "FBgn0000008.1" "FBgn0000014.1" "FBgn0000015.1"
 [5] "FBgn0000017.1" "FBgn0000018.1"
 > key_head(org.Dm.eg.db, "ENSEMBL")
 [1] "FBgn0000008" "FBgn0000014" "FBgn0000015" "FBgn0000017" "FBgn0000018"
 [6] "FBgn0000022"
 >
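
For comparison outside of OrganismDbi, the suffix itself is easy to strip.
A small sketch with illustrative IDs only; it does not solve the graphInfo
matching, which would still see the unmodified "GENEID" values:

```r
# Illustrative IDs only; the real keys come from the TxDb and OrgDb objects.
txdb_ids <- c("FBgn0000008.1", "FBgn0000014.1", "FBgn0000015.1")
orgdb_ids <- c("FBgn0000008", "FBgn0000014", "FBgn0000022")

# Drop a trailing ".<digits>" version suffix before comparing.
stripped <- sub("\\.[0-9]+$", "", txdb_ids)

# Keys shared by both sources after stripping.
shared <- intersect(stripped, orgdb_ids)
```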

In other words, like Rattus.norvegicus, it might be good to add a UCSC
"refGene" TxDb package for dm6 as "ensGene" doesn't appear to be as good of
a candidate (at least without some ugliness)?  I was looking at creating a
dm6 UCSC "refGene" TxDb.  I imagine one would query the UCSC public MySQL
server and then do the SQLite conversion.  Although the conversion to
SQLite seems a bit finagly as the datatypes differ between MySQL and SQLite
and I'm having a hard time finding a well supported tool to do it; I don't
want to introduce errors or harm reproducibility.  What do you use for
MySQL to SQLite conversion?  Or would it be more sensible for you
benevolent dictators to generate the package(s)?

Pariksheet

---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut

---
 > sessionInfo()
 R Under development (unstable) (2016-11-13 r71655)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 16.04.1 LTS

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
  [1] Rattus.norvegicus_1.3.1
  [2] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.0
  [3] org.Rn.eg.db_3.4.0
  [4] Mus.musculus_1.3.1
  [5] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
  [6] org.Mm.eg.db_3.4.0
  [7] Homo.sapiens_1.3.1
  [8] GO.db_3.4.0
  [9] OrganismDbi_1.17.1
 [10] org.Hs.eg.db_3.4.0
 [11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [12] org.Dm.eg.db_3.4.0
 [13] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.3.0
 [14] GenomicFeatures_1.27.2
 [15] AnnotationDbi_1.37.0
 [16] Biobase_2.35.0
 [17] GenomicRanges_1.27.6
 [18] GenomeInfoDb_1.11.4
 [19] IRanges_2.9.8
 [20] S4Vectors_0.13.2
 [21] BiocGenerics_0.21.0
 [22] BiocInstaller_1.25.2

 loaded via a namespace (and not attached):
  [1] compiler_3.4.0           XVector_0.15.0
  [3] bitops_1.0-6             tools_3.4.0
  [5] zlibbioc_1.21.0          biomaRt_2.31.1
  [7] RSQLite_1.0.0            lattice_0.20-34
  [9] Matrix_1.2-7.1           graph_1.53.0
 [11] DBI_0.5-1                rtracklayer_1.35.1
 [13] Biostrings_2.43.0        grid_3.4.0
 [15] XML_3.98-1.5             RBGL_1.51.0
 [17] BiocParallel_1.9.1       Rsamtools_1.27.2
 [19] GenomicAlignments_1.11.1 SummarizedExperiment_1.5.3
 [21] RCurl_1.95-4.8
 >
