Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence
If it's not documented, it should be, because Patrick did it on purpose
(the output from the IntervalTree code is not sorted). We could add an
argument to disable the sorting for when the extra speed is desired. But the
sorting has proven useful.

On Thu, Jan 15, 2015 at 6:42 AM, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com wrote:

 Has it ever been documented that the return object is sorted in a specific
 way?  I just want to make sure we think about whether that is something we
 want to enforce, given the possibility of using a different algorithm in
 the future.

 We could also address this by implementing (perhaps it already exists) a
 sort() method for the return object.  That would still break existing code
 though.

 Best,
 Kasper

 On Wed, Jan 14, 2015 at 11:13 PM, Michael Lawrence 
 lawrence.mich...@gene.com wrote:

  I bet there is a lot of code that depends on having the hits (conveniently)
  ordered by query, subject index, so we should try to restore the previous
  behavior.
 
  On Wed, Jan 14, 2015 at 8:00 PM, Dario Strbenac 
  dstr7...@uni.sydney.edu.au
  wrote:
 
   Hello,
  
   For an identical query, the matrix results are in a different order.
   Consider the subject hits of the last two rows :
  
   mapping  # R Under development (unstable) (2015-01-13 r67453) and IRanges 2.1.35
        queryHits subjectHits
   [1,]         1           1
   [2,]         1           4
   [3,]         2           2
   [4,]         4           1
   [5,]         4           4
   [6,]         6           7
   [7,]         6           6

   mapping  # R Under development (unstable) (2015-01-13 r67453) and IRanges 2.0.1
        queryHits subjectHits
   [1,]         1           1
   [2,]         1           4
   [3,]         2           2
   [4,]         4           1
   [5,]         4           4
   [6,]         6           6
   [7,]         6           7
  
   This causes some values to be extracted in a different order by our
   annotationLookup function, and causes an error for the development version
   of Repitools on a test case which uses all.equal to compare a list to a
   correct list, but not for the release version which uses the release
   version of IRanges. Should I update the test case to have a new expected
   result, or is this new characteristic of findOverlaps likely to revert to
   the previous output soon?
  
   The two sets of intervals to produce this result are anno and probesGR,
   defined in the tests.R file in the Repitools package.
  
   --
   Dario Strbenac
   PhD Student
   University of Sydney
   Camperdown NSW 2050
   Australia
   ___
   Bioc-devel@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/bioc-devel
  
 


Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi guys,

Indeed, the Hits object returned by findOverlaps() is not fully
sorted anymore. Now it's sorted by query hit *only* and not by query
hit *and* subject hit. Fully sorting a big Hits object has a high
cost, both in terms of time and memory footprint. The partial
sorting is *much* cheaper: it's done using a tabulated sorting
algo implemented in C that works in linear time.
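
To illustrate why sorting by query hit alone can run in linear time, here is a minimal base-R sketch of a stable counting ("tabulated") sort. It is not the C code used in IRanges; the function name `counting_sort_by_query` and its arguments are made up for this illustration. The input values are the hits from Dario's report, shuffled.

```r
# Stable counting sort of hit pairs by query index (illustrative only).
# n_query is the number of query ranges, i.e. an upper bound on query_hits.
counting_sort_by_query <- function(query_hits, subject_hits, n_query) {
  counts <- tabulate(query_hits, nbins = n_query)   # number of hits per query
  # 0-based start of each query's block in the output
  offsets <- cumsum(c(0L, counts[-n_query]))
  out_q <- integer(length(query_hits))
  out_s <- integer(length(query_hits))
  for (i in seq_along(query_hits)) {
    q <- query_hits[i]
    offsets[q] <- offsets[q] + 1L                   # next free slot for query q
    out_q[offsets[q]] <- q
    out_s[offsets[q]] <- subject_hits[i]
  }
  cbind(queryHits = out_q, subjectHits = out_s)
}

query_hits   <- c(6L, 1L, 4L, 1L, 2L, 6L, 4L)
subject_hits <- c(7L, 1L, 1L, 4L, 2L, 6L, 4L)
sorted <- counting_sort_by_query(query_hits, subject_hits, n_query = 6L)
print(sorted)
```

Each hit is visited once for counting and once for placement, so the cost grows linearly with the number of hits. Within a query, subject hits stay in the order they were encountered, which is why a pair such as (6, 7) can come before (6, 6).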

The partial sorting is important: it allows a very common
transformation like as(hits, "List") to be super fast. But the
full sorting was overkill and generally not needed. Also note that
the full sorting was never enforced via the validity method for
Hits objects (and t(hits) was breaking that order in BioC < 3.1).
Now the validity method for Hits enforces the partial sorting and
t(hits) preserves it.

There were only 3 or 4 packages that broke in devel because of
that change (typically the change broke their unit tests). I fixed
them (except Repitools, but it's still on my list). The fix is easy:
if having the hits fully sorted matters, just use sort() on the Hits
object. The man page for ?findOverlaps will soon be updated to
reflect these changes.
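
For code that works on the two-column matrix form of the hits (as in Dario's report) rather than on the Hits object itself, the full ordering can be restored with base R's order(). This is a minimal sketch using the values from the report; for a Hits object, sort() is the route described above.

```r
# Hits as reported under IRanges 2.1.35: sorted by query only.
hits <- cbind(queryHits   = c(1L, 1L, 2L, 4L, 4L, 6L, 6L),
              subjectHits = c(1L, 4L, 2L, 1L, 4L, 7L, 6L))

# Order by query hit first, then break ties by subject hit.
full_order  <- order(hits[, "queryHits"], hits[, "subjectHits"])
hits_sorted <- hits[full_order, , drop = FALSE]
print(hits_sorted)
```

After this step the last two rows are (6, 6) and (6, 7), matching the IRanges 2.0.1 output.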

Cheers,
H.





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence
My concern is mostly in user code not seen in Bioc svn. But perhaps the
partial sorting (by query) is sufficient for many of those.



Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Dario Strbenac
The order of results is not important for the analysis. I have updated the test 
case with a new expected result.
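
When the order of results does not matter, an alternative to updating the expected value is to make the comparison itself order-insensitive. The sketch below is only an illustration in base R: the list contents and the names `probe1` and `probe2` are hypothetical and are not taken from the Repitools tests.

```r
# Hypothetical per-probe lookup results that differ only in element order.
observed <- list(probe1 = c(7L, 6L), probe2 = c(1L, 4L))
expected <- list(probe1 = c(6L, 7L), probe2 = c(1L, 4L))

# A direct comparison is sensitive to the order within each element.
strict_equal <- isTRUE(all.equal(observed, expected))

# Sorting each element first removes the dependence on hit order.
same_ignoring_order <- isTRUE(all.equal(lapply(observed, sort),
                                        lapply(expected, sort)))
print(c(strict = strict_equal, ignoring_order = same_ignoring_order))
```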

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia


[Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dario Strbenac
Hello,

The development version of ClassifyR won't build on Windows. The failure occurs
in a vignette code section that calls a function containing a bpmapply loop.
However, I'm using the default parameters by calling bpparam(), so it should
work on Windows. The code in the vignette executes without problems on Linux
and Mac OS. Is there a flaw in the development version of BiocParallel?

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia


Re: [Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dan Tenenbaum
There is no shared memory on Windows, so you need to make sure you require()
any necessary packages on each node.

Dan
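
The point about loading packages on each node can be sketched with base R's parallel package instead of BiocParallel (a deliberate substitution so the example has no Bioconductor dependency). It assumes the Windows workers are separate, freshly started R processes, as Dan's reply implies. The file names are made up, and tools is used only because it is not attached by default in a new R session.

```r
library(parallel)

# PSOCK workers are fresh R processes: they do not inherit packages
# attached in the master session, unlike forked workers on Linux or Mac.
cl <- makeCluster(2, type = "PSOCK")

# Attach the needed package on every worker before dispatching any work.
invisible(clusterEvalQ(cl, library(tools)))

res <- clusterMap(cl, function(path, ext) {
  # file_ext() lives in 'tools'; it is found because each worker attached it
  identical(file_ext(path), ext)
}, c("a.txt", "b.csv"), c("txt", "csv"))

stopCluster(cl)
print(unlist(res))
```

With BiocParallel the same idea would apply to the function passed to bpmapply: load whatever it needs inside that function, or otherwise on each worker, rather than relying on the master session.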



Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi Michael,

On 01/15/2015 11:59 AM, Michael Lawrence wrote:

 My concern is mostly in user code not seen in Bioc svn.


I understand but the fate of that code is to get out of sync
sooner or later. And sooner rather than later if it relies on
undocumented behavior.


 But perhaps the
 partial sorting (by query) is sufficient for many of those.


It seems to be sufficient for more than 99.5% of the packages in
BioC svn :-)

Note that keeping Hits objects partially sorted instead of fully
sorted not only speeds up findOverlaps() but also basic operations
on Hits objects like union(), t(), etc...

Since we are on it, I should also mention that new in BioC 3.1 is a
Hits() constructor function which takes care of partially sorting the
hits, selectHits() for selecting hits in the same way the 'select'
arg of findOverlaps() does, and all the comparison operations (==, <=,
order, sort, rank, etc..., see ?`Hits-comparison` in S4Vectors).
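
As a rough illustration of what selecting hits with select="first" means, here is a base-R sketch on the matrix form of the hits. It does not call selectHits() and makes no claim about its signature. The helper `select_first` is hypothetical, and it assumes "first" means the overlapping subject with the smallest index, with NA for queries that have no hit.

```r
# Hits from the thread, in matrix form (6 query ranges assumed).
hits <- cbind(queryHits   = c(1L, 1L, 2L, 4L, 4L, 6L, 6L),
              subjectHits = c(1L, 4L, 2L, 1L, 4L, 7L, 6L))

# Hypothetical helper: one subject index per query, NA when there is no hit.
select_first <- function(hits, n_query) {
  out <- rep(NA_integer_, n_query)
  firsts <- tapply(hits[, "subjectHits"], hits[, "queryHits"], min)
  out[as.integer(names(firsts))] <- as.integer(firsts)
  out
}

first_subject <- select_first(hits, n_query = 6L)
print(first_subject)
```

Because the minimum is taken per query, the result does not depend on whether the hits within a query are sorted by subject.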

Cheers,
H.


