Re: [R] How to read a file containing two types of rows - (for the Netflix challenge data format)

2020-01-31 Thread Emmanuel Levy
(first.col.val, reps-1), mat.clean) On Fri, 31 Jan 2020 at 20:31, Berry, Charles wrote: > > > > On Jan 31, 2020, at 1:04 AM, Emmanuel Levy > wrote: > > > > Hi, > > > > I'd like to use the Netflix challenge data and just can't figure out how > to

[R] How to read a file containing two types of rows - (for the Netflix challenge data format)

2020-01-31 Thread Emmanuel Levy
Hi, I'd like to use the Netflix challenge data and just can't figure out how to efficiently "scan" the files. https://www.kaggle.com/netflix-inc/netflix-prize-data The files have two types of row, either an *ID* e.g., "1:" , "2:", etc. or 3 values associated to each ID: The format is as
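One possible way to handle the two row types, sketched on a small in-memory sample. The sample lines are an assumption about the layout (an ID row such as "1:" followed by comma-separated value rows), and the column names `customer`, `rating`, `date` are illustrative only.

```r
# Hypothetical sample mimicking the layout described above:
# an "ID:" row, then comma-separated value rows belonging to that ID.
lines <- c("1:",
           "1488844,3,2005-09-06",
           "822109,5,2005-05-13",
           "2:",
           "2059652,4,2005-09-05")
# For a real file, something like: lines <- readLines("path/to/file")

is.id <- grepl(":$", lines)                       # TRUE on ID rows
ids   <- as.integer(sub(":$", "", lines[is.id]))  # the IDs, in file order
grp   <- cumsum(is.id)                            # index of the current ID for every line

# Parse all value rows in one call, then attach the ID each row belongs to
ratings <- read.csv(text = lines[!is.id], header = FALSE,
                    col.names = c("customer", "rating", "date"),
                    stringsAsFactors = FALSE)
ratings$id <- ids[grp[!is.id]]
ratings <- ratings[, c("id", "customer", "rating", "date")]
```

The `cumsum()` over the ID flags avoids an explicit loop over lines, which is probably where most of the time goes on files of this size.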

Re: [R] Adding a column to an **empty** data.frame

2016-11-02 Thread Emmanuel Levy
'empty' > data.frames, which could be either.) > > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Wed, Nov 2, 2016 at 6:48 AM, Emmanuel Levy <emmanuel.l...@gmail.com> > wrote: > >> Dear All, >> >> This sounds simple but can'

[R] Adding a column to an **empty** data.frame

2016-11-02 Thread Emmanuel Levy
Dear All, This sounds simple but can't figure out a good way to do it. Let's say that I have an empty data frame "df": ## creates the df df = data.frame( id=1, data=2) ## empties the df, perhaps there is a more elegant way to create an empty df? df = df[-c(1),] > df [1] id data <0 rows> (or
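A minimal sketch of one approach, assuming the goal is just a zero-row data frame that can still take new columns. The column name `label` is made up for illustration.

```r
# Create an empty data frame directly, with typed zero-length columns
df <- data.frame(id = numeric(0), data = numeric(0))

# A new column must also have length zero; assigning a scalar
# (e.g. df$label <- "x") fails because the data frame has 0 rows
df$label <- character(0)

str(df)
```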

Re: [R] combining columns into a combination index of the same length

2015-07-21 Thread Emmanuel Levy
for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-07-21 15:43 GMT+02:00 Emmanuel Levy emmanuel.l...@gmail.com: Thanks! -- this is indeed much faster (plus I made a mistake, one has to use paste with the option collapse

[R] combining columns into a combination index of the same length

2015-07-21 Thread Emmanuel Levy
Hi, The answer to this is probably straightforward, I have a dataframe and I'd like to build an index of column combinations, e.g. col1 col2 -- col3 (the index I need) A 1 1 A 1 1 A 2 2 B 1 3 B 2 4 B 2 4 At
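One way to get such an index, as a sketch: paste the columns into a single key and number the keys by order of first appearance. The "|" separator is an assumption and should be a character that never occurs in the data.

```r
df <- data.frame(col1 = c("A", "A", "A", "B", "B", "B"),
                 col2 = c(1, 1, 2, 1, 2, 2))

# One key per row; the separator must not appear in the values themselves
key <- paste(df$col1, df$col2, sep = "|")

# match() against the unique keys numbers combinations in order of appearance
df$col3 <- match(key, unique(key))
df$col3  # 1 1 2 3 4 4
```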

[R] Retrieve indexes of the first occurrence of numbers in an effective manner

2012-12-27 Thread Emmanuel Levy
Hi, That sounds simple but I cannot think of a really fast way of getting the following: c(1,1,2,2,3,3,4,4) would give c(1,3,5,7) i.e., a function that returns the indexes of the first occurrences of numbers. Note that numbers may have any order e.g., c(3,4,1,2,1,1,2,3,5), can be very large,
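A short sketch using base R only: `duplicated()` flags every repeat of a value, so the positions that are not flagged are the first occurrences. The helper name `first.idx` is made up here.

```r
# Indexes of the first occurrence of each distinct value, in order of appearance
first.idx <- function(x) which(!duplicated(x))

first.idx(c(1, 1, 2, 2, 3, 3, 4, 4))     # 1 3 5 7
first.idx(c(3, 4, 1, 2, 1, 1, 2, 3, 5))  # 1 2 3 4 9
```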

[R] Finding (swapped) repetitions of numbers pairs across two columns

2012-12-27 Thread Emmanuel Levy
Hi, I've had this problem for a while and tackled it in a quite dirty way, so I'm wondering if a better solution exists: If we have two vectors: v1 = c(0,1,2,3,4) v2 = c(5,3,2,1,0) How to remove one instance of the 3,1 / 1,3 double? At the moment I'm using the following solution, which is

Re: [R] Finding (swapped) repetitions of numbers pairs across two columns

2012-12-27 Thread Emmanuel Levy
I did not know that unique worked on entire rows! That is great, thank you very much! Emmanuel On 27 December 2012 22:39, Marc Schwartz marc_schwa...@me.com wrote: unique(t(apply(cbind(v1, v2), 1, sort))) __ R-help@r-project.org mailing list
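A vectorised variant of the same idea, as a sketch rather than a replacement for the `unique(t(apply(...)))` answer: put each pair in a canonical (smaller, larger) order with `pmin()`/`pmax()`, then drop duplicated rows. This keeps the original orientation of the retained pairs and avoids a row-wise `apply()`.

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

# Canonical orientation of every pair: (smaller, larger)
lo <- pmin(v1, v2)
hi <- pmax(v1, v2)

# duplicated() on a matrix works row by row
keep <- !duplicated(cbind(lo, hi))

cbind(v1, v2)[keep, ]  # the (3,1) row is dropped as a swapped repeat of (1,3)
```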

[R] How to re-order clusters of hclust output?

2012-05-11 Thread Emmanuel Levy
Hello, The heatmap function conveniently has a reorder.dendrogram function so that clusters follow a certain logic. It seems that the hclust function doesn't have such feature. I can use the reorder function on the dendrogram obtained from hclust, but this does not modify the hclust object
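A possible workaround, sketched on made-up data: reorder the dendrogram, then convert it back with `as.hclust()` so that the leaf order is carried by an hclust object. Using row means as weights is only an assumption about what "a certain logic" might be.

```r
set.seed(1)
m  <- matrix(rnorm(40), nrow = 8,
             dimnames = list(paste0("r", 1:8), NULL))
hc <- hclust(dist(m))

# Reorder the dendrogram by some weight (here: row means, as an illustration)
dend <- reorder(as.dendrogram(hc), wts = rowMeans(m))

# Convert back: the resulting hclust object carries the new leaf order
hc2 <- as.hclust(dend)
hc2$order
```

The tree topology is unchanged; only the left/right orientation of the branches differs between `hc` and `hc2`.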

Re: [R] How to re-order clusters of hclust output?

2012-05-11 Thread Emmanuel Levy
wrote: I don't have a general answer to your question, but 1L and 2L are just the integers 1 and 2 (the L makes them integers instead of doubles which is useful for some things) Michael On May 11, 2012, at 2:15 PM, Emmanuel Levy emmanuel.l...@gmail.com wrote: Hello, The heatmap function

[R] How to flatten a multidimensional array into a dataframe?

2012-04-19 Thread Emmanuel Levy
Hi, I have a three dimensional array, e.g., my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"), d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4")) ) what I would like to get is then a dataframe: d1 d2 d3 value A1 B1 C1 0 A2 B1 C1 0 . . . A2 B3 C4 0 I'm sure there is one function to do this
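A base-R sketch that needs no extra package: convert the array to a table and then to a data frame. The `responseName` argument names the value column; the dimension names become the other columns.

```r
my.array <- array(0, dim = c(2, 3, 4),
                  dimnames = list(d1 = c("A1", "A2"),
                                  d2 = c("B1", "B2", "B3"),
                                  d3 = c("C1", "C2", "C3", "C4")))

# One row per cell; d1 varies fastest, matching the layout asked for
flat <- as.data.frame(as.table(my.array), responseName = "value")
head(flat)
```

Note that `d1`, `d2` and `d3` come back as factors; wrap them in `as.character()` if plain strings are needed.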

Re: [R] How to flatten a multidimensional array into a dataframe?

2012-04-19 Thread Emmanuel Levy
OK, it seems that the array2df function from arrayhelpers package does the job :) On 19 April 2012 16:46, Emmanuel Levy emmanuel.l...@gmail.com wrote: Hi, I have a three dimensional array, e.g., my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"), d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4"

Re: [R] Idea/package to linearize a curve along the diagonal?

2012-03-13 Thread Emmanuel Levy
return(c(Xtrans,Ytrans)) } On 12 March 2012 20:58, David Winsemius dwinsem...@comcast.net wrote: On Mar 12, 2012, at 3:07 PM, Emmanuel Levy wrote: Hi Jeff, Thanks for your reply and the example. I'm not sure if it could be applied to the problem I'm facing though, for two reasons: (i

Re: [R] Idea/package to linearize a curve along the diagonal?

2012-03-12 Thread Emmanuel Levy
Emmanuel Levy emmanuel.l...@gmail.com wrote: Dear Jeff, I'm sorry but I

Re: [R] Which non-parametric regression would allow fitting this type of data? (example given).

2012-03-11 Thread Emmanuel Levy
. That is not what you say you want, so these approaches are unlikely to work. -- Bert On Sat, Mar 10, 2012 at 6:20 PM, Emmanuel Levy emmanuel.l...@gmail.com wrote: Hi, I'm wondering which function would allow fitting this type of data: tmp=rnorm(2000) X.1 = 5+tmp Y.1 = 5+ (5*tmp+rnorm(2000

[R] Idea/package to linearize a curve along the diagonal?

2012-03-11 Thread Emmanuel Levy
Hi, I am trying to normalize some data. First I fitted a principal curve (using the LCPM package), but now I would like to apply a transformation so that the curve becomes a straight diagonal line on the plot. The data used to fit the curve would then be normalized by applying the same

[R] How to fit a line through the Mountain crest, i.e., through the highest density of points - in a loess-like fashion.

2012-03-10 Thread Emmanuel Levy
Hi, I'm trying to normalize data by fitting a line through the highest density of points (in a 2D plot). In other words, if you visualize the data as a density plot, the fit I'm trying to achieve is the line that goes through the crest of the mountain. This is similar yet different to what LOESS

[R] How to improve the robustness of loess? - example included.

2012-03-10 Thread Emmanuel Levy
Hi, I posted a message earlier entitled How to fit a line through the Mountain crest ... I figured loess is probably the best way, but it seems that the problem is the robustness of the fit. Below I paste an example to illustrate the problem: tmp=rnorm(2000) X.background = 5+tmp;
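For what it is worth, `loess()` has a built-in robust option, `family = "symmetric"`, which iteratively downweights points with large residuals. The sketch below uses made-up data (a straight line plus a block of shifted outliers), not the example from the post, so whether it helps on the real data is an open question.

```r
set.seed(42)
n     <- 300
x     <- sort(runif(n, 0, 10))
truth <- 2 * x
y     <- truth + rnorm(n, sd = 0.3)

# Contaminate 10% of the points with a large upward shift
out    <- sample(n, 30)
y[out] <- y[out] + 20

fit.ls  <- loess(y ~ x)                        # default: family = "gaussian"
fit.rob <- loess(y ~ x, family = "symmetric")  # robust fit, outliers downweighted

# Mean absolute deviation of each fit from the uncontaminated line
err.ls  <- mean(abs(fitted(fit.ls)  - truth))
err.rob <- mean(abs(fitted(fit.rob) - truth))
c(default = err.ls, robust = err.rob)
```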

Re: [R] How to fit a line through the Mountain crest, i.e., through the highest density of points - in a loess-like fashion.

2012-03-10 Thread Emmanuel Levy
as a reply to the second post. All the best, Emmanuel On 10 March 2012 19:46, David Winsemius dwinsem...@comcast.net wrote: On Mar 10, 2012, at 3:55 PM, Emmanuel Levy wrote: Hi, I'm trying to normalize data by fitting a line through the highest density of points (in a 2D plot). In other

Re: [R] How to improve the robustness of loess? - example included.

2012-03-10 Thread Emmanuel Levy
))) ) On 10 March 2012 18:30, Emmanuel Levy emmanuel.l...@gmail.com wrote: Hi, I posted a message earlier entitled How to fit a line through the Mountain crest ... I figured loess is probably the best way, but it seems that the problem is the robustness of the fit. Below I paste an example

[R] Which non-parametric regression would allow fitting this type of data? (example given).

2012-03-10 Thread Emmanuel Levy
Hi, I'm wondering which function would allow fitting this type of data: tmp=rnorm(2000) X.1 = 5+tmp Y.1 = 5+ (5*tmp+rnorm(2000)) tmp=rnorm(100) X.2 = 9+tmp Y.2 = 40+ (1.5*tmp+rnorm(100)) X.3 = 7+ 0.5*runif(500) Y.3 = 15+20*runif(500) X = c(X.1,X.2,X.3) Y =

[R] Best HMM package to generate random (protein) sequences?

2011-03-22 Thread Emmanuel Levy
Dear All, I would like to generate random protein sequences using a HMM model. Has anybody done that before, or would you have any idea which package is likely to be best for that? The important facts are that the HMM will be fitted on ~3 million sequential observations, with 20 different states

[R] How to do a probability density based filtering in 2D?

2010-11-19 Thread Emmanuel Levy
Hello, This sounds like a problem to which many solutions should exist, but I did not manage to find one. Basically, given a list of datapoints, I'd like to keep those within the X% percentile highest density. That would be equivalent to retain only points within a given line of a contour plot.

Re: [R] How to do a probability density based filtering in 2D?

2010-11-19 Thread Emmanuel Levy
help, Emmanuel On 19 November 2010 21:25, David Winsemius dwinsem...@comcast.net wrote: On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote: Hello, This sounds like a problem to which many solutions should exist, but I did not manage to find one. Basically, given a list of datapoints, I'd

Re: [R] How to do a probability density based filtering in 2D?

2010-11-19 Thread Emmanuel Levy
Hello Roger, Thanks for the suggestions. I finally managed to do it using the output of kde2d - The code is pasted below. Actually this made me realize that the outcome of kde2d can be quite influenced by outliers if a boundary box is not given (try running the code without the boundary box,
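A rough sketch of a kde2d-based filter, not the code from the post: estimate the density on a grid, look up the density of the grid cell each point falls in, and keep the points above a density quantile. The data, the fraction `p` and the grid size are assumptions; `lims` plays the role of the bounding box mentioned above.

```r
library(MASS)  # provides kde2d()

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
p <- 0.5  # fraction of points to keep (highest-density half)

# Density estimate on a 100 x 100 grid restricted to an explicit bounding box
dens <- kde2d(x, y, n = 100, lims = c(range(x), range(y)))

# Grid cell of every point; all.inside = TRUE keeps indexes within the grid
ix <- findInterval(x, dens$x, all.inside = TRUE)
iy <- findInterval(y, dens$y, all.inside = TRUE)
d  <- dens$z[cbind(ix, iy)]  # approximate density at each point

# Keep points whose density is in the top p fraction
keep <- d >= quantile(d, 1 - p)
plot(x, y, col = ifelse(keep, "black", "grey"))
```

The lookup uses the nearest lower-left grid node rather than interpolating, which is crude but usually adequate with a fine grid.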

[R] problem with PDF/postcript, cannot change paper size: ‘mode(width)’ and ‘mode(height)’ differ between new and previous

2010-11-16 Thread Emmanuel Levy
Hi, The pdf function would not let me change the paper size and gives me the following warning: pdf("figure.pdf", width=6, height=10) Warning message: ‘mode(width)’ and ‘mode(height)’ differ between new and previous ==> NOT changing ‘width’ & ‘height’ If I use the option paper = "a4r",

Re: [R] problem with PDF/postcript, cannot change paper size: ‘mode(width)’ and ‘mode(height)’ differ between new and previous

2010-11-16 Thread Emmanuel Levy
Update - sorry for the stupid question, let's say it's pretty late. For those who may be as tired as I am and get the same warning, the paper size should be given as an integer! On 16 November 2010 04:17, Emmanuel Levy emmanuel.l...@gmail.com wrote: Hi, The pdf function would not let me

[R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
Dear All, I cannot find a solution to the following problem although I imagine that it is a classic, hence my email. I have a vector V of X values comprised between 1 and N. I would like to get random samples of X values also comprised between 1 and N, but the important point is: * I would like

[R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
Dear All,(my apologies if it got posted twice, it seems it didn't get through) I cannot find a solution to the following problem although I suppose this is a classic. I have a vector V of X=length(V) values comprised between 1 and N. I would like to get random samples of X values also

Re: [R] Random sampling while keeping distribution of nearest ne

2009-08-12 Thread Emmanuel Levy
solve it. Many thanks! Emmanuel PS: I apologize that I sent a second post. This one did not appear in my R-help label so I assumed it wasn't sent for some reason. 2009/8/12 Ted Harding ted.hard...@manchester.ac.uk: On 12-Aug-09 22:05:24, Emmanuel Levy wrote: Dear All, I cannot find

Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
with this problem? Or even better of a package? Thanks for your help, Emmanuel 2009/8/12 Nordlund, Dan (DSHS/RDA) nord...@dshs.wa.gov: -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Emmanuel Levy Sent: Wednesday, August 12, 2009

Re: [R] Random sampling while keeping distribution of nearest neighbor distances constant.

2009-08-12 Thread Emmanuel Levy
But if the 1st order differences are the same, then doesn't it follow that the 2nd, 3rd, ... order differences must be the same between the original and the new random vector.  What am I missing? You are missing nothing sorry, I wrote something wrong. What I would like to be preserved is the

[R] Is it normal that normalize.loess does not tolerate a single NA value?

2009-03-13 Thread Emmanuel Levy
Dear all, I have been using normalize.loess and I get the following error message when my matrix contains NA values: my.mat = matrix(nrow=100, ncol=4, runif(400) ) my.mat[1,1]=NA my.mat.n = normalize.loess(my.mat, verbose=TRUE) Done with 1 vs 2 in iteration 1 Done with 1 vs 3 in iteration 1

Re: [R] Mathematica now working with Nvidia GPUs -- any plan for R?

2008-11-20 Thread Emmanuel Levy
Dear Brian, Mose, Peter and Stefan, Thanks a lot for your replies - the issues are now clearer to me. (and I apologize for not using the appropriate list). Best wishes, Emmanuel 2008/11/19 Peter Dalgaard [EMAIL PROTECTED]: Stefan Evert wrote: On 19 Nov 2008, at 07:56, Prof Brian Ripley

[R] Mathematica now working with Nvidia GPUs -- any plan for R?

2008-11-18 Thread Emmanuel Levy
Dear All, I just read an announcement saying that Mathematica is launching a version working with Nvidia GPUs. It is claimed that it'd make it ~10-100x faster! http://www.physorg.com/news146247669.html I was wondering if you are aware of any development going into this direction with R? Thanks

[R] gregexpr slow and increases exponentially with string length -- how to speed it up?

2008-10-30 Thread Emmanuel Levy
Dear All, I have a long string and need to search for regular expressions in there. However it becomes horribly slow as the string length increases. Below is an example: when i increases by 5, the time spent increases by more! (my string is 11,000,000 letters long!) I also noticed that - the

Re: [R] gregexpr slow and increases exponentially with string length -- how to speed it up?

2008-10-30 Thread Emmanuel Levy
Hi Chuck, Thanks a lot for your suggestion. You can find all such matches (not just the disjoint ones that gregexpr finds) using something like this: twomatch <- function(x,y) intersect(x+1,y) match.list <- list( which( vec %in% c(3,6,7) ),

[R] If I know d1 (density1), and dmix is a mix between d1 and d2 (d2 is unknown), can one infer d2?

2008-10-22 Thread Emmanuel Levy
Dear All, I hope the title speaks for itself. I believe that there should be a solution when I see what Mclust is able to do. However, this problem is quite particular in that d2 is not known and does not necessarily correspond to a common distribution (e.g. normal, exponential ...). However it

Re: [R] Mclust problem with mclust1Dplot: Error in to - from : non-numeric argument to binary operator

2008-10-21 Thread Emmanuel Levy
this would be great; is it possible to somehow force the parameters (e.g variance) to be greater than a particular threshold? Thanks, Emmanuel 2008/10/20 Emmanuel Levy [EMAIL PROTECTED]: Dear list members, I am using Mclust in order to deconvolute a distribution that I believe is a sum of two

Re: [R] Mclust problem with mclust1Dplot: Error in to - from : non-numeric argument to binary operator

2008-10-21 Thread Emmanuel Levy
=c(0,0.4) ) axis(side=1) for (i in 1:2) { ni <- v$parameters$pro[i]*dnorm(x0, mean=as.numeric(v$parameters$mean[i]),sd=1) lines(x0,ni,col=1) nt <- nt+ni } lines(x0,nt,lwd=3) segments(my.data,0,my.data,0.02) Best, Emmanuel 2008/10/21 Emmanuel Levy [EMAIL PROTECTED]: After playing

[R] unimodal VS bimodal normal distribution - how to get a pvalue?

2008-10-21 Thread Emmanuel Levy
Dear All, I have a distribution of values and I would like to assess the uni/bimodality of the distribution. I managed to decompose it into two normal distribs using Mclust, and the BIC criteria is best for two parameters. However, the problem is that the BIC criteria is not a P-value, which I

Re: [R] unimodal VS bimodal normal distribution - how to get a pvalue?

2008-10-21 Thread Emmanuel Levy
Hi Duncan, I'm really stupid --- yes of course!! Thanks for pointing me out the (now) obvious. All the best, E 2008/10/21 Duncan Murdoch [EMAIL PROTECTED]: On 10/21/2008 2:56 PM, Emmanuel Levy wrote: Dear All, I have a distribution of values and I would like to assess the uni/bimodality

[R] Mclust problem with mclust1Dplot: Error in to - from : non-numeric argument to binary operator

2008-10-20 Thread Emmanuel Levy
Dear list members, I am using Mclust in order to deconvolute a distribution that I believe is a sum of two gaussians. First I can make a model: my.data.model = Mclust(my.data, modelNames=c("E"), warn=T, G=1:3) But then, when I try to plot the result, I get the following error:

Re: [R] RCurl compilation error on ubuntu hardy

2008-09-17 Thread Emmanuel Levy
to encoding of strings. D. Emmanuel Levy wrote: Dear list members, I encountered this problem and the solution pointed out in a previous thread did not work for me. (e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R")) I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get. I

[R] RCurl compilation error on ubuntu hardy

2008-09-16 Thread Emmanuel Levy
Dear list members, I encountered this problem and the solution pointed out in a previous thread did not work for me. (e.g. install.packages("RCurl", repos = "http://www.omegahat.org/R")) I work with Ubuntu Hardy, and installed R 2.6.2 via apt-get. I really need RCurl in order to use biomaRt ...

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
PM, Peter Cowan [EMAIL PROTECTED] wrote: Emmanuel, On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy [EMAIL PROTECTED] wrote: Dear All, I have a large data frame ( 270 lines and 14 columns), and I would like to extract the information in a particular way illustrated below: Given a data

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
that the split and hash.mat functions. Thanks for your help, Emmanuel 2008/8/13 Erik Iverson [EMAIL PROTECTED]: I still don't understand what you are doing. Can you make a small example that shows what you have and what you want? Is ?split what you are after? Emmanuel Levy wrote: Dear

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
a small example that shows what you have and what you want? Is ?split what you are after? Emmanuel Levy wrote: Dear Peter and Henrik, Thanks for your replies - this helps speed up a bit, but I thought there would be something much faster. What I mean is that I thought that a particular

[R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Emmanuel Levy
Dear All, I have a large data frame ( 270 lines and 14 columns), and I would like to extract the information in a particular way illustrated below: Given a data frame df: col1=sample(c(0,1),10, rep=T) names = factor(c(rep("A",5),rep("B",5))) df = data.frame(names,col1) df names col1 1
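If the same lookup is repeated for many names, one option is to compute the row indexes for every name once with `split()` and then index into that list, instead of scanning the whole column each time. A sketch on the toy data frame from the post:

```r
set.seed(1)
df <- data.frame(names = factor(c(rep("A", 5), rep("B", 5))),
                 col1  = sample(c(0, 1), 10, replace = TRUE))

# One pass over the column: a named list of row indexes per level
idx <- split(seq_len(nrow(df)), df$names)

idx[["A"]]            # same rows as which(df$names == "A")
df$col1[idx[["B"]]]   # values of col1 for name "B"
```

The single `split()` call costs about as much as one `which()` scan, so the gain only shows up when many names are looked up.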

[R] Smoothing z-values according to their x, y positions

2008-03-19 Thread Emmanuel Levy
Dear All, I'm sure this is not the first time this question comes up but I couldn't find the keywords that would point me out to it - so apologies if this is a re-post. Basically I've got thousands of points, each depending on three variables: x, y, and z. if I do a plot(x,y, col=z), I get
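One standard option is a local regression of z on both coordinates with `loess()`. The sketch below uses simulated data with a known smooth surface so the effect can be checked; the `span` value is an arbitrary choice and would need tuning on real data.

```r
set.seed(7)
n <- 2000
x <- runif(n, 0, 3)
y <- runif(n, 0, 3)

# Simulated example: a smooth surface plus noise
truth <- sin(x) + cos(y)
z     <- truth + rnorm(n, sd = 0.5)

# Local regression on both coordinates; span controls the neighbourhood size
fit      <- loess(z ~ x + y, span = 0.3)
z.smooth <- fitted(fit)

# The smoothed values should sit closer to the underlying surface than the raw ones
c(raw = mean((z - truth)^2), smoothed = mean((z.smooth - truth)^2))
```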

Re: [R] Smoothing z-values according to their x, y positions

2008-03-19 Thread Emmanuel Levy
PROTECTED] On Behalf Of Emmanuel Levy Sent: Wednesday, March 19, 2008 12:42 PM To: r-help@r-project.org Subject: [R] Smoothing z-values according to their x, y positions Dear All, I'm sure this is not the first time this question comes up but I couldn't find the keywords that would

Re: [R] Smoothing z-values according to their x, y positions

2008-03-19 Thread Emmanuel Levy
looked yet at the locfit package as it is not installed, but I will check it out! Thanks for helping! Emmanuel On 20/03/2008, David Winsemius [EMAIL PROTECTED] wrote: Emmanuel Levy [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: Dear Bert, Thanks for your reply - I indeed saw