[R] Jarque-Bera and rnorm()
Folks,

I'm a bit puzzled by the fact that if I generate 100,000 standard normal
variates using rnorm() and perform the Jarque-Bera test on the resulting
vector, I get p-values that vary drastically from run to run. Is this
expected? Surely the p-value should be close to 1 for each test? Are 100,000
variates sufficient for this test? Or is rnorm() not a robust random number
generator?

I looked at the skewness and excess kurtosis, and the former seems unstable,
which leads me to think that is why JB is failing.

Here are my outputs from successive runs of rjb.test() (the robust
Jarque-Bera test from the lawstat package):

set.seed(100)
y <- rnorm(10^5); rjb.test(y); skewness(y)[1]; kurtosis(y)[1]

        Robust Jarque Bera Test
data:  y
X-squared = 1.753, df = 2, p-value = 0.4162
[1] -0.01025744
[1] 0.0008213325

y <- rnorm(10^5); rjb.test(y); skewness(y)[1]; kurtosis(y)[1]

        Robust Jarque Bera Test
data:  y
X-squared = 0.1359, df = 2, p-value = 0.9343
[1] -0.001833042
[1] -0.002603599

y <- rnorm(10^5); rjb.test(y); skewness(y)[1]; kurtosis(y)[1]

        Robust Jarque Bera Test
data:  y
X-squared = 4.6438, df = 2, p-value = 0.09809
[1] -0.01620776
[1] -0.005762349

Please advise. Thanks,

Murali

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
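The variation described above is in fact expected: when the data really are normal, a goodness-of-fit p-value is (asymptotically) uniform on (0, 1), not close to 1. A minimal base-R sketch, using the classical JB statistic rather than lawstat's robust variant (the helper name jb_pvalue is made up here):

```r
# Classical Jarque-Bera statistic: JB = n/6 * (S^2 + K^2/4),
# asymptotically chi-squared with 2 df under normality.
# (jb_pvalue is an illustrative helper, not a lawstat function.)
jb_pvalue <- function(x) {
  n  <- length(x)
  m  <- mean(x)
  s2 <- mean((x - m)^2)
  S  <- mean((x - m)^3) / s2^1.5      # sample skewness
  K  <- mean((x - m)^4) / s2^2 - 3    # sample excess kurtosis
  JB <- n / 6 * (S^2 + K^2 / 4)
  pchisq(JB, df = 2, lower.tail = FALSE)
}

set.seed(100)
p <- replicate(1000, jb_pvalue(rnorm(1000)))

# Under the null, p-values spread roughly uniformly over (0, 1);
# only about 5% of runs fall below 0.05.
summary(p)
mean(p < 0.05)
```

So seeing 0.42, 0.93, and 0.098 in three runs is entirely consistent with rnorm() behaving correctly.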
Re: [R] creating conditional list of elements
Sorry to plague the list, but I think I got the answer. The following would do:

signalList <- list(tradingRules$Signal[tradingRules$Enabled])[[1]]
length(signalList)
[1] 2

Now my problem is shifted: the Signal column in the original data frame
refers to actual matrices previously created in R. That is, bar_signal and
cif_signal are extant matrices. What I need is the minimum number of rows in
these matrices, so what I plan to do is:

n <- min(sapply(signalList, NROW))

but this doesn't work (it returns 1, but I have 2800 rows in each of
bar_signal and cif_signal, so I should get 2800).

Is there a smart way to do this? And is there a way to tell R that the
entries in the Signal column of tradingRules are the names of objects?

Thanks,

Murali

Murali Menon/GB/ABNAMRO/NL
29/03/2007 11:13
To: r-help@stat.math.ethz.ch
Subject: creating conditional list of elements

Folks,

I have a data frame as follows (first column is the row names, first row is
the column names):

Rule  Enabled  Signal
Foo   False    foo_signal
Bar   True     bar_signal
Gum   False    gum_signal
Cif   True     cif_signal

I would like to create a list of only those signals whose Enabled flag is
True. So in the above case I should end up with

signalList = bar_signal, cif_signal

Likewise, if the Enabled flags were all set to False, then signalList should
be an empty list. What's a good way to achieve this, please?

Thanks,

Murali
Re: [R] creating conditional list of elements
John,

Thanks, that works nicely. Didn't know about 'get'. Onwards and upwards! :-)

Cheers,

Murali

John James [EMAIL PROTECTED]
29/03/2007 12:40
To: [EMAIL PROTECTED]
Subject: RE: [R] creating conditional list of elements

Murali,

If sm is your original data frame,

# such that
sm[sm$Enabled, ]$Signal
[1] bar_signal cif_signal

# and
bar_signal <- matrix(rnorm(100), nrow = 5, ncol = 20)
cif_signal <- matrix(rnorm(60), nrow = 12, ncol = 5)

# then...
sapply(sm[sm$Enabled, ]$Signal, function(x) { NROW(get(x)) })
bar_signal cif_signal
         5         12

Hope this helps,

John James
Mango Solutions
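The name-to-object lookup in this thread can also be done in one step with mget(), which maps a character vector of names to a named list of the objects they name. A minimal sketch, with a hypothetical tradingRules table and dummy signal matrices mirroring the post:

```r
# Hypothetical rules table and signal matrices, mirroring the post:
tradingRules <- data.frame(Rule    = c("Foo", "Bar", "Gum", "Cif"),
                           Enabled = c(FALSE, TRUE, FALSE, TRUE),
                           Signal  = c("foo_signal", "bar_signal",
                                       "gum_signal", "cif_signal"),
                           stringsAsFactors = FALSE)
bar_signal <- matrix(0, nrow = 2800, ncol = 3)
cif_signal <- matrix(0, nrow = 2800, ncol = 5)

# mget() turns the Signal strings into the matrices themselves:
signalList <- mget(tradingRules$Signal[tradingRules$Enabled])
length(signalList)   # 2 (an empty list if nothing is Enabled)

# Minimum row count across the referenced matrices:
n <- min(sapply(signalList, NROW))
n                    # 2800
```

This sidesteps the pitfall in the earlier attempt, where sapply() saw character strings (NROW of a string is 1) rather than the matrices they refer to.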
[R] cumsum over varying column lengths
Folks,

I have a matrix historicalReturns, where entry (i, j) is the daily return
corresponding to date i and equity j. I also have a matrix startOffset, where
entry (1, k) is the row offset in historicalReturns at which I entered into
equity k. So NCOL(startOffset) = NCOL(historicalReturns).

Now I would like to compute, for each column of historicalReturns, the
cumulative return returnsFromInception for the equity starting from the
startOffset date. Is there a better way than the following?

n <- NROW(historicalReturns)
returnsFromInception <- matrix(nrow = 1, ncol = NCOL(historicalReturns))
for (i in 1 : NCOL(historicalReturns)) {
    cumReturn <- cumsum(historicalReturns[startOffset[1, i] : n, i])
    returnsFromInception[1, i] <- cumReturn[length(cumReturn)]
}

This works for me, but seems rather inelegant, and I don't like having to use
matrices for returnsFromInception and startOffset, where vectors would work.

Thanks for your help,

Murali
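One simplification of the loop above: the last element of cumsum(x) is just sum(x), so each column's from-inception return collapses to a single sum(), and plain vectors work fine throughout. A sketch with invented toy data standing in for the post's objects:

```r
# Toy data standing in for the post's objects (values invented):
set.seed(1)
historicalReturns <- matrix(rnorm(15), nrow = 5, ncol = 3)
startOffset <- c(1, 3, 2)   # a plain vector is fine for the offsets
n <- NROW(historicalReturns)

# The last element of cumsum(x) is sum(x), so the loop collapses to:
returnsFromInception <- sapply(seq_len(NCOL(historicalReturns)),
    function(i) sum(historicalReturns[startOffset[i]:n, i]))

# Cross-check against the original for-loop:
loopVersion <- numeric(NCOL(historicalReturns))
for (i in seq_len(NCOL(historicalReturns))) {
  cumReturn <- cumsum(historicalReturns[startOffset[i]:n, i])
  loopVersion[i] <- cumReturn[length(cumReturn)]
}
all.equal(returnsFromInception, loopVersion)   # TRUE
```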
Re: [R] array searches
Hi,

This is truly amazing stuff. Inspired by Jim's and Olivier's suggestions, I'm
trying to expand it to work with an m x n matrix, where the first column is
dates and the next columns are all signals. I dare say a suitable application
of 'apply' should work.

Thanks a ton,

Murali

From: jim holtman [EMAIL PROTECTED]
To: Murali Menon [EMAIL PROTECTED]
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] array searches
Date: Fri, 16 Feb 2007 10:21:40 -0500

try this:

x <- scan(textConnection("30/01/2007 0
31/01/2007 -1
01/02/2007 -1
02/02/2007 -1
03/02/2007 1
04/02/2007 1
05/02/2007 1
06/02/2007 1
07/02/2007 1
08/02/2007 1
09/02/2007 0
10/02/2007 0
11/02/2007 0
12/02/2007 1
13/02/2007 1
14/02/2007 1
15/02/2007 0
16/02/2007 0"), what = list(date = "", value = 0))
Read 18 records

x$date <- as.Date(x$date, "%d/%m/%Y")

# determine the breaks
x.breaks <- c(TRUE, diff(x$value) != 0)

# determine the value at the break; assume that it is the minimum
x.bdate <- x$date[x.breaks]

data.frame(date = x.bdate[cumsum(x.breaks)], value = x$value)
         date value
1  2007-01-30     0
2  2007-01-31    -1
3  2007-01-31    -1
4  2007-01-31    -1
5  2007-02-03     1
6  2007-02-03     1
7  2007-02-03     1
8  2007-02-03     1
9  2007-02-03     1
10 2007-02-03     1
11 2007-02-09     0
12 2007-02-09     0
13 2007-02-09     0
14 2007-02-12     1
15 2007-02-12     1
16 2007-02-12     1
17 2007-02-15     0
18 2007-02-15     0

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
[R] array searches
Folks,

I have a dataframe comprising a column of dates and a column of signals
(-1, 0, 1) that looks something like this:

30/01/2007  0
31/01/2007 -1
01/02/2007 -1
02/02/2007 -1
03/02/2007  1
04/02/2007  1
05/02/2007  1
06/02/2007  1
07/02/2007  1
08/02/2007  1
09/02/2007  0
10/02/2007  0
11/02/2007  0
12/02/2007  1
13/02/2007  1
14/02/2007  1
15/02/2007  0
16/02/2007  0

What I need to do is, for each signal *in reverse chronological order*, find
the date on which it first appeared. So, for the zero on 16/02/2007 and
15/02/2007, the 'inception' date would be 15/02/2007, because the day before,
the signal was 1. Likewise, the 'inception' date for the signal 1 on
08/02/2007 and the five days prior would be 03/02/2007.

I need to create a structure of inception dates that would finally look as
follows:

-1 31/01/2007
-1 31/01/2007
-1 31/01/2007
 1 03/02/2007
 1 03/02/2007
 1 03/02/2007
 1 03/02/2007
 1 03/02/2007
 1 03/02/2007
 0 09/02/2007
 0 09/02/2007
 0 09/02/2007
 1 12/02/2007
 1 12/02/2007
 1 12/02/2007
 0 15/02/2007
 0 15/02/2007

Is there a clever way of doing this? My sadly C-oriented upbringing can only
think in terms of for-loops.

Thanks!

Murali
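A loop-free way to get the inception dates is to mark where each run of equal signals starts and then repeat each run's first date for the run's length via rle(). A minimal sketch using toy vectors standing in for the posted columns (variable names are made up):

```r
# Toy vectors standing in for the posted columns (names are made up):
signal <- c(0, -1, -1, -1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
dates  <- as.Date("2007-01-30") + 0:17

# A run starts wherever the signal differs from its predecessor:
runStart <- c(TRUE, diff(signal) != 0)

# Repeat each run's starting date for the length of that run:
inception <- rep(dates[runStart], rle(signal)$lengths)

data.frame(signal = signal, inception = inception)
```

This reproduces the desired structure: the six 1s starting 03/02/2007 all map to 2007-02-03, and the final two 0s map to 2007-02-15.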
Re: [R] Computing stats on common parts of multiple dataframes
Hi Gabor, Eric,

Your suggestions are certainly more elegant (and involve less typing) than my
for-loop. Thanks!

Patrick, I see what you mean regarding dates. The problem is that I'm doing
all sorts of manipulations on the original data, which work best on numeric
matrices, and the dates just mess it all up. I could, of course, reintroduce
the date columns for the issue at hand, and do an intersect.

Best wishes,

Murali

From: Erik Iverson [EMAIL PROTECTED]
To: Murali Menon [EMAIL PROTECTED]
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] Computing stats on common parts of multiple dataframes
Date: Tue, 13 Feb 2007 14:42:21 -0600

Murali -

I've come up with something that might work, with gratuitous use of the
*apply functions. See ?apply, ?lapply, and ?mapply for how this would work.
Basically, just set my.list equal to a list of the data.frames you would like
included. I made this to work with matrices first, so it does use as.matrix()
in my function. Also, this could be turned into a general function so that
you could specify a different function other than median.

# Make my.list equal to a list of dataframes you want
my.list <- list(df1, df2)

# What's the shortest?
minrow <- min(sapply(my.list, nrow))

# Chop all to the shortest
tmp <- lapply(my.list, function(x) x[(nrow(x) - (minrow - 1)):nrow(x), ])

# Do the computation; could change median to mean, or a user-defined
# function
matrix(apply(mapply("[", lapply(tmp, as.matrix),
                    MoreArgs = list(1:(minrow * 2))), 1, median), ncol = 2)

HTH; whether or not this is any better than your for-loop solution is left
up to you.

Erik
[R] Computing stats on common parts of multiple dataframes
Folks,

I have three dataframes storing some information about two currency pairs, as
follows:

> a
EUR-USD  NOK-SEK
1.23     1.33
1.22     1.43
1.26     1.42
1.24     1.50
1.21     1.36
1.26     1.60
1.29     1.44
1.25     1.36
1.27     1.39
1.23     1.48
1.22     1.26
1.24     1.29
1.27     1.57
1.21     1.55
1.23     1.35
1.25     1.41
1.25     1.30
1.23     1.11
1.28     1.37
1.27     1.23

> b
EUR-USD  NOK-SEK
1.23     1.22
1.21     1.36
1.28     1.61
1.23     1.34
1.21     1.22

> d
EUR-USD  NOK-SEK
1.27     1.39
1.23     1.48
1.22     1.26
1.24     1.29
1.27     1.57
1.21     1.55
1.23     1.35
1.25     1.41
1.25     1.33
1.23     1.11
1.28     1.37
1.27     1.23

The twist is that these entries correspond to dates where the *last* rows in
each frame are today's entries, and so on backwards in time. I would like to
create a matrix of medians (a median for each row and for each currency
pair), but only for those rows where all dataframes have entries. My answer
in this case should look like:

EUR-USD  NOK-SEK
1.25     1.41
1.25     1.33
1.23     1.11
1.28     1.37
1.27     1.23

where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc. Notice that the
output has the same dimensions as the smallest dataframe (in this case 'b').

I can do it in a clumsy fashion by first obtaining the number of rows in the
smallest matrix, chopping off the top rows of the other matrices to reduce
them to this size, then doing a for-loop across each currency pair, row-wise,
to create a 3-vector which I then apply median() on.

Surely there's a better way to do this? Please advise.

Thanks,

Murali Menon
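One loop-free approach to the problem above: take the last k rows of every frame with tail() (k being the smallest frame's row count), stack them into a three-dimensional array, and apply median() across the third dimension. A sketch with invented toy frames standing in for a, b, and d (the mk helper is made up):

```r
# Toy stand-ins for the posted frames a, b, d (values invented):
set.seed(42)
mk <- function(n) data.frame(EUR.USD = round(runif(n, 1.20, 1.30), 2),
                             NOK.SEK = round(runif(n, 1.10, 1.60), 2))
frames <- list(a = mk(20), b = mk(5), d = mk(12))

# Align every frame on its last k rows, k = the smallest frame's length:
k <- min(sapply(frames, nrow))
tails <- lapply(frames, function(df) as.matrix(tail(df, k)))

# Stack the k x 2 matrices into a k x 2 x 3 array and take medians
# across the third dimension (i.e. across the frames):
arr <- array(unlist(tails), dim = c(k, ncol(tails[[1]]), length(tails)))
medians <- apply(arr, c(1, 2), median)
colnames(medians) <- colnames(frames[[1]])
medians
```

The result is a k x 2 matrix with one median per aligned row per currency pair, matching the shape of the smallest frame.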