[R] rbind-ing numeric matrices
Good morning, I'm running into trouble rbind-ing numeric matrices with differing numbers of rows. In particular, there seem to be issues whenever a one-row numeric matrix is involved. Assume A is a numeric matrix with 1 row and Y columns and B is a numeric matrix with X rows and Y columns. Let C be the result of rbinding A and B. Then C is a numeric matrix with X + 1 rows and Y columns, only instead of the rows of B being stacked beneath the row of A as expected, the first Y elements of the 1st column of B are placed in the 2nd row of C, the remaining values of B are discarded, and NULL values fill out the rest of the matrix C. The number of columns of A and B match. The colnames of A and B match. Both are numeric matrices. I've pored over the rbind/cbind documentation but can't identify why I'm getting this behavior from rbind. I'd be extremely grateful for your suggestions or thoughts. Nick [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbind-ing numeric matrices
Thank you, Steve and Sarah, for your swift replies. I didn't know about dput(). class() returns matrix for A, B, and C, but here: class(A) [1] matrix class(B) [1] matrix dput(A) structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601, 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) dput(B) structure(c(1239197400, 1239199200, 1239202800, 1239199199, 1239202799, 1239206399, 14342, 14342, 14342, 3, 3, 3, 0, 0, 0, 0, 0, 0, 9, 10, 11, 35999, 39599, 43199, 7199, 10799, 14399, 1799, 5399, 8999, 1799, 5399, 8999), .Dim = c(3L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) C - rbind(A,B) dput(C) structure(list(1239814462, 1239197400, NULL, NULL, 1239814601, 1239199200, NULL, NULL, 14349, 1239202800, NULL, NULL, 3, 1239199199, NULL, NULL, 4, 1239202799, NULL, NULL, 0, 1239206399, NULL, NULL, 12, 14342, NULL, NULL, 46601, 14342, NULL, NULL, 17801, 14342, NULL, NULL, 12401, 3, NULL, NULL, 106001, 3, NULL, NULL), .Dim = c(4L, 11L), .Dimnames = list(NULL, c(a, b, c, d, e, f, g, h, i, j, k))) But how do I convert A to the appropriate form for rbinding? Not simply with as.matrix(). Witness: dput(as.matrix(A)) structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601, 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) I think we're closer but I still need your help! Nick On Tue, May 1, 2012 at 12:02 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Hi, On Tue, May 1, 2012 at 11:52 AM, Nick Switanek nswita...@gmail.com wrote: Good morning, I'm running into trouble rbind-ing numeric matrices with differing numbers of rows. In particular, there seem to be issues whenever a one-row numeric matrix is involved. Assume A is a numeric matrix with 1 row and Y columns and B is a numeric matrix with X rows and Y columns. Let C be the result of rbinding A and B. Then C is a numeric matrix with X + 1 rows and Y columns, only instead of the rows of B being stacked beneath the row of A as expected, the first Y elements of the 1st column of B are placed in the 2nd row of C, the remaining values of B are discarded, and NULL values fill out the rest of the matrix C. The number of columns of A and B match. The colnames of A and B match. Both are numeric matrices. I've pored over the rbind/cbind documentation but can't identify why I'm getting this behavior from rbind. I'd be extremely grateful for your suggestions or thoughts. If everything you say is true (and I'm understanding what you're saying), there must be something else going on with your data. Consider: R m1 - matrix(-(1:5), nrow=1) R m2 - matrix(1:20, ncol=5) R rbind(m1, m2) [,1] [,2] [,3] [,4] [,5] [1,] -1 -2 -3 -4 -5 [2,]159 13 17 [3,]26 10 14 18 [4,]37 11 15 19 [5,]48 12 16 20 Can you provide a small example of your data that reproduces the problem you're seeing? Construct these objects in your workspace and copy/paste the output of dput on your m1 and m2 matrices so we can easily work w/ them. Cheers, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbind-ing numeric matrices
Thank you, Sarah! Yes, unlist() looks like it will do just what I need. Thank you. I took a different tack. I'd changed the function returning A to coerce the row from the data.frame using as.numeric(). Any reason to prefer your way or mine? thanks again, Nick On Tue, May 1, 2012 at 2:08 PM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi Nick, On Tue, May 1, 2012 at 1:56 PM, Nick Switanek nswita...@gmail.com wrote: Thank you, Steve and Sarah, for your swift replies. I didn't know about dput(). class() returns matrix for A, B, and C, but here: class(A) [1] matrix class(B) [1] matrix dput(A) structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601, 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) I'm not sure how you created A, but it's a list - see the structure given by dput(), and also: is.list(A) [1] TRUE is.list(B) [1] FALSE dput(B) structure(c(1239197400, 1239199200, 1239202800, 1239199199, 1239202799, 1239206399, 14342, 14342, 14342, 3, 3, 3, 0, 0, 0, 0, 0, 0, 9, 10, 11, 35999, 39599, 43199, 7199, 10799, 14399, 1799, 5399, 8999, 1799, 5399, 8999), .Dim = c(3L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) Note that the structure for B doesn't say anything about it being a list. In this particular case, you can fix that with: rbind(unlist(A), B) a b c d e f g h i j k [1,] 1239814462 1239814601 14349 3 4 0 12 46601 17801 12401 106001 [2,] 1239197400 1239199199 14342 3 0 0 9 35999 7199 1799 1799 [3,] 1239199200 1239202799 14342 3 0 0 10 39599 10799 5399 5399 [4,] 1239202800 1239206399 14342 3 0 0 11 43199 14399 8999 8999 But now I'm curious where A came from. Sarah C - rbind(A,B) dput(C) structure(list(1239814462, 1239197400, NULL, NULL, 1239814601, 1239199200, NULL, NULL, 14349, 1239202800, NULL, NULL, 3, 1239199199, NULL, NULL, 4, 1239202799, NULL, NULL, 0, 1239206399, NULL, NULL, 12, 14342, NULL, NULL, 46601, 14342, NULL, NULL, 17801, 14342, NULL, NULL, 12401, 3, NULL, NULL, 106001, 3, NULL, NULL), .Dim = c(4L, 11L), .Dimnames = list(NULL, c(a, b, c, d, e, f, g, h, i, j, k))) But how do I convert A to the appropriate form for rbinding? Not simply with as.matrix(). Witness: dput(as.matrix(A)) structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601, 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list( NULL, c(a, b, c, d, e, f, g, h, i, j, k))) I think we're closer but I still need your help! Nick On Tue, May 1, 2012 at 12:02 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Hi, On Tue, May 1, 2012 at 11:52 AM, Nick Switanek nswita...@gmail.com wrote: Good morning, I'm running into trouble rbind-ing numeric matrices with differing numbers of rows. In particular, there seem to be issues whenever a one-row numeric matrix is involved. Assume A is a numeric matrix with 1 row and Y columns and B is a numeric matrix with X rows and Y columns. Let C be the result of rbinding A and B. Then C is a numeric matrix with X + 1 rows and Y columns, only instead of the rows of B being stacked beneath the row of A as expected, the first Y elements of the 1st column of B are placed in the 2nd row of C, the remaining values of B are discarded, and NULL values fill out the rest of the matrix C. The number of columns of A and B match. The colnames of A and B match. Both are numeric matrices. I've pored over the rbind/cbind documentation but can't identify why I'm getting this behavior from rbind. I'd be extremely grateful for your suggestions or thoughts. If everything you say is true (and I'm understanding what you're saying), there must be something else going on with your data. Consider: R m1 - matrix(-(1:5), nrow=1) R m2 - matrix(1:20, ncol=5) R rbind(m1, m2) [,1] [,2] [,3] [,4] [,5] [1,] -1 -2 -3 -4 -5 [2,]159 13 17 [3,]26 10 14 18 [4,]37 11 15 19 [5,]48 12 16 20 Can you provide a small example of your data that reproduces the problem you're seeing? Construct these objects in your workspace and copy/paste the output of dput on your m1 and m2 matrices so we can easily work w/ them. Cheers, -steve -- Sarah Goslee http://www.functionaldiversity.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf: issues with natural joins
Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf: issues with natural joins
Thank you very much for these clarifying responses, Gabor. I had mistakenly assumed that creating the index on Tid restricted the natural join to joining on Tid. Can you describe when and how indices speed up joins, or can you point me to resources that address this? Is it only for natural joins or any joins (including, say, a select statement with where clause)? thanks, nick On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: Although that works I had meant to write: names(B)[2] - dfNameB # ... other commands sqldf('select * from main.A natural join main.B') so that now only Tid is in common so the natural join just picks it up and also the heuristic works again since we no longer retrieve duplicate column names. On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: There are two problems: 1. A natural join will join all columns with the same names in the two tables and that includes not only Tid but also dfName and since there are no rows that have the same Tid and dfName the result has zero rows. 2. the heuristic it uses fails when you retrieve the same column name from multiple tables so use method = raw to turn off the heuristic. The heuristic will be improved to cover this case in the future. Read FAQ #1 on the home page: http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors ? This should work: sqldf('select * from main.A join main.B using(Tid)', method = raw) Tid dfName dfName 1 AES 01-01-02 11:53:00 a b 2 AES 01-01-05\n10:58:00 a b 3 AES 01-01-11 12:30:00 a b This works too as the double dfName no longer exists to confuse the heuristic: names(B)[2] - dfNameB sqldf('select * from main.A join main.B using(Tid)') On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote: Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] translating SQL statements into data.table operations
I've recently stumbled across data.table, Matthew Dowle's package. I'm impressed by the speed of the package in handling operations with large data.frames, but am a bit overwhelmed with the syntax. I'd like to express the SQL statement below using data.table operations rather than sqldf (which was incredibly slow for a small subset of my financial data) or import/export with a DBMS, but I haven't been able to figure out how to do it. I would be grateful for your suggestions. nick My aim is to join events (trades) from two datasets (edt and cdt) where, for the same stock, the events in one dataset occur between 15 and 75 days before the other, and within the same time window. I can only see how to express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also at a loss at whether I can express the remainder using data.table's %between% operator or not. ctqm - sqldf(SELECT e.*, c.DATE 'DATEctrl', c.TIME 'TIMEctrl', c.PRICE 'PRICEctrl', c.SIZE 'SIZEctrl' FROM edt e, ctq c WHERE e.SYMBOL = c.SYMBOL AND julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND 75 AND strftime('%H:%M:%S',c.TIME) BETWEEN strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.