[R] rbind-ing numeric matrices

2012-05-01 Thread Nick Switanek
Good morning,

I'm running into trouble rbind-ing numeric matrices with differing numbers
of rows. In particular, there seem to be issues whenever a one-row numeric
matrix is involved.

Assume A is a numeric matrix with 1 row and Y columns and B is a numeric
matrix with X rows and Y columns. Let C be the result of rbinding A and B.
Then C is a numeric matrix with X + 1 rows and Y columns, only instead of
the rows of B being stacked beneath the row of A as expected, the first Y
elements of the 1st column of B are placed in the 2nd row of C, the
remaining values of B are discarded, and NULL values fill out the rest of
the matrix C.

The number of columns of A and B match. The colnames of A and B match. Both
are numeric matrices. I've pored over the rbind/cbind documentation but
can't identify why I'm getting this behavior from rbind. I'd be extremely
grateful for your suggestions or thoughts.

Nick

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind-ing numeric matrices

2012-05-01 Thread Nick Switanek
Thank you, Steve and Sarah, for your swift replies.

I didn't know about dput(). class() returns matrix for A, B, and C, but
here:

 class(A)
[1] matrix
 class(B)
[1] matrix
 dput(A)
structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601,
17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list(
NULL, c(a, b, c, d, e, f, g, h, i, j,
k)))
 dput(B)
structure(c(1239197400, 1239199200, 1239202800, 1239199199, 1239202799,
1239206399, 14342, 14342, 14342, 3, 3, 3, 0, 0, 0, 0, 0, 0, 9,
10, 11, 35999, 39599, 43199, 7199, 10799, 14399, 1799, 5399,
8999, 1799, 5399, 8999), .Dim = c(3L, 11L), .Dimnames = list(
NULL, c(a, b, c, d, e, f, g, h, i, j,
k)))
 C - rbind(A,B)
 dput(C)
structure(list(1239814462, 1239197400, NULL, NULL, 1239814601,
1239199200, NULL, NULL, 14349, 1239202800, NULL, NULL, 3,
1239199199, NULL, NULL, 4, 1239202799, NULL, NULL, 0, 1239206399,
NULL, NULL, 12, 14342, NULL, NULL, 46601, 14342, NULL, NULL,
17801, 14342, NULL, NULL, 12401, 3, NULL, NULL, 106001, 3,
NULL, NULL), .Dim = c(4L, 11L), .Dimnames = list(NULL, c(a,
b, c, d, e, f, g, h, i, j, k)))

But how do I convert A to the appropriate form for rbinding? Not simply
with as.matrix(). Witness:
 dput(as.matrix(A))
structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601,
17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list(
NULL, c(a, b, c, d, e, f, g, h, i, j,
k)))

I think we're closer but I still need your help!

Nick


On Tue, May 1, 2012 at 12:02 PM, Steve Lianoglou 
mailinglist.honey...@gmail.com wrote:

 Hi,

 On Tue, May 1, 2012 at 11:52 AM, Nick Switanek nswita...@gmail.com
 wrote:
  Good morning,
 
  I'm running into trouble rbind-ing numeric matrices with differing
 numbers
  of rows. In particular, there seem to be issues whenever a one-row
 numeric
  matrix is involved.
 
  Assume A is a numeric matrix with 1 row and Y columns and B is a numeric
  matrix with X rows and Y columns. Let C be the result of rbinding A and
 B.
  Then C is a numeric matrix with X + 1 rows and Y columns, only instead of
  the rows of B being stacked beneath the row of A as expected, the
 first Y
  elements of the 1st column of B are placed in the 2nd row of C, the
  remaining values of B are discarded, and NULL values fill out the rest of
  the matrix C.
 
  The number of columns of A and B match. The colnames of A and B match.
 Both
  are numeric matrices. I've pored over the rbind/cbind documentation but
  can't identify why I'm getting this behavior from rbind. I'd be extremely
  grateful for your suggestions or thoughts.

 If everything you say is true (and I'm understanding what you're
 saying), there must be something else going on with your data.
 Consider:

 R m1 - matrix(-(1:5), nrow=1)
 R m2 - matrix(1:20, ncol=5)
 R rbind(m1, m2)
  [,1] [,2] [,3] [,4] [,5]
 [1,]   -1   -2   -3   -4   -5
 [2,]159   13   17
 [3,]26   10   14   18
 [4,]37   11   15   19
 [5,]48   12   16   20

 Can you provide a small example of your data that reproduces the
 problem you're seeing?

 Construct these objects in your workspace and copy/paste the output of
 dput on your m1 and m2 matrices so we can easily work w/ them.

 Cheers,
 -steve

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind-ing numeric matrices

2012-05-01 Thread Nick Switanek
Thank you, Sarah!

Yes, unlist() looks like it will do just what I need. Thank you.

I took a different tack. I'd changed the function returning A to coerce the
row from the data.frame using as.numeric(). Any reason to prefer your way
or mine?

thanks again,
Nick


On Tue, May 1, 2012 at 2:08 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi Nick,

 On Tue, May 1, 2012 at 1:56 PM, Nick Switanek nswita...@gmail.com wrote:
  Thank you, Steve and Sarah, for your swift replies.
 
  I didn't know about dput(). class() returns matrix for A, B, and C, but
  here:
 
  class(A)
  [1] matrix
  class(B)
  [1] matrix
  dput(A)
  structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601,
 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list(
 NULL, c(a, b, c, d, e, f, g, h, i, j,
 k)))

 I'm not sure how you created A, but it's a list - see the structure
 given by dput(), and also:

  is.list(A)
 [1] TRUE
  is.list(B)
 [1] FALSE


  dput(B)
  structure(c(1239197400, 1239199200, 1239202800, 1239199199, 1239202799,
  1239206399, 14342, 14342, 14342, 3, 3, 3, 0, 0, 0, 0, 0, 0, 9,
  10, 11, 35999, 39599, 43199, 7199, 10799, 14399, 1799, 5399,
  8999, 1799, 5399, 8999), .Dim = c(3L, 11L), .Dimnames = list(
 NULL, c(a, b, c, d, e, f, g, h, i, j,
 k)))

 Note that the structure for B doesn't say anything about it being a list.

 In this particular case, you can fix that with:
  rbind(unlist(A), B)
  a  b c d e f  g h i j  k
 [1,] 1239814462 1239814601 14349 3 4 0 12 46601 17801 12401 106001
 [2,] 1239197400 1239199199 14342 3 0 0  9 35999  7199  1799   1799
 [3,] 1239199200 1239202799 14342 3 0 0 10 39599 10799  5399   5399
 [4,] 1239202800 1239206399 14342 3 0 0 11 43199 14399  8999   8999

 But now I'm curious where A came from.

 Sarah

  C - rbind(A,B)
  dput(C)
  structure(list(1239814462, 1239197400, NULL, NULL, 1239814601,
 1239199200, NULL, NULL, 14349, 1239202800, NULL, NULL, 3,
 1239199199, NULL, NULL, 4, 1239202799, NULL, NULL, 0, 1239206399,
 NULL, NULL, 12, 14342, NULL, NULL, 46601, 14342, NULL, NULL,
 17801, 14342, NULL, NULL, 12401, 3, NULL, NULL, 106001, 3,
 NULL, NULL), .Dim = c(4L, 11L), .Dimnames = list(NULL, c(a,
  b, c, d, e, f, g, h, i, j, k)))
 
  But how do I convert A to the appropriate form for rbinding? Not simply
  with as.matrix(). Witness:
  dput(as.matrix(A))
  structure(list(1239814462, 1239814601, 14349, 3, 4, 0, 12, 46601,
 17801, 12401, 106001), .Dim = c(1L, 11L), .Dimnames = list(
 NULL, c(a, b, c, d, e, f, g, h, i, j,
 k)))
 
  I think we're closer but I still need your help!
 
  Nick
 
 
  On Tue, May 1, 2012 at 12:02 PM, Steve Lianoglou 
  mailinglist.honey...@gmail.com wrote:
 
  Hi,
 
  On Tue, May 1, 2012 at 11:52 AM, Nick Switanek nswita...@gmail.com
  wrote:
   Good morning,
  
   I'm running into trouble rbind-ing numeric matrices with differing
  numbers
   of rows. In particular, there seem to be issues whenever a one-row
  numeric
   matrix is involved.
  
   Assume A is a numeric matrix with 1 row and Y columns and B is a
 numeric
   matrix with X rows and Y columns. Let C be the result of rbinding A
 and
  B.
   Then C is a numeric matrix with X + 1 rows and Y columns, only
 instead of
   the rows of B being stacked beneath the row of A as expected, the
  first Y
   elements of the 1st column of B are placed in the 2nd row of C, the
   remaining values of B are discarded, and NULL values fill out the
 rest of
   the matrix C.
  
   The number of columns of A and B match. The colnames of A and B match.
  Both
   are numeric matrices. I've pored over the rbind/cbind documentation
 but
   can't identify why I'm getting this behavior from rbind. I'd be
 extremely
   grateful for your suggestions or thoughts.
 
  If everything you say is true (and I'm understanding what you're
  saying), there must be something else going on with your data.
  Consider:
 
  R m1 - matrix(-(1:5), nrow=1)
  R m2 - matrix(1:20, ncol=5)
  R rbind(m1, m2)
   [,1] [,2] [,3] [,4] [,5]
  [1,]   -1   -2   -3   -4   -5
  [2,]159   13   17
  [3,]26   10   14   18
  [4,]37   11   15   19
  [5,]48   12   16   20
 
  Can you provide a small example of your data that reproduces the
  problem you're seeing?
 
  Construct these objects in your workspace and copy/paste the output of
  dput on your m1 and m2 matrices so we can easily work w/ them.
 
  Cheers,
  -steve
 


 --
 Sarah Goslee
 http://www.functionaldiversity.org


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek
Hello,

I'm having trouble discovering what's going wrong with my use of natural
joins via sqldf.

Following the instructions under 4i at http://code.google.com/p/sqldf/,
which discusses creating indices to speed joins, I have been only unreliably
able to get natural joins to work.

For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

# then use the sqldf library
 library(sqldf)
 sqldf()

# to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

# check to make sure everything is there
 sqldf('select * from sqlite_master')

# doing a natural join (implicitly on Tid)
# does not give the expected joins
 sqldf('select * from main.A natural join main.B')
[1] TiddfName
0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
[1] TiddfName
0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
[1] TiddfName
0 rows (or 0-length row.names)

# even using a where clause (which doesn't have the efficiency qualities I
need the indexed natural joins for) is problematic, setting values of the
dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
Tid dfName   Tid dfName
1 AES 01-01-02 11:53:00  b AES 01-01-02 11:53:00  b
2 AES 01-01-05 10:58:00  b AES 01-01-05 10:58:00  b

I'm grateful for your guidance on what I'm doing wrong with the natural join
in sqldf.

many thanks,
Nick

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek
Thank you very much for these clarifying responses, Gabor.

I had mistakenly assumed that creating the index on Tid restricted the
natural join to joining on Tid. Can you describe when and how indices speed
up joins, or can you point me to resources that address this? Is it only for
natural joins or any joins (including, say, a select statement with where
clause)?

thanks,
nick


On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

 Although that works I had meant to write:

  names(B)[2] - dfNameB
  # ... other commands
  sqldf('select * from main.A natural join main.B')

 so that now only Tid is in common so the natural join just picks it up
 and also the heuristic works again since we no longer retrieve
 duplicate column names.

 On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  There are two problems:
 
  1. A natural join will join all columns with the same names in the two
  tables and that includes not only Tid but also dfName and since there
  are no rows that have the same Tid and dfName the result has zero
  rows.
 
  2. the heuristic it uses fails when you retrieve the same column name
  from multiple tables so use method = raw to turn off the heuristic.
  The heuristic will be improved to cover this case in the future.
  Read FAQ #1 on the home page:
 
 http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors
 ?
 
  This should work:
 
  sqldf('select * from main.A join main.B using(Tid)', method = raw)
  Tid dfName dfName
  1  AES 01-01-02 11:53:00  a  b
  2 AES 01-01-05\n10:58:00  a  b
  3  AES 01-01-11 12:30:00  a  b
 
  This works too as the double dfName no longer exists to confuse the
 heuristic:
 
  names(B)[2] - dfNameB
  sqldf('select * from main.A join main.B using(Tid)')
 
 
 
  On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com
 wrote:
  Hello,
 
  I'm having trouble discovering what's going wrong with my use of natural
  joins via sqldf.
 
  Following the instructions under 4i at http://code.google.com/p/sqldf/,
  which discusses creating indices to speed joins, I have been only
 unreliably
  able to get natural joins to work.
 
  For example,
 
  Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES
 01-01-05
  10:58:00', 'AES 01-01-11 12:30:00')
  A - data.frame(Tid, dfName = 'a')
  B - data.frame(Tid = Tid[2:4], dfName = 'b')
  C - data.frame(Tid = Tid[1:3], dfName = 'c')
 
  # then use the sqldf library
  library(sqldf)
  sqldf()
 
  # to create indices on the Tid variable shared across data.frames
  sqldf('create index indA on A(Tid)')
  sqldf('create index indB on B(Tid)')
  sqldf('create index indC on C(Tid)')
 
  # check to make sure everything is there
  sqldf('select * from sqlite_master')
 
  # doing a natural join (implicitly on Tid)
  # does not give the expected joins
  sqldf('select * from main.A natural join main.B')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.A natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.B natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
 
  # even using a where clause (which doesn't have the efficiency qualities
 I
  need the indexed natural joins for) is problematic, setting values of
 the
  dfName variable incorrectly for the data from C
  sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
 Tid dfName   Tid dfName
  1 AES 01-01-02 11:53:00  b AES 01-01-02 11:53:00  b
  2 AES 01-01-05 10:58:00  b AES 01-01-05 10:58:00  b
 
  I'm grateful for your guidance on what I'm doing wrong with the natural
 join
  in sqldf.
 
  many thanks,
  Nick
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] translating SQL statements into data.table operations

2010-03-24 Thread Nick Switanek
I've recently stumbled across data.table, Matthew Dowle's package. I'm
impressed by the speed of the package in handling operations with large
data.frames, but am a bit overwhelmed with the syntax. I'd like to express
the SQL statement below using data.table operations rather than sqldf (which
was incredibly slow for a small subset of my financial data) or
import/export with a DBMS, but I haven't been able to figure out how to do
it. I would be grateful for your suggestions.

nick



My aim is to join events (trades) from two datasets (edt and cdt) where,
for the same stock, the events in one dataset occur between 15 and 75 days
before the other, and within the same time window. I can only see how to
express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also
at a loss at whether I can express the remainder using data.table's
%between% operator or not.

ctqm - sqldf(SELECT e.*,
 c.DATE 'DATEctrl',
 c.TIME 'TIMEctrl',
 c.PRICE 'PRICEctrl',
 c.SIZE 'SIZEctrl'

 FROM edt e, ctq c

 WHERE e.SYMBOL = c.SYMBOL AND
   julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
75 AND
   strftime('%H:%M:%S',c.TIME) BETWEEN
strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.