Responses inline. On Sun, 11 Mar 2018, Neha Aggarwal wrote:


Hello All,

I am facing a unique problem and am unable to find any help in the R help pages or online. I would appreciate your help with the following problem. I have 2 data frames, samples below, along with the expected output.

R (Dataframe1):

          C1 C2 C3 C4 ... CN
    R1     0  1  0  1
    R2     1  0  1  1
    R3     1  0  0  0
    ...
    RN

U (Dataframe2):

          C1 C2 C3 C4 ... CN
    U1     1  1  0  1
    U2     1  1  1  1

Expected output:

    U1 satisfies R1, R3
    U2 satisfies R1, R2, R3

So this is a data frame comparison problem with a subset dimension. There are 2 data frames, R and U, and their column names are the same. Each row in R has certain columns belonging to it, denoted by 1s, and likewise each row in U has certain columns denoted by 1s. I have to find the relationships between the Rs and the Us. So I start with each row in U (let's say U1) and try to find all the rows in R that are a subset of that U row. I can't find a way to compare rows to see if one is a subset of another. What can I try? Any pointers or packages would be a great help.

Thanks,
Neha

As the Posting Guide says (you have read it, haven't you?), please post plain text... the mailing list mangles your code with varying levels of damage as it tries to fix this problem for you. It also helps if you can pose your question in R code rather than pseudo-code and formatted data tables.

Your problem appears to be an outer join of binary subsets... I don't think this is a very common problem structure (in most cases you want to avoid outer joins if you can because they are computationally expensive), but you can read ?outer and ?expand.grid to see some ways to pair up all possible row indexes. If you know that the number of columns in both inputs is <32, so that each row fits in a single integer bit mask, this problem can be optimized for speed and memory with the bitops package, or for larger size problems you can use the bit package. The below code shows the skeleton of logic with no such optimizations, and is likely the most practical solution for a one-off analysis:

    ##############
    r <- read.table( text=
    "     C1 C2 C3 C4
    R1   0   1   0   1
    R2   1   0   1   1
    R3   1   0   0   0
    ", header=TRUE )

    u <- read.table( text=
    "     C1 C2 C3 C4
    U1   1   1   0   1
    U2   1   1   1   1
    ", header=TRUE )

    rmx <- as.matrix( r )
    umx <- as.matrix( u )

    # one row per ( R, U ) pairing
    result <- expand.grid( R = rownames( rmx )
                         , U = rownames( umx )
                         )

    # see how (U and R are columns of result, hence with()):
    with( result, 1L - umx[ U, ] )                  # 1 for every 0 in u
    with( result, rmx[ R, ] )                       # 1 for every 1 in r
    with( result, ( 1L - umx[ U, ] ) * rmx[ R, ] )  # 1 where r has 1 but u has 0

    # do it:
    # for every pairing, IN is 0 if r has a 1 where u has a 0 in any column
    result$IN <- 1L - with( result
                          , apply( ( 1L - umx[ U, ] ) # any 0 column
                                 * rmx[ R, ]          # any 1 column
                                 , 1                  # by rows
                                 , max
                                 )
                          )
    result

    # show key pairings only
    result[ as.logical( result$IN ), c( "U", "R" ) ]
    ##############

Jeff Newmiller
DCN:<jdnew...@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
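The bit-mask optimization mentioned in the reply might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the original post: it uses base R's `bitwAnd()` rather than the bitops package, the helper name `pack_rows` is hypothetical, and it assumes the inputs have fewer than 32 columns so that each row fits in one R integer. A row of R is a subset of a row of U exactly when AND-ing the two masks gives back the R mask.

```r
# Same sample data as in the reply, built directly as matrices.
rmx <- matrix( c( 0, 1, 0, 1
                , 1, 0, 1, 1
                , 1, 0, 0, 0 )
             , nrow = 3, byrow = TRUE
             , dimnames = list( c( "R1", "R2", "R3" ), paste0( "C", 1:4 ) ) )
umx <- matrix( c( 1, 1, 0, 1
                , 1, 1, 1, 1 )
             , nrow = 2, byrow = TRUE
             , dimnames = list( c( "U1", "U2" ), paste0( "C", 1:4 ) ) )

# Hypothetical helper: pack each 0/1 row into one integer, one bit per column.
# Assumes fewer than 32 columns (R integers are 32-bit signed).
pack_rows <- function( mx ) {
  stopifnot( ncol( mx ) < 32 )
  weights <- 2 ^ ( seq_len( ncol( mx ) ) - 1 )
  setNames( as.integer( mx %*% weights ), rownames( mx ) )
}

rbits <- pack_rows( rmx )
ubits <- pack_rows( umx )

# r is a subset of u when every bit set in r is also set in u.
# outer() pairs every U row with every R row, as expand.grid() does above.
sat <- outer( ubits, rbits, function( u, r ) bitwAnd( r, u ) == r )
sat

# List the R rows that each U row satisfies.
lapply( split( sat, rownames( sat ) ), function( x ) colnames( sat )[ x ] )
```

With the sample data this pairs U1 with R1 and R3, and U2 with all three R rows, matching the expected output in the question. For wider inputs the packing step would need several integers per row, which is where a package such as bit could come in.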