Re: [R] unique/subset problem

2007-01-26 Thread lalitha viswanath
Hi
The pruned dataset has 8 unique genomes in it while
the dataset before pruning has 65 unique genomes in
it.
However calling unique on the pruned dataset seems to
return 65 no matter what.

Any assistance in this matter would be appreciated.

Thanks
Lalitha

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique/subset problem

2007-01-26 Thread Sarah Goslee
Without knowing more about your data, it is hard to say for certain,
but might you be confusing unique _values_ with _factor levels_?

> mydata <- as.factor(sort(rep(1:5, 2)))
# mydata has 10 values, 5 unique values, and 5 factor levels
> mydata
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> unique(mydata)
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
> mydata.subset <- mydata[1:4]
# the subset now has only 2 unique values, but the output
# still lists all five factor levels
> unique(mydata.subset)
[1] 1 2
Levels: 1 2 3 4 5

# try drop=TRUE when subsetting the factor
> mydata.subset <- mydata[1:4, drop=TRUE]
> unique(mydata.subset)
[1] 1 2
Levels: 1 2

Alternatively, if this is the problem and you don't need those
data to be factors, you could always convert them to a more
appropriate form.
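
Both options can be sketched on a small data frame. This is only an illustration: the `dataset` below and its values are made up, and `droplevels()` is a base R function that arrived in R 2.12.0, after this thread was written. Re-calling `factor()` on the column has the same effect in older versions.

```r
# Hypothetical toy data; genome1 is deliberately stored as a factor
dataset <- data.frame(
  genome1 = factor(c("aero", "aful", "buch", "buch", "paer")),
  score   = c(-1, -2, -8, -9, -7)
)

pruned <- subset(dataset, score < -5)

nlevels(pruned$genome1)          # 4: the subset still carries every original level
length(unique(pruned$genome1))   # 2: values actually present in the subset

# Option 1: drop the unused levels but keep the column a factor
pruned_dropped <- droplevels(pruned)
levels(pruned_dropped$genome1)   # "buch" "paer"

# Option 2: convert to character, so there are no levels at all
pruned$genome1 <- as.character(pruned$genome1)
unique(pruned$genome1)           # "buch" "paer"
```

Option 1 is the one to pick when later modelling code expects a factor; option 2 is simpler when the column is only an identifier.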

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] unique/subset problem

2007-01-26 Thread Weiwei Shi
Then you need to provide more details about your dataset and the calls you made.
For example, you can show us the output of
str(prunedrelatives, 1)

How did you call unique on prunedrelatives, and so on? I made a test
dataset and it gave me what you wanted (omitted here).


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III



Re: [R] unique/subset problem

2007-01-26 Thread lalitha viswanath
Hi
I read in my dataset using
dt <- read.table(filename)
Calling unique(levels(dt$genome1)) yields the
following

 [1] "aero"      "aful"      "aquae"     "atum_D"    "bbur"      "bhal"      "bmel"      "bsub"
 [9] "buch"      "cace"      "ccre"      "cglu"      "cjej"      "cper"      "cpneuA"    "cpneuC"
[17] "cpneuJ"    "ctraM"     "ecoliO157" "hbsp"      "hinf"      "hpyl"      "linn"      "llact"
[25] "lmon"      "mgen"      "mjan"      "mlep"      "mlot"      "mpneu"     "mpul"      "mthe"
[33] "mtub"      "mtub_cdc"  "nost"      "pabyssi"   "paer"      "paero"     "pmul"      "pyro"
[41] "rcon"      "rpxx"      "saur_mu50" "saur_n315" "sent"      "smel"      "spneu"     "spyo"
[49] "ssol"      "stok"      "styp"      "synecho"   "tacid"     "tmar"      "tpal"      "tvol"
[57] "uure"      "vcho"      "xfas"      "ypes"

It shows 60 genomes, which is correct.

I extracted a subset as follows
possible_relatives_subset <- subset(dt, Y < -5)
I am pasting the results below
      genome1   genome2 parameterX           Y
21       sent ecoliO157    0.00590 -200.633493
22       sent      paer    0.18603 -100.200570
27       styp ecoliO157    0.00484 -240.708645
28       styp      paer    0.18497  -30.250127
41       paer      sent    0.18603  -60.200570
44       paer      styp    0.18497  -80.250127
49       paer      hinf    0.18913  -90.056333
53       paer      vcho    0.18703  -10.153929
55       paer      pmul    0.18587 -100.208042
67       paer      buch    0.21485  -80.898667
70       paer      ypes    0.18460 -107.267454
82       paer      xfas    0.26268  -61.920552
95       hinf ecoliO157    0.07654 -163.018417
96       hinf      paer    0.18913  -10.056333
103      vcho ecoliO157    0.09518 -140.921153
104      vcho      paer    0.18703  -10.153929
107      pmul ecoliO157    0.07328 -165.215225
108      pmul      paer    0.18587  -10.208042
131      buch ecoliO157    0.15412  -11.746939
132      buch      paer    0.21485   -8.898667
137      ypes ecoliO157    0.02705  -19.171851
138      ypes      paer    0.18460  -10.267454
171 ecoliO157      sent    0.00590  -20.633493
174 ecoliO157      styp    0.00484  -20.708645
179 ecoliO157      hinf    0.07654   -6.018417
183 ecoliO157      vcho    0.09518  -14.921153
185 ecoliO157      pmul    0.07328   -6.215225
197 ecoliO157      buch    0.15412  -11.746939
200 ecoliO157      ypes    0.02705   -9.171851
211 ecoliO157      xfas    0.25833  -71.091552
217      xfas ecoliO157    0.25833  -75.091552
218      xfas      paer    0.26268  -64.920552

I think even a cursory look will tell us that there
are not as many unique genomes in the subset results
(around 8 to 10).
However, when I call
unique(levels(possible_relatives_subset$genome1)), I
get

 [1] "aero"      "aful"      "aquae"     "atum_D"    "bbur"      "bhal"      "bmel"      "bsub"
 [9] "buch"      "cace"      "ccre"      "cglu"      "cjej"      "cper"      "cpneuA"    "cpneuC"
[17] "cpneuJ"    "ctraM"     "ecoliO157" "hbsp"      "hinf"      "hpyl"      "linn"      "llact"
[25] "lmon"      "mgen"      "mjan"      "mlep"      "mlot"      "mpneu"     "mpul"      "mthe"
[33] "mtub"      "mtub_cdc"  "nost"      "pabyssi"   "paer"      "paero"     "pmul"      "pyro"
[41] "rcon"      "rpxx"      "saur_mu50" "saur_n315" "sent"      "smel"      "spneu"     "spyo"
[49] "ssol"      "stok"      "styp"      "synecho"   "tacid"     "tmar"      "tpal"      "tvol"
[57] "uure"      "vcho"      "xfas"      "ypes"

Where am I going wrong?
I tried calling unique without the levels too, which
gives me the following response

[1] sent      styp      paer      hinf      vcho      pmul      buch      ypes      ecoliO157 xfas
60 Levels: aero aful aquae atum_D bbur bhal bmel bsub buch cace ccre cglu cjej cper cpneuA ... ypes


Re: [R] unique/subset problem

2007-01-26 Thread Weiwei Shi
Check
?read.table

and add as.is=T to the call. That way strings are read as character
vectors and you avoid the factor issue.

Then repeat your work.

For example
> x0 <- read.table("~/Documents/tox/noodles/four_sheets_orig/reg_r2.txt",
    sep="\t", nrows=10)
> str(x0,1)
`data.frame':   10 obs. of  7 variables:
 $ V1: Factor w/ 10 levels "-4086733916",..: 10 9 8 7 6 5 4 3 2 1
 $ V2: Factor w/ 10 levels "-1963744741",..: 10 8 7 4 5 6 3 9 1 2
 $ V3: Factor w/ 7 levels "-1687428658",..: 7 4 4 2 5 1 6 6 3 4
 $ V4: Factor w/ 2 levels "5","MECHANISM": 2 1 1 1 1 1 1 1 1 1
 $ V5: Factor w/ 2 levels "0","TYPE": 2 1 1 1 1 1 1 1 1 1
 $ V6: Factor w/ 2 levels "USER_","alexey": 1 2 2 2 2 2 2 2 2 2
 $ V7: Factor w/ 2 levels "3","TRUST": 2 1 1 1 1 1 1 1 1 1
> x0 <- read.table("~/Documents/tox/noodles/four_sheets_orig/reg_r2.txt",
    sep="\t", nrows=10, as.is=T)
> str(x0,1)
`data.frame':   10 obs. of  7 variables:
 $ V1: chr  "LINK_ID" "-4293537751" "-4247422653" "-4223137153" ...
 $ V2: chr  "ID1" "65259" "1020286" "-518245428" ...
 $ V3: chr  "ID2" "6436" "6436" "-2099509019" ...
 $ V4: chr  "MECHANISM" "5" "5" "5" ...
 $ V5: chr  "TYPE" "0" "0" "0" ...
 $ V6: chr  "USER_" "alexey" "alexey" "alexey" ...
 $ V7: chr  "TRUST" "3" "3" "3" ...
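
The same effect can be reproduced without a file on disk. The sketch below is an illustration only: the inline `txt` data and the column names are made up, and the `text=` argument of `read.table` is a later addition to base R than this thread. Writing `as.is = TRUE` rather than `T` is a little safer, since `T` is an ordinary variable that can be reassigned. From R 4.0.0 onward strings are read as character by default, so the argument mainly matters on older versions.

```r
# Hypothetical tab-separated input, supplied inline instead of read from a file
txt <- "genome1\tgenome2\tscore\nsent\tpaer\t-100.2\nstyp\tpaer\t-30.3\nhinf\tpaer\t-1.5\n"

# as.is = TRUE keeps strings as character vectors rather than factors
dt <- read.table(text = txt, header = TRUE, sep = "\t", as.is = TRUE)
str(dt)

# With character columns, unique() on a subset reports only what is present
kept <- subset(dt, score < -5)
unique(kept$genome1)   # "sent" "styp"
```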

HTH,

weiwei


Re: [R] unique/subset problem

2007-01-26 Thread Weiwei Shi
Oh, I forgot: you can also convert a factor into a string like
dataset$genome1 <- as.character(dataset$genome1)

Likewise, you don't have to use
as.numeric(dataset$score) if you use as.is=T when you call read.table.
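
One reading of the remark about as.numeric is that a numeric-looking column can also end up as a factor, for instance when a header row is read in as data. A minimal sketch of the conversions, using made-up vectors rather than the poster's data:

```r
# Hypothetical columns that were read in as factors
genome1 <- factor(c("sent", "styp", "sent"))
score   <- factor(c("-200.6", "-30.2", "-6.0"))

# Factor to string
genome1_chr <- as.character(genome1)
is.character(genome1_chr)   # TRUE

# as.numeric() applied directly to a factor returns the internal level codes,
# not the values, so go through as.character() to recover the numbers
score_num <- as.numeric(as.character(score))
score_num
```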

HTH,

weiwei


[R] unique/subset problem

2007-01-25 Thread lalitha viswanath
Hi
I am new to R programming and am using subset to
extract part of a data as follows

names(dataset) = c("genome1","genome2","dist","score");
prunedrelatives <- subset(dataset, score < -5);

However when I use unique to find the number of unique
genomes now present in prunedrelatives I get results
identical to calling unique(dataset$genome1) although
subset has eliminated many genomes and records.

I would greatly appreciate your input about using
unique correctly  in this regard.

Thanks
Lalitha


 



Re: [R] unique/subset problem

2007-01-25 Thread Weiwei Shi
Hi,

Even if you removed many genome1 rows by requiring score < -5, that does not
necessarily mean you changed the set of unique values.

To check this, you can do something like
p0 <- unique(dataset[dataset$score < -5, "genome1"]) # same as subset
p1 <- unique(dataset[dataset$score >= -5, "genome1"])

setdiff(p1, p0)

If the output above is empty, then even though you removed many
genome1 rows, doing so did not change the set of unique values.
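
The check above can be tried on toy data. In this sketch the data frame and its values are invented for illustration, and the genome column is kept as character so that factor levels do not get in the way:

```r
# Hypothetical data: every genome has at least one row with score < -5
dataset <- data.frame(
  genome1 = c("aero", "aero", "buch", "buch", "paer"),
  score   = c(-9, -1, -8, -2, -7),
  stringsAsFactors = FALSE
)

p0 <- unique(dataset[dataset$score <  -5, "genome1"])   # rows kept by subset
p1 <- unique(dataset[dataset$score >= -5, "genome1"])   # rows removed by subset

# Genomes that occur only in the removed rows
lost <- setdiff(p1, p0)
lost                                              # character(0): nothing was lost

length(p0) == length(unique(dataset$genome1))     # TRUE
```

An empty result means that every genome in the removed rows also appears in the kept rows, so pruning rows did not shrink the set of genomes.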

HTH,

weiwei






-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III



[R] unique columns of a matrix

2006-12-18 Thread Roman Akhter Ahmed

Dear all,

I have a matrix of repeating columns in R, for example a matrix X is

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    2    2

I want to store unique columns of the matrix X in a new matrix Y. 
Therefore, Y will be


     [,1] [,2]
[1,]    1    1
[2,]    1    2


I would really appreciate it if you could point me to a function for
this job.

Thanks for your time and effort in advance,
Roman

--
--
Roman Akhter Ahmed (Ph.D. Candidate)
Department of Econometrics and Business Statistics
Room 659, Building 11 (East Wing), Clayton Campus
Monash University, Victoria 3800, Australia
Ph.: +61 3 9905 8346 (W), +61 3 9543 1958 (R)
Web: http://www.buseco.monash.edu.au/staff/profile.php?uid=rahmed
--



Re: [R] unique columns of a matrix

2006-12-18 Thread John Fox
Dear Roman,

You can use unique(X, MARGIN=2). See ?unique for details.
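
As a quick sketch with the matrix from the question (the variable name `Y_alt` and the `duplicated()` variant are additions for illustration, not part of the original reply):

```r
# The 2 x 4 example matrix, filled column by column
X <- matrix(c(1, 1,  1, 1,  1, 2,  1, 2), nrow = 2)

# MARGIN = 2 compares columns instead of rows
Y <- unique(X, MARGIN = 2)
Y
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    2

# Equivalent form using the logical index of repeated columns
Y_alt <- X[, !duplicated(X, MARGIN = 2), drop = FALSE]
```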

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 





[R] unique sets of factors

2006-10-19 Thread Tony Long
All:

I have a matrix, X, with a LARGE number of rows.  Consider the 
following three rows of that matrix:

1 1 1 1 2 2 3 3
1 1 1 1 3 3 2 2
3 3 2 2 1 1 1 1

I wish to fit many one-way ANOVAs to some response variable using 
each row as a set of factors.  For example, for each row above I will 
do something like anova(lm(Y~as.factor(X[1,]))).  My problem is that 
in the above example, I do not want to fit models for both rows 1 and 
2 as they are essentially duplicates in terms of the ANOVA model. 
Clearly row 3, although it has the same number of 1's, 2's, and 3's, 
is a different model.

Is there some computationally efficient way to remove such factor 
duplicates from my large matrix?  I have been banging my head 
against the wall all morning.

Thanks!!

Tony
-- 
###

Tony Long

Ecology and Evolutionary Biology
Steinhaus Hall
University of California at Irvine
Irvine, CA
92697-2525

Tel:  (949) 824-2562   (office)
Tel:  (949) 824-5994   (lab)
Fax: (949) 824-2181

email:  [EMAIL PROTECTED]
http://hjmuller.bio.uci.edu/~labhome/



Re: [R] unique sets of factors

2006-10-19 Thread Gabor Grothendieck
If DF is a data frame containing the rows then:

unique(t(apply(DF, 1, function(x) as.numeric(factor(x, levels = unique(x))))))




Re: [R] unique sets of factors

2006-10-19 Thread Gabor Grothendieck
Or since that messes up the values:

u <- unique(t(apply(DF, 1, function(x) as.numeric(factor(x, levels =
unique(x))))))
DF[rownames(u), ]
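
The idea is to relabel each row by order of first appearance, so that rows describing the same grouping become identical, and then drop the repeats. A minimal sketch on the three example rows, working on a plain matrix `X`; the names `canon` and `keep` are made up here. It selects rows with `duplicated()` rather than by row name, as a precaution: the row-name lookup depends on `apply()` carrying row names through, which may not happen for a data frame with automatic row names in current versions of R.

```r
X <- rbind(c(1, 1, 1, 1, 2, 2, 3, 3),
           c(1, 1, 1, 1, 3, 3, 2, 2),
           c(3, 3, 2, 2, 1, 1, 1, 1))

# Relabel every row so that its first distinct value becomes 1, the next 2, ...
canon <- t(apply(X, 1, function(x) as.numeric(factor(x, levels = unique(x)))))
canon   # rows 1 and 2 both become 1 1 1 1 2 2 3 3; row 3 becomes 1 1 2 2 3 3 3 3

# Keep the first row of each equivalence class, with its original values
keep <- !duplicated(canon)
X[keep, , drop = FALSE]
```

Here rows 1 and 3 are kept and row 2 is dropped as a relabelled copy of row 1.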




[R] Unique rows

2006-08-09 Thread Lanre Okusanya
hello all,

I have a dataset where the subjects are duplicated. How do I subset
it so that I get only 1 row per subject?

aa <- c(1,1,2,2,3,3,4,4,5,5,6,6)
bb <- c(56,56,33,33,53,53,20,20,63,63,9,9)
cc <- data.frame(aa,bb)

I would like to subset the data frame cc so that I get
aa bb
1 56
2 33
3 53
4 20
5 63
6 9

I know this should be fairly easy, but I can't figure out how to do it in a
data frame and keep all my columns.

Thanks



Re: [R] Unique rows

2006-08-09 Thread Dimitrios Rizopoulos
if you want the first row for the unique 'aa' entries, try the following:

cc[!duplicated(cc$aa), ]
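
A short sketch with the data from the question. The second half is a hypothetical variation (`cc2`, with one changed value) added to illustrate how keying on one column differs from comparing whole rows:

```r
aa <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
bb <- c(56, 56, 33, 33, 53, 53, 20, 20, 63, 63, 9, 9)
cc <- data.frame(aa, bb)

# Keep the first row for each value of aa; all columns are retained
first_per_subject <- cc[!duplicated(cc$aa), ]
first_per_subject

# Hypothetical variation: the second record for subject 1 differs in bb
cc2 <- cc
cc2$bb[2] <- 57

nrow(unique(cc2))                  # 7: unique() compares whole rows
nrow(cc2[!duplicated(cc2$aa), ])   # 6: one row per subject
```

If the repeated rows for a subject are not identical in every column, comparing whole rows will still leave more than one row per subject, while keying on the subject column will not.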


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
  http://www.student.kuleuven.be/~m0390867/dimitris.htm





Re: [R] Unique rows

2006-08-09 Thread jim holtman
 cc[!duplicated(cc$bb),]
   aa bb
1   1 56
3   2 33
5   3 53
7   4 20
9   5 63
11  6  9







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Unique rows

2006-08-09 Thread Lanre Okusanya
Thanks. I tried that, however for some reason, it still left some duplicates

On 8/9/06, Gary Collins [EMAIL PROTECTED] wrote:
 try

  unique(cc)
     aa bb
  1   1 56
  3   2 33
  5   3 53
  7   4 20
  9   5 63
  11  6  9

 HTH

 Gary



Re: [R] Unique rows

2006-08-09 Thread Lanre Okusanya
Thanks! That worked well

On 8/9/06, Dimitrios Rizopoulos [EMAIL PROTECTED] wrote:
 if you want the first row for the unique 'aa' entries, try the following:

 cc[!duplicated(cc$aa), ]




[R] unique, but keep LAST occurence

2006-07-24 Thread davidr
?unique says

Value:

 An object of the same type of 'x'. but if an element is equal to
 one with a smaller index, it is removed.

However, I need to keep the one with the LARGEST index.
Can someone please show me the light? 
I thought about reversing the row order twice, but I couldn't get it to work 
right

(My data frame has 125000 rows and 7 columns, 
and I'm 'uniqueing' on column #1 (chron) only, although the class of the column 
may not matter.)

Say, e.g., 
 DF <- data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8))

I would like the result to be (sorted as well)
 t x
 1 6
 2 7
 3 8
 4 4
 5 5

If I got the original rownames, that would be a bonus (for debugging.)

 R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)

Thanks for any hints!
David

David L. Reiner
Rho Trading Securities, LLC
Chicago  IL  60605
312-362-4963



Re: [R] unique, but keep LAST occurence

2006-07-24 Thread Marc Schwartz (via MN)

Does this get it?

 DF[sapply(unique(DF$t), function(x) max(which(DF$t == x))), ]
  t x
7 1 6
8 2 7
9 3 8
5 4 4
6 5 5


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique, but keep LAST occurence

2006-07-24 Thread Berton Gunter
Try:

 largestDF <- DF[nrow(DF) - which(!duplicated(rev(DF$t))) + 1, ]

You can then sort this however you like in the usual way. Row names will be
preserved.
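
As a hedged aside: more recent versions of R than the 2.3.1 used in this thread give duplicated() a fromLast argument, which expresses the same idea without reversing the index by hand. The sketch below assumes such a version; the name last_rows is illustrative only.

```r
DF <- data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8))

# Keep the last row for each value of 't'; fromLast = TRUE scans from the end.
last_rows <- DF[!duplicated(DF$t, fromLast = TRUE), ]

# Sort by the key; the original row names survive, which helps debugging.
last_rows <- last_rows[order(last_rows$t), ]
```

Like the reversed-index version, this makes a single vectorised pass over the key column.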

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 



Re: [R] unique, but keep LAST occurence

2006-07-24 Thread davidr
Thank you, Bert and Marc.
I believe Marc's solution works, but it was taking a very long time.
Bert's is very fast.

My day is saved!

David L. Reiner
Rho Trading Securities, LLC
Chicago  IL  60605
312-362-4963



Re: [R] unique, but keep LAST occurence

2006-07-24 Thread johan Faux
I have a question about the deparse function in R.
What is the reason that deparse uses an argument like width.cutoff?
Why is the maximum cutoff 500?
I was manipulating an R formula and used deparse. Since the length of the user's
formula was greater than 500, my code didn't work.
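
A common workaround, sketched here under the assumption that the goal is a single string for the whole formula: deparse() may return several strings for a long expression, and they can be pasted back together. The variable names x1..x150 and the object names are hypothetical.

```r
# A deliberately long formula built from hypothetical variables x1..x150.
rhs <- paste(paste0("x", 1:150), collapse = " + ")
f <- as.formula(paste("y ~", rhs))

pieces <- deparse(f, width.cutoff = 500L)   # may be several strings
one_line <- paste(pieces, collapse = " ")   # glue them back into one string
```

The collapsed string can be parsed again with as.formula(), so code that assumed a length-one result keeps working.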

thanks
Johan






Re: [R] unique deletes names - intended?

2006-07-05 Thread Prof Brian Ripley
On Tue, 4 Jul 2006, Heinz Tuechler wrote:

 Dear All,

 as shown in the example, unique() deletes names of vector elements.
 Is this intended?

Yes.  Think of the vector as a set: it is supposed to be immaterial which of 
the duplicated elements is retained.

The help page says

  An object of the same type of 'x'. but if an element is equal to
  one with a smaller index, it is removed.

so it is starting with a new object, not 'x'.  However, the array method 
works differently, so the documentation needs clarification.

 Of course, one can use indexing by !duplicated() instead.

Be careful, as you might get a method for [ and that might not do what you 
intended (e.g. for a time series).
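
A minimal sketch of the difference for a plain named vector, using the example from the original post (the names u and k are illustrative only):

```r
v1 <- c(a = 1, b = 2, c = 3, e = 2, a = 4)

u <- unique(v1)            # values only: the names are dropped
k <- v1[!duplicated(v1)]   # first occurrences, names kept

names(u)   # NULL
names(k)   # "a" "b" "c" "a"
```

The element named e is dropped because its value 2 already occurred under the name b. Both elements named a survive because their values differ.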


 Greetings,
 Heinz

 ## unique deletes names
 v1 <- c(a=1, b=2, c=3, e=2, a=4)
 unique(v1) # names deleted

 v1[!duplicated(v1)] # names preserved



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] unique deletes names - intended?

2006-07-04 Thread Heinz Tuechler
Dear All,

as shown in the example, unique() deletes names of vector elements. 
Is this intended?
Of course, one can use indexing by !duplicated() instead.

Greetings,
Heinz

## unique deletes names
v1 <- c(a=1, b=2, c=3, e=2, a=4)
unique(v1) # names deleted

v1[!duplicated(v1)] # names preserved


platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.1
year           2006
month          07
day            01
svn rev        38471
language       R
version.string Version 2.3.1 Patched (2006-07-01 r38471)



Re: [R] Unique?

2006-05-11 Thread Francisco J. Zagmutt
Hi Cameron

You need to be more specific when you ask a question so you can get a better 
answer.  Anyhow, when you say that you want to retain all the other 
variables do you mean that you want to create a new column in the dataset 
that contains the calculated sum?   If that is the case you can use a 
construction like:

set.seed(1)
step4 <- data.frame(TRIPID=rep(c(111,222,333),3), CONVUNIT=rpois(9,40))
result <- tapply(step4$CONVUNIT, INDEX=step4$TRIPID, FUN=sum)
step4[, "SUM"] <- result[match(step4[, "TRIPID"], names(result))]
step4
  TRIPID CONVUNIT SUM
1    111       36 122
2    222       48 121
3    333       48 129
4    111       42 122
5    222       30 121
6    333       43 129
7    111       44 122
8    222       43 121
9    333       38 129
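
As a hedged alternative sketch, base R's ave() does the sum and the matching in one step: it returns a vector as long as its input, with each element replaced by its group's summary. The CONVUNIT values below are copied from the output above so the example does not depend on the random number generator.

```r
step4 <- data.frame(TRIPID = rep(c(111, 222, 333), 3),
                    CONVUNIT = c(36, 48, 48, 42, 30, 43, 44, 43, 38))

# Per-row group total: every row receives the sum for its TRIPID.
step4$SUM <- ave(step4$CONVUNIT, step4$TRIPID, FUN = sum)
```

All other columns stay in place, which matches the requirement to retain the remaining variables.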


Cheers

Francisco

From: Guenther, Cameron [EMAIL PROTECTED]
To: Francisco J. Zagmutt [EMAIL PROTECTED]
Subject: RE: [R] Unique?
Date: Thu, 11 May 2006 12:08:31 -0400

It is close but not quite what I want.  I need to retain all of the
other variables as well.


Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]


[R] Unique?

2006-05-10 Thread Guenther, Cameron

Hello,
I have a sample data set that looks like:

YEAR MONTH DAY CONTINUE SPL       TIMEFISH TIMEUNIT AREA COUNTY DEPTH DEPUNIT GEAR TRIPID      CONVUNIT
1992 1     26  1        SP0073928 8        H        7    25     4     NA      100  02163399054 161
1992 1     26  1        SP0073928 8        H        7    25     4     NA      100  02163399054 8
1992 1     26  2        SP0004228 8        H        7    25     4     NA      100  02163399054 161
1992 1     26  2        SP0004228 8        H        7    25     4     NA      100  02163399054 8
1992 1     25  NA       SP0052652 8        H        7    25     4     NA      100  02163399057 85
1992 1     26  NA       SP0037940 8        H        7    25     4     NA      100  02163399058 70
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      100  02163399059 15
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      100  02163399059 20
1992 1     27  NA       SP0026324 8        H        7    25     4     NA      100  02163399060 8
1992 1     28  1        SP0072357 8        H        7    25     4     NA      100  02163399062 200

How can I use unique to extract the rows that have repeated TRIPIDs
only, that is, unique with respect to TRIPID rather than to every
variable?  I then want to condense the unique values by summing the
CONVUNIT for each unique value of TRIPID.  I posted a similar question
last week and received a sufficient answer on how to do this without
using unique.  The solution below worked just fine on this sample data
set, but the full data set has 446,000 rows and my computer and R simply
cannot handle the following code on data this large.

conds <- by(Step4, Step4$TRIPID, function(x)
    replace(x[1,], "CONVUNIT", sum(x$CONVUNIT)))
Step5 <- do.call(rbind, conds)

Thank you,

Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]



Re: [R] Unique?

2006-05-10 Thread Robert Citek

On May 10, 2006, at 4:02 PM, Guenther, Cameron wrote:
 How can I use unique to extract the rows that have repeated tripid's
 only, not a unique value for each variable but only for TRIPID.  I  
 then
 want to condense the unique values by summing the CONVUNIT for each
 unique value of TRIPID.

Thanks, Cameron, for this question.  This type of manipulation would
be relatively simple to do in an RDBMS (e.g. MySQL, PostgreSQL,
Oracle, etc.).  But I'm curious to see how one would do the same in
R.  So, if folks send you solutions off-list, please do post them
back to the list.

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent



Re: [R] Unique?

2006-05-10 Thread Francisco J. Zagmutt
If you only care about the sum of CONVUNIT by each TRIPID then you can use 
tapply i.e.:

step4 <- data.frame(TRIPID=rep(c(111,222,333),3), CONVUNIT=rpois(9,40))
result <- tapply(step4$CONVUNIT, INDEX=step4$TRIPID, FUN=sum)
result
111 222 333
115 107 123

Is this what you wanted to do?  I can't think of anything faster than tapply 
for your problem.
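
To get one row per TRIPID that carries the summed CONVUNIT while keeping the other columns, the tapply() result can be combined with !duplicated(). The following is a sketch on made-up data, not the poster's file: the shortened TRIPID values and the OTHER column are hypothetical stand-ins.

```r
# Toy stand-in for the poster's data; OTHER is a hypothetical extra column.
step4 <- data.frame(TRIPID = c(54, 54, 54, 54, 57, 58, 59, 59, 60, 62),
                    OTHER = letters[1:10],
                    CONVUNIT = c(161, 8, 161, 8, 85, 70, 15, 20, 8, 200))

# One pass over the data: total CONVUNIT per TRIPID.
totals <- tapply(step4$CONVUNIT, step4$TRIPID, sum)

# Keep the first row per TRIPID (all columns), then overwrite CONVUNIT
# with the group total, looked up by name.
step5 <- step4[!duplicated(step4$TRIPID), ]
step5$CONVUNIT <- as.vector(totals[as.character(step5$TRIPID)])
```

Unlike the by() and do.call(rbind, ...) approach, this never builds one small data frame per trip, which is likely where the time goes on several hundred thousand rows.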

I hope this helps

Francisco






Re: [R] Unique?

2006-05-10 Thread Dave Armstrong
Dear Cameron,

This is not with unique, but it gets the job done.  Just create a new
variable that is the three variables concatenated together.  Then, you can
just sum by this variable, like the following:

mymat <- matrix(letters, ncol=3, nrow=260)
mymat <- as.data.frame(mymat)
mymat$dat <- rnorm(260)
mymat$id <- paste(mymat[,1], mymat[,2], mymat[,3])
aggregate(mymat$dat, list(mymat$id), sum)

HTH,
Dave.





--
Dave Armstrong
University of Maryland
Dept of Government and Politics
3140 Tydings Hall
College Park, MD 20742
Office: 2103L Cole Field House
Phone: 301-405-9735
e-mail: [EMAIL PROTECTED]
web: www.davearmstrong-ps.com

Facts are meaningless.  You can use facts to prove anything that's even
remotely true. - Homer Simpson

To this day, philosophers suffer from Plato's disease: the assumption that
reality fundamentally consists of
abstract essences best described by words or geometry. (In truth, reality is
largely a probabilistic affair best
described by statistics) - Steve Sailer The Unexpected Uselessness of
Philosophy



Re: [R] Unique arrangements of a vector

2005-05-31 Thread Uwe Ligges

Tarmo Remmel wrote:


Dear List,

Running on a PC (Windows 2000) with 256 MB RAM, Version R1.9.1


This one is quite outdated...


I have a relatively simple problem, which I can solve for relatively small
datasets, but run into difficulties with larger ones.  I believe that my
approach is a hack rather than something elegant and I was hoping that
somebody on this list might help me improve my code.  Basically, given a
vector of values (e.g., 0,0,1,1), I want to generate all of the unique
arrangements of these values, of which there are 4!/(2!2!) = 6.

0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0

Using unique() in conjunction with expand.grid(), and later filtering
impossible results, I can obtain the answer.  However, this is slow, and
does not work for large initial vectors and is difficult to filter when
using values beyond 0,1.  Is there some mathematically elegant method for
doing this?  I'd hope to have initial vectors significantly longer than the
demonstrated 4 values (e.g., thousands).


Nice for length 4, but you will run into problems far sooner than length 
1000... please calculate the size beforehand!
For things as short as 4, you might want to try out permutations() in 
package gtools (formerly in bundle gregmisc, since yesterday a single 
package).
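
To make the size argument concrete, here is a sketch in base R only. It counts the distinct arrangements with the multinomial formula n!/(k1! k2! ...) and enumerates them recursively for short vectors. The function names count_arrangements and unique_arrangements are hypothetical helpers, not part of any package.

```r
# Number of distinct arrangements: n! / (k1! * k2! * ...), computed on the
# log scale so that moderately long vectors do not overflow early.
count_arrangements <- function(x) {
  counts <- as.vector(table(x))
  exp(lgamma(length(x) + 1) - sum(lgamma(counts + 1)))
}

# All distinct arrangements of x, one per row (only feasible for short x).
unique_arrangements <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))
  out <- NULL
  for (v in unique(x)) {
    rest <- x[-match(v, x)]   # drop one copy of v, then arrange the rest
    out <- rbind(out, cbind(v, unique_arrangements(rest), deparse.level = 0))
  }
  out
}

arr <- unique_arrangements(c(0, 0, 1, 1))
```

For c(0, 0, 1, 1) this yields the six rows listed in the question. For vectors with thousands of elements the count alone is astronomically large, so enumeration is out of reach whatever the method.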


Uwe Ligges





[R] Unique arrangements of a vector

2005-05-30 Thread Tarmo Remmel
Dear List,

Running on a PC (Windows 2000) with 256 MB RAM, Version R1.9.1

I have a relatively simple problem, which I can solve for relatively small
datasets, but run into difficulties with larger ones.  I believe that my
approach is a hack rather than something elegant and I was hoping that
somebody on this list might help me improve my code.  Basically, given a
vector of values (e.g., 0,0,1,1), I want to generate all of the unique
arrangements of these values, of which there are 4!/(2!2!) = 6.

0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0

Using unique() in conjunction with expand.grid(), and later filtering
impossible results, I can obtain the answer.  However, this is slow, and
does not work for large initial vectors and is difficult to filter when
using values beyond 0,1.  Is there some mathematically elegant method for
doing this?  I'd hope to have initial vectors significantly longer than the
demonstrated 4 values (e.g., thousands).

Any help is appreciated and I will gladly SUM afterwards.

Thank you,

Tarmo

__
Tarmo Remmel  Ph.D.
GUESS Lab, Department of Geography
University of Toronto at Mississauga
Mississauga, Ontario, L5L 1C6
Tel: 905-828-3868
Fax: 905-828-5273
Skype: tarmoremmel
http://eratos.erin.utoronto.ca/remmelt



[R] unique rows

2005-01-29 Thread dax42
Dear list,
I would like to extract from a matrix all those rows, that are unique.
By unique, I don't mean the unique that is accomplished by the function 
unique(), though...

Consider the following example:
 h
     [,1] [,2]
[1,]    4    4
[2,]    1    4
[3,]    4    1
Now unique(h) returns exactly the same - because 1 4 and 4 1 is not the 
same for that function.
What I would like to see, though, are only the first two rows (or the 
first and the third, it does not matter).

Does anybody know how to do that?
Cheers, Dax.


RE: [R] unique rows

2005-01-29 Thread John Fox
Dear Dax,

I'll bet that someone comes up with a better approach, but the following
does appear to work:

u <- unique(t(sapply(as.data.frame(t(h)), sort)))

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of dax42
 Sent: Saturday, January 29, 2005 7:54 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] unique rows
 
 Dear list,
 
 I would like to extract from a matrix all those rows, that are unique.
 By unique, I don't mean the unique that is accomplished by 
 the function unique(), though...
 
 Consider the following example:
   h
   [,1] [,2]
 [1,]44
 [2,]14
 [3,]41
 
 Now unique(h) returns exactly the same - because 1 4 and 4 1 
 is not the same for that function.
 What I would like to see, though, are only the first two rows 
 (or the first and the third, it does not matter).
 
 Does anybody know how to do that?
 Cheers, Dax.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html



Re: [R] unique rows

2005-01-29 Thread Patrick Burns
There may be more efficient ways, but
unique(t(apply(h, 1, sort)))
does what I think you want.
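A minimal sketch of that one-liner on the matrix from the question, split into steps to show why the t() is there. The intermediate name `sorted_rows` is introduced here only for illustration.

```r
# The 3 x 2 matrix from the original question
h <- matrix(c(4, 1, 4,
              4, 4, 1), ncol = 2)

# apply() over rows hands back each sorted row as a COLUMN,
# so the result has to be transposed before comparing rows
sorted_rows <- t(apply(h, 1, sort))

# Rows 2 and 3 are now both (1, 4), so unique() drops one of them
u <- unique(sorted_rows)
print(u)
```

One caveat: sort() drops NA values by default, so rows containing NA would come back with different lengths and apply() would no longer return a matrix. The sketch assumes there are no NAs.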
Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)




Re: [R] unique rows

2005-01-29 Thread Peter Dalgaard
John Fox [EMAIL PROTECTED] writes:

 Dear Dax,
 
 I'll bet that someone comes up with a better approach, but the following
 does appear to work:
 
 u <- unique(t(sapply(as.data.frame(t(h)), sort)))

Or maybe just

unique(t(apply(h,1,sort)))


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



RE: [R] unique rows

2005-01-29 Thread Ted Harding
On 29-Jan-05 dax42 wrote:
 Dear list,
 
 I would like to extract from a matrix all those rows, that are unique.
 By unique, I don't mean the unique that is accomplished by the function
 unique(), though...
 
 Consider the following example:
 > h
      [,1] [,2]
 [1,]    4    4
 [2,]    1    4
 [3,]    4    1
 
 Now unique(h) returns exactly the same - because 1 4 and 4 1 is not the
 same for that function.
 What I would like to see, though, are only the first two rows (or the 
 first and the third, it does not matter).
 
 Does anybody know how to do that?
 Cheers, Dax.

How about:

  h[!duplicated(t(apply(h,1,sort))),]
       [,1] [,2]
  [1,]    4    4
  [2,]    1    4

Better than

  unique(t(apply(h,1,sort)))
       [,1] [,2]
  [1,]    4    4
  [2,]    1    4

in general (though it comes to the same for your example)
since it preserves the order of elements in each row.
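The difference only shows up when a kept row is not already sorted. A small sketch with a made-up matrix `h2` (not from the thread) where the first row is (4, 1):

```r
# Hypothetical matrix: rows (4,1), (1,4), (2,3)
h2 <- matrix(c(4, 1, 2,
               1, 4, 3), ncol = 2)

# Sorted copy of each row, used only as a comparison key
key <- t(apply(h2, 1, sort))

# duplicated() flags repeats in the key; the rows themselves come from h2,
# so the kept row (4, 1) stays in its original order
kept <- h2[!duplicated(key), , drop = FALSE]

# unique() on the key returns the sorted rows instead: (1, 4) and (2, 3)
sorted_unique <- unique(key)

print(kept)
print(sorted_unique)
```

The `drop = FALSE` is an addition here so that the result stays a matrix even if only one row survives.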

Cheers,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 29-Jan-05   Time: 14:26:31
-- XFMail --



[R] Unique lists from a list

2004-09-01 Thread michael watson (IAH-C)
Hi

I have a list.  Two of the elements of this list are Name and
Address, both of which are character vectors.  Name and Address are
linked, so that the same Name always associates with the same
Address.

What I want to do is pull out the unique values, as a new list of the
same format (ie two elements of character vectors).  Now I've worked out
that unique(list$Name) will give me a list of the unique names, but how
do I then go and link those to the correct (unique) addresses so I end
up with a new list which is the same format as the rest, but now unique?

Cheers
Mick



Re: [R] Unique lists from a list

2004-09-01 Thread Prof Brian Ripley
On Wed, 1 Sep 2004, michael watson (IAH-C) wrote:

 I have a list.  Two of the elements of this list are Name and
 Address, both of which are character vectors.  Name and Address are
 linked, so that the same Name always associates with the same
 Address.
 
 What I want to do is pull out the unique values, as a new list of the
 same format (ie two elements of character vectors).  Now I've worked out
 that unique(list$Name) will give me a list of the unique names, but how
 do I then go and link those to the correct (unique) addresses so I end
 up with a new list which is the same format as the rest, but now unique?

match, as in match(unique(list$Name), list$Name), OR indexing as in

Address <- list$Address
names(Address) <- list$Name
Name <- unique(list$Name)
list(Name, as.vector(Address[Name]))

OR choose a better data structure as in

unique(as.data.frame(list))
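The three suggestions can be compared side by side. This is only a sketch on made-up data: the list is called `contacts` here rather than `list`, and the result names `by_match`, `by_name` and `by_df` are illustrative.

```r
# Hypothetical input in the shape described in the question
contacts <- list(Name    = c("a", "b", "a", "c", "b"),
                 Address = c("10", "20", "10", "30", "20"))

# 1. match(): position of the first occurrence of each unique name,
#    then use those positions to subset every element of the list
idx <- match(unique(contacts$Name), contacts$Name)
by_match <- lapply(contacts, `[`, idx)

# 2. Named-vector indexing: look addresses up by name
#    (character indexing returns the first element with that name)
Address <- contacts$Address
names(Address) <- contacts$Name
Name <- unique(contacts$Name)
by_name <- list(Name = Name, Address = as.vector(Address[Name]))

# 3. Data frame: unique() removes duplicated rows as a whole
by_df <- as.list(unique(as.data.frame(contacts, stringsAsFactors = FALSE)))

str(by_match)
```

All three agree only under the assumption stated in the question, that a given Name always goes with the same Address. If that fails, the data frame version keeps every distinct Name/Address pair, while the other two keep just the first address per name.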

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Unique lists from a list

2004-09-01 Thread james . holtman




Try this:

l.1 <- list(list(name='a', addr='123'), list(name='b', addr='234'),
            list(name='b', addr='234'), list(name='a', addr='123'))  # create a list

l.names <- unlist(lapply(l.1, '[[', 'name'))  # get the 'name' of each entry
l.u <- unique(l.names)  # make unique

new.list <- l.1[match(l.u, l.names)]  # create new list with just one entry per 'name'
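An equivalent way to keep the first record per name is logical indexing with duplicated(). A minimal sketch on the same example data; the names `records`, `record_names`, `first_per_name` and `via_match` are made up for illustration.

```r
# Same shape as the example above: a list of name/addr records
records <- list(list(name = "a", addr = "123"), list(name = "b", addr = "234"),
                list(name = "b", addr = "234"), list(name = "a", addr = "123"))

# Pull out the name of every record as a character vector
record_names <- vapply(records, function(r) r$name, character(1))

# Keep each record whose name has not been seen earlier in the list
first_per_name <- records[!duplicated(record_names)]

# The match() form from the post selects the same records
via_match <- records[match(unique(record_names), record_names)]

str(first_per_name)
```

Note that this works on a list of records, whereas the question describes one list holding two parallel character vectors, so the data would need reshaping first.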

__
James Holtman    "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
[EMAIL PROTECTED]
+1 (513) 723-2929


   
   



Re: [R] Unique lists from a list

2004-09-01 Thread Adaikalavan Ramasamy
name <- c("a", "b", "a", "c", "d", "a", "b")
addr <- c(10, 20, 10, 30, 40, 10, 20)

duplicated(name)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE

which(duplicated(name))
[1] 3 6 7

addr[ -which(duplicated(name)) ]
[1] 10 20 30 40

cbind( name, addr) [ -which(duplicated(name)),  ]
     name addr
[1,] "a"  "10"
[2,] "b"  "20"
[3,] "c"  "30"
[4,] "d"  "40"

Make sure that the person named "a" always lives at address 10 (i.e. a
one-to-one mapping).
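One way to check that assumption before dropping duplicates is to count the distinct addresses per name. A minimal sketch on the same example vectors; `n_addr`, `one_to_one` and `first_addr` are names introduced here.

```r
name <- c("a", "b", "a", "c", "d", "a", "b")
addr <- c(10, 20, 10, 30, 40, 10, 20)

# Number of distinct addresses seen for each name
n_addr <- tapply(addr, name, function(x) length(unique(x)))

# TRUE only if every name maps to exactly one address
one_to_one <- all(n_addr == 1)
print(one_to_one)

# Logical indexing is a safer variant of -which(duplicated(name)):
# when nothing is duplicated, -which(...) is an empty index and selects
# no elements at all, whereas !duplicated(...) keeps everything
first_addr <- addr[!duplicated(name)]
print(first_addr)
```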


If it is possible for person "a" to have several addresses (e.g. house and
office), such as 10 and 11, then it might be better to collect all of them.
In this case, you can try:

addr2  <- c(10, 20, 11, 30, 40, 12, 21)
tapply(addr2, as.factor(name), function(x) paste(x, collapse=", ") )
           a            b            c            d
"10, 11, 12"     "20, 21"         "30"         "40"

To convert this into a list, use sapply(a, strsplit, split=", "), where a
holds the tapply() result.
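If a list is the goal, split() may get there directly, without pasting the values into a string and splitting them again. This swaps in a different technique from the one in the post; a sketch on the same example data, with `addr_by_name` as a made-up name.

```r
name  <- c("a", "b", "a", "c", "d", "a", "b")
addr2 <- c(10, 20, 11, 30, 40, 12, 21)

# One list element per name, holding all of that name's addresses
addr_by_name <- split(addr2, name)

# Drop repeated addresses within each name, if there are any
addr_by_name <- lapply(addr_by_name, unique)

str(addr_by_name)
```

The addresses stay numeric this way, whereas the paste()/strsplit() route turns them into character strings.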




