Re: [R] newbie: new_data_frame - selected set of rows

2006-12-01 Thread Darek Kedra
Two missing things:

distances
 [1] 13 14 10 11  2  4  6  1  3  9  8 12  7  5

#numbers correspond to rows in my_dataframe

 my_dataframe
                  V2         V3         V4         V5         V6
ENSP0354687 35660.45 0.04794521 0.05479452 0.06849315 0.07534247
ENSP0355046 38942.77 0.02967359 0.04451039 0.04451039 0.06824926
ENSP0354499 57041.21 0.04700855 0.08760684 0.11965812 0.06196581

ENSP0354687 etc are rownames. 

I am trying to get the five row names with the smallest
distances from a given vector, as calculated by
distancevector() from hopach.
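
For what it is worth, a minimal sketch of that row-name lookup, assuming `distances` holds the output of order() as printed above, so that each entry is a row index with the nearest row first. The data frame is a made-up 14-row stand-in with hypothetical row names, not the real data.

```r
# Toy stand-in for my_dataframe: 14 rows with hypothetical row names.
my_dataframe <- data.frame(V2 = seq(1, 14), V3 = seq(14, 1),
                           row.names = paste0("ENSP", sprintf("%02d", 1:14)))

# The ordering vector from the post: row indices, nearest first.
distances <- c(13, 14, 10, 11, 2, 4, 6, 1, 3, 9, 8, 12, 7, 5)

# Indexing the row names with the first five positions gives the top five.
top_five_names <- rownames(my_dataframe)[distances[1:5]]
print(top_five_names)
```

Because order() already returns row positions, no loop is needed; the row names can be indexed with the whole vector at once.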



Darek Kedra





 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] new_data_frame - selected set of rows

2006-11-30 Thread Darek Kedra
Hello,

this is probably trivial but I failed to find this particular snippet of code.

What I got:
my_dataframe (contains, say, 40k rows and 4 columns)
distances (vector with Euclidean distances between a query vector and each of 
the rows of my_dataframe)

What I do:
After scaling my_dataframe, I calculate the distances, order them, then extract 
the top five hits:

my_dataframe <- read.table("myDB.csv", header=F, dec=".", sep=";",
row.names=1)
#reads the whole file

scaled_DB <- scale(my_dataframe, center=FALSE)
#scales the values

require(hopach)
#checks necessary R package

distances <- order(distancevector(scaled_DB, scaled_DB['query',], d="euclid"))
#calculates distances and orders the results from lowest

for(i in distances[1:5]) print(my_dataframe[i,])
#prints top five hits just for debugging
 
What I want to do:
1) create a small top_five frame
sadly this does not work:
for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]

2) after I have top_five I would like to get the index of my query entry, 
something along the lines of Python's 
top_five.index('query_string')

3) possibly combine values in distances with row names from my_dataframe:
row_1 distance_from_query1
row_2 distance_from_query2
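
The three points above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are made up, and the Euclidean distance is computed by hand as a stand-in for distancevector() so that the snippet runs without hopach. The names query_dist, ord, query_pos and hits are hypothetical. Note that the loop in 1) fails as written because top_five does not exist before the first assignment.

```r
# Made-up data standing in for the scaled table; 'query' is one of the rows.
scaled_DB <- rbind(query = c(1, 1), a = c(1, 2), b = c(5, 5),
                   c = c(1.5, 1), d = c(9, 9), e = c(2, 2), f = c(1, 4))

# Euclidean distance from every row to the query row
# (hand-rolled stand-in for distancevector(), an assumption of this sketch).
query_dist <- sqrt(rowSums(sweep(scaled_DB, 2, scaled_DB["query", ])^2))

# order() returns row indices sorted by increasing distance.
ord <- order(query_dist)

# 1) Index with the whole vector at once instead of looping.
top_five <- scaled_DB[ord[1:5], ]

# 2) Position of the query entry among the hits (like Python's list.index).
query_pos <- match("query", rownames(top_five))

# 3) Row names paired with their distances, nearest first.
hits <- data.frame(row = rownames(scaled_DB)[ord],
                   distance = query_dist[ord])
print(head(hits, 5))
```

Keeping the unsorted distances in query_dist and the ordering in ord separately is what makes 3) possible; once only the result of order() is stored, the distance values themselves are no longer available.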

Thank you very much for your help

Darek Kedra




 

