Re: [R] newbie: new_data_frame - selected set of rows
Two missing things:

    distances
     [1] 13 14 10 11  2  4  6  1  3  9  8 12  7  5
    # numbers correspond to rows in my_dataframe

    my_dataframe
                      V2         V3         V4         V5         V6
    ENSP0354687 35660.45 0.04794521 0.05479452 0.06849315 0.07534247
    ENSP0355046 38942.77 0.02967359 0.04451039 0.04451039 0.06824926
    ENSP0354499 57041.21 0.04700855 0.08760684 0.11965812 0.06196581

ENSP0354687 etc. are row names. I am trying to get the top five row names with the smallest distances from a given vector, as calculated by distancevector from hopach.

Darek Kedra

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
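Since distances here holds the output of order() (row indices sorted from nearest to farthest), the top five row names can probably be read off by indexing rownames() with its first five entries. A minimal sketch on made-up data; toy_df, raw_dist, ord and top_names are invented names, and the distance values are hypothetical rather than computed by hopach:

```r
# Toy stand-in for my_dataframe: six rows with made-up IDs as row names.
toy_df <- data.frame(
  V2 = c(1, 5, 2, 9, 3, 7),
  row.names = c("query", "p1", "p2", "p3", "p4", "p5")
)

# Hypothetical distances of each row from the query row (not real hopach output).
raw_dist <- c(0.0, 5.7, 1.4, 11.3, 2.8, 8.5)

# order() returns row indices, smallest distance first.
ord <- order(raw_dist)

# Index the row names with the first five of those indices.
top_names <- rownames(toy_df)[ord[1:5]]
print(top_names)
```

The query row itself comes first because its distance to itself is zero; use ord[2:6] instead if it should be left out.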
[R] new_data_frame - selected set of rows
Hello,

this is probably trivial, but I failed to find this particular snippet of code.

What I have:

my_dataframe (contains, say, 40k rows and 4 columns)
distances (vector with Euclidean distances between a query vector and each of the rows of my_dataframe)

What I do: after scaling the data in my_dataframe I calculate the distances, order them, then extract the top five hits:

    my_dataframe <- read.table("myDB.csv", header=FALSE, dec=".", sep=";", row.names=1)  # reads the whole file
    scaled_DB <- scale(my_dataframe, center=FALSE)  # scales the values
    require(hopach)  # loads the necessary R package
    distances <- order(distancevector(scaled_DB, scaled_DB['query',], d="euclid"))  # calculates distances and orders the results from lowest
    for(i in distances[1:5]) print(my_dataframe[i,])  # prints top five hits, just for debugging

What I want to do:

1) create a small top_five frame; sadly this does not work:

    for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]

2) once I have top_five, I would like to get the index of my query entry, something along the lines of Python's top_five.index('query_string')

3) possibly combine the values in distances with the row names from my_dataframe:

    row_1 distance_from_query1
    row_2 distance_from_query2

Thank you very much for your help
Darek Kedra
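The three wishes above can be sketched in base R. The loop in 1) most likely fails because top_five does not exist before the first assignment, and indexing by i would in any case put each row at position i rather than at positions 1 to 5; indexing the data frame with the whole index vector avoids both problems. The sketch below is only an illustration under assumptions: toy_df and its values are made up, and the distance is a plain base-R Euclidean distance standing in for hopach's distancevector, whose exact scaling may differ even though the ordering should be the same.

```r
# Toy stand-in for my_dataframe (made-up values, row names as IDs).
toy_df <- data.frame(
  V2 = c(1, 5, 2, 9, 3, 7),
  V3 = c(2, 6, 2, 8, 4, 7),
  row.names = c("query", "p1", "p2", "p3", "p4", "p5")
)

# Scale without centering, as in the original post.
scaled <- scale(toy_df, center = FALSE)

# Base-R Euclidean distance of every row from the 'query' row
# (an assumed replacement for hopach::distancevector).
q <- scaled["query", ]
d <- sqrt(rowSums(sweep(scaled, 2, q)^2))

# Keep the raw distances in d and the ordering in ord, separately.
ord <- order(d)

# 1) Top five rows in one step: index with the vector, no loop needed.
top_five <- toy_df[ord[1:5], ]

# 2) Position of the query entry within top_five (NA if it is absent).
query_pos <- match("query", rownames(top_five))

# 3) Row names paired with their distances, nearest first.
dist_table <- data.frame(distance = d[ord],
                         row.names = rownames(toy_df)[ord])

print(top_five)
print(query_pos)
print(dist_table)
```

Keeping the raw distances and the order() result in separate variables is what makes 3) possible; once only the ordering is stored, the distance values themselves are gone.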