This sounds either easy via a greedy algorithm or NP-hard. Moreover, it is not clear to me that
1) A subset of 500 indpendent rows exists, where I presume "independent" means pairwise nonoverlapping; 2) That the mean and sd can be simultaneously optimized as you describe-- what if the subset with maximum mean also has bigger than minimal sd? -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Mark > Sent: Wednesday, March 01, 2006 12:40 PM > To: r-help@stat.math.ethz.ch > Subject: [R] Optimization problem: selecting independent rows > to maximizethe mean > > Dear R community, > > I have a dataframe with 500,000 rows and 102 columns. The rows > represent spatial polygons, some of which overlap others (i.e., not > all rows are independent of each other). > > Given a particular row, the first column contains a unique "RowID". > The second column contains the "Variable" of interest. The remaining > 100 columns ("Overlap1" ... "Overlap100") each contain a row ID that > overlaps this row (but if this row overlaps fewer than 100 other rows > then the remainder of the columns "OL1...OL100" contain NA). > > Here's the problem: I need to select the subset of 500 independent > rows that maximizes the mean and minimizes the stdev of "Variable". > > Clearly this requires iterative selection and comparison of rows, > because each newly-selected row must be compared to rows already > selected to ensure it does not overlap them. At each step, a row > already selected might be removed from the subset if it can be > replaced with another that increases the mean and/or reduces the > stdev. > > The above description is a simplification of my problem, but > it's a start. > > As I am new to R (and programming in general) I'm not sure how to > start thinking about this, or even where to look. I'd appreciate any > ideas that might help. > > Thank you, Mark > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html