Re: [Advertisement] The best promotion programs!! Your advertising worries are over.
O This e-mail is an [advertisement] sent in accordance with Article 50 of the Act on Promotion of Information and Communications Network Utilization and Information Protection, etc.

[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

O This e-mail is an [advertisement] sent in accordance with Article 50 of the Act on Promotion of Information and Communications Network Utilization and Information Protection, etc.
O Your e-mail address was obtained from the Internet; we hold no personal information other than the address.

If you do not wish to receive further mail, please unsubscribe below. We sincerely apologize to those who do not want this information.

Have you been worried about promotion? Stop worrying now. Everything about promotion, and all the know-how, is right here. Ask us anything: mailto:[EMAIL PROTECTED]

As we are now converting to a promotion agency business, we are selling off, at bargain prices, the promotion programs we have collected over the past three years.

Beginner promotion package: 2 e-mail address extractors, 1 e-mail editor, 2 e-mail senders (1 full version, 1 demo), an e-mail list of 500,000 addresses, 1 bulletin-board registration tool, and a database of 2,000 bulletin boards. All of the above for 100,000 won.

Intermediate promotion package: 3 e-mail extractors, 1 e-mail editor, 3 e-mail senders (2 full versions, 1 demo), an e-mail list of 1,000,000 addresses, 1 bulletin-board registration tool, and a database of 5,000 bulletin boards. All of the above for 200,000 won.

Advanced promotion package (1): we install an e-mail extractor and sender directly on your personal homepage. Functions: e-mail extraction, duplicate-address removal, automatic opt-out handling, e-mail sending, and handling of temporary and permanent opt-outs. Installation requires a MySQL account on your homepage; if you have no paid hosting, 200 MB with one year of hosting is available for an additional 44,000 won. We do this installation for 200,000 won.

Advanced promotion package (2): we lease a server loaded with an e-mail list of 10,000,000 addresses to only a few customers (term: 1 year, price: 1,000,000 won), and we hand over the promotion programs and all of our know-how.

E-mail advertising agency: drawing on our promotion know-how, we have spent two years building a complete sending facility and have compiled 60,000,000 e-mail addresses, and we will run e-mail promotion campaigns on your behalf.
Re: detecting outliers in NON normal data ?
What about the hat matrix? Or Mahalanobis distance?

Yves Voltolini [EMAIL PROTECTED] wrote in message news:00f301c1be68$13413000$fde9e3c8@oemcomputer...

Hi, I would like to know whether methods for detecting outliers using interquartile ranges are appropriate for data with a NON-normal distribution. The software Statistica presents this method:

data point value > UBV + o.c.*(UBV - LBV)
data point value < LBV - o.c.*(UBV - LBV)

where UBV is the 75th percentile, LBV is the 25th percentile, and o.c. is the outlier coefficient. In the biological world many data are not normally distributed, and tests like Rosner, Dixon and Grubbs (if I am right!) are good just for normally distributed data. Can anyone help me? Thanks.

Prof. J. C. Voltolini
Grupo de Estudos em Ecologia de Mamiferos - ECOMAM
Universidade de Taubate - Depto. Biologia
Praca Marcellino Monteiro 63, Bom Conselho, Taubate, SP - BRASIL. 12030-010
TEL: 0XX12-2254165 (lab.), 2254277 (depto.) FAX: 0XX12-2322947
E-Mail: [EMAIL PROTECTED]
http://www.mundobio.rg3.net/

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
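For readers who want to try the rule quoted above outside Statistica, here is a minimal NumPy sketch of the interquartile-range fence. The function name, the coefficient value of 1.5, and the lognormal example data are illustrative assumptions, not part of the original post.

import numpy as np

def iqr_outliers(x, oc=1.5):
    """Flag points outside the interquartile-range fences.

    A point is flagged when it lies above UBV + oc*(UBV - LBV) or
    below LBV - oc*(UBV - LBV), where UBV and LBV are the 75th and
    25th percentiles and oc is the outlier coefficient (1.5 is an
    illustrative default, matching the common Tukey fence).
    """
    x = np.asarray(x, dtype=float)
    lbv, ubv = np.percentile(x, [25, 75])
    spread = ubv - lbv                      # interquartile range
    upper = ubv + oc * spread
    lower = lbv - oc * spread
    return (x > upper) | (x < lower)

# Example with skewed (non-normal) data:
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)
print("flagged values:", data[iqr_outliers(data)])

Because the fences are built from percentiles rather than the mean and standard deviation, the rule does not itself assume normality, although the choice of coefficient still governs how aggressively points in a long tail are flagged.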
Save Money this Month on Toner Cartridges!!
D J Printing Corporation, 2564 Cochise Drive, Acworth, GA 30102
(V) 770-974-8228  (F) 770-974-7223  [EMAIL PROTECTED]

--LASER, FAX AND COPIER PRINTER TONER CARTRIDGES--
*WE ACCEPT GOVERNMENT, SCHOOL AND UNIVERSITY PURCHASE ORDERS*
***FREE SHIPPING WITH ANY ORDER OF $200 OR MORE!!!***

APPLE
LASER WRITER SELECT 300/310/360 $60
LASER WRITER PRO 600/630 OR 16/600 $60
LASER WRITER 300/320 OR 4/600 $45
LASER WRITER LS/NT/NTR/SC $50
LASER WRITER 2NT/2NTX/2SC/2F/2G $50
LASER WRITER 12/640 $60

HEWLETT PACKARD
LASERJET SERIES 1200 (C7115A) $40
LASERJET SERIES 4100X/4100A (C8061A/X) $99
LASERJET SERIES 1100/1100A (C4092A) $40
LASERJET SERIES 2100/SE/XI/M/TN (C4096A) $70
LASERJET SERIES 2/2D/3/3D (92295A) $43
LASERJET SERIES 2P/2P+/3P (92275A) $55
LASERJET SERIES 3SI/4SI (92291A) $75
LASERJET SERIES 4/4M/4+/4M+/5/5M/5N (92298A/X) $55
LASERJET SERIES 4L/4ML/4P/4MP (92274A) $40
LASERJET SERIES 4000/T/N/TN (C4127A/X-H YLD) $70
LASERJET SERIES 4V/4MV (C3900A) $80
LASERJET SERIES 5000 (C4129X) $95
LASERJET SERIES 5L/6L (C3906A) $39
LASERJET SERIES 5P/5MP/6P/6MP (C3903A) $50
LASERJET SERIES 5SI/5SI MX/5SI MOPIER/8000 (C3909A/X) $80
LASERJET SERIES 8100/N/DN (C4182X) $100

HEWLETT PACKARD LASERFAX
LASERFAX 500/700, FX1 $50
LASERFAX 5000/7000, FX2 $65
LASERFAX FX3 $60
LASERFAX FX4 $65

LEXMARK
E312L, E310 (13T0101) $60
OPTRA 4019, 4029 HIGH YIELD $130
OPTRA R, 4039, 4049 HIGH YIELD $125
OPTRA S, 4059 HIGH YIELD $135
OPTRA N $100
OPTRA T 610/612/614 $185

EPSON LASER TONER
EPL-7000/7500/8000 $95
EPL-1000/1500 $95

EPSON INK JET
STYLUS COLOR 440/640/740/760/860 (COLOR) $20
STYLUS COLOR 740/760/860 (BLACK) $20

CANON
LBP-430 $45
LBP-460/465 $55
LBP-8 II $50
LBP-LX $54
LBP-NX $90
LBP-AX $49
LBP-EX $59
LBP-SX $49
LBP-BX $90
LBP-PX $49
LBP-WX $90
LBP-VX $59

CANON FAX L700 THRU L790 (FX1) $55
CANON FAX L5000 THRU L7500 (FX2) $65
CANON LASERCLASS 4000/4500/300 (FX3) $60
CANON LASERCLASS 8500 THRU 9800 (FX4) $65

CANON COPIERS
PC 1/2/3/6/6RE/7/8/11/12/65 (A30) $69
PC 210 THRU 780 (E40/E31) $80
PC 300/400 (E20/E16) $80

NEC
SERIES 2 LASER MODEL 90/95 $100

***FREE SHIPPING WITH ANY ORDER OF $200 OR MORE!!!***

PLEASE NOTE:
* ALL OF OUR PRICES ARE IN US DOLLARS.
* WE SHIP UPS GROUND. ADD $6.50 FOR SHIPPING AND HANDLING.
* WE ACCEPT ALL MAJOR CREDIT CARDS OR COD ORDERS.
* COD CHECK ORDERS ADD $3.50 TO YOUR SHIPPING COST.
* OUR STANDARD MERCHANDISE REPLACEMENT POLICY IS NET 90 DAYS.
* WE DO NOT SELL TO RESELLERS OR BUY FROM DISTRIBUTORS.
* WE DO NOT CARRY: BROTHER, MINOLTA, KYOCERA, PANASONIC, XEROX, FUJITSU, OKIDATA OR SHARP PRODUCTS.
* WE ALSO DO NOT CARRY: DESKJET OR BUBBLEJET SUPPLIES.
* WE DO NOT BUY FROM OR SELL TO RECYCLERS OR REMANUFACTURERS.

-PLACE YOUR ORDER AS FOLLOWS-
1) BY PHONE: (770) 974-8228
2) BY MAIL: D AND J PRINTING CORPORATION, 2564 COCHISE DR, ACWORTH, GA 30102
3) BY INTERNET: [EMAIL PROTECTED]
Re: CRIMCOORD transformation in QUEST
That is either sloppiness in the writing or reliance on the relationship between the eigendecomposition and the SVD. Let SSM denote a square symmetric matrix and AM an arbitrary matrix.

In the eigendecomposition, SSM = Q E Q'.
In the SVD, AM = P D Q'.
Then SSM = AM'AM = Q D P' P D Q' = Q D D Q' = Q E Q', if E = D D.

I haven't checked the above, but it is pretty close to accurate. You may need to throw in a division by n.

David Chang wrote:

Hi, thank you for reading this message. I have the following problems in getting the correct CRIMCOORD transformation of categorical variables in the QUEST decision tree algorithm. Your help will be greatly appreciated.

Q1: In Loh and Shih's paper (Split Selection Methods for Classification Trees, Statistica Sinica, 1997, vol. 7, pp. 815-840), they mention the mapping from a categorical variable to an ordered variable via CRIMCOORD. But their explanation, in particular step 5 of algorithm 2, is not clear. For example, they write "Perform a singular value decomposition of the matrix GFU and let a (a vector) be the eigenvector associated with the largest eigenvalue" in step 5 -- the eigenvector of what? Does this mean a is the eigenvector of transpose(GFU)*GFU?

Q2: I tried to verify the data sets in Table 1. Data sets I-III are OK, but the result for data set IV seems to be incorrect. Could any one of you help me verify that?

Thank you very much for your help!!

David

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
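A quick numerical check of the identity above: a NumPy sketch showing that the eigenvector of A'A with the largest eigenvalue is the leading right singular vector of A (up to sign), and that the eigenvalues are the squared singular values. The small random matrix is purely an illustrative stand-in for GFU.

import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(20, 4))           # stand-in for the matrix GFU

# SVD of the arbitrary matrix: A = P diag(d) Q'
P, d, Qt = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the square symmetric matrix A'A = Q E Q'
evals, evecs = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]        # eigh returns ascending order
evals, evecs = evals[order], evecs[:, order]

# E = D D: eigenvalues of A'A are the squared singular values of A
print(np.allclose(evals, d**2))        # True

# The eigenvector associated with the largest eigenvalue of A'A is the
# first right singular vector of A, up to an arbitrary sign.
print(np.allclose(np.abs(Qt[0]), np.abs(evecs[:, 0])))   # True

Whether a division by n belongs in the CRIMCOORD construction depends on how the scatter matrix is scaled; it rescales the eigenvalues but leaves the eigenvectors unchanged.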
EDSTAT list
Dear EdStat readers,

During the next week (probably Friday or Monday), the EdStat list will move to a new server. At that time, we will also start using a new version of the Majordomo software. We hope that these changes will reduce the amount of spam sent to the list. In addition, the threat of viruses will be reduced because attachments will no longer be allowed. We hope the transition will be smooth, and we will try to keep you informed as changes are implemented. If problems arise, please be patient and check the web page http://jse.stat.ncsu.edu for information.

Jackie Dietz
Listowner

--
E. Jacquelin Dietz                        (919) 515-1929 (phone)
Department of Statistics, Box 8203        (919) 515-1169 (FAX)
North Carolina State University           [EMAIL PROTECTED]
Raleigh, NC 27695-8203 USA
Street address for FedEx: Room 210E Patterson Hall, 2501 Founders Drive

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Applied analysis question
I have a continuous response variable that ranges from 0 to 750. I have only 90 observations, and 26 are at the lower limit of 0, which is the modal category. The mean is about 60 and the median is 3; the distribution is highly skewed, extremely kurtotic, etc. Obviously, none of the power transformations are especially useful. The product-moment correlation between the response and the primary covariate is near zero; however, a rank-order correlation coefficient is about .3 and is significant. We have 5 additional control variables. I'm convinced that any attempt to model the conditional mean response is completely inappropriate, yet all of the alternatives appear flawed as well.

Here's what I've done: I collapsed the outcome into 3- and 4-category ordered response variables and estimated ordered logit models. I dichotomized the response (any vs. none) and estimated a binomial logit. All of these approaches yield substantively consistent results using both the model-based standard errors and the Huber-White sandwich robust standard errors. My concerns about this approach are 1) the somewhat arbitrary classification restricts the observed variability, and 2) the estimators assume large sample sizes.

I rank-transformed the response variable and estimated a robust regression (using the rreg procedure in Stata); the results were consistent with those obtained for the ordered and binomial logit models described above. I know that Stokes, Davis, and Koch have presented procedures to estimate analysis of covariance on ranks, but I've not seen reference to the use of rank-transformed response variables in a regression context. A plot of the rank-transformed response against the primary covariate clearly suggests a meaningful pattern. Contingency table analysis with a collapsed covariate strongly suggests a meaningful pattern. But I'm at something of a loss to know the best way to analyze and report the results. Thanks in advance.

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
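To make two of the comparisons in the post concrete, here is a hedged Python sketch that contrasts the Pearson and Spearman correlations and fits the any-vs-none binomial logit. The variable names and the simulated zero-heavy, long-tailed data are illustrative assumptions, not the poster's actual data.

import numpy as np
from scipy.stats import pearsonr, spearmanr
import statsmodels.api as sm

# Illustrative data with the features described in the post: many exact
# zeros, a long right tail, and a monotone but non-linear relationship
# with the primary covariate.
rng = np.random.default_rng(1)
n = 90
x = rng.normal(size=n)                          # primary covariate
latent = np.exp(1.5 * x + rng.normal(scale=2.0, size=n))
y = np.where(rng.random(n) < 0.3, 0.0, np.clip(latent, 0, 750))

# Product-moment correlation (sensitive to the pile-up at zero and the
# long tail) versus the rank-order correlation.
print("Pearson r:  %.2f" % pearsonr(x, y)[0])
print("Spearman r: %.2f" % spearmanr(x, y)[0])

# Any-vs-none dichotomization and a binomial logit on the primary covariate.
any_response = (y > 0).astype(int)
X = sm.add_constant(x)
logit_fit = sm.Logit(any_response, X).fit(disp=False)
print(logit_fit.summary())

The control variables mentioned in the post would simply be added as further columns of X before fitting.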
Re: Applied analysis question
At 04:11 PM 2/27/02 -0500, Rich Ulrich wrote:

Categorizing the values into a few categories labeled none, almost none, ... is one way to convert your scores. If those labels do make sense.

well, if 750 has the same numerical sort of meaning as 0 (unit wise) ... in terms of what is being measured ... then i would personally not think so, SINCE the categories above 0 will encompass very wide ranges of possible values

if the scale was # of emails you look at in a day ... and 1/3 said none or 0 ... we could rename the scale 0 = not any, 1 to 50 = some, and 51 to 750 = many (and recode as 1, 2, and 3) ... i don't think anyone who just saw the labels ... and was then asked to give some extemporaneous 'values' for each of the categories ... would have any clue what to put in for the some and many categories ... but i would predict they would seriously UNderestimate the values compared to the ACTUAL responses

this just highlights that for some scales, we have almost no differentiation at one end where they pile up ... perhaps (not saying one could have in this case) we could have anticipated this ahead of time and put in scale categories that might have anticipated that ... after the fact, we are more or less dead ducks

i would say this though ... treating the data only in terms of ranks ... does not really solve anything ... and clearly represents being able to say LESS about your data or interrelationships (even if the rank order r is .3 compared to the regular pearson of about 0) ... than if you did not resort to only thinking about the data in rank terms

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
Emailto: [EMAIL PROTECTED]
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
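For what the recode sketched in the preceding message would look like in practice, here is a small pandas snippet. The cut points (0, 1-50, 51-750) and labels come from the example in the message; the variable name y and the simulated responses are made up for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# made-up responses piled up at zero with a long right tail, as in the thread
y = pd.Series(np.where(rng.random(90) < 0.3, 0, rng.integers(1, 751, size=90)))

# 0 = "not any", 1 to 50 = "some", 51 to 750 = "many", recoded as 1, 2, 3
recoded = pd.cut(y, bins=[-0.5, 0.5, 50.5, 750.5], labels=[1, 2, 3])
print(recoded.value_counts().sort_index())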
Re: Means of semantic differential scales
At 01:39 PM 2/27/02 -0600, Jay Warner wrote:

Not stressful 1__ 2__ 3__ 4__ 5__ 6__ 7__ Very stressful

just out of curiosity ... how many consider the above to be an example of a bipolar scale? i don't

now, if we had an item like:

sad                     happy
  1  .  .  .  .  .  .  7

THEN the mid point becomes much more problematic ... since being a 4 ... is neither a downer nor an upper

now, a quick search found info from ncs about the 16pf personality scale ... it shows 16 BIpolar dimensions as:

Bipolar Dimensions of Personality
Factor A  Warmth (Cool vs Warm)
Factor B  Intelligence (Concrete Thinking vs Abstract Thinking)
Factor C  Emotional Stability (Easily Upset vs Calm)
Factor E  Dominance (Not Assertive vs Dominant)
Factor F  Impulsiveness (Sober vs Enthusiastic)
Factor G  Conformity (Expedient vs Conscientious)
Factor H  Boldness (Shy vs Venturesome)
Factor I  Sensitivity (Tough-Minded vs Sensitive)
Factor L  Suspiciousness (Trusting vs Suspicious)
Factor M  Imagination (Practical vs Imaginative)
Factor N  Shrewdness (Forthright vs Shrewd)
Factor O  Insecurity (Self-Assured vs Self-Doubting)
Factor Q1 Radicalism (Conservative vs Experimenting)
Factor Q2 Self-Sufficiency (Group-Oriented vs Self-Sufficient)
Factor Q3 Self-Discipline (Undisciplined vs Self-Disciplined)
Factor Q4 Tension (Relaxed vs Tense)

let's take one ... shy versus venturesome ... now, we could make a venturesome scale by itself ... 0 venturesomeness ... (up to) very venturesome 7 ... does 0 = shy? seems like if the answer is no ... then we might have a bipolar scale ... if the answer is yes ... then we don't

It could be the use of the particular bipolars not stressful and very stressful.

=
Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
Emailto: [EMAIL PROTECTED]
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: Statistics Tool For Classification/Clustering
Good places to start:

Optimal feature extractors: they are better than PCA because you whiten your inter class scatter and so put all inter class comparisons on the same level. The good thing is that this will also reduce your feature vector dimensionality to c-1 (where c is the number of classes). PCA will not do this.

Check the stats of each class: is it Gaussian or a known pdf? Apply a parametric classifier if so. However, you are lucky if you get good classification after this, so you will probably need non-linear, non-parametric classifiers. Try k-nearest neighbour, but that might take the age of the Universe, so use a condensing algorithm first to get a smaller representative set.

Matlab is what I use for coding; there are a lot of free toolboxes around. Mostly I write my own though.

Best wishes
Andrew

Rishabh Gupta [EMAIL PROTECTED] wrote in message news:a4eje9$ip8$[EMAIL PROTECTED]...

Hi All, I'm a research student at the Department of Electronics, University of York, UK. I'm working on a project related to music analysis and classification. I am at the stage where I perform some analysis on music files (currently only in MIDI format) and extract about 500 variables that are related to music properties like pitch, rhythm, polyphony and volume. I perform basic analysis like mean and standard deviation, but I also perform more elaborate analysis like measuring the complexity of melody and rhythm. The aim is that the variables obtained can be used to perform a number of different operations.

- The variables can be used to classify / categorise each piece of music, on its own, in terms of some meta classifier (e.g. rock, pop, classical).
- The variables can be used to perform a comparison between two files. A variable from one music file can be compared to the equivalent variable in the other music file. By comparing all the variables in one file with the equivalent variables in the other file, an overall similarity measurement can be obtained.

The next stage is to test the ability of the variables obtained to perform the classification / comparison. I need to identify variables that are redundant (redundant in the sense of 'they do not provide any information' or 'they provide the same information as another variable') so that they can be removed, and I need to identify variables that are distinguishing (provide the most information).

My basic questions are:

- What are the best statistical techniques / methods that should be applied here? E.g. I have looked at Principal Component Analysis; this would be a good method to remove the redundant variables and hence reduce the amount of data that needs to be processed. Can anyone suggest any other sensible statistical analysis methods?
- What are the ideal tools / software to perform the clustering / classification? I have access to SPSS software, but I have never used it before and am not really sure how to apply it or whether it is any good when dealing with 100s of variables.

So far I have been analysing each variable on its own 'by eye' by plotting the mean and sd for all music files. However, this approach is not feasible in the long term since I am dealing with such a large number of variables. In addition, by looking at each variable on its own, I do not find clusters / patterns that are only visible through multivariate analysis. If anyone can recommend a better approach, I would greatly appreciate it. Any help or suggestion that can be offered will be greatly appreciated. Many Thanks!
Rishabh Gupta = Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
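To connect the suggestions above (a class-scatter-whitening feature extractor that reduces the representation to c-1 dimensions, followed by a nearest-neighbour classifier) to something runnable: a minimal scikit-learn sketch. This is a hedged illustration rather than either poster's actual pipeline; linear discriminant analysis is used here as a stand-in for the "optimal feature extractor" idea, and the feature matrix X and genre labels y are made-up placeholders for the ~500 extracted music variables.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Placeholder data: 300 "pieces", 500 extracted variables, 3 genres.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 500))
y = rng.integers(0, 3, size=300)        # e.g. rock / pop / classical

# LDA projects onto at most c-1 = 2 discriminant directions chosen from
# the class scatter, unlike unsupervised PCA; k-NN then classifies in
# that low-dimensional space.
pipeline = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

scores = cross_val_score(pipeline, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())

With random placeholder data the accuracy is near chance; the point is only the structure of the pipeline, and the same skeleton would accept the real MIDI-derived variables.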
Re: Statistics Tool For Classification/Clustering
Correction (typo): that should read 'whiten your intra class scatter', not inter class scatter.

Mark Harrison [EMAIL PROTECTED] wrote in message news:FIif8.16518$[EMAIL PROTECTED]...

Good places to start: Optimal feature extractors, that's better than PCA because you whiten your inter class scatter and so put all inter class comparisons on the same level. [...]
= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: Applied analysis question
Brad Anderson wrote:

I have a continuous response variable that ranges from 0 to 750. I only have 90 observations and 26 are at the lower limit of 0, ...

What if you treated the information collected by that variable as really two variables, one a categorical variable indicating a zero or non-zero value? Then the remaining numerical variable could only be analyzed conditionally on the category being non-zero. In many cases when you collect data on consumers' consumption of some commodity, you would end up with a big number of them not using the product at all, while those who used the product would consume different amounts.

Rolf Dalin

**
Rolf Dalin
Department of Information Technology and Media
Mid Sweden University
S-870 51 SUNDSVALL Sweden
Phone: 060 148690, international: +46 60 148690
Fax: 060 148970, international: +46 60 148970
Mobile: 0705 947896, international: +46 70 5947896
mailto:[EMAIL PROTECTED]
http://www.itk.mh.se/~roldal/
**

= Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
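The two-variable treatment suggested above is essentially a two-part (hurdle-style) model: a binary model for zero vs. non-zero, plus a model for the amount conditional on being non-zero. Here is a hedged statsmodels sketch of that idea; the variable names and the simulated data are illustrative, not the original poster's data.

import numpy as np
import statsmodels.api as sm

# Illustrative zero-heavy response and a single covariate.
rng = np.random.default_rng(3)
n = 90
x = rng.normal(size=n)
nonzero = rng.random(n) < 0.7
amount = np.exp(1.0 * x + rng.normal(scale=1.0, size=n))
y = np.where(nonzero, np.clip(amount, 0, 750), 0.0)

X = sm.add_constant(x)

# Part 1: does the respondent have a non-zero value at all?
part1 = sm.Logit((y > 0).astype(int), X).fit(disp=False)

# Part 2: among the non-zero cases only, model the (logged) amount.
mask = y > 0
part2 = sm.OLS(np.log(y[mask]), X[mask]).fit()

print(part1.params)
print(part2.params)

The two parts can be reported separately (probability of any consumption, and amount given consumption), which matches the interpretation Rolf describes for commodity-consumption data; with only 90 observations the conditional part rests on the roughly two-thirds of cases that are non-zero.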