Hello, I am a graduate student in industrial psychology working on my dissertation, creating an adaptive job analysis survey questionnaire that classifies people into one of 1152 discrete "job type" categories (i.e., job titles) using Bayesian belief networks (BBNs). For those without time to read a long-winded question, I will summarize it now and elaborate later for those who want to read on.

The network is a naive Bayes model with 438 children (i.e., the 438 survey questions, each with seven states) and one parent (i.e., the job type category, with 1152 states), with no links among the children. Using Netica's "Sensitivity to Findings," I select the most informative question to present next, in a way that eliminates questions that don't provide much additional information about the person's job type. After each response, the network is updated and the next most informative question is selected. Each time the person responds, I query the parent (i.e., job type) node to find the most probable state (of the 1152) and its probability value. After administering about 30-35 questions (out of the 438), the probability of the most probable state of the parent node often exceeds .8, and as more questions are administered, that value exceeds .95 (the point at which I stop administering questions). However, the BBN's accuracy at predicting a person's ACTUAL job type is around one in four (roughly one in four times it guesses the correct state out of the 1152 job types). Why would the computer be 95% confident that a node is in a particular state, yet only be 25% accurate at predicting the actual state? Any suggestions to improve the accuracy of the prediction?

MORE INFORMATION: This questionnaire is a job analysis instrument created by the government to measure the knowledge, skills, abilities, and activities needed for all types of work. The questions are seven-point Likert-scale questions (i.e., strongly agree to strongly disagree). There is an existing database of 6000 cases (people who responded to all 438 questions) across all 1152 job types. In other words, roughly five people in each job type responded to all questions, and this dataset is what I used to build the BBN. Obviously, since there are multiple people in each job (and multiple jobs may have similar responses to several questions), the data is noisy.

Before creating the network, I randomly selected 50 cases out of the 6000 (as simulated participants) and built the network on the remaining 5950. I used Netica's Sensitivity to Findings to select the question that provides the most information about the person's job type, each time updating the network with their response. Using the 50 held-out cases (whose correct job types I know), I simulated people answering the questions as they actually responded and observed the probability values at the job type node. I keep administering questions until a state within the parent (job type) node exceeds .95. Keep in mind that there are 1152 states, so the probability of any one state, given no information, is .00087. It is striking that, given only 30-35 of the most informative findings (out of 438), the probability of a particular state exceeds .8 or .9. Nonetheless, I have checked the accuracy of the network at predicting the actual job types of the simulated participants, and it is slightly less than .25. Why would the BBN insist that it is over 95% confident that the job type node is in a given state, yet be less than 25% accurate in prediction?
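For concreteness, here is a minimal Python sketch of the loop I am describing (not Netica's actual API). The random CPTs are hypothetical stand-ins for the tables learned from the 5950 training cases, and I am treating "Sensitivity to Findings" as entropy reduction (the mutual information between a question node and the job type node), which is the quantity Netica reports:

```python
import numpy as np

rng = np.random.default_rng(0)
n_jobs, n_questions, n_levels = 1152, 438, 7

# Hypothetical stand-ins for the learned tables: a uniform prior over job
# types and P(answer | job) for each question (rows sum to 1 over answers).
prior = np.full(n_jobs, 1.0 / n_jobs)          # 1/1152 = .00087, as above
cpt = rng.dirichlet(np.ones(n_levels), size=(n_questions, n_jobs))

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def entropy_reduction(posterior, q):
    """Mutual information between question q and the job node under the
    current belief -- the 'Sensitivity to Findings' ranking quantity."""
    h_after = 0.0
    for a in range(n_levels):
        joint = posterior * cpt[q, :, a]       # P(job, answer = a)
        z = joint.sum()                        # P(answer = a)
        if z > 0:
            h_after += z * entropy(joint / z)
    return entropy(posterior) - h_after

def adaptive_session(answer_fn, threshold=0.95):
    posterior = prior.copy()
    asked = set()
    while posterior.max() < threshold and len(asked) < n_questions:
        # Pick the unasked question with the largest expected entropy reduction.
        gains = [entropy_reduction(posterior, q) if q not in asked else -1.0
                 for q in range(n_questions)]
        q = int(np.argmax(gains))
        asked.add(q)
        a = answer_fn(q)                       # respondent's Likert answer, 0..6
        posterior = posterior * cpt[q, :, a]   # naive Bayes update: questions
        posterior /= posterior.sum()           # treated as independent given job
    return int(np.argmax(posterior)), float(posterior.max()), len(asked)

# Simulate one participant whose answers are drawn from their true job's CPTs.
true_job = 17
guess, conf, n_asked = adaptive_session(
    lambda q: int(rng.choice(n_levels, p=cpt[q, true_job])))
print(f"guessed {guess} (true {true_job}), confidence {conf:.2f}, {n_asked} questions")
```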
Also, although 25% accuracy is noteworthy, and MUCH better than a human could do given the same information, it isn't as high as I would like (it is hard to convince people that being wrong 75% of the time is acceptable). When it is wrong, it is usually not far off (the BBN will guess the person is a Chemical Engineer when they are really a Chemist). Any suggestions to further improve its accuracy rate?
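Along those lines, one thing I can quantify on the 50 held-out cases is how often the true job type falls within the top k most probable states, rather than only the single most probable one. Continuing the sketch above (same hypothetical prior and cpt), a rough top-k check might look like this:

```python
def top_k_accuracy(cases, k=5):
    """Fraction of cases whose true job type lands among the k most
    probable states after all answers are entered as findings."""
    hits = 0
    for answers, true_job in cases:            # answers: {question: level}
        posterior = prior.copy()
        for q, a in answers.items():
            posterior = posterior * cpt[q, :, a]
            posterior /= posterior.sum()
        hits += true_job in np.argsort(posterior)[-k:]
    return hits / len(cases)

# Simulate 50 held-out participants the same way as above.
cases = [({q: int(rng.choice(n_levels, p=cpt[q, j])) for q in range(n_questions)},
          int(j))
         for j in rng.choice(n_jobs, size=50, replace=False)]
print(f"top-1: {top_k_accuracy(cases, k=1):.2f}, top-5: {top_k_accuracy(cases, k=5):.2f}")
```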
Thanks for your time,
Scott Bublitz
NC State University