[CODE4LIB] complex drupal taxonomy question
Hi,

I'm working on a Drupal site with a very complicated taxonomy.

Backstory: A polisci professor and a team of students designed this project, first as a theoretical exercise that was part of a senior thesis for a double major in political science and computer science, and then as the project of a very devoted and smart student using Drupal. It's both amazingly cool and technically complex. At this point, we are trying to help rein it in to the library servers and help support it so that new crops of students can maintain it without needing to be CS majors, and also to help them address a few issues and problems that have been discovered over the past year or so. My colleague and I are totally new to Drupal, and to this database. While he's working on the Solr indexing, I'm trying to help figure out the taxonomy issue.

See here: http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera

Basically, the site indexes the public statements of al-qaeda. Each statement is assigned a bunch of terms by students who have studied jihad and al-qaeda. Each term is composed of two parts.

First part: a keyword from a controlled list of keywords. There are many of these, and they include places, people, theories, and other things. So, Afghanistan, Barack Obama, and media are all keywords.

Second part: a context from a much smaller collection (around 20) of contexts, indicating, I guess, how the keyword figures in this statement. Examples include area of jihad, enemy of islam, religious relations, and others.

So, the full term would be media - enemy of islam, for example. And each record includes a large number of these.

Going forward, we'd ideally like to allow users of the site to find all three of the following:

1. Records that contain a particular two-part term. (Easy - that's what taxonomy is for.)
2. A list of terms that begin with the first part, so that they can select the modifier for it. (Also easy: if we make the second term a subterm or child of the first, this will work fine.)
3. A list of terms that have the second part as a qualifier. So, for example, show me all terms in which anything is called an enemy of islam, and then let me choose which keyword is referred to as an enemy of islam and show me that record.

It's that third one that we can't figure out. The only way we can think of to accomplish this is to basically duplicate each entry, so that we'd have both

Haverford - enemy of islam
and
enemy of islam - Haverford

I think that will work, but since there are many statements, and each statement has many terms, this solution doesn't seem ideal.

Do any of you have ideas?

Thanks very much.
Laurie

--
Coordinator for Digital Scholarship and Services
Haverford College Library
370 Lancaster Ave
Haverford, PA 19041
610-896-4226
lal...@haverford.edu
Re: [CODE4LIB] complex drupal taxonomy question
Just taking a stab in the dark:

-- Set up a copy field in Solr. This basically takes the content from an existing field and creates a mirror of it.
-- Apply some extra string processing to your copy field so that it splits and tokenizes the content on the " - " separator (e.g., enemy of islam and haverford become two tokens on the field).
-- ???
-- Profit.

Seriously, though, I'm not sure what you would do after you've tokenized it. You could set up some sort of faceted browse interface to show co-occurring terms, or something else. Maybe some other Solr folks out there have some better ideas.

-Andrew

On 2012-07-11, at 11:32 AM, Laurie Allen wrote:

> Hi, I'm working on a drupal site with a very complicated taxonomy. [...]
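For what it's worth, the split-and-facet idea above can be sketched outside Solr in plain Python. This is only an illustration under assumptions, not Solr configuration: the " - " separator, the record IDs, and the function names `split_term` and `facet_by_context` are all made up for the example, and it assumes no keyword itself contains " - ".

```python
from collections import defaultdict


def split_term(term, sep=" - "):
    """Split a 'keyword - context' term into its two parts (hypothetical format)."""
    keyword, found, context = term.partition(sep)
    if not found:
        raise ValueError(f"no separator in term: {term!r}")
    return keyword.strip(), context.strip()


def facet_by_context(records):
    """Map each context to the keywords it qualifies, and those to record IDs."""
    facets = defaultdict(lambda: defaultdict(set))
    for record_id, terms in records.items():
        for term in terms:
            keyword, context = split_term(term)
            facets[context][keyword].add(record_id)
    return facets


# Made-up sample data, only to exercise the functions.
records = {
    "statement-1": ["media - enemy of islam", "Afghanistan - area of jihad"],
    "statement-2": ["Barack Obama - enemy of islam", "media - enemy of islam"],
}

facets = facet_by_context(records)
print(sorted(facets["enemy of islam"]))
print(sorted(facets["enemy of islam"]["media"]))
```

Picking a context first and then a keyword is the "third" lookup from the original question; a faceted browse interface would present the same two-step choice.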
Re: [CODE4LIB] complex drupal taxonomy question
The issue is that child terms (contexts) are not reusable, so the term "enemy of islam" is actually going to be a different entry for each parent (keyword) if you use a parent/child relationship.

You should probably use separate vocabularies for contexts and keywords, then a module that establishes term relationships, like http://drupal.org/project/term_relations/

May I suggest that you check out the Drupal4lib mailing list: http://listserv.uic.edu/archives/drupal4lib.html. Drupal questions posted there get a good audience and a quick turnaround, and, even better, the answers serve the library Drupal community.

Thanks,

Cary

On Wed, Jul 11, 2012 at 8:32 AM, Laurie Allen lal...@haverford.edu wrote:

> Hi, I'm working on a drupal site with a very complicated taxonomy. [...]

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
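A rough sketch of why separate keyword and context vocabularies help: if each tag is stored once as a (keyword, context) pair, all three lookups from the original question can be answered without duplicating entries in reverse order. This is plain Python, not Drupal code and not how the term_relations module works internally; the class `TagIndex`, its method names, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    keyword: str   # from the large keyword vocabulary
    context: str   # from the small (around 20) context vocabulary


class TagIndex:
    """Stores each pair once and indexes it from both directions."""

    def __init__(self):
        self.by_pair = defaultdict(set)       # Tag -> record IDs
        self.contexts_for = defaultdict(set)  # keyword -> contexts
        self.keywords_for = defaultdict(set)  # context -> keywords

    def add(self, record_id, keyword, context):
        self.by_pair[Tag(keyword, context)].add(record_id)
        self.contexts_for[keyword].add(context)
        self.keywords_for[context].add(keyword)

    def records(self, keyword, context):
        """Lookup 1: records carrying a particular two-part term."""
        return sorted(self.by_pair.get(Tag(keyword, context), ()))

    def contexts(self, keyword):
        """Lookup 2: contexts that have been paired with a keyword."""
        return sorted(self.contexts_for.get(keyword, ()))

    def keywords(self, context):
        """Lookup 3: keywords that have been given a context."""
        return sorted(self.keywords_for.get(context, ()))


# Made-up sample data.
index = TagIndex()
index.add("statement-1", "media", "enemy of islam")
index.add("statement-1", "Afghanistan", "area of jihad")
index.add("statement-2", "Haverford", "enemy of islam")

print(index.keywords("enemy of islam"))
print(index.contexts("media"))
print(index.records("Haverford", "enemy of islam"))
```

The point of the design is that "enemy of islam" exists once as a context, rather than once per parent keyword, so listing every keyword it qualifies is a single lookup.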
Re: [CODE4LIB] complex drupal taxonomy question
Thanks very much, Cary. I'll check out that module and repost on drupal4lib. Laurie