[CODE4LIB] complex drupal taxonomy question

2012-07-11 Thread Laurie Allen
Hi,
I'm working on a drupal site with a very complicated taxonomy.
Backstory: A polisci professor and team of students designed this
project first as a theoretical exercise as part of a senior thesis
double major in political science and computer science, and then as
the project of a very devoted and smart student using drupal. It's
both amazingly cool and technically complex. At this point, we are
trying to help rein it in to the library servers and help support it
so that new crops of students can maintain it without needing to be CS
majors, and also to help them address a few issues and problems that
have been discovered over the past year or so. My colleague and I are
totally new to Drupal, and to this database. While he's working on the
solr indexing, I'm trying to help figure out the taxonomy issue.

See here: 
http://gtrp.haverford.edu/aqsi/aqsi/statements/mustafa-abu-al-yazids-interview-al-jazeera
Basically, the site indexes the public statements of al-qaeda. Each
statement is assigned a bunch of terms by students who have studied
jihad and al-qaeda.

Each term is composed of two parts.
First part: a keyword from a controlled list of keywords - there are
many of these and they include places, people, theories, and other
things. So, Afghanistan, Barack Obama, and media are all
keywords.
Second part: a context from a much smaller (around 20) collection of
contexts, describing, I guess, how the keyword figures in the statement.
Examples include area of jihad, enemy of islam, religious relations,
and others.

So, the full term would be "media - enemy of islam", for example. And
each record includes a large number of these.
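
As a rough illustration only (plain Python, not how the Drupal site
actually stores things; the names Term and parse_term are made up, and
the second label is an invented pairing), a statement's terms can be
thought of as keyword/context pairs:

```python
from typing import NamedTuple


class Term(NamedTuple):
    keyword: str  # from the large controlled list (places, people, ...)
    context: str  # from the small list of roughly 20 contexts


def parse_term(label: str) -> Term:
    """Split a label like 'media - enemy of islam' on the first ' - '.

    Assumes keywords themselves never contain ' - '.
    """
    keyword, _, context = label.partition(" - ")
    return Term(keyword.strip(), context.strip())


# Hypothetical terms attached to one statement.
statement_terms = [
    parse_term("media - enemy of islam"),
    parse_term("Afghanistan - area of jihad"),
]
```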

Going forward, we'd ideally like to allow users of the site to find
all three of the following:
1. Records that contain a particular two part term. (easy - that's
what taxonomy is for)
2. A list of terms that begin with the first part so that they can
select the modifier for it (also easy, if we make the second term a
subterm or child of the first, this will work fine)
3. A list of terms that have the second part as a qualifier. So, for
example, show me all terms in which anything is called an enemy of
islam, then let me choose which keyword is referred to as an enemy
of islam, and show me that record.

It's that third one that we can't figure out. The only way we can
think to accomplish this is to basically duplicate each entry, so that
we'd have both "Haverford - enemy of islam" and "enemy of islam - Haverford".
I think that will work, but since there are many statements, and each
statement has many terms, this solution doesn't seem ideal. Do any of
you have ideas?
Thanks very much.
Laurie
-- 
Coordinator for Digital Scholarship and Services
Haverford College Library
370 Lancaster Ave
Haverford, PA 19041
610-896-4226
lal...@haverford.edu


Re: [CODE4LIB] complex drupal taxonomy question

2012-07-11 Thread Andrew Hankinson
Just taking a stab in the dark:

-- set up a copy field in Solr. This basically takes the content from an 
existing field and creates a mirror of it.
-- apply some extra string processing to your copy field so that it splits and 
tokenizes the content on the " - " (e.g., "enemy of islam" and "haverford" 
become two tokens on the field)
-- ???
-- Profit.

Seriously, though, I'm not sure what you would do after you've tokenized it. 
You could set up some sort of faceted browse interface to show co-occurring 
terms, or something else. Maybe some other Solr folks out there have some 
better ideas.
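
To make the tokenizing step a bit more concrete, here is a rough Python 
sketch of what the split would produce and how facet-style counts could fall 
out of it. This is not Solr configuration, just a simulation of the idea; the 
sample records and the function names are invented for illustration.

```python
from collections import Counter


def split_term(label):
    """Split 'keyword - context' on the first ' - ' into two tokens."""
    keyword, _, context = label.partition(" - ")
    return keyword.strip(), context.strip()


# Hypothetical records: statement id -> list of two-part term labels.
records = {
    "statement-1": ["media - enemy of islam", "Haverford - enemy of islam"],
    "statement-2": ["media - religious relations"],
}


def keyword_facets(records, context):
    """For one context, count how many statements use each keyword with it."""
    counts = Counter()
    for labels in records.values():
        # A set, so a keyword is counted at most once per statement.
        keywords = {k for k, c in map(split_term, labels) if c == context}
        counts.update(keywords)
    return counts


facets = keyword_facets(records, "enemy of islam")
```

Each key in facets would be a keyword a user could click on after picking 
the context, which is roughly what a facet on the tokenized field would give 
you.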

-Andrew



Re: [CODE4LIB] complex drupal taxonomy question

2012-07-11 Thread Cary Gordon
The issue is that child terms (contexts) are not reusable, so the
term "enemy of islam" is actually going to be a different entry for
each parent (keyword) if you use a parent/child relationship.

You should probably use separate vocabularies for contexts and
keywords, then a module that establishes term relationships, like
http://drupal.org/project/term_relations/
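
As a rough sketch of why separate vocabularies help (plain Python, not
the API of term_relations or any other Drupal module; the sample tags
and variable names are invented): if each tag is stored once as a
keyword/context pair, all three lookups come from the same data, with
no duplicated terms.

```python
from collections import defaultdict

# Hypothetical tags: (statement id, keyword, context).
# Keywords and contexts come from two separate vocabularies.
tags = [
    ("statement-1", "media", "enemy of islam"),
    ("statement-1", "Afghanistan", "area of jihad"),
    ("statement-2", "Haverford", "enemy of islam"),
]

statements_by_pair = defaultdict(set)   # lookup 1: full two-part term
contexts_by_keyword = defaultdict(set)  # lookup 2: keyword -> its contexts
keywords_by_context = defaultdict(set)  # lookup 3: context -> its keywords

for statement, keyword, context in tags:
    statements_by_pair[(keyword, context)].add(statement)
    contexts_by_keyword[keyword].add(context)
    keywords_by_context[context].add(keyword)

# Lookup 3, the hard one: which keywords are tagged "enemy of islam"?
enemies = sorted(keywords_by_context["enemy of islam"])
```

Picking one keyword from that list and looking up the pair in
statements_by_pair then gives the matching records.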

May I suggest that you check out the Drupal4lib mailing list
http://listserv.uic.edu/archives/drupal4lib.html. Drupal questions
posted there get a good audience and a quick turnaround, and, even
better, the answers serve the library Drupal community.

Thanks,

Cary

-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] complex drupal taxonomy question

2012-07-11 Thread Laurie Allen
Thanks very much, Cary. I'll check out that module and repost on drupal4lib.
Laurie