[dev-biblio] Hierarchical Keyword Tree

2006-09-29 Thread Leonard Mada

 German Schlagwort vs Stichwort
I do not know if there is an English equivalent for the two terms. I
believe English has only keywords, which are actually
Stichwoerter. Schlagwoerter would be some kind of keywords, too
(like those used in indexes), but there is no distinction between the
two, as far as I know.


What I want is an amalgam of both, and even more than that. Simple
keywords are too primitive and do not offer the desired advantages when
you want to search for something. For example, I recently searched for
the term febrile neutropenia on Pubmed and retrieved 1883 search
results. This search was not the most sensitive, though. Searching for
febrile and neutropenia yields 3500 results. Searching for fever and
neutropenia results in 3283 hits.


As the sensitivity of the search increases, the specificity drops.
Most of those documents would have been useless for me. And by the way,
febrile neutropenia is not such a common term. If you search for
something common, you would get one to two orders of magnitude more
search results.


There is definitely a need for something better, and I believe a
form of hierarchical keywords (or tags) could offer some relief, but
the subject definitely needs more thorough thought.


As I described on the wiki page: the endocarditis example (infection of
heart valves)
- in endocarditis, heart valves are most often infected (but not
exclusively):

 -- so most of the time endocarditis implies heart valves, too
 -- I may sometimes want to search more extensively for heart valves;
the option would be to:

 -- add heart valves as a keyword to every article on endocarditis;
   --- but the keyword list would very quickly become huge (because
I would have to enter other terms as well, like cardiology, various
bacteria and many more)
   --- many terms apply only to some articles, so applying
them indiscriminately will result in a severe loss of specificity for
the search:
   --- e.g.: most cases of endocarditis cause bacteremia (bacteria in
the blood), yet not all
   --- bacteremia can also cause endocarditis (i.e. be the reason for
endocarditis)
   --- however, I would add bacteremia as a keyword only when it is
specifically studied in the article (to maintain a high specificity)
   --- yet for a more general search on bacteremia, I would include
endocarditis, too, in my search protocol
   --- of course, the search could often be done without the
hierarchical tree, by manually including all the search strings in the
query, but the query would look odd and be difficult to understand (and
many users wouldn't even be able to write it correctly); you would
easily forget to include some indirect search term;
- to expand your example: Nonfiction - Guidebooks - Cooking - Asian
meals: I may want to search specifically for 'Asian meals'; another time
for Cooking (including Asian meals) and still another time !!only!! for
'Guidebooks' (excluding books on Cooking or any other specific
'Guide'-book, i.e. books on guidebooks in general). To carry this over
to endocarditis: I may want to search for endocarditis, or for infection
(including endocarditis and other infections), or for articles dealing
broadly with infections (but not with specific infections, like
endocarditis).
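
The three kinds of search described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the KeywordNode class, the search function and the sample articles are
hypothetical names invented here, not part of OOo or of the wiki
proposal. Placing endocarditis below bacteremia is also just one
possible user-defined tree. Each article carries only its most specific
keywords; the tree supplies the broader terms at search time.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordNode:
    """One keyword in a user-defined tree (hypothetical structure)."""
    name: str
    children: list = field(default_factory=list)

    def find(self, name):
        """Return the node called `name` in this subtree, or None."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def subtree_names(self):
        """Return this keyword plus every keyword below it."""
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names


def search(articles, tree, keyword, general_only=False):
    """Return the sorted titles of articles matching `keyword`.

    Default: an article matches if it carries the keyword or any
    keyword below it in the tree. With general_only=True it must carry
    the keyword itself and none of the keywords below it (the
    '!!only!! Guidebooks' case).
    """
    node = tree.find(keyword)
    below = set()
    if node is not None:
        below = node.subtree_names() - {keyword}
    hits = []
    for title, tags in articles.items():
        if general_only:
            matched = keyword in tags and not (tags & below)
        else:
            matched = bool(tags & (below | {keyword}))
        if matched:
            hits.append(title)
    return sorted(hits)


# Assumed example tree: infection > bacteremia > endocarditis.
tree = KeywordNode("infection", [
    KeywordNode("bacteremia", [KeywordNode("endocarditis")]),
])

# Assumed example articles, tagged with specific keywords only.
articles = {
    "A": {"endocarditis"},
    "B": {"bacteremia"},
    "C": {"infection"},
    "D": {"neutropenia"},
}

print(search(articles, tree, "endocarditis"))                  # ['A']
print(search(articles, tree, "bacteremia"))                    # ['A', 'B']
print(search(articles, tree, "infection", general_only=True))  # ['C']
```

Article A is never tagged with bacteremia, yet the general search on
bacteremia still finds it, because the tree and not the article holds
the relation between the two terms.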


 Stichworte are not usually stored hierarchically
- see comment on sensitivity vs specificity: adding every possible
keyword to the list would:
 -- make these lists huge,
 -- reduce the specificity, and
 -- be notoriously cumbersome, since all those keywords must be added
by hand (without forgetting any)


I believe that hierarchical keyword lists/trees could offer a very
powerful mechanism for such searches (because one would be able to
dynamically change the tree structure to best suit the particular
search).


Also, this way you do not always have to remember every keyword (tag)
that should be included in the tree (the tree is simply there; no user
would create a new, very different tree for every new search; rather,
most trees would be used for a number of searches, and a new tree would
most often be a tweak of a previous tree, not a de novo invention).


I have over 2500 articles on my PC. They are arranged hierarchically in 
subdirectories. The problem is:

- articles may belong to more than one directory (aka category)
 -- I would like to have more than one tree for my articles, but you 
can't do this on a filesystem
- I sometimes need to search more than one subdirectory from different
directory trees (this is indeed difficult to do on a file system)
- there are many other limitations, but currently it's the best method
to organise so many articles
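
To make the contrast with a filesystem concrete, here is a rough Python
sketch in which each article is stored once with a set of keywords and
any number of independent trees can be laid over the same keywords. All
names (by_topic, by_type, expand, search_all, the file names) are
made up for illustration. Combining branches from different trees with
AND is an assumption; a union would be just as easy.

```python
# Each tree maps a keyword to its direct children (assumed sample data).
by_topic = {"infection": ["endocarditis"], "endocarditis": []}
by_type = {"article": ["review", "trial"], "review": [], "trial": []}


def expand(tree, keyword):
    """Return `keyword` plus all keywords below it in `tree`."""
    found = {keyword}
    for child in tree.get(keyword, []):
        found |= expand(tree, child)
    return found


def search_all(articles, *branches):
    """Return files whose tags hit every (tree, keyword) branch given."""
    hits = []
    for name, tags in articles.items():
        if all(tags & expand(tree, kw) for tree, kw in branches):
            hits.append(name)
    return sorted(hits)


# One copy of each article, however many trees refer to its keywords.
articles = {
    "a.pdf": {"endocarditis", "review"},
    "b.pdf": {"endocarditis", "trial"},
    "c.pdf": {"infection", "review"},
}

# Everything under 'infection' in one tree AND under 'review' in the other.
print(search_all(articles, (by_topic, "infection"), (by_type, "review")))
# ['a.pdf', 'c.pdf']
```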


When you have so many articles, organizing them becomes a real
nightmare.


I believe that hierarchical keywords are a good start (!!and I do not
have any better idea right now!!). Therefore, I believe that a little
brainstorming would be quite useful.





Re: [dev-biblio] Hierarchical Keyword Tree

2006-09-28 Thread Bruce D'Arcus


On Sep 26, 2006, at 5:51 PM, Leonard Mada wrote:

I came up with another idea regarding the standardisation of keywords.
I believe that the ultimate goal is to have standard keywords, too.  
However, as this will be difficult, a possible solution is to let  
users specify their own keywords. Have a talk-back feature. Collect  
used keywords over a period of 1-2 years. And build a list with the  
most frequently used keywords. These are likely to be used more widely  
and therefore could be bundled with future versions of OOo. Of course,  
users could change this list and adapt it further to their specific  
needs, but it would be a starting point for their own list.
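
The collection step in this proposal amounts to a frequency count. A
minimal Python sketch follows, assuming the talk-back feature delivers
one keyword list per user; the function name, the normalisation (trim
and lower-case) and the sample data are assumptions made here, not
anything specified in the thread.

```python
from collections import Counter


def most_used_keywords(user_lists, top_n):
    """Return the `top_n` keywords used by the largest number of users."""
    counts = Counter()
    for keywords in user_lists:
        # Count each keyword once per user so that a single heavy user
        # cannot dominate the bundled list.
        counts.update({kw.strip().lower() for kw in keywords})
    return [kw for kw, _ in counts.most_common(top_n)]


# Assumed talk-back data from three users.
collected = [
    ["Endocarditis", "bacteremia"],
    ["endocarditis", "fever"],
    ["endocarditis ", "bacteremia", "neutropenia"],
]

print(most_used_keywords(collected, 2))  # ['endocarditis', 'bacteremia']
```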


I've suggested something like this to the Zotero developers:

http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/09/09/zotero-and-the-practical-semantic-web


But there you use other mechanisms for associating tags, rather than
users having to worry about explicitly defining a hierarchy.


BTW, Zotero ought to be going into public beta next week. I suggest
people take a close look when it does.


Bruce

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [dev-biblio] Hierarchical Keyword Tree

2006-09-27 Thread David Wilson
Leonard Mada.

You have raised some very interesting questions. I think the idea of
setting up a scheme for sharing subject-specific keyword lists is well
worth considering - and rather simple to implement.

David


On Wednesday 27 September 2006 7:51 am, Leonard Mada wrote:
 Hi,

 I made some progress regarding the keywords. Unfortunately, I believe
 that a plain keyword list won't solve many of the current problems; see
 http://wiki.services.openoffice.org/wiki/Bib-Keywords paragraph 2.2
 Limitations of Current Keyword Strategies for some reasons why basic
 keywords are far from adequate.

 I believe that a solution to this problem could lie in a hierarchical
 keyword tree. Users would be allowed to create such a keyword tree
 dynamically (using existing keywords) to enhance the capabilities of
 the search strategies. See the paragraph 3.1.2 Hierarchical Keyword
 Tree on the same page for a more extended discussion.

 Because all this is virtually new territory, I would like to open a
 brainstorming session. I would appreciate any comments and suggestions.

 I came up with another idea regarding the standardisation of keywords. I
 believe that the ultimate goal is to have standard keywords, too.
 However, as this will be difficult, a possible solution is to let users
 specify their own keywords. Have a talk-back feature. Collect used
 keywords over a period of 1-2 years. And build a list with the most
 frequently used keywords. These are likely to be used more widely and
 therefore could be bundled with future versions of OOo. Of course, users
 could change this list and adapt it further to their specific needs, but
 it would be a starting point for their own list.

 Kind regards,

 Leonard Mada
 [aka discoleo]


-- 
---
David N. Wilson
Co-Project Lead for the Bibliographic 
OpenOffice Project
http://bibliographic.openoffice.org
