Hi Eric,

If you use the debugQuery=on parameter, the response includes the explain structure, which shows the factors that go into each document's score. An example:

<str name="oai:URMST:Transformation_Service/1">
1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of:
  1.4142135 = tf(termFreq(text:chant)=2)
  6.8230457 = idf(docFreq=1, numDocs=676)
  0.15625 = fieldNorm(field=text, doc=0)
</str>

Here tf(termFreq(text:chant)=2) tells you that the queried term was found twice in the document. You can apply a regex to extract this number from the explain string. Since the term is an analyzed term, it may not equal the user's input, but the 'parsedquery' element of the debug output tells you which terms Solr actually searches for behind the scenes.

In Lucene, if the field stores term vector positions, there are API calls that give you the exact place of the term within the field (as character positions, or as the n-th token), but I don't know how to extract this information through Solr.

Hope this helps.

Király Péter
eXtensible Catalog
http://xcproject.org

----- Original Message -----
From: Eric James cirese...@hotmail.com
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Friday, October 16, 2009 9:52 PM
Subject: Re: [CODE4LIB] solr - search query count | highlighting

Thanks for your response. Yes, I'm able to use facets in general, and yes, I'm able to do highlighting on stored fields. My question is how to find out how many times the query appears in the full text. For example, say you search on Heisenberg. We'd like to see:

Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid
Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid
Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid
etc.

Could there be a Solr parameter that calculates this?

Otherwise, a kludgy, not very scalable method would be this: once you retrieve a Solr result XML, find the Fedora PID, retrieve the EAD full text, run a standard function to count how many times the query appears in the text for each hit, and add these counts back into the XML as parameters.

Date: Fri, 16 Oct 2009 15:27:42 -0400
From: ewg4x...@gmail.com
Subject: Re: [CODE4LIB] solr - search query count | highlighting
To: CODE4LIB@LISTSERV.ND.EDU

Hi Eric,

You do not have to store the entire text content of the EAD guide in order to enable facets. Here's an example: http://kittredgecollection.org/results?q=*:* . There are about 15 facets enabled on a collection of almost 1500 EAD documents (though quite small in file size compared to traditional EAD finding aids), and there's no slowdown whatsoever.

I don't believe you need to store the guides to enable highlighting either, though I have heard there is some drop-off in performance with highlighting enabled. I've never benchmarked highlighting enabled versus disabled, so I can't tell you how large the drop-off is. In an index of only several hundred documents, I would think that the drop-off with highlighting enabled would be fairly negligible.

Ethan

On Fri, Oct 16, 2009 at 3:12 PM, Eric James cirese...@hotmail.com wrote:

For our finding aids, we are using fedoragenericsearch 2.2 with Solr as the index. Because the EADs can be huge, they are indexed but not stored (with stored EADs, search time for ~500 objects is 20 min rather than 1 sec). However, we would like to show the number of search terms found within each hit. For example, CDL's collection: http://www.oac.cdlib.org/search?query=Donner

We would also like highlighting/snippets of the search term similar to CDL's. Is it a lost cause to have this functionality without storing the EAD? Is there a way to store the EAD and have a reasonable response time?

---
Eric James
Yale University Libraries
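The regex extraction suggested in the reply at the top of this thread could look roughly like the following. This is only a minimal sketch in Python, assuming the explain text has the same shape as the example quoted there; the function name term_frequencies and the pattern itself are illustrative and not part of Solr, and other Solr versions may format the explain output differently.

```python
import re

# Matches fragments such as "termFreq(text:chant)=2" in the explain text.
# The optional decimal part is an assumption, in case a count is printed as "2.0".
TERM_FREQ = re.compile(
    r"termFreq\((?P<field>[^:()]+):(?P<term>[^()]+)\)=(?P<count>\d+(?:\.\d+)?)"
)


def term_frequencies(explain):
    """Return {(field, term): count} for every termFreq entry in an explain string."""
    return {
        (m.group("field"), m.group("term")): int(float(m.group("count")))
        for m in TERM_FREQ.finditer(explain)
    }


# The explain text from the example in this thread, joined into one string.
explain = (
    "1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of: "
    "1.4142135 = tf(termFreq(text:chant)=2) "
    "6.8230457 = idf(docFreq=1, numDocs=676) "
    "0.15625 = fieldNorm(field=text, doc=0)"
)

print(term_frequencies(explain))  # {('text', 'chant'): 2}
```

Because the key is the analyzed term (here "chant"), the displayed count would need to be matched against the terms reported in 'parsedquery' rather than against the raw user input.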
Hi Jill,

The eXtensible Catalog (http://eXtensibleCatalog.org) provides similar functionality. The user interface of XC is a set of Drupal modules that run inside Drupal, which is probably the most popular PHP CMS. Our modules (called the Drupal Toolkit) can harvest metadata from OAI-PMH repositories, process the XML, and save the fields in MySQL and in Solr. We provide administrator interfaces where you can decide how to index the different fields and what kinds of facets you want to build from them, and -- still inside the admin interface -- you can create search and browse interfaces, including search forms, navigable lists, and templates for results. You can interact with your ILS for circulation data or authentication. You can mash up the results with additional data from external sources, such as tables of contents, cover images, and reviews.

The Drupal Toolkit is still in alpha; we plan to issue the first more stable release within weeks. You can see more in the eXtensible Catalog screencast: http://www.screencast.com/users/eXtensibleCatalog (the second part is about the Drupal Toolkit). You can download the software from here: http://drupal.org/project/xc.

If you have any questions, don't hesitate to contact me or the leaders of the project.

Regards,
Péter Király
http://eXtensibleCatalog.org

----- Original Message -----
From: Earles, Jill Denae jdear...@ku.edu
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Monday, February 08, 2010 5:58 PM
Subject: [CODE4LIB] faceted browsing

I would like recommendations for faceted browsing systems that include authentication and easily support multimedia content and metadata. The ability to add comments and tags to content, and to browse by tag cloud, is also desirable. My skills include ColdFusion, PHP, CakePHP, and XML/XSL. The only system I've worked with that includes faceted browsing is XTF, and I don't think it's well suited to this.

I am willing to learn a new language/technology if there is a system that includes most of what I'm looking for. Please let me know of any open-source systems you know of that might be suited to this. If you have time and interest, see the detailed description of the system below.

Thank you,
Jill Earles

Detailed description:

I am planning to build a system to manage a collection of multimedia artwork, to include audio, video, images, and text along with accompanying metadata. The system should allow for uploading the content and entering metadata, and for discovery of content via searching and faceted browsing. Ideally it will also include a couple of ways of visually representing the relationships between items (for example, a video and the images and audio files that are included in the video, and notes about the creative process). The views we've conceived of at this point include a flow view that shows relationships with arrows between items (showing chronology or a "this begat that" relationship), and a constellation view that shows all of the related items, with or without lines between them.

It needs to have security built in so that, by default, only contributing members can search and browse the contributions. Ideally, there would be an approval process so that a contributor could propose making a work public, and if all contributors involved in the work (including any components of the work, i.e. the images and audio files included in the video) give their approval, the work would be made public. The public site would also have faceted browsing, searching by all metadata that we make public, and possibly tag clouds and the ability to add tags and comments about the work.
Hi,

I would like to ask whether there is somebody I can ask for permission to use the name code4lib.hu for an unconference meetup, where Hungarian library coders could talk and pair-program in the style of a Drupal codesprint or an OCLC Mashathon.

Péter
eXtensible Catalog
Dear Jonathan and Edward,

Thank you for your kindness. I will let you know whether the initiative is successful.

Regards,
Péter

P.S. Edward: if you come to Hungary and would like some advice about nice places here, drop me a private email; maybe I can help you.

----- Original Message -----
From: Edward M. Corrado ecorr...@ecorrado.us
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, March 10, 2010 5:14 PM
Subject: Re: [CODE4LIB] code4lib.hu meetup

As Jonathan pointed out, there is nobody to ask for formal permission - just go ahead and do it. Personally, I would love to see some of these regional code4lib conferences/meetups/symposia/whatever happen around the world. Who knows, I might even show up to one :-).

Edward - who actually plans to be in Hungary for a day or two in late June on his way to Romania.

Jonathan Rochkind wrote:

There's nobody to ask for formal permission, but I think you've done the right thing by suggesting it on this listserv and seeing what the community thinks. As one member of the community, I think that's a great idea and an appropriate use of the Code4Lib name, and I expect that everyone else will think so too.

You are also welcome to use the Code4Lib wiki if it's useful for your local group/meeting. You can see that other local/regional/national Code4Lib meetups very similar to what you envision have already listed themselves on the wiki and make use of it. Look under Local / Regional Groups on http://wiki.code4lib.org/index.php/Main_Page . You are welcome to list your group on the wiki and use the wiki if you like.

Jonathan

Király Péter wrote:

Hi,

I would like to ask whether there is somebody I can ask for permission to use the name code4lib.hu for an unconference meetup, where Hungarian library coders could talk and pair-program in the style of a Drupal codesprint or an OCLC Mashathon.

Péter
eXtensible Catalog
----- Original Message -----
From: Aaron Rubinstein arubi...@library.umass.edu

I would like to see:
1. Code snippets/gists.

For the interface I can imagine something similar to http://pastebin.com/, like http://drupal.pastebin.com/41WtCpTY, maybe with library-tech related categories (UI, search, circ, admin UI, DB, XML, ...).

Péter
http://eXtensibleCatalog.org
Dear code4lib-ers,

Last week (on Wednesday afternoon) we held the first code4lib.hu workshop in Debrecen, at the University Library. The purpose of the meeting was to let library developers and power users of library information systems meet and talk to each other, so that in the future different systems can communicate over standard protocols, which is the basic condition of any mashable, shareable service.

Beforehand only 9 people had said they would be there for sure, but in the end 28 developers participated, from libraries and developer companies. The result was not a workshop for hardcore coders, but an interesting and (more importantly) productive conversation. Since the participants were not tied to any concrete project, we could discuss a somewhat 'ideal' state of the art: how to get there, and what development and library policy steps would be involved.

The discussion focused on uniform library authentication (one entry point for all Hungarian libraries) and on inter-library loan. Some important statements:

- the services should be based on standards, either international ones or, if we cannot find a proper one, a domestic (Hungarian) standard that we could form ourselves
- the authentication system provided by the National Infrastructure Agency does not fit all libraries, since even the university libraries have users who are not university citizens and therefore lack university identifiers
- bilateral agreements between libraries are a must-have for unified authentication: library A accepts the authentication system of library B and provides services to the users of library B
- the current statistical measurements are outdated and cannot reflect such shared services; but since statistics are the most important measuring tool for the owners of libraries, libraries tend not to develop shared services, because they could lose some of their resources (they would spend on things that are not reflected in the statistics...)
- inter-library loans could be initiated by the users, which would take some of the burden off the librarians. The librarians could still control the whole process, but not as the only player.

The meeting did not aim to agree on anything, so we did not create any document or manifesto, but there were some ideas about how to continue. Since then, one of the participants has bought the code4lib.hu domain and offered it for free for community use. We restarted an older listserv (at http://groups.google.com/group/ikr-fejlesztok), and we decided that we will continue the meeting in the near future with lightning talks and discussions on library standards (like NCIP, inter-library loans, etc.), and personally I hope that we could hold a mashathon-like meeting.

Final note: somebody said on the code4lib IRC that we would miss the bbq. Well, we didn't have bbq, but as I promised we had slambuc, a traditional shepherds' dish from the Debrecen area.

Thank you for your support!

Király Péter
http://eXtensibleCatalog.org
Hi!

I am glad to report that we had the first code4lib.hu codesprint yesterday. The purpose was to code together and learn something from each other. It was a 3.5-hour session at the National Széchényi Library, Budapest.

We created a script that extracts ISBN numbers and book cover images, embedded in METS records, from an OAI-PMH data provider. Hopefully this code will become part of two or three different library- or book-related services in the coming months. We discussed the technical details, the advantages, and the rights issues of uploading a local history photo collection to Flickr. Unfortunately we didn't have time to code the Flickr part.

There were only a couple of coders, but we had a good talk and made new acquaintances. (For those in #code4lib: this time we had no bbq, nor 'slambuc', but lots of biscuits and mineral water. ;-)

If - for whatever reason - you want to follow or join us, see our group page: http://groups.google.com/group/ikr-fejlesztok/

The meeting was run as a section of a workshop on the use of library 2.0 tools, held by the Library's K2 (library 2.0) task force. http://blog.konyvtar.hu/k2/

Some technical details:

- we use PHP as the common language
- for OAI-PMH harvesting we use Omeka's OAI harvester plugin
- for Flickr communication we planned to use Phlickr, a PHP library
- the OAI server we harvested runs at the University of Debrecen and is based on DSpace
- we found a bug in the Ubuntu version of PHP 5.2.10 (SimpleXMLElement has a problem with the xpath() method), but we found a workaround as well.

Regards,
Péter
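
For readers who want a feel for what such an extraction involves, here is a small sketch. It is not the script written at the codesprint (that one was in PHP and is not reproduced here). It is a Python illustration under stated assumptions: that ISBNs appear somewhere in the text of the descriptive metadata wrapped in mets:xmlData, and that cover images are listed as mets:file elements with an image MIME type and a mets:FLocat link. The sample record, its URLs, the placeholder ISBN and all function names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and METS schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mets": "http://www.loc.gov/METS/",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Loose ISBN candidates: a digit, 8-15 digits/X/hyphens, then a digit or X.
ISBN_CANDIDATE = re.compile(r"[0-9][0-9Xx-]{8,15}[0-9Xx]")


def isbns_in(text):
    """Return hyphen-free ISBN-10/13 candidates found in text (no checksum test)."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        compact = match.group().replace("-", "").upper()
        if len(compact) in (10, 13):
            found.append(compact)
    return found


def extract_isbns_and_covers(oai_xml):
    """Return one dict per OAI-PMH record with its ISBNs and image URLs."""
    root = ET.fromstring(oai_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        isbns = []
        # Assumption: ISBNs live in the text of the wrapped descriptive metadata.
        for xml_data in record.iterfind(".//mets:xmlData", NS):
            for node in xml_data.iter():
                isbns.extend(isbns_in(node.text or ""))
        covers = []
        # Assumption: cover images are mets:file entries with an image MIME type.
        for file_el in record.iterfind(".//mets:file", NS):
            if file_el.get("MIMETYPE", "").startswith("image/"):
                for flocat in file_el.iterfind("mets:FLocat", NS):
                    href = flocat.get(XLINK_HREF)
                    if href:
                        covers.append(href)
        results.append({"isbns": isbns, "covers": covers})
    return results


# Hypothetical ListRecords response with a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record>
    <header><identifier>oai:example.org:123</identifier></header>
    <metadata>
      <mets:mets xmlns:mets="http://www.loc.gov/METS/"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
        <mets:dmdSec ID="DMD1"><mets:mdWrap MDTYPE="DC"><mets:xmlData>
          <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">ISBN 978-0-00-000000-2</dc:identifier>
        </mets:xmlData></mets:mdWrap></mets:dmdSec>
        <mets:fileSec><mets:fileGrp USE="THUMBNAIL">
          <mets:file ID="F1" MIMETYPE="image/jpeg">
            <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/covers/123.jpg"/>
          </mets:file>
        </mets:fileGrp></mets:fileSec>
      </mets:mets>
    </metadata>
  </record></ListRecords>
</OAI-PMH>"""

print(extract_isbns_and_covers(SAMPLE))
```

A real harvest would also need to follow OAI-PMH resumption tokens and would probably validate the ISBN check digit; both are left out to keep the sketch short.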