Re: about flexing ranking module in lucene
Hi, http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems interesting. What's the status of this branch? Will it be included in the lucene4 release?

Hi, it's very close. There are some nocommits still in the branch right now; once these are fixed we will look at merging to trunk.

I've checked the nocommits in the similarities package, and it seems to me that there is only one that is really no-worky (the phrase df). The rest are about modifications to a few DFR models that are suboptimal, but they work nevertheless. Robert: I figured I'd take a week out for a much needed rest (not), what about getting back on this on Monday?

David

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Fwd: Final Evaluations results processed for LUCENE-2959: Implementing State of the Art Ranking for Lucene
Hi, let me chime in too. I have received the attached mail today, telling me that I have also passed the final evaluations. I would like to thank everyone for the support, especially Robert, who accepted the heavy burden of mentoring me through the project. :) There's still a lot to do though, so I guess it won't be the last you heard of me.

Thanks,
David

-- Forwarded Message --
Subject: Final Evaluations results processed for LUCENE-2959: Implementing State of the Art Ranking for Lucene
Date: 2011 August 26, Friday, 21:07:41
From: no-re...@socghop.appspotmail.com
To: nemeskey.da...@sztaki.hu
CC: nor...@apache.org, u...@apache.org, rcm...@gmail.com

Hi David Nemeskey,

We have processed the evaluation for your project named LUCENE-2959: Implementing State of the Art Ranking for Lucene with Apache Software Foundation. Congratulations, from our data it seems that you have successfully passed the Final Evaluations. Please contact your mentor to discuss the results of your evaluation and to plan your goals and development plan for the rest of the program.

Greetings,
The Google Open Source Programs Team
My GSoC project page
Hi,

I've created a page for my GSoC project in the wiki. Currently there is only a short report on Terrier's ranking architecture and a few questions on it, but I plan to publish all documents related to the project there -- analyses, design plans, benchmarks, etc. Please feel free to comment on anything you read there.

The address is http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking

I have modified the central GSoC page, SummerOfCode2011, so that the project title links to the new page.

David

On 2011 May 30, Monday 16:20:49 Apache Jenkins Server wrote:
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8516/
All tests passed
Build Log (for compile errors): [...truncated 14814 lines...]
Re: I was accepted in GSoC!!!
Hi Uwe,

do you mean one issue per GSoC proposal, or one for every logical unit in the project? If the second: Robert told me to use the flexscoring branch as a base for my project, since preliminary work has already been done in that branch. Should I open JIRA issues nevertheless?

Thanks,
David

On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote:

Hi Vinicius,

Submitting patches via JIRA is fine! We were just thinking about possibly providing some SVN to work with (as additional training), but came to the conclusion that all students should go the standard Apache Lucene way of submitting patches to JIRA issues. You can of course still use SVN / GIT locally to organize your code. At the end we just need a patch to be committed by one of the core committers.

Uwe
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
Hey Simon and all,

May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision.

Thanks,
David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:

Hey folks,

Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web-application. Lucene & Solr are amazing projects and GSoC is an incredible opportunity to join the community and push the project forward.

If you are a student and you are interested in spending some time on a great open source project while getting paid for it, you should submit your application from March 28 - April 8, 2011. There are only 3 weeks until this process starts!

Quote from the GSoC website: "We hear almost universally from our mentoring organizations that the best applications they receive are from students who took the time to interact and discuss their ideas before submitting an application, so make sure to check out each organization's Ideas list to get to know a particular open source organization better."

So if you have any ideas what Lucene & Solr should have, or if you find any of the GSoC pre-selected projects [1] interesting, please join us on dev@lucene.apache.org [2]. Since you as a student must apply for a certain project via the GSoC website [3], it's a good idea to work on it ahead of time and include the community and possible mentors as soon as possible.

Open source development here at the Apache Software Foundation happens almost exclusively in the public and I encourage you to follow this. Don't mail folks privately; please use the mailing list to get the best possible visibility and attract interested community members and push your idea forward. As always, it's the idea that counts not the person!

That said, please do not underestimate the complexity of even small GSoC projects. Don't try to rewrite Lucene or Solr! A project usually gains more from a smaller, well discussed and carefully crafted tested feature than from a half baked monster change that's too large to work with.

Once your proposal has been accepted and you begin work, you should give the community the opportunity to iterate with you. We prefer progress over perfection so don't hesitate to describe your overall vision, but when the rubber meets the road let's take it in small steps. A code patch of 20 KB is likely to be reviewed very quickly so you get fast feedback, while a patch even 60 KB in size can take very long. So try to break up your vision and the community will work with you to get things done!

On behalf of the Lucene & Solr community,

Go! join the mailing list and apply for GSoC 2011,

Simon

[1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11
[2] http://lucene.apache.org/java/docs/mailinglists.html
[3] http://www.google-melange.com
Re: GSoC
Ok, I have created a new issue, LUCENE-2959 for this project. I have uploaded the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as well.

David

On 2011 March 09, Wednesday 21:58:53 Simon Willnauer wrote:

On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll gsing...@apache.org wrote:

I think we, Lucene committers, need to identify who is willing to mentor. In my experience, it is less than 5 hours a week. Most of the work is done as part of the community. Sometimes you have to be tough and fail someone (I did last year) but most of the time, if you take the time to interview the candidates up front, it is a good experience for everyone.

count me in

I'd add it would be useful to have everyone put the lucene-gsoc-11 label on their issues too, that way we can quickly find the Lucene ones.

done on at least one ;)

simon

Also, feel free to label existing bugs.

On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:

Hey David and all others who want to contribute to GSoC,

the ASF has applied for GSoC 2011 as a mentoring organization. As an ASF project we don't need to apply directly though but we need to register our ideas now. This works like almost anything in the ASF through JIRA. All ideas should be recorded as JIRA tickets labeled with gsoc2011. Once this is done it will show up here: http://s.apache.org/gsoc2011tasks

Everybody who is interested in GSoC as a mentor or student should now read this too: http://community.apache.org/gsoc.html

Thanks,
Simon

On Thu, Feb 24, 2011 at 12:14 PM, David Nemeskey nemeskey.da...@sztaki.hu wrote:

Please find the implementation plan attached. The word "soon" gets a new meaning when power outages are taken into account. :) As before, comments are welcome.

David

On Tuesday, February 22, 2011 15:22:57 Simon Willnauer wrote:

I think that is good for now. I should get started on codeawards and wrap up our proposals. I hope I can do that this week.
simon

On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey nemeskey.da...@sztaki.hu wrote:

Hey,

I have written the proposal. Please let me know if you want more / less of certain parts. Should I upload it somewhere? Implementation plan soon to follow. Sorry for the late reply; I have been rather busy these past few weeks.

David

On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:

Hey David,

I saw that you added a tiny line to the GSoC Lucene wiki - thanks for that.

On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote:

Hi guys,

Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that, obviously).

Same here!

According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline , there's still some time until the application period. So let me use this week to finish my PhD research plan, and get back to you next week. I am not really familiar with how the program works, i.e. how detailed the application description should be, when mentorship is decided, etc. so I guess we will have a lot to talk about. :)

so from a 1ft view it works like this:

1. Write up a short proposal what your idea is about
2. make it public! and publish an implementation plan - how you would want to realize your proposal. If you don't follow that 100% in the actual impl. don't worry. It's just meant to give us an idea that you know what you are doing and where you want to go. something like a 1 A4 rough design doc.
3. give other people the chance to apply for the same suggestion (this is how it works though)
4. Let the ASF / us assign one or more possible mentors to it
5. let us apply for a slot in GSoC (those are limited for organizations)
6. get accepted
7. rock it!

(Actually, should we move this discussion private?)

no - we usually do everything in public except for discussions within the PMC that are meant to be private for legal reasons or similar things.
Let's stick to the mailing list for all communication unless you have something that should clearly not be public. This also gives other contributors a chance to help and get interested in your work!!

simon

David

Hi David,

honestly this sounds fantastic. It would be great to have someone to work with us on this issue! To date, progress is pretty slow-going (minor improvements, cleanups, additional stats here and there)... but we really need all the help we can get, especially from people who have a really good understanding of the various models.

In case you are interested, here are some references to discussions about adding more flexibility (with some prototypes etc):

http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
Re: GSoC
Hi guys,

Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that, obviously).

According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline , there's still some time until the application period. So let me use this week to finish my PhD research plan, and get back to you next week. I am not really familiar with how the program works, i.e. how detailed the application description should be, when mentorship is decided, etc. so I guess we will have a lot to talk about. :)

(Actually, should we move this discussion private?)

David

Hi David,

honestly this sounds fantastic. It would be great to have someone to work with us on this issue! To date, progress is pretty slow-going (minor improvements, cleanups, additional stats here and there)... but we really need all the help we can get, especially from people who have a really good understanding of the various models.

In case you are interested, here are some references to discussions about adding more flexibility (with some prototypes etc):

http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
https://issues.apache.org/jira/browse/LUCENE-2392

On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote:

Hi all,

I have already sent this mail to Simon Willnauer, and he suggested that I post it here for discussion.

I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest, Hungary. I am doing IR-related research, and we have considered using Lucene as our search engine. We were quite satisfied with the speed and ease of use. However, we would like to experiment with different ranking algorithms, and this is where problems arise. Lucene only supports the VSM, and unfortunately the ranking architecture seems to be tailored specifically to its needs.

I would be very much interested in revamping the ranking component as a GSoC project.
The following modifications should be doable in the allocated time frame:

- a new ranking class hierarchy, which is generic enough to allow easy implementation of new weighting schemes (at least bag-of-words ones),
- addition of state-of-the-art ranking methods, such as Okapi BM25, proximity and DFR models,
- configuration for ranking selection, with the old method as default.

I believe all users of Lucene would profit from such a project. It would provide the scientific community with an even more useful research aid, while regular users could benefit from superior ranking results.

Please let me know your opinion about this proposal.
GSoC
Hi all,

I have already sent this mail to Simon Willnauer, and he suggested that I post it here for discussion.

I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest, Hungary. I am doing IR-related research, and we have considered using Lucene as our search engine. We were quite satisfied with the speed and ease of use. However, we would like to experiment with different ranking algorithms, and this is where problems arise. Lucene only supports the VSM, and unfortunately the ranking architecture seems to be tailored specifically to its needs.

I would be very much interested in revamping the ranking component as a GSoC project. The following modifications should be doable in the allocated time frame:

- a new ranking class hierarchy, which is generic enough to allow easy implementation of new weighting schemes (at least bag-of-words ones),
- addition of state-of-the-art ranking methods, such as Okapi BM25, proximity and DFR models,
- configuration for ranking selection, with the old method as default.

I believe all users of Lucene would profit from such a project. It would provide the scientific community with an even more useful research aid, while regular users could benefit from superior ranking results.

Please let me know your opinion about this proposal.

Thank you very much,
David Nemeskey
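
For readers who want the three proposed modifications in concrete terms, the following is a minimal sketch only, not code from the proposal or from Lucene. Every name in it (TermStats, WeightingModel, SimpleTfIdf, Bm25, RankingConfig) is hypothetical. It assumes a bag-of-words model that scores one term in one document from a handful of statistics, uses a common textbook form of Okapi BM25 with the usual k1 = 1.2 and b = 0.75, and uses a simplified tf-idf formula as a stand-in for "the old method"; it is not Lucene's actual VSM scoring.

```java
/** Hypothetical per-term, per-document statistics; not a Lucene type. */
final class TermStats {
    final double tf;        // frequency of the term in the document
    final double docLen;    // length of the document in tokens
    final double avgDocLen; // average document length in the collection
    final long docFreq;     // number of documents containing the term
    final long numDocs;     // number of documents in the collection

    TermStats(double tf, double docLen, double avgDocLen, long docFreq, long numDocs) {
        this.tf = tf;
        this.docLen = docLen;
        this.avgDocLen = avgDocLen;
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }
}

/** Hypothetical root of a ranking hierarchy: one bag-of-words weighting scheme. */
interface WeightingModel {
    double score(TermStats s);
}

/** Simplified tf-idf stand-in for the existing method (an assumption, not Lucene's formula). */
final class SimpleTfIdf implements WeightingModel {
    @Override
    public double score(TermStats s) {
        double idf = 1.0 + Math.log((double) s.numDocs / (s.docFreq + 1.0));
        return Math.sqrt(s.tf) * idf / Math.sqrt(s.docLen);
    }
}

/** Okapi BM25 in a common textbook form, with a smoothed, non-negative idf. */
final class Bm25 implements WeightingModel {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of document-length normalization

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(TermStats s) {
        double idf = Math.log(1.0 + (s.numDocs - s.docFreq + 0.5) / (s.docFreq + 0.5));
        double norm = k1 * (1.0 - b + b * s.docLen / s.avgDocLen);
        return idf * s.tf * (k1 + 1.0) / (s.tf + norm);
    }
}

/** Hypothetical configuration hook: pick a model by name, old method by default. */
final class RankingConfig {
    static WeightingModel select(String name) {
        if ("bm25".equals(name)) {
            return new Bm25(1.2, 0.75);
        }
        return new SimpleTfIdf(); // unknown or missing name falls back to the default
    }
}
```

With this shape, adding another bag-of-words scheme means writing one more WeightingModel implementation and one more branch in RankingConfig.select; proximity models would need positional information that this sketch deliberately leaves out.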