Re: about flexing ranking module in lucene

2011-09-02 Thread David Nemeskey
Hi,

http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems
  interesting. What's the status of this branch? Will it be included in the
  Lucene 4 release?
 
 Hi, it's very close. There are some nocommits still in the branch right
 now; once these are fixed we will look at merging to trunk.
I've checked the nocommits in the similarities package, and it seems to me 
that there is only one that is really no-worky (the phrase df). The rest are 
about modifications to a few DFR models that are suboptimal, but they work 
nevertheless.

Robert: I figured I'd take a week out for a much needed rest (not), what about 
getting back on this on Monday?

David

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Fwd: Final Evaluations results processed for LUCENE-2959: Implementing State of the Art Ranking for Lucene

2011-08-29 Thread David Nemeskey
Hi,

let me chime in too. I have received the attached mail today, telling me that 
I have also passed the final evaluations. I would like to thank everyone for 
the support, especially Robert, who accepted the heavy burden of mentoring me 
through the project. :)

There's still a lot to do though, so I guess it won't be the last you heard of 
me.

Thanks,
David

--  Forwarded Message  --

Subject: Final Evaluations results processed for LUCENE-2959: Implementing 
State of the Art Ranking for Lucene
Date: 2011 August 26, Friday, 21:07:41
From: no-re...@socghop.appspotmail.com
To: nemeskey.da...@sztaki.hu
CC: nor...@apache.org, u...@apache.org, rcm...@gmail.com

Hi David Nemeskey,


We have processed the evaluation for your project named LUCENE-2959:  
Implementing State of the Art Ranking for Lucene with Apache Software  
Foundation.

Congratulations, from our data it seems that you have successfully passed  
the Final Evaluations. Please contact your mentor to discuss the results of  
your evaluation and to plan your goals and development plan for the rest of  
the program.


Greetings,
The Google Open Source Programs Team


-




My GSoC project page

2011-05-30 Thread David Nemeskey
Hi,

I've created a page for my GSoC project in the wiki. Currently there is only a 
short report on Terrier's ranking architecture and a few questions on it, but 
I plan to publish all documents related to the project there -- analyses, 
design plans, benchmarks, etc. Please feel free to comment on anything you 
read there.

The address is
http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking

I have modified the central GSoC page, SummerOfCode2011, so that the 
project title links to the new page.

David

On 2011 May 30, Monday 16:20:49 Apache Jenkins Server wrote:
 Build:
 https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8516/
 
 All tests passed
 
 Build Log (for compile errors):
 [...truncated 14814 lines...]
 
 
 



Re: I was accepted in GSoC!!!

2011-05-04 Thread David Nemeskey
Hi Uwe,

do you mean one issue per GSoC proposal, or one for every logical unit in 
the project?

If the second: Robert told me to use the flexscoring branch as a base for my 
project, since preliminary work has already been done in that branch. Should I 
open JIRA issues nevertheless?

Thanks,
David

On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote:
 Hi Vinicius,
 
 Submitting patches via JIRA is fine! We were just thinking about possibly
 providing some SVN to work with (as additional training), but came to the
 conclusion, that all students should go the standard Apache Lucene way of
 submitting patches to JIRA issues. You can of course still use SVN / GIT
 locally to organize your code. At the end we just need a patch to be
 committed by one of the core committers.

Uwe




Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread David Nemeskey
Hey Simon and all,

May we get an update on this? I understand that Google has published the list 
of accepted organizations, which -- not surprisingly -- includes the ASF. Is 
there any information on how many slots Apache got, and which issues will be 
selected?

The student application period opens on the 28th, so I'm just wondering if I 
should go ahead and apply or wait for the decision.

Thanks,
David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:
 Hey folks,
 
 Google Summer of Code 2011 is very close and the Project Applications
 Period has started recently. Now it's time to get some excited students
 on board for this year's GSoC.
 
 I encourage students to submit an application to the Google Summer of Code
 web-application. Lucene & Solr are amazing projects and GSoC is an
 incredible opportunity to join the community and push the project
 forward.
 
 If you are a student and you are interested in spending some time on a
 great open source project while getting paid for it, you should submit
 your application from March 28 - April 8, 2011. There are only 3
 weeks until this process starts!
 
 Quote from the GSoC website: "We hear almost universally from our
 mentoring organizations that the best applications they receive are
 from students who took the time to interact and discuss their ideas
 before submitting an application, so make sure to check out each
 organization's Ideas list to get to know a particular open source
 organization better."
 
 So if you have any ideas what Lucene & Solr should have, or if you
 find any of the GSoC pre-selected projects [1] interesting, please
 join us on dev@lucene.apache.org [2].  Since you as a student must
 apply for a certain project via the GSoC website [3], it's a good idea
 to work on it ahead of time and include the community and possible
 mentors as soon as possible.
 
 Open source development here at the Apache Software
 Foundation happens almost exclusively in the public and I encourage you to
 follow this. Don't mail folks privately; please use the mailing list to
 get the best possible visibility and attract interested community
 members and push your idea forward. As always, it's the idea that
 counts not the person!
 
 That said, please do not underestimate the complexity of even small
 GSoC - Projects. Don't try to rewrite Lucene or Solr!  A project
 usually gains more from a smaller, well discussed and carefully
 crafted & tested feature than from a half baked monster change that's
 too large to work with.
 
 Once your proposal has been accepted and you begin work, you should
 give the community the opportunity to iterate with you.  We prefer
 progress over perfection so don't hesitate to describe your overall
 vision, but when the rubber meets the road let's take it in small
 steps.  A code patch of 20 KB is likely to be reviewed very quickly, so you
 get fast feedback, while a patch even 60 KB in size can take very
 long. So try to break up your vision and the community will work with
 you to get things done!
 
 On behalf of the Lucene & Solr community,
 
 Go! join the mailing list and apply for GSoC 2011,
 
 Simon
 
 [1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11
 [2] http://lucene.apache.org/java/docs/mailinglists.html
 [3] http://www.google-melange.com
 



Re: GSoC

2011-03-10 Thread David Nemeskey
Ok, I have created a new issue, LUCENE-2959 for this project. I have uploaded 
the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as well.

David

On 2011 March 09, Wednesday 21:58:53 Simon Willnauer wrote:
 On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll gsing...@apache.org wrote:
  I think we, Lucene committers, need to identify who is willing to mentor.
 In my experience, it is less than 5 hours a week.  Most of the work
  is done as part of the community.  Sometimes you have to be tough and
  fail someone (I did last year) but most of the time, if you take the
  time to interview the candidates up front, it is a good experience for
  everyone.
 
 count me in
 
  I'd add it would be useful to have everyone put the lucene-gsoc-11 label
  on their issues too, that way we can quickly find the Lucene ones.
 
 done on at least one ;)
 
 simon
 
  Also, feel free to label existing bugs.
  
  On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:
  Hey David and all others who want to contribute to GSoC,
  
  the ASF has applied for GSoC 2011 as a mentoring organization. As an
  ASF project we don't need to apply directly though but we need to
  register our ideas now. This works like almost anything in the ASF
  through JIRA. All ideas should be recorded as JIRA tickets and labeled
  with gsoc2011. Once this is done it will show up here:
  http://s.apache.org/gsoc2011tasks
  
  Everybody who is interested in GSoC as a mentor or student should now
  read this too http://community.apache.org/gsoc.html
  
  
  Thanks,
  
  Simon
  
  
  
  
  On Thu, Feb 24, 2011 at 12:14 PM, David Nemeskey
  
  nemeskey.da...@sztaki.hu wrote:
  Please find the implementation plan attached. The word "soon" gets a
  new meaning when power outages are taken into account. :)
  
  As before, comments are welcome.
  
  David
  
  On Tuesday, February 22, 2011 15:22:57 Simon Willnauer wrote:
  I think that is good for now. I should get started on codeawards and
  wrap up our proposals. I hope I can do that this week.
  
  simon
  
  On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey
  
  nemeskey.da...@sztaki.hu wrote:
  Hey,
  
  I have written the proposal. Please let me know if you want more /
  less of certain parts. Should I upload it somewhere?
  
  Implementation plan soon to follow.
  
  Sorry for the late reply; I have been rather busy these past few
  weeks.
  
  David
  
  On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:
  Hey David,
  
  I saw that you added a tiny line to the GSoC Lucene wiki - thanks
  for that.
  
  On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey
  
  nemeskey.da...@sztaki.hu wrote:
  Hi guys,
  
  Mark, Robert, Simon: thanks for the support! I really hope we can
  work together this summer (and before that, obviously).
  
  Same here!
  
  According to
  http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline ,
  there's still some time until the application period. So let me use
  this week to finish my PhD research plan, and get back to you next
  week.
  
  I am not really familiar with how the program works, i.e. how
  detailed the application description should be, when mentorship is
  decided, etc. so I guess we will have a lot to talk about. :)
  
  so from a 1ft view it works like this:
  
  1. Write up a short proposal describing what your idea is about
  2. make it public! and publish an implementation plan - how you would
  want to realize your proposal. If you don't follow that 100% in the
  actual impl. don't worry. It's just meant to give us an idea that you
  know what you are doing and where you want to go. something like a 1
  A4 rough design doc.
  3. give other people the chance to apply for the same suggestion
  (this is how it works though)
  4. Let the ASF / us assign one or more possible mentors to it
  5. let us apply for a slot in GSoC (those are limited for
  organizations)
  6. get accepted
  7. rock it!
  
  (Actually, should we move this discussion private?)
  
  no - we usually do everything in public, except for discussions within
  the PMC that are meant to be private for legal reasons or similar
  things. Let's stick to the mailing list for all communication unless
  you have something that should clearly not be public. This also gives
  other contributors a chance to help and get interested in your
  work!!
  
  simon
  
  David
  
  Hi David, honestly this sounds fantastic.
  
  It would be great to have someone to work with us on this issue!
  
  To date, progress is pretty slow-going (minor improvements,
  cleanups, additional stats here and there)... but we really need
  all the help we can get, especially from people who have a really
  good understanding of the various models.
  
  In case you are interested, here are some references to
  discussions about adding more flexibility (with some prototypes
  etc):
  http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible

Re: GSoC

2011-02-02 Thread David Nemeskey
Hi guys,

Mark, Robert, Simon: thanks for the support! I really hope we can work 
together this summer (and before that, obviously).

According to
http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline , there's 
still some time until the application period. So let me use this week to finish 
my PhD research plan, and get back to you next week.

I am not really familiar with how the program works, i.e. how detailed the 
application description should be, when mentorship is decided, etc. so I guess 
we will have a lot to talk about. :)

(Actually, should we move this discussion private?)

David

 Hi David, honestly this sounds fantastic.
 
 It would be great to have someone to work with us on this issue!
 
 To date, progress is pretty slow-going (minor improvements, cleanups,
 additional stats here and there)... but we really need all the help we
 can get, especially from people who have a really good understanding
 of the various models.
 
 In case you are interested, here are some references to discussions
 about adding more flexibility (with some prototypes etc):
 http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
 https://issues.apache.org/jira/browse/LUCENE-2392

 On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
 
 nemeskey.da...@sztaki.hu wrote:
  Hi all,
  
  I have already sent this mail to Simon Willnauer, and he suggested that I
  post it here for discussion.
  
  I am David Nemeskey, a PhD student at the Eotvos Lorand University,
  Budapest, Hungary. I am doing IR-related research, and we have
  considered using Lucene as our search engine. We were quite satisfied
  with the speed and ease of use. However, we would like to experiment
  with different ranking algorithms, and this is where problems arise.
  Lucene only supports the VSM, and unfortunately the ranking architecture
  seems to be tailored specifically to its needs.
  
  I would be very much interested in revamping the ranking component as a
  GSoC project. The following modifications should be doable in the
  allocated time frame:
  - a new ranking class hierarchy, which is generic enough to allow easy
  implementation of new weighting schemes (at least bag-of-words ones),
  - addition of state-of-the-art ranking methods, such as Okapi BM25,
  proximity and DFR models,
  - configuration for ranking selection, with the old method as default.
  
  I believe all users of Lucene would profit from such a project. It would
  provide the scientific community with an even more useful research aid,
  while regular users could benefit from superior ranking results.
  
  Please let me know your opinion about this proposal.




GSoC

2011-01-28 Thread David Nemeskey
Hi all,

I have already sent this mail to Simon Willnauer, and he suggested that I post 
it here for discussion.

I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest, 
Hungary. I am doing IR-related research, and we have considered using 
Lucene as our search engine. We were quite satisfied with the speed and ease of 
use. However, we would like to experiment with different ranking algorithms, 
and this is where problems arise. Lucene only supports the VSM, and 
unfortunately the ranking architecture seems to be tailored specifically to its 
needs.

I would be very much interested in revamping the ranking component as a GSoC 
project. The following modifications should be doable in the allocated time 
frame:
- a new ranking class hierarchy, which is generic enough to allow easy 
implementation of new weighting schemes (at least bag-of-words ones),
- addition of state-of-the-art ranking methods, such as Okapi BM25, proximity 
and DFR models,
- configuration for ranking selection, with the old method as default.
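
For illustration, the three items above (a generic class hierarchy, BM25 as
one new model, and selection by configuration with the old method as default)
might be sketched roughly as follows. This is a minimal Python sketch, not
Lucene code: the names RankingModel, SimpleTfIdf, BM25, MODELS and make_model
are hypothetical, SimpleTfIdf is a simplified stand-in for the existing VSM
scoring rather than Lucene's actual formula, and the BM25 defaults k1=1.2 and
b=0.75 are commonly used values assumed here.

```python
import math
from abc import ABC, abstractmethod


class RankingModel(ABC):
    """Hypothetical base class: weights one query term in one document."""

    @abstractmethod
    def score(self, tf, doc_len, avg_doc_len, doc_freq, num_docs):
        """Return the weight of a term that occurs tf times in a document."""


class SimpleTfIdf(RankingModel):
    """Simplified TF-IDF weight, standing in for the 'old method'."""

    def score(self, tf, doc_len, avg_doc_len, doc_freq, num_docs):
        if tf <= 0:
            return 0.0
        idf = 1.0 + math.log(num_docs / (doc_freq + 1.0))
        # Dampen raw term frequency and penalise long documents.
        return math.sqrt(tf) * idf / math.sqrt(doc_len)


class BM25(RankingModel):
    """Okapi BM25 term weight with the usual k1 and b parameters."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1  # controls term-frequency saturation
        self.b = b    # controls document-length normalisation

    def score(self, tf, doc_len, avg_doc_len, doc_freq, num_docs):
        if tf <= 0:
            return 0.0
        # The "+ 1" inside the log keeps the idf non-negative.
        idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        length_norm = 1.0 - self.b + self.b * doc_len / avg_doc_len
        return idf * tf * (self.k1 + 1.0) / (tf + self.k1 * length_norm)


# Configuration: models are looked up by name; the old method is the default.
MODELS = {"tfidf": SimpleTfIdf, "bm25": BM25}


def make_model(name="tfidf"):
    return MODELS[name]()
```

With this shape, a new bag-of-words scheme would only need another subclass
and one more entry in MODELS; proximity models would need more than the
per-term statistics passed to score() here.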

I believe all users of Lucene would profit from such a project. It would 
provide the scientific community with an even more useful research aid, while 
regular users could benefit from superior ranking results.

Please let me know your opinion about this proposal.

Thank you very much,
David Nemeskey
