Re: GSoC 2014 mentor request

2014-03-21 Thread Tommaso Teofili
Thanks all, just subscribed to the mentors list. Regards, Tommaso 2014-03-21 10:23 GMT+01:00 Michael McCandless luc...@mikemccandless.com: ACK from Lucene PMC. I'm also CC'ing ment...@community.apache.org (Tommaso, you should subscribe if you haven't already). Thanks Tommaso! Sad to have

Re: GSoC 2014 mentor request

2014-03-21 Thread Michael McCandless
You should also subscribe to code-awards@a.o. See http://community.apache.org/gsoc.html for details ... Thanks for being a mentor! We have far too few mentors in Lucene/Solr unfortunately. Mike McCandless http://blog.mikemccandless.com On Fri, Mar 21, 2014 at 6:23 AM, Tommaso Teofili

Re: GSoC 2014 mentor request

2014-03-21 Thread Tommaso Teofili
2014-03-21 11:35 GMT+01:00 Michael McCandless luc...@mikemccandless.com: You should also subscribe to code-awards@a.o. strangely this resulted in qmail-send program replying: code-awards-subscr...@apache.org: This mailing list has moved to mentors at community.apache.org. so I guess

Re: GSoC 2014 mentor request

2014-03-21 Thread Michael McCandless
Ahh... the list must have moved. Good to know :) Mike McCandless http://blog.mikemccandless.com On Fri, Mar 21, 2014 at 7:04 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2014-03-21 11:35 GMT+01:00 Michael McCandless luc...@mikemccandless.com: You should also subscribe to

Re: GSoC

2014-03-12 Thread Michael McCandless
Hi Ivan, It's best to just add a comment onto LUCENE-466 with your ideas/questions specific to that issue; other more general questions should be sent to this dev list. Since the big part of that issue (supporting minShouldMatch in BooleanQuery) was already done, I think fixing query parsers to

Re: GSoC

2014-03-12 Thread Ivan Biggs
First, thanks so much for getting me pointed in the right direction! I assume you mean straight on Jira? Also do you have any clue where one would be able to find past proposals for Lucene? Thanks, Ivan On Wed, Mar 12, 2014 at 12:08 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi

Re: GSoC

2014-03-12 Thread Michael McCandless
Sorry, yes, please add comments/ideas straight on the Jira issue, i.e. https://issues.apache.org/jira/browse/LUCENE-466 in this case. Hmm, I'm not sure how to find past proposals. The links to these proposals, e.g. from my past blog post, and from past Jira issues, seem to be broken now. Mike

Re: GSoC 2014 on LUCENE-466: Need QueryParser support for BooleanQuery.minNrShouldMatch

2014-02-28 Thread Michael McCandless
I think a good place to start is on the issue itself. E.g. add a comment expressing that you're interested in this issue, maybe summarize roughly what's entailed. E.g., that issue is quite old, and the first part of it (supporting minShouldMatch in BQ) has already been done, so all that remains

Re: GSOC 2013

2013-03-30 Thread Michael McCandless
Thanks Adrien! Mike McCandless http://blog.mikemccandless.com On Fri, Mar 29, 2013 at 1:49 PM, Adrien Grand jpou...@gmail.com wrote: Hi, Although I probably won't be able to mentor students next summer, I think it would be great to have students this year too. I modified open JIRA issues

Re: GSoC 2013

2013-03-20 Thread Tommaso Teofili
Hello Raimon, depending on what the focus of your master thesis should be, Lucene / Solr may or may not be the right project. Basically if your sentiment analysis topic is tied to information retrieval (very dummy example: making a search engine which scores documents boosting positive ones) then it could

Re: GSoC 2013

2013-03-20 Thread Raimon Bosch
Hi Tommaso, Yes, I agree. To use Lucene in this kind of project we would need to focus on creating sentiment ranking or improve the text classification capabilities of Lucene. Integration with other might be interesting, also. Thanks, Raimon Bosch. 2013/3/20 Tommaso Teofili

Re: GSoC 2013

2013-03-19 Thread Raimon Bosch
Anyone interested? 2013/3/18 Raimon Bosch raimon.bo...@gmail.com Hi all, I would be interested in doing a Google Summer of Code this year with Lucene or Solr. My master thesis topic is about sentiment analysis; is there any research in this direction inside Solr and Lucene? If there is any

Re: [GSoC] codec not registered?

2012-04-30 Thread Robert Muir
Since your test uses PerFieldPostingsFormat, it's going to write the name of your format PForDelta into the index and expects to be able to load it via the SPI mechanism. So I think you should register your PForDeltaPostingsFormat in
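
The failure mode described here can be sketched in a language-neutral way. The snippet below is only an illustration in Python of name-based lookup, not Lucene's actual SPI code: `REGISTRY`, `register`, `for_name`, `load`, and both format classes are hypothetical names made up for this sketch. The point it tries to show is that only the format's name is stored, so reading it back fails unless an implementation has been registered under that name.

```python
# Hypothetical sketch of name-based format lookup (NOT Lucene's real API).
REGISTRY = {}


def register(name):
    """Class decorator that registers a format class under the given name."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


def for_name(name):
    """Resolve a format name (as read from an index) to a new instance."""
    try:
        return REGISTRY[name]()
    except KeyError:
        raise LookupError(
            f"format {name!r} is not registered; known: {sorted(REGISTRY)}"
        ) from None


@register("Default")
class DefaultFormat:
    pass


class PForDeltaFormat:
    """Written to the index by name, but deliberately left unregistered."""


def load(name):
    """Return the class name of the implementation found for this format name."""
    return type(for_name(name)).__name__
```

With this sketch, `load("PForDelta")` raises `LookupError` until `PForDeltaFormat` is added to the registry, which mirrors the "codec not registered" symptom in the thread subject.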

Re: [GSoC] codec not registered?

2012-04-30 Thread Han Jiang
Ah, I see. Thank you Robert! On Tue, May 1, 2012 at 2:46 AM, Robert Muir rcm...@gmail.com wrote: Since your test uses PerFieldPostingsFormat, it's going to write the name of your format PForDelta into the index and expects to be able to load it via the SPI mechanism. So I think you should

Re: GSoC 2012 - Refactoring IndexWriter (LUCENE-2026)

2012-04-05 Thread Timur Achmetow
Hi, here's my first suggestion for the refactoring steps: By now the IndexWriter (IW) class is very big, and I would try to reduce the code by delegating special functions to new components (pattern: SRP). That way the IndexWriter keeps most of its APIs and only delegates. I would try to extract the internals

Re: GSoC - Refactoring IndexWriter

2012-04-04 Thread Achmetow (Google)
Hey Simon, thx for your fast response! to begin with make sure you read this: http://wiki.apache.org/lucene-java/SummerOfCode2012 http://wiki.apache.org/lucene-java/HowToContribute Okay, i read the documentation. Yeah we have multiple test for IndexWriter (IW in short) the are all

Re: GSoC - Refactoring IndexWriter

2012-04-03 Thread Simon Willnauer
Hey Tim, great to have you! to begin with make sure you read this: http://wiki.apache.org/lucene-java/SummerOfCode2012 On Wed, Apr 4, 2012 at 12:20 AM, Achmetow (Google) achmeto...@googlemail.com wrote: Hi, I am a student from Germany and would like to contribute to the ASF Lucene project.

Re: [GSoC] About how flexible indexing works in lucene 4.0

2012-03-28 Thread Michael McCandless
On Mon, Mar 26, 2012 at 6:59 PM, Han Jiang jiangha...@gmail.com wrote: Hi all, I was trying to figure out the control flow of IndexWriter and IndexSearcher, in order to get a better understanding of the idea behind Codec implementation. However, there seem to be some questions related with

Re: [GSoC] Question about LUCENE-3892

2012-03-23 Thread Michael McCandless
Hello, One quick question up front: are you subscribed to the dev list? If not, you may have missed my response to your last email with GSoC questions: http://lucene.markmail.org/thread/lqv6lyql2nlagv7f#query:+page:1+mid:ubjsvvfviuaexqlo+state:results Answers below: On Fri, Mar 23,

Re: [GSoC]About some general information

2012-03-21 Thread Michael McCandless
Hello! Answers below...: On Wed, Mar 21, 2012 at 11:03 AM, Han Jiang jiangha...@gmail.com wrote: Hi All, I'm Billy, a senior undergraduate student in Peking University. I'm working in the area of Information Retrieval and Web Mining. When going through the idea list, I felt quite interested

Re: GSOC 2012?

2012-03-10 Thread Simon Willnauer
Mark, can you open an issue for this and label it as: gsoc2012 lucene-gsoc-12 mentor just like this one https://issues.apache.org/jira/browse/LUCENE-2562 thanks, simon On Fri, Mar 2, 2012 at 12:26 PM, mark harwood markharw...@yahoo.co.uk wrote: Does anyone have any ideas? A framework for

Re: GSOC 2012?

2012-03-02 Thread Simon Willnauer
On Fri, Mar 2, 2012 at 11:30 AM, Robert Muir rcm...@gmail.com wrote: Hello, I was asked by a student if we are participating in GSOC this year. I hope the answer is yes? If we are planning to, I think it would be good if we came up with a list on the wiki of potential tasks. Does anyone

Re: GSOC 2012?

2012-03-02 Thread mark harwood
Does anyone have any ideas? A framework for match metadata? Similar to the way tokenization was changed to allow tokenizers to enrich a stream of tokens with arbitrary attributes, Scorers could provide MatchAttributes to provide arbitrary metadata about the stream of matches they produce.

Re: GSOC 2012?

2012-03-02 Thread Simon Willnauer
I created an initial GSOC 2012 page here: http://wiki.apache.org/lucene-java/SummerOfCode2012 simon On Fri, Mar 2, 2012 at 12:26 PM, mark harwood markharw...@yahoo.co.uk wrote: Does anyone have any ideas? A framework for match metadata? Similar to the way tokenization was changed to allow

Re: GSOC 2012?

2012-03-02 Thread Robert Muir
Thanks for helping to get this started Simon and Mark! On Fri, Mar 2, 2012 at 7:10 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I created an initial GSOC 2012 page here: http://wiki.apache.org/lucene-java/SummerOfCode2012 simon On Fri, Mar 2, 2012 at 12:26 PM, mark harwood

Re: GSoC: LUCENE-2308: Separately specify a field's type

2011-05-13 Thread Nikola Tanković
2011/5/12 Michael McCandless luc...@mikemccandless.com 2011/5/9 Nikola Tanković nikola.tanko...@gmail.com: Introduction of a FieldType class that will hold all the extra properties now stored inside the Field instance other than the field value itself. Seems like this is an easy first

Re: GSoC: LUCENE-2308: Separately specify a field's type

2011-04-14 Thread Michael McCandless
2011/4/13 Nikola Tanković nikola.tanko...@gmail.com: Hi all, if everything goes well I'll be delighted to be part of this project this summer together with my assigned mentor Mike. My task will be to introduce new classes to Lucene core which will enable to separate Fields' Lucene properties

Re: GSoC Lucene proposals

2011-04-06 Thread Vinicius Barrox
Done! --- On Wed, 6/4/11, Adriano Crestani adrianocrest...@apache.org wrote: From: Adriano Crestani adrianocrest...@apache.org Subject: GSoC Lucene proposals To: dev@lucene.apache.org Date: Wednesday, April 6, 2011, 22:43 Hi students, We are receiving very good proposals this year, I

Re: GSoC 2011

2011-03-24 Thread Adriano Crestani
Hi Phillipe, You could start taking a look at these projects: LUCENE-2979 https://issues.apache.org/jira/browse/LUCENE-2979 LUCENE-2309 https://issues.apache.org/jira/browse/LUCENE-2309

Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread David Nemeskey
Hey Simon and all, May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period

Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread Simon Willnauer
On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hey Simon and all, May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots

Re: GSoC

2011-03-10 Thread David Nemeskey
Ok, I have created a new issue, LUCENE-2959 for this project. I have uploaded the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as well. David On 2011 March 09, Wednesday 21:58:53 Simon Willnauer wrote: On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll gsing...@apache.org wrote: I

Re: GSoC

2011-03-10 Thread Simon Willnauer
awesome thanks! simon On Thu, Mar 10, 2011 at 11:54 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Ok, I have created a new issue, LUCENE-2959 for this project. I have uploaded the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as well. David On 2011 March 09, Wednesday

Re: GSoC

2011-03-10 Thread Michael McCandless
On Wed, Mar 9, 2011 at 3:58 PM, Simon Willnauer simon.willna...@googlemail.com wrote: On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll gsing...@apache.org wrote: I think we, Lucene committers, need to identify who is willing to mentor. In my experience, it is less than 5 hours a week. Most

Re: GSoC

2011-03-09 Thread Grant Ingersoll
I think we, Lucene committers, need to identify who is willing to mentor. In my experience, it is less than 5 hours a week. Most of the work is done as part of the community. Sometimes you have to be tough and fail someone (I did last year) but most of the time, if you take the time to

Re: GSoC

2011-03-09 Thread Simon Willnauer
On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll gsing...@apache.org wrote: I think we, Lucene committers, need to identify who is willing to mentor. In my experience, it is less than 5 hours a week. Most of the work is done as part of the community. Sometimes you have to be tough and

Re: GSoC

2011-03-08 Thread Simon Willnauer
Hey David and all others who want to contribute to GSoC, the ASF has applied for GSoC 2011 as a mentoring organization. As an ASF project we don't need to apply directly, but we need to register our ideas now. This works like almost anything in the ASF through JIRA. All ideas should be

Re: GSoC

2011-02-22 Thread Simon Willnauer
I think that is good for now. I should get started on codeawards and wrap up our proposals. I hope I can do that this week. simon On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hey, I have written the proposal. Please let me know if you want more / less of

Re: GSoC

2011-02-22 Thread Fernando Wasylyszyn
nemeskey.da...@sztaki.hu Sent: Tuesday, February 22, 2011 11:22:57 Subject: Re: GSoC I think that is good for now. I should get started on codeawards and wrap up our proposals. I hope I can do that this week. simon On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hey

Re: GSoC

2011-02-02 Thread David Nemeskey
Hi guys, Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that, obviously). According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline, there's still some time until the application period. So let me

Re: GSoC

2011-02-02 Thread Simon Willnauer
Hey David, I saw that you added a tiny line to the GSoC Lucene wiki - thanks for that. On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hi guys, Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that,

Re: GSoC

2011-02-02 Thread Grant Ingersoll
On Feb 2, 2011, at 4:10 AM, David Nemeskey wrote: Hi guys, Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that, obviously). Sounds like a great idea. Looking forward to the proposal. According to http://www.google-

Re: GSoC

2011-01-28 Thread Mark Miller
+1 the proposal. We already have a committer digging into this area - he would make a perfect GSoC mentor! And would likely love the help. His response likely to follow... - Mark On Jan 28, 2011, at 11:32 AM, David Nemeskey wrote: Hi all, I have already sent this mail to Simon Willnauer,

Re: GSoC

2011-01-28 Thread Simon Willnauer
On Fri, Jan 28, 2011 at 5:42 PM, Mark Miller markrmil...@gmail.com wrote: +1 the proposal. We already have a committer digging into this area - he would make a perfect GSoC mentor! And would likely love the help. same here +1 - if there is mentoring needed I will be there too. Robert I

Re: GSoC

2011-01-28 Thread Robert Muir
On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hi all, I have already sent this mail to Simon Willnauer, and he suggested me to post it here for discussion. I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest, Hungary. I am doing an

Re: [GSOC] Congrats to all students

2010-04-27 Thread Richard Simon Just
Thanks guys! So happy to get it, and really excited that Mahout got 5 slots. @Robin: I'm totally up for a shared blog, was planning on blogging about it anyway. Robin Anil wrote: Congrats everyone. And a special thanks to Benson for helping us get the slots to 5 this year :) For students

Re: [GSOC] Congrats to all students

2010-04-27 Thread zhao zhendong
Thanks everyone! I am so excited to be accepted and I will do my best to finish my proposal in time. A shared blog sounds great to me. GSoC looks like a training; we are supposed to share the experience with all who are interested in the Mahout project. Cheers, Zhendong On Tue, Apr 27, 2010 at 3:22 PM,

Re: [GSOC] Congrats to all students

2010-04-27 Thread Sisir Koppaka
+1 for shared blog!

Re: [GSOC] Congrats to all students

2010-04-27 Thread Zaid Md Abdul Wahab Sheikh
Thanks. It's great to finally have the chance to be a part of Apache Mahout. Congratulations to everyone who got selected! +1 for the shared blog idea! On Tue, Apr 27, 2010 at 12:52 PM, Robin Anil robin.a...@gmail.com wrote: Congrats everyone. And a special thanks to Benson for helping us

Re: [GSOC] Congrats to all students

2010-04-26 Thread Sisir Koppaka
Thanks everyone! This is a fantastic opportunity, and I'll try to make the best of this for myself, as well as Mahout. Hopefully, we'll have a great compilation of deep learning networks within the next few releases. BTW, congrats to everyone on Mahout becoming a TLP! On Tue, Apr 27, 2010 at

Re: [GSOC] 2010 Timelines

2010-04-09 Thread Isabel Drost
Timeline including Apache internal deadlines: http://cwiki.apache.org/confluence/display/COMDEVxSITE/GSoC Mentors, please also click on the ranking link to the ranking explanation [1] for more information on how to rank student proposals. Isabel [1]

Re: [GSOC] Wiki Page Added

2010-03-31 Thread zhao zhendong
Hi Grant, Could you please give us the link of this page? Cheers, Zhendong On Wed, Mar 31, 2010 at 8:53 PM, Grant Ingersoll gsing...@apache.org wrote: I created a Wiki page on GSOC. I hope everyone considering GSOC reads it. Mentors, please add as you see fit. Would be good to get a Mahout

Re: [GSOC] Wiki Page Added

2010-03-31 Thread Grant Ingersoll
D'oh! My bad: http://cwiki.apache.org/MAHOUT/gsoc.html. It's linked from the front wiki page under community. -Grant On Mar 31, 2010, at 9:11 AM, zhao zhendong wrote: Hi Grant, Could you please give us the link of this page? Cheers, Zhendong On Wed, Mar 31, 2010 at 8:53 PM, Grant

Re: [GSOC] Wiki Page Added

2010-03-31 Thread zhao zhendong
Ha, thanks. On Wed, Mar 31, 2010 at 9:29 PM, Grant Ingersoll gsing...@apache.org wrote: D'oh! My bad: http://cwiki.apache.org/MAHOUT/gsoc.html. It's linked from the front wiki page under community. -Grant On Mar 31, 2010, at 9:11 AM, zhao zhendong wrote: Hi Grant, Could you please

Re: GSOC 2010

2010-03-31 Thread Robin Anil
Hi Tanya, MAHOUT-328 is just a general stub. There is no detailed project description other than what is given there. The idea is we let you propose to implement a clustering algorithm in Mahout. Start here http://cwiki.apache.org/MAHOUT/gsoc.html. Browse through the Wiki. Look at

Re: GSOC 2010 is here

2010-02-02 Thread Isabel Drost
On Mon Robin Anil robin.a...@gmail.com wrote: 2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks are taking in GSOC students) I guess one could easily split this one in two: a) Using UIMA (whole pipeline or just the analysers if that is possible) for data pre-processing

Re: GSOC 2010 is here

2010-02-01 Thread Isabel Drost
On Wed Robin Anil robin.a...@gmail.com wrote: Greetings! Fellow GSOC alums, administrators and dear mentors, the next edition is right here. Details are given in the link below. https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f Some

Re: GSOC 2010 is here

2010-02-01 Thread Robin Anil
Some more Wild and Wacky Ideas. Might be out of scope for GSOC, but are nice to have features for mahout. I would like to encourage all of you to put down your ideas here. 1. Data Visualization tool backed with HDFS/Hbase for inspecting clusters, Topic model etc etc - It could have many

Re : [GSOC] Code Submissions

2009-09-08 Thread deneche abdelhakim
done. --- On Tue 8.9.09, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: [GSOC] Code Submissions To: Mahout Dev List mahout-dev@lucene.apache.org Date: Tuesday, September 8, 2009, 13:09 Hi Robin, David and Deneche, You will need to submit

Re: Re : [GSOC] July 6 is mid-term evaluations

2009-07-07 Thread Ted Dunning
I filled out one for Deneche. On Tue, Jul 7, 2009 at 9:32 AM, deneche abdelhakim a_dene...@yahoo.fr wrote: The students mid-term survey is available online. I'm posting this because I almost forgot it =P --- On Wed 17.6.09, Grant Ingersoll gsing...@apache.org wrote: From:

Re: [GSOC] July 6 is mid-term evaluations

2009-07-07 Thread Isabel Drost
On Tuesday 07 July 2009 20:34:09 Ted Dunning wrote: I filled out one for Deneche. I submitted the one for Robin yesterday evening. Isabel -- QOTD: Products designed for every kind of idiot * Printed on the bottom of a Tesco tiramisu dessert: ``Do not turn upside down.''

Re: [GSOC] Thoughts about Random forests map-reduce implementation

2009-06-18 Thread Ted Dunning
Very similar, but I was talking about building trees on each split of the data (a la map reduce split). That would give many small splits and would thus give very different results from bagging because the splits would be small and contiguous rather than large and random. On Thu, Jun 18, 2009 at
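
The contrast drawn here between per-split training data and bagging can be sketched as follows. This is a minimal illustration only, not code from Mahout or from the thread: `contiguous_splits` and `bootstrap_sample` are hypothetical helper names, and the row counts are arbitrary. It only shows which row indices each tree would see under the two schemes.

```python
import random


def contiguous_splits(n_rows, n_splits):
    """Partition row indices 0..n_rows-1 into consecutive blocks (like input splits)."""
    size, extra = divmod(n_rows, n_splits)
    splits, start = [], 0
    for i in range(n_splits):
        # The first `extra` blocks get one additional row each.
        end = start + size + (1 if i < extra else 0)
        splits.append(list(range(start, end)))
        start = end
    return splits


def bootstrap_sample(n_rows, rng):
    """Bagging: draw n_rows indices with replacement from the whole dataset."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


rng = random.Random(0)
splits = contiguous_splits(10, 3)  # each tree sees one small, contiguous block
bag = bootstrap_sample(10, rng)    # each tree sees a large, random resample
```

Under the first scheme every tree is trained on a small slice of neighbouring rows; under bagging every tree draws from the whole dataset, which is why the two are expected to behave differently.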

Re: [GSOC] GSOC Start time nearing

2009-05-14 Thread Isabel Drost
On Tuesday 12 May 2009 19:50:21 Grant Ingersoll wrote: http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline May 23. Hope all of our students and mentors are ready to go. I certainly am*. Isabel * Might be a bit distracted on that exact day though: It's my birthday ;)

Re: [GSOC] Accepted Students

2009-04-23 Thread Grant Ingersoll
It's also helpful to get yourself a Wiki account and a JIRA account if you don't already have them. Small patches to the existing docs/code can also help you figure out the process On Apr 21, 2009, at 1:19 PM, Isabel Drost wrote: On Tuesday 21 April 2009 08:30:34 David Hall wrote: As

Re: [GSOC] Accepted Students

2009-04-23 Thread David Hall
Thanks everyone! -- David On Thu, Apr 23, 2009 at 12:53 PM, Grant Ingersoll gsing...@apache.org wrote: It's also helpful to get yourself a Wiki account and a JIRA account if you don't already have them.  Small patches to the existing docs/code can also help you figure out the process On

Re: [GSOC] Accepted Students

2009-04-21 Thread deneche abdelhakim
...@cs.stanford.edu wrote: From: David Hall d...@cs.stanford.edu Subject: Re: [GSOC] Accepted Students To: mahout-dev@lucene.apache.org Date: Tuesday, April 21, 2009, 8:30 On Mon, Apr 20, 2009 at 11:18 PM, deneche abdelhakim a_dene...@yahoo.fr wrote: Hi, =D I've been accepted. And I'll

Re: [GSOC] Accepted Students

2009-04-21 Thread Joe Kumar
. * know how to run an example in Hadoop, at least in pseudo-distributed: http://hadoop.apache.org/core/docs/current/quickstart.html --- On Tue 21.4.09, David Hall d...@cs.stanford.edu wrote: From: David Hall d...@cs.stanford.edu Subject: Re: [GSOC] Accepted Students To

Re: [GSOC] Accepted Students

2009-04-21 Thread Robin Anil
: From: David Hall d...@cs.stanford.edu Subject: Re: [GSOC] Accepted Students To: mahout-dev@lucene.apache.org Date: Tuesday, April 21, 2009, 8:30 On Mon, Apr 20, 2009 at 11:18 PM, deneche abdelhakim a_dene...@yahoo.fr wrote: Hi, =D I've been accepted. And I'll be working

Re: [GSOC] Accepted Students

2009-04-21 Thread Isabel Drost
On Tuesday 21 April 2009 08:30:34 David Hall wrote: As for questions, what am I supposed to be reading during this community building period? I see: * http://cwiki.apache.org/MAHOUT/howtocontribute.html * http://www.apache.org/foundation/how-it-works.html plus skimming javadocs. These are

Re: gsoc , EM or SVM?

2009-04-02 Thread Yifan Wang
Hi, I decided to go with the mixture model for EM. I have modified my proposal and submitted it on both the GSoC website and the Apache wiki. Best Regards Yifan 2009/4/1 Yifan Wang heavens...@gmail.com: I will choose Mixture Model for the EM implementation. Yifan 2009/4/1 Ted Dunning

Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Ted Dunning
I would hope that your SVD implementation would not be limited to NetFlix like problems, but would be applicable to any reasonably sparse matrix-like data. Likewise, I would expect a good SVD implementation to be useful for nearest neighbor methods or direct prediction by smoothing the history

Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Atul Kulkarni
On Wed, Apr 1, 2009 at 1:30 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would hope that your SVD implementation would not be limited to NetFlix like problems, but would be applicable to any reasonably sparse matrix-like data. Yes, of course. It would apply to any large sparse matrix

Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Atul Kulkarni
Thanks David, that helped. On Wed, Apr 1, 2009 at 1:47 AM, David Hall d...@cs.stanford.edu wrote: On Tue, Mar 31, 2009 at 11:43 PM, Atul Kulkarni atulskulka...@gmail.com wrote: questions in line. On Wed, Apr 1, 2009 at 1:27 AM, Ted Dunning ted.dunn...@gmail.com wrote: Nobody is

Re: [GSOC] Ranking Process

2009-04-01 Thread Richard Tomsett
I'm preparing an application, but haven't submitted yet as I was waiting on confirmation of my student status... as I now know that I'm going to be eligible I'll get my application in soon :) 2009/4/1 Ted Dunning ted.dunn...@gmail.com: I only see two applications for Mahout, one reasonably

Re: [GSOC] Ranking Process

2009-04-01 Thread Grant Ingersoll
Hmm, I see several in there, but they aren't all labeled w/ Mahout, so that may be why. I also expanded to see 100 at a time. -Grant On Mar 31, 2009, at 8:43 PM, Ted Dunning wrote: I only see two applications for Mahout, one reasonably strong, one much less so. Are there students out

Re: [GSOC] Ranking Process

2009-04-01 Thread Grant Ingersoll
The other thing to note, here, is that people should be aware that the ASF is only going to get a certain number of slots from Google (last year, it was somewhere in the 30-40 range, I think), which are distributed across all projects that have expressed an interest in mentoring. While

Re: [gsoc] Collaborative filtering algorithms

2009-04-01 Thread Ted Dunning
The machinery of SVD is almost always described in terms of least squares matrix approximation without mentioning the probabilistic underpinnings of why least-squares is a good idea. The connection, however, goes all the way back to Gauss' reduction of planetary position observations (this is
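
The least-squares view of SVD mentioned here can be made concrete with a small sketch. It relies on the standard result that the top singular triplet gives the best rank-1 approximation in squared error; everything else is an assumption for illustration: the function names `top_singular_triplet` and `squared_error` are hypothetical, the method is plain power iteration on A^T A (not what any particular library does), and it assumes a dense matrix whose product with the start vector is non-zero.

```python
import math


def top_singular_triplet(a, iters=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols  # arbitrary unit-length start vector
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # w = A v
        z = [sum(a[i][j] * w[i] for i in range(rows)) for j in range(cols)]  # z = A^T w
        norm = math.sqrt(sum(x * x for x in z))
        v = [x / norm for x in z]
    w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w]
    return sigma, u, v


def squared_error(a, sigma, u, v):
    """Sum of squared differences between A and the rank-1 matrix sigma * u * v^T."""
    return sum(
        (a[i][j] - sigma * u[i] * v[j]) ** 2
        for i in range(len(a))
        for j in range(len(a[0]))
    )


A = [[3.0, 0.0], [0.0, 1.0]]
sigma, u, v = top_singular_triplet(A)
err = squared_error(A, sigma, u, v)  # equals the sum of the remaining squared singular values
```

For this diagonal example the leftover squared error is just the square of the second singular value, which is the sense in which truncating the SVD is a least-squares approximation.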

Re: [GSOC] Ranking Process

2009-04-01 Thread Ted Dunning
Let me second that. When I am hiring a student without professional experience, it is almost a perfect predictor that if they have done significant work on a significant outside project they will get an interview with me and if not, they won't. Moreover, if I have a candidate at any level who

Re: gsoc , EM or SVM?

2009-04-01 Thread Grant Ingersoll
Hi Yifan, I think both are good candidates, although AIUI, SVM is a bit harder to parallelize, so maybe it would make sense to focus on EM. Of course, we don't have to be distributed, so you could propose a non-distributed SVM implementation as a first cut and then work on the

Re: gsoc , EM or SVM?

2009-04-01 Thread Ted Dunning
Yifan, EM is a highly non-specific term and covers a huge range of very different algorithms. For example, pLSI, HMM's, and mixture models can all be estimated using EM. What exactly did you mean to address with an EM implementation? On Wed, Apr 1, 2009 at 1:05 PM, Grant Ingersoll

Re: gsoc , EM or SVM?

2009-04-01 Thread Yifan Wang
I will choose Mixture Model for the EM implementation. Yifan 2009/4/1 Ted Dunning ted.dunn...@gmail.com: Yifan, EM is a highly non-specific term and covers a huge range of very different algorithms.  For example, pLSI, HMM's, and mixture models can all be estimated using EM. What exactly

Re: [GSoC] SimRank Algorithms on Mahout Proposal draft from Xuan Yang

2009-04-01 Thread Robert Burrell Donkin
On Wed, Apr 1, 2009 at 7:12 PM, Xuan Yang sailingw...@gmail.com wrote: Hello everyone,    This is my proposal draft. BTW remember http://markmail.org/message/rbwp2hf6iipc2ut3 - robert

Re: [GSoC] SimRank Algorithms on Mahout Proposal draft from Xuan Yang

2009-04-01 Thread Xuan Yang
Thanks, I have submitted it there. :) 2009/4/2 Robert Burrell Donkin robertburrelldon...@gmail.com: On Wed, Apr 1, 2009 at 7:12 PM, Xuan Yang sailingw...@gmail.com wrote: Hello everyone, This is my proposal draft. BTW remember http://markmail.org/message/rbwp2hf6iipc2ut3 - robert --

Re: [gsoc] random forests

2009-03-31 Thread deneche abdelhakim
Here is a draft of my proposal ** Title/Summary: [Apache Mahout] Implement parallel Random/Regression Forests Student: AbdelHakim Deneche Student e-mail: ... Student Major: Phd in Computer Science Student Degree: Master in Computer Science

Re: [GSOC] Ranking Process

2009-03-31 Thread Ted Dunning
I only see two applications for Mahout, one reasonably strong, one much less so. Are there students out there who still need to prepare an application? The deadline is coming up fast. 2009/3/31 Grant Ingersoll gsing...@apache.org FYI: http://wiki.apache.org/general/RankingProcess -Grant

Re: [gsoc] random forests

2009-03-31 Thread Ted Dunning
Deneche, I don't see your application on the GSOC web site. Nor on the apache wiki. Time is running out and I would hate to not see you in the program. Is it just that I can't see the application yet? On Tue, Mar 31, 2009 at 1:05 PM, deneche abdelhakim a_dene...@yahoo.fr wrote: Here is a

Re: [gsoc] random forests

2009-03-30 Thread deneche abdelhakim
in the node hard-drive, and thus must be distributed across the cluster. abdelHakim --- On Mon 30.3.09, Ted Dunning ted.dunn...@gmail.com wrote: From: Ted Dunning ted.dunn...@gmail.com Subject: Re: [gsoc] random forests To: mahout-dev@lucene.apache.org Date: Monday, March 30, 2009, 0:59 I

Re: [gsoc] random forests

2009-03-30 Thread Ted Dunning
Indeed. And those datasets exist. It is also plausible that this full data scan approach will fail when you want the forest building to take less time. It is also plausible that a full data scan approach fails to improve enough on a non-parallel implementation. This would happen if a

Re: [gsoc] random forests

2009-03-30 Thread Ted Dunning
I suggest that we all learn from the experience you are about to have on the reference implementation. And, yes, I did mean the reference implementation when I said non-parallel. Thanks for clarifying. On Mon, Mar 30, 2009 at 10:45 AM, deneche abdelhakim a_dene...@yahoo.fr wrote: What do you

Re: [gsoc] random forests

2009-03-29 Thread Ted Dunning
I have two answers for you. The first is that for any given application, the odds that the data will not fit in a single machine are small, especially if you have an out-of-core tree builder. Really, really big datasets are increasingly common, but are still a small minority of all datasets.

Re: [gsoc] random forests

2009-03-28 Thread deneche abdelhakim
you should read in . 2a . This implementation is, relatively, easy given... --- On Sat 28.3.09, deneche abdelhakim a_dene...@yahoo.fr wrote: From: deneche abdelhakim a_dene...@yahoo.fr Subject: Re: [gsoc] random forests To: mahout-dev@lucene.apache.org Date: Saturday, March 28, 2009

Re: [GSoC] SimRank algorithms on Mahout

2009-03-24 Thread Grant Ingersoll
Graph ranking strategies are something I am very much interested in and would love to see in Mahout. Please do propose. -Grant On Mar 24, 2009, at 6:00 AM, Xuan Yang wrote: Hello everyone, I am a student from Fudan University, Shanghai, China. These days I am doing some research work on

Re: [GSoC] SimRank algorithms on Mahout

2009-03-24 Thread Xuan Yang
ok~ I will do it asap~ btw, is there any advice? thanks a lot~ :) 2009/3/24 Grant Ingersoll gsing...@apache.org Graph ranking strategies are something I am very much interested in and would love to see in Mahout. Please do propose. -Grant On Mar 24, 2009, at 6:00 AM, Xuan Yang wrote:

Re: GSoC 2009-Discussion

2009-03-24 Thread deneche abdelhakim
would be interested in the first...but of course if actually the community need them both :) --- On Tue 24.3.09, Ted Dunning ted.dunn...@gmail.com wrote: From: Ted Dunning ted.dunn...@gmail.com Subject: Re: GSoC 2009-Discussion To: mahout-dev@lucene.apache.org Date: Tuesday, March 24, 2009

Re: [GSoC] SimRank algorithms on Mahout

2009-03-24 Thread Ted Dunning
Answering some of your email out of order, On Mon, Mar 23, 2009 at 10:00 PM, Xuan Yang sailingw...@gmail.com wrote: These days I am doing some research work on SimRank, which is a model measuring similarity of objects. Great. I think it would be great to solve these problems and

Re: GSoC 2009-Discussion

2009-03-23 Thread Dawid Weiss
[snip] a web crawler. By doing this, a crawler, for instance, can use the output of the classification to only follow certain links that lie on informative content parts. Is this interesting, and does it make sense to you guys? Hi Samuel. This would be of great interest for the Nutch folks, I

Re: GSoC 2009-Discussion

2009-03-23 Thread Otis Gospodnetic
Mmmm :) This would definitely be very useful to anyone dealing with web page parsing and indexing. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Samuel Louvan samuel.lou...@gmail.com To: mahout-dev@lucene.apache.org Sent: Sunday,

Re: GSOC Mentor

2009-03-20 Thread Grady Laksmono
Hi guys, I'm actually interested in your project. I haven't started my proposal yet, because I'm still working on my finals now. I'll be writing it soon and will let you guys know of any updates. But I'm generally interested in this idea: http://wiki.apache.org/general/SummerOfCode2008#lucene I had

Re: GSoC 09 project ideas...

2009-03-18 Thread Jason Rutherglen
Hi Z.S., I'll update LUCENE-1313 after LUCENE-1516 is committed. I can post the basic new patch I have for LUCENE-1313 (heavily simplified compared to the previous patches), however it will assume LUCENE-1516. The other area that will need to be addressed is standard benchmarking for different

Re: GSoC 09 project ideas...

2009-03-18 Thread Michael McCandless
I think creating a better Highlighter for Lucene, which is actively being discussed: https://issues.apache.org/jira/browse/LUCENE-1522 would make a good GSoC project, but I don't think I have time to mentor. Realtime search is currently in progress already, being tracked/iterated here:
