RE: carrot2 question too - Re: Fun with the Wikipedia
OK, thanks.

Adam

> -----Original Message-----
> From: Otis Gospodnetic
> Sent: Monday, January 31, 2005 5:51 PM
> To: Lucene Users List
> Subject: RE: carrot2 question too - Re: Fun with the Wikipedia
>
> Adam,
>
> Dawid posted some code that lets you use Carrot2 locally with Lucene, without the componentized pipeline system described on the Carrot2 site.
>
> Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: carrot2 question too - Re: Fun with the Wikipedia
Hi Adam.

Otis and David have already provided you with pointers to my previous post regarding Carrot2-Lucene integration, so just a tiny note here:

> Also, when I looked at Carrot2 the pipeline is implemented over HTTP. I wonder how efficient that is, or can it be changed, for instance for an all-local implementation?

Yes, it is possible to combine components locally. It is even demonstrated in the sample code David Spencer mentioned.

> Has Carrot2 been integrated with Lucene, has it been used as the basis for a recommender system (could it be?)?

I don't know... I guess it could, but you'd have to play with the source code and modify it a bit to get the required functionality. I can't really say anything more specific because I'm not deep into that subject.

D.
Re: carrot2 question too - Re: Fun with the Wikipedia
Otis Gospodnetic wrote:

> Adam,
>
> Dawid posted some code that lets you use Carrot2 locally with Lucene,

See the embedded zip URL here for the carrot2/lucene code - it may also be in the carrot2 CVS tree - this is what I used as the basis for the wikipedia/cluster stuff:

http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html

> without the componentized pipeline system described on the Carrot2 site.
>
> Otis
RE: carrot2 question too - Re: Fun with the Wikipedia
Adam,

Dawid posted some code that lets you use Carrot2 locally with Lucene, without the componentized pipeline system described on the Carrot2 site.

Otis

--- Adam Saltiel wrote:
> David, Hi,
>
> Would you be able to comment on the coincidentally recent thread "RE: -> Grouping Search Results by Clustering Snippets:"?
>
> Also, when I looked at Carrot2 the pipeline is implemented over HTTP. I wonder how efficient that is, or can it be changed, for instance for an all-local implementation?
>
> Has Carrot2 been integrated with Lucene, has it been used as the basis for a recommender system (could it be?)?
>
> TIA.
>
> Adam
RE: carrot2 question too - Re: Fun with the Wikipedia
David, Hi,

Would you be able to comment on the coincidentally recent thread "RE: -> Grouping Search Results by Clustering Snippets:"?

Also, when I looked at Carrot2 the pipeline is implemented over HTTP. I wonder how efficient that is, or can it be changed, for instance for an all-local implementation?

Has Carrot2 been integrated with Lucene, has it been used as the basis for a recommender system (could it be?)?

TIA.

Adam

> -----Original Message-----
> From: Dawid Weiss
> Sent: Monday, January 31, 2005 4:12 PM
> To: Lucene Users List
> Subject: Re: carrot2 question too - Re: Fun with the Wikipedia
>
> Hi.
>
> Coming up with answers... a little belated, but hope you're still on:
>
> > we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release, not even an alpha one, and the dependencies seemed to be patched (jama)
>
> Yes, there is no "official" release. We just don't feel the need to tag the sources with an official label because Carrot is not a stand-alone product (rather a library... or a framework). It does not imply that the project is in alpha stage... quite the contrary, in fact -- it has been out there for a while and it seems to do a good job for most people.
>
> > are there any intentions to have any releases in the near future?
>
> I could tag a release even today if it makes you happy ;) But I hope I made the status of the project clear above.
>
> D.
Re: carrot2 question too - Re: Fun with the Wikipedia
Hi.

Coming up with answers... a little belated, but hope you're still on:

> we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release, not even an alpha one, and the dependencies seemed to be patched (jama)

Yes, there is no "official" release. We just don't feel the need to tag the sources with an official label because Carrot is not a stand-alone product (rather a library... or a framework). It does not imply that the project is in alpha stage... quite the contrary, in fact -- it has been out there for a while and it seems to do a good job for most people.

> are there any intentions to have any releases in the near future?

I could tag a release even today if it makes you happy ;) But I hope I made the status of the project clear above.

D.
RE: carrot2 question too - Re: Fun with the Wikipedia
Strangely enough this subject is being taken up in the "RE: -> Grouping Search Results by Clustering Snippets:" thread.

Adam

> -----Original Message-----
> From: Owen Densmore
> Sent: Friday, January 28, 2005 4:57 PM
> To: lucene-user@jakarta.apache.org
> Cc: Owen Densmore
> Subject: Re: carrot2 question too - Re: Fun with the Wikipedia
>
> I looked at the Carrot2 docs which mentioned dimension reduction via singular value decomposition (SVD) .. and other forms too I think.
>
> Question: Does anyone have pointers to successful clustering techniques used with lucene? I'm particularly interested in 2D and 3D graphics as well, possibly SOM (Self Organizing Maps).
>
> I'm hoping to combine lucene with a graphical auto-clustering stunt of some kind but am not sure how to do it yet.
>
> Owen
Re: carrot2 question too - Re: Fun with the Wikipedia
I looked at the Carrot2 docs which mentioned dimension reduction via singular value decomposition (SVD) .. and other forms too I think.

Question: Does anyone have pointers to successful clustering techniques used with lucene? I'm particularly interested in 2D and 3D graphics as well, possibly SOM (Self Organizing Maps).

I'm hoping to combine lucene with a graphical auto-clustering stunt of some kind but am not sure how to do it yet.

Owen

> From: Akmal Sarhan
> Date: January 28, 2005 8:19:03 AM MST
> To: Lucene Users List
> Subject: Re: carrot2 question too - Re: Fun with the Wikipedia
>
> Hello,
>
> we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release, not even an alpha one, and the dependencies seemed to be patched (jama)
>
> are there any intentions to have any releases in the near future?
>
> thanks
>
> Akmal
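As a concrete reference point for the clustering question: elsewhere in this thread the clustering input is described as the output of a query (titles, URLs and snippets), grouped into "sensible groups". The sketch below only illustrates that input/output shape. It is not Carrot2's algorithm and uses no Carrot2 or Lucene API; it is a deliberately naive stand-in that groups hits under any term they share. The names `SnippetGrouping`, `Hit` and `groupBySharedTerm`, the example URLs, and the four-character minimum term length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SnippetGrouping {

    /** Hypothetical holder for one search result: title, URL and snippet. */
    static final class Hit {
        final String title;
        final String url;
        final String snippet;

        Hit(String title, String url, String snippet) {
            this.title = title;
            this.url = url;
            this.snippet = snippet;
        }
    }

    /**
     * Naive grouping (NOT Carrot2's method): every term of four or more
     * characters that occurs in at least minHits distinct hits becomes a
     * group label, mapped to the URLs of the hits that contain it.
     */
    static Map<String, List<String>> groupBySharedTerm(List<Hit> hits, int minHits) {
        Map<String, Set<String>> urlsByTerm = new TreeMap<>();
        for (Hit hit : hits) {
            String text = (hit.title + " " + hit.snippet).toLowerCase(Locale.ROOT);
            for (String term : text.split("[^a-z0-9]+")) {
                if (term.length() < 4) {
                    continue; // skip empty tokens and very short words
                }
                urlsByTerm.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(hit.url);
            }
        }
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : urlsByTerm.entrySet()) {
            if (entry.getValue().size() >= minHits) {
                groups.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Apache Lucene", "http://example.org/a", "Lucene is a search library"));
        hits.add(new Hit("Clustering results", "http://example.org/b", "Grouping search results into clusters"));
        hits.add(new Hit("Carrot", "http://example.org/c", "A root vegetable"));
        // prints {search=[http://example.org/a, http://example.org/b]}
        System.out.println(groupBySharedTerm(hits, 2));
    }
}
```

A real clusterer would at least add stemming and stop-word handling and would choose labels far more carefully; the point here is only that the input is a small list of short texts, which is why snippet generation speed matters as much as the clustering step itself.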
Re: carrot2 question too - Re: Fun with the Wikipedia
Hello,

we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release, not even an alpha one, and the dependencies seemed to be patched (jama)

are there any intentions to have any releases in the near future?

thanks

Akmal

On Monday, 17.01.2005, at 10:15 +0100, Dawid Weiss wrote:
> Hi David,
>
> I apologize for the delay in answering this one, Lucene is a busy mailing list and I had a hectic last week... Again, sorry for the belated answer, hope you still find it useful.
>
> >> That is awesome and very inspirational!
>
> Yes, I admit what you've done with Wikipedia is quite interesting and looks very good. I'm also glad you spent some time working out Carrot integration with Lucene. It works quite nicely.
>
> >> Carrot2 looks very interesting. Wondering if anybody has a list of all the
> >
> > Technically I don't think carrot2 uses lucene per se - it's just that you can integrate the two, and ditto for Nutch - it has code that uses Carrot2.
>
> Yes, this is true. Carrot2 doesn't use all of Lucene's potential -- it merely takes the output from a query (titles, urls and snippets) and attempts to cluster them into some sensible groups. I think many things could be improved, the most important of them is fast snippet retrieval from Lucene because right now it takes 50% of the time of the clustering; I've seen a post a while ago describing a faster snippet generation technique, I'm sure that would give clustering a huge boost speed-wise.
>
> > And here's my question. I reread the Carrot2<->Lucene code, esp Demo.java, and there's this fragment:
> >
> >     // warm-up round (stemmer tables must be read etc).
> >     List clusters = clusterer.clusterHits(docs);
> >
> >     long clusteringStartTime = System.currentTimeMillis();
> >     clusters = clusterer.clusterHits(docs);
> >     long clusteringEndTime = System.currentTimeMillis();
> >
> > Thus it calls clusterHits() twice.
> >
> > I don't really understand how to use Carrot2 - but I think the above is just for the sake of benchmarking clusterHits() w/o the effect of 1-time initialization - and that there's no benefit of repeatedly calling clusterHits (where a benefit might be that it can find nested clusters or whatever) - is that right (that there's no benefit)?
>
> No, there is absolutely no benefit from it. It was merely to show people that the clustering needs to be warmed up a bit. I should not have put it in the code knowing people would be confused by it. You can safely use clusterHits just once. It will just have a small delay at the first invocation.
>
> Thanks for experimenting. Please BCC me if you have any urgent projects -- I read Lucene's list in batches, but I try to keep up to date with my personal e-mail.
>
> Dawid

--
Akmal Sarhan
ByteAction GmbH
Re: carrot2 question too - Re: Fun with the Wikipedia
Dawid Weiss wrote:

> Hi David,
>
> I apologize for the delay in answering this one, Lucene is a busy mailing list and I had a hectic last week... Again, sorry for the belated answer, hope you still find it useful.

Oh no problem, and yes carrot2 is useful and fun. It's a rich package so it takes a while to understand all that it can do.

>>> That is awesome and very inspirational!
>
> Yes, I admit what you've done with Wikipedia is quite interesting and looks very good. I'm also glad you spent some time working out Carrot integration with Lucene. It works quite nicely.

Thanks but I just took code that I think you wrote(!) and made minor mods to it - here's one link:

http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html

I'd like to do more w/ Carrot2 - that's where things get harder.

>>> Carrot2 looks very interesting. Wondering if anybody has a list of all the
>>
>> Technically I don't think carrot2 uses lucene per se - it's just that you can integrate the two, and ditto for Nutch - it has code that uses Carrot2.
>
> Yes, this is true. Carrot2 doesn't use all of Lucene's potential -- it merely takes the output from a query (titles, urls and snippets) and attempts to cluster them into some sensible groups. I think many things could be improved, the most important of them is fast snippet retrieval from Lucene because right now it takes 50% of the time of the clustering; I've seen a post a while ago describing a faster snippet generation technique, I'm sure that would give clustering a huge boost speed-wise.
>
>> And here's my question. I reread the Carrot2<->Lucene code, esp Demo.java, and there's this fragment:
>>
>>     // warm-up round (stemmer tables must be read etc).
>>     List clusters = clusterer.clusterHits(docs);
>>
>>     long clusteringStartTime = System.currentTimeMillis();
>>     clusters = clusterer.clusterHits(docs);
>>     long clusteringEndTime = System.currentTimeMillis();
>>
>> Thus it calls clusterHits() twice.
>>
>> I don't really understand how to use Carrot2 - but I think the above is just for the sake of benchmarking clusterHits() w/o the effect of 1-time initialization - and that there's no benefit of repeatedly calling clusterHits (where a benefit might be that it can find nested clusters or whatever) - is that right (that there's no benefit)?
>
> No, there is absolutely no benefit from it. It was merely to show people that the clustering needs to be warmed up a bit. I should not have put it in the code knowing people would be confused by it. You can safely use clusterHits just once. It will just have a small delay at the first invocation.
>
> Thanks for experimenting. Please BCC me if you have any urgent projects -- I read Lucene's list in batches, but I try to keep up to date with my personal e-mail.
>
> Dawid

thx,
 Dave
Re: carrot2 question too - Re: Fun with the Wikipedia
Hi David,

I apologize for the delay in answering this one, Lucene is a busy mailing list and I had a hectic last week... Again, sorry for the belated answer, hope you still find it useful.

>> That is awesome and very inspirational!

Yes, I admit what you've done with Wikipedia is quite interesting and looks very good. I'm also glad you spent some time working out Carrot integration with Lucene. It works quite nicely.

>> Carrot2 looks very interesting. Wondering if anybody has a list of all the
>
> Technically I don't think carrot2 uses lucene per se - it's just that you can integrate the two, and ditto for Nutch - it has code that uses Carrot2.

Yes, this is true. Carrot2 doesn't use all of Lucene's potential -- it merely takes the output from a query (titles, urls and snippets) and attempts to cluster them into some sensible groups. I think many things could be improved, the most important of them is fast snippet retrieval from Lucene because right now it takes 50% of the time of the clustering; I've seen a post a while ago describing a faster snippet generation technique, I'm sure that would give clustering a huge boost speed-wise.

> And here's my question. I reread the Carrot2<->Lucene code, esp Demo.java, and there's this fragment:
>
>     // warm-up round (stemmer tables must be read etc).
>     List clusters = clusterer.clusterHits(docs);
>
>     long clusteringStartTime = System.currentTimeMillis();
>     clusters = clusterer.clusterHits(docs);
>     long clusteringEndTime = System.currentTimeMillis();
>
> Thus it calls clusterHits() twice.
>
> I don't really understand how to use Carrot2 - but I think the above is just for the sake of benchmarking clusterHits() w/o the effect of 1-time initialization - and that there's no benefit of repeatedly calling clusterHits (where a benefit might be that it can find nested clusters or whatever) - is that right (that there's no benefit)?

No, there is absolutely no benefit from it. It was merely to show people that the clustering needs to be warmed up a bit. I should not have put it in the code knowing people would be confused by it. You can safely use clusterHits just once. It will just have a small delay at the first invocation.

Thanks for experimenting. Please BCC me if you have any urgent projects -- I read Lucene's list in batches, but I try to keep up to date with my personal e-mail.

Dawid
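For readers who want to reproduce the measurement pattern discussed above without the Carrot2 sample code at hand, here is a minimal sketch of "warm up once, then time a second call". It does not call Carrot2: `WarmupTiming` and `timeAfterWarmup` are hypothetical names, and the `Runnable` stands in for a call such as `clusterer.clusterHits(docs)`. It uses `System.nanoTime()` instead of the `System.currentTimeMillis()` seen in the Demo.java fragment, since `nanoTime` is intended for measuring elapsed time; that substitution is a choice made here, not something taken from the original code.

```java
final class WarmupTiming {

    /**
     * Runs the task once untimed so that one-time initialisation is paid
     * up front, then runs it again and returns the elapsed milliseconds
     * of that second run only.
     */
    static long timeAfterWarmup(Runnable task) {
        task.run(); // warm-up round, deliberately not measured

        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in workload: the first call simulates slow one-time setup
        // (for example reading stemmer tables); later calls are cheap.
        final boolean[] initialised = {false};
        Runnable work = () -> {
            if (!initialised[0]) {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                initialised[0] = true;
            }
        };

        long steadyState = timeAfterWarmup(work);
        System.out.println("steady-state run took " + steadyState + " ms");
    }
}
```

In application code, as opposed to a benchmark, the warm-up call is unnecessary: a single call returns the same result and only the first invocation pays the initialisation delay.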