Re: ANN search current state

2020-07-17 Thread Tommaso Teofili
Would it make sense to create a separate Lucene module for ANN search? We could then experiment with the different approaches and compare them across the same benchmarks. On Thu, 16 Jul 2020 at 23:14, Ali Akhtar wrote: > I’m a bit of a layman in this area, but if we are talking about formats

Re: Optimizing a boolean query for 100s of term clauses

2020-06-25 Thread Tommaso Teofili
Hi Alex, I worked on a similar problem directly on Lucene (within the Anserini toolkit) using LSH fingerprints of tokenized feature vector values. You can find the code at [1], some information on the Anserini documentation page [2], and more in a short preprint [3]. As a side note my current thinking
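To make "LSH fingerprints of tokenized feature vector values" a little more concrete, here is a minimal plain-Java sketch. It is not the Anserini code and it does not use any Lucene API. The class name `LexicalLshSketch`, the token format `<dimension>_<rounded value>`, and the choice of MinHash as the LSH scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and token format are illustrative assumptions.
public class LexicalLshSketch {

    /** Turns each feature into a token "<dimension>_<value rounded to 1 decimal>". */
    public static List<String> tokenize(double[] vector) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < vector.length; i++) {
            tokens.add(i + "_" + Math.round(vector[i] * 10) / 10.0);
        }
        return tokens;
    }

    /** MinHash fingerprint: for each of numHashes hash functions, keep the minimum hash over all tokens. */
    public static int[] fingerprint(List<String> tokens, int numHashes) {
        int[] mins = new int[numHashes];
        Arrays.fill(mins, Integer.MAX_VALUE);
        for (String token : tokens) {
            int base = token.hashCode();
            for (int h = 0; h < numHashes; h++) {
                // Derive the h-th hash by mixing the base hash with a per-function constant.
                int mixed = mix(base ^ (h * 0x9E3779B9));
                if (mixed < mins[h]) {
                    mins[h] = mixed;
                }
            }
        }
        return mins;
    }

    /** Fraction of fingerprint slots that agree; a rough estimate of token-set overlap. */
    public static double similarity(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    /** Integer bit mixer (murmur3-style finalizer) used to spread hash values. */
    static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85EBCA6B;
        x ^= x >>> 13;
        x *= 0xC2B2AE35;
        x ^= x >>> 16;
        return x;
    }

    public static void main(String[] args) {
        int[] a = fingerprint(tokenize(new double[] {0.31, 0.87, 0.12, 0.55}), 16);
        int[] b = fingerprint(tokenize(new double[] {0.29, 0.91, 0.12, 0.05}), 16);
        System.out.println("estimated overlap: " + similarity(a, b));
    }
}
```

In a search setting, the fingerprint values could be indexed as terms so that a query vector's fingerprint retrieves candidates sharing at least one slot; that indexing step is left out here.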

Re: [VOTE] Lucene logo contest

2020-06-17 Thread Tommaso Teofili
PMC vote: option C (current) On Wed, 17 Jun 2020 at 07:58, Ignacio Vera Sequeiros wrote: > PMC vote: option A > > On Wed, Jun 17, 2020 at 7:36 AM Jeroen Lauwers > wrote: > > > A. Definitely. > > > > Sent from my phone > > > > > On 17 Jun 2020 at 03:46, Jason Gerlowski > >

Re: German decompounding/tokenization with Lucene?

2017-09-16 Thread Tommaso Teofili
+1, some time ago I also used the decompounder mentioned by Dawid and was satisfied back then. Regards, Tommaso On Sat, 16 Sep 2017 at 09:29, Dawid Weiss wrote: > Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel > Naber. The

Re: Using POS payloads for chunking

2017-06-14 Thread Tommaso Teofili
I think it'd be interesting to also investigate using TypeAttribute [1] together with TypeTokenFilter [2]. Regards, Tommaso [1] : https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/analysis/tokenattributes/TypeAttribute.html [2] :

Re: Possible to cause documents to be contiguous after forceMerge?

2016-11-16 Thread Tommaso Teofili
Improved locality of "near" documents could be used to avoid loading some segments during the retrieval phase for certain use cases (e.g. spatial search). On Wed, 16 Nov 2016 at 09:45, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

Re: POS tagging in Lucene

2016-10-19 Thread Tommaso Teofili
I think it might be helpful to handle POS tags as TypeAttributes so that the input and output texts would be cleaner and you can still filter and retrieve tokens by type (e.g. with TypeTokenFilter). My 2 cents, Tommaso On Wed, 19 Oct 2016 at 11:56, Niki Pavlopoulou
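The idea of keeping the POS tag in a token's type rather than in its text can be sketched in plain Java as follows. This only models the concept and does not use Lucene's `TypeAttribute` or `TypeTokenFilter` classes; `PosTypeFilterSketch`, `TypedToken`, and `keepTypes` are hypothetical names, and the tags are example Penn Treebank labels.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: a plain-Java model of "POS tag as token type", not Lucene code.
public class PosTypeFilterSketch {

    /** A token whose POS tag lives in a separate type slot, so the term text stays clean. */
    public static final class TypedToken {
        final String term;
        final String type;

        public TypedToken(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /** Keeps only the terms whose type is in the allowed set (whitelist-style filtering). */
    public static List<String> keepTypes(List<TypedToken> tokens, Set<String> allowed) {
        return tokens.stream()
                .filter(t -> allowed.contains(t.type))
                .map(t -> t.term)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = List.of(
                new TypedToken("the", "DT"),
                new TypedToken("quick", "JJ"),
                new TypedToken("fox", "NN"),
                new TypedToken("jumps", "VBZ"));
        // Nouns and verbs only; the terms carry no tag suffix such as "fox_NN".
        System.out.println(keepTypes(tokens, Set.of("NN", "VBZ"))); // prints [fox, jumps]
    }
}
```

Because the tag never enters the term text, the same token stream can be indexed as-is or filtered by type without any string parsing.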

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-08 Thread Tommaso Teofili
see simple one first. :-) Why don't we consider adding Analyzer parameter to assignClass()? koji (14/03/07 17:18), Tommaso Teofili wrote: cool Koji, thanks a lot for sharing. Some useful points / suggestions come out of it, let's see if we can follow up :) Regards, Tommaso 2014-03

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-07 Thread Tommaso Teofili
cool Koji, thanks a lot for sharing. Some useful points / suggestions come out of it, let's see if we can follow up :) Regards, Tommaso 2014-03-07 3:30 GMT+01:00 Koji Sekiguchi k...@r.email.ne.jp: Hello, I just posted an article on Comparing Document Classification Functions of Lucene and

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-29 Thread Tommaso Teofili
2013/5/29 Koji Sekiguchi k...@r.email.ne.jp Hi Rajesh, Thanks! I'm planning to open an NLP tool kit for Lucene, and the tool kit will include the following synonym library. sounds nice, looking forward to it. Tommaso koji (13/05/28 14:12), Rajesh Nikam wrote: Hello Koji, This

Re: Reg Lucene Naive Bayesian classifier.

2013-01-15 Thread Tommaso Teofili
2013/1/15 VIGNESH S vigneshkln...@gmail.com Hi All, Thanks for your replies.. Actually I am trying to classify the email mail data in to categories and also spam mails .. I have tried clustering but it is not useful since we can not control categories. I am looking for a light weight

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Tommaso Teofili
Hi, you can have a look at the (early stage) Lucene classification module on trunk [1], see also a brief introduction given at last ApacheCon EU [2]. Hope this helps, Tommaso [1] : http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/ [2] :

Re: ANN: UweSays Query Operator

2012-11-20 Thread Tommaso Teofili
That's nice! Tommaso 2012/11/19 Uwe Schindler u...@thetaphi.de Lol! Many thanks for this support! Uwes Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Quick announcement for Uwe Friends. UweSays is now a super-duper-special query operator over on

Re: Lucene index on NFS

2012-10-02 Thread Tommaso Teofili
Ok, that saves you from the concurrency issue, but in my experience it is just much slower than a local file system, so NFS can still be used, but with some tradeoff in performance. My 2 cents, Tommaso 2012/10/2 Jong Kim jong.luc...@gmail.com The setup is I have a home-grown server process that has

Re: Custom Payload Analyzer and Query

2012-02-07 Thread Tommaso Teofili
2012/2/6 Ian Lea ian@gmail.com Not sure if you got an answer to this or not. Don't recall seeing one and gmail threading says not. Is the use of payloads I've described appropriate? Sounds OK to me, although I'm not sure why you can't store the metadata as a Document Field. Can I

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Tommaso Teofili
[X] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project)