Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi, I googled it but could not find the jars for these classes. Can someone tell me where to get the jars? import org.apache.lucene.corpus.stats.IDFCalc; import org.apache.lucene.corpus.stats.TFIDFPriorityQueue; import org.apache.lucene.corpus.stats.TermIDF; Thanks On Thu, Feb 12, 2015 at 11:01 PM,

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
Based on reading the same comments you read, I'm pretty doubtful that Codec.getDefault() is going to work. It seems to me that this situation renders the FilterCodec a bit hard to use, at least given the 'every release deprecates a codec' sort of pattern. On Thu, Feb 12, 2015 at 3:20 AM, Uwe

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
Robert, Let me lay out the scenario. Hardware has .5T of Index is relatively small. Application profiling shows a significant amount of time spent codec-ing. Options as I see them: 1. Use DPF complete with the irritation of having to have this spurious codec name in the on-disk format that has

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
WHOOPS. First sentence was, until just before I clicked 'send', Hardware has .5T of RAM. Index is relatively small (20g) ... On Thu, Feb 12, 2015 at 4:51 PM, Benson Margulies ben...@basistech.com wrote: Robert, Let me lay out the scenario. Hardware has .5T of Index is relatively small.

Re: occurrence of two terms with the highest frequency

2015-02-12 Thread Ian Lea
I think you can do it with 4 simple queries: 1) +flying +shooting 2) +flying +fighting etc. or BooleanQuery equivalents with MUST clauses. Use org.apache.lucene.search.TotalHitCountCollector and it should be blazingly fast, even if you have more than 100 docs. -- Ian. On Thu, Feb 12, 2015 at 5:42 PM,
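
To make concrete what each "+a +b" query counts, here is a minimal plain-Java sketch. It does not use Lucene: the class name CoOccurrenceSketch, its methods, and the crude tokenizer are all hypothetical and stand in for an index and analyzer. In Lucene itself the count would come from running a two-MUST-clause query with a hit-counting collector, as suggested above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: in-memory documents instead of a Lucene index.
class CoOccurrenceSketch {

    // Lowercase the text and split on anything that is not a letter;
    // a stand-in for whatever analyzer the real index uses.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
    }

    // Number of documents containing both terms, i.e. what "+a +b" would match.
    static int countBoth(String[] docs, String a, String b) {
        int hits = 0;
        for (String doc : docs) {
            Set<String> t = tokens(doc);
            if (t.contains(a) && t.contains(b)) {
                hits++;
            }
        }
        return hits;
    }

    // Try every pair of terms and return the pair with the highest count as "a+b".
    static String bestPair(String[] docs, String[] terms) {
        String best = null;
        int bestCount = -1;
        for (int i = 0; i < terms.length; i++) {
            for (int j = i + 1; j < terms.length; j++) {
                int c = countBoth(docs, terms[i], terms[j]);
                if (c > bestCount) {
                    bestCount = c;
                    best = terms[i] + "+" + terms[j];
                }
            }
        }
        return best;
    }
}
```

Each call to countBoth corresponds to one of the simple queries; only the hit count is needed, never the documents themselves, which is why a counting collector is enough.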

RE: A codec moment or pickle

2015-02-12 Thread Uwe Schindler
Hi, How about Codec.getDefault()? It does indeed not necessarily return the newest one (if somebody changes the default using Codec.setDefault()), but for your use case wrapping the current default one, it should be fine? I have not tried this yet, but there might be a chicken-egg problem: -

Re: A codec moment or pickle

2015-02-12 Thread Robert Muir
Honestly i dont agree. I don't know what you are trying to do, but if you want file format backwards compat working, then you need a different FilterCodec to match each lucene codec. Otherwise your codec is broken from a back compat standpoint. Wrapping the latest is an antipattern here. On

Re: A codec moment or pickle

2015-02-12 Thread Robert Muir
On Thu, Feb 12, 2015 at 8:51 AM, Benson Margulies ben...@basistech.com wrote: On Thu, Feb 12, 2015 at 8:43 AM, Robert Muir rcm...@gmail.com wrote: Honestly i dont agree. I don't know what you are trying to do, but if you want file format backwards compat working, then you need a different

RE: A codec moment or pickle

2015-02-12 Thread Uwe Schindler
Hi, FYI, this is the same issue that Locales have/had in ICU! If you try to render an error message in Locale's constructors, this breaks with NPE - because default Locale is not yet there... I think they implemented some fallback that is guaranteed to be there. But this would not help you,

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
On Thu, Feb 12, 2015 at 8:43 AM, Robert Muir rcm...@gmail.com wrote: Honestly i dont agree. I don't know what you are trying to do, but if you want file format backwards compat working, then you need a different FilterCodec to match each lucene codec. Otherwise your codec is broken from a

RE: Proximity query

2015-02-12 Thread Allison, Timothy B.
Might also look at concordance code on LUCENE-5317 and here: https://github.com/tballison/lucene-addons/tree/master/lucene-5317 Let me know if you have any questions. -Original Message- From: Maisnam Ns [mailto:maisnam...@gmail.com] Sent: Thursday, February 12, 2015 11:57 AM To:

Re: Lucene Version Upgrade (3-4) and Java JVM Versions(6-8)

2015-02-12 Thread Robert Muir
On Thu, Feb 12, 2015 at 11:58 AM, McKinley, James T james.mckin...@cengage.com wrote: Hi Robert, Thanks for responding to my message. Are you saying that you or others have encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with G1 and was it on Windows or on Linux? If

Re: Proximity query

2015-02-12 Thread Sujit Pal
I did something like this sometime back. The objective was to find patterns surrounding some keywords of interest so I could find keywords similar to the ones I was looking for, sort of like a poor man's word2vec. It uses SpanQuery as Jigar said, and you can find the code here (I believe it was

Proximity query

2015-02-12 Thread Maisnam Ns
Hi, Can someone tell me whether this use case is possible with Lucene? Use case: I have a string, say 'Japan', appearing in 10 documents, and I want to get back results that contain the two words before 'Japan' and the two words after 'Japan', maybe something like this: ' Economy of Japan is
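
As a rough illustration of the requested output, the following plain-Java sketch extracts the words around each occurrence of a keyword. It is not Lucene code and not the approach from the replies in this thread; the class name ContextWindowSketch, the method windows, and the whitespace tokenization are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: works on a raw string, not on a Lucene index.
class ContextWindowSketch {

    // For each occurrence of keyword, return up to `width` words before it,
    // the keyword itself, and up to `width` words after it.
    // Matching is case-insensitive on whole whitespace-separated words, so a
    // token with attached punctuation such as "Japan," would not match.
    static List<String> windows(String text, String keyword, int width) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(keyword)) {
                int from = Math.max(0, i - width);
                int to = Math.min(words.length, i + width + 1);
                out.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return out;
    }
}
```

Calling windows(text, "Japan", 2) on each stored document body would give the two-words-either-side snippets described in the use case; windows are clipped at the start and end of the text.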

RE: Lucene Version Upgrade (3-4) and Java JVM Versions(6-8)

2015-02-12 Thread McKinley, James T
Hi Robert, Thanks for responding to my message. Are you saying that you or others have encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with G1 and was it on Windows or on Linux? If so, where can I find out more? I only looked into the one bug because that was the only

Re: Proximity query

2015-02-12 Thread Jigar Shah
This concept is called Proximity Search in general. In Lucene they are achieved using SpanQuery. On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns maisnam...@gmail.com wrote: Hi, Can someone help me if this use case is possible or not with lucene Use case: I have a string say 'Japan' appearing

Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Shah, Thanks for your reply. I will try to google SpanQuery. Meanwhile, if you have some links, can you please share them? Thanks On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah jigaronl...@gmail.com wrote: This concept is called Proximity Search in general. In Lucene they are achieved using

Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Allison and Sujit, Thanks so much for your links. I am so happy; I am looking at exactly the links that almost cover my use case. Allison, sure, I will get back to you if I have some more questions. Regards NS On Thu, Feb 12, 2015 at 10:49 PM, Sujit Pal sujit@comcast.net wrote: I did

occurrence of two terms with the highest frequency

2015-02-12 Thread Maisnam Ns
Hi, Can someone help me with this use case? Use case: Say there are 4 keywords, 'Flying', 'Shooting', 'fighting' and 'looking', to search for in 100 documents. Consider that 'Flying' and 'Shooting' co-occur (together) in 70 documents, whereas 'Flying' and 'fighting' co-occur in 14 documents