Grant Ingersoll writes:
With a little logic on your side to do the counting, you can use SpanQueries to do that.
-Grant
On Jan 21, 2011, at 4:03 PM, Sharma Kollaparthi wrote:
Hi,
I have started to use Lucene for searching in HTML files. Is it
possible to get Hits
Hi there,
I had a question about migrating the coord value one level up. My current
query structure has a root BooleanQuery with a bunch of nested BooleanQuery
children: one of these looks for all terms in the query issued, and I want
to apply the coord factor for this BooleanQuery to all its sibl
While calling addIndexes, or addIndexes with no optimize, can any guarantee be
given about the document order in the new index, given that the order of
directories/IndexReaders is fixed?
So, will the i-th document coming from the j-th IndexReader always have some
position x(i,j) in the final merged
This implies there is no way to merge two parallel indexes (based on
ParallelReader) to get a new parallel index. Correct me if I am wrong.
On Tue, Jun 29, 2010 at 11:24 PM, Andrzej Bialecki wrote:
> On 2010-06-30 05:12, Apoorv Sharma wrote:
> > while calling addindexes or addindexe
using the term frequency vector.
Thanks,
Sharma
--
Sharma Kollaparthi
CDU Systems & Process Tools
Software Developer I
ANSYS INC.
I am building my code using Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am
trying to do:
Create Index
1. Build the index in a RAMDirectory based on data stored on HDFS.
2. Once built, copy the index onto HDFS.
Search Index
1. Bring the index stored on HDFS into a RAMDirectory
Please do help here.
Thank you ,
Varun.
On Tuesday, 15 July 2014 2:14 PM, varun sharma
wrote:
I am building my code using Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am
trying to do:
Create Index
1. Build the index in a RAMDirectory based on data stored on HDFS.
2. Once
Hi All,
While working on a new Query type, I was inclined to think of a couple
of use cases where the documents being scored need not be all of the
data set, but a sample of them. This can be useful for very large
datasets, where a query is only interested in getting the "feel" of
the data, and ot
Is your FuzzyQuery matching any documents at all?
It would be helpful if you could post your entire query. It might be
happening that your Fuzzy query is not matching any hits, but when you
specify it as a MUST clause, then it becomes a necessary condition for
any hit to be returned by your overal
>However, with MUST
> clause, that restriction is lifted.
I meant that with a SHOULD clause, that restriction is lifted, i.e. a
query can score hits even if a SHOULD clause does not match the hit (but
other MUST clauses do match).
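To make the MUST vs SHOULD distinction concrete, here is a minimal sketch using Lucene's BooleanQuery.Builder (the field and terms are made up for illustration, not taken from the original poster's query):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ShouldVsMustSketch {
    public static void main(String[] args) {
        // MUST: every hit has to contain "united".
        // SHOULD: "uniten" is optional; hits matching it score higher,
        // but hits matching only the MUST clause are still returned.
        BooleanQuery query = new BooleanQuery.Builder()
                .add(new TermQuery(new Term("country", "united")), Occur.MUST)
                .add(new TermQuery(new Term("country", "uniten")), Occur.SHOULD)
                .build();
        System.out.println(query);
    }
}
```

If the fuzzy term were added with Occur.MUST instead, a document not matching it could never be a hit, which matches the behavior described above.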
On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
wrote:
>
> Hi,
>
> What analyzer do you use for the text field? Is the term "Main"
> correctly indexed?
Agreed. Also, it would be good if you could post your actual code.
What analyzer are you using? If you are using StandardAnalyzer, then
all of your
Any thoughts on this? I am envisioning applications to machine
learning systems, where the training dataset might be a small sample
of the entire dataset, and the user wants scoring to be done only on
samples of the dataset.
On Fri, Jun 7, 2019 at 5:45 PM Atri Sharma wrote:
>
> Hi All,
>
> I make sure I specify a string misspelled one edit away, and that
> never gets a hit, but the word with the correct spelling is in the index.
How long are your query terms and the actual word? For a fuzzy query to
match, your edit distance needs to be less than the smaller of the
query and the actual w
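The edit-distance constraint can be checked by hand with a plain-Java Levenshtein sketch (note that FuzzyQuery in recent Lucene versions caps the supported distance at 2; the sample terms below are illustrative):

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance: the minimum number
    // of single-character insertions, deletions, and substitutions needed
    // to turn one string into the other.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("uniten", "united"));   // 1: one substitution
        System.out.println(levenshtein("statesir", "states")); // 2: two insertions
    }
}
```

A query term whose distance to the indexed term exceeds the allowed maximum will never match, no matter how the rest of the query is structured.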
ing
> in the call.
>
> Best regards
>
>
>
> On 6/10/19 10:47 AM, baris.ka...@oracle.com wrote:
> > How do I check how it is indexed? Lowercase or uppercase?
> >
> > The only way now is by testing.
> >
> > I am using StandardAnalyzer.
> >
> >
Yes, Lucene supports incremental indexing. Note that the underlying
structure is append only, so you are still paying the cost of delete +
insert, but the semantics are what you expect them to be.
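As a minimal sketch of what those delete-plus-insert semantics look like in code (assumes an already-opened IndexWriter and a hypothetical "id" field used as the unique key):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

class IncrementalUpdateSketch {
    // updateDocument atomically deletes any document whose "id" term matches
    // and adds the new version; existing segments are never rewritten in
    // place, which is the append-only cost mentioned above.
    static void updateOne(IndexWriter writer) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("id", "42", Field.Store.YES));
        doc.add(new TextField("body", "updated text", Field.Store.NO));
        writer.updateDocument(new Term("id", "42"), doc);
        writer.commit();
    }
}
```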
On Mon, 24 Jun 2019 at 7:18 PM, Sukhendu Kumar Biswal
wrote:
> Hi Team,
> Does Lucene support incre
It depends a lot on the actual clauses (whether they are SHOULD, MUST,
MUST_NOT), each query’s type (phrase, term etc).
Could you post your query and the explain plan of IndexSearcher post the
rewrite?
On Wed, 26 Jun 2019 at 6:46 PM, wrote:
> Hi,-
>
> how can one find out each score contribut
n required clause (+countryDFLT:united
> (countryDFLT:uniten)^0.4202 +countryDFLT:states
> (countryDFLT:statesir)^0.56)
> 0.0 = Failure to meet condition(s) of required/prohibited clause(s)
>0.0 = no match on required clause (countryDFLT:united)
> 0.0 = no matching
Should not matter, AFAIK.
If your first MUST clause in a BooleanQuery fails to match for a
document, then there is no point for the engine to match further
clauses, right?
On Fri, Jul 5, 2019 at 7:56 PM wrote:
>
> Re-sending and please let me know Your amazing thoughts
>
> Happy July 4th
>
> Bes
ments in postings lists.
> Then this information is leveraged by block-max WAND in order to skip
> low-scoring blocks.
>
> This does indeed help avoid reading norms, but also document IDs and
> term frequencies.
>
> On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng
> mailto:wuyunfen.
MUST_NOT represents a clause which must not match against a document in
order for it to be qualified as a hit (think of SQL’s NOT IN).
MUST_NOT clauses are used as filters to eliminate candidate documents.
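A minimal sketch of that filtering behavior (field and terms are hypothetical):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

class MustNotSketch {
    // Match documents containing "lucene" but not "solr".
    // The MUST_NOT clause never contributes to the score; it only
    // eliminates candidates, like SQL's NOT IN.
    static BooleanQuery build() {
        return new BooleanQuery.Builder()
                .add(new TermQuery(new Term("body", "lucene")), Occur.MUST)
                .add(new TermQuery(new Term("body", "solr")), Occur.MUST_NOT)
                .build();
    }
}
```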
On Sun, 4 Aug 2019 at 23:11, Claude Lepere wrote:
> Hello!
>
> What score of a hit in res
It is not very clear what it is that you are trying to achieve
here. If you want to match terms similar to the one you specify in the
query (test, tesk, lest etc.), then a fuzzy query (~) should suffice.
Note that you cannot specify a mandatory part of the text that has to
match in every resul
Yes, that will allow specifying wildcard as the first character, but
it can lead to very slow queries, especially on larger indices.
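For reference, a sketch of where that switch lives on the classic QueryParser (field name and query string are made up):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

class LeadingWildcardSketch {
    static Query parseLeading(String text) throws Exception {
        QueryParser parser = new QueryParser("body", new StandardAnalyzer());
        // Off by default: a leading wildcard cannot use the term
        // dictionary's prefix ordering, so matching has to scan every
        // term in the field -- hence the slowness on large indices.
        parser.setAllowLeadingWildcard(true);
        return parser.parse(text); // e.g. "*ing"
    }
}
```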
On Mon, Aug 5, 2019 at 6:08 PM wrote:
>
> Does QueryParser.setAllowLeadingWildCard(true) work?
>
> this will allow to use wildcard as first char in the search strin
I am curious — what use case are you targeting to solve here?
In the relational world, this is useful primarily because prepared
statements eliminate the need for re-planning the query, thus saving the
cost of iterating over a potentially large combinatorial space. However,
for Lucene, th
query many times with a different parameter means recreating the
> > Query
> > > every time.
> > >
> > > I admit that creation of the Lucene query is not the most expensive
> part
> > of
> > > the planning process still we can gain something by not creati
These are typical symptoms of an index merge.
However, it is hard to say more without more data. What is
your segment size limit? Have you changed the default merge frequency
or the max segments configuration? Would you have an estimate of the ratio of
the number of segments reaching the max limit / to
PhraseQuery enforces the order of the terms specified and needs an exact
match of term order unless slop is specified.
When appending terms, term position numbers need to be increasing in the builder.
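A minimal sketch of both points with PhraseQuery.Builder (field and terms are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

class PhraseSketch {
    // "new york" as an in-order phrase. The explicit positions passed to
    // add() must be increasing; setSlop(0) means the terms have to be
    // adjacent and in the given order.
    static PhraseQuery exactPhrase() {
        return new PhraseQuery.Builder()
                .add(new Term("body", "new"), 0)
                .add(new Term("body", "york"), 1)
                .setSlop(0)
                .build();
    }
}
```

A non-zero slop relaxes the adjacency (and, at slop >= 2, can allow transposed terms) while still requiring all terms to be present.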
On Fri, Jan 24, 2020 at 11:15 PM wrote:
>
> Hi,-
>
> how do i enforce the order of sequence of ter
On Fri, Mar 6, 2020 at 1:04 AM Aadithya C wrote:
>
> In my personal opinion, there are a few advantages of resizing -
>
>
> 1) The size of the cache is unpredictable as there is a fixed(guesstimate)
> accounting for the key size. With a resizable cache, we can potentially
> cache heavier queries a
D (binding)
On Wed, 2 Sep 2020 at 01:51, Ryan Ernst wrote:
> Dear Lucene and Solr developers!
>
>
>
> Sorry for the multiple threads. This should be the last one.
>
>
>
> In February a contest was started to design a new logo for Lucene
>
> [jira-issue]. The initial attempt [first-vote] to call
03/11/2020, Apache Lucene™ 8.7 available
The Lucene PMC is pleased to announce the release of Apache Lucene 8.7.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text
+1 to Adrien.
Let's keep the tone neutral.
On Mon, 14 Jun 2021, 16:00 Adrien Grand, wrote:
> Baris, you called out an insult from Alessandro and your replies suggest
> anger, but I couldn't see an insult from Alessandro actually.
>
> +1 to Alessandro's call to make the tone softer on this discu
Hi,
We have a large index that we divide into X Lucene indices; we use Lucene
6.5.0. Each of our serving machines serves 8 Lucene indices in parallel.
We are getting realtime updates to each of these 8 indices. We are seeing a
couple of things:
a) When we turn off realtime updates, performanc
osts you are seeing are related to computing scores and not required
> for matching?
>
> -Mike
>
> On Fri, Aug 20, 2021 at 2:02 PM Varun Sharma
> wrote:
> >
> > Hi,
> >
> > We have a large index that we divide into X lucene indices - we use
> lucene
&g
and have done transactions worth more than
$500 between two date ranges.
The queries can go deeper than this.
Thanks in advance.
Gopal Sharma
n and then finally doing the
aggregates.
Is there any other way around this?
Thanks
Gopal Sharma
On Mon, Nov 15, 2021 at 10:36 PM Adrien Grand wrote:
> It's not straightforward as we don't provide high-level tooling to do this.
> You need to use the BitSetPro
reader.document(int docID) and then parse it, which would again be the same
issue I pointed out.
Thanks
Gopal Sharma
On Tue, Nov 16, 2021 at 1:41 PM Adrien Grand wrote:
> Indeed you shouldn't load all hits, you should register a
> org.apache.lucene.search.Collector that will aggregate
Hi, all! I am currently using Lucene's default pattern capture token
filter in one of my projects, where I have to utilize it for pattern
matching. The issue is that the default pattern capture token
filter gives the same start and end offset for each generated token: the
start and end o
I am currently making some changes to the default pattern capture group
token filter code to meet my requirement. I am a beginner in Java, so I am
finding it a bit hard to fully understand the code and make changes. I have
successfully made my changes in the incrementToken() method and got the
desired r
Mine is atris for github, atri for JIRA
On Mon, Aug 1, 2022 at 4:03 PM Tomoko Uchida
wrote:
>
> Hi Mike, Marcus, and Praveen:
>
> I verified the added two mappings - these Jira users have activity on
> Lucene Jira, also corresponding GitHub accounts are valid.
> - marcussorealheis
> - pru30
>
> T
Love this! Thanks for all the hard work, Tomoko.
-
Vigya
On Wed, Aug 24, 2022 at 12:27 PM Michael Sokolov wrote:
> Thanks! It seems to be working nicely.
>
> Question about the fix-version: tagging. I wonder if going forward we
> want to main that for new issues? I happened to notice there is a
Hi Patrick,
This is an interesting question, and from what I understood, I see
correctness problems in what you're trying to implement. Let me make sure I
understand correctly...
So indexer-1 created segments 1,2,3,4 and indexer-2 created segments 1',
2', 3', 4' independently (they just have the
Dear Community
I am writing to share thoughts on the existing Disk Usage API. I believe
there is an opportunity to improve its functionality and performance
through a reimplementation.
Currently, the best tool we have for this is based on a custom Codec that
separates storage by field; to get the
Hello Team,
I am new to Lucene and want to use Lucene in a distributed system to write
to an index on Amazon EFS.
As per my understanding, the index writer for a particular index needs to
be opened by 1 server only. Is there a way we can achieve this in a
distributed system to write in parallel in Luce
lyzer);
writer = new IndexWriter(indexDirectory, indexWriterConfig);
Can someone please help me understand why such huge reads are happening,
and how to mitigate such issues?
Thanks in advance
Gopal Sharma
Hi Folks,
I have a somewhat complex scoring/boosting requirement.
Say I have 3 text fields A, B, C and a Numeric field called D.
Say My query is "testrank".
Scoring should be based on following:
Query matches
1. text fields A, B and C, & Highest value of D (highest boost/rank)
2. A and B, & Highe
I am looking for an example if anyone has done any custom scoring with
Lucene.
I need to implement a Query similar to DisjunctionMaxQuery; the only
difference is that it should score based on the sum of the subqueries'
scores instead of the max.
Any custom scoring example will help.
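One possible shortcut, sketched below under the assumption of a recent Lucene version (field names echo the earlier email and are illustrative): DisjunctionMaxQuery scores max + tieBreaker * (sum of the other sub-scores), so a tie-breaker of 1.0f yields exactly the sum of all sub-scores, possibly avoiding custom scoring entirely.

```java
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.TermQuery;

class SumScoringSketch {
    // With tieBreakerMultiplier = 1.0f the score is
    // max + 1.0 * (sum of remaining sub-scores) == sum of all sub-scores.
    static DisjunctionMaxQuery sumOfSubScores() {
        return new DisjunctionMaxQuery(
                List.of(new TermQuery(new Term("A", "testrank")),
                        new TermQuery(new Term("B", "testrank"))),
                1.0f);
    }
}
```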
(On one hand,
Hello,
I am trying to count the total number of posting entries for terms having
a given prefix in an index, and also count the number of such terms in the
index.
The following is the code I am using for that. The problem is the result is
not as expected.
Can you point out what I am doing som
I don't know of classes that would be suitable, but if they are ordered
queries, simple code could easily be written.
On Mon, Feb 22, 2010 at 9:59 PM, Nigel wrote:
> I'd like to scan documents as they're being indexed, to find out
> immediately
> if any of them match certain queries. The goal i
ints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
--
Shailendra Sharma
+91-988-011-3066
--
Shailendra Sharma
+91-988-011-3066
Create a match-all-docs query like the following:
MatchAllDocsQuery matchAllDocsQuery = new MatchAllDocsQuery();
And then search as you would for any other query,
searcher.search(matchAllDocsQuery), and it returns hit
Thanks,
Shailendra
-Original Message-
From: sandyg [mailto:[
Hi,
I am using Lucene for indexing and searching documents.
It's working fine for supported documents. Now I want to index documents with
unsupported MIME types.
Right now I am using LIUS, which is built over Lucene, for indexing the
documents.
Is there any tool which I can use for indexing the
Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message
>> From: Gaurav Sharma <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Wednesday, June 18, 2008 10:07:22 AM
>> Subject: indexing unsupp
Hi,
I am stuck with one more exception.
When I am using a wildcard such as a*, I am getting a too-many-clauses
exception. It says the maximum clause count is set to 1024. Is there any way
to increase this count?
Can you please help me out in overcoming this?
Thanks in advance.
-Gaurav
-
-Gaura
Hi,
I am stuck with an exception in Lucene (too many clauses).
When I am using a wildcard such as a*, I am getting a too-many-clauses
exception. It says the maximum clause count is set to 1024. Is there any way
to increase this count?
Can you please help me out in overcoming this?
Thanks in advance.
-
Hi all,
I am using MultiSearcher to search more than one index folder. I have one
IndexSearcher array which contains 3 IndexSearchers:
01. C:\IndexFolder1
02. C:\IndexFolder2
03. C:\IndexFolder3
When I searched the 3 index folders using a MultiSearcher, I got 3000
hits.
1 to 1000 from C
earcher(int n) (n would be the docid of
result x).
Hope this helps,
Doron
"Sawan Sharma" <[EMAIL PROTECTED]> wrote on 24/04/2007 03:19:47:
> Hi all,
>
> I am using MultiSearcher to search more then one Index folders. I have
one
> Index searcher array which conta
Hi All,
I was wondering - is it possible to search and group the results by a
given field?
For example, I have an index with several million records. Most of
them are different Features of the same ID.
I'd love to be able to do.. groupby=ID or something like that
in the results, and provide the
Hello Jay,
I am not sure to what level I understood your problem, but based on my
understanding, you can try the HitCollector class and its collect method. There
you can get the docID for each hit and can filter while searching.
Hope it will be useful.
Sawan
(Chambal.com inc. NJ USA)
On 6/15/07,
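A sketch against the old (pre-Lucene 3.0) HitCollector API mentioned above; the method and field names here are illustrative:

```java
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

class CollectorSketch {
    // collect() is called once per matching document, so unwanted docIDs
    // can be skipped during the search instead of filtered afterwards.
    static void collectAll(IndexSearcher searcher, Query query) throws Exception {
        searcher.search(query, new HitCollector() {
            @Override
            public void collect(int doc, float score) {
                // inspect doc / score here; ignore documents to "remove"
            }
        });
    }
}
```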
Hi friends,
I tried to implement facet searching in some sample code, and when I tried
it with various cases, I found no result in one case. I wanted to narrow by
one field, "title", and gave multiple words, or say a phrase.
So first, I am preparing the Lucene query and converting it into
QueryF
your
piece of code would be really small.
Thanks,
Shailendra Sharma
CTO, Ver Se' Innovation Private Ltd.
Bangalore, India
On 7/30/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>
> We found that a fast way to do this simply by running a query for each
> category and getting the ma
Though I am not sure what the possible use case for something like this is,
here is the pointer:
Using IndexSearcher you can get the "Explanation" for a given query and
document ID. A complex Explanation has multiple sub-explanations, and so forth.
A simple Explanation would contain the weight of th
without re-creating indices everytime.
Thanks,
Shailendra Sharma,
CTO, Ver se' Innovation Pvt. Ltd.
Bangalore, India
On 8/1/07, Cedric Ho <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I was wondering if it is possible to do boosting by search terms'
> position in the document.
; > Cedric,
> >
> > SpanFirstQuery could be a solution without payloads.
> > You may want to give it your own Similarity.sloppyFreq() .
> >
> > Regards,
> > Paul Elschot
> >
> > On Thursday 02 August 2007 04:07, Cedric Ho wrote:
> > > Thanks for
IL PROTECTED]> wrote:
> > > Cedric,
> > >
> > > SpanFirstQuery could be a solution without payloads.
> > > You may want to give it your own Similarity.sloppyFreq() .
> > >
> > > Regards,
> > > Paul Elschot
> > >
> >
Ah, Good way !
On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
> > Paul,
> >
> > If I understand Cedric right, he wants to have different boosting
> depending
> > on search term positions in t
Hi
My index has 4 keyword fields and one unindexed field.
I want to search by the 4 keyword fields and return the one unindexed field.
I can iterate over the documents via Luke.
But when I search for the same values that I see via Luke, it does not find
the document.
Out of the 4 fields, 2 are a
I figured out the problem when I copied the document from the clipboard.
It had trailing spaces.
After I changed the database query to have an ltrim(rtrim(
for each query prior to indexing, it's fine now.
-Original Message-
From: Sharma, Siddharth
Sent: Thursday, October 27, 2005 4:35
Is using a QueryParser to parse a query using the same, single instance of
Analyzer thread-safe?
Or should I create a new Analyzer each time?
I have downloaded Lucene 1.4.3
I am trying to narrow down on the JRE version to use.
We have the flexibility to use 1.3.1 up.
Which JVM will be the best for running Lucene?
I saw a note on the FAQ that said that Lucene will run on 1.3.1 but will
require 1.4 to compile.
Why would anyone want to com
Place the lucene jar file in the WEB-INF/lib directory of your web
application prior to creating its war.
If your ISP inspects the war and removes all jar files within it, then I
suppose you might just have to place all the lucene classes under
WEB-INF/classes of your web application as 'loose cla
I have just joined this user group, but I probably will be asking questions /
contributing for a while now, as I am starting to work on a product which will
use Lucene exclusively.
We are still in the design phase, and I see that we need to manage several user- /
application-specific configurations
I have two applications: one which will be generating all the indexes, and the
second one which will be reading those indexes. I cannot keep them in the same
application, because one will run all the time in batches via some sort of
scheduler to generate the indexes, and the application which wil
> Just set your maxBufferedDocs to as high a number as your RAM/heap will
> let you, and pick a mergeFactor that is high but doesn't get you in trouble
> with open files.
Can you please explain this briefly?
regards and thanks,
On 6/9/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
When wri
Hi,
I am having a problem getting the count of distinct values of a field. The
reason for getting this value is that each document in the index
belongs to one predefined class, and I want to get the number of documents
belonging to each class.
Regards..
Hi
I am a complete newbie to Lucene. In fact, I'm not even a search guy. I looked
up terms such as stemming just yesterday. So this is going to be so much fun
;)
Here's the problem I am trying to solve:
I work in the B2B space at Staples (an office supplies company in the US).
We sell office products
Hoss
Thanks for the reply. The posting was an excellent write-up and helped me
visualize my problem domain and solution better.
I like the idea about storing filter information in the contract index
indexed by company. It might work in my case.
I am not sure if I understand the BitSet solution t
Hiya
Given that I have two high level business entities, catalog (containing
product information) and contract (containing filter criteria about which
products are available for sale and which are not), what is a better
approach?
1. To have two different indices and query them separately.
OR
2. H
Query: caught a class org.apache.lucene.queryParser.ParseException
with message: Too many boolean clauses
I realize why this is happening (the 1024 clauses limit for BooleanQuery).
My question is more design related.
During customer registration, the customer defines a set of skus/products
that
Increase the max clause count:
// Setting the clause count
BooleanQuery.setMaxClauseCount(n);
You can use Integer.MAX_VALUE or some smaller number. When I set this high, I
have had to set the Java heap higher for memory as well.
Tom
-Original Message-----
From: Sharma, Siddharth [mailto:[EMAIL PROTECTED]
Se
Thanks Chris
I haven't tried it yet, but I think I understand your idea now (after 24
hours, man I'm slow on the uptake;)
I'll try it today.
-Sid
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, October 17, 2005 5:05 PM
To: java-user@lucene.apache.org
Sub
I downloaded the source code of 1.4.3 but did not find the source of
RangeFilter.
I could not find it in the sandbox either.
RangeFilter, where art thou?
Hi
I have an instance (each) of IndexSearcher and StandardAnalyzer housed in a
Singleton and I intend to use this one single instance (of Searcher and
Analyzer) for multiple concurrent search requests.
I vaguely remember reading that I (as a client) do not have to synchronize.
Lucene internals take
The Lucene PMC is pleased to announce the release of Apache Lucene 10.3.0.
Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured search, full-text search, faceting,
nearest-
Hello gentlemen,
I am a novice to Lucene and Carrot2, but I have an urgent requirement to build
a prototype using Lucene and Carrot2. Please help me with a working web
application demo along with code.
Thanks
Arun