The above works fine, but how do I get the state of the *last docID*? Also, there
will be multiple users accessing this, and we need to maintain the integrity
of the last docID. Can we know the last docID from the collector's collect call?
Thanks,
Ranjith.
--
View this message in context:
There might be an issue with the approach below: the docID that is saved
might be deleted before the next call to search, and I am not sure whether that
breaks the search functionality when such a thing happens.
Thanks,
Ranjith.
The collector is collecting all the documents. Let's say I have 50k documents
and I want the collector to give me results based on a start offset and
maxHits. Can we get this functionality from Lucene? For example, the very first
time I want to collect hits 0-100, and the next time I want to collect from
Could you please elaborate on the above? I am not sure whether the collector is
already doing this or whether I need to call another API.
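One way to page through results in Lucene 4.x, sketched under the assumption that a `searcher` and `query` already exist: collect `start + maxHits` hits, then ask the collector for just the slice you want.

```java
// Sketch (Lucene 4.x, searcher/query assumed): collect enough hits for the
// page, then slice with TopDocsCollector.topDocs(start, howMany).
int start = 100;    // first hit of the page
int maxHits = 100;  // page size
TopScoreDocCollector collector =
        TopScoreDocCollector.create(start + maxHits, true);
searcher.search(query, collector);
// Returns hits [start, start + maxHits); the earlier hits are discarded here.
ScoreDoc[] page = collector.topDocs(start, maxHits).scoreDocs;
```

Note that the collector still examines every matching document; only the returned slice is limited to the page.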
Thanks,
Sai.
As I mentioned in my original post, I am calling it like this:
MyCollector collector;
TopScoreDocCollector topScore =
    TopScoreDocCollector.create(firstIndex + numHits, true);
IndexSearcher searcher = new IndexSearcher(reader);
try {
I am not interested in the scores at all. My requirement is simple: I only
need the first 100 hits, or however many numHits I specify (irrespective of their
scores). The collector should stop after collecting the specified numHits.
Is there a way to tell the collector to stop after collecting the
The problem with the collector below is that the collect method does not stop
after the numHits count has been reached. Is there a way to stop the collector
from collecting docs once it has reached the specified numHits?
For example:
* TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits,
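One way to do this in Lucene 4.x is to throw CollectionTerminatedException once the budget is spent. A sketch (the class name and the wrapped delegate are made up); note the exception only aborts the current segment, so the check in setNextReader is what skips the remaining segments:

```java
// Sketch (Lucene 4.x): stop collecting once numHits docs have been seen.
public class EarlyTerminatingCollector extends Collector {
    private final Collector delegate;
    private final int numHits;
    private int count;

    public EarlyTerminatingCollector(Collector delegate, int numHits) {
        this.delegate = delegate;
        this.numHits = numHits;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }

    @Override
    public void collect(int doc) throws IOException {
        if (count >= numHits) throw new CollectionTerminatedException();
        delegate.collect(doc);
        count++;
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        // Budget already spent: skip this segment entirely.
        if (count >= numHits) throw new CollectionTerminatedException();
        delegate.setNextReader(context);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}
```

This collects the first numHits matches encountered in index order, not the best-scoring ones, which matches the stated requirement of ignoring scores.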
We recently upgraded to Lucene 4.0 and found performance issues when searching.
Upon some analysis, we found that it chokes when multiple concurrent
requests come in for Lucene search.
User1 - Search
User2 - search
User3 - search
The search request made by User1 is still waiting
I have created strings like the following:
searchtext
+sampletext
and when I try to search for them using *** or *+**, I do not get
any results.
I am using the QueryParser.escape(String s) method to handle the special
characters, but it does not look like it did anything.
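For reference, QueryParser.escape prefixes each Lucene query special character with a backslash. Below is a minimal pure-Java sketch of that behavior (a hypothetical reimplementation for illustration only; the real method is the static one on QueryParser):

```java
// Hypothetical reimplementation of what QueryParser.escape(String) does:
// each Lucene query special character gets a backslash prefix.
public class EscapeDemo {
    static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) sb.append('\\'); // escape specials
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("+sampletext")); // prints \+sampletext
    }
}
```

Keep in mind that escaping only keeps the characters out of the query syntax; whether they survive analysis is a separate question (StandardAnalyzer will still strip them at index and query time).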
Also, when I search
We use StandardAnalyzer both at index and search time, just the default one;
we don't have any custom analyzers.
Thanks,
Sai
Thanks.
So, if I understand correctly, StandardAnalyzer won't work for the example
below, as it strips out the special characters and searches only on
searchText (in this case).
queryText = *searchText*
If we want to do a search like ***, then we need to use
WhitespaceAnalyzer. Please let me
What about other characters, like the ' (quote) character? We have a
requirement that a text can start with 'sampletext'; when I search with
'* it does not return any results, but when I search with sample*, it
does return the result.
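A quick way to see what each analyzer actually emits is to print the token stream. A sketch for Lucene 4.x (field name and input are arbitrary):

```java
// Sketch (Lucene 4.x): inspect the tokens an analyzer produces.
Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
TokenStream stream = analyzer.tokenStream("field", new StringReader("*searchText*"));
CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
stream.reset();
while (stream.incrementToken()) {
    // WhitespaceAnalyzer keeps "*searchText*" as one token, punctuation intact
    System.out.println(term.toString());
}
stream.end();
stream.close();
```

Running the same loop with `new StandardAnalyzer(Version.LUCENE_40)` instead should print just `searchtext`: the punctuation is stripped and the term is lowercased, which explains why the quoted and starred searches behave differently.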
Thanks,
Ranjith,
Could someone please comment on the above?
Thanks,
Sai
Thanks for the response; I really appreciate your help. I have read the
documentation but could not grasp it on the first read, as I was new to Lucene.
I have changed it to AtomicReader and it seems to be working fine.
One last clarification is do we also need to use AtomicReader for the
following
Could someone please comment on the above code snippet ?
Also, one observation: our search results are not consistent between
IndexReader and AtomicReader. Could this be a problem?
Thanks,
Sai.
Thanks a lot. Really appreciate your help here.
I have read through the document and understand that the IndexReader uses
sub-readers (to look into the index files) and the AtomicReader does not. But
how does this affect things from a search standpoint? I think search
results should be consistent
I want to get the Document in the code below, and that's why I need
an IndexReader:
public void collect(int doc) throws IOException {
    // ADD YOUR CUSTOM LOGIC HERE
    Document document = indexReader.document(doc);
    delegate.collect(doc);
}
But this seems to be the problem as the
Thanks. Is there any issue with the way we are calling
indexReader.document(doc)?
I am not sure how to get an AtomicReaderContext in the method below.
Any pointers on how to get that instance are appreciated.
public void collect(int doc) throws IOException {
// ADD YOUR CUSTOM
Here is how I am using it:
public class MyCollector extends PositiveScoresOnlyCollector {
    private IndexReader indexReader;

    public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) {
        super(topScore);
        this.indexReader = indexReader;
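For the AtomicReaderContext question: in Lucene 4.x the searcher hands the context to the collector through setNextReader, so the collector does not need a top-level IndexReader at all. A sketch (the delegate wiring is an assumption):

```java
// Sketch (Lucene 4.x): capture the per-segment reader from setNextReader
// instead of holding a top-level IndexReader.
public class MyCollector extends Collector {
    private final Collector delegate;
    private AtomicReader currentReader; // reader for the current segment

    public MyCollector(Collector delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        currentReader = context.reader(); // docs passed to collect() are relative to this
        delegate.setNextReader(context);
    }

    @Override
    public void collect(int doc) throws IOException {
        // The segment-relative doc ID is correct against the segment reader.
        Document document = currentReader.document(doc);
        // ADD YOUR CUSTOM LOGIC HERE
        delegate.collect(doc);
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}
```

Loading the stored document for every collected hit is expensive; a common design is to collect only doc IDs and fetch the stored fields just for the final page of hits.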
The following query does not seem to work after we upgraded from 2.4 to 4.0:
*+type:sometype +title:sometitle**
Any ideas as to where to look? Is the above query syntactically correct?
I would appreciate any advice on the above.
Thanks,
Sai.
I understand from the JIRA ticket (LUCENE-3640) that IndexSearcher.close()
is a no-op operation, but I am not very clear on why it is a no-op. Could someone
shed some light on this? We were using this method in the older versions;
is it safe now to remove this call? I just want to understand the
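The reason, as I understand it, is that since that change the IndexSearcher no longer owns any resources; the caller-supplied IndexReader does, so closing is the caller's responsibility. A sketch of the 4.0 lifecycle:

```java
// Sketch (Lucene 4.0): the reader owns the resources, not the searcher.
DirectoryReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
try {
    TopDocs hits = searcher.search(query, 10);
    // ... use hits ...
} finally {
    reader.close(); // close the reader yourself; nothing to close on the searcher
}
```

So yes, the searcher.close() call can simply be removed, as long as the reader is closed when you are done with it.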
I am looking at the formats supported by the newer version of Tika (1.3) and was
not sure which version(s) of Microsoft Office it supports
(97/2000/2010/2013) for each of the formats below:
http://tika.apache.org/1.3/formats.html#Microsoft_Office_document_formats
Microsoft Word (also, does it support
Are you closing or committing your IndexWriter after each added
document? Because if you add 100 docs you should not see 100 versions
of these files, only one set of files in the end (many docs are
written to one segment).
No. What I meant to say here is if 100 users have updated the document
I am using the TieredMergePolicy with the compound index format:
TieredMergePolicy mergePolicy = new TieredMergePolicy();
indexWriterConfig.setMergePolicy(mergePolicy.setNoCFSRatio(1.0d));
Prior to 4.0, there was an optimize() in the IndexWriter which was merging
the index files. Is there any
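For reference, optimize() was replaced by forceMerge(int); a sketch, assuming an open IndexWriter named `writer`:

```java
// Sketch: merge the index down to a single segment (the old optimize()).
// This is expensive; normally the merge policy merges in the background.
writer.forceMerge(1);
writer.commit();
```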
Thanks. I read this (and also tried it out in my code) and understand that
forceMerge(1) is not advisable for performance reasons. My question here is:
if we don't have a way to compact these files, it will produce an enormous
number of files, which will lead to file system issues (such as
I am using the following for creating the IndexWriter (for my indexing):
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_40,
        new LimitTokenCountAnalyzer(analyzer, MAX_FIELD_SCAN_LENGTH));
if (create) { // create will be true for indexing
It's _0.si ( typo)
For second update, create = false.
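One way to make the create-vs-append choice explicit in Lucene 4.0 is through the open mode on the config; a sketch, assuming `directory` and the `create` flag from above:

```java
// Sketch (Lucene 4.0): choose the open mode explicitly.
indexWriterConfig.setOpenMode(create
        ? IndexWriterConfig.OpenMode.CREATE    // wipe and rebuild the index
        : IndexWriterConfig.OpenMode.APPEND);  // add to the existing index
IndexWriter writer = new IndexWriter(directory, indexWriterConfig);
```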
Thanks,
Sai.
Is it by design? The older API (2.4) does not have this problem. Let's say I
have 100 updates or so; then it will create 100 versions of those files
in the index. This would increase the number of files in the index directory
and might run into file-handle issues.
It would be good to just have
The following files are originally created files (upon an initial indexing):
_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
index.v0008
We are not using Solr, just the Lucene Core 4.0 engine. I am trying to
see if we can use the Tika library to extract textual information from
PDF/Word/Excel documents. I am mainly interested in reading the contents
of the documents and indexing them with Lucene. My question here is: is Tika
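A sketch of the Tika-to-Lucene handoff, assuming Tika's facade API; the field and file names are made up:

```java
// Sketch: extract text with Tika, index it with Lucene 4.0.
Tika tika = new Tika();
String text = tika.parseToString(new File("report.pdf")); // PDF/Word/Excel all work
Document doc = new Document();
doc.add(new TextField("contents", text, Field.Store.NO)); // index but don't store
indexWriter.addDocument(doc);
```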
Thanks a lot. If we want to wrap a TopScoreDocCollector in a
PositiveScoresOnlyCollector, can we do that?
I need only positive scores, and I don't think TopScoreDocCollector can handle
that by itself, right?
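Yes, wrapping should work; a sketch for Lucene 4.x, assuming a searcher and query are already set up:

```java
// Sketch (Lucene 4.x): wrap TopScoreDocCollector so zero- and
// negative-scoring documents never reach it.
TopScoreDocCollector topDocs = TopScoreDocCollector.create(numHits, true);
searcher.search(query, new PositiveScoresOnlyCollector(topDocs));
ScoreDoc[] hits = topDocs.topDocs().scoreDocs; // only positively-scoring hits
```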
Thanks,
Sai
I am not looking for negative scores and want to skip it.
Thanks,
Sai
I want to index document content (such as PDF/Word/Excel) and am just
wondering if there are any good readers that I can use to integrate into
Lucene 4.0. Any pointers/example code are appreciated.
The Lucene in Action book mentions Tika as the library to use, but I am not sure
if this is the preferred
Thanks. I checked it out.
Here is the list of files that have been generated:
_0.fdt
_0.fdx
_0.fnm
_0.si
_0_Lucene40_0.frq
_0_Lucene40_0.prx
_0_Lucene40_0.tim
_0_Lucene40_0.tip
_0_nrm.cfe
_0_nrm.cfs
Thanks Michael. The additional file in the list is just a typo.
One more question: we were using 2.4 before, and it only generated a few
files:
_0.cfs
_0.cfx
// segment files
I am assuming that the 2.4 version has the compound index structure enabled
by default. Do we need to set it explicitly
Thanks. Are there any best practices to follow here, or should we leave the
default (which is the hybrid approach, as you mentioned)?
Thanks a lot. One last question: how do we set it? IndexWriter.???
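In Lucene 4.0 this is not set on IndexWriter directly; it is a merge-policy setting configured through IndexWriterConfig. A sketch:

```java
// Sketch (Lucene 4.0): compound files are controlled by the merge policy,
// which is set on the IndexWriterConfig rather than on IndexWriter.
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setNoCFSRatio(1.0); // 1.0 = always write compound (.cfs) files
indexWriterConfig.setMergePolicy(mergePolicy);
```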
Thanks,
Ranjith.
Can someone please help us here to validate the above?
Thanks,
Sai.
Thanks. Could you please also comment on the following?
http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html
Thanks, and I really appreciate your help.
Thanks,
Sai.
Thanks. We decided to delete the existing index directories and recreate them
once we upgrade to 4.0 (unless we hit any major API blockers during
compilation, in which case we will go to 3.6.2 first and then later to 4.0).
Thanks,
Sai.
There is no optimize() method in 4.0. I looked at the 3.6 docs, and they did
mention the following. Does this mean that we no longer need this method
and it should not be used anymore? Is there any replacement method that we
need to use, since it is deprecated as of version 3.6.0?
/*
Our current search implementation (based on 2.4.0) uses a collector extending
the TopDocCollector class:
public class MyHitCollector extends TopDocsCollector {
    private IndexReader indexReader;
    private CustomFilter customFilter;

    public MyHitCollector(IndexReader indexReader, int
I am sorry, but I am confused looking at the change logs and the enhancements
done, since we are jumping from 2.4 to 4.0. Could you please point me to any
example code that extends one of the new collectors? That would help a lot,
or it would be great if you could give some pointers on how we can
I would like to write a custom collector (similar to the ones inside
the source of TopScoreDocCollector, like InOrderTopScoreDocCollector). The
reason for extending this is that InOrderTopScoreDocCollector and
OutOfOrderTopScoreDocCollector are private to the class, and I really wanted
Here is how I implemented a collector class. I would appreciate it if you could
let me know of any issues:
public class MyCollector extends PositiveScoresOnlyCollector {
    private IndexReader indexReader;

    public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector
We are using the following method with Lucene 2.4.0:
public void search(Weight weight,
                   Filter filter,
                   HitCollector results)
    throws IOException
We are upgrading to the latest version and looking at the API (4.0), the
above signature has been
Thanks.
Can we use the following method in 4.0 as a replacement for the above
one? We will rewrite this to use FilteredQuery later, but don't
want to refactor a lot right now.
public void search(Query query,
                   Filter filter,
                   Collector results)
    throws IOException
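Both forms should work in Lucene 4.0; a sketch of the call sites, assuming `searcher`, `query`, `filter`, and `collector` are already set up:

```java
// Sketch (Lucene 4.0): direct replacement for the old
// search(Weight, Filter, HitCollector) call...
searcher.search(query, filter, collector);
// ...and the query-wrapped form for the later FilteredQuery rewrite.
searcher.search(new FilteredQuery(query, filter), collector);
```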
We are upgrading from 2.4 to 4.0. What are the options here? (Delete the
existing index directories and reindex with the upgraded version?)
We don't want to take any intermediate steps that would cause more work
again to upgrade to the latest version.
Thanks,
Sai.
Also, I am not sure about the following statement:
A direct update from 2.x to 4.0 is not possible
Are you saying that it's impossible to upgrade from the 2.4 to the 4.0 version?
Why can't we upgrade? Are there any technical limitations in Lucene that
prevent upgrading from 2.4 to 4.x?
I am
We are using StandardAnalyzer for indexing some Japanese keywords. It works
fine so far, but I just wanted to confirm whether StandardAnalyzer can fully
support it (I have read somewhere in the Lucene in Action book that
StandardAnalyzer does support CJK). I just want to confirm that my understanding
is
I am not sure what the following statement means:
using Field.Store.NO the field itself is definitely searchable. You will
not be able to retrieve the field value itself
For example, if we have a file that we upload using some keywords, and the
keyword (is of type Field.Store.NO but is analyzed)
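A sketch of what that statement means in Lucene 4.0 (field names and values are made up):

```java
// Sketch (Lucene 4.0): the keyword field is analyzed and searchable,
// but its text cannot be read back from a matching document.
Document doc = new Document();
doc.add(new TextField("keywords", "quarterly report finance", Field.Store.NO));
doc.add(new StoredField("path", "/uploads/report.pdf")); // stored, retrievable
indexWriter.addDocument(doc);
// A search on keywords:finance finds the doc, but on the retrieved hit
// doc.get("keywords") returns null, while doc.get("path") returns the value.
```

So Store.NO affects only retrieval of the original value, never searchability.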
Thanks for all the responses.
Apart from the API changes, is there any major functionality change from the
2.4.0 to the 4.x version? I know we need to migrate our API calls to the latest
version, but I am just curious whether we need to be aware of any functional
changes so as to do more thorough testing.
Thanks,
Sai.
We have an existing application that uses Lucene 2.4.0. We are
thinking of upgrading it to the latest version (4.0). I am not sure of the
process involved in upgrading to the latest version. Is it just copying of the
jars? If yes, which jars do we need to copy over? Will it be backward
Thanks. Could you please elaborate on what is needed other than replacing the
jars? Are the jars listed the only ones required, or are additional jars needed?
Is the API not backward compatible? I mean, are the API calls we
are using in 2.4.0 not supported by 4.0? Has the signature
Are there any best practices that we can follow? We want to get to the latest
version, and I am thinking we can go directly from 2.4.0 to 4.x (as opposed
to 2.x to 3.x and then 3.x to 4.x), so that it will save not only time but also
a testing cycle at each migration hop.
Are there any limitations in
We are using Lucene (the 2.4.0 libraries) to implement search in our
application, with StandardAnalyzer as the analyzer.
Our application has a document upload feature that lets you upload
documents and add some keywords (while uploading them). When we
search (using
Does the Lucene StandardAnalyzer work for all languages when tokenizing before
indexing (since we are using Java, I think the content is converted to UTF-8
before tokenizing/indexing)? Or do we need to use special analyzers for each
language? In this case, if a document has a mixed case (