I will investigate it. In the meantime, this is the correct link:
http://lucene.apache.org/java/3_5_0/api/contrib-facet/userguide.html
Shai
On Wed, Dec 14, 2011 at 3:08 PM, Lukáš Vlček wrote:
> Hi,
>
> is there broken link in
>
> http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/fac
get to the
> o.a.l.facet package?
>
> On Wed, Dec 14, 2011 at 8:14 AM, Shai Erera wrote:
> > I will investigate it. In the meantime, this is the correct link:
> > http://lucene.apache.org/java/3_5_0/api/contrib-facet/userguide.html
> >
> > Shai
> >
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: Shai Erera [mailto:ser...@gmail.com]
> > > Sent: Thur
... issue for *it*, not 'other' :)
Shai
On Dec 15, 2011 2:47 PM, "Shai Erera" wrote:
> If you already did it, then a patch will be great. Perhaps we should open
> an issue for other?
>
> Shai
> On Dec 15, 2011 11:44 AM, "Uwe Schindler" wrote:
>
u...@thetaphi.de
>
> > -----Original Message-
> > From: Shai Erera [mailto:ser...@gmail.com]
> > Sent: Thursday, December 15, 2011 1:47 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: Broken link in Lucene 3.5 JavaDoc?
> >
> > If you already di
I opened LUCENE-3649.
Shai
On Thu, Dec 15, 2011 at 2:50 PM, Shai Erera wrote:
> Sure, as soon as I'll be in front of a computer.
>
> Shai
> On Dec 15, 2011 2:48 PM, "Uwe Schindler" wrote:
>
>> Yes, I could attach the patch there! Will you open it?
>>
Hi Cheng,
You will need to use the exact path labels in order to get to the category
'Mark Twain', unless you index multiple paths from the start, e.g.:
/author/American/Mark Twain
/writer/American/Mark Twain
The taxonomy index does not process the CategoryPath labels in any way to
e.g. produce synony
If I understand 'group search' correctly, you mean grouping search results
by some criteria?
The main difference between grouping search results and faceted search is
that when you group search results by some criteria, your request is
something like "give me the top 3 results from each movie categ
Or ... move to use a per-segment array. Then you don't need to rely on doc
IDs changing. You will need to build the array from the documents that are
in that segment only.
It's like FieldCache in a way. The array is relevant as long as the segment
exists (i.e. not merged away).
Hope this helps.
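To make the per-segment idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes: PerSegmentArrayCache, the segmentKey object and the loader are hypothetical names introduced only for illustration, and the 1.0f fallback simply mirrors the quoted code in this thread. The assumption is that your application can obtain some stable key object per segment and can build the array from that segment's documents only.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical helper, not a Lucene class: one float[] per segment, FieldCache-style.
class PerSegmentArrayCache {
    // Weak keys: once a segment is merged away and its key is unreachable, the entry can be collected.
    private final Map<Object, float[]> bySegment = new WeakHashMap<>();

    // Returns the array for this segment, building it on first use.
    synchronized float[] get(Object segmentKey, Supplier<float[]> loader) {
        return bySegment.computeIfAbsent(segmentKey, k -> loader.get());
    }

    // Looks up a value by segment-relative doc ID; doc IDs beyond the array get a default of 1.0f.
    float value(Object segmentKey, int segmentDocId, Supplier<float[]> loader) {
        float[] values = get(segmentKey, loader);
        return segmentDocId < values.length ? values[segmentDocId] : 1.0f;
    }
}
```

Because the array is addressed by the doc ID inside the segment, it stays valid for as long as that segment exists, regardless of how global doc IDs shift.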
return new DocValues() {
>public float floatVal(int doc) {
>if(doc < values.length)
>return values[doc];
>return 1.0f;
>}
>};
>}
> }
>
> How would I need to change it to make the
If I understand correctly, you're using the NativeFSLockFactory and that's
the expected behavior -- unlike SimpleFSLockFactory, if you terminate the
JVM and then restart the program, the lock is not held anymore -- that's
the advantage of using native-fs-lock because nobody really holds the lock
an
Hi
You have several ways to do it:
1) Use NativeFSLockFactory, which obtains native locks that are released
automatically when the process dies, as well as after a successful
IndexWriter.close(). If your writer.close() is called just before the
process terminates, then this might be a good soluti
You could extend IndexWriter to AutoCommitIndexWriter and override flush()
to call super.flush() then commit() (or simply just commit()). I haven't
tested it but I think it should work.
However, make sure you understand the implications of commit() -- it's
heavier than just flush. Perhaps you can
Hi Ganesh
I recently upgraded my code to 3.6, and yesterday finished part of my
upgrades to 4.0-ALPHA.
Upgrading from 3.0.3 to 3.6 is relatively easy as all API should be
backwards compatible. But I think there were some API breaks, and
back-compat issues. Therefore, if I were you, I'd first upgr
ly the stable
> version.
>
> Regards
> Ganesh
>
>
> - Original Message -
> From: "Shai Erera"
> To:
> Sent: Tuesday, July 10, 2012 10:50 AM
> Subject: Re: Upgrade to 3.6 OR wait for 4.0
>
>
> > Hi Ganesh
> >
> > I recently
Hi.
Faceted search has existed since 3.5 and will exist in 4.0 too!
Shai
On Jul 26, 2012 7:21 PM, "Subramanian, Ranjith" <
ranjith.subraman...@capgemini.com> wrote:
> Hi Team,
>
> I would like to know if Lucene 4.0 will support facetted search.
>
> Thanks in advance.
>
Hey Mike,
I'm not sure that I like the idea of throwing LuceneException or
SearchException everywhere. I've been there (long time ago) and I always
hated it.
First, what's the difference between 'new SearchException("Failed to read
the index", ioe)' and 'new IOException("Failed to read the index"
I think that specific exceptions should be thrown only in case we expect
the user to do something with it. E.g. LockObtainException is something
that I can catch and try to recover from in the code, maybe retry to obtain
the lock.
But all IOExceptions, maybe excluding FNFE, are unrecoverable in th
Hi Ravi,
I've been dealing with reverse indexing lately, so let me share with you a
bit of my experience thus far.
First, you need to define what reverse indexing means for you. If it
means that docs that were indexed in the following order: d1, d2, d3 should
be traversed during search in tha
n the meantime, I will live with good old sorting
>
> --
> Ravi
>
> On Wed, Nov 21, 2012 at 1:59 AM, Shai Erera wrote:
>
> > Hi Ravi,
> >
> > I've been dealing with reverse indexing lately, so let me share with you a
> > bit of my experience
Hi Jan,
Basically, DrillDown is a helper class for creating such queries. You're
right that its query() methods create AND, because that's normally the
case, but if you require OR, you could do this:
BooleanQuery res = new BooleanQuery();
for (CategoryPath cp : paths) {
res.add(new
Really off the top of my head, if that's an expected query, you can try to
index the words backwards (in that field) and then convert the query *plan
to nalp* :).
You can also index the suffixes of words, e.g. vacancyplan, acancyplan,
cancyplan and so forth, and then convert the query *plan to pla
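Both tricks can be sketched in plain Java without any Lucene classes. SuffixTricks and its method names are hypothetical, and the minimum suffix length is an assumption added for the example. The first two methods reverse a term and turn a leading-wildcard query into a prefix pattern on the reversed field; the third generates the suffixes you would index in the second approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class SuffixTricks {
    // Reverse a term, e.g. before indexing it into a "reversed" field.
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Rewrite a leading-wildcard query such as "*plan" into the prefix pattern "nalp*".
    static String leadingWildcardToPrefix(String query) {
        if (!query.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*': " + query);
        }
        return reverse(query.substring(1)) + "*";
    }

    // All suffixes of a term, longest first, down to minLength characters.
    static List<String> suffixes(String term, int minLength) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + minLength <= term.length(); i++) {
            out.add(term.substring(i));
        }
        return out;
    }
}
```

The trade-off is index size: the reversed field stores one extra term per word, while suffix indexing stores one term per suffix.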
There's no specific branch for 4.1 yet. All development still happens on
the 4x branch (
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/).
Note that Lucene maintains two active branches for development: 'trunk'
(currently to be 5.0) and '4x' off of which all Lucene 4.x releases are
2, 2013 at 8:06 PM, Lance Norskog wrote:
> 4.x does not promise backwards compatibility with 3.x. Have you made your
> own extensions?
>
> On 01/02/2013 04:38 AM, Shai Erera wrote:
>
>> There's no specific branch for 4.1 yet. All development still happens on
Hi Nicola,
I think that what you're describing corresponds to distributed faceted
search. I.e., you have N content indexes, alongside N taxonomy indexes.
The information that's indexed in each of those sub-indexes does not
correlate with the other ones.
For example, say that you index the category
I see the resulting
> > categories indexes are not that big currently), but I would prefer to
> > have a solution where I can collect the facets over multiple categories
> > indexes in this way I will be sure the solution will scale better.
> >
> >
> > Nicola.
> >
know if this version will be released
> soon?
>
>
> Nicola.
>
> On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> > Hi Nicola,
> >
> > What I had in mind is something similar to this, which is possible starting
> > with Lucene
, IMO, to
the common user.
Shai
On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera wrote:
>
> > (unfortunately, there's still no tool in Lucene to do that for you).
>
> I thi
Hi Nicola,
Regarding the OR drill-down, yes you can construct your own BooleanQuery,
passing Occur.SHOULD instead of MUST. Currently DrillDown does not help you
do that, so you can copy the code from DrillDown.query and change MUST to
SHOULD. I opened LUCENE-4716 to add this support to DrillDown.
Ooops, I just realized that at some point java-user was removed from the CC
:).
Fixing that.
Shai
On Fri, Jan 25, 2013 at 2:27 PM, Shai Erera wrote:
> Hi Nicola,
>
> Indeed, if it's a URL with parameters, it's not a UI trick :). I think
> that you can do what you want
Hi
Are the values of 'a' and 'b' known in advance? Is it a limited set of
values? Are you always interested in a table which covers all values?
If so, one way to do that is to each value of 'a' against all values of
'b'. Of course, pick as pivot the dimension with the least values. Note
however t
Hi Nicola,
How does the interface allow the user to select facet values not from the
top-10? How does the interface know which other facet values are there?
Does it query the taxonomy somehow?
One thing you can do is to set numResults to Integer.MAX_VALUE and
numToLabel to 10. That way your Fac
randXCount = counts[brandXOrdinal];
// now you can build your final result with the count of Brand/X
Hope that helps
Shai
On Tue, Jan 29, 2013 at 7:55 PM, Shai Erera wrote:
> Hi Nicola,
>
> How does the interface allow the user to select a facet values not from
> the top-10? How d
I'm glad to hear it helped you, Ramprakash.
Don't hesitate to post questions to the list if you need further assistance!
Shai
On Fri, Feb 1, 2013 at 9:12 AM, Ramprakash Ramamoorthy <
youngestachie...@gmail.com> wrote:
> On Fri, Jan 25, 2013 at 6:23 PM, Shai Erera wrote:
>
Hi
You can do so quite easily, using TaxonomyReader, following code such as:
ParallelTaxonomyArrays arrays = taxoReader.getParallelTaxonomyArrays();
int[] children = arrays.children();
int[] siblings = arrays.siblings();
int ordinal = taxoReader.getOrdinal(category); // ordinal of requested
cate
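Since the snippet above is cut off, here is a rough sketch of how such children/siblings arrays are usually walked, written against plain int arrays rather than a real TaxonomyReader. The class TaxonomyWalk and the toy data are hypothetical. The assumptions are that children[i] holds one child of ordinal i, that siblings[i] holds the next sibling of ordinal i, and that -1 marks the end of the chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; operates on plain arrays, not on a TaxonomyReader.
class TaxonomyWalk {
    static final int INVALID_ORDINAL = -1; // assumed end-of-chain marker

    // Collect the direct children of 'ordinal': start at children[ordinal], then follow siblings[].
    static List<Integer> childrenOf(int ordinal, int[] children, int[] siblings) {
        List<Integer> out = new ArrayList<>();
        for (int child = children[ordinal]; child != INVALID_ORDINAL; child = siblings[child]) {
            out.add(child);
        }
        return out;
    }
}
```

For a toy taxonomy with ordinals 0 (root), 1 (author), 2 and 3 (two authors under it), children = {1, 3, -1, -1} and siblings = {-1, -1, -1, 2} yield the children 3 and 2 for ordinal 1.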
It's the same decision that you need to make regarding IndexWriter.
You should commit when you want the data to be persistent. This can happen
on a timer-basis (e.g. every 10 minutes), or following some application
logic, e.g. finished crawling a website or indexing a chunk of documents.
NRT suppo
orking correctly.
>
>
> Nicola.
>
> On Thu, 2013-01-24 at 16:53 +, Nicola Buso wrote:
> > Hi Shai,
> >
> > I'd like just to give you a confirmation that your solution is working
> > after the tests I did.
> >
> > Thanks again for the use
Hi Nicola,
I didn't read the code examples, but I'll address your last question
regarding the Aggregator. Indeed, with Lucene 4.2,
FacetRequest.createAggregator is not called by the default
FacetsAccumulator. This method should go away from FacetRequest entirely,
but unfortunately we did not fin
Hi Carsten,
You're right that Lucene document numbers are ephemeral, but they are
consistent for a certain IndexReader instance. So perhaps you can use
SearcherLifetimeManager to obtain a 'version' of the reader that returned
the original results and store a bitset together with that version. Then
Hi Nicola,
Yes, the residue was removed in LUCENE-4709 since it was a senseless
number. If you index your facets with OrdinalPolicy.ALL_PARENTS, then the
residue can be computed from root.value - sum(topK.value).
Also, FacetResult.numValidDescendants actually contains the right statistic
(total n
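The residue formula mentioned above is plain arithmetic. A small sketch, using a hypothetical Residue helper and made-up numbers, and assuming the root value already includes the counts of all its descendants (which is what indexing with ALL_PARENTS is meant to give you):

```java
// Hypothetical helper for illustration; not part of the facet module.
class Residue {
    // residue = root.value - sum(topK.value)
    static double residue(double rootValue, double[] topKValues) {
        double sum = 0;
        for (double v : topKValues) {
            sum += v;
        }
        return rootValue - sum;
    }
}
```

For example, a root value of 100 with top-3 values of 40, 25 and 10 leaves a residue of 25 for all categories that did not make it into the top K.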
Hi
It's ... tricky :).
If you ask for depth=3, then you will never get idC because idB's count is
0. I think what you could do is the following:
1. Index the categories with NO_PARENTS
2. Write a FacetsAggregator which extends FastCountingFacetsAggregator
3. Override rollupValues() to c
Hi Nicola,
I think this limit denotes the number of bytes you can write in a single DV
value. So this actually means a much smaller number of facets that you can
index. Do you know how many categories are indexed for that one document?
Also, do you expect to index a large number of facets for most documents, or
o I've seen where partitions came
in handy was IMO an abuse of the facet module ... :-)
Shai
On Apr 26, 2013 6:04 PM, "Shai Erera" wrote:
> Hi Nicola,
>
> I think this limit denotes the number of bytes you can write in a single
> DV value. So this actually means much l
card
> a bunch of facets values; I imagine there will be queries that will
> point out some species (let me say) in the 32766 saved values and some
> other queries that will point out the species not saved in the facets.
>
> We can try to save the most relevant values for this face
Hi
I don't think it's possible, not with the default configuration. The
problem is that drill-down terms are created with a default delimiter,
which is \u001F, and you can't really type that character.
One way is to override FacetIndexingParams.getFacetDelimChar() to return a
human readable chara
Hi Clive,
In order to use Lucene facets you need to make indexing time decisions.
It's not that you don't make these decisions anyway, even with Solr -- for
example, you need to decide how to tokenize the fields by which you want to
facet, or in Lucene 4.0 index them as SortedSetDocValuesField.
I
he time to explain the situation.
>
> Clive
>
>
>
>
> From: Shai Erera
> To: "java-user@lucene.apache.org" ; kiwi
> clive
> Sent: Monday, May 6, 2013 5:56 AM
> Subject: Re: search-time facetting in Lucene
>
>
> Hi Clive,
>
>
I think you can do what Mike suggested quite easily. Create your own
FacetResultsHandler and override the Accumulator's createFRH(). The handler
will zero out all counts you are not interested in and then delegate to the
wrapped FRH to compute the actual top K.
Shai
On Thu, May 9, 2013 at 1:44 PM, Ni
If your documents *always* contain the same fields then yes. But in
general, you can do:
addDocument("f:value");
commit();
addDocument("c:value");
commit();
And each AtomicReader will contain different fields. As getFieldInfos()
documents "Get the {@link FieldInfos} describing all fields in *this
Hi Raj,
Unfortunately the userguide is outdated after refactorings made to the
package. We have an issue open to fix that.
Until then, you can find example code here:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFac
ect based on the sample provided. But a bit confused on how to query
> the document content which also has associated Facets.
>
>
> On Thu, May 16, 2013 at 11:34 AM, raj wrote:
>
> > Thanks a lot Shai! That was really quick response.
> >
> >
> > >
Hi
To override OrdinalPolicy you need to do the following:
FacetIndexingParams fip = new FacetIndexingParams() {
  @Override
  public CategoryListParams getCategoryListParams(CategoryPath category) {
    return new CategoryListParams() {
      @Override
      public OrdinalPolicy getOrdinalPolicy(String dimension) {
        // return the OrdinalPolicy you want for this dimension
      }
    };
  }
};
BTW, in the
ble then?) for helping me figure out how to achieve
> my goals.
>
> Thanks a lot and sorry for asking THAT much!
> -Danny
>
>
> -Original Message-
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Monday, May 27, 2013 15:18
> To: java-user@lucene.a
bership (hierarchical categories) and access them based
> on required results?
>
> If something isn't clear, please ask and I'll explain obscurities.
> Thanks a lot for spent your precious time for solve my problem!
>
> -Danny
>
>
> -Original Message
C3 (1)
> C4 (2)
> C5 (1)
> C6 (1)
> C7 (1)
>
> 3. Hierarchical, indirect membership:
> C1 (2)
> |
> |- C2 (1)
> |- C3 (1)
> |- C4 (2)
> |
> |- C5 (1)
> | |- C6 (1)
> |
> |- C7 (1)
>
> 4. Hierarchic
The best practice is:
- The component which calls DirectoryReader.open should call close()
- Any code which calls incRef() should match that call with decRef(),
preferably in a try-finally clause
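As a sketch of that pattern, here is a plain-Java stand-in. CountedResource and RefCountDemo are hypothetical classes, not Lucene's reader; they only mimic the reference-counting contract so that the try-finally shape is visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted reader; not a Lucene class.
class CountedResource {
    // The component that opens the resource holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("already closed");
        }
        refCount.incrementAndGet();
    }

    void decRef() {
        refCount.decrementAndGet();
    }

    // Called by the component that opened the resource; releases its own reference.
    void close() {
        decRef();
    }

    int getRefCount() {
        return refCount.get();
    }
}

class RefCountDemo {
    // Borrow the resource for one operation; incRef() is always matched by decRef().
    static int useOnce(CountedResource resource) {
        resource.incRef();
        try {
            return resource.getRefCount(); // stand-in for the actual search work
        } finally {
            resource.decRef();
        }
    }
}
```

The finally block guarantees the reference is released even if the work in between throws, so the count returns to what the opener holds.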
Shai
On Sun, Jun 2, 2013 at 6:18 AM, Yonghui Zhao wrote:
> Thanks, Michael.
>
> My under
Hi Oded,
These times sound way too high, even for really hard queries. Can you share
a bit about how you index the documents and what they contain?
Specifically:
- How many facet dimensions per-document do you have?
- Are the dimensions unique, i.e. a document has one category from a
Hi
Taking a backup of the index by doing a naive file copy is not a good
approach. As you mentioned, Lucene does background merging and if your
application suddenly commits, old segment files may be deleted. Also, your
backup will most probably include files that were not committed yet.
Rather, y
There are several ways to implement it:
Query as you mentioned. You'd need to implement a Scorer which traverses
the posting list where the payload exists. The methods you should implement
are nextDoc() and advance(). You'll also need to traverse
DocsAndPositionsEnum.
A Filter. That's somewhat e
Hi,
I assume that you use a single TaxonomyReader instance? It must be the same
for both indexes, that is, both indexes must share the same taxonomy index.
Otherwise their ordinals will not match, and you may also hit such
exceptions, since one index may have bigger ordinals than what the taxo
Do you want your top-K to be computed by label too? Or first deduce the
top-K facets, then sort them otherwise?
Shai
On Tue, Jul 2, 2013 at 6:36 PM, Nicola Buso wrote:
> Hi,
>
> I was looking to change the order of the facet results; in this case I
> would like to order by the facet label inst
x, using
> IndexWriter.AddIndexes().
> If the temp index has facet index, this approach creates a bad index.
>
> Is there a way I can build faceted index in multiple threads?
>
> - Gao Peng
>
> > -Original Message-
> > From: Shai Erera [mailto:ser...@gma
op through the temp index, and for each doc, check if it's already in
> the master,
> addDocument() only if it doesn't exist.
> Now I have facets, how do I selectively merge docs?
>
> Thanks again for your help,
> Gao Peng
>
>
> > -Original Message-
>
the values in the original
> hierarchy in the new created one? Too expensive?
>
>
> Nicola.
>
>
>
> On Tue, 2013-07-02 at 20:49 +0300, Shai Erera wrote:
> > Well, in general it can be done, but it won't be cheap. You can
> > implement a FacetResultsHandler which in
RangeFacetRequest will be released in 4.4, I guess a couple of weeks away.
Shai
On Jul 4, 2013 12:02 PM, "Nicola Buso" wrote:
> On Wed, 2013-07-03 at 21:58 +0300, Shai Erera wrote:
> > What's maxCount? What I mean is that if you create a FacetRequest with
> >
acetFields.addFacets() on the doc works.
>
> Given that I need to check the uniqueness before merging an index with
> facets
> into a master, is there better way to it without re-indexing?
>
> Gao Peng
>
>
> > -Original Message-
> > From: Shai Erera [mai
No, option 1 is better than reindexing, but option 2 is the fastest IMO.
Shai
On Fri, Jul 5, 2013 at 6:55 PM, Peng Gao wrote:
> Thanks.
>
> Yes, that's the case. I'll try it out.
>
> Is Option 1 more expensive than re-indexing?
>
>
> > -Origi
Well ... at a high level, this is what you should do:
1. Integrate with Apache Tika for parsing the .DOC files (and maybe
other office files you have)
2. Tika extracts the contents of the document, as well as some metadata
3. Create a Lucene Document object to which you add Fields:
useful for Java
> newbie?
>
>
>
> --
> Thanks and Best Regards
>
> Vinh Dang (Msc.)
> Project Manager
> FPT Software
> Mobile: +84 982 058 956
> Skype: dqvinh87
> Y!M:dqvinh87
> Email: dqvin...@gmail.com
> Websites: http://www.vinhdq.blogspot.com
>
>
Hmm, does that mean that Lucene 4.0+ cannot run on Android?
Shai
On Sat, Jul 13, 2013 at 6:51 PM, VIGNESH S wrote:
> Hi Robert,
>
> Thanks for your reply.
>
> If possible,can you please explain why this new class loading mechanism was
> introduced in Lucene 4
>
> Thanks and Regards
> Vignesh
>
t implement java, its not java.
>
> On Sat, Jul 13, 2013 at 2:23 PM, Shai Erera wrote:
>
> > Hmm, does that mean that Lucene 4.0+ cannot run on Android?
> >
> > Shai
> >
> >
> > On Sat, Jul 13, 2013 at 6:51 PM, VIGNESH S
> > wrote:
> >
> > >
There are several options:
As Allison suggested, pad your words with ##, so that "quota tom" becomes
"##quota## ##tom##" at indexing time, and the query "quota to" becomes
either "##quota ##to", or if you want to optimize, only pad query terms < 3
characters, so it becomes "quota ##to". That shoul
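The padding itself is simple string work. A minimal sketch in plain Java, with a hypothetical NGramPadding helper that splits on whitespace (an assumption; your analyzer may tokenize differently):

```java
// Hypothetical helper for illustration only; in practice this would live in a TokenFilter.
class NGramPadding {
    static final String PAD = "##";

    // Index side: pad both ends of every whitespace-separated token.
    static String padForIndex(String text) {
        StringBuilder sb = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(PAD).append(token).append(PAD);
        }
        return sb.toString();
    }

    // Query side (optimized variant): add leading padding only to terms shorter than 3 characters.
    static String padShortQueryTerms(String query) {
        StringBuilder sb = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term.length() < 3 ? PAD + term : term);
        }
        return sb.toString();
    }
}
```

With this, "quota tom" is indexed as "##quota## ##tom##" and the query "quota to" becomes "quota ##to".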
Wait, I didn't mean to pad the entire string. If the string is broken on _
already, then NGramFilter already receives the individual terms and you can
put a Filter in front that will pass through a padded token?
Shai
On Fri, Jul 19, 2013 at 3:45 PM, Becker, Thomas wrote:
> In general the data f
Hi
In Lucene 4.4 we've improved the snapshotting process so that you don't
need to specify an ID.
Also, there's a new Replicator module which can be used for just that
purpose - taking hot backups of the index.
It pretty much hides most of the snapshotting from you. You can read about
it here:
4876,
> > where we added cloning of IndexDeletionPolicy on IW construction.
> > It's very confusing that the IDP you set on your IWC is not in fact
> > the one that IW uses...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> > >
Hi
You should do the following:
TopFieldCollector tfc = TopFieldCollector.create();
FacetsCollector fc = FacetsCollector.create();
searcher.search(query, MultiCollector.wrap(tfc, fc));
Basically IndexSearcher.search(..., Sort) creates TopFieldCollector
internally, so you need to create it outsid
t later on
>pagination request to show facets.
> 3. On subsequent pagination requests use IndexSearcher.searchAfter
>method to get next set of results using ScoreDoc from session.
> 4. If user want to narrow down on facets then follow steps from 1 to 3
>using Drill-down featur
Hi
You cannot just update headers -- the file formats have changed. Therefore
you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for
4.1 to be able to read it).
If your index is already optimized, then IndexUpgrader is your best option.
The reason it calls forceMerge(1) is that
hie...@gmail.com> wrote:
> Thank you Shai for the quick response. Have responded inline.
>
>
> On Fri, Aug 2, 2013 at 5:37 PM, Shai Erera wrote:
>
> > Hi
> >
> > You cannot just update headers -- the file formats have changed.
> Therefore
> > you n
Rob, when DiskDV becomes the default DVFormat, would it not make sense to
load the values into the cache if someone uses FieldCache API? Vs. if
someone calls DV API directly, he uses whatever is the default Codec, or
the one that he plugs.
That's what I would expect from a 'cache'. So it's ok that
ok that makes sense.
Shai
On Mon, Aug 12, 2013 at 9:18 PM, Robert Muir wrote:
> On Mon, Aug 12, 2013 at 11:06 AM, Shai Erera wrote:
> >
> > Or, you'd like to keep FieldCache API for sort of back-compat with
> existing
> > features, and let the app control the &
Hi
SortedSetDocValuesAccumulator does receive FacetArrays in its ctor, so you
can pass ReusingFacetArrays. You will need to call FacetArrays.free() when
you're done with accumulation though. However, do notice that
ReusingFacetArrays did not show any big gain even with large taxonomies --
that is
Oops you're right, it was committed in LUCENE-4985 which will be released
in Lucene 4.5.
Shai
On Wed, Aug 28, 2013 at 6:16 PM, Krishnamurthy, Kannan <
kannan.krishnamur...@contractor.cengage.com> wrote:
> Thanks for the response. I double checked that
> SortedSetDocValuesAccumulator doesn't tak
How do you add documents to the index? Is it synchronized (such that
basically only one thread can add documents at a time)?
The same goes for removing documents as well.
Also, did you encounter any exceptions during the run - if say an addDoc
fails on one of the slices, then you need to revert th
Yes, looks like clearLock should be changed to not throw the exception, but
rather do a best effort - call delete() but don't respond to its return
value. I'll change that on 3x, I'm not sure if a backport to 3.0.x is needed
(doesn't seem to justify a 3.0.3 ...)
Shai
On Wed, Jul 7, 2010 at 8:59 A
we should do is try to forcefully unlock it first, and if that
succeeds then delete the lock file, ignoring the returned output. Or change
the javadocs.
I'll check it
Shai
On Wed, Jul 7, 2010 at 7:28 PM, Shai Erera wrote:
> Yes, looks like clearLock should be changed to not throw the ex
I committed a fix earlier today. clearLock will fail if the lock cannot be
released (meaning someone else holds it), but it will ignore the result of
file.delete().
Shai
On Wed, Jul 7, 2010 at 7:41 PM, Shai Erera wrote:
> Double-checking the code, this isn't that simple :). Someone
Depends on which query, no? ;)
Sounds like you want to simulate the QP behavior
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html for
boosting. Meaning, if for the query "b" you want to simulate the query
"b OR b$^2" and have matches of b$ count more than b, then I'd follow
how QP does it
his for me per result. The
> easiest path would be subclassing Similarity, if only the relevant functions
> wouldn't have been deprecated...
>
> Are there any other ways to do so? For example, is this doable with
> function queries (since access to the actual term
ere are many tricks you can do on your end, w/o overriding much in
Lucene. Still, IMO extending QP is the easiest and gives you the control you
need.
Shai
On Mon, Jul 19, 2010 at 9:24 PM, Itamar Syn-Hershko wrote:
> On 19/7/2010 5:50 PM, Shai Erera wrote:
>
>> If your analyzer outp
You can also call deleteUnusedFiles(), and all unreferenced files will be
deleted as well. Make sure to set the index DeletionPolicy to
KeepOnlyLastCommit (which is the default), before you do that. That's
relevant though if you've built the index using either 3x or 4.0 code.
If not, you can achiev
>
> Shai Erera brought a similar idea up before, to use Locale, but my concerns
> are it would be limited by javas Locale mechanism... but we can figure this
> out.
>
It really depends how sophisticated you want such an AnalyzerFactory
(that's how I call it in my code) to be.
ing the point here, but how do you define an analyzer <->
> language match? What do you do in cases of mixed content, for example?
>
> Itamar.
>
>
> On 25/9/2010 10:27 PM, Shai Erera wrote:
>
>> Shai Erera brought a similar idea up before, to use Locale, but my
>
There's a deleteAll() method on IndexWriter, which is very fast. After you
commit(), none of the documents will be visible to new searchers. When the
last open searcher is closed, the documents will completely disappear from
the index. All in all it's quite a good approach to take.
You can also consi
Note that deleteAll does not require you to optimize anything. It literally
removes all segments from the index in one shot, and when the files are
unreferenced, they will be removed entirely.
Shai
On Wed, Oct 13, 2010 at 4:53 PM, Dan OConnor wrote:
> Jeff,
> I would suggest not deleting documen
When you close IndexWriter, it performs several operations that might have a
connection to the problem you describe:
* Commit all the pending updates -- if your update batch size is more or
less the same (i.e., comparable # of docs and total # bytes indexed), then
you should not see a performance
I'd even offer, if the index is small, perhaps you can post it
somewhere for us to download and debug trace commit()…
Also, though not very scientific, you can turn on debug messages by
setting an infoStream and observe which prints take the longest to appear.
Not very accurate but if there's one oper
In Lucene 3x there is a new addIndexes which accepts Directory… that
simply registers the new indexes in the index, without running merges.
That makes addIndexes very fast.
Also, you can consider calling close(false) to not wait for merges.
That can speed things up as well.
But note that not run
Ok, so a couple of clarifications:
addIndexes(Directory...) *does not* trigger any merges. It simply registers
the incoming directories in the target index, and returns. You can later
call maybeMerge() or optimize() as you see fit.
Compound files are irrelevant to addIndexes - it just adds the in
That's right. In 3x though you have to call addIndexes followed by
maybeMerge if you want to achieve the same effect as
addIndexesNoOptimize.
Shai
On Friday, November 12, 2010, Marc Sturlese wrote:
>
> Thanks, so clarifying. As far as I've understood, if I have to end up
> optimizing the index j