Re: Broken link in Lucene 3.5 JavaDoc?

2011-12-14 Thread Shai Erera
I will investigate it. In the meantime, this is the correct link: http://lucene.apache.org/java/3_5_0/api/contrib-facet/userguide.html Shai On Wed, Dec 14, 2011 at 3:08 PM, Lukáš Vlček wrote: > Hi, > > is there broken link in > > http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/fac

Re: Broken link in Lucene 3.5 JavaDoc?

2011-12-14 Thread Shai Erera
get to the > o.a.l.facet package? > > On Wed, Dec 14, 2011 at 8:14 AM, Shai Erera wrote: > > I will investigate it. In the meantime, this is the correct link: > > http://lucene.apache.org/java/3_5_0/api/contrib-facet/userguide.html > > > > Shai > > >

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Shai Erera
> > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: Shai Erera [mailto:ser...@gmail.com] > > > Sent: Thur

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Shai Erera
... issue for *it*, not 'other' :) Shai On Dec 15, 2011 2:47 PM, "Shai Erera" wrote: > If you already did it, then a patch will be great. Perhaps we should open > an issue for other? > > Shai > On Dec 15, 2011 11:44 AM, "Uwe Schindler" wrote:

RE: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Shai Erera
u...@thetaphi.de > > > -----Original Message- > > From: Shai Erera [mailto:ser...@gmail.com] > > Sent: Thursday, December 15, 2011 1:47 PM > > To: java-user@lucene.apache.org > > Subject: RE: Broken link in Lucene 3.5 JavaDoc? > > > > If you already di

Re: Broken link in Lucene 3.5 JavaDoc?

2011-12-15 Thread Shai Erera
I opened LUCENE-3649. Shai On Thu, Dec 15, 2011 at 2:50 PM, Shai Erera wrote: > Sure, as soon as I'll be in front of a computer. > > Shai > On Dec 15, 2011 2:48 PM, "Uwe Schindler" wrote: > >> Yes, I could attach the patch there! Will you open it? >>

Re: TaxonomySearch & similar words?

2012-02-22 Thread Shai Erera
Hi Cheng, You will need to use the exact path labels in order to get to the category 'Mark Twain', unless you index multiple paths from the start, e.g.: /author/American/Mark Twain /writer/American/Mark Twain The taxonomy index does not process the CategoryPath labels in any way to e.g. produce synony

Re: facet vs group search

2012-02-28 Thread Shai Erera
If I understand 'group search' correctly, you mean grouping search results by some criteria? The main difference between grouping search results to faceted search is that when you group search results by some criteria, your request is something like "give me the top 3 results from each movie categ

Re: Document-Ids and Merges

2012-03-27 Thread Shai Erera
Or ... move to use a per-segment array. Then you don't need to rely on doc IDs changing. You will need to build the array from the documents that are in that segment only. It's like FieldCache in a way. The array is relevant as long as the segment exists (i.e. not merged away). Hope this helps.

Re: Document-Ids and Merges

2012-03-28 Thread Shai Erera
return new DocValues() { >public float floatVal(int doc) { >if(doc < values.length) >return values[doc]; >return 1.0f; >} >}; >} > } > > How would I need to change it to make the

Re: IndexWriter.isLock()

2012-05-07 Thread Shai Erera
If I understand correctly, you're using the NativeFSLockFactory and that's the expected behavior -- unlike SimpleFSLockFactory, if you terminate the JVM and then restart the program, the lock is not held anymore -- that's the advantage of using native-fs-lock because nobody really holds the lock an

Re: forcing an IndexWriter to close

2012-06-04 Thread Shai Erera
Hi You have several ways to do it: 1) Use NativeFSLockFactory, which obtains native locks that are released automatically when the process dies, as well as after a successful IndexWriter.close(). If your writer.close() is called just before the process terminates, then this might be a good soluti

Re: Auto commit when flush

2012-06-27 Thread Shai Erera
You could extend IndexWriter to AutoCommitIndexWriter and override flush() to call super.flush() then commit() (or simply just commit()). I haven't tested it but I think it should work. However, make sure you understand the implications of commit() -- it's heavier than just flush. Perhaps you can

Re: Upgrade to 3.6 OR wait for 4.0

2012-07-09 Thread Shai Erera
Hi Ganesh I recently upgraded my code to 3.6, and yesterday finished part of my upgrades to 4.0-ALPHA. Upgrading from 3.0.3 to 3.6 is relatively easy as all API should be backwards compatible. But I think there were some API breaks, and back-compat issues. Therefore, if I were you, I'd first upgr

Re: Upgrade to 3.6 OR wait for 4.0

2012-07-10 Thread Shai Erera
ly the stable > version. > > Regards > Ganesh > > > - Original Message - > From: "Shai Erera" > To: > Sent: Tuesday, July 10, 2012 10:50 AM > Subject: Re: Upgrade to 3.6 OR wait for 4.0 > > > > Hi Ganesh > > > > I recently

Re: Facet Support

2012-07-26 Thread Shai Erera
Hi. Faceted search has existed since 3.5 and will exist in 4.0 too! Shai On Jul 26, 2012 7:21 PM, "Subramanian, Ranjith" < ranjith.subraman...@capgemini.com> wrote: > Hi Team, > > ** ** > > I would like to know if Lucene 4.0 will support facetted search. > > Thanks in advance. > > ** **

Re: Excessive use of IOException without proper documentation

2012-11-04 Thread Shai Erera
Hey Mike, I'm not sure that I like the idea of throwing LuceneException or SearchException everywhere. I've been there (long time ago) and I always hated it. First, what's the difference between 'new SearchException("Failed to read the index", ioe)' and 'new IOException("Failed to read the index"

Re: Excessive use of IOException without proper documentation

2012-11-04 Thread Shai Erera
I think that specific exceptions should be thrown only in case we expect the user to do something with it. E.g. LockObtainException is something that I can catch and try to recover from in the code, maybe retry to obtain the lock. But all IOExceptions, maybe excluding FNFE, are unrecoverable in th

Re: Grouping on multiple shards possible in lucene?

2012-11-20 Thread Shai Erera
Hi Ravi, I've been dealing with reverse indexing lately, so let me share with you a bit of my experience thus far. First, you need to define what does reverse indexing mean for you. If it means that docs that were indexed in the following order: d1, d2, d3 should be traversed during search in tha

Re: Grouping on multiple shards possible in lucene?

2012-11-21 Thread Shai Erera
n the meantime, I will live with good old sorting > > -- > Ravi > > On Wed, Nov 21, 2012 at 1:59 AM, Shai Erera wrote: > > > Hi Ravi, > > > > I've been dealing with reverse indexing lately, so let me share with you > a > > bit of my experience

Re: Multiple facets in Lucene searches

2012-11-21 Thread Shai Erera
Hi Jan, Basically, DrillDown is a helper class for creating such queries. You're right that its query() methods create AND, because that's normally the case, but if you require OR, you could do this: BooleanQuery res = new BooleanQuery(); for (CategoryPath cp : paths) { res.add(new

Re: Alternative for WildcardQuery with leading *

2012-12-07 Thread Shai Erera
Really off the top of my head, if that's an expected query, you can try to index the words backwards (in that field) and then convert the query *plan to nalp* :). You can also index the suffixes of words, e.g. vacancyplan, acancyplan, cancyplan and so forth, and then convert the query *plan to pla
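A minimal sketch of the reversed-term idea from the message above, in plain Java with no Lucene dependency. The class and method names (ReversedTerms, toPrefixPattern, matches) are hypothetical and exist only for illustration; in a real index the reversal would happen in the analysis chain for that field, and the prefix check would be a prefix query.

```java
/** Sketch: serve a leading-wildcard query by indexing reversed terms. */
final class ReversedTerms {

    /** Reverse a term before indexing it into the (hypothetical) reversed field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /** Turn a leading-wildcard pattern such as "*plan" into the prefix pattern "nalp*". */
    static String toPrefixPattern(String leadingWildcard) {
        if (!leadingWildcard.startsWith("*") || leadingWildcard.indexOf('*', 1) >= 0) {
            throw new IllegalArgumentException(
                "expected exactly one leading '*': " + leadingWildcard);
        }
        return reverse(leadingWildcard.substring(1)) + "*";
    }

    /** Plain-Java stand-in for running a prefix query against one reversed term. */
    static boolean matches(String indexedReversedTerm, String prefixPattern) {
        String prefix = prefixPattern.substring(0, prefixPattern.length() - 1);
        return indexedReversedTerm.startsWith(prefix);
    }
}
```

The point of the design is that a prefix match on the reversed field is a cheap term-dictionary seek, whereas a leading wildcard on the original field has to visit every term.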

Re: Pulling lucene 4.1

2013-01-02 Thread Shai Erera
There's no specific branch for 4.1 yet. All development still happens on the 4x branch ( http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/). Note that Lucene maintains two active branches for development: 'trunk' (currently to be 5.0) and '4x' off of which all Lucene 4.x releases are

Re: Pulling lucene 4.1

2013-01-02 Thread Shai Erera
2, 2013 at 8:06 PM, Lance Norskog wrote: > 4.x does not promise backwards compatibility with 3.x. Have you made your > own extensions? > > On 01/02/2013 04:38 AM, Shai Erera wrote: > >> There's no specific branch for 4.1 yet. All development still happens on

Re: FacetedSearch and MultiReader

2013-01-21 Thread Shai Erera
Hi Nicola, I think that what you're describing corresponds to distributed faceted search. I.e., you have N content indexes, alongside N taxonomy indexes. The information that's indexed in each of those sub-indexes does not correlate with the other ones. For example, say that you index the category

Re: FacetedSearch and MultiReader

2013-01-21 Thread Shai Erera
I see the resulting > > categories indexes are not that big currently), but I would prefer to > > have a solution where I can collect the facets over multiple categories > > indexes in this way I will be sure the solution will scale better. > > > > > > Nicola. > >

Re: FacetedSearch and MultiReader

2013-01-22 Thread Shai Erera
know if this version will be released > soon? > > > Nicola. > > On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote: > > Hi Nicola, > > > > What I had in mind is something similar to this, which is possible > starting > > with Lucene

Re: FacetedSearch and MultiReader

2013-01-23 Thread Shai Erera
, IMO, to the common user. Shai On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera wrote: > > > (unfortunately, there's still no tool in Lucene to do that for you). > > I thi

Re: Faceted search in OR

2013-01-24 Thread Shai Erera
Hi Nicola, Regarding the OR drill-down, yes you can construct your own BooleanQuery, passing Occur.SHOULD instead of MUST. Currently DrillDown does not help you do that, so you can copy the code from DrillDown.query and change MUST to SHOULD. I opened LUCENE-4716 to add this support to DrillDown.

Re: Faceted search in OR

2013-01-25 Thread Shai Erera
Ooops, I just realized that at some point java-user was removed from the CC :). Fixing that. Shai On Fri, Jan 25, 2013 at 2:27 PM, Shai Erera wrote: > Hi Nicola, > > Indeed, if it's a URL with parameters, it's not a UI trick :). I think > that you can do what you want

Re: Multiple faceting in lucene

2013-01-25 Thread Shai Erera
Hi Are the values of 'a' and 'b' known in advance? Is it a limited set of values? Are you always interested in a table which covers all values? If so, one way to do that is to run each value of 'a' against all values of 'b'. Of course, pick as pivot the dimension with the fewest values. Note however t

Re: FacetRequest include residue

2013-01-29 Thread Shai Erera
Hi Nicola, How does the interface allow the user to select a facet values not from the top-10? How does the interface know which other facet values are there? Does it query the taxonomy somehow? One thing you can do is to set numResults to Integer.MAX_VALUE and numToLabel to 10. That way your Fac

Re: FacetRequest include residue

2013-01-29 Thread Shai Erera
randXCount = counts[brandXOrdinal]; // now you can build your final result with the count of Brand/X Hope that helps Shai On Tue, Jan 29, 2013 at 7:55 PM, Shai Erera wrote: > Hi Nicola, > > How does the interface allow the user to select a facet values not from > the top-10? How d

Re: Multiple faceting in lucene

2013-02-01 Thread Shai Erera
I'm glad to hear it helped you, Ramprakash. Don't hesitate to post questions to the list if you need further assistance! Shai On Fri, Feb 1, 2013 at 9:12 AM, Ramprakash Ramamoorthy < youngestachie...@gmail.com> wrote: > On Fri, Jan 25, 2013 at 6:23 PM, Shai Erera wrote:

Re: How do I retrieve a list of child categories for a category?

2013-03-30 Thread Shai Erera
Hi You can do so quite easily, using TaxonomyReader, following code such as: ParallelTaxonomyArrays arrays = taxoReader.getParallelTaxonomyArrays(); int[] children = arrays.children(); int[] siblings = arrays.siblings(); int ordinal = taxoReader.getOrdinal(category); // ordinal of requested cate
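Since the snippet above is cut off before the traversal itself, here is a hedged sketch of how such children/siblings arrays are typically walked. It uses hand-built int arrays rather than a real TaxonomyReader, and it assumes the convention that children[ordinal] holds the youngest child, siblings[child] holds the next older sibling, and -1 marks the end of the chain. The class name TaxonomyChildren and the INVALID_ORDINAL constant are stand-ins defined here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: list the direct children of a category from children/siblings arrays. */
final class TaxonomyChildren {

    /** Assumed end-of-chain marker; defined locally for this sketch. */
    static final int INVALID_ORDINAL = -1;

    /**
     * Walk from the youngest child of {@code ordinal} through its siblings.
     * Children come back youngest first (highest ordinal first).
     */
    static List<Integer> childrenOf(int ordinal, int[] children, int[] siblings) {
        List<Integer> result = new ArrayList<>();
        for (int child = children[ordinal];
             child != INVALID_ORDINAL;
             child = siblings[child]) {
            result.add(child);
        }
        return result;
    }
}
```

With a real taxonomy, each returned ordinal would then be resolved back to its category path through the reader.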

Re: When should I commit IndexWriter and TaxonomyWriter if I use NRT readers?

2013-04-03 Thread Shai Erera
It's the same decision that you need to make regarding IndexWriter. You should commit when you want the data to be persistent. This can happen on a timer-basis (e.g. every 10 minutes), or following some application logic, e.g. finished crawling a website or indexing a chunk of documents. NRT suppo

Re: FacetedSearch and MultiReader

2013-04-09 Thread Shai Erera
orking correctly. > > > Nicola. > > On Thu, 2013-01-24 at 16:53 +, Nicola Buso wrote: > > Hi Shai, > > > > I'd like just to give you a confirmation that your solution is working > > after the tests I did. > > > > Thanks again for the use

Re: lucene 4.2 count on merged taxonomies

2013-04-11 Thread Shai Erera
Hi Nicola, I didn't read the code examples, but I'll relate to your last question regarding the Aggregator. Indeed, with Lucene 4.2, FacetRequest.createAggregator is not called by the default FacetsAccumulator. This method should go away from FacetRequest entirely, but unfortunately we did not fin

Re: Statically store sub-collections for search (faceted search?)

2013-04-13 Thread Shai Erera
Hi Carsten, You're right that Lucene document numbers are ephemeral, but they are consistent for a certain IndexReader instance. So perhaps you can use SearcherLifetimeManager to obtain a 'version' of the reader that returned the original results and store a bitset together with that version. Then

Re: Lucene 4.2, where is facet residue

2013-04-16 Thread Shai Erera
Hi Nicola, Yes, the residue was removed in LUCENE-4709 since it was a senseless number. If you index your facets with OrdinalPolicy.ALL_PARENTS, then the residue can be computed from root.value - sum(topK.value). Also, FacetResult.numValidDescendants actually contains the right statistic (total n
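The formula in the message above, residue = root.value - sum(topK.value), can be sketched as plain arithmetic. The class name FacetResidue and its parameters are hypothetical; the values would come from the root facet result node and its top-K children, and the result is only meaningful under the ALL_PARENTS indexing condition stated in the message.

```java
/** Sketch: compute the facet residue from the root count and the top-K child counts. */
final class FacetResidue {

    /** Returns rootValue minus the sum of the returned top-K values. */
    static double residue(double rootValue, double[] topKValues) {
        double sum = 0;
        for (double value : topKValues) {
            sum += value;
        }
        return rootValue - sum;
    }
}
```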

Re: Faceted Search: count direct matches/member für result nodes

2013-04-25 Thread Shai Erera
Hi It's ... tricky :). If you ask for depth=3, then you will never get idC because idB's count is 0. I think what you could do is the following: 1. Index the categories with NO_PARENTS 2. Write a FacetsAggregator which extends FastCountingFacetsAggregator 3. Override rollupValues() to c

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
Hi Nicola, I think this limit denotes the number of bytes you can write in a single DV value. So this actually means much less number of facets you index. Do you know how many categories are indexed for that one document? Also, do you expect to index large number of facets for most documents, or

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
o I've seen where partitions came in handy was IMO an abuse of the facet module ... :-) Shai On Apr 26, 2013 6:04 PM, "Shai Erera" wrote: > Hi Nicola, > > I think this limit denotes the number of bytes you can write in a single > DV value. So this actually means much l

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
card > a bunch of facets values; I imagine there will be queries that will > point out some species (let me say) in the 32766 saved values and some > other queries that will point out the species not saved in the facets. > > We can try to save the most relevant values for this face

Re: Luke and Facet search

2013-05-01 Thread Shai Erera
Hi I don't think it's possible, not with the default configuration. The problem is that drill-down terms are created with a default delimiter, which is \u001F, and you can't really type that character. One way is to override FacetIndexingParams.getFacetDelimChar() to return a human readable chara

Re: search-time facetting in Lucene

2013-05-05 Thread Shai Erera
Hi Clive, In order to use Lucene facets you need to make indexing time decisions. It's not that you don't make these decisions anyway, even with Solr -- for example, you need to decide how to tokenize the fields by which you want to facet, or in Lucene 4.0 index them as SortedSetDocValuesField. I

Re: search-time facetting in Lucene

2013-05-06 Thread Shai Erera
he time to explain the situation. > > Clive > > > > > From: Shai Erera > To: "java-user@lucene.apache.org" ; kiwi > clive > Sent: Monday, May 6, 2013 5:56 AM > Subject: Re: search-time facetting in Lucene > > > Hi Clive, >

Re: Collect facet only on specific values

2013-05-09 Thread Shai Erera
I think you can do what Mike suggested quite easily. Create your own FacetResultsHandler and override the Accumulator.createFRH(). The handler will zero out all counts you are not interested in and then delegate to the wrapped FRH to compute the actual top K. Shai On Thu, May 9, 2013 at 1:44 PM, Ni

Re: Retrieving FieldInfo

2013-05-14 Thread Shai Erera
If your documents *always* contain the same fields then yes. But in general, you can do: addDocument("f:value"); commit(); addDocument("c:value"); commit(); And each AtomicReader will contain different fields. As getFieldInfos() documents "Get the {@link FieldInfos} describing all fields in *this

Re: Faceted search using Lucene 4.3

2013-05-15 Thread Shai Erera
Hi Raj, Unfortunately the userguide is outdated after refactorings made to the package. We have an issue open to fix that. Until then, you can find an example code here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFac

Re: Faceted search using Lucene 4.3

2013-05-20 Thread Shai Erera
ect based on the sample provided. But a bit confused on how to query > the document content which also has associated Facets. > > > On Thu, May 16, 2013 at 11:34 AM, raj wrote: > > > Thanks a lot Shai! That was really quick response. > > > > >

Re: Faceted Search: count direct matches/member für result nodes

2013-05-27 Thread Shai Erera
Hi To override OrdinalPolicy you need to do the following: FacetIndexingParams fip = new FacetIndexingParams() { public CategoryListParams getCategoryListParams(CategoryPath) { return new CategoryListParams() { public OrdinalPolicy getOrdinalPolicy(String) {} } } } BTW, in the

Re: Faceted Search: count direct matches/member für result nodes

2013-05-28 Thread Shai Erera
ble then?) for helping me figure out how to achieve > my goals. > > Thanks a lot and sorry for asking THAT much! > -Danny > > > -Ursprüngliche Nachricht- > Von: Shai Erera [mailto:ser...@gmail.com] > Gesendet: Montag, 27. Mai 2013 15:18 > An: java-user@lucene.a

Re: Faceted Search: count direct matches/member für result nodes

2013-05-28 Thread Shai Erera
bership (hierarchical categories) and access them based > on required results? > > If something isn't clear, please ask and I'll explain obscurities. > Thanks a lot for spent your precious time for solve my problem! > > -Danny > > > -Ursprüngliche Nachricht

Re: Faceted Search: count direct matches/member für result nodes

2013-05-28 Thread Shai Erera
C3 (1) > C4 (2) > C5 (1) > C6 (1) > C7 (1) > > 3. Hierarchical, indirect membership: > C1 (2) > | > |- C2 (1) > |- C3 (1) > |- C4 (2) > | > |- C5 (1) > | |- C6 (1) > | > |- C7 (1) > > 4. Hierarchic

Re: simple question about decRef

2013-06-02 Thread Shai Erera
The best practice is: - The component which calls DirectoryReader.open should call close() - Any code which calls incRef() should match that call with decRef(), preferably in a try-finally clause Shai On Sun, Jun 2, 2013 at 6:18 AM, Yonghui Zhao wrote: > Thanks, Michael. > > My under
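The two rules above can be illustrated with a small stand-in class. This is a plain-Java sketch, not Lucene's IndexReader: RefCounted, useShared and the other members are hypothetical names, and the only thing it is meant to show is the pairing of incRef() with decRef() in a try-finally, with close() left to whoever opened the resource.

```java
/** Sketch: a reference-counted resource and the incRef/decRef try-finally pattern. */
final class RefCounted {
    private int refCount = 1;      // the component that opened it holds the first reference
    private boolean closeCalled;   // makes close() safe to call more than once
    private boolean released;

    synchronized void incRef() {
        if (refCount <= 0) {
            throw new IllegalStateException("resource already released");
        }
        refCount++;
    }

    synchronized void decRef() {
        if (refCount <= 0) {
            throw new IllegalStateException("too many decRef calls");
        }
        if (--refCount == 0) {
            released = true;       // a real resource would free files and buffers here
        }
    }

    /** Called only by the component that opened the resource. */
    synchronized void close() {
        if (!closeCalled) {
            closeCalled = true;
            decRef();
        }
    }

    synchronized int getRefCount() { return refCount; }

    synchronized boolean isReleased() { return released; }

    /** Borrowing code: every incRef() is matched by a decRef() in finally. */
    static int useShared(RefCounted resource) {
        resource.incRef();
        try {
            return resource.getRefCount();   // stand-in for running a search
        } finally {
            resource.decRef();
        }
    }
}
```

Borrowers never call close(); they only give back the reference they took, so the resource is released once the opener has closed it and the last borrower has finished.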

Re: Message via your Google Profile: Lucene limits

2013-06-02 Thread Shai Erera
Hi Oded, These times sound way too high, even for really hard queries. Can you share a bit about how you index the documents and what do they contain? Specifically: - How many facet dimensions per-document do you have? - Are the dimensions unique, i.e. a document has one category from a

Re: Taking backup of a Lucene index

2013-06-06 Thread Shai Erera
Hi Taking a backup of the index by doing a naive file copy is not a good approach. As you mentioned, Lucene does background merging and if your application suddenly commits, old segment files may be deleted. Also, your backup will most probably include files that were not committed yet. Rather, y

Re: Payload Matching Query

2013-06-20 Thread Shai Erera
There are several ways to implement it : Query as you mentioned. You'd need to implement a Scorer which traverses the posting list where the payload exists. The methods you should implement are nextDoc() and advance(). You'll also need to traverse DocsAndPositionsEnum. A Filter. That's somewhat e

Re: Accumulating facets over a MultiReader

2013-07-01 Thread Shai Erera
Hi, I assume that you use a single TaxonomyReader instance? It must be the same for both indexes, that is, both indexes must share the same taxonomy index, or otherwise their ordinals would not match as well as you may hit such exceptions since one index may have bigger ordinals than what the taxo

Re: Facets ordering

2013-07-02 Thread Shai Erera
Do you want your top-K to be computed by label too? Or first deduce the top-K facets, then sort them otherwise? Shai On Tue, Jul 2, 2013 at 6:36 PM, Nicola Buso wrote: > Hi, > > I was looking to change the order of the facet results; in this case I > would like to order by the facet label inst

Re: Accumulating facets over a MultiReader

2013-07-03 Thread Shai Erera
x, using > IndexWriter.AddIndexes(). > If the temp index has facet index, this approach creates a bad index. > > Is there a way I can build faceted index in multiple threads? > > - Gao Peng > > > -Original Message- > > From: Shai Erera [mailto:ser...@gma

Re: Accumulating facets over a MultiReader

2013-07-03 Thread Shai Erera
op through the temp index, and for each doc, check if it's already in > the master, > addDocument() only if it doesn't exist. > Now I have facets, how do I selectively merge docs? > > Thanks again for your help, > Gao Peng > > > > -Original Message- >

Re: Facets ordering

2013-07-03 Thread Shai Erera
the values in the original > hierarchy in the new created one? Too expensive? > > > Nicola. > > > > On Tue, 2013-07-02 at 20:49 +0300, Shai Erera wrote: > > Well, in general it can be done, but it won't be cheap. You can > > implement a FacetResultsHandler which in

Re: Facets ordering

2013-07-04 Thread Shai Erera
RangeFacetRequest will be released in 4.4, I guess a couple of weeks away. Shai On Jul 4, 2013 12:02 PM, "Nicola Buso" wrote: > On Wed, 2013-07-03 at 21:58 +0300, Shai Erera wrote: > > What's maxCount? What I mean is that if you create a FacetRequest with > >

Re: Accumulating facets over a MultiReader

2013-07-05 Thread Shai Erera
acetFields.addFacets() on the doc works. > > Given that I need to check the uniqueness before merging an index with > facets > into a master, is there better way to it without re-indexing? > > Gao Peng > > > > -Original Message- > > From: Shai Erera [mai

Re: Accumulating facets over a MultiReader

2013-07-05 Thread Shai Erera
no option 1 is better than reindexing, but option 2 is the fastest imo. Shai On Fri, Jul 5, 2013 at 6:55 PM, Peng Gao wrote: > Thanks. > > Yes, that's the case. I'll try it out. > > Is Option 1 more expensive than re-indexing? > > > > -Origi

Re: How to start with Lucene 4.6.1

2013-07-08 Thread Shai Erera
Well ... at a high level, this is what you should do: 1. Integrate with Apache Tika for parsing the .DOC files (and maybe other office files you have) 2. Tika extracts the contents of the document, as well as some metadata 3. Create a Lucene Document object to which you add Fields:

Re: How to start with Lucene 4.6.1

2013-07-08 Thread Shai Erera
useful for Java > newbie? > > > > -- > Thanks and Best Regards > > Vinh Dang (Msc.) > Project Manager > FPT Software > Mobile: +84 982 058 956 > Skype: dqvinh87 > Y!M:dqvinh87 > Email: dqvin...@gmail.com > Websites: http://www.vinhdq.blogspot.com >

Re: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist

2013-07-13 Thread Shai Erera
Hmm, does that mean that Lucene 4.0+ cannot run on Android? Shai On Sat, Jul 13, 2013 at 6:51 PM, VIGNESH S wrote: > Hi Robert, > > Thanks for your reply. > > If possible,can you please explain why this new class loading mechanism was > introduced in Lucene 4 > > Thanks and Regards > Vignesh >

Re: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist

2013-07-13 Thread Shai Erera
t implement java, its not java. > > On Sat, Jul 13, 2013 at 2:23 PM, Shai Erera wrote: > > > Hmm, does that mean that Lucene 4.0+ cannot run on Android? > > > > Shai > > > > > > On Sat, Jul 13, 2013 at 6:51 PM, VIGNESH S > > wrote: > > >

Re: Partial word match using n-grams

2013-07-18 Thread Shai Erera
There are several options: As Allison suggested, pad your words with ##, so that "quota tom" becomes "##quota## ##tom##" at indexing time, and the query "quota to" becomes either "##quota ##to", or if you want to optimize, only pad query terms < 3 characters, so it becomes "quota ##to". That shoul
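The padding option described above can be sketched in plain Java. The example assumes trigrams (n = 3) and the "##" pad from the message; PaddedNGrams and its methods are hypothetical names, and the match is simplified to set containment, which ignores gram positions that a real phrase-style match would check.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: pad words with "##" so that short query prefixes still produce n-grams. */
final class PaddedNGrams {
    static final String PAD = "##";

    /** All character n-grams of the text, in order of first appearance. */
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    /** Index time: pad both ends, e.g. "tom" becomes "##tom##". */
    static Set<String> indexGrams(String word, int n) {
        return ngrams(PAD + word + PAD, n);
    }

    /** Query time: pad the start only, e.g. "to" becomes "##to". */
    static Set<String> queryGrams(String term, int n) {
        return ngrams(PAD + term, n);
    }

    /** Simplified match: every query gram must occur among the indexed grams. */
    static boolean prefixMatches(String indexedWord, String queryTerm, int n) {
        return indexGrams(indexedWord, n).containsAll(queryGrams(queryTerm, n));
    }
}
```

Without the pad, the two-character term "to" yields no trigram at all; with it, the grams "##t" and "#to" match only words that start with "to", so "tom" matches and "quota" does not.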

Re: Partial word match using n-grams

2013-07-19 Thread Shai Erera
Wait, I didn't mean to pad the entire string. If the string is broken on _ already, then NGramFilter already receives the individual terms and you can put a Filter in front that will pass through a padded token? Shai On Fri, Jul 19, 2013 at 3:45 PM, Becker, Thomas wrote: > In general the data f

Re: Question on Lucene hot-backup functionality.

2013-07-23 Thread Shai Erera
Hi In Lucene 4.4 we've improved the snapshotting process so that you don't need to specify an ID. Also, there's a new Replicator module which can be used for just that purpose - take hot index backups of the index. It pretty much hides most of the snapshotting from you. You can read about it here:

Re: Question on Lucene hot-backup functionality.

2013-07-25 Thread Shai Erera
4876, > > where we added cloning of IndexDeletionPolicy on IW construction. > > It's very confusing that the IDP you set on your IWC is not in fact > > the one that IW uses... > > > > Mike McCandless > > > > http://blog.mikemccandless.com >

Re: Lucene 4 - Faceted Search with Sorting

2013-08-01 Thread Shai Erera
Hi You should do the following: TopFieldCollector tfc = TopFieldCollector.create(); FacetsCollector fc = FacetsCollector.create(); searcher.search(query, MultiCollector.wrap(tfc, fc)); Basically IndexSearcher.search(..., Sort) creates TopFieldCollector internally, so you need to create it outsid

Re: Lucene 4 - Faceted Search with Sorting

2013-08-02 Thread Shai Erera
t later on >pagination request to show facets. > 3. On subsequent pagination requests use IndexSearcher.searchAfter >method to get next set of results using ScoreDoc from session. > 4. If user want to narrow down on facets then follow steps from 1 to 3 >using Drill-down featur

Re: IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Shai Erera
Hi You cannot just update headers -- the file formats have changed. Therefore you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for 4.1 to be able to read it). If your index is already optimized, then IndexUpgrader is your best option. The reason it calls forceMerge(1) is that

Re: IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Shai Erera
hie...@gmail.com> wrote: > Thank you Shai for the quick response. Have responded inline. > > > On Fri, Aug 2, 2013 at 5:37 PM, Shai Erera wrote: > > > Hi > > > > You cannot just update headers -- the file formats have changed. > Therefore > > you n

Re: How to retrieve value of NumericDocValuesField in similarity

2013-08-12 Thread Shai Erera
Rob, when DiskDV becomes the default DVFormat, would it not make sense to load the values into the cache if someone uses FieldCache API? Vs. if someone calls DV API directly, he uses whatever is the default Codec, or the one that he plugs. That's what I would expect from a 'cache'. So it's ok that

Re: How to retrieve value of NumericDocValuesField in similarity

2013-08-12 Thread Shai Erera
ok that makes sense. Shai On Mon, Aug 12, 2013 at 9:18 PM, Robert Muir wrote: > On Mon, Aug 12, 2013 at 11:06 AM, Shai Erera wrote: > > > > Or, you'd like to keep FieldCache API for sort of back-compat with > existing > > features, and let the app control the &

Re: Huge FacetArrays while using SortedSetDocValuesAccumulator

2013-08-27 Thread Shai Erera
Hi SortedSetDocValuesAccumulator does receive FacetArrays in its ctor, so you can pass ReusingFacetArrays. You will need to call FacetArrays.free() when you're done with accumulation though. However, do notice that ReusingFacetArrays did not show any big gain even with large taxonomies -- that is

Re: Huge FacetArrays while using SortedSetDocValuesAccumulator

2013-08-28 Thread Shai Erera
Oops you're right, it was committed in LUCENE-4985 which will be released in Lucene 4.5. Shai On Wed, Aug 28, 2013 at 6:16 PM, Krishnamurthy, Kannan < kannan.krishnamur...@contractor.cengage.com> wrote: > Thanks for the response. I double checked that > SortedSetDocValuesAccumulator doesn't tak

Re: Problems with homebrew ParallelWriter

2010-06-23 Thread Shai Erera
How do you add documents to the index? Is it synchronized (such that basically only one thread can add documents at a time)? The same goes for removing documents as well. Also, did you encounter any exceptions during the run - if say an addDoc fails on one of the slices, then you need to revert th

Re: Issue Lucene-2421 and NativeFSLockFactory.clearLock behaviour?

2010-07-07 Thread Shai Erera
Yes, looks like clearLock should be changed to not throw the exception, but rather do a best effort - call delete() but don't respond to its return value. I'll change that on 3x, I'm not sure if a backport to 3.0.x is needed (doesn't seem to justify a 3.0.3 ...) Shai On Wed, Jul 7, 2010 at 8:59 A

Re: Issue Lucene-2421 and NativeFSLockFactory.clearLock behaviour?

2010-07-07 Thread Shai Erera
we should do is try to forcefully unlock it first, and if that succeeds then delete the lock file, ignoring the returned output. Or change the javadocs. I'll check it Shai On Wed, Jul 7, 2010 at 7:28 PM, Shai Erera wrote: > Yes, looks like clearLock should be changed to not throw the ex

Re: Issue Lucene-2421 and NativeFSLockFactory.clearLock behaviour?

2010-07-08 Thread Shai Erera
I committed a fix earlier today. clearLock will fail if the lock cannot be released (meaning someone else holds it), however ignore the result of file.delete(). Shai On Wed, Jul 7, 2010 at 7:41 PM, Shai Erera wrote: > Double-checking the code, this isn't that simple :). Someone

Re: Scoring exact matches higher in a stemmed field

2010-07-16 Thread Shai Erera
Depends for which query no? ;) Sounds like you want to simulate the QP behavior http://lucene.apache.org/java/2_4_0/queryparsersyntax.html for boosting. Meaning, if for the query "b" you want to simulate the query "b OR b$^2" and have matches of b$ count more than b, then I'd follow how QP does it

Re: Scoring exact matches higher in a stemmed field

2010-07-19 Thread Shai Erera
his for me per result. The > easiest path would be subcalssing Similarity, if only the relevant functions > wouldn't have been deprecated... > > Are there any other ways to do so? For example, is this doable with > function queries (since access to the actual term

Re: Scoring exact matches higher in a stemmed field

2010-07-22 Thread Shai Erera
ere are many tricks you can do on your end, w/o overriding much in Lucene. Still, IMO extending QP is the easiest and gives you the control you need. Shai On Mon, Jul 19, 2010 at 9:24 PM, Itamar Syn-Hershko wrote: > On 19/7/2010 5:50 PM, Shai Erera wrote: > >> If your analyzer outp

Re: 140GB index directory, what can I do?

2010-08-14 Thread Shai Erera
You can also call deleteUnusedFiles(), and all unreferenced files will be deleted as well. Make sure to set the index DeletionPolicy to KeepOnlyLastCommit (which is the default), before you do that. That's relevant though if you've built the index using either 3x or 4.0 code. If not, you can achiev

Re: finding the analyzer for a language...

2010-09-25 Thread Shai Erera
> > Shai Erera brought a similar idea up before, to use Locale, but my concerns > are it would be limited by javas Locale mechanism... but we can figure this > out. > It really depends how sophisticated you want such an AnalyzerFactory (that's how I call it in my code) to be.

Re: finding the analyzer for a language...

2010-09-26 Thread Shai Erera
ing the point here, but how do you define an analyzer <-> > language match? What do you do in cases of mixed content, for example? > > Itamar. > > > On 25/9/2010 10:27 PM, Shai Erera wrote: > >> Shai Erera brought a similar idea up before, to use Locale, but my >

Re: How about lucene's delete performance ?

2010-10-13 Thread Shai Erera
There's a deleteAll() method on IndexWriter, which is very fast. After you commit(), all documents won't be visible to searchers anymore. When the last searcher will be closed, the documents will completely disappear from the index. All in all it's quite a good approach to take. You can also consi

Re: How about lucene's delete performance ?

2010-10-13 Thread Shai Erera
Note that deleteAll does not require you to optimize anything. It literally removes all segments from the index in one shot, and when the files are unreferenced, they will be removed entirely. Shai On Wed, Oct 13, 2010 at 4:53 PM, Dan OConnor wrote: > Jeff, > I would suggest not deleting documen

Re: IndexWriter.close() performance issue

2010-11-02 Thread Shai Erera
When you close IndexWriter, it performs several operations that might have a connection to the problem you describe: * Commit all the pending updates -- if your update batch size is more or less the same (i.e., comparable # of docs and total # bytes indexed), then you should not see a performance

Re: IndexWriter.close() performance issue

2010-11-03 Thread Shai Erera
I'd even offer, if the index is small, perhaps you can post it somewhere for us to download and debug trace commit()… Also, though not very scientific, you can turn on debug messages by setting an infoStream and observe which prints take the longest to appear. Not very accurate but if there's one oper

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
In Lucene 3x there is a new addIndexes which accepts Directory… that simply registers the new indexes in the index, without running merges. That makes addIndexes very fast. Also, you can consider calling close(false) to not wait for merges. That can speed things up as well. But note that not run

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
Ok, so a couple of clarifications: addIndexes(Directory...) *does not* trigger any merges. It simply registers the incoming directories in the target index, and returns. You can later call maybeMerge() or optimize() as you see fit. Compound files are irrelevant to addIndexes - it just adds the in

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
That's right. In 3x though you have to call addIndexes followed by maybeMerge if you want to achieve the same effect as addIndexesNoOptimize. Shai On Friday, November 12, 2010, Marc Sturlese wrote: > > Thanks, so clarifying. As far as I've understood, if I have to end up > optimizing the index j
