Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Ravikumar Govindarajan
Shai, This is the code snippet I use inside my class... public class MySorter extends Sorter { @Override public DocMap sort(AtomicReader reader) throws IOException { final Map<Integer, BytesRef> docVsId = loadSortTerm(reader); final Sorter.DocComparator comparator = new

RE: Lucene Upgrade from 2.9.x to 4.7.x

2014-06-17 Thread Uwe Schindler
Hi, Thanks Uwe. I tried this path and I do not find any .cfs files. Lucene 3 and Lucene 4 indexes do not necessarily always contain CFS files, especially not if they are optimized. This depends on the merge policy. The index upgrader uses the default one, which creates no CFS files for the

Search degradation on Windows when upgrading from lucene 3.6 to lucene 4.7.2

2014-06-17 Thread Shlomit Rosen
Hi, We are in the process of upgrading from lucene 3.6.0 to lucene 4.7.2, and our tests show a significant search degradation on Windows platform. Trying to figure this out, here are a couple of points we noticed. Any suggestions/thoughts will be greatly appreciated. Thanks! 1) Running

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Ravikumar Govindarajan
I am afraid the DocMap still maintains doc-id mappings till merge and I am trying to avoid it... I think lucene itself has a MergeIterator in o.a.l.util package. A MergePolicy can wrap a simple MergeIterator for iterating docs across different AtomicReaders in correct sort-order for a given

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
I am afraid the DocMap still maintains doc-id mappings till merge and I am trying to avoid it... What do you mean 'till merge'? The method OneMerge.getMergeReaders() is called only when the merge is executed, not when the MergePolicy decided to merge those segments. Therefore the DocMap is

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Ravikumar Govindarajan
Therefore the DocMap is initialized only when the merge actually executes ... what is there more to postpone? Agreed. However, what I am asking is, if there is an alternative to DocMap, will that be better? Plz read-on And besides, if the segments are already sorted, you should return a

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Sandeep Khanzode
Hi, Thanks again! This time, I have indexed data with the following specs. I run into 40 seconds for the FastTaxonomyFacetCounts to create all the facets. Is this as per your measurements? Subsequent runs fare much better probably because of the Windows file system cache. How can I speed

Facet migration 4.6.1 to 4.7.0

2014-06-17 Thread Nicola Buso
Hi, I'm migrating from lucene 4.6.1 to 4.8.1 and I noticed that some Facet API changes happened in 4.7.0, probably mostly related to this ticket: http://issues.apache.org/jira/browse/LUCENE-5339 Here are a few questions about some customizations/extensions we did that seem not to have a direct

Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-17 Thread Zhao, Gang
I used lucene 4.4 to create an index for some documents. One of the indexed fields is a BinaryDocValuesField. After I changed the dependency to lucene 4.5, the index size for 1 million documents increased from 293MB to 357MB. If I did not use BinaryDocValuesField, the index size increases only about

Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-17 Thread Robert Muir
Again, because merging is based on byte size, you have to be careful how you measure (hint: use LogDocMergePolicy). Otherwise you are comparing apples and oranges. Separately, your configuration is using experimental codecs like disk/memory which aren't as heavily benchmarked etc as the default

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
Hi 40 seconds for faceted search is ... crazy. Also, note how the times don't differ much even though the number of hits is much higher (29K vs 15.1M) ... That, w/ that you say that subsequent queries are much faster (few seconds) suggests that something is seriously messed up w/ your

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
OK I think I now understand what you're asking :). It's unrelated though to SortingMergePolicy. You propose to do the merge part of a merge-sort, since we know the indexes are already sorted, right? This is something we've considered in the past, but it is very tricky (see below) and we went with
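
As a rough illustration of the "merge part of a merge-sort" idea in the message above, here is a minimal Java sketch that uses only the standard library. It is not Lucene code and not how SortingMergePolicy works: the names SortedMergeSketch and Cursor are hypothetical, plain integers stand in for per-document sort keys, and each inner list plays the role of one already-sorted segment. It says nothing about the tricky parts the message alludes to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not a Lucene class.
class SortedMergeSketch {

    // One cursor per input "segment": the list it reads and the next position in it.
    private static final class Cursor {
        final List<Integer> values;
        int pos;

        Cursor(List<Integer> values) {
            this.values = values;
        }

        int current() {
            return values.get(pos);
        }
    }

    // Merges lists that are each already sorted ascending into one sorted list.
    static List<Integer> merge(List<List<Integer>> sortedSegments) {
        // The queue always exposes the cursor whose current value is smallest.
        Comparator<Cursor> byHead = Comparator.comparingInt(Cursor::current);
        PriorityQueue<Cursor> queue = new PriorityQueue<>(byHead);
        for (List<Integer> segment : sortedSegments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment));
            }
        }

        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            merged.add(smallest.current());
            smallest.pos++;
            if (smallest.pos < smallest.values.size()) {
                queue.add(smallest); // re-insert so it is ordered by its new head value
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> segments = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 10),
                Arrays.asList(5));
        System.out.println(merge(segments));
    }
}
```

The point of the sketch is only that no input has to be re-sorted: with k inputs and n values in total, each value passes through the queue once, for roughly O(n log k) comparisons.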

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
That said... if we generate the global DocMap up front, there's no reason to not execute the merge of the segments more efficiently, i.e. without wrapping them in a SlowCompositeReaderWrapper. But that's not work for SortingMergePolicy, it's either a special SortingAtomicReader which wraps a

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Sandeep Khanzode
Hi, Thanks for your response. It does sound pretty bad which is why I am not sure whether there is an issue with the code, the index, the searcher, or just the machine, as you say. I will try with another machine just to make sure and post the results. Meanwhile, can you tell me if there is

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
Nothing suspicious ... code looks fine. The call to FastTaxoFacetCounts actually computes the counts ... that's the expensive part of faceted search. How big is your taxonomy (number categories)? Is it hierarchical (i.e. are your dimensions flat, or deep like A/1/2/3/)? What does your

Lucene QueryParser/Analyzer inconsistency

2014-06-17 Thread Luis Pureza
Hi, I'm experiencing a puzzling behaviour with the QueryParser and was hoping someone around here can help me. I have a very simple Analyzer that tries to replace forward slashes (/) by spaces. Because QueryParser forces me to escape strings with slashes before parsing, I added a MappingCharFilter

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Sandeep Khanzode
If I am counting correctly, the $facets field in the index shows a count of approx. 28k. That does not sound like much, I guess. All my facets are flat and the FacetsConfig only defines a couple of them to be multi-valued. Let me know if I am not counting the taxonomy size correctly. The

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
You can get the size of the taxonomy by calling taxoReader.getSize(). What does the 28K of the $facets field denote - the number of terms (drill-down)? If so, that sounds like your taxonomy is of that size. And indeed, this is a tiny taxonomy ... How many facets do you record per document? This

Re: Lucene QueryParser/Analyzer inconsistency

2014-06-17 Thread Jack Krupansky
Yeah, this is kind of tricky and confusing! Here's what happens: 1. The query parser parses the input string into individual source terms, each delimited by white space. The escape is removed in this process, but... no analyzer has been called at this stage. 2. The query parser (generator)