date:20111006


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121779#comment-13121779
 ] 

Uwe Schindler commented on LUCENE-1536:
---

Hi Chris, hi Male,

I was going to bed after my last post. I had a crisis with two facts in the new 
API that do not play nicely together. I thought about it again the whole night, 
and I also started to recode some details last evening. Not everything worked 
out (but I found lots of problems, so it's a good thing that I started to code, 
especially on several filters that are not as basic as those which only use 
FixedBitSet/OpenBitSet):

# The hidden implementation of Bits is a nice idea, but it has one big problem: 
Java is a strongly-typed language. If a DocIdSet implements Bits, but you want 
to wrap it using FilteredDocIdSet, this interface implementation might 
suddenly go away, because the wrapper class does not implement Bits. If we make 
FilteredDocIdSet implement Bits, that is also wrong, as it might wrap another 
DocIdSet that is not random access. So I tend to keep DocIdSet abstract and let 
it only expose methods that return a Bits interface. It is the same with 
DocIdSetIterator: DocIdSet does not directly implement it, it just returns one. 
So I would strongly recommend adding a method like iterator() that returns an 
impl, and not relying on marker interfaces. I would favor Bits DocIdSet.bits(), 
which would be in line with the iterator() method. Whether an implementing class 
like FixedBitSet implements Bits itself and returns this is an implementation 
detail. If a DocIdSet does not allow random access, it should signal that either 
by throwing an exception from bits() or by returning null; it does not really 
matter to me. In general, a wrapper like FilteredDocIdSet can handle both in one 
class: its bits() would check whether the wrapped bits() returns non-null, and 
if so, wrap another wrapper around it that uses match() to filter. The impl of 
this class is fast and supports both (iterator and bits, if available).
# The other thing I don't like is the setContainsOnlyLiveDocs setter on 
DocIdSet. It allows anybody to change the DocIdSet (which should have an API 
that exposes only read access). Only classes like FixedBitSet that implement 
this read-only interface may change it through their own API (meaning 
the setter might live in the various DocIdSet implementations in oal.util). A 
consumer of the filter should not be able to change the DocIdSet behaviour from 
outside using a public API. I started to rewrite this yesterday and only left 
the getter in DocIdSet, but added the setter to FixedBitSet, OpenBitSet, 
DocIdBitSet, and so on. The setter in the abstract base class also violates the 
unmodifiability of EMPTY_DOCIDSET: that impl should have 
containsOnlyLiveDocs=true, and this must be fixed and unchangeable.
# Also, DocIdSet is a class not solely related to Filters: e.g. Scorer 
extends DocIdSetIterator, DocsEnum extends DocIdSetIterator, and Solr faceting 
uses DocIdSet. DocIdSet is just a holder class for a bunch of documents, 
exposing an iterator (and a Bits API; this is why I want two getter methods and 
no interface magic). The existence of live docs is outside its scope. I 
therefore would like an API similar to the one for scorers, so IndexSearcher can 
ask the Filter for a DocIdSet based on the given liveDocs (like the scorer method 
in Weights). The returned DocIdSet would not know whether it only has live docs 
or not (just as the Scorer itself does not expose this information). 
CachingWrapperFilter is a little bit special: it would always ask the 
wrapped Filter for a DocIdSet without deletions and cache that one, but always 
return a FilteredDocIdSet that brings in the liveDocs passed from IndexSearcher. 
The cache would then always be free of liveDocs and easier to maintain, and 
reopening segments would never need to reload the cache. CachingWrapperFilter 
would decide in its own getDocIdSet method whether it needs to apply liveDocs, 
based on whether IndexSearcher passes a liveDocs BitSet or not. If we have a 
query and only filter some documents, IndexSearcher already knows about liveDocs 
from the main scorer and would pass null to the filter. This would remove lots of 
additional checks on liveDocs. Only the main scorer would know about them; the 
filter would ignore them (so there is no overhead in CachingWrapperFilter, as it 
can return the cached DocIdSet directly to IndexSearcher, without wrapping). 
QueryWrapperFilter could pass the liveDocs through to the wrapped query, too.
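A minimal sketch of the design in the first point, assuming simplified stand-in types (`Bits`, `DocIdIterator`, `SimpleDocIdSet`, `BitSetDocIdSet` and `FilteredSet` are hypothetical names, not the actual Lucene classes or signatures): the set exposes `iterator()` plus a `bits()` getter that may return null, and a FilteredDocIdSet-style wrapper only offers random access when the wrapped set does, so nothing is lost to a marker interface when wrapping.

```java
import java.util.BitSet;

// Simplified stand-ins for the types discussed above (assumptions, not the
// real org.apache.lucene classes or signatures).
interface Bits {
  boolean get(int index);
  int length();
}

abstract class DocIdIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  abstract int nextDoc();
}

abstract class SimpleDocIdSet {
  abstract DocIdIterator iterator();

  /** Random-access view, or null if this set cannot provide one. */
  Bits bits() {
    return null;
  }
}

/** A set backed by a java.util.BitSet: it can offer both access styles. */
class BitSetDocIdSet extends SimpleDocIdSet {
  private final BitSet set;
  private final int maxDoc;

  BitSetDocIdSet(BitSet set, int maxDoc) {
    this.set = set;
    this.maxDoc = maxDoc;
  }

  @Override
  DocIdIterator iterator() {
    return new DocIdIterator() {
      private int doc = -1;

      @Override
      int nextDoc() {
        if (doc == NO_MORE_DOCS) return doc; // already exhausted
        int next = set.nextSetBit(doc + 1);
        doc = (next < 0 || next >= maxDoc) ? NO_MORE_DOCS : next;
        return doc;
      }
    };
  }

  @Override
  Bits bits() {
    return new Bits() {
      public boolean get(int index) { return set.get(index); }
      public int length() { return maxDoc; }
    };
  }
}

/** Wrapper in the spirit of FilteredDocIdSet: no marker interface needed. */
abstract class FilteredSet extends SimpleDocIdSet {
  private final SimpleDocIdSet inner;

  FilteredSet(SimpleDocIdSet inner) {
    this.inner = inner;
  }

  /** Returns true if a document from the wrapped set should be kept. */
  protected abstract boolean match(int doc);

  @Override
  DocIdIterator iterator() {
    final DocIdIterator it = inner.iterator();
    return new DocIdIterator() {
      @Override
      int nextDoc() {
        int doc = it.nextDoc();
        while (doc != NO_MORE_DOCS && !match(doc)) {
          doc = it.nextDoc(); // skip docs rejected by match()
        }
        return doc;
      }
    };
  }

  @Override
  Bits bits() {
    final Bits innerBits = inner.bits();
    if (innerBits == null) {
      return null; // wrapped set is iterator-only, so this one is too
    }
    return new Bits() {
      public boolean get(int index) { return innerBits.get(index) && match(index); }
      public int length() { return innerBits.length(); }
    };
  }
}
```

Returning null from `bits()` is one of the two options mentioned above; throwing an exception would serve equally well, as long as callers know which convention applies.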

I may have time today to implement some parts of this; it should not be too 
difficult.

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement

[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-06 Thread Chris Male (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121786#comment-13121786
 ] 

Chris Male commented on LUCENE-1536:


Okay, that's a lot to take in again.

You've made a good case for dropping setContainsOnlyLiveDocs; I totally agree.  
I really do like the idea of adding the acceptDocs to Filter.getDocIdSet.

I'm also comfortable with adding .bits() to DocIdSet to address the typing 
problem.

Should we bash out a quick patch making these changes and see how it looks?

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
 AND 3 AND 4.  u s means united states (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).
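To make the difference between the baseline and the "high" method concrete, here is a toy sketch, assuming the query's matching docs and the filter are plain sorted `int[]` arrays or a `java.util.BitSet` (hypothetical stand-ins for Scorers and DocIdSets; `FilterApplication` is a made-up name). It only shows the two access patterns and measures nothing; real iterators would use `advance()` to skip ahead rather than stepping one doc at a time.

```java
import java.util.BitSet;

// Toy model (assumption): query matches and filter docs are sorted int arrays,
// or the filter is a BitSet; these are not Lucene Scorer/DocIdSet objects.
final class FilterApplication {
  private FilterApplication() {}

  /** Baseline style: walk both sorted doc lists in step (an iterator merge). */
  static int countIteratorIntersection(int[] queryDocs, int[] filterDocs) {
    int i = 0;
    int j = 0;
    int hits = 0;
    while (i < queryDocs.length && j < filterDocs.length) {
      if (queryDocs[i] == filterDocs[j]) {
        hits++; // doc matches the query and passes the filter
        i++;
        j++;
      } else if (queryDocs[i] < filterDocs[j]) {
        i++; // query side is behind: move it forward
      } else {
        j++; // filter side is behind: move it forward
      }
    }
    return hits;
  }

  /** "High" style: for each query match, ask the filter bits directly. */
  static int countRandomAccess(int[] queryDocs, BitSet filterBits) {
    int hits = 0;
    for (int doc : queryDocs) {
      if (filterBits.get(doc)) {
        hits++;
      }
    }
    return hits;
  }
}
```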

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-06 Thread Martijn van Groningen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121790#comment-13121790
 ] 

Uwe Schindler commented on LUCENE-1536:
---

+1, I have to revert a lot here again, because I was trying to move 
setLiveDocsOnly/liveDocsOnly down to FixedBitSet & Co, but this is too 
complicated.

Should I start to hack something together? The biggest change will be in all 
filter impls, to add the parameter to getDocIdSet().
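Roughly what a filter impl might look like with the extra parameter, as a hypothetical sketch: `SegmentFilter`, `AcceptBits` and `IntRangeFilter` are illustrative names, and a per-document `int[]` stands in for the segment reader. A null `acceptDocs` means the caller does not need the filter to apply deletions.

```java
import java.util.BitSet;

// Hypothetical minimal types (not the Lucene API).
interface AcceptBits {
  boolean get(int doc);
}

abstract class SegmentFilter {
  /**
   * Returns the matching docs of one segment. acceptDocs may be null, which
   * means every document is acceptable as far as this call is concerned.
   */
  abstract BitSet getDocIdSet(int[] fieldValues, AcceptBits acceptDocs);
}

/** Example impl: docs whose per-document int value lies in [lo, hi]. */
class IntRangeFilter extends SegmentFilter {
  private final int lo;
  private final int hi;

  IntRangeFilter(int lo, int hi) {
    this.lo = lo;
    this.hi = hi;
  }

  @Override
  BitSet getDocIdSet(int[] fieldValues, AcceptBits acceptDocs) {
    BitSet result = new BitSet(fieldValues.length);
    for (int doc = 0; doc < fieldValues.length; doc++) {
      boolean accepted = acceptDocs == null || acceptDocs.get(doc);
      int value = fieldValues[doc];
      if (accepted && value >= lo && value <= hi) {
        result.set(doc);
      }
    }
    return result;
  }
}
```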


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-06 Thread Chris Male (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121807#comment-13121807
 ] 

Chris Male commented on LUCENE-1536:


Yes please put something together and then we'll review / iterate.


[jira] [Assigned] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Simon Willnauer (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3433:
---

Assignee: Simon Willnauer

 Random access non RAM resident IndexDocValues (CSF)
 ---

 Key: LUCENE-3433
 URL: https://issues.apache.org/jira/browse/LUCENE-3433
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Yonik Seeley
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, 
 sorted_source.patch


 There should be a way to get specific IndexDocValues by going through the 
 Directory rather than loading all of the values into memory.
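One way to picture random access without loading everything is a positional read per lookup. A hypothetical sketch using plain `java.nio` (not Lucene's `Directory`/`IndexInput` API, and not its on-disk format): values are fixed-width 8-byte longs, so document N lives at byte offset N * 8. `DiskLongValues` is a made-up name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout (assumption): one big-endian long per document.
// Not thread-safe: the read buffer is shared between calls.
final class DiskLongValues implements AutoCloseable {
  private final FileChannel channel;
  private final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);

  DiskLongValues(Path file) throws IOException {
    channel = FileChannel.open(file, StandardOpenOption.READ);
  }

  /** Writes all values sequentially, for the demo only. */
  static void write(Path file, long[] values) throws IOException {
    try (FileChannel out = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer b = ByteBuffer.allocate(values.length * Long.BYTES);
      for (long v : values) {
        b.putLong(v);
      }
      b.flip();
      while (b.hasRemaining()) {
        out.write(b);
      }
    }
  }

  /** One positional read per lookup; no array of all values is kept. */
  long get(int doc) throws IOException {
    buf.clear();
    long pos = (long) doc * Long.BYTES;
    while (buf.hasRemaining()) {
      if (channel.read(buf, pos + buf.position()) < 0) {
        throw new IOException("doc " + doc + " is past the end of the file");
      }
    }
    return buf.getLong(0);
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```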


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Michael McCandless (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121830#comment-13121830
]

Michael McCandless commented on LUCENE-3433:

bq. I agree with simon here, can't we spin off a different issue for these?

+1: I agree removal of SortedSource is unrelated to this issue. We
should discuss it under a new issue (it's obviously contentious), and
for this issue do the nice cleanups we all agree on. It shouldn't be
removed under this one.

One shouldn't have to pay such a high price (uninversion on searcher
startup) to sort or group by a string field, which we do today. It's
silly to re-invert on every searcher startup when we can sort once
during indexing and record that in the doc values, and SortedSource
gives us that.
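The "sort once during indexing" idea can be sketched as follows, under the assumption that every document has exactly one string value; `SortedTermsSketch` is a hypothetical name, not the SortedSource API. Sorting documents later compares per-document ords (ints) instead of strings, and nothing has to be uninverted at searcher startup.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: sorted unique terms plus one ord per document.
// Assumes every document has exactly one non-null value.
final class SortedTermsSketch {
  final String[] sortedTerms; // unique values in sorted order
  final int[] docToOrd;       // per-document index into sortedTerms

  SortedTermsSketch(String[] valuePerDoc) {
    // "Index time": sort the unique values once and record each doc's ord.
    sortedTerms = new TreeSet<>(Arrays.asList(valuePerDoc)).toArray(new String[0]);
    docToOrd = new int[valuePerDoc.length];
    for (int doc = 0; doc < valuePerDoc.length; doc++) {
      docToOrd[doc] = Arrays.binarySearch(sortedTerms, valuePerDoc[doc]);
    }
  }

  /** "Search time": order doc ids by term using only int comparisons. */
  Integer[] sortDocs(Integer[] docs) {
    Integer[] copy = docs.clone();
    Arrays.sort(copy, (a, b) -> Integer.compare(docToOrd[a], docToOrd[b]));
    return copy;
  }

  String term(int doc) {
    return sortedTerms[docToOrd[doc]];
  }
}
```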

Besides the merge RAM usage (which I think is minor), is there a
technical/code complexity reason that SortedSource should be removed?
Does it somehow require the enums or something? I'm trying to
understand how/why it suddenly got coupled into this issue...

I think sorting and grouping by string fields are first class
functions for Lucene.


[jira] [Created] (LUCENE-3492) Extract a generic framework for running randomized tests.

2011-10-06 Thread Dawid Weiss (Created) (JIRA)

Extract a generic framework for running randomized tests.
-

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0


I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene and 
Solr folks) has their own glue to make it possible. The question is whether there's 
something to pull out that others could share without needing to import 
Lucene-specific classes.



[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121831#comment-13121831
 ] 

Michael McCandless commented on LUCENE-3433:


bq. I also attached a patch that adds back the sorted source so we can spin off 
a new issue and make them efficient without writing it from the scratch.

Simon, can you invert this patch, and open a new issue for removing
SortedSource?


[jira] [Updated] (LUCENE-3492) Extract a generic framework for running randomized tests.

2011-10-06 Thread Dawid Weiss (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3492:


Attachment: Screen Shot 2011-10-06 at 12.58.02 PM.png

Static fixtures couldn't be handled with a rule, so I've decided to rewrite 
the JUnit Runner instead of subclassing it. Lots of frustration so far, but I like 
the result :)


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121840#comment-13121840
 ] 

Martijn van Groningen commented on LUCENE-3433:
---

bq. I think sorting and grouping by string fields are first class functions for 
Lucene.
And faceting too!

Maybe we should have a DocTermIndex that is independent of its source, with 
impls for DV and impls for indexed values.
Maybe the name DocTermIndex doesn't make sense then, because it suggests that 
the values come from the inverted index, which might not be the case.
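A hypothetical shape for such a source-independent abstraction (`TermOrdSource`, `ArrayTermOrdSource` and `FacetCounter` are made-up names for illustration): consumers like faceting only see per-document ords and an ord-to-term lookup, so an impl backed by doc values and one backed by indexed terms would be interchangeable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction (names and methods are assumptions).
interface TermOrdSource {
  int ord(int doc); // -1 if the doc has no value
  int numOrds();
  String lookup(int ord);
}

/** One possible impl backed by in-memory arrays (e.g. loaded from DV). */
final class ArrayTermOrdSource implements TermOrdSource {
  private final int[] docToOrd;
  private final String[] terms;

  ArrayTermOrdSource(int[] docToOrd, String[] terms) {
    this.docToOrd = docToOrd;
    this.terms = terms;
  }

  public int ord(int doc) { return docToOrd[doc]; }
  public int numOrds() { return terms.length; }
  public String lookup(int ord) { return terms[ord]; }
}

/** A consumer that only depends on the interface, not on the source. */
final class FacetCounter {
  private FacetCounter() {}

  static Map<String, Integer> count(TermOrdSource source, int[] matchingDocs) {
    int[] counts = new int[source.numOrds()];
    for (int doc : matchingDocs) {
      int ord = source.ord(doc);
      if (ord >= 0) {
        counts[ord]++; // count per ord, resolve to terms only at the end
      }
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] > 0) {
        result.put(source.lookup(ord), counts[ord]);
      }
    }
    return result;
  }
}
```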


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

[
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121842#comment-13121842
]

Dawid Weiss commented on LUCENE-3492:
-

I've implemented a runner that follows the basic algorithm given in
LUCENE-3489. Basically speaking, seeds for each test run are fixed derivations
of a single master seed (used for the runner and all class-level fixtures) and
don't rely on the order of invocations or other factors.
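A sketch of how per-test seeds can be fixed derivations of one master seed, independent of execution order. The mixing function here (MurmurHash3's 64-bit finalizer applied to the master seed and the method name's hash) is an assumption for illustration, not the algorithm the runner actually uses; `SeedDerivation` is a hypothetical name.

```java
// Hypothetical derivation scheme (an assumption, not the runner's code):
// a seed depends only on the master seed and a stable name, never on the
// order in which tests happen to run.
final class SeedDerivation {
  private SeedDerivation() {}

  /** MurmurHash3's 64-bit finalizer (fmix64): a bijective bit mixer. */
  static long mix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Seed for one test method, derived from the master seed and its name. */
  static long methodSeed(long masterSeed, String methodName) {
    return mix64(masterSeed ^ mix64(methodName.hashCode()));
  }

  /** Seed for one iteration of a repeated test, reproducible from the method seed. */
  static long iterationSeed(long methodSeed, int iteration) {
    return mix64(methodSeed + iteration);
  }

  /** Printable form, e.g. for a test description the user can copy. */
  static String format(long seed) {
    return String.format("%016X", seed);
  }
}
```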

There are plenty of ways to tweak and tune by overriding the class-level @Seed
or the method-level @Seed. @Repeat gives you control over how many times a given
test is executed and whether a seed is reused (constant for each iteration) or
randomized (predictably from the start seed).

Most of all, everything fits quite nicely in Eclipse (and I hope in other IDEs...
I didn't check IDEA or NetBeans though), because each executed test run is nicely
described in the runner (full seed), so you can either click on it and
re-run a single test, or write down the seed and fix it at runtime.

Lots of TODOs in the code, will continue in the evening.


[jira] [Commented] (LUCENE-3262) Facet benchmarking

[
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121852#comment-13121852
]

Shai Erera commented on LUCENE-3262:

bq. ItemSource.resetInputs

I don't have that warning turned on in Eclipse. I disabled it for exactly this
reason :).

bq. ItemSource rename

The new name is ok, and the properties fit it better. BTW, if you want the
existing .algs out there not to fail silently, you could add some code to
setConfig that checks for these outdated properties and throws a proper
exception. But I'm ok with the solution you chose.
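The suggested setConfig check could look roughly like this. The property names in the usage are placeholders, since the renamed keys are not listed here, and `ConfigCheck.rejectOutdated` is a hypothetical helper, not part of the benchmark module.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: fail loudly when a config still uses a renamed
// property, instead of silently ignoring it.
final class ConfigCheck {
  private ConfigCheck() {}

  /** renamed maps each outdated property name to its replacement. */
  static void rejectOutdated(Properties config, Map<String, String> renamed) {
    for (Map.Entry<String, String> e : renamed.entrySet()) {
      if (config.getProperty(e.getKey()) != null) {
        throw new IllegalArgumentException("property '" + e.getKey()
            + "' is no longer supported; use '" + e.getValue() + "' instead");
      }
    }
  }
}
```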

bq. PFD.readers.incRef()

The javadocs are good. I'd also add "<b>NOTE:</b> if you no longer need that
IndexReader/TaxoReader, you should decRef()/close() it after calling this method."
Otherwise, the IR/TR will just stay open ...
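The contract that NOTE describes, sketched with hypothetical classes (`RefCountedResource` and `ResourceProvider` are not IndexReader or TaxonomyReader): a method that calls incRef() on the caller's behalf hands over one reference, and the resource is only released once every holder has called decRef().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted resource; starts with one reference (the creator's).
class RefCountedResource {
  private final AtomicInteger refCount = new AtomicInteger(1);
  private volatile boolean closed = false;

  void incRef() {
    while (true) {
      int count = refCount.get();
      if (count <= 0) {
        throw new IllegalStateException("already closed");
      }
      if (refCount.compareAndSet(count, count + 1)) {
        return;
      }
    }
  }

  void decRef() {
    int count = refCount.decrementAndGet();
    if (count == 0) {
      closed = true; // real code would release files, caches, etc. here
    } else if (count < 0) {
      throw new IllegalStateException("too many decRef() calls");
    }
  }

  boolean isClosed() { return closed; }
  int getRefCount() { return refCount.get(); }
}

// Hypothetical provider whose method hands a reference to the caller.
final class ResourceProvider {
  private final RefCountedResource current = new RefCountedResource();

  /** NOTE: the caller must call decRef() on the result when done with it. */
  RefCountedResource acquire() {
    current.incRef();
    return current;
  }

  void close() {
    current.decRef(); // drops only the provider's own reference
  }
}
```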

Facet benchmarking
--

Key: LUCENE-3262
URL: https://issues.apache.org/jira/browse/LUCENE-3262
Project: Lucene - Java
Issue Type: New Feature
Components: modules/benchmark, modules/facet
Reporter: Shai Erera
Assignee: Doron Cohen
Attachments: CorpusGenerator.java, LUCENE-3262.patch,
TestPerformanceHack.java

A spin-off from LUCENE-3079. We should define a few benchmarks for faceting
scenarios, so we can evaluate the new faceting module as well as any
improvement we'd like to consider in the future (such as cutting over to
docvalues, implementing FST-based caches etc.).
Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here
as a starting point.
We've also done some preliminary work on extending Benchmark for faceting, so
I'll attach it here as well.
We should perhaps create a Wiki page where we clearly describe the benchmark
scenarios, then include results of 'default settings' and 'optimized
settings', or something like that.


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121859#comment-13121859
 ] 

Robert Muir commented on LUCENE-3433:
-

{quote}
I think sorting and grouping by string fields are first class
functions for Lucene.
{quote}

I disagree: if you aren't sorting by score, then go use a database.


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Simon Willnauer (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121860#comment-13121860
 ] 

Simon Willnauer commented on LUCENE-3433:
-

bq. Simon, can you invert this patch, and open a new issue for removing 
SortedSource?

Actually, my plan was to have one interface for now and then open an issue to add 
back the SortedSource with an impl that we all agree on. Currently, the sorted 
variants are somewhat flaky and heavy; I think we should simply remove them here 
and then work out a plan for how to implement this. The technical reason 
here is simply to rethink the interface: we now have one which is simple, so let's 
see what we can do to make this work with sorted variants. 

simon


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121862#comment-13121862
 ] 

Robert Muir commented on LUCENE-3433:
-

{quote}
+1: I agree removal of SortedSource is unrelated to this issue. We
should discuss it under a new issue (it's obviously contentious), and
for this issue do the nice cleanups we all agree on. It shouldn't be
removed under this one.
{quote}

That's not what I meant by spinning off a different issue; I think we should
spin off a different issue to add back SortedSource.

DocValues really needs to be simplified. Simon has done just that, and I think
it's great that, as part of that, it focuses on what it should be: 
per-document values, not some precomputed FieldCache.




[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it


 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1536:
--

Attachment: LUCENE-1536-rewrite.patch

A first rewrite of Lucene core to pass acceptDocs down to Filter.getDocIdSet:
- optimized and simplified CachingWrapper* (no DeletesMode anymore)
- FieldCacheTermsFilter has an optimized DocIdSet
- Added bits() to all DocIdSet implementations
- IndexSearcher.searchWithFilter was rewritten to pass liveDocs down.
- AndBits is no longer needed

The tests are not yet rewritten (there are still 55 compile errors). This patch
is just for review.
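The sketch below shows the bits() convention this patch adds to DocIdSet, as we understand it from the discussion. Every set can return an iterator, and bits() returns random-access Bits, or null when random access is not supported. Bits, DocIterator, BitSetDocIdSet and FilteredDocIdSet are simplified stand-ins written for illustration. Lucene's real Bits also has length(), and its iterator API differs. The point is how a wrapper can keep random access only when the wrapped set offers it.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Simplified stand-in for Lucene's Bits (the real interface also has length()). */
interface Bits {
  boolean get(int index);
}

/** Simplified stand-in for a doc ID iterator. */
interface DocIterator {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Returns the next matching doc ID, or NO_MORE_DOCS when exhausted. */
  int nextDoc();
}

/** Every set can be iterated; random access is optional and signaled by a non-null bits(). */
abstract class DocIdSet {
  abstract DocIterator iterator();

  /** Returns random-access bits, or null if this set does not support random access. */
  Bits bits() {
    return null;
  }
}

/** Backed by a java.util.BitSet, so it supports both access patterns. */
final class BitSetDocIdSet extends DocIdSet {
  private final BitSet set;
  private final int maxDoc;

  BitSetDocIdSet(BitSet set, int maxDoc) {
    this.set = set;
    this.maxDoc = maxDoc;
  }

  @Override
  DocIterator iterator() {
    return new DocIterator() {
      private int doc = -1;

      @Override
      public int nextDoc() {
        if (doc == NO_MORE_DOCS) {
          return doc;
        }
        int next = set.nextSetBit(doc + 1);
        doc = (next < 0 || next >= maxDoc) ? NO_MORE_DOCS : next;
        return doc;
      }
    };
  }

  @Override
  Bits bits() {
    return index -> index < maxDoc && set.get(index); // the set itself is random access
  }
}

/** Filters another set with match(); offers random access only if the wrapped set does. */
final class FilteredDocIdSet extends DocIdSet {
  private final DocIdSet inner;
  private final IntPredicate match;

  FilteredDocIdSet(DocIdSet inner, IntPredicate match) {
    this.inner = inner;
    this.match = match;
  }

  @Override
  DocIterator iterator() {
    DocIterator it = inner.iterator();
    return () -> {
      int doc;
      while ((doc = it.nextDoc()) != DocIterator.NO_MORE_DOCS) {
        if (match.test(doc)) {
          return doc;
        }
      }
      return DocIterator.NO_MORE_DOCS;
    };
  }

  @Override
  Bits bits() {
    Bits innerBits = inner.bits();
    if (innerBits == null) {
      return null; // wrapped set is iterator-only, so this wrapper is too
    }
    return index -> innerBits.get(index) && match.test(index);
  }
}
```

Random access is exposed through a method rather than a marker interface. As a result, a wrapper like FilteredDocIdSet can neither hide it by accident nor claim it falsely.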

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
 AND 3 AND 4.  u s means united states (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).
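Here is a toy sketch (not Lucene code) that builds intuition for the two approaches compared above. It applies a filter to a sorted list of query hits in one of two ways. The first steps through both doc ID lists in lockstep, a simplified stand-in for iterator-style leapfrogging with advance(). The second tests one bit per hit, the way deleted docs are checked. The sketch only illustrates the access patterns; the measured differences are the ones from this issue's benchmarks.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy contrast of iterator-style vs random-access filter application (not Lucene classes). */
final class FilterApplication {

  /** Iterator style: walk sorted query hits and sorted filter docs in lockstep. */
  static List<Integer> applyAsIterator(int[] queryDocs, int[] filterDocs) {
    List<Integer> hits = new ArrayList<>();
    int q = 0;
    int f = 0;
    while (q < queryDocs.length && f < filterDocs.length) {
      if (queryDocs[q] == filterDocs[f]) {
        hits.add(queryDocs[q]);
        q++;
        f++;
      } else if (queryDocs[q] < filterDocs[f]) {
        q++; // query side catches up to the filter
      } else {
        f++; // filter side catches up to the query
      }
    }
    return hits;
  }

  /** Random-access style: one bit lookup per query hit, like checking deleted docs. */
  static List<Integer> applyRandomAccess(int[] queryDocs, BitSet filterBits) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : queryDocs) {
      if (filterBits.get(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

In this toy model the iterator version also has to walk the filter's doc IDs, so its work grows with filter density. The random-access version does one lookup per query hit.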

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121877#comment-13121877
 ] 

Robert Muir commented on LUCENE-1536:
-

{noformat}
I therefore would like a similar API like for scorers, so IndexSearcher can ask 
the Filter for a DocIdSet based on the given liveDocs (like the scorer method 
in Weights).
{noformat}

If this is the case, then in the !randomAccess path of IndexSearcher.java,
please pass null as liveDocs.
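The sketch below is one reading of this suggestion, written with hypothetical stand-in types. The filter receives the live docs up front, much like the scorer method in Weights, so its result already excludes deleted documents. In the random-access path, those filter bits are pushed down as the scorer's acceptDocs. In the !randomAccess (iterator) path, the scorer receives null, because the filter has already applied liveDocs. AcceptDocsFilter, score() and search() are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical Filter API that, like a scorer, is handed the acceptable docs up front. */
interface AcceptDocsFilter {
  /** Returns the matching docs, excluding any doc not set in acceptDocs (when non-null). */
  BitSet getDocIdSet(int maxDoc, BitSet acceptDocs);
}

final class AcceptDocsDemo {
  /** Example filter: even doc IDs, honoring acceptDocs. */
  static final AcceptDocsFilter EVEN = (maxDoc, acceptDocs) -> {
    BitSet result = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc += 2) {
      if (acceptDocs == null || acceptDocs.get(doc)) {
        result.set(doc);
      }
    }
    return result;
  };

  /** Stand-in for a scorer: returns query matches that pass acceptDocs (null = accept all). */
  static List<Integer> score(int[] queryDocs, BitSet acceptDocs) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : queryDocs) {
      if (acceptDocs == null || acceptDocs.get(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  /** Per-segment search sketch; queryDocs must be sorted ascending. */
  static List<Integer> search(int[] queryDocs, int maxDoc, BitSet liveDocs,
                              AcceptDocsFilter filter, boolean randomAccess) {
    // The filter receives liveDocs, so its result already excludes deleted docs.
    BitSet filterDocs = filter.getDocIdSet(maxDoc, liveDocs);
    if (randomAccess) {
      // Push the filter bits (which include liveDocs) down as the scorer's acceptDocs.
      return score(queryDocs, filterDocs);
    }
    // Iterator path: liveDocs were already applied by the filter, so the scorer gets null.
    List<Integer> hits = new ArrayList<>();
    int f = filterDocs.nextSetBit(0);
    for (int doc : score(queryDocs, null)) {
      while (f >= 0 && f < doc) {
        f = filterDocs.nextSetBit(f + 1);
      }
      if (f < 0) {
        break; // filter exhausted
      }
      if (f == doc) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

Both paths return the same hits. Deletions are checked exactly once, inside the filter.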


[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-06 Thread Robert Muir (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1536:


Attachment: LUCENE-1536.patch

Adding back this optimization, again.

Before committing, please give me time to write tests to ensure we aren't
losing these optimizations.

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121885#comment-13121885
 ] 

Uwe Schindler commented on LUCENE-1536:
---

Robert, thanks!

I missed this line:
{code}
Bits acceptDocs = filterContainsLiveDocs ? null : context.reader.getLiveDocs();
{code}

Since we now always apply the live docs in the filter, filterContainsLiveDocs is always true, so this would always be null!


[jira] [Resolved] (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-10-06 Thread Yonik Seeley (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yonik Seeley resolved SOLR-2403.

Resolution: Fixed
Fix Version/s: 3.2

Problem with facet.sort=lex, shards, and facet.mincount
---

Key: SOLR-2403
URL: https://issues.apache.org/jira/browse/SOLR-2403
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 4.0
Environment: RHEL5, Ubuntu 10.04
Reporter: Peter Cline
Fix For: 3.2

I tested this on a recent trunk snapshot (2/25), but haven't verified it with 3.1 or
1.4.1. I can do so if necessary and update the issue.
Solr is not returning the proper number of facet values when sorting
alphabetically, using distributed search, and using a facet.mincount that
excludes some of the values in the first facet.limit values.
This is easiest explained by example. Sorting alphabetically, the first 20 values
for my subject_facet field have few documents. 19 facet values have only 1
document associated, and 1 has 2 documents. There are plenty after that have
more than 2.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
{code}
comes back with the expected 20 facet values with >= 2 documents associated.
If I add a shards parameter that points back to itself, the result is
different.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
{code}
comes back with only 1 facet value: the single value in the first 20 that had
more than 1 document.
It appears to me that mincount is ignored when doing the original query to
the shards, then applied afterwards.
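The reporter's hypothesis can be reproduced with a toy model. Suppose a shard returns the first facet.limit terms in lexical order without applying facet.mincount, and mincount is only applied when merging. Then low-count terms use up the limit, and fewer than facet.limit values survive. This is not Solr's distributed faceting code; LexFacetSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of lexically sorted faceting with facet.limit and facet.mincount (not Solr's code). */
final class LexFacetSketch {
  /** Walk terms in lexical order, keep those with count >= mincount, stop after limit terms. */
  static List<String> topLex(TreeMap<String, Integer> counts, int limit, int mincount) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (out.size() >= limit) {
        break;
      }
      if (e.getValue() >= mincount) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  /** Hypothesized distributed path: the shard ignores mincount and the merge step filters afterwards. */
  static List<String> distributedBuggy(TreeMap<String, Integer> shardCounts, int limit, int mincount) {
    List<String> shardTerms = topLex(shardCounts, limit, 0); // first `limit` terms, mincount not applied
    List<String> out = new ArrayList<>();
    for (String term : shardTerms) {
      if (shardCounts.get(term) >= mincount) { // applied too late: can drop below limit
        out.add(term);
      }
    }
    return out;
  }
}
```

Take 19 count-1 terms and one count-2 term at the front of the lexical order, followed by higher-count terms. With that input, topLex(counts, 20, 2) returns 20 terms, while distributedBuggy(counts, 20, 2) returns only one. That matches the symptom described above.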
Let me know if you need any more info.
Thanks,
Peter


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Simon Willnauer (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121911#comment-13121911
]

Simon Willnauer commented on LUCENE-3433:
-

bq. +1 - something contentious should not be removed in an unrelated issue like
this. If it's already in, but some want it out, let's make a new issue to
discuss. Once something is in, there should be a clear and dedicated issue
discussing its removal if there is dispute. I don't agree with simply pulling
it and putting the onus on those who want it to make an issue to get it back in.

There is no dispute here. If you had looked at the API and the code, you'd
know what I'm talking about. We cut over to a new API where SortedSource
doesn't fit in nicely. We used ValuesEnum for merging, as the lowest common
denominator (LCD) between SortedSource and Source. Now we only have Source as
a random-access API. To keep this issue at a reasonable size, we should add the
missing part in a different issue. We should also rethink how to merge sorted
sources, since they are quite memory-heavy (I think this is a potential issue).
Adding back SortedSource is going to be tough without a new issue, since lots of
stuff has changed. To make the dev process easier, backing out and re-adding
seems best to me. Don't worry, we're going to add it back.
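The point here is that merging can work with a random-access Source alone, with no enum needed as a lowest common denominator. The sketch below illustrates that under simplifying assumptions. Source is a hypothetical interface with a single getInt(docID) method, values are longs, and each segment's deletions are a java.util.BitSet of live docs (null means no deletions). It does not cover SortedSource, whose merging is the open question above.

```java
import java.util.BitSet;

/** Hypothetical random-access view of one segment's per-document values. */
interface Source {
  long getInt(int docID);
}

final class MergeSketch {
  /**
   * Merges several segments using only random access: live docs are copied in
   * segment order, deleted docs are dropped, and each copied doc gets the next
   * doc ID in the merged segment. liveDocs[i] == null means "no deletions";
   * live bits are assumed to be set only below maxDocs[i].
   */
  static long[] merge(Source[] sources, int[] maxDocs, BitSet[] liveDocs) {
    int total = 0;
    for (int i = 0; i < sources.length; i++) {
      total += (liveDocs[i] == null) ? maxDocs[i] : liveDocs[i].cardinality();
    }
    long[] merged = new long[total];
    int upto = 0; // next doc ID in the merged segment
    for (int i = 0; i < sources.length; i++) {
      for (int doc = 0; doc < maxDocs[i]; doc++) {
        if (liveDocs[i] == null || liveDocs[i].get(doc)) {
          merged[upto++] = sources[i].getInt(doc);
        }
      }
    }
    return merged;
  }
}
```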


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121918#comment-13121918
 ] 

Mark Miller commented on LUCENE-3433:
-

bq. there is no dispute here

If there is no dispute, what exactly is Mike talking about?


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121935#comment-13121935
 ] 

Robert Muir commented on LUCENE-3433:
-

Here is the way I see the problem:
* Currently DocValues is a bit confusing... I think a lot of this is due to the 
current API.
* No offense to Simon, but I think in a way this forces him to feel responsible 
for doing all the work on it. The complexity makes it hard for others to get 
involved.
* With this patch, the API becomes a lot simpler: I'm sure it's not perfect, but 
the API seems to correspond to what DV does; at least it makes sense to me.

Can we temporarily drop SortedSource and open a new issue to add it back (maybe 
even mark it a blocker for 4.0)? This way, we can rethink how to implement this 
functionality (maybe it doesn't even belong in DocValues but in something on top 
of it, or something else entirely).



[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

[
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121954#comment-13121954
]

Shai Erera commented on LUCENE-3492:

This is only for debugging from an IDE, right? It does not replace tests.iter
and tests.seed?

It looks very cool.

It also adds a risk that someone will accidentally commit tests with these
annotations. So perhaps we should add pre-commit hooks, or a test that scans
all test files and ensures those annotations do not exist?
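The scanning-test idea could look roughly like the sketch below. It assumes the debugging annotations are named @Seed and @Repeat; adjust the pattern to whatever the framework actually uses. It searches .java files under a root directory with a regular expression, so it would also flag mentions inside comments.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch of a check that fails if debugging-only annotations were committed. */
final class ForbiddenAnnotationScanner {
  // Assumption: the annotations are named @Seed and @Repeat; adjust to the real names.
  static final Pattern FORBIDDEN = Pattern.compile("@(Seed|Repeat)\\b");

  /** Returns the .java files under root that mention a forbidden annotation, sorted. */
  static List<Path> findViolations(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java"))
          .filter(ForbiddenAnnotationScanner::containsForbidden)
          .sorted()
          .collect(Collectors.toList());
    }
  }

  private static boolean containsForbidden(Path file) {
    try {
      String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      return FORBIDDEN.matcher(text).find(); // naive: also matches inside comments
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test would call findViolations on the test source root and assert that the result is empty.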

Extract a generic framework for running randomized tests.
-

Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


[jira] [Resolved] (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets

2011-10-06 Thread Grant Ingersoll (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Grant Ingersoll resolved SOLR-2218.
---

Resolution: Duplicate

Dup of SOLR-1726

Performance of start= and rows= parameters are exponentially slow with large
data sets
--

Key: SOLR-2218
URL: https://issues.apache.org/jira/browse/SOLR-2218
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 1.4.1
Reporter: Bill Bell

With large data sets (e.g. 10M rows), setting start= to a large number and
rows= to a large number is slow, and it gets slower the farther you get from
start=0 with a complex query. Random also makes this slower.
I would like to make this faster for looping through large data sets. It would
be nice if we could pass a pointer to the result set to loop over, or support a
very large rows= value.
Something like:
rows=1000
start=0
spointer=string_my_query_1
Then, within some interval (like 5 minutes), I could reference this loop with
something like:
rows=1000
start=1000
spointer=string_my_query_1
What do you think? Since the data set is so large, the cache is not helping.
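The sketch below shows why offset paging gets slower as start grows, and what the proposed pointer amounts to. It models each matching document only by a unique int sort key; it is not Solr's collector. To return rows results at offset start, the collector must keep the best start + rows entries in a priority queue while scanning every match. A cursor that remembers the last key returned only needs a queue of size rows. PagingSketch, offsetPage and pageAfter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of offset paging vs cursor paging over ascending, unique sort keys. */
final class PagingSketch {
  /** Offset paging: keep the smallest (start + rows) keys for every match, then drop the first start. */
  static List<Integer> offsetPage(int[] keys, int start, int rows) {
    int k = start + rows;
    PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
    for (int key : keys) {
      queue.add(key);
      if (queue.size() > k) {
        queue.poll(); // evict the largest kept key; queue size grows with start
      }
    }
    List<Integer> top = new ArrayList<>(queue);
    Collections.sort(top);
    return top.subList(Math.min(start, top.size()), top.size());
  }

  /** Cursor paging: keep only `rows` keys strictly greater than the last key already returned. */
  static List<Integer> pageAfter(int[] keys, Integer lastKey, int rows) {
    PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
    for (int key : keys) {
      if (lastKey != null && key <= lastKey) {
        continue; // returned on an earlier page
      }
      queue.add(key);
      if (queue.size() > rows) {
        queue.poll();
      }
    }
    List<Integer> page = new ArrayList<>(queue);
    Collections.sort(page);
    return page;
  }
}
```

A real cursor would also need a unique tie-breaker, such as a document ID, so that equal sort values are not skipped or repeated between pages.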


[jira] [Commented] (SOLR-2372) Upgrade Solr to Tika 0.10

2011-10-06 Thread Jan Høydahl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121962#comment-13121962
 ] 

Jan Høydahl commented on SOLR-2372:
---

Also fixed the dot.classpath for Eclipse so that the new Tika jars are found.

 Upgrade Solr to Tika 0.10
 -

 Key: SOLR-2372
 URL: https://issues.apache.org/jira/browse/SOLR-2372
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Grant Ingersoll
Assignee: Jan Høydahl
 Fix For: 3.5, 4.0


 as the title says


[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it


 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1536:
--

Attachment: LUCENE-1536-rewrite.patch

Here is a patch with almost all core tests rewritten (I left out the 
CachingWrapper tests, as I nuked DeletesMode). It's just for demonstration.

Some tests have really stupid filters that only work with optimized indexes. I 
added asserts to those filters (except one) that acceptDocs==null. The 
remaining one uses QueryUtils, and I have no idea what's going on there that 
makes acceptDocs!=null.

Looking at the code in IndexSearcher, I would propose removing all special Filter 
handling from IndexSearcher and moving that code over to FilteredQuery (with 
all our optimizations). If you call IS.search(query, filter, ...), IndexSearcher 
would simply wrap the query in a FilteredQuery, so we would have no code 
duplication and IS would be much easier to maintain.
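Below is a sketch of the proposed refactoring with simplified stand-in types: a Query yields matching doc IDs, and a filter is a java.util.BitSet. The filter overload of search() has no code path of its own. It just wraps the query in a FilteredQuery, so all filtering logic lives in one place. Query, FilteredQuery and Searcher here are illustrative, not Lucene's classes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Simplified stand-in: a query yields its matching doc IDs in increasing order. */
interface Query {
  List<Integer> matches(int maxDoc);
}

/** Hypothetical FilteredQuery: the single place where filter handling lives. */
final class FilteredQuery implements Query {
  private final Query query;
  private final BitSet filter;

  FilteredQuery(Query query, BitSet filter) {
    this.query = query;
    this.filter = filter;
  }

  @Override
  public List<Integer> matches(int maxDoc) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : query.matches(maxDoc)) {
      if (filter.get(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }
}

final class Searcher {
  private final int maxDoc;

  Searcher(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  List<Integer> search(Query query) {
    return query.matches(maxDoc);
  }

  /** No special filter code path: the filter overload only wraps the query. */
  List<Integer> search(Query query, BitSet filter) {
    return search(filter == null ? query : new FilteredQuery(query, filter));
  }
}
```

Any optimization added to FilteredQuery, such as choosing random access vs iteration, then applies to both entry points automatically.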

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
 LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121976#comment-13121976
 ] 

Robert Muir commented on LUCENE-1536:
-

{quote}
When looking at the code in IndexSearcher, I would propose to remove all Filter 
special handling in IndexSearcher and move all code over to FilteredQuery (with 
all our optimizations). If you call IS.search(query, filter,...), IndexSearcher 
would simply wrap with FilteredQuery and we would have no code duplication and 
much easier maintainability in IS.
{quote}

+1

Also, we can nuke AndBits.java now?


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121978#comment-13121978
 ] 

Uwe Schindler commented on LUCENE-1536:
---

bq. Also, we can nuke AndBits.java now?

It was nuked here, but still made it into the patch :(


[jira] [Created] (SOLR-2813) TrieTokenizerFactory should catch NumberFormatException, return 400 (not 500)

2011-10-06 Thread Jeff Crump (Created) (JIRA)

TrieTokenizerFactory should catch NumberFormatException, return 400 (not 500)
-

 Key: SOLR-2813
 URL: https://issues.apache.org/jira/browse/SOLR-2813
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0
 Environment: 4.0 trunk, snapshot taken 09/08/2011.
Reporter: Jeff Crump
Priority: Minor


TrieTokenizerFactory is allowing bad user input to result in a 500 error rather 
than a 400.  For a long-valued field, for example, this code in 
TrieTokenizerFactory.reset() will throw a NumberFormatException:

 case LONG:
  ts.setLongValue(Long.parseLong(v));
  break;

The NFE gets all the way out to RequestHandlerBase.handleRequest():

 catch (Exception e) {
   SolrException.log(SolrCore.log, e);
   if (e instanceof ParseException) {
     e = new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
   }

but since an NFE is not a ParseException, it is not converted to a BAD_REQUEST 
there, and it ends up coming out of SolrDispatchFilter.sendError as a 500.

Simply catching NFE and turning it into a SolrException does the trick:

 solr/core/src/java/org/apache/solr/analysis/TrieTokenizerFactory.java#1 - /4.0-trunk-09082011/solr/core/src/java/org/apache/solr/analysis/TrieTokenizerFactory.java

110a111,112
>     } catch (NumberFormatException e) {
>       throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unable to parse input", e);
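Here is a minimal sketch of the same idea outside Solr. The NumberFormatException is caught where the value is parsed and rethrown as a client error that carries a 400 status. BadRequestException and TrieParse are hypothetical stand-ins for SolrException(ErrorCode.BAD_REQUEST, ...) and for the LONG branch of TrieTokenizerFactory.reset(). They are not Solr classes.

```java
// Hypothetical stand-in for SolrException with ErrorCode.BAD_REQUEST.
final class BadRequestException extends RuntimeException {
    final int httpStatus = 400;

    BadRequestException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for the LONG branch of TrieTokenizerFactory.reset().
final class TrieParse {
    static long parseLongField(String v) {
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException e) {
            // Bad client input becomes a 400 instead of surfacing as a 500.
            throw new BadRequestException("Unable to parse input: " + v, e);
        }
    }
}
```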


[jira] [Created] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Created) (JIRA)

Solr reopen on a custom reader doesn't work
---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker


When a custom index reader is used with Solr and the index is reopened, the 
custom reader vanishes after the reopen.  It's a bug.

[jira] [Updated] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3493:
-

Attachment: LUCENE-3493.patch

Patch with unit test demonstrating the bug.  

The fix required in Lucene is randomly in the patch as well.

I'll post another patch showing the Lucene fix, which allows fixing the bug on 
the Solr side.

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Created] (LUCENE-3494) Remove per-document multiply in FilteredQuery

2011-10-06 Thread Robert Muir (Created) (JIRA)

Remove per-document multiply in FilteredQuery
-

 Key: LUCENE-3494
 URL: https://issues.apache.org/jira/browse/LUCENE-3494
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.5, 4.0
 Attachments: LUCENE-3494.patch

Spinoff of LUCENE-1536.

In LUCENE-1536, Uwe suggested using FilteredQuery under-the-hood to implement 
filtered search.

But this query is inefficient: it does a per-document multiplication 
(wrapped.score() * boost()).

Instead, it should just pass the boost down in its weight, as BooleanQuery 
does, to avoid this per-document multiply.
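The difference can be shown with a toy scorer model. In the first variant, a wrapper multiplies every score by the boost. In the second, the boost is multiplied into the weight once, when the scorer is built, so scoring each document costs nothing extra. SimpleScorer and BoostDemo are hypothetical and much simpler than Lucene's Weight/Scorer normalization. Float multiplication is not associative, so the two variants can differ in the last bits. Compare their scores with a small tolerance rather than exactly.

```java
// Hypothetical, simplified scorer model.
interface SimpleScorer {
    float score(int doc);
}

final class BoostDemo {
    /** Per-document multiply: every score() call pays one extra multiplication. */
    static SimpleScorer perDocBoost(SimpleScorer wrapped, float boost) {
        return doc -> wrapped.score(doc) * boost;
    }

    /** Boost folded into the weight once, when the scorer is created. */
    static SimpleScorer boostInWeight(float[] termFreqs, float weight, float boost) {
        final float boostedWeight = weight * boost; // computed once, not per document
        return doc -> termFreqs[doc] * boostedWeight;
    }
}
```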

[jira] [Updated] (LUCENE-3494) Remove per-document multiply in FilteredQuery

2011-10-06 Thread Robert Muir (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3494:


Attachment: LUCENE-3494.patch

 Remove per-document multiply in FilteredQuery
 -

 Key: LUCENE-3494
 URL: https://issues.apache.org/jira/browse/LUCENE-3494
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3494.patch


 Spinoff of LUCENE-1536.
 In LUCENE-1536, Uwe suggested using FilteredQuery under-the-hood to implement 
 filtered search.
 But this query is inefficient, it does a per-document multiplication 
 (wrapped.score() * boost()).
 Instead, it should just pass the boost down in its weight, like BooleanQuery 
 does to avoid this per-document multiply.

[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-10-06 Thread David Smiley (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121996#comment-13121996
]

David Smiley commented on SOLR-2155:

Frederick, a rough inspection of your problem suggests that the GeoHashField is
declared multiValued=true but the field in your POJO is not correspondingly a
List<String>, as it should be. If you only need a single value, then I suggest
you use LatLonType instead, since it's what comes with Solr.

Geospatial search using geohash prefixes

Key: SOLR-2155
URL: https://issues.apache.org/jira/browse/SOLR-2155
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
GeoHashPrefixFilter.patch,
SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch,
SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip,
Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch

There currently isn't a solution in Solr for doing geospatial filtering on
documents that have a variable number of points. This scenario occurs when
there is location extraction (i.e. via a gazetteer) occurring on free text.
None, one, or many geospatial locations might be extracted from any given
document and users want to limit their search results to those occurring in a
user-specified area.
I've implemented this by furthering the GeoHash based work in Lucene/Solr
with a geohash prefix based filter. A geohash refers to a lat-lon box on the
earth. Each successive character added further subdivides the box into a 4x8
(or 8x4 depending on the even/odd length of the geohash) grid. The first
step in this scheme is figuring out which geohash grid squares cover the
user's search query. I've added various extra methods to GeoHashUtils (and
added tests) to assist in this purpose. The next step is an actual Lucene
Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
TermsEnum.seek() to skip to relevant grid squares in the index. Once a
matching geohash grid is found, the points therein are compared against the
user's query to see if it matches. I created an abstraction GeoShape
extended by subclasses named PointDistance... and CartesianBox to support
different queried shapes so that the filter need not care about these details.
This work was presented at LuceneRevolution in Boston on October 8th.
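For readers unfamiliar with geohashes, here is a minimal Java sketch of the standard geohash encoding. It is not the GeoHashUtils code from the patch. Bits alternate between longitude and latitude, starting with longitude. Each bit halves the current range, and every 5 bits become one base-32 character. A character carries either 3 longitude bits and 2 latitude bits, or the reverse. So each added character splits the cell into an 8x4 or 4x8 grid. A longer geohash also always starts with the shorter geohash of its enclosing cell, which is what makes prefix-based seeking work.

```java
// Minimal standard geohash encoder (illustrative; not the patch's GeoHashUtils).
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder sb = new StringBuilder(precision);
        boolean evenBit = true; // bits alternate, starting with longitude
        int bit = 0, ch = 0;
        while (sb.length() < precision) {
            if (evenBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; } else { ch <<= 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; } else { ch <<= 1; latHi = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits become one base-32 character
                sb.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return sb.toString();
    }
}
```

For example, encode(42.6, -5.6, 5) yields "ezs42", and encode(42.6, -5.6, 3) yields its prefix "ezs".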

PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

PagedBytes is great!  Even better would be a couple of additional
methods, one to write it out to an IndexOutput and the other for the
total bytes used.
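Here is a hypothetical paged byte store in plain Java that illustrates the two requested methods. It is not Lucene's PagedBytes. java.io.OutputStream stands in for IndexOutput. "Total bytes used" is read as the number of bytes written. Allocated memory would instead be the page count times the page size.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; OutputStream stands in for Lucene's IndexOutput.
final class PagedBytesSketch {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private byte[] current;
    private int upto; // write position inside the current page

    PagedBytesSketch(int pageBits) {
        this.pageSize = 1 << pageBits;
        this.upto = pageSize; // forces allocation of the first page on first write
    }

    void append(byte[] bytes) {
        for (byte b : bytes) {
            if (upto == pageSize) { // current page full (or none yet): start a new one
                current = new byte[pageSize];
                pages.add(current);
                upto = 0;
            }
            current[upto++] = b;
        }
    }

    /** Bytes actually written, excluding unused space in the last page. */
    long totalBytesUsed() {
        return pages.isEmpty() ? 0 : (long) (pages.size() - 1) * pageSize + upto;
    }

    /** Writes the used bytes, page by page, to the given stream. */
    void writeTo(OutputStream out) throws IOException {
        for (int i = 0; i < pages.size(); i++) {
            int len = (i == pages.size() - 1) ? upto : pageSize;
            out.write(pages.get(i), 0, len);
        }
    }
}
```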

Re: PagedBytes additional method

2011-10-06 Thread Simon Willnauer

why don't you open an issue for this?

thanks,

simon

On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 PagedBytes is great!  Even better would be a couple of additional
 methods, one to write it out to an IndexOutput and the other for the
 total bytes used.

Re: PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

I try not to without having a patch somewhat prepared!

On Thu, Oct 6, 2011 at 11:38 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 why don't you open an issue for this?

 thanks,

 simon

 On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 PagedBytes is great!  Even better would be a couple of additional
 methods, one to write it out to an IndexOutput and the other for the
 total bytes used.

[jira] [Commented] (LUCENE-3494) Remove per-document multiply in FilteredQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122010#comment-13122010
 ] 

Uwe Schindler commented on LUCENE-3494:
---

+1, commit this so I can move forward with 1536!

Thanks for the help!!!

 Remove per-document multiply in FilteredQuery
 -

 Key: LUCENE-3494
 URL: https://issues.apache.org/jira/browse/LUCENE-3494
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3494.patch


 Spinoff of LUCENE-1536.
 In LUCENE-1536, Uwe suggested using FilteredQuery under-the-hood to implement 
 filtered search.
 But this query is inefficient, it does a per-document multiplication 
 (wrapped.score() * boost()).
 Instead, it should just pass the boost down in its weight, like BooleanQuery 
 does to avoid this per-document multiply.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122011#comment-13122011
 ] 

Mark Miller commented on LUCENE-3493:
-

I have a couple of questions:

If the bug is in Lucene, shouldn't we write a test at the Lucene level?

What exactly is the bug? That when you subclass DirectoryReader, it doesn't 
return that subclass from reopen? If this is the desired behavior, isn't it up 
to the subclass to override reopen?

Also, you say the required Lucene fix is randomly in the patch, but also that 
you will post another patch showing the Lucene fix. I don't see it in the 
patch, so I assume it's coming, but the only change I see is making some Lucene 
constructors public. I think we likely shouldn't do that just for this Solr 
test.

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

RE: [Lucene.Net] Asking about Lucene.net License

2011-10-06 Thread Scott Lombard

The answer is no.  None of the code within Lucene.Net is GPL.   All Apache
products are under the Apache License.

 -Original Message-
 From: Ron Grabowski [mailto:rongrabow...@yahoo.com]
 Sent: Thursday, October 06, 2011 12:11 AM
 To: lucene-net-...@lucene.apache.org
 Subject: Re: [Lucene.Net] Asking about Lucene.net License

 I think he was asking if any of the code within Lucene.Net is GPL.

 From: Scott Lombard lombardena...@gmail.com
 To: lucene-net-...@lucene.apache.org
 Sent: Wednesday, October 5, 2011 5:08 PM
 Subject: RE: [Lucene.Net] Asking about Lucene.net License

 Asha,

 Lucene.net is an Apache Incubator project and is only distributed under the
 Apache License, version 2.0.  If you are using a GPL 3 License, then there is
 documented compatibility between the licenses, as described on the page
 http://www.apache.org/licenses/GPL-compatibility.html.  Given this
 compatibility, you can include Lucene.net in a GPL 3 project.  I am not
 sure of all the mechanics of this inclusion, but it can be done.

 Scott

  -Original Message-
  From: Asha Kang [mailto:stereo...@gmail.com]
  Sent: Tuesday, October 04, 2011 8:57 PM
  To: lucene-net-...@lucene.apache.org
  Subject: [Lucene.Net] Asking about Lucene.net License

  Hi, this is Asha Kang from Korea.

  The reason I'm writing this email is that I'd like to make sure which
  license Lucene.net follows.
  I'm currently developing a search engine using Lucene.net.
  As far as I know, Lucene.net follows the Apache License 2.0,
  but my co-worker told me that some classes included in Lucene.net's DLL
  could be under the GPL.
  So now I'm confused.
  Are there any classes under the GPL?
  Do I need to follow two licenses, the Apache License 2.0 and the GPL?

  I'm looking forward to your reply.

  Best regards,

  Asha Kang.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122019#comment-13122019
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

The patch only shows the bug, which needs a test in Solr.  The next patch will 
show the fix.  A Lucene test makes sense as well.

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122021#comment-13122021
 ] 

Uwe Schindler commented on LUCENE-3493:
---

This is not a bug at all: Your custom IndexReader has to override reopen() and 
return your own implementation.
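Uwe's suggestion can be sketched with hypothetical classes. BaseReader and CustomReader are not Lucene types. The subclass overrides reopen() with a covariant return type, so the reopened reader keeps the custom type and its state. The catch, as the follow-up comments point out, is that the subclass has to reproduce the base class's reopen logic.

```java
// Hypothetical classes: BaseReader.reopen() returns a plain BaseReader, so a
// subclass that does not override it silently loses its type on reopen.
class BaseReader {
    final int generation;

    BaseReader(int generation) { this.generation = generation; }

    BaseReader reopen() {
        return new BaseReader(generation + 1);
    }
}

class CustomReader extends BaseReader {
    final String extra; // custom per-reader state

    CustomReader(int generation, String extra) {
        super(generation);
        this.extra = extra;
    }

    // Covariant override: returning CustomReader keeps the custom type across reopens,
    // at the cost of duplicating the base class's reopen logic here.
    @Override
    CustomReader reopen() {
        return new CustomReader(generation + 1, extra);
    }
}
```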

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Updated] (LUCENE-3262) Facet benchmarking

2011-10-06 Thread Doron Cohen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3262:


Attachment: LUCENE-3262.patch

Updated patch with a test, more javadocs, and a comment as Shai suggested.

I think this is ready to commit.

More tests are needed, and search with facets is also missing, but those can go 
in a separate issue.


 Facet benchmarking
 --

 Key: LUCENE-3262
 URL: https://issues.apache.org/jira/browse/LUCENE-3262
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark, modules/facet
Reporter: Shai Erera
Assignee: Doron Cohen
 Attachments: CorpusGenerator.java, LUCENE-3262.patch, 
 LUCENE-3262.patch, TestPerformanceHack.java


 A spin-off from LUCENE-3079. We should define a few benchmarks for faceting 
 scenarios, so we can evaluate the new faceting module as well as any 
 improvement we'd like to consider in the future (such as cutting over to 
 docvalues, implementing FST-based caches etc.).
 Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here 
 as a starting point.
 We've also done some preliminary work on extending Benchmark for faceting, so 
 I'll attach it here as well.
 We should perhaps create a Wiki page where we clearly describe the benchmark 
 scenarios, then include results of 'default settings' and 'optimized 
 settings', or something like that.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122033#comment-13122033
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

Uwe, I'd like to agree with you, however I cannot (otherwise I wouldn't have 
had to create an issue!).  Look at the DR.doOpen* methods.  They're private, and 
there's no reason for them to be.  They need to be protected; that's in the 
next patch.  Fairly simple.  The follow-on to this is overriding IW to return 
custom readers.  I had an issue and patch for that a while back.  It's best to 
implement both here, as Lucene 4.x Solr's NRT will show the same problem!

I think you're right, it looks like this *could* be done by overriding 
doOpenIfChanged*; however, it doesn't make sense to duplicate code!
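Jason's alternative is to expose a protected hook instead of private doOpen* methods. That is essentially the factory-method pattern. In this hypothetical sketch, ReopenableReader and TaggedReader are illustrative and not Lucene classes. The base class keeps the reopen logic in one place, and only the construction of the new reader can be overridden, so the subclass does not duplicate any code.

```java
// Hypothetical classes illustrating a protected factory hook.
class ReopenableReader {
    final int generation;

    ReopenableReader(int generation) { this.generation = generation; }

    /** Shared reopen logic; subclasses do not need to duplicate it. */
    final ReopenableReader reopen() {
        int next = generation + 1; // stands in for the real work of loading new segments
        return newReader(next);
    }

    /** Protected hook, analogous to making DR.doOpen* overridable. */
    protected ReopenableReader newReader(int generation) {
        return new ReopenableReader(generation);
    }
}

class TaggedReader extends ReopenableReader {
    final String tag; // custom per-reader state

    TaggedReader(int generation, String tag) {
        super(generation);
        this.tag = tag;
    }

    @Override
    protected ReopenableReader newReader(int generation) {
        return new TaggedReader(generation, tag); // keep the custom type and state
    }
}
```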

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Yonik Seeley (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122036#comment-13122036
 ] 

Yonik Seeley commented on LUCENE-3493:
--

Implementing your own IndexReader has always been a very tricky endeavor, esp 
wrt maintainability... super-expert only.  One of the reasons I was glad to get 
rid of SolrIndexReader (the fragile base class problem).

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Updated] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Simon Willnauer (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3433:


Attachment: LUCENE-3433.patch

Here we go... the make-everybody-happy patch! I added SortedSource back and 
integrated it into the Source pattern for random access. We now have an 
entirely disk-resident SortedSource impl for both variants and a single 
interface in the first place. A SortedSource instance can be obtained via 
Source#asSortedSource(), which returns null if the source is not sorted. With 
this random-access DirectSortedSource we can also improve the merging of 
sorted sources, which was one of my major issues here.
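The asSortedSource() pattern exposes an optional capability through a method that returns null when it is unsupported, instead of relying on an instanceof check. It can be sketched as follows. ValuesSource, SortedValuesSource and the array-backed implementations are simplified hypothetical stand-ins, not the classes from the patch.

```java
// Simplified, hypothetical model of the asSortedSource() pattern.
abstract class ValuesSource {
    abstract long getInt(int docID);

    /** Returns a sorted view of this source, or null if the source is not sorted. */
    SortedValuesSource asSortedSource() {
        return null;
    }
}

abstract class SortedValuesSource extends ValuesSource {
    /** Ordinal (rank) of the document's value among the sorted distinct values. */
    abstract int ord(int docID);

    @Override
    SortedValuesSource asSortedSource() {
        return this; // a sorted source is its own sorted view
    }
}

/** Unsorted, array-backed source: asSortedSource() stays null. */
final class ArraySource extends ValuesSource {
    private final long[] values;

    ArraySource(long[] values) { this.values = values; }

    @Override
    long getInt(int docID) { return values[docID]; }
}

/** Sorted, array-backed source: values are looked up through per-document ordinals. */
final class ArraySortedSource extends SortedValuesSource {
    private final long[] sortedValues; // distinct values in ascending order
    private final int[] docToOrd;      // document -> index into sortedValues

    ArraySortedSource(long[] sortedValues, int[] docToOrd) {
        this.sortedValues = sortedValues;
        this.docToOrd = docToOrd;
    }

    @Override
    long getInt(int docID) { return sortedValues[docToOrd[docID]]; }

    @Override
    int ord(int docID) { return docToOrd[docID]; }
}
```

Callers check the result once (`SortedValuesSource s = source.asSortedSource(); if (s != null) { ... }`) and fall back to the plain Source API otherwise.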

 Random access non RAM resident IndexDocValues (CSF)
 ---

 Key: LUCENE-3433
 URL: https://issues.apache.org/jira/browse/LUCENE-3433
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Yonik Seeley
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, 
 LUCENE-3433.patch, sorted_source.patch


 There should be a way to get specific IndexDocValues by going through the 
 Directory rather than loading all of the values into memory.

[jira] [Resolved] (LUCENE-3494) Remove per-document multiply in FilteredQuery

2011-10-06 Thread Robert Muir (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3494.
-

Resolution: Fixed
  Assignee: Robert Muir

 Remove per-document multiply in FilteredQuery
 -

 Key: LUCENE-3494
 URL: https://issues.apache.org/jira/browse/LUCENE-3494
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3494.patch


 Spinoff of LUCENE-1536.
 In LUCENE-1536, Uwe suggested using FilteredQuery under-the-hood to implement 
 filtered search.
 But this query is inefficient, it does a per-document multiplication 
 (wrapped.score() * boost()).
 Instead, it should just pass the boost down in its weight, like BooleanQuery 
 does to avoid this per-document multiply.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122040#comment-13122040
 ] 

Mark Miller commented on LUCENE-3493:
-

That clears things up a bit, Jason. The title and patch don't really explain the 
issue.

bq. as Lucene 4.x Solr's NRT will show the same problem!

How is that? Solr's NRT does not rely on a custom IndexReader (and if it did, I 
imagine we would make that properly override doOpenIfChanged, else it would be 
a bug)?

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122050#comment-13122050
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

bq. Solr's NRT does not rely on a custom IndexReader

Yikes, logically the custom reader functionality should!

{quote}properly override doOpenIfChanged, else it would be a bug{quote}

It's a bug because there's no way to implement that today.  The DirectoryReader 
is created deep inside of IW.getReader (and there's no way to re-implement its 
functionality either, because of private variable access).  

I think we need a protected method for creating the reader in IW.  I think, 
though, this becomes almost endless, because I don't think there's a way to 
implement a custom IW in Solr.

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122056#comment-13122056
 ] 

Mark Miller commented on LUCENE-3493:
-

bq. Yikes, logically the custom reader functionality should!

Okay, I see - you also want your Reader impl to be pulled from IW when using 
NRT. But as you allude to below, you would need a custom IndexWriter to do that 
- that is where we get the IndexReader from for NRT.

That's a scary road to start down currently (or as you say, endless).

bq. I don't think there's a way to implement a custom IW in Solr.

Would be pretty advanced stuff, but what we would likely have to do is allow 
users to provide alternate SolrCoreState impls (currently the 
DefaultSolrCoreState impl is simply used). This would let you manage what 
IndexWriter impl was used.

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and reopen, the custom reader 
 vanishes after the reopen.  It's a bug.

[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

2011-10-06 Thread Dawid Weiss (Issue Comment Edited) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122068#comment-13122068
 ] 

Dawid Weiss commented on LUCENE-3492:
-

Hi Shai. This is definitely not only for debugging. For example, we use 
randomized testing inside CarrotSearch to test algorithmic/combinatorial code. 
Once you hit a bug, you simply copy the test case (or a call to a common test 
case method) and fix the seed to get a regression test for the future (so that 
you know you're not failing on examples that previously failed). So, for example:
{code}
@Test @Seed(23095324)
public void runFixedRegression_1() { doSomethingWithRandoms(); }

@Test @Seed(239735923)
public void runFixedRegression_2() { doSomethingWithRandoms(); }

@Test
public void runRandomized() { doSomethingWithRandoms(); }
{code}

This is a scenario I really came to like. It's a bit like your tests write 
themselves for you :)

I left the system properties for fixing seeds and enforcing the number of 
repetitions because Lucene currently has them, although I personally don't like 
them that much (because they affect everything globally). I do understand they're 
useful for quick hacking without recompiling, or for remote executions, 
but I'd much rather have something like -Dseed.testClass[.method]= which 
would affect only a single class or method rather than everything. The same could 
be done for filtering which method/test case to execute. This is debatable, of 
course, and a matter of personal taste.
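A rough sketch of the scoped lookup suggested above, assuming property names of the form `seed.<TestClass>.<method>` and `seed.<TestClass>` (hypothetical, following the `-Dseed.testClass[.method]=` idea; these are not existing Lucene properties). The most specific key wins, and when nothing is pinned a fresh random seed is used. Passing `Properties` in, rather than reading `System.getProperties()` directly, keeps it testable.

```java
import java.util.Properties;
import java.util.Random;

/** Hypothetical resolver for per-class and per-method seed overrides. */
class ScopedSeed {

  /** Returns the seed pinned for testClass/method, or null if none is configured. */
  static Long pinnedSeed(Properties props, String testClass, String method) {
    // Most specific first: seed.<class>.<method>
    String value = props.getProperty("seed." + testClass + "." + method);
    if (value == null) {
      // Then the class-wide override: seed.<class>
      value = props.getProperty("seed." + testClass);
    }
    return value == null ? null : Long.valueOf(value);
  }

  /** The pinned seed if there is one, otherwise a fresh random seed. */
  static long seedFor(Properties props, String testClass, String method) {
    Long pinned = pinnedSeed(props, testClass, method);
    return pinned != null ? pinned : new Random().nextLong();
  }

  public static void main(String[] args) {
    // e.g. java -Dseed.MyTest.testFoo=42 ScopedSeed  prints 42
    System.out.println(seedFor(System.getProperties(), "MyTest", "testFoo"));
  }
}
```

Only the class or method named in the key is affected, which is the property a single global seed lacks.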

I should publish what I have on GitHub tonight (I'm moving certain things out 
of our proprietary codebase, and there are JUnit corner cases that slow things 
down).

 Extract a generic framework for running randomized tests.
 -

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0

 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


 I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene 
 and Solr folks) has their own glue to make it possible. The question is whether 
 there's something to pull out that others could share without needing 
 to import Lucene-specific classes.


[jira] [Issue Comment Edited] (LUCENE-3492) Extract a generic framework for running randomized tests.


[ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122068#comment-13122068
 ] 

Dawid Weiss edited comment on LUCENE-3492 at 10/6/11 5:04 PM:
--

Hi Shai. This is definitely not only for debugging. For example, we use 
randomized testing inside CarrotSearch to test algorithmic/combinatorial code. 
Once you hit a bug, you simply copy the test case (or a call to a common test 
case method) and fix the seed to get a regression test for the future (so that 
you know examples that previously failed no longer fail). So, for example:
{code}
@Test @Seed(23095324)
public void runFixedRegression_1() { doSomethingWithRandoms(); }

@Test @Seed(239735923)
public void runFixedRegression_2() { doSomethingWithRandoms(); }

@Test
public void runRandomized() { doSomethingWithRandoms(); }
{code}

This is a scenario I really came to like. It's a bit like your tests write 
themselves for you :)

I left the system properties for fixing seeds and enforcing the number of 
repetitions because Lucene currently has them, although I personally don't like 
them that much (because they affect everything globally). I do understand they're 
useful for quick hacking without recompiling, or for remote executions, 
but I'd much rather have something like -Dseed.testClass[.method]= which 
would affect only a single class or method rather than everything. The same could 
be done for filtering which method/test case to execute. This is debatable, of 
course, and a matter of personal taste.

I should publish what I have on GitHub tonight (I'm moving certain things out 
of our proprietary codebase, and there are JUnit corner cases that slow things 
down).

 Extract a generic framework for running randomized tests.
 -

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0

 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


 I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene 
 and Solr folks) has their own glue to make it possible. The question is whether 
 there's something to pull out that others could share without needing 
 to import Lucene-specific classes.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122089#comment-13122089
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

Uwe, I tried your idea. It doesn't work! Here's why: DR.writeLock and 
DR.segmentInfos are private. Because the useful methods aren't protected, their 
code has to be duplicated in the subclass, and that duplicated code cannot access 
these private fields. Of course one can use reflection, but that's just 
'atrocious'. :)

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and is reopened, the custom reader 
 vanishes after the reopen. It's a bug.


[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work

2011-10-06 Thread Jason Rutherglen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122095#comment-13122095
 ] 

Jason Rutherglen commented on LUCENE-3493:
--

One way to solve all of this without subclassing is to move the 
IndexReaderFactory to Lucene and integrate it into IW and DR.

That would be much cleaner than forcing users to subclass, which is a monstrous 
pain and ends up generating a lot of unnecessary code.
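To make the proposal concrete, here is a toy model, not Lucene or Solr code: `SimpleReader`, `ReaderFactory`, `CoreReader` and `CustomReader` are all hypothetical stand-ins. What it illustrates is that if the code doing the reopen also owns the factory and re-applies it, the custom wrapper survives a reopen without anyone subclassing the reader.

```java
/** Hypothetical, much-simplified reader abstraction (not the Lucene IndexReader API). */
interface SimpleReader {
  long version();
  SimpleReader reopen();
}

/** Hypothetical factory hook applied to every reader that is opened or reopened. */
interface ReaderFactory {
  SimpleReader wrap(SimpleReader base);
}

/** Stand-in for the core reader; it remembers the factory so reopen() can re-apply it. */
final class CoreReader implements SimpleReader {
  private final long version;
  private final ReaderFactory factory;

  private CoreReader(long version, ReaderFactory factory) {
    this.version = version;
    this.factory = factory;
  }

  static SimpleReader open(long version, ReaderFactory factory) {
    return factory.wrap(new CoreReader(version, factory));
  }

  @Override public long version() { return version; }

  @Override public SimpleReader reopen() {
    // The factory runs again here, so whatever it wraps around the new reader is kept.
    return open(version + 1, factory);
  }
}

/** Stand-in for a custom reader that just delegates to the wrapped reader. */
final class CustomReader implements SimpleReader {
  private final SimpleReader in;

  CustomReader(SimpleReader in) { this.in = in; }

  @Override public long version() { return in.version(); }

  @Override public SimpleReader reopen() { return in.reopen(); }  // core re-wraps via factory
}

class FactoryDemo {
  public static void main(String[] args) {
    // With the factory: the reopened reader is still a CustomReader.
    SimpleReader viaFactory = CoreReader.open(1, CustomReader::new).reopen();
    System.out.println(viaFactory instanceof CustomReader);   // true

    // Wrapping by hand, with an identity factory: the wrapper is lost on reopen.
    SimpleReader byHand = new CustomReader(CoreReader.open(1, base -> base)).reopen();
    System.out.println(byHand instanceof CustomReader);       // false
  }
}
```

The second case is roughly the situation described in this issue: a wrapper applied from the outside has no say in what reopen() returns.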

 Solr reopen on a custom reader doesn't work
 ---

 Key: LUCENE-3493
 URL: https://issues.apache.org/jira/browse/LUCENE-3493
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.4
Reporter: Jason Rutherglen
Priority: Blocker
 Attachments: LUCENE-3493.patch


 When a custom index reader is used with Solr and is reopened, the custom reader 
 vanishes after the reopen. It's a bug.


[jira] [Created] (LUCENE-3495) BlockJoinQuery doesn't implement boost

2011-10-06 Thread Robert Muir (Created) (JIRA)

BlockJoinQuery doesn't implement boost
--

 Key: LUCENE-3495
 URL: https://issues.apache.org/jira/browse/LUCENE-3495
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Robert Muir
 Fix For: 3.5, 4.0


After reviewing LUCENE-3494, I checked other queries and noticed that 
BlockJoinQuery currently throws UOE for getBoost and setBoost:
{noformat}
throw new UnsupportedOperationException("this query cannot support boosting; please use childQuery.setBoost instead");
{noformat}

I don't think we can safely do that in queries, because other parts of Lucene 
rely on this working; for example, BooleanQuery's rewrite, when the query has a 
single clause, erases itself and moves its boost onto that clause.

So I think we should just pass down the boost to the inner weight.
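A toy model of why throwing from setBoost breaks things and what passing the boost down looks like. `Q`, `TermQ`, `JoinQ` and `BoolQ` are hypothetical classes, not Lucene's: `BoolQ.rewrite()` follows the single-clause behaviour described above (drop the wrapper and fold its boost into the lone clause), and `JoinQ` multiplies its own boost into the child's weight instead of throwing. The actual patch is not reproduced here.

```java
import java.util.List;

/** Hypothetical mini query model (not Lucene's Query/Weight classes). */
abstract class Q implements Cloneable {
  private float boost = 1.0f;

  float getBoost() { return boost; }
  void setBoost(float boost) { this.boost = boost; }

  Q rewrite() { return this; }

  /** Score contribution of this query with all boosts folded in. */
  abstract float weight();

  @Override protected Q clone() {
    try {
      return (Q) super.clone();
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);  // cannot happen: Q implements Cloneable
    }
  }
}

final class TermQ extends Q {
  @Override float weight() { return getBoost(); }
}

/** Stand-in for a join-style wrapper: accepts a boost and passes it down. */
final class JoinQ extends Q {
  final Q child;
  JoinQ(Q child) { this.child = child; }
  @Override float weight() { return getBoost() * child.weight(); }
}

/** Stand-in for a boolean query with the single-clause rewrite described above. */
final class BoolQ extends Q {
  final List<Q> clauses;
  BoolQ(List<Q> clauses) { this.clauses = clauses; }

  @Override float weight() {
    float sum = 0f;
    for (Q clause : clauses) sum += clause.weight();
    return getBoost() * sum;
  }

  @Override Q rewrite() {
    if (clauses.size() != 1) return this;
    Q only = clauses.get(0).rewrite();
    if (getBoost() != 1.0f) {
      only = only.clone();                          // don't mutate the caller's clause
      only.setBoost(getBoost() * only.getBoost());  // fails here if setBoost throws UOE
    }
    return only;
  }
}
```

If `JoinQ.setBoost` threw UnsupportedOperationException instead, the `only.setBoost(...)` call in `BoolQ.rewrite()` would throw even though the user never called setBoost on the join query directly.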



[jira] [Updated] (LUCENE-3495) BlockJoinQuery doesn't implement boost

2011-10-06 Thread Robert Muir (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3495:


Attachment: LUCENE-3495.patch

 BlockJoinQuery doesn't implement boost
 --

 Key: LUCENE-3495
 URL: https://issues.apache.org/jira/browse/LUCENE-3495
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3495.patch


 After reviewing LUCENE-3494, I checked other queries and noticed that 
 BlockJoinQuery currently throws UOE for getBoost and setBoost:
 {noformat}
 throw new UnsupportedOperationException("this query cannot support boosting; please use childQuery.setBoost instead");
 {noformat}
 I don't think we can safely do that in queries, because other parts of Lucene 
 rely on this working; for example, BooleanQuery's rewrite, when the query has a 
 single clause, erases itself and moves its boost onto that clause.
 So I think we should just pass down the boost to the inner weight.


[jira] [Commented] (LUCENE-3495) BlockJoinQuery doesn't implement boost

2011-10-06 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122128#comment-13122128
 ] 

Michael McCandless commented on LUCENE-3495:


+1 looks good!


 BlockJoinQuery doesn't implement boost
 --

 Key: LUCENE-3495
 URL: https://issues.apache.org/jira/browse/LUCENE-3495
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3495.patch


 After reviewing LUCENE-3494, I checked other queries and noticed that 
 BlockJoinQuery currently throws UOE for getBoost and setBoost:
 {noformat}
 throw new UnsupportedOperationException("this query cannot support boosting; please use childQuery.setBoost instead");
 {noformat}
 I don't think we can safely do that in queries, because other parts of Lucene 
 rely on this working; for example, BooleanQuery's rewrite, when the query has a 
 single clause, erases itself and moves its boost onto that clause.
 So I think we should just pass down the boost to the inner weight.


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122130#comment-13122130
 ] 

Michael McCandless commented on LUCENE-3433:


Thanks Simon!  I'll look through the patch... it's a great cleanup.

 Random access non RAM resident IndexDocValues (CSF)
 ---

 Key: LUCENE-3433
 URL: https://issues.apache.org/jira/browse/LUCENE-3433
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Yonik Seeley
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, 
 LUCENE-3433.patch, sorted_source.patch


 There should be a way to get specific IndexDocValues by going through the 
 Directory rather than loading all of the values into memory.
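To illustrate the difference: instead of reading every value into an array up front, each lookup seeks to a computed offset and reads a single value. This sketch uses `java.io.RandomAccessFile` as a stand-in for reading through a Lucene `Directory`, and assumes a hypothetical fixed-width layout (one 8-byte long per document, no header); the real IndexDocValues formats are not described here.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical on-disk layout: one big-endian long per document, no header. */
final class DirectLongValues implements Closeable {
  private final RandomAccessFile file;
  private final int maxDoc;

  DirectLongValues(File f) throws IOException {
    this.file = new RandomAccessFile(f, "r");
    this.maxDoc = (int) (file.length() / Long.BYTES);
  }

  /** Reads the value for one document straight from the file; nothing is cached. */
  long get(int docId) throws IOException {
    if (docId < 0 || docId >= maxDoc) {
      throw new IndexOutOfBoundsException("docId " + docId);
    }
    file.seek((long) docId * Long.BYTES);  // value for doc i starts at byte i * 8
    return file.readLong();
  }

  /** Writes values in the same fixed-width layout (helper for trying it out). */
  static void write(File f, long[] values) throws IOException {
    try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
      out.setLength(0);                    // truncate any previous content
      for (long v : values) {
        out.writeLong(v);
      }
    }
  }

  @Override public void close() throws IOException {
    file.close();
  }
}
```

The trade-off is one seek and read per lookup instead of holding `maxDoc` values in memory; whether that is fast enough depends on the access pattern and on caching.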


[jira] [Commented] (LUCENE-3262) Facet benchmarking

[
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122134#comment-13122134
]

Shai Erera commented on LUCENE-3262:

bq. I think this is ready to commit.

+1. Perhaps just add a CHANGES entry?

bq. but that can go in a separate issue

I think it's better if we resolve it in this issue, and maybe rename the issue
to "Facet benchmarking framework". You can still commit the current progress
because it is 'whole', covering the indexing side. I've worked on issues
before that had several commits, so this will not be the first one.

We should also run some benchmark tests, clearly describing the data sets, but
this can be done under a separate issue.

Facet benchmarking
--


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

[
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122140#comment-13122140
]

Shai Erera commented on LUCENE-3492:

OK, I get the point now.

But I still think we should have specific unit tests that reproduce specific
scenarios, rather than relying on some monstrous test that happened to stumble
on a seed that revealed a bug. If, however, the scenario cannot be reproduced
deterministically, then I agree that this framework is powerful and useful.

Extract a generic framework for running randomized tests.
-

Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

[
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122148#comment-13122148
]

Dawid Weiss commented on LUCENE-3492:
-

Sure, absolutely. In our experience (with mostly algorithmic code, mind you), even
small test cases can be randomized, and then it is really duplicated effort to
rewrite them for a particular bug scenario (the tests are often simple; it's the
data that changes). But sure: the simpler the test, the better.

Extract a generic framework for running randomized tests.
-

Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.


[ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122153#comment-13122153
 ] 

Robert Muir commented on LUCENE-3492:
-

I agree too. One difficulty with using @Seed or something similar is that our seeds 
quickly become out of date, because we are often adding more randomization to our 
testing framework (e.g. additional craziness in RandomIndexWriter, searchers, 
analyzers, whatever).
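A small illustration of that drift using plain `java.util.Random` (the `SeedDrift` class is hypothetical): if the test framework and the test body share one seeded generator, a single extra draw added to the framework shifts every value the test body sees, so a pinned seed no longer reproduces the original case.

```java
import java.util.Random;

/** Illustrative only: one shared seeded generator for the "framework" and the test body. */
class SeedDrift {

  /**
   * Returns the first value the test body draws. If frameworkDrawsExtra is true,
   * the framework consumes one extra random value first (e.g. newly added
   * randomization of some component's settings).
   */
  static long firstTestValue(long seed, boolean frameworkDrawsExtra) {
    Random random = new Random(seed);
    if (frameworkDrawsExtra) {
      random.nextLong();  // extra framework randomization, added later
    }
    return random.nextLong();  // what the test body actually gets
  }
}
```

With the extra draw, the test body receives the value that used to come second, which is why an old pinned seed can stop reproducing a failure.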

 Extract a generic framework for running randomized tests.
 -

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0

 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


 I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene 
 and Solr folks) has their own glue to make it possible. The question is whether 
 there's something to pull out that others could share without needing 
 to import Lucene-specific classes.


[jira] [Resolved] (LUCENE-3495) BlockJoinQuery doesn't implement boost

2011-10-06 Thread Robert Muir (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3495.
-

Resolution: Fixed
  Assignee: Robert Muir

 BlockJoinQuery doesn't implement boost
 --

 Key: LUCENE-3495
 URL: https://issues.apache.org/jira/browse/LUCENE-3495
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3495.patch


 After reviewing LUCENE-3494, I checked other queries and noticed that 
 BlockJoinQuery currently throws UOE for getBoost and setBoost:
 {noformat}
 throw new UnsupportedOperationException("this query cannot support boosting; please use childQuery.setBoost instead");
 {noformat}
 I don't think we can safely do that in queries, because other parts of Lucene 
 rely on this working; for example, BooleanQuery's rewrite, when the query has a 
 single clause, erases itself and moves its boost onto that clause.
 So I think we should just pass down the boost to the inner weight.


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.


[ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122165#comment-13122165
 ] 

Dawid Weiss commented on LUCENE-3492:
-

That's why I mentioned I would like this to become _generally_ useful, not only 
restricted to Lucene/Solr :) If we make it work for two projects (Carrot2 and 
Lucene), chances are the outcome will be flexible enough to use elsewhere.

I'm not saying you must fix the seeds using annotations -- it's an option.

 Extract a generic framework for running randomized tests.
 -

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0

 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


 I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene 
 and Solr folks) has their own glue to make it possible. The question is whether 
 there's something to pull out that others could share without needing 
 to import Lucene-specific classes.


[jira] [Updated] (SOLR-2765) Shard/Node states

2011-10-06 Thread Jamie Johnson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jamie Johnson updated SOLR-2765:


Attachment: shard-roles.patch

 Shard/Node states
 -

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud, update
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: shard-roles.patch


 Need state for shards that indicate they are recovering, active/enabled, or 
 disabled.


[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-06 Thread Jamie Johnson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122185#comment-13122185
 ] 

Jamie Johnson commented on SOLR-2765:
-

Yonik,

I needed this role capability so I could dynamically add, remove and discover 
Solr instances and their responsibilities as the state of the cloud changed. To 
do this I added code to ZKController.java, CoreContainer.java and 
CloudDescriptor.java to incorporate this information. Now in solr.xml you define 
the following:

<core name="coreName" instanceDir="." shard="shard1" collection="collection" roles="searcher,indexer"/>

I've attached the patch for comment (it wasn't made against trunk, but I can try 
to pull trunk down and redo it there if necessary).
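For reference, one plausible way to turn the `roles` attribute into something queryable is a simple comma split. `CoreRoles.parse` is a hypothetical helper; the patch itself isn't shown here, so this is not necessarily how it handles the attribute.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: turns roles="searcher,indexer" into a set of role names. */
final class CoreRoles {

  static Set<String> parse(String rolesAttr) {
    Set<String> roles = new LinkedHashSet<>();   // keeps declaration order
    if (rolesAttr == null) {
      return Collections.unmodifiableSet(roles); // attribute absent: no roles
    }
    for (String part : rolesAttr.split(",")) {
      String role = part.trim();
      if (!role.isEmpty()) {                     // ignore empty entries such as "searcher,,"
        roles.add(role);
      }
    }
    return Collections.unmodifiableSet(roles);
  }
}
```

A node could then check something like `CoreRoles.parse(attr).contains("indexer")` before taking on indexing work; how the patch actually publishes roles to the cloud state is not covered by this sketch.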

 Shard/Node states
 -

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud, update
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: shard-roles.patch


 Need state for shards that indicate they are recovering, active/enabled, or 
 disabled.


[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-06 Thread Mark Miller (Issue Comment Edited) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122196#comment-13122196
 ] 

Mark Miller commented on SOLR-2765:
---

This is where incremental update of the cloud state gets tricky...

If you have something like these roles at the shard level, all of a sudden you 
cannot change them on the fly because the new incremental update will not pick 
them up.

It's a tricky situation - without incremental updates, things start to get nasty 
at a huge number of shards. One possibility is that everyone also watches another 
node which, when pinged, causes a full read - so that most cloud state updates 
are incremental, but when per-shard info like this is changed on the fly, you 
can then trigger a full read by everyone...
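The trigger-node idea can be modelled without ZooKeeper. In this hypothetical sketch, `CloudStateView` is one node's local copy of the cluster state: ordinary changes arrive one entry at a time, while a bump of the watched trigger version forces a single full re-read, which is how an on-the-fly per-shard change (like the roles above) would reach everyone. A plain `Map` stands in for the authoritative state; no real ZooKeeper or SolrCloud API is used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one node's view of the cloud state (no ZooKeeper involved). */
final class CloudStateView {
  private final Map<String, String> authoritative;  // stands in for the shared cluster state
  private final Map<String, String> local = new HashMap<>();
  private long lastTriggerVersion = -1;

  CloudStateView(Map<String, String> authoritative) {
    this.authoritative = authoritative;
    fullRead();  // initial load
  }

  /** Cheap path: apply one changed entry, e.g. a shard joining or changing state. */
  void onIncrementalChange(String shard, String info) {
    local.put(shard, info);
  }

  /** Called when the watched trigger node changes; one full re-read per new version. */
  void onTrigger(long triggerVersion) {
    if (triggerVersion > lastTriggerVersion) {
      lastTriggerVersion = triggerVersion;
      fullRead();
    }
  }

  String get(String shard) {
    return local.get(shard);
  }

  private void fullRead() {
    local.clear();
    local.putAll(authoritative);
  }
}
```

Most traffic stays on the cheap incremental path; the expensive full read only happens when someone deliberately bumps the trigger.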

 Shard/Node states
 -

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud, update
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: shard-roles.patch


 Need state for shards that indicate they are recovering, active/enabled, or 
 disabled.


[jira] [Issue Comment Edited] (SOLR-2765) Shard/Node states


[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122196#comment-13122196
 ] 

Mark Miller edited comment on SOLR-2765 at 10/6/11 7:54 PM:


This is where incremental update of the cloud state gets tricky...

If you have something like these roles at the shard level, all of a sudden you 
cannot change them on the fly because the new incremental update will not pick 
them up.

It's a tricky situation - without incremental updates, things start to get nasty 
at a huge number of shards. One possibility is that everyone also watches another 
node which, when pinged, causes a full read - so that most cloud state updates 
are incremental, but when per-shard info like this is changed on the fly, you 
can then trigger a full read by everyone...

 Shard/Node states
 -

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud, update
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: shard-roles.patch


 Need state for shards that indicate they are recovering, active/enabled, or 
 disabled.


[jira] [Created] (SOLR-2814) Core names that contain a - fail in new Admin Gui

2011-10-06 Thread Eric Pugh (Created) (JIRA)

Core names that contain a - fail in new Admin Gui
---

 Key: SOLR-2814
 URL: https://issues.apache.org/jira/browse/SOLR-2814
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0
 Environment: Working with Solr 4 trunk
Reporter: Eric Pugh
Priority: Minor


If you have a core with a "-" in the name, any clicks on it in the new web GUI 
seem to be ignored. A core named "uspatentgrant" works fine, but a core named 
"us-patent-grant" can't be opened in the GUI. Nothing is logged in the Solr 
output either. I will attach a screenshot.


[jira] [Updated] (SOLR-2814) Core names that contain a - fail in new Admin Gui

2011-10-06 Thread Eric Pugh (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh updated SOLR-2814:


Attachment: solr-admin.png

Screenshot showing the admin GUI

 Core names that contain a - fail in new Admin Gui
 ---

 Key: SOLR-2814
 URL: https://issues.apache.org/jira/browse/SOLR-2814
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0
 Environment: Working with Solr 4 trunk
Reporter: Eric Pugh
Priority: Minor
 Attachments: solr-admin.png


 If you have a core with a "-" in the name, any clicks on it in the new web 
 GUI seem to be ignored. A core named "uspatentgrant" works fine, but a core 
 named "us-patent-grant" can't be opened in the GUI. Nothing is logged in the 
 Solr output either. I will attach a screenshot.


[jira] [Resolved] (LUCENE-3483) Move Function grouping collectors from Solr to grouping module

2011-10-06 Thread Martijn van Groningen (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen resolved LUCENE-3483.
---

Resolution: Fixed

 Move Function grouping collectors from Solr to grouping module
 --

 Key: LUCENE-3483
 URL: https://issues.apache.org/jira/browse/LUCENE-3483
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/grouping
Affects Versions: 4.0
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3483.patch, LUCENE-3483.patch, LUCENE-3483.patch


 Move the Function*Collectors from Solr (inside the Grouping source file) to 
 the grouping module.


[jira] [Commented] (LUCENE-3483) Move Function grouping collectors from Solr to grouping module

2011-10-06 Thread Martijn van Groningen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1310#comment-1310
 ] 

Martijn van Groningen commented on LUCENE-3483:
---

Committed in r1179808


[jira] [Commented] (SOLR-2814) Core names that contain a - fail in new Admin Gui

2011-10-06 Thread Martijn van Groningen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122233#comment-13122233
 ] 

Martijn van Groningen commented on SOLR-2814:
-

I also noticed this today. I didn't know that this was the problem; now that I 
have renamed the core, I know it is.
The core with a dash does exist in Solr, but it isn't possible to interact with 
the core via the new GUI.

 Core names that contain a - fail in new Admin Gui
 ---

 Key: SOLR-2814
 URL: https://issues.apache.org/jira/browse/SOLR-2814
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0
 Environment: Working with Solr 4 trunk
Reporter: Eric Pugh
Priority: Minor
 Attachments: solr-admin.png


 If you have a core with a - in the name, any clicks on it in the new web 
 GUI seem to be ignored. A core named uspatentgrant works fine, but a core 
 named us-patent-grant can't be opened in the GUI. Nothing is logged in the 
 Solr output either. I will attach a screenshot.


[jira] [Commented] (SOLR-2814) Core names that contain a - fail in new Admin Gui

2011-10-06 Thread Eric Pugh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122237#comment-13122237
 ] 

Eric Pugh commented on SOLR-2814:
-

Much better description of the behavior of the bug!


[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-06 Thread Jamie Johnson (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122277#comment-13122277
]

Jamie Johnson commented on SOLR-2765:
-

Yeah, 100% agree. The current implementation of update doesn't check to see if
the data in the node changed; you'd need a watcher on each node to do that.
The other project that I'm working on does just that. We create a watcher on
/live_nodes to track the list of available servers, a watcher on the
collection to see if a slice was added/removed, a watcher on each slice (not
sure if that is the correct terminology) to check if a shard is added/removed,
and subsequently a watcher on each shard to track data changes.
So lots of watchers all around.

Would it be easier to store this information on the ephemeral nodes (under
live_nodes)? Then we would only need a watcher on live_nodes (add/remove) and a
watcher on each shard under live_nodes to see if its data changed. I'm not
sure what else is using the collection hierarchy (just query?), but perhaps it
would be a bit simpler.
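To put a rough number on "lots of watchers", here is a back-of-the-envelope count for one client and a single collection. It assumes (the comment implies but does not spell this out) that every shard gets its own ephemeral node under live_nodes in the proposed layout. WatcherCount is a hypothetical helper, not SolrCloud code.

```java
// Hypothetical back-of-the-envelope model, not SolrCloud code: counts the
// ZooKeeper watchers one client would hold for a single collection.
public class WatcherCount {
    /** Current scheme: /live_nodes + collection node + each slice + each shard. */
    static int currentScheme(int slices, int shardsPerSlice) {
        return 1 + 1 + slices + slices * shardsPerSlice;
    }

    /** Proposed scheme, assuming one ephemeral node per shard under live_nodes. */
    static int proposedScheme(int slices, int shardsPerSlice) {
        return 1 + slices * shardsPerSlice;
    }

    public static void main(String[] args) {
        System.out.println(currentScheme(4, 2));  // 2 + 4 + 8 = 14
        System.out.println(proposedScheme(4, 2)); // 1 + 8 = 9
    }
}
```

Under these assumptions the saving is the collection watcher plus one watcher per slice; the per-shard data watchers remain in both layouts.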

Shard/Node states
-

Key: SOLR-2765
URL: https://issues.apache.org/jira/browse/SOLR-2765
Project: Solr
Issue Type: Sub-task
Components: SolrCloud, update
Reporter: Yonik Seeley
Fix For: 4.0

Attachments: shard-roles.patch

Need state for shards that indicate they are recovering, active/enabled, or
disabled.


[jira] [Created] (SOLR-2815) Fields with a - in the name are interpreted as functions in the fl= parameter.

2011-10-06 Thread Eric Pugh (Created) (JIRA)

Fields with a - in the name are interpreted as functions in the fl= parameter.


 Key: SOLR-2815
 URL: https://issues.apache.org/jira/browse/SOLR-2815
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0
 Environment: Using latest from trunk
Reporter: Eric Pugh


If you query for a field that has a - character in the name, you get odd 
results. I took the example schema and added a field called in-stock to go 
along with the existing inStock field.

A query for http://localhost:8983/solr/select?q=*:*&fl=id,in-stock throws back 
an error saying the field "in" can't be found.

I can sort of work around it by quoting the field name as "in-stock":

http://localhost:8983/solr/select?q=*:*&fl=id,%22in-stock%22&rows=1

However, the output is still off:

<doc>
<str name="id">GB18030TEST</str>
<str name="in-stock">in-stock</str>
</doc>

Looking at it, I think the dash character causes the field name to be 
interpreted as an actual function!
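For reference, the quoted-field request from the report can be built with java.net.URLEncoder, which percent-encodes the colon, comma and double-quote characters. This only produces a well-formed request; it does not change how Solr parses fl (as shown above, the quoted form still comes back as a literal). FlQuery is a hypothetical helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the request from the report.
public class FlQuery {
    /** Percent-encodes one query-parameter value (form encoding, UTF-8). */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Assembles a select URL with each parameter value encoded separately. */
    static String selectUrl(String q, String fl, int rows) {
        return "http://localhost:8983/solr/select"
                + "?q=" + enc(q)
                + "&fl=" + enc(fl)
                + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // The quoted-field workaround: fl=id,"in-stock"
        System.out.println(selectUrl("*:*", "id,\"in-stock\"", 1));
        // prints http://localhost:8983/solr/select?q=*%3A*&fl=id%2C%22in-stock%22&rows=1
    }
}
```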


[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)

2011-10-06 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122312#comment-13122312
 ] 

Michael McCandless commented on LUCENE-3433:


Patch is awesome Simon; thank you.

Only thing I noticed: can you fix SortedSource.numOrds back to .getValueCount?

+1 to commit!

 Random access non RAM resident IndexDocValues (CSF)
 ---

 Key: LUCENE-3433
 URL: https://issues.apache.org/jira/browse/LUCENE-3433
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Yonik Seeley
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, 
 LUCENE-3433.patch, sorted_source.patch


 There should be a way to get specific IndexDocValues by going through the 
 Directory rather than loading all of the values into memory.
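As a rough illustration of random access without loading all values into memory, the sketch below stores one fixed-width 8-byte value per document and reads a single value by seeking to docID * 8. The file layout and the DirectLongValues class are hypothetical; they are not Lucene's IndexDocValues format or API, and a real implementation would presumably go through Lucene's Directory/IndexInput rather than java.io.RandomAccessFile.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: not Lucene's IndexDocValues file format.
public class DirectLongValues implements AutoCloseable {
    private final RandomAccessFile file;

    DirectLongValues(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    /** Each doc's value is an 8-byte long at offset docID * 8: one seek, one read. */
    long get(int docID) throws IOException {
        file.seek((long) docID * Long.BYTES);
        return file.readLong();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Writes values in the layout get() expects (doc 0 first). */
    static void write(Path path, long[] values) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(path.toFile(), "rw")) {
            for (long v : values) {
                out.writeLong(v);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docvalues", ".bin");
        write(tmp, new long[] {42L, -7L, 1000L});
        try (DirectLongValues values = new DirectLongValues(tmp)) {
            System.out.println(values.get(1)); // -7
        }
        Files.delete(tmp);
    }
}
```

Fixed-width records are what make the offset computable; variable-length values would additionally need an offsets index.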


[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager

2011-10-06 Thread Michael McCandless (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122314#comment-13122314
]

Michael McCandless commented on LUCENE-3488:

I still see a few javadoc warnings... but otherwise +1 to commit; what a great
simplification. It's nice that you can again pass either a Directory or Writer
to SearcherManager as your source for new readers...

Factor out SearcherManager from NRTManager
--

Key: LUCENE-3488
URL: https://issues.apache.org/jira/browse/LUCENE-3488
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 3.5, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 3.5, 4.0

Attachments: LUCENE-3488.patch, LUCENE-3488.patch, LUCENE-3488.patch

Currently we have NRTManager and SearcherManager, and NRTManager contains a
big piece of the code that is already in SearcherManager. Users are kind of
forced to use NRTManager if they want to have SearcherManager goodness with
NRT. The integration into NRTManager also forces you to maintain two
instances even if you know you always want deletes. To me, NRTManager tries to
do more than necessary and mixes lots of responsibilities, i.e. handling
searchers and handling indexing generations. NRTManager should use a
SearcherManager by aggregation rather than duplicate a lot of logic.
SearcherManager should have an NRT-based and a Directory-based implementation
that users can simply choose from.
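The aggregation idea can be sketched with hypothetical stand-in classes: a searcher manager that only holds and refreshes the current searcher from a pluggable source (Directory-based or writer-based NRT), and an NRT manager that only tracks indexing generations and delegates searcher handling to it. None of these names or signatures are Lucene's; reference counting and thread-safe refresh are deliberately left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and signatures are illustrative, not Lucene's API.
public class ManagerSketch {
    /** Stand-in for a searcher opened at some point-in-time index version. */
    static final class Searcher {
        final long version;
        Searcher(long version) { this.version = version; }
    }

    /** Pluggable source of fresh searchers: Directory-based or NRT (writer-based). */
    interface SearcherFactory {
        Searcher open();
    }

    /** Only responsibility: hold the current searcher and refresh it. */
    static final class SearcherManagerSketch {
        private final SearcherFactory factory;
        private volatile Searcher current;

        SearcherManagerSketch(SearcherFactory factory) {
            this.factory = factory;
            this.current = factory.open();
        }

        Searcher acquire() { return current; }

        void maybeRefresh() { current = factory.open(); }
    }

    /** Only responsibility: indexing generations; searchers are delegated. */
    static final class NrtManagerSketch {
        private final AtomicLong indexingGen;
        private final SearcherManagerSketch searchers;

        NrtManagerSketch(AtomicLong indexingGen, SearcherManagerSketch searchers) {
            this.indexingGen = indexingGen;
            this.searchers = searchers;
        }

        /** Records an indexing operation and returns its generation. */
        long addDocument() { return indexingGen.incrementAndGet(); }

        /** Returns a searcher that reflects at least the given generation. */
        Searcher searcherFor(long gen) {
            if (searchers.acquire().version < gen) {
                searchers.maybeRefresh();
            }
            return searchers.acquire();
        }
    }

    public static void main(String[] args) {
        AtomicLong writerGen = new AtomicLong();
        // NRT-style source: sees everything indexed so far via the shared counter.
        SearcherManagerSketch mgr = new SearcherManagerSketch(() -> new Searcher(writerGen.get()));
        NrtManagerSketch nrt = new NrtManagerSketch(writerGen, mgr);
        long gen = nrt.addDocument();
        System.out.println(nrt.searcherFor(gen).version); // 1
    }
}
```

Swapping the lambda for one that opens from a Directory would give the non-NRT variant without touching either manager.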


[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it


 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1536:
--

Attachment: LUCENE-1536-rewrite.patch

New patch (still only Lucene Core, no contrib/modules/solr modified):
- Nuked Filter handling completely from IndexSearcher. The algorithms and 
random-access optimizations were added to FilteredQuery. IS.search(Query, Filter,...) 
now only wraps the query with the Filter if filter!=null (small helper method).
- The random-access threshold is still in 
IndexSearcher.setFilterRandomAccessThreshold(); FilteredQuery gets it in its 
Weight from IndexSearcher. This is maybe not the best solution; we could also 
add a setter to FilteredQuery and have IS pass it to FilteredQuery.
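A minimal sketch of the two pieces described in the patch notes, using hypothetical stand-in types rather than Lucene's classes: the helper that wraps a query only when a filter is present, and a threshold check. The threshold semantics are an assumption: random access is chosen when the filter exposes bits and its first matching doc falls below the threshold, taken as a hint that the filter is dense.

```java
// Hypothetical stand-ins: these are NOT Lucene's Query/Filter/FilteredQuery classes.
public class WrapSketch {
    interface Query {}

    interface Filter {}

    /** A query restricted to the documents accepted by a filter. */
    static final class FilteredQuery implements Query {
        final Query query;
        final Filter filter;

        FilteredQuery(Query query, Filter filter) {
            this.query = query;
            this.filter = filter;
        }
    }

    /** Wrap only when a filter is given; otherwise search the bare query. */
    static Query wrapFilter(Query query, Filter filter) {
        return filter == null ? query : new FilteredQuery(query, filter);
    }

    /**
     * Assumed heuristic: use random access only if the filter exposes bits and
     * its first matching doc is below the threshold (a sign of a dense filter).
     */
    static boolean useRandomAccess(boolean filterHasBits, int firstFilterDoc, int threshold) {
        return filterHasBits && firstFilterDoc < threshold;
    }

    public static void main(String[] args) {
        Query q = new Query() {};
        System.out.println(wrapFilter(q, null) == q);                                 // true
        System.out.println(wrapFilter(q, new Filter() {}) instanceof FilteredQuery);  // true
        System.out.println(useRandomAccess(true, 3, 100));                            // true
    }
}
```

Keeping the threshold a plain parameter here leaves open the question raised above of whether it should live on IndexSearcher or on FilteredQuery.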

What do you think? Mike: Can you do perf tests?

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
 LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, e.g. 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2
 AND 3 AND 4.  "u s" means "united states" (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (i.e. in IndexSearcher's search loop).
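To make the two ways of applying a filter concrete, here is a minimal sketch on plain arrays, with java.util.BitSet standing in for the filter. "Random access" tests each query hit against the filter bits; "iterator" moves a filter cursor forward alongside the query hits. Both return the same documents; what the benchmark compares is their cost. This is an illustration only, not Lucene's IndexSearcher or SegmentTermDocs code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration on plain arrays/BitSet; not Lucene's scorer code.
public class FilterApplication {
    /** Random access: walk the query's (sorted) hits and test each doc's filter bit. */
    static List<Integer> randomAccess(int[] queryDocs, BitSet filterBits) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryDocs) {
            if (filterBits.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    /** Iterator: keep a filter cursor and advance it to each query hit. */
    static List<Integer> iterator(int[] queryDocs, BitSet filterBits) {
        List<Integer> out = new ArrayList<>();
        int filterDoc = filterBits.nextSetBit(0);
        for (int doc : queryDocs) {
            if (filterDoc < 0) {
                break; // filter exhausted: no further matches possible
            }
            if (filterDoc < doc) {
                filterDoc = filterBits.nextSetBit(doc); // like advance(doc)
            }
            if (filterDoc == doc) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] queryDocs = {1, 3, 5, 7, 9};
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(4);
        filter.set(9);
        System.out.println(randomAccess(queryDocs, filter)); // [3, 9]
        System.out.println(iterator(queryDocs, filter));     // [3, 9]
    }
}
```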


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-06 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122355#comment-13122355
 ] 

Michael McCandless commented on LUCENE-1536:


I will do perf tests!  Working on getting luceneutil to do random filters... 
but could be a few days (I'm offline for the next 3 days) unless I can commit 
to luceneutil and someone else can run the tests...


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it


[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122356#comment-13122356
 ] 

Uwe Schindler commented on LUCENE-1536:
---

I will add further tests tomorrow to test all code paths in FilteredQuery. 
There is a short-circuit: it implements Scorer.score(Collector) for a fast 
top-scorer, as it existed in IndexSearcher.searchWithFilter before. To test the 
standard scorer behavior (nextDoc/advance), a test should be added that adds 
FilteredQuery as a clause together with others to a BQ, so that ConjunctionScorer 
exercises nextDoc/advance.

Somebody else might look at the scorer and double-check. I had to rewrite 
FilteredQuery#Weight#Scorer, as the filterIter is already advanced to the first doc 
(to check the random access threshold).
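The nextDoc/advance behavior that such a BQ test exercises is, at its core, a leapfrog intersection: whichever side is behind calls advance() to the other side's current doc. Below is a minimal sketch over sorted int arrays. DocIter and NO_MORE_DOCS only mimic the shape of Lucene's DocIdSetIterator and are not its implementation. As in the comment, the filter may already be positioned on its first doc before the loop starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of leapfrog intersection over sorted doc-id lists.
public class Leapfrog {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal doc-id iterator over a sorted int[]; mimics nextDoc()/advance(). */
    static final class DocIter {
        private final int[] docs;
        private int pos = -1;

        DocIter(int[] docs) { this.docs = docs; }

        /** -1 before the first call, NO_MORE_DOCS once exhausted. */
        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        /** Moves to the first doc >= target; target must exceed the current doc. */
        int advance(int target) {
            do {
                pos++;
            } while (pos < docs.length && docs[pos] < target);
            return docID();
        }
    }

    /** Returns docs present in both iterators, letting the lagging side advance. */
    static List<Integer> intersect(DocIter query, DocIter filter) {
        List<Integer> out = new ArrayList<>();
        int q = query.nextDoc();
        // The filter may already sit on its first doc (e.g. after a threshold
        // check); if not, docID() is -1 and the first advance() positions it.
        int f = filter.docID();
        while (q != NO_MORE_DOCS && f != NO_MORE_DOCS) {
            if (q == f) {
                out.add(q);
                q = query.nextDoc();
            } else if (q < f) {
                q = query.advance(f);
            } else {
                f = filter.advance(q);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DocIter query = new DocIter(new int[] {1, 3, 5, 7, 9, 11});
        DocIter filter = new DocIter(new int[] {3, 4, 9, 12});
        filter.nextDoc(); // pre-positioned, as when checking the threshold
        System.out.println(intersect(query, filter)); // [3, 9]
    }
}
```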


[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it


 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1536:
--

Attachment: LUCENE-1536-rewrite.patch

New patch:
- Fixed the FilteredQuery scorer's advance() by changing its logic. It's now much 
easier to understand. The corresponding tests are in TestFilteredQuery: all tests 
are executed twice, once with a random-access filter and once with an iterator 
filter. FilteredQuery is also added to a BQ, so the conventional scorer 
(nextDoc/advance) is tested.

The tests for CachingWrapper* are still disabled; I have to rewrite them 
tomorrow. Then we can change contrib and Solr.

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
 LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
 LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, e.g. 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2
 AND 3 AND 4.  "u s" means "united states" (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (i.e. in IndexSearcher's search loop).


[jira] [Updated] (LUCENE-3492) Extract a generic framework for running randomized tests.

2011-10-06 Thread Dawid Weiss (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3492:


Description: 
I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene and 
Solr folks) has their own glue to make it possible. The question is whether 
there's something to pull out that others could share without needing to import 
Lucene-specific classes.

The work on this issue is on my GitHub account (lots of experiments):
https://github.com/dweiss/randomizedtesting

Or directly: git clone git://github.com/dweiss/randomizedtesting.git

  was:
I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene and 
Solr folks) has their own glue to make it possible. The question is whether 
there's something to pull out that others could share without needing to import 
Lucene-specific classes.



 Extract a generic framework for running randomized tests.
 -

 Key: LUCENE-3492
 URL: https://issues.apache.org/jira/browse/LUCENE-3492
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.0

 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png


 I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene 
 and Solr folks) has their own glue to make it possible. The question is 
 whether there's something to pull out that others could share without needing 
 to import Lucene-specific classes.
 The work on this issue is on my GitHub account (lots of experiments):
 https://github.com/dweiss/randomizedtesting
 Or directly: git clone git://github.com/dweiss/randomizedtesting.git


[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.

[
https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122401#comment-13122401
]

Dawid Weiss commented on LUCENE-3492:
-

OK. I've published the project on GitHub here:
https://github.com/dweiss/randomizedtesting

The repo contains the runner, some tests and examples. There are lots of TODOs
(in TODO), so consider this a work in progress, but if anybody cares to take a
look and shout if something is definitely not right, go ahead.

mvn verify on the topmost project compiles everything and runs the tests/
examples. I don't see any functional deviations or differences in execution
between Ant, Maven and my Eclipse GUI (mentioned by Robert), which is good.
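The core mechanism such a framework generalizes is a master seed: all random input derives from one seed, and reporting that seed lets anyone replay a failing run exactly. A minimal plain-Java sketch follows. It does not use, or mimic the API of, the randomizedtesting project; the class, the method names and the hex seed convention are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of seed-driven randomized testing in plain Java;
// it does not use (or mimic the API of) the randomizedtesting project.
public class SeededRandomTest {
    /** Generates the test input; the seed alone determines the result. */
    static int[] randomInput(long seed) {
        Random random = new Random(seed);
        int[] data = new int[random.nextInt(100)];
        for (int i = 0; i < data.length; i++) {
            data[i] = random.nextInt();
        }
        return data;
    }

    /** The property under test: Arrays.sort leaves the array in ascending order. */
    static boolean sortsCorrectly(int[] input) {
        int[] sorted = input.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i - 1] > sorted[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Reuse a seed passed as hex on the command line, or pick a fresh one.
        long seed = args.length > 0 ? Long.parseUnsignedLong(args[0], 16) : new Random().nextLong();
        boolean ok = sortsCorrectly(randomInput(seed));
        // Reporting the seed is what lets anyone replay a failing run exactly.
        System.out.println((ok ? "PASS" : "FAIL") + " seed=" + Long.toHexString(seed));
    }
}
```

Because the seed is printed as unsigned hex and parsed back the same way, copying it from a failure report reproduces the identical input.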

Extract a generic framework for running randomized tests.
-

Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png

I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene
and Solr folks) has their own glue to make it possible. The question is
whether there's something to pull out that others could share without needing
to import Lucene-specific classes.
The work on this issue is on my GitHub account (lots of experiments):
https://github.com/dweiss/randomizedtesting
Or directly: git clone git://github.com/dweiss/randomizedtesting.git


[jira] [Commented] (SOLR-2765) Shard/Node states