RE: svn commit: r950853 - /lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java

2010-06-02 Thread Uwe Schindler
...and all MTQs in Lucene have defaulted to constant score since 2.9

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: hoss...@apache.org [mailto:hoss...@apache.org]
> Sent: Thursday, June 03, 2010 3:47 AM
> To: comm...@lucene.apache.org
> Subject: svn commit: r950853 -
> /lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java
> 
> Author: hossman
> Date: Thu Jun  3 01:46:44 2010
> New Revision: 950853
> 
> URL: http://svn.apache.org/viewvc?rev=950853&view=rev
> Log:
> yonik removed ConstantScorePrefixQuery in r950784, but forgot to remove
> this usage of it (it was never migrated to lucene because it's trivial to
> build with a PrefixFilter)
> 
> Modified:
> lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java
> 
> Modified:
> lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java?rev=950853&r1=950852&r2=950853&view=diff
> ==
> 
> --- lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java (original)
> +++ lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java Thu Jun  3 01:46:44 2010
> @@ -539,12 +539,6 @@ public class QueryParsing {
>        FieldType ft = writeFieldName(prefix.field(), schema, out, flags);
>        out.append(prefix.text());
>        out.append('*');
> -    } else if (query instanceof ConstantScorePrefixQuery) {
> -      ConstantScorePrefixQuery q = (ConstantScorePrefixQuery) query;
> -      Term prefix = q.getPrefix();
> -      FieldType ft = writeFieldName(prefix.field(), schema, out, flags);
> -      out.append(prefix.text());
> -      out.append('*');
>      } else if (query instanceof WildcardQuery) {
>        out.append(query.toString());
>        writeBoost = false;
> 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-1870) Binary Update Request (javabin) fails when the field type of a multivalued SolrInputDocument field is a Set (or any type that is identified as an instance of i

2010-06-02 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874957#action_12874957
 ] 

Noble Paul edited comment on SOLR-1870 at 6/3/10 1:19 AM:
--

Hoss, the fix is good. Treating all collections as type ARR instead of just 
collection. 
Moreover, we do not need to support Iterable.
+1

  was (Author: noble.paul):
Hoss, the fix is good. Treating all collections as type ARR instead of 
just collection. 
+1
  
> Binary Update Request (javabin) fails when the field type of a multivalued 
> SolrInputDocument field is a Set (or any type that is identified as an 
> instance of iterable) 
> 
>
> Key: SOLR-1870
> URL: https://issues.apache.org/jira/browse/SOLR-1870
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 1.4
>Reporter: Prasanna Ranganathan
> Fix For: 1.4.1, 3.1, 4.0
>
> Attachments: SOLR-1870-test.patch, SOLR-1870-test.patch, 
> SOLR-1870.patch, SOLR-1870.patch, SOLR-1870.patch
>
>
> When the field type of a field in a SolrInputDocument is a Collection based 
> on the Set interface, the JavaBinUpdate request fails. It works when sending 
> the document data over XML.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (SOLR-1870) Binary Update Request (javabin) fails when the field type of a multivalued SolrInputDocument field is a Set (or any type that is identified as an instance of iterable)

2010-06-02 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874957#action_12874957
 ] 

Noble Paul commented on SOLR-1870:
--

Hoss, the fix is good. Treating all collections as type ARR instead of just 
collection. 
+1

> Binary Update Request (javabin) fails when the field type of a multivalued 
> SolrInputDocument field is a Set (or any type that is identified as an 
> instance of iterable) 
> 
>
> Key: SOLR-1870
> URL: https://issues.apache.org/jira/browse/SOLR-1870
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 1.4
>Reporter: Prasanna Ranganathan
> Fix For: 1.4.1, 3.1, 4.0
>
> Attachments: SOLR-1870-test.patch, SOLR-1870-test.patch, 
> SOLR-1870.patch, SOLR-1870.patch, SOLR-1870.patch
>
>
> When the field type of a field in a SolrInputDocument is a Collection based 
> on the Set interface, the JavaBinUpdate request fails. It works when sending 
> the document data over XML.




[jira] Updated: (SOLR-1938) make ElisionFilterFactory user-friendly

2010-06-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1938:
--

Attachment: SOLR-1938.patch

> make ElisionFilterFactory user-friendly
> ---
>
> Key: SOLR-1938
> URL: https://issues.apache.org/jira/browse/SOLR-1938
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1938.patch
>
>
> The ElisionFilterFactory is useful for removing French articles from words 
> (e.g. l'avion -> avion, see its tests).
> But the factory itself isn't very friendly: you need to provide an actual text 
> file listing these, which is sorta overkill.
> Such a text file would look like:
> {noformat}
> # below are my articles
> l
> m
> t
> ...
> {noformat}
> Instead of throwing a RuntimeException if you don't provide the 
> articles param, I propose to just use the default set
> already in ElisionFilter: (l, m, t, qu, n, s, j)
> It won't break backwards compatibility for anyone: if they weren't providing 
> it, they were getting a RuntimeException before.




[jira] Created: (SOLR-1938) make ElisionFilterFactory user-friendly

2010-06-02 Thread Robert Muir (JIRA)
make ElisionFilterFactory user-friendly
---

 Key: SOLR-1938
 URL: https://issues.apache.org/jira/browse/SOLR-1938
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0


The ElisionFilterFactory is useful for removing French articles from words 
(e.g. l'avion -> avion, see its tests).

But the factory itself isn't very friendly: you need to provide an actual text 
file listing these, which is sorta overkill.
Such a text file would look like:
{noformat}
# below are my articles
l
m
t
...
{noformat}

Instead of throwing a RuntimeException if you don't provide the 
articles param, I propose to just use the default set
already in ElisionFilter: (l, m, t, qu, n, s, j)

It won't break backwards compatibility for anyone: if they weren't providing 
it, they were getting a RuntimeException before.
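
The proposed fallback could look roughly like the sketch below. It is plain Java and does not use the real ElisionFilterFactory or ElisionFilter API: ElisionSketch, DEFAULT_ARTICLES and strip are hypothetical names, and only the article list (l, m, t, qu, n, s, j) comes from the description above. Passing null stands in for a missing articles param.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ElisionSketch {
    // Default articles as listed in the issue description.
    static final Set<String> DEFAULT_ARTICLES = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("l", "m", "t", "qu", "n", "s", "j")));

    // Strips a leading elided article such as "l'" from a single word.
    // A null article set falls back to the defaults instead of throwing.
    static String strip(String word, Set<String> articles) {
        Set<String> effective = (articles == null) ? DEFAULT_ARTICLES : articles;
        int apostrophe = word.indexOf('\'');
        if (apostrophe > 0) {
            String prefix = word.substring(0, apostrophe).toLowerCase(Locale.ROOT);
            if (effective.contains(prefix)) {
                return word.substring(apostrophe + 1);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(strip("l'avion", null));     // avion
        System.out.println(strip("aujourd'hui", null)); // aujourd'hui
    }
}
```

Anyone who already supplies an articles file keeps their own set; only the case that used to throw changes behavior.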





Re: svn commit: r950853 - /lucene/dev/trunk/solr/src/java/org/apache/solr/search/QueryParsing.java

2010-06-02 Thread Yonik Seeley
On Wed, Jun 2, 2010 at 9:46 PM,   wrote:
> URL: http://svn.apache.org/viewvc?rev=950853&view=rev
> Log:
> yonik removed ConstantScorePrefixQuery in r950784, but forgot to remove this 
> usage of it (it was never migrated to lucene because it's trivial to build 
> with a PrefixFilter)

Ah, thanks... I was about to blame my IDE, but then I realized I had
made that change but accidentally just didn't commit it (I had more
than one outstanding change, so I cut'n'pasted the file names to
commit).

-Yonik
http://www.lucidimagination.com




Hudson build is back to normal : Lucene-3.x #32

2010-06-02 Thread Apache Hudson Server
See 






[jira] Updated: (SOLR-1870) Binary Update Request (javabin) fails when the field type of a multivalued SolrInputDocument field is a Set (or any type that is identified as an instance of iterable)

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1870:
---

Fix Version/s: 1.4.1
   3.1
   4.0

we should try to get this into 1.4.1 if we can get consensus on the fix.

> Binary Update Request (javabin) fails when the field type of a multivalued 
> SolrInputDocument field is a Set (or any type that is identified as an 
> instance of iterable) 
> 
>
> Key: SOLR-1870
> URL: https://issues.apache.org/jira/browse/SOLR-1870
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 1.4
>Reporter: Prasanna Ranganathan
> Fix For: 1.4.1, 3.1, 4.0
>
> Attachments: SOLR-1870-test.patch, SOLR-1870-test.patch, 
> SOLR-1870.patch, SOLR-1870.patch, SOLR-1870.patch
>
>
> When the field type of a field in a SolrInputDocument is a Collection based 
> on the Set interface, the JavaBinUpdate request fails. It works when sending 
> the document data over XML.




[jira] Updated: (SOLR-1870) Binary Update Request (javabin) fails when the field type of a multivalued SolrInputDocument field is a Set (or any type that is identified as an instance of iterable)

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-1870:
---

Attachment: SOLR-1870-test.patch
SOLR-1870.patch

Noble: the newly updated SOLR-1870-test.patch demonstrates the concern i have 
for your fix: if the JavaBinCodec has support for Iterator and Iterable, but 
the JavaBinUpdateRequestCodec makes assumptions about Iterators only being used 
for streaming docs, then if people add Field values containing Custom objects 
that implement Iterable but are not actually Collection then the 
JavaBinUpdateRequestCodec will marshal them correctly, but it will have a 
ClassCastException when unmarshaling them -- ditto for people who want to add 
lazy Iterators as field values.

I don't disagree that making JavaBinCodec support Collection is a good idea in 
general, but it doesn't fix the root problem -- i think we need both changes.

The latest SOLR-1870.patch incorporates both my suggested fix for 
JavaBinUpdateRequestCodec, as well as your change to JavaBinCodec (with my 
suggested tweak of replacing List with Collection in the if tree), and all of 
the tests i've previously posted (ie: SOLR-1870-test.patch is for illustrative 
purposes only, it's not needed)

what do you think?
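
One way to picture the "if tree" under discussion is the sketch below. It is not the JavaBinCodec source: Kind and classify are made-up names, and the only thing taken from this thread is the ordering, with Collection checked first so that a Set or a List is written as a single array and only non-Collection Iterables and Iterators fall through to the streaming case.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;

class FieldValueKindSketch {
    // Hypothetical tags; the real codec uses its own type markers.
    enum Kind { ARRAY, ITERATOR, SINGLE }

    static Kind classify(Object value) {
        if (value instanceof Collection) {
            return Kind.ARRAY;      // List, Set, Queue, ... all handled alike
        }
        if (value instanceof Iterable || value instanceof Iterator) {
            return Kind.ITERATOR;   // custom or lazy values with no known size
        }
        return Kind.SINGLE;
    }

    public static void main(String[] args) {
        System.out.println(classify(new HashSet<String>(Arrays.asList("a", "b")))); // ARRAY
        Iterable<String> lazy = new Iterable<String>() {
            public Iterator<String> iterator() {
                return Arrays.asList("a", "b").iterator();
            }
        };
        System.out.println(classify(lazy)); // ITERATOR
    }
}
```

Whatever lands in the ITERATOR bucket is exactly what the unmarshaling side must not assume to be a stream of documents.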

> Binary Update Request (javabin) fails when the field type of a multivalued 
> SolrInputDocument field is a Set (or any type that is identified as an 
> instance of iterable) 
> 
>
> Key: SOLR-1870
> URL: https://issues.apache.org/jira/browse/SOLR-1870
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 1.4
>Reporter: Prasanna Ranganathan
> Fix For: 1.4.1, 3.1, 4.0
>
> Attachments: SOLR-1870-test.patch, SOLR-1870-test.patch, 
> SOLR-1870.patch, SOLR-1870.patch, SOLR-1870.patch
>
>
> When the field type of a field in a SolrInputDocument is a Collection based 
> on the Set interface, the JavaBinUpdate request fails. It works when sending 
> the document data over XML.




[jira] Commented: (SOLR-1750) SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp

2010-06-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874910#action_12874910
 ] 

Hoss Man commented on SOLR-1750:


bq. Please add an option that just lists the catalog of MBeans. 

It's already there -- if stats=false it just returns the list of SolrInfoMBeans 
from the registry (like registry.jsp)

what do you think of the proposed name change & path: SolrInfoMBeanHandler & 
/admin/mbeans ?

> SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp
> -
>
> Key: SOLR-1750
> URL: https://issues.apache.org/jira/browse/SOLR-1750
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Trivial
> Fix For: 1.5, 3.1, 4.0
>
> Attachments: SystemStatsRequestHandler.java, 
> SystemStatsRequestHandler.java, SystemStatsRequestHandler.java
>
>
> stats.jsp is cool and all, but suffers from escaping issues, and also is not 
> accessible from SolrJ or other standard Solr APIs.
> Here's a request handler that emits everything stats.jsp does.
> For now, it needs to be registered in solrconfig.xml like this:
> {code}
>  class="solr.SystemStatsRequestHandler" />
> {code}
> But will register this in AdminHandlers automatically before committing.




[jira] Commented: (SOLR-1750) SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp

2010-06-02 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874904#action_12874904
 ] 

Lance Norskog commented on SOLR-1750:
-

Please add an option that just lists the catalog of MBeans.

> SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp
> -
>
> Key: SOLR-1750
> URL: https://issues.apache.org/jira/browse/SOLR-1750
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Trivial
> Fix For: 1.5, 3.1, 4.0
>
> Attachments: SystemStatsRequestHandler.java, 
> SystemStatsRequestHandler.java, SystemStatsRequestHandler.java
>
>
> stats.jsp is cool and all, but suffers from escaping issues, and also is not 
> accessible from SolrJ or other standard Solr APIs.
> Here's a request handler that emits everything stats.jsp does.
> For now, it needs to be registered in solrconfig.xml like this:
> {code}
>  class="solr.SystemStatsRequestHandler" />
> {code}
> But will register this in AdminHandlers automatically before committing.




[jira] Resolved: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1602.


Fix Version/s: 3.1
   4.0
   (was: Next)
   Resolution: Fixed

Thanks Chris, that looks good.

Trunk...
Committed revision 950835.

branch 3x...
Committed revision 950838.


> Refactor SOLR package structure to include o.a.solr.response and move 
> QueryResponseWriters in there
> ---
>
> Key: SOLR-1602
> URL: https://issues.apache.org/jira/browse/SOLR-1602
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 1.2, 1.3, 1.4
> Environment: independent of environment (code structure)
>Reporter: Chris A. Mattmann
>Assignee: Ryan McKinley
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
> SOLR-1602.Mattmann.112509_02.patch.txt, 
> SOLR-1602.Mattmann.final.050810.patch.txt, 
> SOLR-1602.Mattmann.wrapup.031010.patch.txt, upgrade_solr_config
>
>
> Currently all o.a.solr.request.QueryResponseWriter implementations are 
> curiously located in the o.a.solr.request package. Not only is this package 
> getting big (30+ classes), a lot of them are misplaced. There should be a 
> first-class o.a.solr.response package, and the response related classes 
> should be given a home there. Patch forthcoming.




[jira] Resolved: (SOLR-1935) BaseResponseWriter neglects to add SolrDocument in DocList isStreamingDocs=false

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1935.


 Assignee: Hoss Man
Fix Version/s: 3.1
   4.0
   Resolution: Fixed

Trunk...
Committed revision 950830.

branch_3x...
Committed revision 950831.

thanks Chris.

> BaseResponseWriter neglects to add SolrDocument in DocList 
> isStreamingDocs=false
> 
>
> Key: SOLR-1935
> URL: https://issues.apache.org/jira/browse/SOLR-1935
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 1.5
> Environment: working on SOLR-1925, i noticed this
>Reporter: Chris A. Mattmann
>Assignee: Hoss Man
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1935.Mattmann.053010.patch.txt
>
>
> There is a bug near line 126/127 in BaseResponseWriter.java in the 
> isStreamingDocs() == false section for the DocList case. The SolrDocuments 
> aren't being added back to the list object for return. I noticed this while I 
> was working on SOLR-1925. Simple patch to fix, attached.




[jira] Commented: (LUCENE-2355) Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing

2010-06-02 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874857#action_12874857
 ] 

Earwin Burrfoot commented on LUCENE-2355:
-

* NRT Reader shared live SegmentInfos with papa-IW -> getVersion, isOptimized, 
getCommitUserData, getIndexCommit were all broken.

> Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing
> -
>
> Key: LUCENE-2355
> URL: https://issues.apache.org/jira/browse/LUCENE-2355
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Earwin Burrfoot
>
> *Reader lifecycle evolved over time to become some heavily tangled mess. It's 
> hard to understand what's going on there, it's even harder to add some 
> fields/logic while ensuring that all possible code paths preserve these 
> fields/interact with the logic properly. While some of said mess is justified 
> by the task at hand, a big part is just badly done copypaste and can be 
> removed.
> I am currently refactoring this and intended to open an issue with a working 
> patch, but the task wound up somewhat bigger than I expected, so I'm opening 
> it earlier to track stuff encountered/changed/fixed.
> The list is by no means exhaustive.
> - an iteration to create SRs is copypasted several times, one of them (IW) 
> with wrong iteration bound
> - it is also overly complex and can be folded for create/reopen cases
> - readers sent to IndexReaderWarmer are termindexless/docstoreless on some 
> occasions
> - it is possible to clone() your way to readwrite NRT reader
> - IndexDeletionPolicy is not always preserved through clones/reopens
> - cloned readers share CoreReaders and, consequently, updated 
> termsIndex/docStores
> - threadlocal versions of fieldsReader/termsVector are bound to SR, not 
> CoreReaders and thus are recreated on clone/reopen
> - double-initialization for some fields (someone got lost and did this to be 
> sure I guess), stupid assert checks ( qwe = new(); assert qwe != null )
> - SR is not always recreated when compound status of underlying segment 
> changes
> - deleting already deleted doc marks deletions dirty and rewrites them
> - lots of synchronization is done around Reader, while it can be narrowed 
> down to norms/deletions/whatever
> I did some structural modifications:
> - CompositeReader extracts common code from DirectoryReader and MultiReader 
> (complete)
> - ReadonlyDirectoryReader and ReadonlySegmentReader are dead, 
> MutableD/SReaders are introduced and carry all modification logic/fields (DR 
> complete, SR in progress)
> - WriterBackedReader encapsulates NRT reader logic (complete)
> - CoreReaders split into CoreReaders, DocStores, TermInfos. All of these are 
> immutable and SR is cloned when you need to change its mode (in progress)




[jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

2010-06-02 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874841#action_12874841
 ] 

Trejkaz commented on LUCENE-2348:
-

That change broke nearly all our own filters.  We have a lot of filters which 
get their data from a database where the IDs are across the top-level reader's 
doc IDs.  The DuplicateFilter in contrib was noticed because I was reading 
about how the Filter API had changed, but when I went to find an example of a 
filter which (in theory :)) would have worked the same way so that I could 
borrow its solution, I found it was also making the same assumptions we were.

Our workaround was the same as described, passing the top-level reader into the 
constructor and then computing the doc ID set for that, and splitting it up and 
doing the maths to create the sub-sets for each segment reader.

The downside is that now you can only use this Filter instance with this 
reader, whereas the original DuplicateFilter would have worked on multiple 
top-level readers happily.

Having the top reader passed in before each sub-reader sounds like a good idea. 
 It might make it possible for the same filter instance to support multiple 
top-level readers as well.
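
The splitting step of such a workaround can be sketched as follows. This uses java.util.BitSet rather than Lucene's DocIdSet classes, and forSegment, docBase and maxDoc are illustrative names: docBase is assumed to be the segment's starting offset within the top-level reader and maxDoc its document count.

```java
import java.util.BitSet;

class SegmentSliceSketch {
    // Copies the bits in [docBase, docBase + maxDoc) of a top-level doc ID
    // set into a new set that uses segment-local doc IDs starting at 0.
    static BitSet forSegment(BitSet topLevel, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        int end = docBase + maxDoc;
        for (int doc = topLevel.nextSetBit(docBase);
             doc >= 0 && doc < end;
             doc = topLevel.nextSetBit(doc + 1)) {
            local.set(doc - docBase);
        }
        return local;
    }

    public static void main(String[] args) {
        BitSet topLevel = new BitSet();
        topLevel.set(1);
        topLevel.set(5);
        topLevel.set(7);
        // Two segments of four documents each: top-level IDs 0-3 and 4-7.
        System.out.println(forSegment(topLevel, 0, 4)); // {1}
        System.out.println(forSegment(topLevel, 4, 4)); // {1, 3}
    }
}
```

The top-level set is tied to one particular reader, which is why a filter built this way cannot be reused across top-level readers.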


> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
> readers
> -
>
> Key: LUCENE-2348
> URL: https://issues.apache.org/jira/browse/LUCENE-2348
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9.2
>Reporter: Trejkaz
>
> DuplicateFilter currently works by building a single doc ID set, without 
> taking into account that getDocIdSet() will be called once per segment and 
> only with each segment's local reader.




[jira] Commented: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874776#action_12874776
 ] 

Hoss Man commented on SOLR-1405:


Juan: no apologies necessary, you made no mistake -- i was just sharing my 
opinions on the subject.

I was just looking at SOLR-1750 with the intent of finishing it up and merging 
your patch into that handler, when i realized it may not be the best place 
either (there is already *another* SystemInfoHandler so i think 
SystemInfoRequestHandler should narrow its focus to just be about returning 
the lists of SolrInfoMBeans w/stats)

If you want to tweak the patch to update either the LukeRequestHandler or the 
SystemInfoHandler feel free -- i'm not honestly sure where it makes the most 
sense at this point.


> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




[jira] Commented: (SOLR-1750) SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp

2010-06-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874768#action_12874768
 ] 

Hoss Man commented on SOLR-1750:


1) i screwed up, and should have put this in o.a.s.handler.admin instead of 
o.a.s.handler

2) Somehow i completely overlooked the fact that there is already a 
o.a.s.handler.admin.SystemInfoHandler (Erik even mentioned it above) which is 
registered to the path /admin/system and returns basic info on the current 
machine, current JVM, the versions of Lucene/Solr, and some basic info about 
the SolrCore.

With that in mind i propose we rename this new one to "SolrInfoMBeanHandler" 
since that's the crux of what it provides (data about all of the 
SolrInfoMBeans) and have AdminHandler register it with the path /admin/mbeans.  
We could/should probably also remove some of the code that overlaps between 
this handler and SystemInfoHandler.

comments?



> SystemInfoRequestHandler - replacement for stats.jsp and registry.jsp
> -
>
> Key: SOLR-1750
> URL: https://issues.apache.org/jira/browse/SOLR-1750
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Trivial
> Fix For: 1.5, 3.1, 4.0
>
> Attachments: SystemStatsRequestHandler.java, 
> SystemStatsRequestHandler.java, SystemStatsRequestHandler.java
>
>
> stats.jsp is cool and all, but suffers from escaping issues, and also is not 
> accessible from SolrJ or other standard Solr APIs.
> Here's a request handler that emits everything stats.jsp does.
> For now, it needs to be registered in solrconfig.xml like this:
> {code}
>  class="solr.SystemStatsRequestHandler" />
> {code}
> But will register this in AdminHandlers automatically before committing.




[jira] Issue Comment Edited: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Juan Pedro Danculovic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874762#action_12874762
 ] 

Juan Pedro Danculovic edited comment on SOLR-1405 at 6/2/10 3:56 PM:
-

I didn't know about the idea of no longer adding code to JSP pages.
I can change the patch and add the index file information to the 
SystemInfoRequestHandler. Is this OK?
Sorry about my mistake.


  was (Author: jdancu):
I didn't know anything about the idea of no longer adding code to JSP.
I can change the patch and add the index file information to the 
SystemInfoRequestHandler. Is this correct?
Sorry for my mistake.
  
> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




[jira] Commented: (SOLR-1630) StringIndexOutOfBoundsException in SpellCheckComponent

2010-06-02 Thread Khaled Hammouda (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874765#action_12874765
 ] 

Khaled Hammouda commented on SOLR-1630:
---

We just hit this bug as well. To reproduce, you must index a document that 
contains a hyphen (or underscore) and then search with a misspelled version of 
the indexed text; e.g.

document contains: mid-term
query: mis-term
result: exception thrown

I looked at the code where this is happening and it seems to be related to 
token offsets (of the tokenized query) in conjunction with a feature of the 
spellcheck component called collation. Basically, collation tries to replace 
the misspelled words in the original query with the top suggested words. It 
relies on the tokenizer to remove the original misspelled words and insert the 
suggested ones (using StringBuilder.replace). Unfortunately the token offsets 
look weird for words with hyphens (or underscores); for example:

query: abc_def
1st token: value = abc; startOffset = 0; endOffset = 7
2nd token: value = def; startOffset = 0; endOffset = 7

Because the two tokens occupy the same range (0-7) this messes up the 
replacement logic. I'm not sure if this tokenizer behavior is the correct one, 
but it's part of the problem.

Having said that, I tried to change the spellcheck tokenizer from standard to 
whitespace and this actually solved the problem; no errors and I get correct 
suggestions.

So, until this gets fixed you can either:

1) Disable spellchecker collation, or
2) Use a whitespace tokenizer for the spellchecker component
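
To see why identical offsets break the replacement, here is a minimal sketch of an offset-based collation loop. It is an assumption about the general shape of the logic, not the SpellCheckComponent source; Tok, collate and the suggestion strings are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class CollationOffsetSketch {
    // Hypothetical stand-in for an analyzed query token plus its top suggestion.
    static final class Tok {
        final int start;
        final int end;
        final String suggestion;

        Tok(int start, int end, String suggestion) {
            this.start = start;
            this.end = end;
            this.suggestion = suggestion;
        }
    }

    // Replaces each token's span with its suggestion, shifting later offsets
    // by the length difference introduced by earlier replacements.
    static String collate(String query, List<Tok> tokens) {
        StringBuilder collation = new StringBuilder(query);
        int shift = 0;
        for (Tok t : tokens) {
            collation.replace(t.start + shift, t.end + shift, t.suggestion);
            shift += t.suggestion.length() - (t.end - t.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // Disjoint offsets, as a whitespace tokenizer would produce.
        System.out.println(collate("mis term",
                Arrays.asList(new Tok(0, 3, "mid"), new Tok(4, 8, "term"))));
        // Both tokens claim 0-7, as in the abc_def example.
        try {
            collate("abc_def",
                    Arrays.asList(new Tok(0, 7, "abd"), new Tok(0, 7, "deg")));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("second replace failed: start offset went negative");
        }
    }
}
```

In the abc_def case the first replacement shrinks the string from 7 to 3 characters, so the shift becomes -4 and the second call asks for replace(-4, 3, ...), which StringBuilder rejects.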

> StringIndexOutOfBoundsException in SpellCheckComponent
> --
>
> Key: SOLR-1630
> URL: https://issues.apache.org/jira/browse/SOLR-1630
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, spellchecker
>Affects Versions: 1.4
> Environment: Solr 1.4
> Lucene 2.9.1
> Win XP
> java version "1.6.0_14"
>Reporter: Robin Wojciki
>Assignee: Shalin Shekhar Mangar
> Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, 
> spellcheckconfig.xml
>
>
> For some documents/search strings, the SpellCheckComponent throws 
> StringIndexOutOfBoundsException
> See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
> h2. Replication
>  * Save attached schema.xml and solrconfig.xml in 
> apache-solr-1.4.0/example/solr/conf
>  * Start Solr
>  * Index attached bug.xml
>  * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
> It throws a StringIndexOutOfBoundsException
> {noformat} String index out of range: -7
> java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>   at java.lang.AbstractStringBuilder.replace(Unknown Source)
>   at java.lang.StringBuilder.replace(Unknown Source)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> {noformat} 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Juan Pedro Danculovic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874762#action_12874762
 ] 

Juan Pedro Danculovic commented on SOLR-1405:
-

I did not know about the plan to stop adding code to JSPs.
I can change the patch and add the index file information to the 
SystemInfoRequestHandler instead. Is that the right approach?
Sorry for my mistake.

> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




Re: Solr Security

2010-06-02 Thread Matthew Mauriello
Thanks for the quick response, I'll look into this a bit more and see what
I can do.

~Matt

> : requestHandlers are those that are active by default. I think the gist of
> : what you're saying is go through my solrconfig.xml file and secure any paths
> : that seem like they should be "admin" only? We are not really concerned
>
> correct.
>
> : about security so much as just making sure the average user cannot mess
> : anything up. Users should only be able to search and retrieve xml
> : responses from solr and admins should be able to do everything and
> : anything else.
>
> sure ... but if your "users" are people who can hit the solr app directly,
> and if you are planning to block access to "/update" that implies that you
> are worried about them *trying* to update -- in which case you should also
> block /select?qt=/update because they could use that to update as well
> (it doesn't matter if there are no links to that URL anywhere, there are
> no links to /update either -- but evidently you are worried about your
> users constructing that URL as well)
>
> : > : BASIC Tomcat. Essentially I want users to only be able to /select/* and
> : > : admins to be able to do everything else. Right now I am checking for
> : > :
> : > : /select/* - Users
> : > : /admin/*  - Admin
> : > : /update/* - Admin
> : > :
> : > : Are there other url strings I should be protecting?
> : > : (This was unclear to me in the documentation)
> : >
> : > in general it depends on what requestHandlers you have configured in your
> : > solrconfig.xml ...  if you have an instance of the ExtractingRequestHandler
> : > configured with the path "/extract/stuff" then you'll probably want to
> : > protect that as well.  In particular you may want to block users from
> : > accessing /replication (but then you'll need to give special access to
> : > the slave machines so they can query the master)
> : >
> : > You should also watch out for the "qt" param when using the special
> : > "/select" path.  I would suggest that you just block user access to
> : > /select, and use specific paths for accessing handlers directly (ie
> : > /search, /dismax, etc...)
> : >
> : >
> : > -Hoss
>
>
>
> -Hoss



Re: Solr Security

2010-06-02 Thread Chris Hostetter
: requestHandlers are those that are active by default. I think the gist of
: what you're saying is go through my solrconfig.xml file and secure any paths
: that seem like they should be "admin" only? We are not really concerned

correct.

: about security so much as just making sure the average user cannot mess
: anything up. Users should only be able to search and retrieve xml
: responses from solr and admins should be able to do everything and
: anything else.

sure ... but if your "users" are people who can hit the solr app directly, 
and if you are planning to block access to "/update" that implies that you 
are worried about them *trying* to update -- in which case you should also 
block /select?qt=/update because they could use that to update as well  
(it doesn't matter if there are no links to that URL anywhere, there are 
no links to /update either -- but evidently you are worried about your 
users constructing that URL as well)
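
As a rough illustration of the point above, here is a small Java sketch of a path check that also inspects the "qt" parameter. Everything in it is hypothetical: the class SolrPathPolicySketch, the list of admin prefixes, and the method names are assumptions made for the example, not part of Solr or Tomcat, and a real deployment would still need to handle URL normalization and whatever handlers solrconfig.xml actually registers.

```java
import java.util.Arrays;
import java.util.List;

class SolrPathPolicySketch {
    // Hypothetical list of handler paths that only admins may reach.
    static final List<String> ADMIN_PREFIXES =
            Arrays.asList("/admin", "/update", "/replication");

    /** True if the path is an admin prefix itself or lies below one. */
    static boolean isAdminPath(String path) {
        if (path == null) {
            return false;
        }
        for (String prefix : ADMIN_PREFIXES) {
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Decides whether a request needs the admin role.
     * The qt value may be null when the parameter is absent.
     */
    static boolean requiresAdmin(String path, String qt) {
        if (path == null) {
            return false;
        }
        if (isAdminPath(path)) {
            return true;
        }
        // /select hands the request to the handler named by qt,
        // so /select?qt=/update has to be treated like /update.
        boolean isSelect = path.equals("/select") || path.startsWith("/select/");
        return isSelect && isAdminPath(qt);
    }

    public static void main(String[] args) {
        System.out.println(requiresAdmin("/select", null));
        System.out.println(requiresAdmin("/select", "/update"));
        System.out.println(requiresAdmin("/update", null));
    }
}
```

A stricter alternative, as suggested earlier in the thread, is to deny users /select altogether and expose only specific handler paths.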

: > : BASIC Tomcat. Essentially I want users to only be able to /select/* and
: > : admins to be able to do everything else. Right now I am checking for
: > :
: > : /select/* - Users
: > : /admin/*  - Admin
: > : /update/* - Admin
: > :
: > : Are there other url strings I should be protecting?
: > : (This was unclear to me in the documentation)
: >
: > in general it depends on what requestHandlers you have configured in your
: > solrconfig.xml ...  if you have an instance of the ExtractingRequestHandler
: > configured with the path "/extract/stuff" then you'll probably want to
: > protect that as well.  In particular you may want to block users from
: > accessing /replication (but then you'll need to give special access to
: > the slave machines so they can query the master)
: >
: > You should also watch out for the "qt" param when using the special
: > "/select" path.  I would suggest that you just block user access to
: > /select, and use specific paths for accessing handlers directly (ie
: > /search, /dismax, etc...)
: >
: >
: > -Hoss



-Hoss





[jira] Commented: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874746#action_12874746
 ] 

Hoss Man commented on SOLR-1405:


bq. Hmmm, really?

it's a recurring theme that keeps coming up over and over - i actually thought 
you were the person who originally started pushing for it.

bq. Sometimes it's just a much easier way to expose simple information that's 
primarily meant for human consumption (and when there is a patch in hand).

Fair enough ... but at this point we already have the SystemInfoRequestHandler 
which eliminates the need for stats.jsp or registry.jsp (which is what this 
patch updates) ... so i'd rather not add more functionality to either of those 
JSPs.


> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




[jira] Resolved: (SOLR-1914) phps and json add a double qoute / brackets, serialize isnt working

2010-06-02 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1914.


Fix Version/s: 3.1
   Resolution: Fixed

> phps and json add a double qoute / brackets, serialize isnt working
> ---
>
> Key: SOLR-1914
> URL: https://issues.apache.org/jira/browse/SOLR-1914
> Project: Solr
>  Issue Type: Bug
>  Components: clients - php
>Affects Versions: 1.4
> Environment: apache Solr 1.4, Jetty (out of the box).  e-comerce 
> application UTF-8
>Reporter: Sascha Jovanoski
> Fix For: 3.1
>
> Attachments: ff01.xml, SOLR-1949.patch
>
>
> (First of all, Solr rocks!)
> I am running an e-commerce application with a dynamic filter set that depends 
> on the selected categories.
> Everything works fine and is stable, but today I found a strange 
> case:
> I have around 5000 strings in my app that are used as facet filters. One 
> string, "transluzentes Acryl", and even its English translation, "Translucent 
> acrylics", do not work.
> QUERY:
> http://ff01:8983/solr/core1/select/?q=*:*&fq=shop_id:10 AND language:2 AND 
> category_1:"Badewannen" AND material:"transluzentes 
> Acryl"&facet=on&facet.limit=400&facet.field=category_1&facet.field=category_2&facet.field=manu&facet.field=length&facet.field=width&facet.field=height&facet.field=color&facet.field=material&facet.field=delivery&facet.field=attributes&facet.mincount=1&facet.sort=lex&facet.method=fc&stats=true&stats.field=price&start=0&rows=50&sort=sort+desc&fl=name,model_nr,image,url,manu,delivery,in_stock,serie,price,uvp,nav_id,template,option,name,model_nr,image,url,manu,delivery,in_stock,serie,price,uvp,nav_id,template,option&wt=phps&V=1.0&version=2.2
> RESPONSE: xml and php work fine, but json and phps end up like this:
> .s:2:"fq";s:79:"shop_id:10 AND language:2 AND category_1:"Badewannen" AND 
> material:"transluzentes Acryl"";..
> Note the doubled quotes at the end.
> This happens only with these strings or docs, because even the English docs 
> fail. I thought it might be a matter of encoding, so I added a transformer in 
> the DIH which replaced these strings with "Hello World".
> Facets on "Hello World" fail as well. Other facet strings with white space, 
> special chars and so on never fail. 
> If I add brackets to the fq 
> ...q=*:*&fq=shop_id:10 AND language:2 AND category_1:"Badewannen" AND 
> material:"transluzentes Acryl"&facet=on&...
> it ends up with a double }}
> s:2:"fq";s:79:"shop_id:20 AND language:2 AND category_1:"Badewannen" AND 
> material:"Hallo Welt"";}}




Re: Solr Security

2010-06-02 Thread Matthew Mauriello
Thanks for the response.

I am using the most basic Solr installation, so I imagine the only
requestHandlers are those that are active by default. I think the gist of
what you're saying is: go through my solrconfig.xml file and secure any paths
that seem like they should be "admin" only? We are not really concerned
about security so much as just making sure the average user cannot mess
anything up. Users should only be able to search and retrieve xml
responses from solr and admins should be able to do everything and
anything else.

Thank you for your time,

~Matt

>
> : BASIC Tomcat. Essentially I want users to only be able to /select/* and
> : admins to be able to do everything else. Right now I am checking for
> :
> : /select/* - Users
> : /admin/*  - Admin
> : /update/* - Admin
> :
> : Are there other url strings I should be protecting?
> : (This was unclear to me in the documentation)
>
> in general it depends on what requestHandlers you have configured in your
> solrconfig.xml ...  if you have an instance of the ExtractingRequestHandler
> configured with the path "/extract/stuff" then you'll probably want to
> protect that as well.  In particular you may want to block users from
> accessing /replication (but then you'll need to give special access to
> the slave machines so they can query the master)
>
> You should also watch out for the "qt" param when using the special
> "/select" path.  I would suggest that you just block user access to
> /select, and use specific paths for accessing handlers directly (ie
> /search, /dismax, etc...)
>
>
> -Hoss



Re: Push for a Solr 1.4.1 Bug Fix Release?

2010-06-02 Thread Robert Muir
thanks, perfect!

On Wed, Jun 2, 2010 at 2:51 PM, Chris Hostetter wrote:

>
> : lets just document the workaround for this one in the CHANGES? Solr wiki?
>
> Let's do both ... we know it affects 1.4.1, and it's good advice in
> general even if we fix every issue we know of in 3.1 and 4.0 (because we
> might miss some)
>
> I took a stab at some verbiage on the wiki...
>
> http://wiki.apache.org/solr/SolrInstall?action=diff&rev2=22&rev1=21
>
> ...and in CHANGES.txt for 1.4.1...
>
> http://svn.apache.org/viewvc?view=revision&revision=950716
>
> ...please update if i worded it poorly (or got it wrong).
>
>
> -Hoss


-- 
Robert Muir
rcm...@gmail.com


[jira] Commented: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874741#action_12874741
 ] 

Yonik Seeley commented on SOLR-1405:


bq. There's a general consensus that we should stop adding new JSPs to the 
admin interface (or enhancing existing JSPs) and we should instead use 
RequestHandlers to return this type of information

Hmmm, really?

I'm not opposed to people converting JSPs to something else (when the 
functionality is not diminished), but I'm certainly against a JSP moratorium 
without very good reason.  Sometimes it's just a much easier way to expose 
simple information that's primarily meant for human consumption (and when there 
is a patch in hand).

Progress, not perfection ;-)

> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




Re: Push for a Solr 1.4.1 Bug Fix Release?

2010-06-02 Thread Chris Hostetter

: lets just document the workaround for this one in the CHANGES? Solr wiki?

Let's do both ... we know it affects 1.4.1, and it's good advice in 
general even if we fix every issue we know of in 3.1 and 4.0 (because we 
might miss some)

I took a stab at some verbiage on the wiki...

http://wiki.apache.org/solr/SolrInstall?action=diff&rev2=22&rev1=21

...and in CHANGES.txt for 1.4.1...

http://svn.apache.org/viewvc?view=revision&revision=950716

...please update if i worded it poorly (or got it wrong).


-Hoss





[jira] Updated: (SOLR-1937) add ability to set a facet.minpercentage (analog to facet.mincount)

2010-06-02 Thread Lukas Kahwe Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Kahwe Smith updated SOLR-1937:


Affects Version/s: 1.4
 Priority: Minor  (was: Major)

> add ability to set a facet.minpercentage (analog to facet.mincount)
> ---
>
> Key: SOLR-1937
> URL: https://issues.apache.org/jira/browse/SOLR-1937
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Lukas Kahwe Smith
>Priority: Minor
>
> See this thread on the ML: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/%3c6ee240dc-7674-4dee-806e-b78ffd558...@pooteeweet.org%3e
> Obviously I could implement this in userland (as I could with mincount if it 
> were not available yet), but I wonder if anyone else sees a use in being 
> able to define that a facet must match a minimum percentage of all documents 
> in the result set, rather than a hardcoded value? The idea being that while I 
> might not be interested in a facet that only covers 3 documents in the result 
> set if there are, let's say, 1000 documents in the result set, the situation 
> would be a lot different if I only have 10 documents in the result set.




[jira] Created: (SOLR-1937) add ability to set a facet.minpercentage (analog to facet.mincount)

2010-06-02 Thread Lukas Kahwe Smith (JIRA)
add ability to set a facet.minpercentage (analog to facet.mincount)
---

 Key: SOLR-1937
 URL: https://issues.apache.org/jira/browse/SOLR-1937
 Project: Solr
  Issue Type: Improvement
Reporter: Lukas Kahwe Smith


See this thread on the ML: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/%3c6ee240dc-7674-4dee-806e-b78ffd558...@pooteeweet.org%3e

Obviously I could implement this in userland (as I could with mincount if it 
were not available yet), but I wonder if anyone else sees a use in being able 
to define that a facet must match a minimum percentage of all documents in the 
result set, rather than a hardcoded value? The idea being that while I might 
not be interested in a facet that only covers 3 documents in the result set if 
there are, let's say, 1000 documents in the result set, the situation would be a 
lot different if I only have 10 documents in the result set.
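
For illustration only, the userland variant mentioned above might look like the following Java sketch. The class FacetMinPercentageSketch and the method filterByMinPercentage are hypothetical names invented for this example. Nothing here is an existing Solr parameter or API; it simply post-filters facet counts a client has already received, using the result-set size (numFound) as the denominator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FacetMinPercentageSketch {
    /**
     * Keeps only facet values whose count is at least minPercentage percent
     * of the result set. Hypothetical client-side helper, not a Solr API.
     */
    static Map<String, Integer> filterByMinPercentage(
            Map<String, Integer> facetCounts, long numFound, double minPercentage) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        if (numFound <= 0) {
            return kept; // empty result set: nothing can meet a percentage
        }
        for (Map.Entry<String, Integer> entry : facetCounts.entrySet()) {
            double percentage = 100.0 * entry.getValue() / numFound;
            if (percentage >= minPercentage) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("acrylic", 3);

        // 3 of 1000 documents is 0.3 percent: dropped at a 1 percent minimum.
        System.out.println(filterByMinPercentage(counts, 1000, 1.0).keySet());
        // 3 of 10 documents is 30 percent: kept at the same minimum.
        System.out.println(filterByMinPercentage(counts, 10, 1.0).keySet());
    }
}
```

The same count of 3 is dropped or kept depending on the size of the result set, which is the behavior a fixed facet.mincount cannot express.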




[jira] Updated: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options

2010-06-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1706:
--

Fix Version/s: 1.4.1

> wrong tokens output from WordDelimiterFilter depending upon options
> ---
>
> Key: SOLR-1706
> URL: https://issues.apache.org/jira/browse/SOLR-1706
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 1.4
>Reporter: Robert Muir
>Assignee: Mark Miller
> Fix For: 1.4.1, 3.1, 4.0
>
>
> below you can see that when I have requested to only output numeric 
> concatenations (not words), some words are still sometimes output, ignoring 
> the options i have provided, and even then, in a very inconsistent way.
> {code}
>   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42", "AutoCoder" },
> new int[] { 18, 21 },
> new int[] { 20, 30 },
> new int[] { 1, 1 });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42", "AutoCoder", "56" },
> new int[] { 18, 21, 33 },
> new int[] { 20, 30, 35 },
> new int[] { 1, 1, 1 });
>   assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
> new String[] {  },
> new int[] {  },
> new int[] {  },
> new int[] {  });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42" },
> new int[] { 18 },
> new int[] { 20 },
> new int[] { 1 });
> {code}
> where assertWdf is 
> {code}
>   // The calls above pass no token types, so this helper takes none and
>   // hands null (types unchecked) to assertTokenStreamContents.
>   void assertWdf(String text, int generateWordParts, int generateNumberParts,
>       int catenateWords, int catenateNumbers, int catenateAll,
>       int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
>       int stemEnglishPossessive, CharArraySet protWords, String expected[],
>       int startOffsets[], int endOffsets[], int posIncs[])
>       throws IOException {
>     TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
>     WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
>         generateNumberParts, catenateWords, catenateNumbers, catenateAll,
>         splitOnCaseChange, preserveOriginal, splitOnNumerics,
>         stemEnglishPossessive, protWords);
>     assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, null,
>         posIncs);
>   }
> {code}




[jira] Resolved: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-06-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1852.
---

Resolution: Fixed

Committed revision 950711. Thanks Peter!

> enablePositionIncrements="true" can cause searches to fail when they are 
> parsed as phrase queries
> -
>
> Key: SOLR-1852
> URL: https://issues.apache.org/jira/browse/SOLR-1852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Peter Wolanin
>Assignee: Robert Muir
> Fix For: 1.4.1
>
> Attachments: SOLR-1852.patch, SOLR-1852_solr14branch.patch, 
> SOLR-1852_testcase.patch
>
>
> Symptom: searching for a string like a domain name containing a '.', the Solr 
> 1.4 analyzer tells me that I will get a match, but when I enter the search 
> either in the client or directly in Solr, the search fails. 
> test string:  Identi.ca
> queries that fail:  IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but putting the content on its own or in another 
> sentence:
> "Support Identi.ca"
> the search matches.  Testing suggests the word "for" is the problem, and it 
> looks like the bug occurs when a stop word precedes a word that is split up 
> using the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing 
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in 
> Solr trunk, either due to the upgraded lucene or changes to the 
> WordDelimiterFactory




[jira] Resolved: (SOLR-1889) Change default value of 'mm' param to depend on explicit/implicit value of q.op

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1889.


 Assignee: Hoss Man
Fix Version/s: 4.0
   (was: Next)
   Resolution: Fixed

Committed revision 950710.

> Change default value of 'mm' param to depend on explicit/implicit value of 
> q.op
> ---
>
> Key: SOLR-1889
> URL: https://issues.apache.org/jira/browse/SOLR-1889
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 4.0
>
> Attachments: SOLR-1889.patch
>
>
> The 'mm' param for the dismax parser has always defaulted to 100%, but many 
> first time users seem to expect that the default behavior of dismax should 
> work similar to the standard QParser, and be influenced by the schema 
> configured default query op, or the q.op query param.
> we should change the default value for "mm" to be equivalent to 100% if the 
> derived value of "q.op" would be AND, and to be 1 if the derived value of 
> "q.op" would be OR.  We should document this in CHANGES.txt so people who are 
> upgrading know that if they have q.op=OR, but they still want a default mm 
> value of 100%, they should add it as a configured default for their request 
> handlers.
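
As a hedged illustration of the rule described above, the following Java sketch derives a default "mm" from q.op. It is not the committed Solr code: the class DismaxMmDefaultSketch and both method names are hypothetical, and treating a missing operator as OR is an assumption of the example. The values "100%" and "1" are taken directly from the issue description.

```java
class DismaxMmDefaultSketch {
    /**
     * Derives the default mm from the operator: the q.op request param if
     * present, otherwise the schema's configured default operator.
     * Assumption: when neither is set, the operator is treated as OR.
     */
    static String defaultMm(String qOpParam, String schemaDefaultOperator) {
        String derivedOp = (qOpParam != null) ? qOpParam : schemaDefaultOperator;
        return "AND".equalsIgnoreCase(derivedOp) ? "100%" : "1";
    }

    /** An explicitly configured or requested mm always wins over the default. */
    static String effectiveMm(String mmParam, String qOpParam, String schemaDefaultOperator) {
        return (mmParam != null) ? mmParam : defaultMm(qOpParam, schemaDefaultOperator);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMm(null, "AND", "OR"));   // q.op overrides the schema
        System.out.println(effectiveMm(null, null, "OR"));    // schema default applies
        System.out.println(effectiveMm("100%", "OR", "OR"));  // explicit mm is kept
    }
}
```

The last call mirrors the upgrade note: users with q.op=OR who still want every clause to match would keep mm=100% as a configured default on their request handlers.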




[jira] Commented: (SOLR-1405) Show the index files in the web UI

2010-06-02 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874715#action_12874715
 ] 

Jason Rutherglen commented on SOLR-1405:


bq. general consensus that we should stop adding new JSPs to the admin interface

Makes sense... I'll update my ZK related patch accordingly.  

Also, ReplicationHandler shows the index files as well.

> Show the index files in the web UI
> --
>
> Key: SOLR-1405
> URL: https://issues.apache.org/jira/browse/SOLR-1405
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: Next
>
> Attachments: data menu.png, Index file list.png, SOLR-1405.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> It would be great to view the actual index files from the web console.




Re: Solr Security

2010-06-02 Thread Chris Hostetter

: BASIC Tomcat. Essentially I want users to only be able to /select/* and
: admins to be able to do everything else. Right now I am checking for
: 
: /select/* - Users
: /admin/*  - Admin
: /update/* - Admin
: 
: Are there other url strings I should be protecting?
: (This was unclear to me in the documentation)

in general it depends on what requestHandlers you have configured in your 
solrconfig.xml ...  if you have an instance of the ExtractingRequestHandler 
configured with the path "/extract/stuff" then you'll probably want to 
protect that as well.  In particular you may want to block users from 
accessing /replication (but then you'll need to give special access to 
the slave machines so they can query the master)

You should also watch out for the "qt" param when using the special 
"/select" path.  I would suggest that you just block user access to 
/select, and use specific paths for accessing handlers directly (ie 
/search, /dismax, etc...)


-Hoss





Re: Release-schedule JCC 2.6.0

2010-06-02 Thread Andi Vajda


On Wed, 2 Jun 2010, Julian Maibaum wrote:

> after the great work on integrating different JCC-wrapped packages with the 
> --shared and --import flags, it would be really cool to see these features 
> being officially released in JCC 2.6.0 as well.
> Since we rely on official releases for our production environment and are 
> eager to apply the new technological possibilities, I would like to know when 
> JCC 2.6.0 is due to arrive.


I expect the next Lucene 2.9.3/3.0.2 releases to happen shortly, this month.
PyLucene releases typically follow within a few days thereafter and include 
the latest JCC.


Andi..


[jira] Commented: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874687#action_12874687
 ] 

Earwin Burrfoot commented on LUCENE-2485:
-

bq. w/o this ability, there's no advantage to warming in a hook vs warming 
explicitly after getReader()
There is. Consistency. I understand that this word is not in high regard 
amongst Luceners (progress, not perfection!), but still.
It is logical to have all your warming happen in one defined place. If Lucene 
does magic for you, and the biggest part of said warming happens in a separate 
thread without making you wait - that's very nice! But that's just a side effect, 
like compiler optimizations that may or may not happen.
Also, if your app requires warming for each segment, having a single callback 
frees you from the need to determine for a given new segment returned from 
getReader(), if it is a product of merge and thus already warm, or is it a 
still-cold newly-flushed segment.

> IndexWriter should also warm flushed segments
> -
>
> Key: LUCENE-2485
> URL: https://issues.apache.org/jira/browse/LUCENE-2485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-2311.
> You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
> segments.
> But for consistency maybe we should change this to warm all new segments (ie, 
> also flushed ones).  We should rename it to something "setSegmentWarmer".
> Really, the reader pool should be pulled out of IndexWriter, be externally 
> provided, and be responsible for doing warming of new segments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene 2.9.3 ? ( blocking Solr 1.4.1 ? ? ? )

2010-06-02 Thread Michael McCandless
OK I'm done backporting fixes for 2.9.3/3.0.2...

Mike

On Tue, Jun 1, 2010 at 4:44 PM, Michael McCandless
 wrote:
> +1 for end-of-day Thu freeze -- I'll finish mine by then.
>
> But I think for trivial fixes for kinda bad bugs (LUCENE-2311 -- real
> bad: mergedSegmentWarmer is basically unusable; LUCENE-2356) we should
> make an exception (allow them into 2.9.3)?  I've already got the patch
> up for 2311; I'll do 2356 shortly.
>
> Mike
>
> On Tue, Jun 1, 2010 at 4:30 PM, Uwe Schindler  wrote:
>> Hi all,
>>
>> I am quite fine with doing a 2.9.3 / 3.0.2 release as soon as possible, but
>> I don't want to force any reopened issues into this release. I have no
>> problem with doing this in parallel with the Solr 1.4.1 release. I don't
>> think it is a good idea to reopen a lot of issues now just to get them into
>> 2.9.3 / 3.0.2:
>>
>> My idea was to release the current branches as the artifacts for this
>> release. Both branches are stable and contain high-quality, very stable
>> patches, so a release should be unproblematic.
>> The test suites pass easily and I have no problem creating artifacts from
>> them. Based on this I said that I would be the release manager again. I have
>> the scripts for the parallel releases up to date and it's easy for me to
>> build the release artifacts quickly using JDK 1.5.0_22.
>>
>> But starting to forcefully backport issues now is not a good idea, so I
>> would like to freeze the branches soon and reject further patches. I would
>> also vote against late patches going into these branches. Mike
>> worked hard to get in fixes for lots of recent memory problems in the indexer
>> that were fixed in the 3x and trunk branches. But we should not add patches
>> from the latest developments, and for sure no analyzer changes (that's
>> impossible, because analyzer changes would change the index format; Robert
>> told me that he does not want to backport any changes here). Next week is
>> Buzzwords. I would like to start RC1 shortly after Buzzwords. Simon,
>> Grant, Robert and I will hopefully have fruitful discussions about it,
>> but I think we should come to an end very soon.
>>
>> So I suggest the following timeline:
>> I can accept backports in the branches until Thursday evening, so I can
>> start to review the branches on Friday. During Buzzwords, we should not
>> commit anything, and maybe everybody can test the branches and their changes
>> in Solr and their own installations. If you like, I could create
>> pre-artifacts (like a Hudson build) or maybe RC1 on Friday. I will also merge
>> the changes.txt files across 2.9, 3.0, 3.x and trunk on Friday. After that I
>> don't want to accept any more changes and will declare a "freeze" on the 2.9
>> and 3.0 branches.
>>
>> Mark, Hoss: Would it be possible to start the release process of Solr
>> together with Lucene's RCs? Would it be possible to *replace* the final
>> Lucene artifacts and build the Solr artifacts together using my builds? Then
>> we would not block each other and maybe we can release on the same day. This
>> would be a good start for future combined Solucene releases :-)
>>
>> Any comments?
>>
>> Thanks for all the work.
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>> -Original Message-
>>> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
>>> Sent: Tuesday, June 01, 2010 8:13 PM
>>> To: Lucene Dev
>>> Subject: Lucene 2.9.3 ? ( blocking Solr 1.4.1 ? ? ? )
>>>
>>>
>>> My suggestion that we do a Solr 1.4.1 bug fix seems to have gotten Uwe and
>>> McCandless all excited about doing a Lucene 2.9.3 release...
>>>
>>> https://issues.apache.org/jira/browse/SOLR-1934
>>>
>>> ...but it's not clear to me how realistic this is, or how close we are to
>>> seeing it happen.  Beyond the few issues mentioned in that SOLR issue, is
>>> there a concrete list of issues that are RESOLVED that people want to
>>> backport for a 2.9.3 release?
>>>
>>> I also see quite a few UNRESOLVED issues listed for 2.9.3...
>>>
>>> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310110&versionId=12314799&showOpenIssuesOnly=true
>>>
>>> ...are those really blocking a 2.9.3 release, or should we de-classify
>>> those as "Fix For 2.9.3" issues?
>>>
>>>
>>> -Hoss
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>




[jira] Resolved: (LUCENE-2311) Pass potent SR to IRWarmer.warm(), and also call warm() for new segments

2010-06-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2311.


Resolution: Fixed

> Pass potent SR to IRWarmer.warm(), and also call warm() for new segments
> 
>
> Key: LUCENE-2311
> URL: https://issues.apache.org/jira/browse/LUCENE-2311
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2311.patch, LUCENE-2311.patch
>
>
> Currently warm() receives a SegmentReader without terms index and docstores.
> It would be arguably more useful for the app to receive a fully loaded 
> reader, so it can actually fire up some caches. If the warmer is undefined on 
> IW, we probably leave things as they are.
> It is also arguably more concise and clear to call warm() on all newly 
> created segments, so there is a single point of warming readers in NRT 
> context, and every subreader coming from getReader is guaranteed to be warmed 
> up -> you don't have to introduce even more mess in your code by rechecking 
> it.




[jira] Resolved: (SOLR-1915) DebugComponent should use NamedList to output Explanations instead of Explanation.toString()

2010-06-02 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1915.


Resolution: Fixed

yeah ... "debug.explain.structured" is along the lines of what i had in mind.


Committed revision 950667.

> DebugComponent should use NamedList to output Explanations instead of 
> Explanation.toString()
> 
>
> Key: SOLR-1915
> URL: https://issues.apache.org/jira/browse/SOLR-1915
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-1915.patch
>
>
> DebugComponent currently uses Explanation.toString() to "format" score 
> explanations for each document as plain text with whitespace indenting to 
> denote the hierarchical relationship, and then adds those explanations to the 
> SolrQueryResponse.
> Instead DebugComponent should transform the Explanation objects into 
> NamedLists so that the full structure can be formatted in a logical way by 
> the ResponseWriter
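
As a rough illustration of the transformation described above, the following plain-Java sketch converts an explanation tree into nested ordered maps. It is only a sketch under stated assumptions: ExplainNode is a hypothetical stand-in for Lucene's Explanation, a LinkedHashMap stands in for Solr's NamedList, and the key names "value", "description" and "details" are chosen for illustration rather than taken from the committed patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a score explanation tree; not Lucene's Explanation.
final class ExplainNode {
    final float value;
    final String description;
    final List<ExplainNode> details = new ArrayList<>();

    ExplainNode(float value, String description) {
        this.value = value;
        this.description = description;
    }

    // Appends a child explanation and returns this node for chaining.
    ExplainNode add(ExplainNode child) {
        details.add(child);
        return this;
    }
}

final class StructuredExplain {
    // Recursively converts the tree into nested ordered maps so that a
    // response writer can serialize the hierarchy in its own format.
    static Map<String, Object> toStructure(ExplainNode node) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("value", node.value);
        out.put("description", node.description);
        if (!node.details.isEmpty()) {
            List<Map<String, Object>> children = new ArrayList<>();
            for (ExplainNode child : node.details) {
                children.add(toStructure(child));
            }
            out.put("details", children);
        }
        return out;
    }
}
```

The point of the nested form is that the hierarchy is carried by the data structure itself, not by whitespace indentation in a pre-formatted string.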




[jira] Commented: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874652#action_12874652
 ] 

Yonik Seeley commented on LUCENE-2485:
--

bq. If an application needs warming, it will need to warm up new segments 
exposed through getReader() anyway.

But it's very different... the advantage to warming new segments is that the 
warm step was considered part of the merge by getReader() - if the whole thing 
hadn't completed yet, getReader() would still immediately return with the old 
segments pre-merge.  w/o this ability, there's no advantage to warming in a 
hook vs warming explicitly after getReader().


> IndexWriter should also warm flushed segments
> -
>
> Key: LUCENE-2485
> URL: https://issues.apache.org/jira/browse/LUCENE-2485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-2311.
> You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
> segments.
> But for consistency maybe we should change this to warm all new segments (ie, 
> also flushed ones).  We should rename it to something "setSegmentWarmer".
> Really, the reader pool should be pulled out of IndexWriter, be externally 
> provided, and be responsible for doing warming of new segments.




[jira] Commented: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874650#action_12874650
 ] 

Earwin Burrfoot commented on LUCENE-2485:
-

bq. As long as warming a new segment doesn't block that new segment from being 
exposed via getReader()?
If an application needs warming, it will need to warm up new segments exposed 
through getReader() anyway. If you're bent on fast turnaround, you're probably 
not relying on things being warmed up (or okay with the costs).
Add to this the fact that for realtime-hungry deployments the size of 
newly-created (not merged) segments is likely smallish, and any warmup (if 
present) will take negligible time.

I think you're overoptimizing a bit here.

> IndexWriter should also warm flushed segments
> -
>
> Key: LUCENE-2485
> URL: https://issues.apache.org/jira/browse/LUCENE-2485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-2311.
> You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
> segments.
> But for consistency maybe we should change this to warm all new segments (ie, 
> also flushed ones).  We should rename it to something "setSegmentWarmer".
> Really, the reader pool should be pulled out of IndexWriter, be externally 
> provided, and be responsible for doing warming of new segments.




[jira] Updated: (SOLR-1914) phps and json add a double qoute / brackets, serialize isnt working

2010-06-02 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1914:
---

Attachment: SOLR-1949.patch

Here's a patch w/ test that writes the special values as Strings, in the 
format that Java uses:
"NaN", "-Infinity", "Infinity".  I plan on committing shortly to 4.0 and 3.1.
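
The approach described above can be sketched in a few lines of plain Java. This is not the attached patch: the class and method names (JsonFloats, toJson) are hypothetical. It relies only on the documented behavior of Double.toString, which yields "NaN", "Infinity" and "-Infinity" for the non-finite values.

```java
final class JsonFloats {
    // Returns the JSON token for a double: a bare number when the value is
    // finite, otherwise a quoted string in the form Double.toString produces.
    static String toJson(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return "\"" + Double.toString(value) + "\"";
        }
        return Double.toString(value);
    }
}
```

Quoting keeps the output parseable, since JSON has no literal for NaN or the infinities; the trade-off is that clients receive a string where they might expect a number.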

> phps and json add a double qoute / brackets, serialize isnt working
> ---
>
> Key: SOLR-1914
> URL: https://issues.apache.org/jira/browse/SOLR-1914
> Project: Solr
>  Issue Type: Bug
>  Components: clients - php
>Affects Versions: 1.4
> Environment: apache Solr 1.4, Jetty (out of the box).  e-comerce 
> application UTF-8
>Reporter: Sascha Jovanoski
> Attachments: ff01.xml, SOLR-1949.patch
>
>
> (First of all, Solr rocks!)
> I am running an e-commerce application with a dynamic filter set that depends 
> on the selected categories.
> Everything works fine and is stable, but today I found a strange 
> case:
> Around 5000 strings in my app are used as facet filters, I guess. One 
> string, "transluzentes Acryl", and even its English translation, "Translucent 
> acrylics", do not work.
> QUERY:
> http://ff01:8983/solr/core1/select/?q=*:*&fq=shop_id:10 AND language:2 AND 
> category_1:"Badewannen" AND material:"transluzentes 
> Acryl"&facet=on&facet.limit=400&facet.field=category_1&facet.field=category_2&facet.field=manu&facet.field=length&facet.field=width&facet.field=height&facet.field=color&facet.field=material&facet.field=delivery&facet.field=attributes&facet.mincount=1&facet.sort=lex&facet.method=fc&stats=true&stats.field=price&start=0&rows=50&sort=sort+desc&fl=name,model_nr,image,url,manu,delivery,in_stock,serie,price,uvp,nav_id,template,option,name,model_nr,image,url,manu,delivery,in_stock,serie,price,uvp,nav_id,template,option&wt=phps&V=1.0&version=2.2
> RESPONSE: XML and PHP work fine, but JSON and phps end up like this:
> .s:2:"fq";s:79:"shop_id:10 AND language:2 AND category_1:"Badewannen" AND 
> material:"transluzentes Acryl"";..
> Note the doubled quotes at the end.
> This happens only for these strings or docs, because even the English docs 
> fail. I thought it might be a matter of encoding, so I added a transformer in 
> the DIH that replaced these strings with "Hello World".
> Facets on "Hello World" fail too. Other facet strings with white space, special 
> chars and so on never fail. 
> If I add brackets to the fq 
> ...q=*:*&fq=shop_id:10 AND language:2 AND category_1:"Badewannen" AND 
> material:"transluzentes Acryl"&facet=on&...
> it ends up with a double }}
> s:2:"fq";s:79:"shop_id:20 AND language:2 AND category_1:"Badewannen" AND 
> material:"Hallo Welt"";}}




[jira] Commented: (LUCENE-2311) Pass potent SR to IRWarmer.warm(), and also call warm() for new segments

2010-06-02 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874634#action_12874634
 ] 

Earwin Burrfoot commented on LUCENE-2311:
-

bq. Does your pending patch (what's the issue number again?) do this already?
LUCENE-2355 - this patch doesn't do this yet.
The next part removes the need for readerWarmer, as each reader has a number of 
components that are notified when reader is created/closed (and can warm 
themselves appropriately).
This also takes care of one of Yonik's concerns from LUCENE-2485
bq.Passing in the complete index (in addition to just the new segment) would 
allow incremental updating of an index-wide data structure
The factories that create components are shared for DirReader+SRs or 
IW-readerPool+SRs+IWBackedReader, so new components by default have access to 
index-wide context.

The part that is missing is the way for the user to specify if he wants his 
newly merged SRs pre-warmed and up to which runlevel.

> Pass potent SR to IRWarmer.warm(), and also call warm() for new segments
> 
>
> Key: LUCENE-2311
> URL: https://issues.apache.org/jira/browse/LUCENE-2311
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2311.patch, LUCENE-2311.patch
>
>
> Currently warm() receives a SegmentReader without terms index and docstores.
> It would be arguably more useful for the app to receive a fully loaded 
> reader, so it can actually fire up some caches. If the warmer is undefined on 
> IW, we probably leave things as they are.
> It is also arguably more concise and clear to call warm() on all newly 
> created segments, so there is a single point of warming readers in NRT 
> context, and every subreader coming from getReader is guaranteed to be warmed 
> up -> you don't have to introduce even more mess in your code by rechecking 
> it.




[jira] Resolved: (LUCENE-2468) reopen on NRT reader should share readers w/ unchanged segments

2010-06-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2468.


Fix Version/s: 3.0.2
   Resolution: Fixed

> reopen on NRT reader should share readers w/ unchanged segments
> ---
>
> Key: LUCENE-2468
> URL: https://issues.apache.org/jira/browse/LUCENE-2468
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
> LUCENE-2468.patch, LUCENE-2468.patch, LUCENE-2468.patch
>
>
> A reopen on an NRT reader doesn't seem to share readers for those segments 
> that are unchanged.
> http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader




[jira] Resolved: (SOLR-1778) java.io.IOException: read past EOF

2010-06-02 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1778.


Fix Version/s: 1.4.1
   (was: Next)
   Resolution: Fixed

> java.io.IOException: read past EOF
> --
>
> Key: SOLR-1778
> URL: https://issues.apache.org/jira/browse/SOLR-1778
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Critical
> Fix For: 1.4.1
>
>
> A query with relevancy scores of all zeros produces an invalid doclist that 
> includes sentinel values 2147483647 and causes Solr to request that invalid 
> docid from Lucene which results in a java.io.IOException: read past EOF
> http://search.lucidimagination.com/search/document/2d5359c0e0d103be/java_io_ioexception_read_past_eof_after_solr_1_4_0




[jira] Commented: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874613#action_12874613
 ] 

Yonik Seeley commented on LUCENE-2485:
--

I'm not sure how practical this is or not... but in general, more context would 
enable a broader range of applications.
- Passing in the complete index (in addition to just the new segment) would 
allow incremental updating of an index-wide data structure
- If the new segment was the result of a merge of existing segments, passing in 
those existing segments could allow more efficient generation of cached items 
from the cached items of the old segments.

> IndexWriter should also warm flushed segments
> -
>
> Key: LUCENE-2485
> URL: https://issues.apache.org/jira/browse/LUCENE-2485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-2311.
> You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
> segments.
> But for consistency maybe we should change this to warm all new segments (ie, 
> also flushed ones).  We should rename it to something "setSegmentWarmer".
> Really, the reader pool should be pulled out of IndexWriter, be externally 
> provided, and be responsible for doing warming of new segments.




[jira] Commented: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874611#action_12874611
 ] 

Yonik Seeley commented on LUCENE-2485:
--

bq. But for consistency maybe we should change this to warm all new segments

As long as warming a new segment doesn't block that new segment from being 
exposed via getReader()?

> IndexWriter should also warm flushed segments
> -
>
> Key: LUCENE-2485
> URL: https://issues.apache.org/jira/browse/LUCENE-2485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-2311.
> You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
> segments.
> But for consistency maybe we should change this to warm all new segments (ie, 
> also flushed ones).  We should rename it to something "setSegmentWarmer".
> Really, the reader pool should be pulled out of IndexWriter, be externally 
> provided, and be responsible for doing warming of new segments.




[jira] Commented: (LUCENE-2311) Pass potent SR to IRWarmer.warm(), and also call warm() for new segments

2010-06-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874606#action_12874606
 ] 

Michael McCandless commented on LUCENE-2311:


OK I opened LUCENE-2485 for the 2nd part of this issue.

I'll commit the first part shortly.

> Pass potent SR to IRWarmer.warm(), and also call warm() for new segments
> 
>
> Key: LUCENE-2311
> URL: https://issues.apache.org/jira/browse/LUCENE-2311
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2311.patch, LUCENE-2311.patch
>
>
> Currently warm() receives a SegmentReader without terms index and docstores.
> It would be arguably more useful for the app to receive a fully loaded 
> reader, so it can actually fire up some caches. If the warmer is undefined on 
> IW, we probably leave things as they are.
> It is also arguably more concise and clear to call warm() on all newly 
> created segments, so there is a single point of warming readers in NRT 
> context, and every subreader coming from getReader is guaranteed to be warmed 
> up -> you don't have to introduce even more mess in your code by rechecking 
> it.




[jira] Created: (LUCENE-2485) IndexWriter should also warm flushed segments

2010-06-02 Thread Michael McCandless (JIRA)
IndexWriter should also warm flushed segments
-

 Key: LUCENE-2485
 URL: https://issues.apache.org/jira/browse/LUCENE-2485
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
 Fix For: 4.0


Spinoff of LUCENE-2311.

You can now set a mergedSegmentWarmer on IW, which warms only newly merged 
segments.

But for consistency maybe we should change this to warm all new segments (ie, 
also flushed ones).  We should rename it to something "setSegmentWarmer".

Really, the reader pool should be pulled out of IndexWriter, be externally 
provided, and be responsible for doing warming of new segments.




[jira] Commented: (SOLR-1914) phps and json add a double qoute / brackets, serialize isnt working

2010-06-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874575#action_12874575
 ] 

Yonik Seeley commented on SOLR-1914:


Looks like this was sort of fixed for Python & Ruby back in SOLR-449.
I haven't had much luck finding a way to express them in PHP... I guess the 
default for all JSON subclasses should just write as a String.


> phps and json add a double qoute / brackets, serialize isnt working
> ---
>
> Key: SOLR-1914
> URL: https://issues.apache.org/jira/browse/SOLR-1914
> Project: Solr
>  Issue Type: Bug
>  Components: clients - php
>Affects Versions: 1.4
> Environment: apache Solr 1.4, Jetty (out of the box).  e-comerce 
> application UTF-8
>Reporter: Sascha Jovanoski
> Attachments: ff01.xml
>
>




[jira] Resolved: (LUCENE-2356) Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers

2010-06-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2356.


Resolution: Fixed

> Enable setting the terms index divisor used by IndexWriter whenever it opens 
> internal readers
> -
>
> Key: LUCENE-2356
> URL: https://issues.apache.org/jira/browse/LUCENE-2356
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2356.patch
>
>
> Opening a placeholder issue... if all the refactoring being discussed doesn't 
> make this possible, then we should add a setting to IWC to do so.
> Apps with very large numbers of unique terms must set the terms index divisor 
> to control RAM usage.
> (NOTE: flex's terms dict index uses RAM more efficiently, so this will 
> help such apps).
> But, when IW resolves deletes internally it always uses the default terms index 
> divisor of 1, and the app cannot change that.  Though one workaround is to call 
> getReader(termInfosIndexDivisor) which will pool the reader with the right 
> divisor.
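
For readers unfamiliar with the setting, the following plain-Java sketch illustrates the general idea behind a terms index divisor: with a divisor of N, only every Nth indexed term is held in RAM, trading lookup latency for memory. It is a conceptual illustration, not Lucene code; TermsIndexSampler and subsample are hypothetical names, and a list of strings stands in for the on-disk terms index.

```java
import java.util.ArrayList;
import java.util.List;

final class TermsIndexSampler {
    // Keeps every divisor-th entry of the indexed terms, starting with the
    // first. A divisor of 1 loads everything; larger values load fewer terms.
    static List<String> subsample(List<String> indexedTerms, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < indexedTerms.size(); i += divisor) {
            loaded.add(indexedTerms.get(i));
        }
        return loaded;
    }
}
```

This is why a reader opened internally with a divisor of 1 can consume far more RAM than the application intended when it opens its own readers with a larger divisor.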




[jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

2010-06-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874529#action_12874529
 ] 

Michael McCandless commented on LUCENE-2348:


bq. What you describe is precisely the problem. It will deduplicate only over 
each segment, not over the text index as one would expect given the name of the 
class.

Duh, right!  You want dedup to apply to the entire index

Ugh, so this has been broken since the cutover to per-segment searching (2.9.x).

This is tricky to fix.  Somehow DuplicateFilter needs to get ahold of the top 
reader.  It then must run its dup detection against the TermEnum from that top 
reader, but then when requested per sub-reader, it must return a slice into the 
bits for the top reader.

There's no way, now, given a sub-reader to figure out which parent reader it 
belongs to... so I think we'd have to change DuplicateFilter to take in the top 
reader to its ctor?  (But this is sort of messy -- no other core/contrib 
filters have this "state" -- they are normally free to be reused across 
readers).

The only other [big] change I can think of is if we could change the Filter API 
to be more like Scorer, which does first receive the top reader (since it needs 
to init measures like idf across all segments), and then separately steps 
through each sub-reader.
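
A minimal sketch of the "compute against the top reader, return a slice per sub-reader" idea, in plain Java so it runs on its own. It does not use Lucene classes: the class name TopLevelDedupSketch, the key array standing in for the TermEnum walk, and the docBase/maxDoc arguments are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Plain-Java model of index-wide de-duplication; no Lucene classes are used. */
final class TopLevelDedupSketch {

    /** Marks the first document seen for each key across the whole (top-level) index. */
    static BitSet keepFirstPerKey(String[] keyByGlobalDoc) {
        BitSet keep = new BitSet(keyByGlobalDoc.length);
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < keyByGlobalDoc.length; doc++) {
            // Set.add returns true only the first time a key is encountered.
            if (seen.add(keyByGlobalDoc[doc])) {
                keep.set(doc);
            }
        }
        return keep;
    }

    /** Returns the segment-local view of the top-level bits. */
    static BitSet sliceForSegment(BitSet topLevelBits, int docBase, int maxDoc) {
        // BitSet.get(from, to) copies the range and re-bases it at bit 0,
        // which matches segment-local doc ids.
        return topLevelBits.get(docBase, docBase + maxDoc);
    }

    public static void main(String[] args) {
        // Hypothetical index: segment 1 holds docs 0-1, segment 2 holds docs 2-4.
        String[] keys = {"a", "b", "a", "c", "b"};
        BitSet top = keepFirstPerKey(keys);
        System.out.println("segment 1: " + sliceForSegment(top, 0, 2));
        System.out.println("segment 2: " + sliceForSegment(top, 2, 3));
    }
}
```

In this toy index, a per-segment dedup would keep all three docs of the second segment, because "a" and "b" are each new within that segment. The slice of the top-level bits keeps only the doc with key "c".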

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment 
> readers
> -
>
> Key: LUCENE-2348
> URL: https://issues.apache.org/jira/browse/LUCENE-2348
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9.2
>Reporter: Trejkaz
>
> DuplicateFilter currently works by building a single doc ID set, without 
> taking into account that getDocIdSet() will be called once per segment and 
> only with each segment's local reader.




[jira] Issue Comment Edited: (SOLR-1680) Provide an API to specify custom Collectors

2010-06-02 Thread Jan Kurella (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874524#action_12874524
 ] 

Jan Kurella edited comment on SOLR-1680 at 6/2/10 6:23 AM:
---

Could the "streamlining" be done with a simple approach?

According to the linked ticket, it seems to be quite simple.
The code fragment
{code:java}
    if( timeAllowed > 0 ) {
      collector = new TimeLimitingCollector(collector, timeAllowed);
    }
    try {
      super.search(query, luceneFilter, collector);
    }
    catch( TimeLimitingCollector.TimeExceededException x ) {
      log.warn( "Query: " + query + "; " + x.getMessage() );
      qr.setPartialResults(true);
    }
{code}
is repeated several times throughout SolrIndexSearcher.

It should be enough to move this into a separate function and wrap the 
collector with any custom collector there (in one place):
{code:java}
  // neededParams is a placeholder for whatever arguments the method needs
  private Collector doSearch(neededParams) {
    if( timeAllowed > 0 ) {
      collector = new TimeLimitingCollector(collector, timeAllowed);
    }
    if( customCollector != null ) {
      // the custom collector wraps whatever has been built so far
      customCollector.setInnerCollector(collector);
      collector = customCollector;
    }
    try {
      super.search(query, luceneFilter, collector);
    }
    catch( TimeLimitingCollector.TimeExceededException x ) {
      log.warn( "Query: " + query + "; " + x.getMessage() );
      qr.setPartialResults(true);
    }
    return collector;
  }
{code}

The custom collector then needs to be retrieved through whatever plugin 
concept is chosen.

??
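
To make the wrapping order concrete, here is a small self-contained sketch. It does not use the Lucene or Solr classes: DocCollector, EvenDocsOnly, BudgetLimitingCollector and BudgetExceededException are hypothetical stand-ins, and the "time limit" is modelled as a fixed budget of collected docs so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectorChainSketch {

    /** Hypothetical stand-in for a collector; not Lucene's Collector class. */
    interface DocCollector {
        void collect(int doc);
    }

    /** Hypothetical custom collector that forwards even doc ids to an inner collector. */
    static final class EvenDocsOnly implements DocCollector {
        private DocCollector inner;

        void setInnerCollector(DocCollector inner) {
            this.inner = inner;
        }

        @Override
        public void collect(int doc) {
            if (doc % 2 == 0) {
                inner.collect(doc);
            }
        }
    }

    /** Thrown when the budget is used up (stand-in for a time-exceeded exception). */
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) {
            super(message);
        }
    }

    /** Stand-in for a time limit: stops after a fixed number of collected docs. */
    static final class BudgetLimitingCollector implements DocCollector {
        private final DocCollector inner;
        private final int budget;
        private int collected;

        BudgetLimitingCollector(DocCollector inner, int budget) {
            this.inner = inner;
            this.budget = budget;
        }

        @Override
        public void collect(int doc) {
            if (collected >= budget) {
                throw new BudgetExceededException("budget of " + budget + " docs exceeded");
            }
            collected++;
            inner.collect(doc);
        }
    }

    /** The single place where wrapping happens; returns true if results are partial. */
    static boolean doSearch(int[] matchingDocs, DocCollector collector, int budget, EvenDocsOnly custom) {
        if (budget > 0) {
            collector = new BudgetLimitingCollector(collector, budget);
        }
        if (custom != null) {
            custom.setInnerCollector(collector);
            collector = custom;
        }
        try {
            for (int doc : matchingDocs) {
                collector.collect(doc);
            }
            return false;
        } catch (BudgetExceededException x) {
            return true; // corresponds to qr.setPartialResults(true)
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        boolean partial = doSearch(new int[] {0, 1, 2, 3, 4, 5}, doc -> hits.add(doc), 2, new EvenDocsOnly());
        System.out.println("hits=" + hits + " partial=" + partial);
    }
}
```

Because the custom collector is applied last, it is outermost: it sees every matching doc, and the limit only counts what it lets through. Whether that ordering is the desired one for SOLR-1680 is an open design question.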


> Provide an API to specify custom Collectors
> ---
>
> Key: SOLR-1680
> URL: https://issues.apache.org/jira/browse/SOLR-1680
> Project: Solr
>  Issue Type: Sub-task
>  Components: search
>Affects Versions: 1.3
>Reporter: Martijn van Groningen
> Fix For: Next
>
> Attachments: field-collapse-core.patch, SOLR-1680.patch
>
>
> The issue is dedicated to incorporate fieldcollapse's changes to the Solr's 
> core code. 
> We want to make it possible for components to specify custom Collectors in 
> SolrIndexSearcher methods.




[jira] Resolved: (LUCENE-2135) IndexReader.close should forcefully evict entries from FieldCache

2010-06-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2135.


Resolution: Fixed

> IndexReader.close should forcefully evict entries from FieldCache
> -
>
> Key: LUCENE-2135
> URL: https://issues.apache.org/jira/browse/LUCENE-2135
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 2.9.3, 3.0.2, 3.1, 4.0
>
> Attachments: LUCENE-2135.patch, LUCENE-2135.patch, LUCENE-2135.patch
>
>
> Spinoff of java-user thread "heap memory issues when sorting by a string 
> field".
> We rely on WeakHashMap to hold our FieldCache, keyed by reader.  But this 
> lacks immediacy on releasing the reference, after a reader is closed.
> WeakHashMap can't free the key until the reader is no longer referenced by 
> the app. And, apparently, WeakHashMap has a further impl detail that requires 
> invoking one of its methods for it to notice that a key has just become only 
> weakly reachable.
> To fix this, I think on IR.close we should evict entries from the FieldCache, 
> as long as the sub-readers are truly closed (refCount dropped to 0).
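
The eviction-on-close idea described above can be sketched in plain Java. This is only an illustration: FakeReader and the static CACHE map are hypothetical stand-ins, not the actual IndexReader or FieldCache code, and the cached value is just an int array.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

final class FieldCacheEvictionSketch {

    /** Cache keyed by reader. Weak keys alone only free an entry some time after GC notices. */
    static final Map<FakeReader, int[]> CACHE =
            Collections.synchronizedMap(new WeakHashMap<FakeReader, int[]>());

    /** Hypothetical ref-counted reader; not Lucene's IndexReader. */
    static final class FakeReader {
        private int refCount = 1;

        synchronized void incRef() {
            refCount++;
        }

        /** Drops one reference and evicts the cache entry once the reader is truly closed. */
        synchronized void close() {
            refCount--;
            if (refCount == 0) {
                // Explicit removal: does not wait for the app to drop its
                // reference or for the map to expunge stale entries.
                CACHE.remove(this);
            }
        }
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader();
        CACHE.put(reader, new int[] {1, 2, 3});
        reader.incRef();   // a second user shares the reader
        reader.close();    // refCount 2 -> 1, entry stays
        System.out.println("after first close:  " + CACHE.containsKey(reader));
        reader.close();    // refCount 1 -> 0, entry is evicted
        System.out.println("after second close: " + CACHE.containsKey(reader));
    }
}
```

The point of the explicit remove is that the entry disappears even though the caller still holds a strong reference to the closed reader, which is exactly the case where a WeakHashMap on its own keeps the memory alive.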




[jira] Commented: (SOLR-1915) DebugComponent should use NamedList to output Explanations instead of Explanation.toString()

2010-06-02 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874491#action_12874491
 ] 

Erik Hatcher commented on SOLR-1915:


debug.explain.structured=true ??



> DebugComponent should use NamedList to output Explanations instead of 
> Explanation.toString()
> 
>
> Key: SOLR-1915
> URL: https://issues.apache.org/jira/browse/SOLR-1915
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-1915.patch
>
>
> DebugComponent currently uses Explanation.toString() to "format" score 
> explanations for each document as plain text with whitespace indenting to 
> denote the hierarchical relationship, and then adds those explanations to the 
> SolrQueryResponse.
> Instead DebugComponent should transform the Explanation objects into 
> NamedLists so that the full structure can be formatted in a logical way by 
> the ResponseWriter
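
A rough sketch of the transformation described above, assuming nothing about the attached patch. Node is a hypothetical stand-in for Lucene's Explanation, and a LinkedHashMap stands in for Solr's NamedList; the key names "value", "description" and "details" are also assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StructuredExplainSketch {

    /** Hypothetical stand-in for a score explanation node. */
    static final class Node {
        final float value;
        final String description;
        final List<Node> details = new ArrayList<>();

        Node(float value, String description) {
            this.value = value;
            this.description = description;
        }

        Node add(Node child) {
            details.add(child);
            return this;
        }
    }

    /** Recursively converts the tree into nested maps and lists, preserving order. */
    static Map<String, Object> toMap(Node node) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("value", node.value);
        out.put("description", node.description);
        if (!node.details.isEmpty()) {
            List<Map<String, Object>> children = new ArrayList<>();
            for (Node child : node.details) {
                children.add(toMap(child));
            }
            out.put("details", children);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(3.0f, "sum of:")
                .add(new Node(1.0f, "tf"))
                .add(new Node(2.0f, "idf"));
        System.out.println(toMap(root));
    }
}
```

With the hierarchy kept as data rather than as indented text, each response writer can render it in whatever nesting its format supports.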




Re: Solr updateRequestHandler and performance vs. atomicity

2010-06-02 Thread Otis Gospodnetic
While preparing material for 
http://blog.sematext.com/2010/06/02/lucene-digest-may-2010-3/ I came across 
something that looks relevant:

https://issues.apache.org/jira/browse/LUCENE-2456

...where the author wrote this:

"In conclusion, this directory attempts to marry the rich search-based 
query language of Lucene with the distributed fault-tolerant database 
that is Cassandra. By delegating the responsibilities of replication, 
durability and elasticity to the directory, we free the layers above 
from such non-functional concerns. Our hope is that users will choose to make 
their large-scale indices instantly scalable by seamlessly 
migrating them to this type of directory (using 
Directory#copyTo(Directory))."
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Yonik Seeley 
> To: dev@lucene.apache.org
> Sent: Tue, May 25, 2010 8:59:29 AM
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
> 
> On Mon, May 24, 2010 at 9:10 AM, <karl.wri...@nokia.com> wrote:
> > In particular, it would be nice to be able to post documents in such a way
> > that you can guarantee that the document is permanently in Solr’s queue,
> > safe in the event of a Solr restart, etc., even if the document has not yet
> > been “committed”.
> 
> Yep, this is a longer term goal of SolrCloud.
> And to be truly safe, committing to stable storage is not enough -
> that still might crash and never recover.  One needs to write to
> multiple nodes.
> 
> -Yonik
> http://www.lucidimagination.com
