[Lucene.Net] [jira] [Created] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual

2011-06-16 Thread Itamar Syn-Hershko (JIRA)
Mark BaseFragmentsBuilder methods as virtual


 Key: LUCENENET-426
 URL: https://issues.apache.org/jira/browse/LUCENENET-426
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, 
Lucene.Net 2.9.4g
Reporter: Itamar Syn-Hershko
Priority: Minor


Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless 
to have FragmentsBuilder deriving from a class named Base, since most of its 
functionality cannot be overridden. Attached is a patch for marking the 
important methods virtual.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Lucene.Net] [jira] [Updated] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual

2011-06-16 Thread Itamar Syn-Hershko (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Itamar Syn-Hershko updated LUCENENET-426:
-

Attachment: fvh.patch

Patch fixing this

 Mark BaseFragmentsBuilder methods as virtual
 

 Key: LUCENENET-426
 URL: https://issues.apache.org/jira/browse/LUCENENET-426
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, 
 Lucene.Net 2.9.4g
Reporter: Itamar Syn-Hershko
Priority: Minor
 Attachments: fvh.patch


 Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless 
 to have FragmentsBuilder deriving from a class named Base, since most of 
 its functionality cannot be overridden. Attached is a patch for marking the 
 important methods virtual.





[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050236#comment-13050236
 ] 

Noble Paul commented on SOLR-2593:
--

bq. the fq type option is basically going to require making a full copy of the 
index and then deleting by query...

Lucene does it better. We can pass a filtered index to a new writer and it 
creates a new index with only those docs. I was surprised at the speed at which 
it split a dummy 1 million doc index: in < 1 sec.





 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot, it would be desirable to split it out to another 
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x% 
 * fq=user:johndoe
 example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2593) A new core admin action 'split' for splitting index

2011-06-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-2593:
-

Description: 
If an index is too large/hot, it would be desirable to split it out to another 
core.
This core may eventually be replicated out to another host.

There can be multiple strategies:
* random split of x or x% 
* fq=user:johndoe


example:
action=split&split=20percent&newcore=my_new_index
or
action=split&fq=user:johndoe&newcore=john_doe_index







  was:
If an index is too large/hot it would be desirable to split it out to another 
core .
This core may eventually be replicated out to another host.

There can be to be multiple strategies 
* random split of x or x% 
* fq=user:johndoe

example 
example :
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index







Summary: A new core admin action 'split' for splitting index  (was: A 
new core admin command 'split' for splitting index)

 A new core admin action 'split' for splitting index
 ---

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0





[jira] [Created] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)
FST package API refactoring
---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0


The current API is still marked @experimental, so I think there's still time to 
fiddle with it. I've been using the current API for some time and I do have 
some ideas for improvement. This is a placeholder for these -- I'll post a 
patch once I have a working proof of concept.




[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics

2011-06-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3174:


Attachment: LUCENE-3174.patch

Here's an updated patch. I pushed query normalization into the Stats and 
removed idf/etc. from the weight impls.

I think this is close: all tests pass except TestCustomScoreQuery (it's some 
explanation problem). I'm this close to @Ignoring it, since neither the query 
nor the test makes any sense.

 Similarity.Stats class for term  collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
 LUCENE-3174.patch, LUCENE-3174_normalize_boost.patch


 In order to support ranking methods besides TF-IDF, we need to make the 
 statistics they need available. These statistics could be computed in 
 computeWeight (soon to become computeStats) and stored in a separate object 
 for easy access. Since this object will be used solely by subclasses of 
 Similarity, it should be implemented as a static inner class, i.e. 
 Similarity.Stats.
 There are two ways this could be implemented:
 - as a single Similarity.Stats class, reused by all ranking algorithms. In 
 this case, this class would have a member field for all statistics;
 - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
 subclass would define only the statistics needed for the ranking algorithm.
 In the second case, the Stats class in DefaultSimilarity would have a single 
 field, idf, while the one in e.g. BM25Similarity would have idf and average 
 field/document length.
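
The second option (a hierarchy of Stats classes) could be sketched roughly as follows. This is only an illustration under stated assumptions: every Toy* name, the computeStats signature, and the shared idf formula are hypothetical placeholders, not code from any LUCENE-3174 patch.

```java
// Hypothetical sketch of the "hierarchy of Stats classes" option.
// None of these Toy* names exist in Lucene; they only illustrate the shape.
abstract class ToySimilarity {
    /** Base type for per-term statistics; deliberately has no fields. */
    static class Stats {
    }

    /** Stand-in for computeWeight / computeStats (signature is an assumption). */
    abstract Stats computeStats(long docFreq, long numDocs, float avgFieldLength);

    /** Placeholder idf formula, shared by both toy subclasses for brevity. */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

class ToyDefaultSimilarity extends ToySimilarity {
    /** TF-IDF only needs a single field, idf. */
    static class TfIdfStats extends Stats {
        final float idf;

        TfIdfStats(float idf) {
            this.idf = idf;
        }
    }

    @Override
    Stats computeStats(long docFreq, long numDocs, float avgFieldLength) {
        return new TfIdfStats(idf(docFreq, numDocs));
    }
}

class ToyBM25Similarity extends ToySimilarity {
    /** BM25 needs idf plus the average field/document length. */
    static class BM25Stats extends Stats {
        final float idf;
        final float avgFieldLength;

        BM25Stats(float idf, float avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    @Override
    Stats computeStats(long docFreq, long numDocs, float avgFieldLength) {
        return new BM25Stats(idf(docFreq, numDocs), avgFieldLength);
    }
}
```

With this shape each ranking algorithm only carries the statistics it needs, at the cost of a downcast wherever the scorer reads them back. The single-class option would avoid the cast but carry unused fields.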




[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050243#comment-13050243
 ] 

Noble Paul commented on SOLR-2592:
--

This is why I created the issue SOLR-1431 . It may have a configuration as 
follows

{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- other params go here -->
 
    <shardHandler class="CloudShardHandler"/>
</requestHandler>
{code}



By default, the CloudShardHandler should look up ZK and return all the 
shards. 


I should be able to write a custom FqFilterCloudShardHandler and narrow down 
the requests to one or more shards:
{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- other params go here -->
 
    <shardHandler class="FqFilterCloudShardHandler"/>
</requestHandler>
{code}




 Pluggable shard lookup mechanism for SolrCloud
 --

 Key: SOLR-2592
 URL: https://issues.apache.org/jira/browse/SOLR-2592
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Affects Versions: 4.0
Reporter: Noble Paul

 If the data in a cloud can be partitioned on some criteria (say range, hash, 
 attribute value, etc.), it will be easy to narrow down the search to a smaller 
 subset of shards and in effect achieve a more efficient search.  




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050245#comment-13050245
 ] 

Noble Paul commented on SOLR-1431:
--

What are the concerns with the latest patch? I can work on them. I guess this 
is the optimal way to resolve SOLR-2592




 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics

2011-06-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3174:


Attachment: LUCENE-3174.patch

here's the patch with the unrelated bug fixed in CustomScoreQuery.

now all tests pass.

 Similarity.Stats class for term  collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
 LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174_normalize_boost.patch





[jira] [Created] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)
CustomScoreQuery calls weight() where it should call createWeight()
---

 Key: LUCENE-3207
 URL: https://issues.apache.org/jira/browse/LUCENE-3207
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3207.patch

Thanks to Uwe for helping me track down this bug after I pulled my hair out for 
hours on LUCENE-3174.




[jira] [Updated] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3207:


Attachment: LUCENE-3207.patch




[jira] [Commented] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050263#comment-13050263
 ] 

Robert Muir commented on LUCENE-3207:
-

As an explanation, this causes the query to call sumOfSquaredWeights + 
queryNorm + normalize() twice.

The reason it doesn't cause any tests to fail in trunk is this:
in trunk, sumOfSquaredWeights is not really a getter; it's also a setter:
{noformat}
@Override
public float sumOfSquaredWeights() {
  queryWeight = idf * getBoost(); // compute query weight
  return queryWeight * queryWeight;   // square it
}
{noformat}

In my patch on LUCENE-3174, sumOfSquaredWeights returns queryWeight * 
queryWeight but doesn't reset any state,
so you end up normalizing twice, and that's why the test failed on the branch.
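
A toy model of why the resetting variant hides the double call while the stateless one exposes it might look like this. It is a sketch under assumptions: the classes are hypothetical (not Lucene's Weight API), idf and boost are made-up constants, and a fixed queryNorm of 0.5 stands in for a norm computed across a larger query.

```java
// Toy model only: these types are hypothetical and are not Lucene's Weight API.
interface ToyWeight {
    float sumOfSquaredWeights();

    void normalize(float queryNorm);

    float queryWeight();
}

/** Mirrors the trunk behaviour quoted above: the "getter" also resets state. */
class ResettingWeight implements ToyWeight {
    private final float idf = 2.0f;   // made-up value
    private final float boost = 1.0f; // made-up value
    private float queryWeight;

    public float sumOfSquaredWeights() {
        queryWeight = idf * boost;        // recomputed on every call
        return queryWeight * queryWeight;
    }

    public void normalize(float queryNorm) {
        queryWeight *= queryNorm;
    }

    public float queryWeight() {
        return queryWeight;
    }
}

/** Mirrors the branch behaviour: a pure getter that resets nothing. */
class StatelessWeight implements ToyWeight {
    private float queryWeight = 2.0f * 1.0f; // idf * boost, computed once

    public float sumOfSquaredWeights() {
        return queryWeight * queryWeight;
    }

    public void normalize(float queryNorm) {
        queryWeight *= queryNorm;
    }

    public float queryWeight() {
        return queryWeight;
    }
}

class DoubleNormalizationDemo {
    /** Runs the sumOfSquaredWeights + normalize sequence the given number of times. */
    static float run(ToyWeight weight, int passes, float queryNorm) {
        for (int i = 0; i < passes; i++) {
            weight.sumOfSquaredWeights();
            weight.normalize(queryNorm);
        }
        return weight.queryWeight();
    }

    public static void main(String[] args) {
        System.out.println("resetting, 2 passes: " + run(new ResettingWeight(), 2, 0.5f));
        System.out.println("stateless, 2 passes: " + run(new StatelessWeight(), 2, 0.5f));
    }
}
```

In this toy the resetting weight ends at the same value after one pass or two, so a test cannot see the extra pass. The stateless weight is scaled twice, which is the kind of discrepancy the failing test picked up.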





[jira] [Commented] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050264#comment-13050264
 ] 

Uwe Schindler commented on LUCENE-3207:
---

This bug is stupid: I had a similar issue during the rewrite of 
ConstantScoreQuery to directly wrap queries, where I copied some code from 
CustomScoreQuery (just removed the custom scoring). I fixed it in Constant*, 
not sure why I left CustomScoreQuery unchanged. Maybe because tests passed.




[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics

2011-06-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3174:


Attachment: LUCENE-3174.patch

i fixed a few problems: javadocs warnings and also the fact that i had left an 
assert commented out from hair-pulling with CustomScoreQuery.


 Similarity.Stats class for term  collection statistics
 ---

 Key: LUCENE-3174
 URL: https://issues.apache.org/jira/browse/LUCENE-3174
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
 Fix For: flexscoring branch

 Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
 LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
 LUCENE-3174_normalize_boost.patch





[jira] [Created] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)
Move Query.weight() to IndexSearcher as protected method


 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0


We had this issue several times, latest in LUCENE-3207.

The method Query.weight() was left in Query for backwards-compatibility reasons 
in Lucene 2.9 when we changed the Weight class. This method is only to be called 
on top-level queries, and this is done by IndexSearcher. It is just a utility 
method that has nothing to do with the query itself (it just combines the 
createWeight method and calls the normalization afterwards). 

The problem is that when a query that wraps other queries (like 
CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
Query.createWeight(), normalization is done twice, leading to strange 
bugs.

For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
replacement method with a big deprecation warning, so users see this. In 
IndexSearcher itself the method will be protected, so it can only be called by 
IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
problem, as protected members are accessible by classes in the same package.

I would suggest the method name to be 
IndexSearcher.createNormalizedWeight(Query q)
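
The call pattern described above can be sketched with a few hypothetical stand-in classes. None of the Sketch* names are Lucene classes, and the constant sum of squared weights is made up. The sketch only counts how often the leaf weight gets normalized when a wrapper calls weight() versus createWeight() on its sub-query.

```java
// Hypothetical stand-ins for Query/Weight; not the real Lucene classes.
class SketchWeight {
    private final SketchWeight inner; // null for a leaf weight
    int normalizeCalls = 0;

    SketchWeight(SketchWeight inner) {
        this.inner = inner;
    }

    float sumOfSquaredWeights() {
        // A leaf reports a made-up constant; a wrapper delegates to its sub-weight.
        return inner == null ? 4.0f : inner.sumOfSquaredWeights();
    }

    void normalize(float queryNorm) {
        normalizeCalls++;
        if (inner != null) {
            inner.normalize(queryNorm);
        }
    }

    SketchWeight leaf() {
        return inner == null ? this : inner.leaf();
    }
}

abstract class SketchQuery {
    /** Builds an un-normalized weight. */
    abstract SketchWeight createWeight();

    /** Top-level helper in the spirit of Query.weight(): create, then normalize. */
    final SketchWeight weight() {
        SketchWeight w = createWeight();
        float norm = (float) (1.0 / Math.sqrt(w.sumOfSquaredWeights()));
        w.normalize(norm);
        return w;
    }
}

class LeafQuery extends SketchQuery {
    @Override
    SketchWeight createWeight() {
        return new SketchWeight(null);
    }
}

class WrappingQuery extends SketchQuery {
    private final SketchQuery sub;
    private final boolean callsWeightOnSub; // true reproduces the bug pattern

    WrappingQuery(SketchQuery sub, boolean callsWeightOnSub) {
        this.sub = sub;
        this.callsWeightOnSub = callsWeightOnSub;
    }

    @Override
    SketchWeight createWeight() {
        // Bug pattern: sub.weight() normalizes the sub-weight before the
        // top-level normalization pass reaches it a second time.
        SketchWeight subWeight = callsWeightOnSub ? sub.weight() : sub.createWeight();
        return new SketchWeight(subWeight);
    }
}
```

Calling weight() on a WrappingQuery built with callsWeightOnSub set to true normalizes the leaf twice; with false it is normalized once. Making the helper callable only from the searcher, as proposed, would rule out the first case by construction.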




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050296#comment-13050296
 ] 

Robert Muir commented on LUCENE-3208:
-

+1




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050301#comment-13050301
 ] 

Simon Willnauer commented on LUCENE-3208:
-

+1




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050316#comment-13050316
 ] 

Uwe Schindler commented on LUCENE-3208:
---

I started to rewrite some stuff, very straightforward.

- BufferedDeletesStream has to be changed as it was also calling Query.weight, 
but I replaced the usage here with QueryWrapperFilter and getting the DocIdSet. 
The code gets much simpler here.
- QueryWrapperFilter's hack was rewritten, easy.
- In TestFrameWork, QueryUtils was also rewritten; it often uses weight, but 
that's internal only.

The main issue:
IndexSearcher already has a method called createWeight(Query) (which 
currently delegates to the Query). I moved the code over here. I still have to 
complain about the name: it creates a Weight, yes, but the name should also say 
that it rewrites and normalizes the weight. So I would like to rename that 
method too, and deprecate the old one.

For now I leave the name unchanged. Patch comes soon (core only).




[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I made the changes. I also fixed test-framework but haven't touched the test 
cases yet. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different; we could in theory 
 share del docs, norms, etc., if that were somehow possible.
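
As a rough illustration of the idea, a context object along these lines could carry the buffer size and the flags together. This is a hypothetical sketch: ToyIOContext, ToyDirectory, the factory method names, and the buffer sizes are all assumptions, and whatever IOContext comes out of this issue may look quite different.

```java
// Hypothetical sketch only: not the IOContext from any LUCENE-2793 patch.
final class ToyIOContext {
    enum Usage { SEARCH, MERGE }

    final Usage usage;
    final int readBufferSize;
    final boolean direct;     // bypass the OS buffer cache
    final boolean sequential; // hint that reads will be sequential

    private ToyIOContext(Usage usage, int readBufferSize, boolean direct, boolean sequential) {
        this.usage = usage;
        this.readBufferSize = readBufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    /** Small buffer, no special flags; the size is illustrative. */
    static ToyIOContext forSearch() {
        return new ToyIOContext(Usage.SEARCH, 1024, false, false);
    }

    /** Larger buffer plus DIRECT/SEQUENTIAL, used only while merging. */
    static ToyIOContext forMerge() {
        return new ToyIOContext(Usage.MERGE, 4096, true, true);
    }
}

/** Toy directory that only reports how it would open a file for a given context. */
class ToyDirectory {
    String describeOpenInput(String name, ToyIOContext context) {
        String mode = context.direct ? "direct" : "buffered";
        return name + ": " + mode + ", buffer=" + context.readBufferSize;
    }
}
```

Passing one object instead of a bare readBufferSize means new hints can be added later without changing the openInput and createOutput signatures again.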




[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1431:
-

Attachment: SOLR-1431.patch

Even the checkDistributed() method is abstracted out to ShardHandler. The 
current HttpShardHandler (the default) also takes care of ZooKeeper.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Assigned] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1431:


Assignee: Noble Paul  (was: Mark Miller)

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050340#comment-13050340
 ] 

Mark Miller commented on SOLR-1431:
---

I can look at this latest patch soon, Noble. We should also give Jason a fair 
amount of time to weigh in.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050339#comment-13050339
 ] 

Noble Paul commented on SOLR-1431:
--

This might need some more cleanup, but I think it is close to a state where it 
can be checked in. 



 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Updated] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3206:


Attachment: LUCENE-3206.patch

An empty (but compiling and consistent) take on the FST/FSA API.

 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050347#comment-13050347
 ] 

Dawid Weiss commented on LUCENE-3206:
-

This is my take on the revamped FST API. My changes mostly aim at having 
a bit clearer code (especially with respect to loops), but also at detaching the 
algebra of a transition's output from the actual output. This should allow us to 
create an output algebra that would work directly on mutable integers, for example 
(to save on autoboxing). I also just like the way it reads after the changes:
{code}
  FST<Integer> fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
    .add(abc, 10)
    .add(abc, 5)
    .add(def, 0, 3), 2)
    .build();
{code}
or a loop over all arcs of a state:
{code}
  Arc<Integer> arc = fst.getRoot();
  for (Arc<Integer> tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
    int label = tmp.getLabel(); // transition label here.
    Integer output = tmp.getOutput(); // FSAs have a constant empty output.
  }
{code}

I definitely didn't consider all the use cases that FSTs are used for currently 
(in particular the stop bit indicating non-accepted input sequences that are 
also dead ends), but I think these could be integrated... I think :) 

Arcs now also store the pointer to the FST object, which may seem like an 
overhead, but I doubt it really will be (it's a single pointer and we buffer 
arcs whenever we can; a larger waste is having an object on each arc's output, 
even if it can be a primitive type or reused buffer).
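
To illustrate what detaching the output algebra from the output values could look like, here is a minimal Java sketch. The names OutputAlgebra and PositiveIntAlgebra are hypothetical and are not taken from the attached patch; the operations shown (neutral element, common part, subtraction, addition) are simply the ones a builder would plausibly need in order to push shared output toward the root.

```java
/** Hypothetical algebra over arc outputs; not the API from LUCENE-3206.patch. */
interface OutputAlgebra<T> {
    /** The neutral element: adding it changes nothing. */
    T zero();

    /** The part of the output that both arguments share. */
    T common(T a, T b);

    /** What remains of {@code a} once {@code prefix} has been taken out. */
    T subtract(T a, T prefix);

    /** Combines a prefix with a remainder. */
    T add(T prefix, T rest);
}

/** Algebra for non-negative integer outputs: the shared part is the minimum. */
final class PositiveIntAlgebra implements OutputAlgebra<Integer> {
    @Override public Integer zero() { return 0; }
    @Override public Integer common(Integer a, Integer b) { return Math.min(a, b); }
    @Override public Integer subtract(Integer a, Integer prefix) { return a - prefix; }
    @Override public Integer add(Integer prefix, Integer rest) { return prefix + rest; }
}

final class OutputAlgebraDemo {
    public static void main(String[] args) {
        OutputAlgebra<Integer> algebra = new PositiveIntAlgebra();
        int shared = algebra.common(10, 5);       // would move onto the shared arc
        int restA = algebra.subtract(10, shared); // stays on the first branch
        int restB = algebra.subtract(5, shared);  // stays on the second branch
        System.out.println(shared + " " + restA + " " + restB);
    }
}
```

This sketch still boxes to Integer; a variant working on a mutable int holder, as suggested above, would implement the same four operations without allocating per arc.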




 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Issue Comment Edited] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050347#comment-13050347
 ] 

Dawid Weiss edited comment on LUCENE-3206 at 6/16/11 11:01 AM:
---

This is my take on the revamped FST API. My changes mostly aim at having 
a bit clearer code (especially with respect to loops), but also at detaching the 
algebra of a transition's output from the actual output. This should allow us to 
create an output algebra that would work directly on mutable integers, for example 
(to save on autoboxing). I also just like the way it reads after the changes:
{code}
  FST<Integer> fst = FSTBuilder.fst(FST.ArcLabel.BYTE2, PositiveInt.class)
    .add(abc, 10)
    .add(abc, 5)
    .add(def, 0, 3), 2)
    .build();
{code}
or a loop over all arcs of a state:
{code}
  Arc<Integer> arc = fst.getRoot();
  for (Arc<Integer> tmp = arc.copy(); tmp.hasNext(); tmp.next()) {
    int label = tmp.getLabel(); // transition label here.
    Integer output = tmp.getOutput();
  }
{code}

I definitely didn't consider all the use cases that FSTs are used for currently 
(in particular the stop bit indicating non-accepted input sequences that are 
also dead ends), but I think these could be integrated... I think :) 

Arcs now also store the pointer to the FST object, which may seem like an 
overhead, but I doubt it really will be (it's a single pointer and we buffer 
arcs whenever we can; a larger waste is having an object on each arc's output, 
even if it can be a primitive type or reused buffer).




 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Created] (SOLR-2601) Create a MessagePackResponseWriter

2011-06-16 Thread Noble Paul (JIRA)
Create a MessagePackResponseWriter
--

 Key: SOLR-2601
 URL: https://issues.apache.org/jira/browse/SOLR-2601
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor


In the past I explored various standard communication formats for Solr. None 
of them was very suitable. MessagePack seems to be a suitable format.




[jira] [Updated] (SOLR-1331) Support merging multiple cores

2011-06-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1331:


Attachment: SOLR-1331.patch

Adds a srcCore (multi-valued) parameter through which one or more source core 
names can be given.

We use the IW.addIndexes(IndexReader...) method to merge the source cores' 
indexes into the target core's index. Even if an IW is open on the source 
indexes, using a reader protects against corruption.

Note - although the indexDir param also ends up calling the 
IW.addIndexes(IndexReader...) method, we cannot protect against open IWs on the 
directory so the caveat of calling commit before using mergeindexes with 
indexDir param still applies.

A commit needs to be called after a merge action to see the changes.
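
As a rough illustration of how the multi-valued srcCore parameter might be used from client code, the following Java sketch builds a CoreAdmin request URL. The host, port, core names and the /admin/cores path are assumptions for the example (the admin path is configurable), and the helper class MergeIndexesUrl is hypothetical; only the mergeindexes action and the srcCore parameter come from the description above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that assembles a mergeindexes request URL. */
final class MergeIndexesUrl {
    private MergeIndexesUrl() {}

    /** Repeats the srcCore parameter once per source core. */
    static String build(String solrBaseUrl, String targetCore, String... srcCores) {
        StringBuilder url = new StringBuilder(solrBaseUrl)
            .append("/admin/cores?action=mergeindexes&core=")
            .append(encode(targetCore));
        for (String srcCore : srcCores) {
            url.append("&srcCore=").append(encode(srcCore));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local setup with a target core "core0" and two source cores.
        System.out.println(build("http://localhost:8983/solr", "core0", "core1", "core2"));
    }
}
```

After the merge request succeeds, a commit on the target core is still needed before the merged documents become visible.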

 Support merging multiple cores
 --

 Key: SOLR-1331
 URL: https://issues.apache.org/jira/browse/SOLR-1331
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 3.3

 Attachments: SOLR-1331.patch


 There should be a provision to merge one core with another. It should be 
 possible to create a core, add documents to it and then just merge it into 
 the main core which is serving requests. This way, the user will not need to 
 know the filesystem as it is needed for SOLR-1051




[jira] [Commented] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050370#comment-13050370
 ] 

Mike Sokolov commented on SOLR-2597:


OK - I can extend LuceneTestCase, use its random, and can certainly add a test for 
the Factory.

I'm not sure what the right package for this code is; working in Eclipse of 
course, all the jars get mushed into one giant classpath.  I guess I should 
build w/ant to see the dependency issues?  But it does sound as if it needs to 
move somewhere where solr/lib contents can be a dependency.

Apparently there is another jar you can get 
(http://woodstox.codehaus.org/stax-api-1.0.1.jar) to provide the 
javax.xml.stream package (StAX) for Java 5, but it doesn't sound as if it would 
be worth the trouble if this moves into solr land - is that right, can we rely 
on Java 6 there? 

I agree that having a static parser is distasteful, but it's a performance 
optimization.  It tends to be expensive to instantiate these parsers.  I'm not 
clear on what the object lifecycle for the XmlCharFilter is exactly - Robert, 
are you saying the factory is long-lived, but the filter is not?

 XmlCharFilter
 -

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
 Attachments: SOLR-2597.patch


 This CharFilter processes incoming XML using the Woodstox parser, stripping 
 all non-text content and remembering offsets, just like HTMLCharFilter, but 
 respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
 also provides the ability to exclude (and include) the content of certain 
 named elements.
 In order to compute character offsets properly when mixed line termination 
 styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
 &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
 currently in solr/lib.  The earlier versions of the parser could not report 
 these entity events, so we couldn't tell the difference between < and 
 &lt; and the offsets could be wrong.  The upgraded version is in the patch.




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050382#comment-13050382
 ] 

Michael McCandless commented on LUCENE-3208:


+1

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208.patch


 We had this issue several times, latest in LUCENE-3207.
 The method Query.weight() was left in Query for backwards-compatibility reasons 
 in Lucene 2.9 when we changed the Weight class. This method is only to be called 
 on top-level queries - and this is done by IndexSearcher. This method is just a 
 utility method that has nothing to do with the query itself (it just 
 combines the createWeight method and calls the normalization afterwards). 
 The problem we have is that if any query that wraps other queries (like 
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
 Query.createWeight(), it will do normalization two times, leading to strange 
 bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
 replacement method with a big deprecation warning, so the user sees this. In 
 IndexSearcher itself the method will be protected, so that it can only be called 
 by IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
 problem, as protected is accessible by classes in the same package.
 I would suggest the method name to be 
 IndexSearcher.createNormalizedWeight(Query q)




[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208.patch

Here is the patch with IndexSearcher.createWeight renamed to 
createNormalizedWeight() and made public/expert, so Solr and custom 
search code can access it.

I am currently thinking about a way to check that each Weight is only 
normalized once, possibly using setOnce(). It's not easy to do; maybe wrap 
the Weight returned by the IndexSearcher method in a WrappedWeight that 
throws UOE on normalize.
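
The normalize-once idea can be sketched in a few lines of Java. This is only an illustration under assumptions: NormalizeOnceWeight is a hypothetical stand-in, not Lucene's Weight class and not the WrappedWeight mentioned above, and the normalization is reduced to a single multiplication.

```java
/** Hypothetical stand-in for a Weight; only the normalize-once guard is shown. */
final class NormalizeOnceWeight {
    private final float rawValue;
    private float value;
    private boolean normalized;

    NormalizeOnceWeight(float rawValue) {
        this.rawValue = rawValue;
        this.value = rawValue;
    }

    /** Applies the normalization factor; a second call is rejected. */
    void normalize(float norm) {
        if (normalized) {
            throw new UnsupportedOperationException("weight was already normalized");
        }
        value = rawValue * norm;
        normalized = true;
    }

    float getValue() {
        return value;
    }
}
```

With such a guard, a wrapping query that normalizes an already normalized weight would fail fast instead of silently producing skewed scores.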

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208.patch, LUCENE-3208.patch


 We had this issue several times, latest in LUCENE-3207.
 The method Query.weight() was left in Query for backwards-compatibility reasons 
 in Lucene 2.9 when we changed the Weight class. This method is only to be called 
 on top-level queries - and this is done by IndexSearcher. This method is just a 
 utility method that has nothing to do with the query itself (it just 
 combines the createWeight method and calls the normalization afterwards). 
 The problem we have is that if any query that wraps other queries (like 
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
 Query.createWeight(), it will do normalization two times, leading to strange 
 bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
 replacement method with a big deprecation warning, so the user sees this. In 
 IndexSearcher itself the method will be protected, so that it can only be called 
 by IndexSearcher or its subclasses. Delegation for backwards compatibility is no 
 problem, as protected is accessible by classes in the same package.
 I would suggest the method name to be 
 IndexSearcher.createNormalizedWeight(Query q)




[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Gunnar Wagenknecht (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050404#comment-13050404
 ] 

Gunnar Wagenknecht commented on SOLR-219:
-

Any progress on the issue? We are also hit by this issue. Ideally, it would be 
nice if I could configure the analyzers to run for wildcard queries. For 
example, I still want to do lowercasing and character normalization (umlauts) 
for wildcard queries.

 Determine if prefix, wildcard, fuzzy queries should be lowercased
 -

 Key: SOLR-219
 URL: https://issues.apache.org/jira/browse/SOLR-219
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 3.3

 Attachments: lowercase_prefix.patch, wildcardlowercase.patch


 Solr should be able to do the right thing when doing prefix/wildcard/fuzzy 
 queries on fields with respect to lowercasing or not.




[jira] [Commented] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050406#comment-13050406
 ] 

Robert Muir commented on SOLR-2597:
---

Yes, the factories are long-lived and do expensive things up-front to configure 
themselves (parsing files etc.).
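
Given that lifecycle, one alternative to a static parser is to let the long-lived factory own the expensive StAX object and hand out short-lived readers. The Java sketch below uses only the JDK's javax.xml.stream API; the class name XmlTextExtractorFactory is hypothetical and not from the SOLR-2597 patch, and the claim that creating the XMLInputFactory is the costly step is an assumption. Because thread safety of a shared XMLInputFactory depends on the implementation, the sketch synchronizes reader creation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical long-lived factory that owns the expensive StAX object. */
final class XmlTextExtractorFactory {
    // Created once per factory instance and reused for every input.
    private final XMLInputFactory inputFactory = XMLInputFactory.newInstance();

    /** Creates a short-lived reader for one input and returns its text content. */
    String extractText(String xml) {
        try {
            XMLStreamReader reader;
            synchronized (inputFactory) {
                reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            }
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }
}
```

The sketch only strips markup; it does not track character offsets, which is the harder part of the filter discussed in this issue.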


 XmlCharFilter
 -

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
 Attachments: SOLR-2597.patch


 This CharFilter processes incoming XML using the Woodstox parser, stripping 
 all non-text content and remembering offsets, just like HTMLCharFilter, but 
 respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
 also provides the ability to exclude (and include) the content of certain 
 named elements.
 In order to compute character offsets properly when mixed line termination 
 styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
 &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
 currently in solr/lib.  The earlier versions of the parser could not report 
 these entity events, so we couldn't tell the difference between < and 
 &lt; and the offsets could be wrong.  The upgraded version is in the patch.




Re: Indexing slower in trunk

2011-06-16 Thread Simon Willnauer
On Tue, Jun 14, 2011 at 2:54 PM, Erick Erickson erickerick...@gmail.com wrote:
 Thanks, guys. Yes, I am running it all locally and disk seeks
 may well be the culprit. This thread is mainly to be sure that
 the behavior I'm seeing is expected, or at least explainable.

 Really, I don't need to pursue this further unless there's
 actually data I can gather to help speed things up. If this
 is just a consequence of DWPT and/or my particular
 setup then that's fine. I'm mostly trying to understand
 the characteristics of indexing/searching on the trunk.
 This started with me exploring memory
 requirements, and is really just something I noticed along
 the way and wanted to get some feedback on.

 So, absent the commit step, the times are reasonably
 comparable. Can I impose upon one of you to give a
 two-sentence summary of what DWPT buys us from a
 user perspective? If memory serves it should have
 background merging and other goodies.

Let me try to conclude this in a couple of sentences:

Previously IW wrote small in-memory segments on a per-thread basis and
merged them together on flush. Yet, this means you can't add / update
any documents while we are flushing, and flushing can take a long time.
With DWPT we write segments single-threaded, so each thread gets its
private DWPT. That allows us to flush segments to disk concurrently while
carrying on indexing at the same time. The performance gains are massive
here; our nightly benchmark sees a 269% speedup in indexing throughput.
The downside is that you have to do more merges eventually since you
write more smallish segments (no in-memory merge on flush). Plus, if
you read from the same disk you are writing to, you might see slower
indexing.

To read more about this, I wrote a blog post that explains a big portion of
it: http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/


 Uwe:
 Yep, I was curious about optimize but understand that it's not required
 in recent code. That said, data is not searchable until a commit
 happens, so just for yucks I changed the optimize to a commit. Stats
 of that run below.

 Simon:
 OK, adjusted the ram buffer size to 512M, and it's a bit faster, but
 not all that much, see stats, and the delta could well be sampling
 errors, one run doth not a statistical certainty make. Up until the
 commit step, the admin stats page is showing no documents in
 the index so I think this setting completely avoids intermediate
 committing although that says nothing about the individual writers
 writing lots of segments to disk, that still happens.

 Added 188 docs. Took 1437 ms. cumulative interval (seconds) = 284
 Added 189 docs. Took 1285 ms. cumulative interval (seconds) = 285
 Added 190 docs. Took 1182 ms. cumulative interval (seconds) = 286
 Added 191 docs. Took 1675 ms. cumulative interval (seconds) = 288
 About to commit, total time so far: 290
 Total Time Taken- 395 seconds    ***100 secs for the commit to finish.
 Total documents added- 1917728
 Docs/sec- 4855

 Thanks, all
 Erick


 On Tue, Jun 14, 2011 at 4:39 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 Erick, it seems you need to adjust your settings for 4.0 a little.
 When you index with DWPT it builds thread-private segments which are
 independently flushed to disk. Yet, when you set your RAM buffer, IW
 will accumulate the RAM used by all active DWPTs and flush the largest
 once you reach your RAM buffer. With 128M you might end up with lots of
 small segments which need to be merged in the background. Eventually
 what will happen here is that your disk is so busy that you are not
 able to flush fast enough and threads might stall.

 What you can try here is to adjust your RAM buffer to be a little higher,
 let's say 350MB, or change the max number of thread states in
 DocumentsWriterPerThreadPool, i.e.
 ThreadAffinityDocumentsWriterThreadPool. The latter is unfortunately
 not exposed yet in Solr, so maybe for testing you just want to change
 the default value in DocumentsWriterPerThreadPool to 4. That will also
 cause segments to be bigger eventually.
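
 The effect described above (a shared RAM budget, flush the largest thread-private buffer when it is reached) can be made concrete with a toy simulation. The class RamBudgetSimulator and its numbers are hypothetical; this is not IndexWriter code and it ignores stalling, merging and everything else beyond the flush trigger.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: several thread-private buffers share one RAM budget. */
final class RamBudgetSimulator {
    private final long ramBufferBytes;
    private final long[] bytesUsed;
    private final List<Long> flushedSegmentSizes = new ArrayList<>();

    RamBudgetSimulator(int threadStates, long ramBufferBytes) {
        this.bytesUsed = new long[threadStates];
        this.ramBufferBytes = ramBufferBytes;
    }

    /** Buffers one document; flushes the largest buffer once the budget is reached. */
    void addDocument(int threadState, long docBytes) {
        bytesUsed[threadState] += docBytes;
        long total = 0;
        int largest = 0;
        for (int i = 0; i < bytesUsed.length; i++) {
            total += bytesUsed[i];
            if (bytesUsed[i] > bytesUsed[largest]) {
                largest = i;
            }
        }
        if (total >= ramBufferBytes) {
            flushedSegmentSizes.add(bytesUsed[largest]);
            bytesUsed[largest] = 0;
        }
    }

    List<Long> flushedSegmentSizes() {
        return flushedSegmentSizes;
    }
}
```

 With the same budget, spreading documents evenly over eight thread states yields a first flushed segment of one eighth of the budget, whereas a single thread state flushes the whole budget at once. Raising the budget or lowering the number of thread states both push segment sizes up.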

 simon

 On Tue, Jun 14, 2011 at 10:28 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi Erick,

 Do you use harddisks or SSDs? I assume harddisks, which may explain what you
 see:

 - DWPT writes lots of segments in parallel, which also explains why you are
 seeing more files. Writing in parallel to several files needs more head
 movements of your harddisk, and this slows things down. In the past, only one
 segment was written at a time (sequentially), so the harddisk was not so
 stressed.
 - Optimizing may be slower for the same reason: there are many more files to
 merge (but optimize cost should not be counted as a problem here as normally
 you won't need to optimize after initial indexing and optimizing was only a
 good idea pre Lucene-2.9, now it's mostly obsolete)

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original 

[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-16 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050435#comment-13050435
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. Are you sure real floats are actually needed?
In our case score values are e.g. 15887 (one example just taken from one of 
the files). With this sample this test fails:
{noformat}
byte small = SmallFloat.floatToByte315(104626500f);
assertEquals(104626500f, SmallFloat.byte315ToFloat(small), 0f);
-> AssertionError: expected:<1.04626496E8> but was:<1.00663296E8>
{noformat}

This shows that we already have a case where this will produce wrong results, and 
even if we could fix this in our case there might be someone else with the same 
issue.
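
To make the precision loss reproducible without Lucene on the classpath, here is a self-contained Java sketch of a one-byte float encoding of the same kind. It is a re-implementation written for illustration, assumed (not verified against the library source) to behave like SmallFloat's 315 variant; the class name SmallFloat315Sketch is hypothetical.

```java
/** Illustrative one-byte float encoding; assumed to mirror SmallFloat's 315 variant. */
final class SmallFloat315Sketch {
    private SmallFloat315Sketch() {}

    /** Keeps only the top bits of the float; everything below is truncated. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero, negative or underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to the largest encodable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    /** Expands the byte back into a float; the truncated bits come back as zeros. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float original = 104626500f;
        float roundTripped = byte315ToFloat(floatToByte315(original));
        System.out.println(original + " -> " + roundTripped);
    }
}
```

Only a handful of significant bits survive the round trip, so large score values collapse onto a coarse grid, which is the failure shown in the test above.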


bq. it would also good to measure performance...
I'd not expect that the boxing makes a real difference here, especially in 
relation to the rest of the time spent during a search request.
A time-based performance comparison that has real value would take some time, 
it would have to be put in relation to the rest of a search request (how do you do 
this?) and finally it would require proper interpretation when everything is 
put together. Right now I don't think it's worth the effort.


{quote}
bq. that uses a fixed size and an increasing number of puts
I'm not certain how realistic that is, remember behind the scenes 
compactbytearray uses blocks,
and if you touch every one (by putting every K docid or something) then you are 
just testing
the worst case.
{quote}
Do you want to change the test to something that's more realistic?


@Yonik: what do you say regarding the suggestion to use HashMap up to ~5.5% and 
above that using the float[]?
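
The suggested cutover could look roughly like the Java sketch below: a map while the scoring file covers only a small fraction of the documents, a dense float[] above that. All class names are hypothetical and not from the attached patches; the 5.5% threshold is simply the figure from the question above, not a measured constant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-only view of per-document external scores. */
interface DocScores {
    float get(int doc);
}

/** Map-backed storage: memory grows with the number of scoring entries. */
final class SparseDocScores implements DocScores {
    private final Map<Integer, Float> scores;
    private final float defaultValue;

    SparseDocScores(Map<Integer, Float> entries, float defaultValue) {
        this.scores = new HashMap<>(entries);
        this.defaultValue = defaultValue;
    }

    @Override public float get(int doc) {
        Float score = scores.get(doc);
        return score != null ? score : defaultValue;
    }
}

/** Array-backed storage: memory grows with the number of documents. */
final class DenseDocScores implements DocScores {
    private final float[] scores;

    DenseDocScores(int maxDoc, Map<Integer, Float> entries, float defaultValue) {
        scores = new float[maxDoc];
        Arrays.fill(scores, defaultValue);
        for (Map.Entry<Integer, Float> entry : entries.entrySet()) {
            scores[entry.getKey()] = entry.getValue();
        }
    }

    @Override public float get(int doc) {
        return scores[doc];
    }
}

final class DocScoresFactory {
    /** Fill ratio up to which the map is used; taken from the discussion, not measured. */
    static final double SPARSE_THRESHOLD = 0.055;

    private DocScoresFactory() {}

    static DocScores build(int maxDoc, Map<Integer, Float> entries, float defaultValue) {
        double fill = maxDoc == 0 ? 0.0 : (double) entries.size() / maxDoc;
        return fill <= SPARSE_THRESHOLD
            ? new SparseDocScores(entries, defaultValue)
            : new DenseDocScores(maxDoc, entries, defaultValue);
    }
}
```

The map variant boxes its values, so whether it actually wins below the threshold depends on the per-entry overhead of the chosen map implementation.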

 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring eats a lot of memory, depending on the number of documents in 
 the index. The ExternalFileField (used for external scoring) uses 
 FileFloatSource, where one FileFloatSource is created per external scoring 
 file. FileFloatSource creates a float array with the size of the number of 
 docs (this is also done if the file to load is not found). If there are far 
 fewer entries in the scoring file than there are docs in total, the 
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc -> score, so that the map 
 contains as many entries as there are scoring entries in the external file, 
 but not more.




[Lucene.Net] alternatives to FSDirectory for multi-threaded search performance

2011-06-16 Thread Robert Stewart
What are the recommended best practices for using FSDirectory vs. RamDirectory, 
etc. for use in multi-threaded search?

In a previous version of Lucene.Net (1.9) I used a modified FSDirectory 
implementation which used a pool of open FileStream objects for each segment 
file, and handed them out in round-robin fashion from the Clone() method.  That 
way multiple threads could read most segment files in parallel.  It definitely 
increased multithreaded search performance quite a bit.  My indexes are quite 
large (100+ million docs) and I cannot load entire segments into RAM using 
RamDirectory.

My question is what is the best practice here?  Is using a pool of descriptors 
as described above the best idea?
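
For readers unfamiliar with the approach, the round-robin hand-out described above can be sketched as follows. The example is in Java rather than C# and uses a generic, hypothetical RoundRobinPool class; in the real case the pooled items would be open file handles for one segment file, and each handle would still need its own synchronization if two threads can end up holding the same one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical pool that hands out its items in round-robin order. */
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.items = new ArrayList<>(items);
    }

    /** Returns the next item; safe to call from several threads at once. */
    T acquire() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), items.size());
        return items.get(index);
    }
}
```

A clone-style method on the directory input would call acquire() and wrap the returned handle, so that concurrent searches are spread over the pooled descriptors instead of contending on a single one.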

Thanks
Bob

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050444#comment-13050444
 ] 

Simon Willnauer commented on LUCENE-2793:
-

hey varun

patch looks close!

here are some comments:

* the assert context == Context.MERGE should be assert context != Context.MERGE 
|| mergeInfo != null;
* can you move that assert into IOContext(Context, MergeInfo) and let other 
related constructors call this(context, mergeInfo) instead of initializing all 
members themselves?
* I think there should be a public static final IOContext READONCE = new 
IOContext(true); then you can make the corresponding constructor private. I 
think the context should be Context.READ instead of default in that case right?
* IOContext(MergePolicy.OneMerge) seems to be unnecessary. I think you should 
add a method to OneMerge to get a MergeInfo from it and only have a MergeInfo 
ctor. Then you can move MergeInfo into OneMerge too.
* PerFieldCodecWrapper still seems to be deleted
* In IndexReader IOContext context=null; should be IOContext context= new 
IOContext(READ); no?
*  no commit should be nocommit - we have a script on jenkins that checks this 
:)
* I still see some whitespace problems in SegmentWriteState.java 
* I think IOContext.DEFAULT_IOCONTEXT should be IOContext.DEFAULT since 
IOContext is implicit
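
Taken together, the constructor suggestions in the list above might look something like this. It is a rough sketch under assumptions, not the actual patch: the fields, the MergeInfo contents and the exact set of Context values are guesses made for illustration.

```java
/** Hypothetical sketch of the constructor layout suggested in the review; not the actual patch. */
class IOContext {
    enum Context { MERGE, READ, DEFAULT }

    /** Placeholder for merge details; the single field is an assumption. */
    static final class MergeInfo {
        final long estimatedMergeBytes;

        MergeInfo(long estimatedMergeBytes) {
            this.estimatedMergeBytes = estimatedMergeBytes;
        }
    }

    static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
    static final IOContext READONCE = new IOContext(true);

    final Context context;
    final MergeInfo mergeInfo;
    final boolean readOnce;

    IOContext(Context context) {
        this(context, null);
    }

    IOContext(MergeInfo mergeInfo) {
        this(Context.MERGE, mergeInfo);
    }

    /** All public constructors funnel through here, so the assert lives in one place. */
    IOContext(Context context, MergeInfo mergeInfo) {
        assert context != Context.MERGE || mergeInfo != null : "MERGE requires a MergeInfo";
        this.context = context;
        this.mergeInfo = mergeInfo;
        this.readOnce = false;
    }

    /** Private: callers use the READONCE constant instead. */
    private IOContext(boolean readOnce) {
        this.context = Context.READ;
        this.mergeInfo = null;
        this.readOnce = readOnce;
    }
}
```

The assert only fires when the JVM runs with -ea, which matches its use as a development-time check.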


I'm waiting for you to fix the tests before I review further. What is still 
missing is the decision on what buffer size to use down in the directories etc.

good work so far!



 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.




[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

Sorry for messing up the patch again! 




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050459#comment-13050459
 ] 

Michael McCandless commented on LUCENE-3206:


This new FST API looks *sweet*!  Nice work :)

So with this we no longer need static Util methods, right?  (Since each
arc can .follow a sequence of inputs).

I like OutputAlgebra ... better matches what this class actually does,
and if this means we can avoid creating a new Object for every arc transition
that would be great (this makes FST lookups costly now).

I don't know if this is possible, but one thing I don't like about
the current API is that the BYTE1/2/4 is an enum and not parameterized
into the Builder/FST.  I.e., Builder/FST should really take the input
type as a type param too, since really an FST acts like a SortedMap<K,V>.
But I fear this could get scary-hairy w/ the required generics...


 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050473#comment-13050473
 ] 

Yonik Seeley commented on LUCENE-3208:
--

+1, looks good!  
Doesn't seem like it's worth the trouble to catch Weight being normalized more 
than once.  I'd say just commit this as is.

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208.patch, LUCENE-3208.patch


 We have had this issue several times, most recently in LUCENE-3207.
 The method Query.weight() was left in Query for backwards-compatibility 
 reasons in Lucene 2.9 when we changed the Weight class. This method is only 
 to be called on top-level queries - and this is done by IndexSearcher. It is 
 just a utility method that has nothing to do with the query itself (it just 
 combines the createWeight method and calls the normalization afterwards). 
 The problem is that if any query that wraps other queries (like 
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
 Query.createWeight(), it will normalize twice, leading to strange 
 bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
 replacement method with a big deprecation warning, so users see this. In 
 IndexSearcher itself the method will be protected, so it can only be called 
 by IndexSearcher itself or its subclasses. Delegation for backwards 
 compatibility is no problem, as protected is accessible by classes in the 
 same package.
 I would suggest the method name to be 
 IndexSearcher.createNormalizedWeight(Query q)
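
To illustrate why normalizing twice matters, here is a small self-contained sketch. SketchWeight is a hypothetical stand-in, not a Lucene class, and its arithmetic is a simplified assumption of how a term weight behaves: the point is only that normalize() is not idempotent.

```java
/** Hypothetical stand-in for a term weight; the arithmetic is simplified. */
class SketchWeight {
    private final float idf;
    private final float boost;
    private float queryWeight;
    float value; // per-term weight that would be used when scoring

    SketchWeight(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    /** Side effect: (re)initializes queryWeight before returning its square. */
    float sumOfSquaredWeights() {
        queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    /** Scales the weight by the query norm; calling it twice scales twice. */
    void normalize(float norm) {
        queryWeight *= norm;
        value = queryWeight * idf;
    }

    /** What a top-level caller should do exactly once: create, sum, normalize. */
    static SketchWeight createNormalizedWeight(float idf, float boost) {
        SketchWeight w = new SketchWeight(idf, boost);
        float norm = (float) (1.0 / Math.sqrt(w.sumOfSquaredWeights()));
        w.normalize(norm);
        return w;
    }
}
```

With idf 2 and boost 1 the norm is 0.5, so the first normalization yields a value of 2.0; a second normalize(0.5f) from a wrapping query would silently halve it to 1.0.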




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050477#comment-13050477
 ] 

Robert Muir commented on LUCENE-3208:
-

I think it's worth the trouble, if we can do it.

We shouldn't rely upon the fact that getting sumOfSquaredWeights in some of 
these weights currently has *side effects* and sometimes is just wasted 
computation.

Other times it creates wrong scores.





[jira] [Resolved] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3191.


Resolution: Fixed

Thanks Uwe!

 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.




[jira] [Updated] (SOLR-2186) DataImportHandler multi-threaded option throws exception

2011-06-16 Thread Frank Wesemann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Wesemann updated SOLR-2186:
-

Attachment: TestTikaEntityProcessor.patch

Adds a test for entity threads=1 ...

 DataImportHandler multi-threaded option throws exception
 

 Key: SOLR-2186
 URL: https://issues.apache.org/jira/browse/SOLR-2186
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Lance Norskog
Assignee: Grant Ingersoll
 Attachments: SOLR-2186.patch, SOLR-2186.patch, Solr-2186.patch, 
 TestDocBuilderThreaded.java, TestTikaEntityProcessor.patch, TikaResolver.patch


 The multi-threaded option for the DataImportHandler throws an exception and 
 the entire operation fails. This is true even if only 1 thread is configured 
 via *threads='1'*




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050488#comment-13050488
 ] 

Uwe Schindler commented on LUCENE-3208:
---

A second idea would be for LuceneTestCase.newSearcher() to return a 
Searcher that wraps and disallows this. We have other helper classes like 
MockDirectory asserting similar things.

I am currently thinking about coding this; it's just a few lines.




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050498#comment-13050498
 ] 

Robert Muir commented on LUCENE-3208:
-

bq. Wrapping every weight just makes things uglier, esp if you want to do 
something with the produced weight.

It doesn't have to be done this way necessarily. Personally I would be happy if 
TermWeight had a boolean 'normalized' (used only for asserting) and an assert.

It doesn't have to be totally perfect, but I refuse to debug this issue again.

If it's not done here, I will open a blocker issue!
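
The 'normalized' flag idea from this comment could be sketched like this. GuardedWeight is a hypothetical class, not the real TermWeight, and it throws an exception instead of using an assert so that the check is visible even when the JVM runs without -ea.

```java
/** Hypothetical sketch of a weight that rejects a second normalization. */
class GuardedWeight {
    private float queryWeight = 1.0f;
    private boolean normalized; // used only to detect repeated normalization

    void normalize(float norm) {
        if (normalized) {
            throw new IllegalStateException("Weight normalized more than once");
        }
        normalized = true;
        queryWeight *= norm;
    }

    float queryWeight() {
        return queryWeight;
    }
}
```

A failure then points straight at the second caller, rather than showing up later as a wrong score.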





[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208-LTC.patch

Here is my idea to enforce one-time normalizing and prevent side-effects during 
tests.




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050506#comment-13050506
 ] 

Robert Muir commented on LUCENE-3208:
-

Great, Uwe, I'm satisfied.

Sorry for being so vocal about this, but I wasted many hours on this stupid bug 
(I know you did before, too), and the bug is not very friendly to people who 
debug with System.out.println: you don't catch it until you pull out enough of 
your hair to start using Thread.dumpStack...




[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-06-16 Thread ian towey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050508#comment-13050508
 ] 

ian towey commented on LUCENE-2091:
---

I'm not sure whether I am using BM25BooleanQuery correctly; I get a different 
number of hits than with QueryParser. Are there limitations on the query 
strings that BM25BooleanQuery can handle? E.g. for gas OR ((oil AND car) NOT 
ship), the results returned by BM25BooleanQuery seem to be all docs that 
don't contain the term ship (comparing BM25BooleanQuery v QueryParser).


 Add BM25 Scoring to Lucene
 --

 Key: LUCENE-2091
 URL: https://issues.apache.org/jira/browse/LUCENE-2091
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/other
Reporter: Yuval Feinstein
Priority: Minor
 Fix For: 4.0

 Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, 
 persianlucene.jpg

   Original Estimate: 48h
  Remaining Estimate: 48h

 http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
 Okapi-BM25 scoring in the Lucene framework,
 as an alternative to the standard Lucene scoring (which is a version of mixed 
 boolean/TFIDF).
 I have refactored this a bit, added unit tests and improved the runtime 
 somewhat.
 I would like to contribute the code to Lucene under contrib. 




[Lucene.Net] [jira] [Resolved] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual

2011-06-16 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-426.


   Resolution: Fixed
Fix Version/s: Lucene.Net 2.9.4g
   Lucene.Net 2.9.4

Thanks Itamar.
Fixed in trunk and the 2.9.4g branch.

DIGY

 Mark BaseFragmentsBuilder methods as virtual
 

 Key: LUCENENET-426
 URL: https://issues.apache.org/jira/browse/LUCENENET-426
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, 
 Lucene.Net 2.9.4g
Reporter: Itamar Syn-Hershko
Priority: Minor
 Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g

 Attachments: fvh.patch


 Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless 
 to have FragmentsBuilder deriving from a class named Base, since most of 
 its functionality cannot be overridden. Attached is a patch for marking the 
 important methods virtual.





Re: Indexing slower in trunk

2011-06-16 Thread Erick Erickson
OK, after more tests I'm pretty sure that my personal machine
that I'm testing on is just resource-constrained, leading to the
results I mentioned before. After all, I'm running my Solr
instance, the indexing program, etc on a Macbook
with 1 CPU and 2 cores. The indexing program is parsing the
XML.

On a proper setup, where the indexing machine was separate
from the machine(s) feeding the index process I suspect this would
be a different story. Hmm, I may try that sometime too.

Best
Erick

On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler u...@thetaphi.de wrote:
 For simply removing deletes, there is also IW.expungeDeletes(), which is
 less intensive! Not sure if Solr supports this too, but as far as I know
 there is an issue open.

 Also please note: As soon as a segment is selected for merging (the merge
 policy may also do this depending on the number of deletes in a segment), it
 will reclaim all deleted resources - that's what merging does. So expunging
 deletes once per week is a good idea if your index consists of very old and
 large segments that are rarely merged anymore and lots of documents are
 deleted from them.

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, June 14, 2011 3:19 PM
 To: dev@lucene.apache.org
 Subject: Re: Indexing slower in trunk

 Optimization used to have a very noticeable impact on search speed prior to
 some index format changes from quite a while ago.

 At this point the effect is much less noticeable, but the thing optimize does
 do is reclaim resources from deleted documents. If you have lots of
 deletions, it's a good idea to periodically optimize, but in that case it's
 often done pretty infrequently (once a day/week/month) rather than as part
 of any ongoing indexing process.

 Best
 Erick

 2011/6/14 Yury Kats yuryk...@yahoo.com:
  On 6/14/2011 4:28 AM, Uwe Schindler wrote:
  indexing and optimizing was only a
  good idea pre Lucene-2.9, now it's mostly obsolete)
 
  Could you please elaborate on this? Is optimizing obsolete in general
  or after indexing new documents? Is it obsolete after deletions? And
  what does "mostly" mean?
 
  Thanks!
 



[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050521#comment-13050521
 ] 

Jason Rutherglen commented on SOLR-1431:


Seems to be fine.  It'd be great to modularize Zookeeper references into a 
separate abstract interface (like what's done here), and not tie it to 
CoreContainer.  I think it could conflict with other uses of Zookeeper when the 
library versions are different.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050544#comment-13050544
 ] 

Mark Miller commented on SOLR-1431:
---

bq. I think it could conflict with other uses of Zookeeper when the library 
versions are different.

Yeah - always a problem with dependencies like this. It's hard to say what 
direction we go right now though - some have argued even non zookeeper mode 
should be single install zookeeper mode instead. It has its advantages and 
disadvantages I think. For me, I can really only take it an issue at a time, 
and while I hope to drive some more things around SolrCloud soon, it's 
obviously been a while. Others have some issues open, but more ideas are always 
good.

I certainly agree that CoreContainer could be modularized better - would help 
for testing too. I have an issue to do this for the persistence code (baby 
steps :) ), but feel free to open further issues.

I somewhat took the easy route in integrating zookeeper - there are certainly 
lots of improvements that could be made overall. And TODO's to finish - I think 
a couple guys have done a few from the wiki in various issues, and I know 
loggly has privately impl'd a couple from their talk at revolution (would be 
cool to see that come back, but I know they are busy guys). I love TODO's - 
minimal effort, but when you put one at a future pain point, your code doesn't 
look so stupid even when it's not perfect yet :)

We should discuss in other issues though.




Re: Indexing slower in trunk

2011-06-16 Thread Martijn v Groningen
@Uwe
Solr does support expunge deletes:
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
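
As a rough illustration of the distinction discussed in this thread, here is a toy model of what expunging deletes reclaims compared with a full optimize. This is not Lucene or Solr code: ExpungeSketch, Segment and both methods are made up for the sketch, and it assumes each segment with deletions is rewritten on its own, which simplifies what a real merge policy does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExpungeSketch {
    /** Toy segment: total document slots and how many of them are marked deleted. */
    static final class Segment {
        final int maxDoc;
        final int deleted;

        Segment(int maxDoc, int deleted) {
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        int liveDocs() {
            return maxDoc - deleted;
        }
    }

    /** Rewrites only the segments that contain deletions; clean segments are left alone. */
    static List<Segment> expungeDeletes(List<Segment> index) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : index) {
            out.add(s.deleted > 0 ? new Segment(s.liveDocs(), 0) : s);
        }
        return out;
    }

    /** Merges everything into one segment, dropping deleted documents on the way. */
    static List<Segment> optimize(List<Segment> index) {
        int live = 0;
        for (Segment s : index) {
            live += s.liveDocs();
        }
        List<Segment> out = new ArrayList<>();
        out.add(new Segment(live, 0));
        return out;
    }

    public static void main(String[] args) {
        List<Segment> index = Arrays.asList(
                new Segment(100, 40), new Segment(50, 0), new Segment(10, 5));
        System.out.println(expungeDeletes(index).size()); // segment count is unchanged
        System.out.println(optimize(index).size());       // everything collapsed into one
    }
}
```

In this model both operations reclaim the same deleted documents; the difference is that expunging leaves the clean segment untouched while optimize rewrites the whole index.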

On 16 June 2011 18:05, Erick Erickson erickerick...@gmail.com wrote:

 OK, after more tests I'm pretty sure that my personal machine
 that I'm testing on is just resource-constrained, leading to the
 results I mentioned before. After all, I'm running my Solr
 instance, the indexing program, etc on a Macbook
 with 1 CPU and 2 cores. The indexing program is parsing the
 XML.

 On a proper setup, where the indexing machine was separate
 from the machine(s) feeding the index process I suspect this would
 be a different story. H, I may try that sometime too

 Best
 Erick

 On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler u...@thetaphi.de wrote:
  For simply removing deletes, there is also IW.expungeDeletes(), which is
  less intensive! Not sure if Solr supports this, too, but as far as I know
  there is an issue open.
 
  Also please note: As soon as one segment is selected for merging (the merge
  policy may also do this dependent on the number of deletes in a segment), it
  will reclaim all deleted resources - that's what merging does. So expunging
  deletes once per week is a good idea, if your index consists of very old and
  large segments that are rarely merged anymore and lots of documents are
  deleted from them.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Tuesday, June 14, 2011 3:19 PM
  To: dev@lucene.apache.org
  Subject: Re: Indexing slower in trunk
 
  Optimization used to have a very noticeable impact on search speed prior to
  some index format changes from quite a while ago.
 
  At this point the effect is much less noticeable, but the thing optimize does
  do is reclaim resources from deleted documents. If you have lots of
  deletions, it's a good idea to periodically optimize, but in that case it's
  often done pretty infrequently (once a day/week/month) rather than as part of
  any ongoing indexing process.
 
  Best
  Erick
 
  2011/6/14 Yury Kats yuryk...@yahoo.com:
   On 6/14/2011 4:28 AM, Uwe Schindler wrote:
   indexing and optimizing was only a
   good idea pre Lucene-2.9, now it's mostly obsolete)
  
   Could you please elaborate on this? Is optimizing obsolete in general
   or after indexing new documents? Is it obsolete after deletions? And
   what does mostly mean here?
  
   Thanks!
  

-- 
Kind regards,

Martijn van Groningen


RE: svn commit: r1136543 - /lucene/dev/branches/branch_3x/lucene/CHANGES.txt

2011-06-16 Thread Steven A Rowe
Thanks Robert, I was the botcher...  TODO: double check CHANGES.txt diff after 
a merge... - Steve

 -Original Message-
 From: rm...@apache.org [mailto:rm...@apache.org]
 Sent: Thursday, June 16, 2011 12:57 PM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1136543 -
 /lucene/dev/branches/branch_3x/lucene/CHANGES.txt

 Author: rmuir
 Date: Thu Jun 16 16:56:39 2011
 New Revision: 1136543

 URL: http://svn.apache.org/viewvc?rev=1136543&view=rev
 Log:
 LUCENE-3204: fix botched CHANGES merge

 Modified:
 lucene/dev/branches/branch_3x/lucene/CHANGES.txt

 Modified: lucene/dev/branches/branch_3x/lucene/CHANGES.txt
 URL:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/CHANGES.txt?rev=1136543&r1=1136542&r2=1136543&view=diff
 =
 =
 --- lucene/dev/branches/branch_3x/lucene/CHANGES.txt (original)
 +++ lucene/dev/branches/branch_3x/lucene/CHANGES.txt Thu Jun 16 16:56:39
 2011
 @@ -3,468 +3,6 @@ Lucene Change Log
  For more information on past and future Lucene versions, please see:
  http://s.apache.org/luceneversions

 -=== Trunk (not yet released) ===
 -
 -Changes in backwards compatibility policy
 -
 -* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:
 -
 -  - On upgrading to 3.1, if you do not fully reindex your documents,
 -Lucene will emulate the new flex API on top of the old index,
 -incurring some performance cost (up to ~10% slowdown, typically).
 -To prevent this slowdown, use oal.index.IndexUpgrader
 -to upgrade your indexes to latest file format (LUCENE-3082).
 -
 -Mixed flex/pre-flex indexes are perfectly fine -- the two
 -emulation layers (flex API on pre-flex index, and pre-flex API on
 -flex index) will remap the access as required.  So on upgrading to
 -3.1 you can start indexing new documents into an existing index.
 -To get optimal performance, use oal.index.IndexUpgrader
 -to upgrade your indexes to latest file format (LUCENE-3082).
 -
 -  - The postings APIs (TermEnum, TermDocsEnum, TermPositionsEnum)
 -have been removed in favor of the new flexible
 -indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
 -DocsEnum, DocsAndPositionsEnum). One big difference is that field
 -and terms are now enumerated separately: a TermsEnum provides a
 -BytesRef (wraps a byte[]) per term within a single field, not a
 -Term.  Another is that when asking for a Docs/AndPositionsEnum, you
 -now specify the skipDocs explicitly (typically this will be the
 -deleted docs, but in general you can provide any Bits).
 -
 -  - MultiReader ctor now throws IOException
 -
 -  - Directory.copy/Directory.copyTo now copies all files (not just
 -index files), since what is and isn't and index file is now
 -dependent on the codecs used.
 -
 -  - UnicodeUtil now uses BytesRef for UTF-8 output, and some method
 -signatures have changed to CharSequence.  These are internal APIs
 -and subject to change suddenly.
 -
 -  - Positional queries (PhraseQuery, *SpanQuery) will now throw an
 -exception if use them on a field that omits positions during
 -indexing (previously they silently returned no results).
 -
 -  - FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
 -changed -- each parse method now takes a BytesRef instead of a
 -String.  If you have an existing Parser, a simple way to fix it is
 -invoke BytesRef.utf8ToString, and pass that String to your
 -existing parser.  This will work, but performance would be better
 -if you could fix your parser to instead operate directly on the
 -byte[] in the BytesRef.
 -
 -  - The internal (experimental) API of NumericUtils changed completely
 -from String to BytesRef. Client code should never use this class,
 -so the change would normally not affect you. If you used some of
 -the methods to inspect terms or create TermQueries out of
 -prefix encoded terms, change to use BytesRef. Please note:
 -Do not use TermQueries to search for single numeric terms.
 -The recommended way is to create a corresponding NumericRangeQuery
 -with upper and lower bound equal and included. TermQueries do not
 -score correct, so the constant score mode of NRQ is the only
 -correct way to handle single value queries.
 -
 -  - NumericTokenStream now works directly on byte[] terms. If you
 -plug a TokenFilter on top of this stream, you will likely get
 -an IllegalArgumentException, because the NTS does not support
 -TermAttribute/CharTermAttribute. If you want to further filter
 -or attach Payloads to NTS, use the new NumericTermAttribute.
 -
 -  (Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael
 Busch)
 -
 -* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode
 codepoints,
 -  not unicode code units. For example, a Wildcard ? 

[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208.patch

New patch:
- Added AssertingIndexReader in test-framework; it ensures that weights
are only normalized once when this is done by IndexSearcher. This class can be
extended to add further checks.
- As suggested by Yonik, changed the key used for fContext in the
QueryValueSource to be the valuesource itself. The backup code cannot be
removed; there is a bug somewhere (new issue).

All tests pass. I would like to commit this to trunk soon and then add
sophisticated backwards compatibility for 3.x :-)

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
 LUCENE-3208.patch, LUCENE-3208.patch


 We had this issue several times, most recently in LUCENE-3207.
 The method Query.weight() was left in Query for backwards compatibility in
 Lucene 2.9 when we changed the Weight class. This method is only to be called
 on top-level queries - and this is done by IndexSearcher. This method is just
 a utility method that has nothing to do with the query itself (it just
 combines the createWeight method and calls the normalization afterwards).
 The problem we have is that if any query that wraps other queries (like
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of
 Query.createWeight(), it will do normalization two times, leading to strange
 bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's
 replacement method with a big deprecation warning, so users see this. In
 IndexSearcher itself the method will be protected, to be called only by
 IndexSearcher or its subclasses. Delegation for backwards compatibility is no
 problem, as protected is accessible by classes in the same package.
 I would suggest the method name to be
 IndexSearcher.createNormalizedWeight(Query q)
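
The double normalization described above can be sketched with a toy model. Nothing here is the Lucene API: ToyQuery, ToyWeight, WrapperQuery and the buggy flag are hypothetical names for illustration, and the only assumption carried over from the description is that weight() means "createWeight, then normalize" and that normalizing a wrapper also normalizes what it wraps.

```java
import java.util.ArrayList;
import java.util.List;

final class NormalizationSketch {
    /** Toy stand-in for a Weight: it only counts how often it was normalized. */
    static final class ToyWeight {
        int normalizeCalls = 0;
        final List<ToyWeight> children = new ArrayList<>();

        void normalize() {
            normalizeCalls++;
            for (ToyWeight child : children) {
                child.normalize();
            }
        }
    }

    abstract static class ToyQuery {
        abstract ToyWeight createWeight();

        /** Utility meant for top-level queries only: create the weight, then normalize it. */
        final ToyWeight weight() {
            ToyWeight w = createWeight();
            w.normalize();
            return w;
        }
    }

    static final class LeafQuery extends ToyQuery {
        @Override
        ToyWeight createWeight() {
            return new ToyWeight();
        }
    }

    /** A query that wraps another one; 'buggy' selects the wrong call on the inner query. */
    static final class WrapperQuery extends ToyQuery {
        final ToyQuery inner;
        final boolean buggy;

        WrapperQuery(ToyQuery inner, boolean buggy) {
            this.inner = inner;
            this.buggy = buggy;
        }

        @Override
        ToyWeight createWeight() {
            ToyWeight w = new ToyWeight();
            // Calling weight() here normalizes the inner weight before the top level does.
            w.children.add(buggy ? inner.weight() : inner.createWeight());
            return w;
        }
    }

    /** Counts how often the wrapped query's weight ends up normalized. */
    static int innerNormalizeCalls(boolean buggy) {
        ToyWeight top = new WrapperQuery(new LeafQuery(), buggy).weight();
        return top.children.get(0).normalizeCalls;
    }

    public static void main(String[] args) {
        System.out.println(innerNormalizeCalls(true));  // prints 2
        System.out.println(innerNormalizeCalls(false)); // prints 1
    }
}
```

Moving the "create and normalize" step to the searcher, as proposed, removes the temptation: a wrapping query then only has createWeight() available on the inner query.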




Related project link to ManifoldCF from Solr site?

2011-06-16 Thread karl.wright
Hi folks,

How hard would it be to get a link to ManifoldCF from the Solr site's 
related-link section?  I'm seeing a lot of people who know Solr but have no 
idea ManifoldCF even exists, and I'd like to find some way to correct that 
problem.

Karl



[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050603#comment-13050603
 ] 

Uwe Schindler commented on LUCENE-3208:
---

Committed trunk revision: 1136568

Now backporting and adding sophisticated backwards...

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
 LUCENE-3208.patch, LUCENE-3208.patch


 We had this issue several times, most recently in LUCENE-3207.
 The method Query.weight() was left in Query for backwards compatibility in
 Lucene 2.9 when we changed the Weight class. This method is only to be called
 on top-level queries - and this is done by IndexSearcher. This method is just
 a utility method that has nothing to do with the query itself (it just
 combines the createWeight method and calls the normalization afterwards).
 The problem we have is that if any query that wraps other queries (like
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of
 Query.createWeight(), it will do normalization two times, leading to strange
 bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's
 replacement method with a big deprecation warning, so users see this. In
 IndexSearcher itself the method will be protected, to be called only by
 IndexSearcher or its subclasses. Delegation for backwards compatibility is no
 problem, as protected is accessible by classes in the same package.
 I would suggest the method name to be
 IndexSearcher.createNormalizedWeight(Query q)




[jira] [Created] (SOLR-2602) It would be great if the Solr site referred to ManifoldCF as a related product

2011-06-16 Thread Karl Wright (JIRA)
It would be great if the Solr site referred to ManifoldCF as a related product
--

 Key: SOLR-2602
 URL: https://issues.apache.org/jira/browse/SOLR-2602
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Karl Wright
Priority: Minor


The Related products section of the Solr site has just Lucene and Nutch in 
it.  It would be appropriate to have a link for ManifoldCF as well.  Url would 
be: http://incubator.apache.org/connectors/





Re: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread Simon Willnauer
a link in the related projects section seems possible, what do others think?

simon

On Thu, Jun 16, 2011 at 7:46 PM,  karl.wri...@nokia.com wrote:
 Hi folks,



 How hard would it be to get a link to ManifoldCF from the Solr site’s
 related-link section?  I’m seeing a lot of people who know Solr but have no
 idea ManifoldCF even exists, and I’d like to find some way to correct that
 problem.



 Karl






[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050606#comment-13050606
 ] 

Noble Paul commented on SOLR-1431:
--

Jason, yeah, it would be ideal. But we need to get things moving fast enough
so that users can get the benefit ASAP. We badly need the cloud features now. 
I'm sure there are others too. We have clusters with 1000's of Solr hosts which 
are managed w/ ad-hoc tools.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050613#comment-13050613
 ] 

Jason Rutherglen commented on SOLR-1431:


@Noble I agree, I don't think committing this patch should hold things up.  
That was just a little note.  

I've been looking at implementing Solr into HBase and am worried [somewhat]
about the ZK libraries.  HBase + Solr can help with massive scale near realtime
systems you've described, eg, HBase implements splitting, partitioning, a fast 
write ahead log, etc.  Facebook has implemented the index directly into HBase, 
which probably offers degraded indexing and search performance.

bq. We badly need the cloud features now

Right, many users are going with Elastic Search for the reasons mentioned.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050618#comment-13050618
 ] 

Noble Paul commented on SOLR-1431:
--

Jason, open an issue and I will be glad to pitch in.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




Re: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread Mark Miller

On Jun 16, 2011, at 2:00 PM, Simon Willnauer wrote:

 a link in the related projects section seems possible, what do others think?

Seems fine to me - for open source projects anyway? Apache Open Source projects?

Too lazy to think about it, but lazily, I'm willing to support linking to 
Apache Open Source projects that integrate with Solr without hesitation.

- Mark Miller
lucidimagination.com












[jira] [Updated] (SOLR-2602) It would be great if the Solr site referred to ManifoldCF as a related product

2011-06-16 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated SOLR-2602:
--

Attachment: SOLR-2602.patch

 It would be great if the Solr site referred to ManifoldCF as a related product
 --

 Key: SOLR-2602
 URL: https://issues.apache.org/jira/browse/SOLR-2602
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Karl Wright
Priority: Minor
 Attachments: SOLR-2602.patch


 The Related products section of the Solr site has just Lucene and Nutch in 
 it.  It would be appropriate to have a link for ManifoldCF as well.  Url 
 would be: http://incubator.apache.org/connectors/




RE: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread karl.wright
I created a ticket for it - SOLR-2602.  I'll attach a patch shortly.
Karl

-Original Message-
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Thursday, June 16, 2011 2:00 PM
To: dev@lucene.apache.org
Subject: Re: Related project link to ManifoldCF from Solr site?

a link in the related projects section seems possible, what do others think?

simon

On Thu, Jun 16, 2011 at 7:46 PM,  karl.wri...@nokia.com wrote:
 Hi folks,



 How hard would it be to get a link to ManifoldCF from the Solr site’s 
 related-link section?  I’m seeing a lot of people who know Solr but 
 have no idea ManifoldCF even exists, and I’d like to find some way to 
 correct that problem.



 Karl






[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050629#comment-13050629
 ] 

Mark Miller commented on SOLR-1431:
---

Got a 3 day weekend, so I won't likely look at Noble's patch more till next week
- I def will still take a peek and weigh in, but this is simple enough that I 
don't mind if we just commit and iterate on trunk if necessary in further 
issues.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050638#comment-13050638
 ] 

Jason Rutherglen commented on SOLR-1431:


Noble, the Jira issue is HBASE-3529 where much of the code is offline on Git 
because of the different pieces involved.  That being said, I've linked the 
various Lucene and Solr Jira issues that are required to implement Solr in 
HBase, eg LUCENE-2919 and SOLR-2563.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-2477) add <analyzer type="phrase">

2011-06-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050642#comment-13050642
 ] 

Hoss Man commented on SOLR-2477:


At first glance this looks great to me ... but we should seriously consider 
whether FieldQParser should also be using getPhraseAnalyzer.  I think given the 
semantics the answer is yes -- but either way it should be clearly documented.

we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.



 add <analyzer type="phrase">
 --

 Key: SOLR-2477
 URL: https://issues.apache.org/jira/browse/SOLR-2477
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2477.patch


 This is just exposing LUCENE-2892, so you can easily configure things
 so that if users put things in double quotes they get a more precise search.




[jira] [Issue Comment Edited] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050645#comment-13050645
 ] 

Mike Sokolov edited comment on SOLR-219 at 6/16/11 6:52 PM:


Is there a reason this issue can't be dealt with by including an appropriate 
MappingCharFilter in the field definition?

  was (Author: sokolov):
Is there a reson this issue can't be dealt with by including an appropriate 
MappingCharFilter in the field definition?
  
 Determine if prefix, wildcard, fuzzy queries should be lowercased
 -

 Key: SOLR-219
 URL: https://issues.apache.org/jira/browse/SOLR-219
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 3.3

 Attachments: lowercase_prefix.patch, wildcardlowercase.patch


 Solr should be able to do the right thing when doing prefix/wildcard/fuzzy 
 queries on fields with respect to lowercasing or not.
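
A minimal sketch of why lowercasing matters for these query types, assuming a field whose terms are lowercased at index time and a prefix that is compared literally, without analysis. The class and method names are hypothetical and the "index" is just a list of strings, not a Solr or Lucene structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PrefixLowercaseSketch {
    /** Toy "analyzer": what an index-time lowercase filter would do to a term. */
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    /** Returns the indexed terms that start with the prefix, taken literally. */
    static List<String> prefixSearch(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Builds the toy index by running every raw value through the analyzer. */
    static List<String> buildIndex(List<String> rawValues) {
        List<String> index = new ArrayList<>();
        for (String raw : rawValues) {
            index.add(analyze(raw));
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> index = buildIndex(Arrays.asList("Lucene", "Luke", "Solr"));
        System.out.println(prefixSearch(index, "Lu"));          // prints []
        System.out.println(prefixSearch(index, analyze("Lu"))); // prints [lucene, luke]
    }
}
```

Whether the lowercasing should be applied automatically depends on the field, which is the question this issue leaves open: on a case-sensitive field the same step would produce wrong matches.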




[jira] [Commented] (SOLR-2477) add <analyzer type="phrase">

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050646#comment-13050646
 ] 

Robert Muir commented on SOLR-2477:
---

{quote}
but we should seriously consider whether FieldQParser should also be using 
getPhraseAnalyzer. 
{quote}

Looking at how this is described, it seems to me it should use the phrase 
analyzer... we can document that it does this, and of course the change is 
backwards compatible (because if you don't define it, it's your query analyzer).
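
The fallback that makes this backwards compatible can be sketched as follows. This is not the patch: ToyFieldType and the two toy analyzers are hypothetical, analyzers are modeled as plain text-to-tokens functions, and only the method name getPhraseAnalyzer is taken from the discussion.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class PhraseAnalyzerFallbackSketch {
    /** Toy field type; an "analyzer" is modeled as a function from text to tokens. */
    static final class ToyFieldType {
        final Function<String, List<String>> queryAnalyzer;
        final Function<String, List<String>> phraseAnalyzer; // null when not configured

        ToyFieldType(Function<String, List<String>> queryAnalyzer,
                     Function<String, List<String>> phraseAnalyzer) {
            this.queryAnalyzer = queryAnalyzer;
            this.phraseAnalyzer = phraseAnalyzer;
        }

        /** Falls back to the query analyzer when no phrase analyzer was defined. */
        Function<String, List<String>> getPhraseAnalyzer() {
            return phraseAnalyzer != null ? phraseAnalyzer : queryAnalyzer;
        }
    }

    /** Looser toy analysis: split on whitespace and lowercase. */
    static List<String> lowercaseWords(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** More precise toy analysis for quoted text: split on whitespace, keep case. */
    static List<String> exactWords(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    public static void main(String[] args) {
        ToyFieldType legacy = new ToyFieldType(
                PhraseAnalyzerFallbackSketch::lowercaseWords, null);
        ToyFieldType configured = new ToyFieldType(
                PhraseAnalyzerFallbackSketch::lowercaseWords,
                PhraseAnalyzerFallbackSketch::exactWords);
        System.out.println(legacy.getPhraseAnalyzer().apply("Big Apple"));     // prints [big, apple]
        System.out.println(configured.getPhraseAnalyzer().apply("Big Apple")); // prints [Big, Apple]
    }
}
```

A field type that never defines a phrase analyzer behaves exactly as before, which is why existing schemas would not need to change.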

{quote}
we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.
{quote}

I agree... hopefully this isn't too bad.


 add <analyzer type="phrase">
 --

 Key: SOLR-2477
 URL: https://issues.apache.org/jira/browse/SOLR-2477
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2477.patch


 This is just exposing LUCENE-2892, so you can easily configure things
 so that if users put things in double quotes they get a more precise search.




[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050645#comment-13050645
 ] 

Mike Sokolov commented on SOLR-219:
---

Is there a reason this issue can't be dealt with by including an appropriate
MappingCharFilter in the field definition?

 Determine if prefix, wildcard, fuzzy queries should be lowercased
 -

 Key: SOLR-219
 URL: https://issues.apache.org/jira/browse/SOLR-219
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 3.3

 Attachments: lowercase_prefix.patch, wildcardlowercase.patch


 Solr should be able to do the right thing when doing prefix/wildcard/fuzzy 
 queries on fields with respect to lowercasing or not.




[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050648#comment-13050648
 ] 

Hoss Man commented on SOLR-2490:


hmmm...

i don't think we should do this.

the request handler as written is totally agnostic to what the properties are or
how they are being written out -- it just builds up the response and lets the 
writer take care of it.  As noted the XmlResponseWriter does in fact output the 
newline.

if PropertiesRequestHandler tried to specially encode any (or all) properties 
with whitespace in them, that would screw up clients that were treating the 
whitespace as significant when parsing the xml -- and worse it would royally 
screw up clients using other response writers where whitespace is always 
significant.


 PropertiesRequestHandler; encode line.separator
 ---

 Key: SOLR-2490
 URL: https://issues.apache.org/jira/browse/SOLR-2490
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Trivial

 Currently, the XML looks like this:
 {code}<!-- .. -->
 <str name="java.io.tmpdir">/tmp</str>
 <str name="line.separator">
 </str>
 <str name="java.vm.specification.vendor">Sun Microsystems Inc.</str>
 <!-- .. -->{code}
 would be good to have this instead:
 {code}<!-- .. -->
 <str name="java.io.tmpdir">/tmp</str>
 <str name="line.separator">\n</str>
 <str name="java.vm.specification.vendor">Sun Microsystems Inc.</str>
 <!-- .. -->{code}
 afterwards we will be able to display the line separator in use




[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050653#comment-13050653
 ] 

Stefan Matheis (steffkes) commented on SOLR-2490:
-

bq. i don't think we should do this.
okay -- but, then there is no chance to show any difference between {{\n}}, 
{{\r}} or {{\r\n}} in the interface, because it's just a linebreak in the 
xml-source. 

bq. if PropertiesRequestHandler tried to specially encode any (or all) 
properties with whitespace in them ...
what about especially (and only) this one? That's a common problem for 
displaying linebreaks.
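
For reference, the kind of escaping under discussion could look like the sketch below. The helper escapeLineBreaks is hypothetical and not part of PropertiesRequestHandler; it only shows how CR and LF can be made visible. Where such a step belongs (handler, response writer or client) is exactly what is being debated, and the sketch takes no position on that. One relevant detail: an XML parser normalizes "\r\n" and a lone "\r" to "\n" on input, so a client reading the raw XML response cannot tell the three apart afterwards.

```java
final class LineSeparatorSketch {
    /** Replaces CR and LF with the visible two-character escapes \r and \n. */
    static String escapeLineBreaks(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\r') {
                sb.append("\\r");
            } else if (c == '\n') {
                sb.append("\\n");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shows the platform's own separator in escaped form; the result depends on the OS.
        System.out.println(escapeLineBreaks(System.getProperty("line.separator")));
    }
}
```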




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050654#comment-13050654
 ] 

Uwe Schindler commented on LUCENE-3208:
---

Missed a change in the new grouping module: Trunk revision: 1136605

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-LTC.patch, LUCENE-3208.patch, 
 LUCENE-3208.patch, LUCENE-3208.patch


 We had this issue several times, latest in LUCENE-3207.
 The method Query.weight() was left in Query for backwards compatibility in 
 Lucene 2.9 when we changed the Weight class. This method is only to be called 
 on top-level queries, and this is done by IndexSearcher. It is just a utility 
 method that has nothing to do with the query itself (it just combines the 
 createWeight method and calls the normalization afterwards). 
 The problem we have is that if any query that wraps other queries (like 
 CustomScore, ConstantScore, Boolean) calls Query.weight() instead of 
 Query.createWeight(), normalization is done twice, leading to strange bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
 replacement method with a big deprecation warning, so the user sees this. In 
 IndexSearcher itself the method will be protected, so it can only be called 
 by IndexSearcher itself or its subclasses. Delegation for backwards 
 compatibility is no problem, as protected is accessible by classes in the 
 same package.
 I would suggest the method name to be 
 IndexSearcher.createNormalizedWeight(Query q)
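
The double normalization described in this issue can be illustrated with a toy model. This is a minimal sketch and not Lucene code: ToyWeight, ToyQuery, LeafQuery, BuggyWrapper and FixedWrapper are hypothetical stand-ins, and the norm value 0.5 is arbitrary. It only shows why a wrapping query should call the plain createWeight() of its sub-query and leave normalization to the top-level caller.

```java
class DoubleNormalizationDemo {
  /** Toy weight (hypothetical): normalize() scales the value and counts its calls. */
  static final class ToyWeight {
    float value = 2.0f;
    int normalizeCalls = 0;

    void normalize(float norm) {
      value *= norm;
      normalizeCalls++;
    }
  }

  /** Toy query (hypothetical): only knows how to create an un-normalized weight. */
  interface ToyQuery {
    ToyWeight createWeight();
  }

  /** Stand-in for the top-level helper: create the weight, then normalize once. */
  static ToyWeight createNormalizedWeight(ToyQuery query) {
    ToyWeight weight = query.createWeight();
    weight.normalize(0.5f); // arbitrary norm, chosen so the effect is easy to see
    return weight;
  }

  static final class LeafQuery implements ToyQuery {
    public ToyWeight createWeight() {
      return new ToyWeight();
    }
  }

  /** Wrapper that wrongly uses the normalizing helper for its sub-query. */
  static final class BuggyWrapper implements ToyQuery {
    private final ToyQuery inner;

    BuggyWrapper(ToyQuery inner) {
      this.inner = inner;
    }

    public ToyWeight createWeight() {
      return createNormalizedWeight(inner); // normalizes too early
    }
  }

  /** Wrapper that only creates the sub-weight and leaves normalization to the caller. */
  static final class FixedWrapper implements ToyQuery {
    private final ToyQuery inner;

    FixedWrapper(ToyQuery inner) {
      this.inner = inner;
    }

    public ToyWeight createWeight() {
      return inner.createWeight();
    }
  }

  public static void main(String[] args) {
    ToyWeight buggy = createNormalizedWeight(new BuggyWrapper(new LeafQuery()));
    ToyWeight fixed = createNormalizedWeight(new FixedWrapper(new LeafQuery()));
    System.out.println("buggy: normalize calls = " + buggy.normalizeCalls + ", value = " + buggy.value);
    System.out.println("fixed: normalize calls = " + fixed.normalizeCalls + ", value = " + fixed.value);
  }
}
```

In the toy model the buggy wrapper's weight is scaled twice, while the fixed wrapper's weight is scaled exactly once by the top-level call.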




RE: svn commit: r1135956 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ solr/ solr/contrib/dataimporthandler/ solr/co

2011-06-16 Thread Uwe Schindler
Shalin,

I had to comment out your test because the finally block does not compile with 
Java 5 (Solr 3.1). Jenkins is down at the moment, so this was not caught earlier.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: sha...@apache.org [mailto:sha...@apache.org]
 Sent: Wednesday, June 15, 2011 10:36 AM
 To: comm...@lucene.apache.org
 Subject: svn commit: r1135956 - in /lucene/dev/branches/branch_3x: ./
 lucene/ lucene/backwards/ lucene/backwards/src/test-framework/
 lucene/backwards/src/test/ solr/ solr/contrib/dataimporthandler/
 solr/contrib/dataimporthandler/src/main/java/org/apache/solr/ha...
 
 Author: shalin
 Date: Wed Jun 15 08:36:06 2011
 New Revision: 1135956
 
 URL: http://svn.apache.org/viewvc?rev=1135956&view=rev
 Log:
 SOLR-2551 -- Check dataimport.properties for write access (if delta-import is
 supported in DIH configuration) before starting an import
 
 Modified:
 lucene/dev/branches/branch_3x/   (props changed)
 lucene/dev/branches/branch_3x/lucene/   (props changed)
 lucene/dev/branches/branch_3x/lucene/backwards/   (props changed)
 lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props
 changed)
 lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/
 (props changed)
 lucene/dev/branches/branch_3x/solr/   (props changed)
 
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
 txt
 
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
 ava/org/apache/solr/handler/dataimport/DataImporter.java
 
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
 ava/org/apache/solr/handler/dataimport/SolrWriter.java
 
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/test/ja
 va/org/apache/solr/handler/dataimport/TestSqlEntityProcessorDelta.java
 
 Modified:
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
 txt
 URL:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/contrib
 /dataimporthandler/CHANGES.txt?rev=1135956&r1=1135955&r2=1135956&vi
 ew=diff
 ==
 
 ---
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES.
 txt (original)
 +++
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/CHANGES
 +++ .txt Wed Jun 15 08:36:06 2011
 @@ -11,7 +11,8 @@ $Id$
 
  ==  3.3.0-dev ==
 
 -(No Changes)
 +* SOLR-2551: Check dataimport.properties for write access (if
 +delta-import is supported
 +  in DIH configuration) before starting an import (C S, shalin)
 
  ==  3.2.0 ==
 
 
 Modified:
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
 ava/org/apache/solr/handler/dataimport/DataImporter.java
 URL:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/contrib
 /dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/Dat
 aImporter.java?rev=1135956&r1=1135955&r2=1135956&view=diff
 ==
 
 ---
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/main/j
 ava/org/apache/solr/handler/dataimport/DataImporter.java (original)
 +++
 lucene/dev/branches/branch_3x/solr/contrib/dataimporthandler/src/mai
 +++ n/java/org/apache/solr/handler/dataimport/DataImporter.java Wed
 Jun
 +++ 15 08:36:06 2011
 @@ -39,6 +39,7 @@ import org.apache.commons.io.IOUtils;
 
  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
 +import java.io.File;
  import java.io.StringReader;
  import java.text.SimpleDateFormat;
  import java.util.*;
 @@ -85,6 +86,8 @@ public class DataImporter {
 
private final Map<String , Object> coreScopeSession;
 
 +  private boolean isDeltaImportSupported = false;
 +
/**
 * Only for testing purposes
 */
 @@ -113,7 +116,9 @@ public class DataImporter {
initEntity(e, fields, false);
verifyWithSchema(fields);
identifyPk(e);
 -}
 +  if (e.allAttributes.containsKey(SqlEntityProcessor.DELTA_QUERY))
 +isDeltaImportSupported = true;
 +}
}
 
private void verifyWithSchema(Map<String, DataConfig.Field> fields) { @@
 -350,6 +355,7 @@ public class DataImporter {
 
  try {
docBuilder = new DocBuilder(this, writer, requestParams);
 +  checkWritablePersistFile(writer);
docBuilder.execute();
if (!requestParams.debug)
  cumulativeStatistics.add(docBuilder.importStatistics);
 @@ -364,6 +370,15 @@ public class DataImporter {
 
}
 
 +  private void checkWritablePersistFile(SolrWriter writer) {
 +File persistFile = writer.getPersistFile();
 +boolean isWritable = persistFile.exists() ? persistFile.canWrite() :
 persistFile.getParentFile().canWrite();
 +if (isDeltaImportSupported && !isWritable) {
 +  throw new DataImportHandlerException(SEVERE,
 

[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

2011-06-16 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050677#comment-13050677
 ] 

Martijn van Groningen commented on SOLR-2564:
-

I also did some performance tests with the following query on random data in 
the example schema:
{code}http://localhost:8983/solr/select?q=*:*&sort=_docid_ desc&group=true&group.cacheMB=0&group.field=single1000_i{code}
The field single1000_i had 1000 distinct values and the index has in total 
10 documents.

I ran this query on the following Solr setups:
* Last night's nightly build.
* Solr build with this patch as it is.
* Solr build with this patch and the necessary changes in 
AbstractFirstPassGroupingCollector so that pollLast was used in all cases.
During my tests I noticed that the difference between the first and the second 
setups was negligibly small, but the last Solr setup was on average 32% 
faster than the two other setups. So moving to Java 6's pollLast() method 
definitely has a positive impact on performance!

I also think that this patch is ready to be committed and that the pollLast 
should be added when Lucene or the grouping module is java 6. (I prefer the 
first option) I'll commit it in the coming day or so.
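
For readers unfamiliar with pollLast(): a minimal sketch, using a plain java.util.TreeSet rather than the grouping collector itself, of the two ways to drop the greatest element from a sorted set. The class and method names are hypothetical, and the sketch makes no claim about how AbstractFirstPassGroupingCollector is actually written; it only contrasts the two-step last()/remove() pattern available before Java 6 with the single NavigableSet.pollLast() call.

```java
import java.util.Arrays;
import java.util.TreeSet;

class PollLastDemo {
  /** Pre-Java-6 style: look up the greatest element, then remove it in a second step. */
  static Integer removeLastTwoStep(TreeSet<Integer> set) {
    if (set.isEmpty()) {
      return null;
    }
    Integer last = set.last();
    set.remove(last);
    return last;
  }

  /** Java 6 style: pollLast() retrieves and removes the greatest element in one call. */
  static Integer removeLastOneStep(TreeSet<Integer> set) {
    return set.pollLast(); // returns null when the set is empty
  }

  public static void main(String[] args) {
    TreeSet<Integer> a = new TreeSet<Integer>(Arrays.asList(3, 1, 2));
    TreeSet<Integer> b = new TreeSet<Integer>(Arrays.asList(3, 1, 2));
    System.out.println(removeLastTwoStep(a) + " removed, remaining " + a);
    System.out.println(removeLastOneStep(b) + " removed, remaining " + b);
  }
}
```

Both variants leave the set in the same state; the second one does it in a single call on the set.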

 Integrating grouping module into Solr 4.0
 -

 Key: SOLR-2564
 URL: https://issues.apache.org/jira/browse/SOLR-2564
 Project: Solr
  Issue Type: Improvement
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Blocker
 Fix For: 4.0

 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
 SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
 SOLR-2564.patch


 Since work on the grouping module is going well, I think it is time to wire 
 this up in Solr.
 Besides the current grouping features Solr provides, Solr will then also 
 support second pass caching and total count based on groups.




[jira] [Commented] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050698#comment-13050698
 ] 

Mike Sokolov commented on SOLR-2490:


I would recommend using entities for this: &#13;&#10; for CRLF, just &#10; for 
LF?

If this is processed by an XML parser, that'll already work for free anyway.
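
A minimal sketch of what "work for free" would look like on the client side, assuming the handler emitted numeric character references and the client used the JDK's built-in DOM parser. The class name and the helper textOf are hypothetical. The last call also shows the problem raised earlier in the thread: a literal CRLF in the XML source is normalized to a single LF by the parser, so it cannot be told apart from a plain LF.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineSeparatorEntityDemo {
  /** Parses a one-element XML snippet and returns the root element's text content. */
  static String textOf(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDocumentElement().getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse snippet", e);
    }
  }

  public static void main(String[] args) {
    // Character references survive parsing unchanged.
    String crlf = textOf("<str name=\"line.separator\">&#13;&#10;</str>");
    String lf = textOf("<str name=\"line.separator\">&#10;</str>");
    // A literal CRLF is subject to XML end-of-line normalization.
    String literalCrlf = textOf("<str name=\"line.separator\">\r\n</str>");

    System.out.println("reference CRLF kept as CRLF: " + crlf.equals("\r\n"));
    System.out.println("reference LF kept as LF: " + lf.equals("\n"));
    System.out.println("literal CRLF collapsed to LF: " + literalCrlf.equals("\n"));
  }
}
```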




[jira] [Issue Comment Edited] (SOLR-2490) PropertiesRequestHandler; encode line.separator

2011-06-16 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050698#comment-13050698
 ] 

Mike Sokolov edited comment on SOLR-2490 at 6/16/11 8:12 PM:
-

I would recommend using entities for this: &amp;#13;&amp;#10; for CRLF, just 
&amp;#10; for LF?

If this is processed by an XML parser, that'll already work for free anyway.

  was (Author: sokolov):
I would recommend using entities for this: &#13;&#10; for CRLF, just &#10; 
for LF?

If this is processed by an XML parser, that'll already work for free anyway.
  



[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050705#comment-13050705
 ] 

Jan Høydahl commented on SOLR-219:
--

Agree with Gunnar that the problem is wider than lowercasing. How hard would it 
be to let each filter choose whether to work on prefix terms or not, and run 
them through analysis?

A use case is for the Nordic characters æøåäö. A Norwegian name Øyvind would 
typically be normalized and indexed as oeyvind, and when a Swede searches for 
Öyvin*, he'd get a match if at least the mappingCharFilter and LowercaseFilter 
were allowed to run and turn the query into oeyvin*.

 Determine if prefix, wildcard, fuzzy queries should be lowercased
 -

 Key: SOLR-219
 URL: https://issues.apache.org/jira/browse/SOLR-219
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 3.3

 Attachments: lowercase_prefix.patch, wildcardlowercase.patch


 Solr should be able to do the right thing when doing prefix/wildcard/fuzzy 
 queries on fields with respect to lowercasing or not.




[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050708#comment-13050708
 ] 

Robert Muir commented on SOLR-219:
--

a lot of analysis things like stemming are not prepared to deal with wildcard 
characters in the term, and returning multiple tokens (because a tokenizer 
splits on a * or whatever) makes no sense either

in my opinion, a good solution here is to allow you to specify in your schema: 
this is the analysis chain for these multitermqueries, so it would be a 
different chain rather than query or index (similar to SOLR-2477 where I 
propose allowing you to specify one for phrase). The QP would use this chain 
for things like wildcards, and throw an exception if the analyzer returns more 
than one token from a wildcard term.

This way you can use KeywordTokenizer + lowercase/fold characters or whatever, 
but in general doing things like WDF or synonyms makes no sense here.  If you 
want to do things like stemming, that's fine, you can shoot yourself in the foot 
this way and we won't stop you.

But in no case should we try to magically apply the analysis chain... too 
ambiguous what would happen.
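
A rough sketch of the proposed behaviour in plain Java, without any Lucene or Solr classes. Everything here is hypothetical: the class name, the whitespace split standing in for a tokenizer, and the two-entry character mapping borrowed from the Øyvind example earlier in the thread. It only illustrates the contract being suggested: a separate, explicitly configured chain normalizes the wildcard term, and more than one resulting token is an error.

```java
import java.util.Locale;

class MultiTermChainDemo {
  /** Hypothetical char mapping, limited to the two letters from the example. */
  static String foldNordic(String s) {
    return s.replace("\u00f8", "oe")   // ø -> oe
            .replace("\u00f6", "oe");  // ö -> oe
  }

  /**
   * Normalizes a wildcard term with a fixed "lowercase + char mapping" chain.
   * A whitespace split stands in for the tokenizer; more than one token is rejected.
   */
  static String normalizeWildcardTerm(String term) {
    String[] tokens = term.trim().split("\\s+");
    if (tokens.length != 1 || tokens[0].isEmpty()) {
      throw new IllegalArgumentException(
          "multi-term analysis must yield exactly one token: '" + term + "'");
    }
    return foldNordic(tokens[0].toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // \u00d6 is Ö and \u00d8 is Ø; escapes keep the source encoding-independent.
    System.out.println(normalizeWildcardTerm("\u00d6yvin*"));
    System.out.println(normalizeWildcardTerm("\u00d8yvind"));
  }
}
```

The wildcard character passes through untouched because nothing in this chain tries to interpret it, which is the reason for keeping stemming and similar filters out of such a chain.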





[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread Gunnar Wagenknecht (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050714#comment-13050714
 ] 

Gunnar Wagenknecht commented on SOLR-219:
-

{quote}
But in no case should we try to magically apply the analysis chain... too 
ambiguous what would happen.
{quote}

Agreed. I just need a way in the schema when configuring fields to say which 
analyzers should run for wildcard and/or prefix queries.




[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050728#comment-13050728
 ] 

Jan Høydahl commented on SOLR-219:
--

I like your idea @Robert. It's explicit and backwards compat, and would allow 
us to shoot our issues as well as our feet :)




[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I have made changes suggested by Simon and have added Context to the test 
cases, though I've used DEFAULT in most of it. 

Also, do we need the test TestBufferedIndexInput? I have added an 
IOContext.DEFAULT and fixed it, though. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
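
To make the idea concrete, here is a small sketch of what such a context object could carry. It is an assumption-laden illustration, not the class from the attached patches: the name IOContextSketch, the factory methods, the field names and the buffer sizes (1024 and 4096 bytes) are all made up for this example.

```java
class IOContextSketch {
  final int readBufferSize;  // bytes to buffer per read
  final boolean direct;      // hint: bypass the OS buffer cache
  final boolean sequential;  // hint: access pattern is sequential

  private IOContextSketch(int readBufferSize, boolean direct, boolean sequential) {
    this.readBufferSize = readBufferSize;
    this.direct = direct;
    this.sequential = sequential;
  }

  /** Context for searching: small buffer, no special hints (illustrative values). */
  static IOContextSketch forSearching() {
    return new IOContextSketch(1024, false, false);
  }

  /** Context for merging: larger buffer plus DIRECT and SEQUENTIAL hints (illustrative values). */
  static IOContextSketch forMerging() {
    return new IOContextSketch(4096, true, true);
  }

  /** Hypothetical directory hook: sizes its buffer from the context, not a bare int. */
  static byte[] allocateReadBuffer(IOContextSketch context) {
    return new byte[context.readBufferSize];
  }

  public static void main(String[] args) {
    System.out.println("search buffer: " + allocateReadBuffer(forSearching()).length);
    System.out.println("merge buffer: " + allocateReadBuffer(forMerging()).length);
  }
}
```

Passing one object instead of a bare buffer size leaves room for further hints later without changing the method signatures again.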




[jira] [Updated] (SOLR-2506) EOFException from SolrServer.queryAndStreamResponse() in /trunk

2011-06-16 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-2506:


Attachment: index.zip

I am unable to reproduce with a test case, but here is an index that has just 
two docs that will always reproduce.

I don't think it has anything to do with SOLR-1566

 EOFException from SolrServer.queryAndStreamResponse() in /trunk
 ---

 Key: SOLR-2506
 URL: https://issues.apache.org/jira/browse/SOLR-2506
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Ryan McKinley
Priority: Minor
 Attachments: index.zip


 Ran into this on trunk... don't have time to dig into it now, but will post 
 it here so it is not lost.
 I suspect this is caused by something in SOLR-1566,  need to add some better 
 tests to flush it out
 {code}
 org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: 
 java.io.EOFException
   at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
   at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
   at 
 org.apache.solr.client.solrj.SolrServer.queryAndStreamResponse(SolrServer.java:143)
 ...
 Caused by: java.lang.RuntimeException: java.io.EOFException
   at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:211)
   ... 51 more
 Caused by: java.io.EOFException
   at 
 org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:160)
   at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:158)
   at 
 org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:401)
   at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:172)
   at 
 org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:110)
   at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:174)
   at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:102)
   at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:208)
   ... 51 more
 {code}




[jira] [Updated] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3208:
--

Attachment: LUCENE-3208-3x.patch

Patch for 3.x branch. To apply, copy the trunk's AssertingIndexSearcher first 
to its target dir and then apply patch.

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-3x.patch, LUCENE-3208-LTC.patch, 
 LUCENE-3208.patch, LUCENE-3208.patch, LUCENE-3208.patch





[jira] [Assigned] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3207:
-

Assignee: Uwe Schindler

 CustomScoreQuery calls weight() where it should call createWeight()
 ---

 Key: LUCENE-3207
 URL: https://issues.apache.org/jira/browse/LUCENE-3207
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3207.patch


 Thanks to Uwe for helping me track down this bug after I pulled my hair out 
 for hours on LUCENE-3174.




[jira] [Resolved] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3208.
---

Resolution: Fixed

Committed 3.x revision: 1136702

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-3x.patch, LUCENE-3208-3x.patch, 
 LUCENE-3208-LTC.patch, LUCENE-3208.patch, LUCENE-3208.patch, LUCENE-3208.patch





[jira] [Resolved] (LUCENE-3207) CustomScoreQuery calls weight() where it should call createWeight()

2011-06-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3207.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.3

Fixed through LUCENE-3208.




[jira] [Commented] (LUCENE-3208) Move Query.weight() to IndexSearcher as protected method

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050764#comment-13050764
 ] 

Robert Muir commented on LUCENE-3208:
-

the backport looks good, and important/scary to also fix this 
IndexSearcher/Searcher bug.

 Move Query.weight() to IndexSearcher as protected method
 

 Key: LUCENE-3208
 URL: https://issues.apache.org/jira/browse/LUCENE-3208
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3208-3x.patch, LUCENE-3208-3x.patch, 
 LUCENE-3208-LTC.patch, LUCENE-3208.patch, LUCENE-3208.patch, LUCENE-3208.patch


 We had this issue several times, most recently in LUCENE-3207.
 The method Query.weight() was left in Query for backwards-compatibility 
 reasons in Lucene 2.9, when we changed the Weight class. This method is only 
 to be called on top-level queries, and this is done by IndexSearcher. It is 
 just a utility method that has nothing to do with the query itself (it just 
 combines the createWeight method and calls the normalization afterwards). 
 The problem is that if a query that wraps other queries (like CustomScore, 
 ConstantScore, Boolean) calls Query.weight() instead of Query.createWeight(), 
 normalization is done twice, leading to strange bugs.
 For 3.3 I will make Query.weight() simply delegate to IndexSearcher's 
 replacement method with a big deprecation warning, so users see this. In 
 IndexSearcher itself the method will be protected, so it can only be called 
 by IndexSearcher or its subclasses. Delegation for backwards compatibility is 
 no problem, as protected members are accessible to classes in the same package.
 I would suggest the method name to be 
 IndexSearcher.createNormalizedWeight(Query q)
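
The double normalization described above can be sketched with a toy model. This is not Lucene's API: ToyQuery, ToyWeight, ToySearcher and the NORM factor are hypothetical stand-ins invented for illustration, and only the method names weight(), createWeight() and createNormalizedWeight() are borrowed from the issue text.

```java
/** Toy model (NOT Lucene's real classes) of the double-normalization bug. */
final class DoubleNormalizationSketch {

    /** Arbitrary normalization factor, chosen only for this demo. */
    static final double NORM = 0.5;

    /** Hypothetical stand-in for a Weight: a value that normalization scales. */
    static final class ToyWeight {
        double value = 1.0;
        int normalizeCalls = 0;

        void normalize(double norm) {
            value *= norm;
            normalizeCalls++;
        }
    }

    /** Hypothetical stand-in for the searcher that owns the top-level helper. */
    static final class ToySearcher {
        /** Create the weight, then normalize it exactly once. */
        ToyWeight createNormalizedWeight(ToyQuery q) {
            ToyWeight w = q.createWeight(this);
            w.normalize(NORM);
            return w;
        }
    }

    abstract static class ToyQuery {
        abstract ToyWeight createWeight(ToySearcher s);

        /** Legacy helper kept for compatibility: delegates to the searcher. */
        ToyWeight weight(ToySearcher s) {
            return s.createNormalizedWeight(this);
        }
    }

    static final class LeafQuery extends ToyQuery {
        @Override
        ToyWeight createWeight(ToySearcher s) {
            return new ToyWeight();
        }
    }

    /** Wrapper that wrongly calls the normalizing helper on its inner query. */
    static final class BuggyWrapper extends ToyQuery {
        private final ToyQuery inner;

        BuggyWrapper(ToyQuery inner) {
            this.inner = inner;
        }

        @Override
        ToyWeight createWeight(ToySearcher s) {
            return inner.weight(s); // already normalized once here
        }
    }

    /** Wrapper that calls createWeight() and leaves normalization to the top level. */
    static final class CorrectWrapper extends ToyQuery {
        private final ToyQuery inner;

        CorrectWrapper(ToyQuery inner) {
            this.inner = inner;
        }

        @Override
        ToyWeight createWeight(ToySearcher s) {
            return inner.createWeight(s);
        }
    }

    public static void main(String[] args) {
        ToySearcher s = new ToySearcher();
        ToyWeight bad = s.createNormalizedWeight(new BuggyWrapper(new LeafQuery()));
        ToyWeight good = s.createNormalizedWeight(new CorrectWrapper(new LeafQuery()));
        System.out.println("buggy wrapper normalize calls: " + bad.normalizeCalls);
        System.out.println("correct wrapper normalize calls: " + good.normalizeCalls);
    }
}
```

In this model, making the helper reachable only from the searcher removes the temptation for wrapper queries to call it.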




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-16 Thread noah (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050784#comment-13050784
 ] 

noah commented on SOLR-2399:


The admin interface doesn't load in Safari 5 due to the use of variables and 
properties named 'class'.
Simple patch available here: https://gist.github.com/1030496


 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
 SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
 SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[jira] [Created] (LUCENE-3209) Memory codec

2011-06-16 Thread Michael McCandless (JIRA)
Memory codec


 Key: LUCENE-3209
 URL: https://issues.apache.org/jira/browse/LUCENE-3209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


This codec stores all terms/postings in RAM.  It uses an
FST<BytesRef>.  This is useful on a primary key field to ensure
lookups don't need to hit disk, to keep NRT reopen time fast even
under IO contention.
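
A minimal sketch of the idea of keeping a primary-key term dictionary entirely in RAM. It deliberately swaps in a plain java.util.TreeMap where the issue describes an FST, and PrimaryKeyRamSketch, add() and lookup() are hypothetical names, not part of the patch.

```java
import java.util.TreeMap;

/** Toy in-memory primary-key dictionary; a TreeMap stands in for the FST. */
final class PrimaryKeyRamSketch {

    // Sorted term -> docID map held entirely on the heap, so a lookup
    // never touches the disk.
    private final TreeMap<String, Integer> termToDoc = new TreeMap<>();

    /** Register (or overwrite) the document that carries this primary key. */
    void add(String primaryKey, int docId) {
        termToDoc.put(primaryKey, docId);
    }

    /** Return the docID for the key, or -1 if the key is unknown. */
    int lookup(String primaryKey) {
        Integer docId = termToDoc.get(primaryKey);
        return docId == null ? -1 : docId;
    }

    public static void main(String[] args) {
        PrimaryKeyRamSketch dict = new PrimaryKeyRamSketch();
        dict.add("doc-1", 0);
        dict.add("doc-2", 7);
        System.out.println(dict.lookup("doc-2"));
        System.out.println(dict.lookup("missing"));
    }
}
```

An FST would typically be more compact than a map of strings, since it can share prefixes and suffixes between terms; the map is used here only to keep the example short.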





[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050790#comment-13050790
 ] 

Erick Erickson commented on SOLR-2399:
--

Stefan:

Minor nit. If you refresh the stats page, everything shows up collapsed. Is it 
possible to show the same view as it was when the refresh was hit? The use-case 
here is that I wanted to watch how many documents were in the index as a job 
was running, so I wanted the search node expanded just as it was when I hit 
refresh...

Really minor nit, though.





[jira] [Updated] (LUCENE-3209) Memory codec

2011-06-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3209:
---

Attachment: LUCENE-3209.patch

Patch; I think it's working and ready to commit.  All tests pass w/ it, and I 
went and disabled the same tests that avoid SimpleText codec.





[JENKINS] Lucene-3.x - Build # 410 - Still Failing

2011-06-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-3.x/410/

1 tests failed.
FAILED:  org.apache.lucene.util.fst.TestFSTs.testBigSet

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:791)
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:499)
at 
org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:363)
at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:211)
at 
org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:944)
at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:964)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1271)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




Build Log (for compile errors):
[...truncated 12481 lines...]






Re: Indexing slower in trunk

2011-06-16 Thread Erick Erickson
OK, I tried using separate machines for indexing and running Solr, connected
my work and personal Macs with an ethernet cable. Poor-man's network...

I also changed things a bit to parse the Wiki data and store 1.5M docs in memory,
then try to send them to Solr in various-sized batches, thus removing
all the work associated with reading/parsing the XML from the timings.
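
Roughly, that setup looks like the sketch below: the documents are already parsed and held in a list, and only the send loop is timed. The names BatchSketch, partition and timeBatches are made up for illustration, and the Consumer is a placeholder for whatever client actually posts a batch to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy harness: split pre-parsed docs into batches and time only the sending. */
final class BatchSketch {

    /** Split docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    /** Send every batch through the (hypothetical) sender; return elapsed nanoseconds. */
    static <T> long timeBatches(List<T> docs, int batchSize, Consumer<List<T>> sender) {
        List<List<T>> batches = partition(docs, batchSize);
        long start = System.nanoTime();
        for (List<T> batch : batches) {
            sender.accept(batch);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc-" + i);
        }
        // The lambda stands in for a real "post this batch to Solr" call.
        long nanos = timeBatches(docs, 4, batch -> { });
        System.out.println("batches: " + partition(docs, 4).size() + ", nanos: " + nanos);
    }
}
```

To measure time-to-searchable rather than raw indexing speed, the commit would have to happen inside the timed region as well.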

And the results are...ambiguous. So I re-read some of the blog posts by
Simon and Mark, and think that where I'm missing out is the phrase
"... computers with highly concurrent hardware". I don't have that, and
what I'm seeing is that DWPT doesn't seem to make much difference in
this situation. Of course my situation is probably totally irrelevant,
since I've got to believe that people indexing *serious* data will have,
shall we say, more impressive hardware than I have.

Or perhaps I should say that whatever I do, I can get trunk and 3x to
perform pretty equivalently. Do note that what I'm really looking for is
the time until I can search the last document sent to Solr, so included
in here is a commit step. If I take that out, I'm seeing very substantial
gains in trunk. So presumably with a run that lasted longer than just
a couple of minutes I'd see impressive speedups.

I suspect that I just don't have enough hardware to consistently
encounter the situations where DWPT really shines.

It's also possible that I'm doing something stupid, but until some kind
person sets me up with sufficient hardware I'm afraid I'll have to drop
it <G>

Best
Erick

On Thu, Jun 16, 2011 at 12:05 PM, Erick Erickson
erickerick...@gmail.com wrote:
 OK, after more tests I'm pretty sure that my personal machine
 that I'm testing on is just resource-constrained, leading to the
 results I mentioned before. After all, I'm running my Solr
 instance, the indexing program, etc on a Macbook
 with 1 CPU and 2 cores. The indexing program is parsing the
 XML.

 On a proper setup, where the indexing machine was separate
 from the machine(s) feeding the index process I suspect this would
 be a different story. H, I may try that sometime too

 Best
 Erick

 On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler u...@thetaphi.de wrote:
 For simply removing deletes, there is also IW.expungeDeletes(), which is
 less intensive! Not sure if Solr supports this, too, but as far as I know
 there is an issue open.

 Also please note: As soon as a segment is selected for merging (the merge
 policy may also do this depending on the number of deletes in a segment), it
 will reclaim all deleted resources - that's what merging does. So expunging
 deletes once per week is a good idea if your index consists of very old and
 large segments that are rarely merged anymore and lots of documents are
 deleted from them.
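
To make the difference concrete, here is a deliberately simplified model, not Lucene code. It assumes a segment is just a pair of live/deleted document counts; MergeSketch, Segment and both methods are hypothetical. The real merge policy decides which segments to combine, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segments: optimize rewrites everything, expunge only what has deletes. */
final class MergeSketch {

    static final class Segment {
        final int liveDocs;
        final int deletedDocs;

        Segment(int liveDocs, int deletedDocs) {
            this.liveDocs = liveDocs;
            this.deletedDocs = deletedDocs;
        }
    }

    /** expungeDeletes-like: rewrite only segments holding deletions; others are untouched. */
    static List<Segment> expungeDeletes(List<Segment> segments) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            result.add(s.deletedDocs > 0 ? new Segment(s.liveDocs, 0) : s);
        }
        return result;
    }

    /** optimize-like: rewrite all segments into a single one, dropping deleted docs. */
    static List<Segment> optimize(List<Segment> segments) {
        int live = 0;
        for (Segment s : segments) {
            live += s.liveDocs;
        }
        List<Segment> result = new ArrayList<>();
        result.add(new Segment(live, 0));
        return result;
    }

    public static void main(String[] args) {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(100, 0));
        index.add(new Segment(50, 50));
        index.add(new Segment(10, 5));
        System.out.println("after expunge: " + expungeDeletes(index).size() + " segments");
        System.out.println("after optimize: " + optimize(index).size() + " segment");
    }
}
```

In this model both operations reclaim the space of deleted documents, but the expunge variant leaves segments without deletions alone, which is why it is the cheaper of the two.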

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, June 14, 2011 3:19 PM
 To: dev@lucene.apache.org
 Subject: Re: Indexing slower in trunk

 Optimization used to have a very noticeable impact on search speed prior to
 some index format changes from quite a while ago.

 At this point the effect is much less noticeable, but the thing optimize does
 do is reclaim resources from deleted documents. If you have lots of
 deletions, it's a good idea to periodically optimize, but in that case it's
 often done pretty infrequently (once a day/week/month) rather than as part of
 any ongoing indexing process.

 Best
 Erick

 2011/6/14 Yury Kats yuryk...@yahoo.com:
  On 6/14/2011 4:28 AM, Uwe Schindler wrote:
  indexing and optimizing was only a
  good idea pre Lucene-2.9, now it's mostly obsolete)
 
  Could you please elaborate on this? Is optimizing obsolete in general
  or after indexing new documents? Is it obsolete after deletions? And
  what does "mostly" mean here?
 
  Thanks!
 







[jira] [Updated] (SOLR-2597) XmlCharFilter

2011-06-16 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated SOLR-2597:
---

Attachment: SOLR-2597.patch

Updated patch addresses (most of) Robert and Hoss' comments (thanks for the 
speedy review!):

Test now uses the random in the test framework

I added a test for the factory (actually all the tests now use the factory 
since it is now used to create the parser), but I haven't plumbed this all the 
way through to a schema declaration. 

Moved to org.apache.solr.analysis: I don't know if this is the right place for 
this, but at least it should resolve any jar and java 1.6 dependency problems - 
I think? - at least I can compile and run the tests from both eclipse and ant 
command line although I'm not sure what that proves exactly.

The parser is now created in the factory rather than being maintained as a 
static in the reader class.

 XmlCharFilter
 -

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
 Attachments: SOLR-2597.patch, SOLR-2597.patch


 This CharFilter processes incoming XML using the Woodstox parser, stripping 
 all non-text content and remembering offsets, just like HTMLCharFilter, but 
 respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
 also provides the ability to exclude (and include) the content of certain 
 named elements.
 In order to compute character offsets properly when mixed line termination 
 styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
 &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
 currently in solr/lib.  The earlier versions of the parser could not report 
 these entity events, so we couldn't tell the difference between < and 
 &lt; and the offsets could be wrong.  The upgraded version is in the patch.
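
The offset problem can be illustrated without Woodstox. The sketch below is a hand-rolled toy, not the patch's implementation: it strips anything between < and >, decodes only the five predefined XML entities, and records the original offset of each emitted character. XmlOffsetSketch and its methods are hypothetical names, and comments, CDATA, DTD-defined entities and attribute values containing > are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tag stripper that remembers where each output character came from. */
final class XmlOffsetSketch {

    private final StringBuilder text = new StringBuilder();
    private final List<Integer> originalOffsets = new ArrayList<>();

    XmlOffsetSketch(String xml) {
        boolean inTag = false;
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
                i++;
            } else if (c == '<') {
                inTag = true;
                i++;
            } else if (c == '&') {
                int semi = xml.indexOf(';', i);
                int decoded = semi < 0 ? -1 : decode(xml.substring(i + 1, semi));
                if (decoded >= 0) {
                    // One output char stands for the whole entity, e.g. "&lt;".
                    emit((char) decoded, i);
                    i = semi + 1;
                } else {
                    emit(c, i);
                    i++;
                }
            } else {
                emit(c, i);
                i++;
            }
        }
    }

    /** Decode one of the five predefined XML entity names, or return -1. */
    private static int decode(String name) {
        switch (name) {
            case "lt": return '<';
            case "gt": return '>';
            case "amp": return '&';
            case "quot": return '"';
            case "apos": return '\'';
            default: return -1;
        }
    }

    private void emit(char c, int originalOffset) {
        text.append(c);
        originalOffsets.add(originalOffset);
    }

    /** The text content with markup removed and entities decoded. */
    String text() {
        return text.toString();
    }

    /** Map an offset in the stripped text back to an offset in the original XML. */
    int correctOffset(int outputOffset) {
        return originalOffsets.get(outputOffset);
    }

    public static void main(String[] args) {
        XmlOffsetSketch filter = new XmlOffsetSketch("<a>x &lt; y</a>");
        System.out.println(filter.text() + " / " + filter.correctOffset(4));
    }
}
```

Because an entity such as &lt; collapses four input characters into one output character, every character after it is shifted, which is why the filter needs the parser to report entity events at all.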




solr expungeDeletes default value?

2011-06-16 Thread Ryan McKinley
on /trunk expungeDeletes=false by default

Is that the most reasonable default?

What are the tradeoffs?

With expungeDeletes=true, how does that relate to optimize?

(sorry if this has already been covered)

thanks
ryan
