[jira] [Created] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

2011-06-15 Thread Noble Paul (JIRA)
Pluggable shard lookup mechanism for SolrCloud
--

 Key: SOLR-2592
 URL: https://issues.apache.org/jira/browse/SOLR-2592
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Affects Versions: 4.0
Reporter: Noble Paul


If the data in a cloud can be partitioned on some criterion (say range, hash,
attribute value, etc.), it will be easy to narrow a search down to a smaller
subset of shards and so achieve more efficient search.
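
A minimal sketch of what such a pluggable lookup might look like. The `ShardLookup` interface and `HashShardLookup` class are hypothetical names chosen for illustration and are not part of Solr; the sketch assumes documents were partitioned by hash of a key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in point: maps a partition key to the shards that may hold matches.
interface ShardLookup {
    List<String> shardsFor(String partitionKey);
}

// Assumed strategy: documents were partitioned by hash(key) mod number of shards.
class HashShardLookup implements ShardLookup {
    private final List<String> shards;

    HashShardLookup(List<String> shards) {
        this.shards = new ArrayList<>(shards);
    }

    @Override
    public List<String> shardsFor(String partitionKey) {
        if (partitionKey == null) {
            // No partition hint available: fall back to searching every shard.
            return new ArrayList<>(shards);
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(partitionKey.hashCode(), shards.size());
        List<String> result = new ArrayList<>();
        result.add(shards.get(index));
        return result;
    }
}
```

A range-based or attribute-based lookup would implement the same interface, which is the point of making the mechanism pluggable.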

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-15 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049674#comment-13049674
 ] 

Martin Grotzke commented on SOLR-2583:
--

The test that produced this output can be found in my lucene-solr fork on 
github: https://github.com/magro/lucene-solr/commit/b9af87b1
The test method that was executed was testCompareMemoryUsage. For measuring
memory usage I used http://code.google.com/p/memory-measurer/ and ran the
test/JVM with -Xmx1G -javaagent:solr/lib/object-explorer.jar (just from
Eclipse).

I just added another test, that uses a fixed size and an increasing number of 
puts (testCompareMemoryUsageWithFixSizeAndIncreasingNumPuts, 
https://github.com/magro/lucene-solr/blob/trunk/solr/src/test/org/apache/solr/search/function/FileFloatSourceMemoryTest.java#L56),
 with the following results:

{noformat}
Size: 100
NumPuts 1.000 (0,1%),       CompactFloatArray 918.616,    float[] 4.000.016,  HashMap 72.128
NumPuts 10.000 (1,0%),      CompactFloatArray 3.738.712,  float[] 4.000.016,  HashMap 701.696
NumPuts 50.000 (5,0%),      CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 3.383.104
NumPuts 55.000 (5,5%),      CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 3.949.120
NumPuts 60.000 (6,0%),      CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 4.254.848
NumPuts 100.000 (10,0%),    CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.622.272
NumPuts 500.000 (50,0%),    CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 27.262.976
NumPuts 1.000.000 (100,0%), CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.649.664
{noformat}

It seems that the HashMap is the most efficient solution up to ~5.5%. Starting
from this threshold, CompactFloatArray and float[] use less memory, while
CompactFloatArray has no advantage over float[] for puts > 5%.

Therefore I'd suggest an adaptive strategy that uses a HashMap while the
number of scores is at most 5.5% of numDocs, and falls back to the original
float[] approach above this threshold.

What do you say?
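
A rough sketch of how such an adaptive choice could look, assuming a hypothetical `AdaptiveScores` holder (not the attached patch and not FileFloatSource itself). The 5.5% cut-off is taken from the measurements above and should be treated as tunable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder that picks a representation from the ratio of scores to documents.
class AdaptiveScores {
    // Threshold from the measurements above; an assumption, not a fixed rule.
    static final double MAP_THRESHOLD = 0.055;

    private final Map<Integer, Float> sparse; // used when few documents have a score
    private final float[] dense;              // used otherwise
    private final float defaultValue;

    AdaptiveScores(int numDocs, Map<Integer, Float> scores, float defaultValue) {
        this.defaultValue = defaultValue;
        if (scores.size() <= numDocs * MAP_THRESHOLD) {
            this.sparse = new HashMap<>(scores);
            this.dense = null;
        } else {
            this.sparse = null;
            this.dense = new float[numDocs];
            Arrays.fill(dense, defaultValue);
            for (Map.Entry<Integer, Float> e : scores.entrySet()) {
                dense[e.getKey()] = e.getValue();
            }
        }
    }

    boolean usesMap() {
        return sparse != null;
    }

    float get(int doc) {
        if (dense != null) {
            return dense[doc];
        }
        Float v = sparse.get(doc);
        return v != null ? v : defaultValue;
    }
}
```

With 1.000 documents, 10 scores would end up in the map and 100 scores in the array.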

 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring uses a lot of memory, depending on the number of documents in
 the index. The ExternalFileField (used for external scoring) uses
 FileFloatSource, where one FileFloatSource is created per external scoring
 file. FileFloatSource creates a float array sized to the number of
 docs (this is also done if the file to load is not found). If the scoring file
 has far fewer entries than there are docs in total, the
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc -> score, so that the map
 contains as many entries as there are scoring entries in the external file,
 but not more.




[jira] [Updated] (SOLR-2551) Check dataimport.properties for write access before starting import

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2551:


Fix Version/s: 4.0
   3.3
  Summary: Check dataimport.properties for write access before starting 
import  (was: Checking dataimport.properties for write access during startup)

 Check dataimport.properties for write access before starting import
 ---

 Key: SOLR-2551
 URL: https://issues.apache.org/jira/browse/SOLR-2551
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 3.1
Reporter: C S
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2551.patch


 A common mistake is that the /conf directory (or the dataimport.properties
 file in it) is not writable by Solr. It would be great if that were detected
 when starting a dataimport job.
 Currently an import might grind away for days and then fail because it can't
 write its timestamp to the dataimport.properties file.
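
 A minimal sketch of such a pre-flight check, assuming a hypothetical `ImportPreflight` helper; it is not the code in SOLR-2551.patch and only uses standard `java.nio.file` calls.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the class and method names are illustrative only.
class ImportPreflight {
    // True if the properties file can be persisted: either it exists and is writable,
    // or it does not exist yet and its parent directory is writable.
    static boolean canPersist(Path propertiesFile) {
        if (Files.exists(propertiesFile)) {
            return Files.isRegularFile(propertiesFile) && Files.isWritable(propertiesFile);
        }
        Path dir = propertiesFile.toAbsolutePath().getParent();
        return dir != null && Files.isDirectory(dir) && Files.isWritable(dir);
    }

    // Fail fast before the import starts instead of after it has run for days.
    static void checkBeforeImport(Path propertiesFile) {
        if (!canPersist(propertiesFile)) {
            throw new IllegalStateException(
                propertiesFile + " is not writable; refusing to start import");
        }
    }
}
```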




[jira] [Created] (SOLR-2593) A new command 'split' for splitting index

2011-06-15 Thread Noble Paul (JIRA)
A new command 'split' for splitting index
-

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul


If an index is too large/hot it would be desirable to split it out to another
core.

There can be multiple strategies:
* random split of x or x%
* fq=user:johndoe

Example:
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index
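
The two strategies can be thought of as predicates over documents. The sketch below is only an illustration under assumptions: `SplitPlanner` is a hypothetical name, documents are represented by their ids, and the percentage split is implemented as a deterministic hash bucket rather than a truly random choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical planner that decides which documents move to the new core.
class SplitPlanner {
    // Assumed reading of "split of x%": take ids whose hash falls into the
    // first x of 100 buckets. Roughly x% for well-distributed ids.
    static Predicate<String> percent(int x) {
        return id -> Math.floorMod(id.hashCode(), 100) < x;
    }

    // Returns the ids selected by the strategy; everything else stays in the source core.
    static List<String> selectForNewCore(List<String> ids, Predicate<String> strategy) {
        List<String> moved = new ArrayList<>();
        for (String id : ids) {
            if (strategy.test(id)) {
                moved.add(id);
            }
        }
        return moved;
    }
}
```

A filter such as fq=user:johndoe would correspond to a predicate on the document's user field; here a caller could pass something like `id -> id.startsWith("johndoe-")` as a stand-in.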










[jira] [Resolved] (SOLR-2551) Check dataimport.properties for write access before starting import

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-2551.
-

Resolution: Fixed

Committed revision 1135954 on trunk and 1135956 on branch_3x.

 Check dataimport.properties for write access before starting import
 ---

 Key: SOLR-2551
 URL: https://issues.apache.org/jira/browse/SOLR-2551
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 3.1
Reporter: C S
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2551.patch


 A common mistake is that the /conf directory (or the dataimport.properties
 file in it) is not writable by Solr. It would be great if that were detected
 when starting a dataimport job.
 Currently an import might grind away for days and then fail because it can't
 write its timestamp to the dataimport.properties file.




[jira] [Updated] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-2593:
-

Summary: A new core admin command 'split' for splitting index  (was: A new 
command 'split' for splitting index)

 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul

 If an index is too large/hot it would be desirable to split it out to another
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x%
 * fq=user:johndoe
 Example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Updated] (SOLR-2593) A new command 'split' for splitting index

2011-06-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-2593:
-

Description: 
If an index is too large/hot it would be desirable to split it out to another
core.
This core may eventually be replicated out to another host.

There can be multiple strategies:
* random split of x or x%
* fq=user:johndoe

Example:
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index

  was:
If an index is too large/hot it would be desirable to split it out to another
core.

There can be multiple strategies:
* random split of x or x%
* fq=user:johndoe

Example:
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index

 A new command 'split' for splitting index
 -

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul

 If an index is too large/hot it would be desirable to split it out to another
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x%
 * fq=user:johndoe
 Example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




SolrCloud: Automatic master failover

2011-06-15 Thread Shalin Shekhar Mangar
Hello,

What is the status for automatic master failover (leader election) in
SolrCloud? Is there an issue open? I'm interested in this and I've some time
to take it up.

-- 
Regards,
Shalin Shekhar Mangar.


[jira] [Updated] (SOLR-2355) simple distrib update processor

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2355:


Component/s: update
 SolrCloud

 simple distrib update processor
 ---

 Key: SOLR-2355
 URL: https://issues.apache.org/jira/browse/SOLR-2355
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 3.3

 Attachments: DistributedUpdateProcessorFactory.java, 
 TestDistributedUpdate.java


 Here's a simple update processor for distributed indexing that I implemented 
 years ago.
 It implements a simple hash(id) MOD nservers and just fails if any servers 
 are down.
 Given the recent activity in distributed indexing, I thought this might be at 
 least a good source for ideas.
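
 The routing rule described above can be sketched in a few lines. This is an illustration only, with the hypothetical name `HashRouter`; it is not the attached DistributedUpdateProcessorFactory.java, and the choice of `String.hashCode()` as the hash is an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical router implementing hash(id) MOD nservers.
class HashRouter {
    static String route(String docId, List<String> servers, Set<String> liveServers) {
        // Mirrors the "just fails if any servers are down" behaviour.
        if (!liveServers.containsAll(servers)) {
            throw new IllegalStateException("at least one server is down; refusing to index");
        }
        // floorMod avoids a negative index when hashCode() is negative.
        return servers.get(Math.floorMod(docId.hashCode(), servers.size()));
    }
}
```

 Note that with this scheme, changing the number of servers changes the target of most ids, which is one reason it is only a starting point for ideas.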




[jira] [Updated] (SOLR-2341) Shard distribution policy

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2341:


  Component/s: SolrCloud
Fix Version/s: 4.0

 Shard distribution policy
 -

 Key: SOLR-2341
 URL: https://issues.apache.org/jira/browse/SOLR-2341
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2341.patch, SOLR-2341.patch


 A first crack at creating policies to be used for determining to which of a 
 list of shards a document should go. See discussion on Distributed Indexing 
 on dev-list.




[jira] [Updated] (SOLR-2358) Distributing Indexing

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2358:


  Component/s: update
   SolrCloud
Fix Version/s: 4.0

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2358.patch


 The first steps towards creating distributed indexing functionality in Solr




[jira] [Updated] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2593:


Fix Version/s: 4.0

 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot it would be desirable to split it out to another
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x%
 * fq=user:johndoe
 Example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Created] (SOLR-2594) Make Replication Handler cloud aware

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)
Make Replication Handler cloud aware


 Key: SOLR-2594
 URL: https://issues.apache.org/jira/browse/SOLR-2594
 Project: Solr
  Issue Type: Improvement
  Components: replication (java), SolrCloud
Reporter: Shalin Shekhar Mangar
 Fix For: 4.0


Replication handler should be cloud aware. It should be possible to switch 
roles from slave to master as well as switch masterUrls based on the cluster 
topology and state.




[jira] [Created] (SOLR-2595) Split and migrate indexes

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)
Split and migrate indexes
-

 Key: SOLR-2595
 URL: https://issues.apache.org/jira/browse/SOLR-2595
 Project: Solr
  Issue Type: New Feature
  Components: multicore, replication (java), SolrCloud
Reporter: Shalin Shekhar Mangar
 Fix For: 4.0


When a shard's index grows too large or a shard becomes too loaded, it should
be possible to split parts of a shard's index and migrate/merge to a less 
loaded node.




[jira] [Created] (SOLR-2596) Enhance CoreAdmin mergeindexes to use a core's index as the source

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)
Enhance CoreAdmin mergeindexes to use a core's index as the source
--

 Key: SOLR-2596
 URL: https://issues.apache.org/jira/browse/SOLR-2596
 Project: Solr
  Issue Type: Improvement
  Components: multicore, update
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.0


Enhance CoreAdmin mergeindexes to use a core's index as the source. Right now 
the mergeindexes command accepts a list of index directories on the local disk 
which is not very convenient.




[jira] [Commented] (SOLR-2595) Split and migrate indexes

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049703#comment-13049703
 ] 

Shalin Shekhar Mangar commented on SOLR-2595:
-

Example: Let's say you have a core C1 on host H1 which you want to split, moving
a part of the index to core C2 on host H2.

The sequence of operations can be:
# Use SOLR-2593 to split C1 and move the part to be migrated into a temporary 
core, say S
# Create a temporary core on H2 host, say T
# Assign T to be a slave of S
# When replication completes, use SOLR-2596 to merge T into C2 - perhaps 
update some ZK flags so that

Some details still need to be figured out e.g.
* What strategy to use for splitting?
* How to delete the migrated part from the source index?
* How to update the shard lookup and distributed indexing schemes for the 
migrated part?
* What happens to writes during the migration? Should we disallow them?

 Split and migrate indexes
 -

 Key: SOLR-2595
 URL: https://issues.apache.org/jira/browse/SOLR-2595
 Project: Solr
  Issue Type: New Feature
  Components: multicore, replication (java), SolrCloud
Reporter: Shalin Shekhar Mangar
 Fix For: 4.0


 When a shard's index grows too large or a shard becomes too loaded, it
 should be possible to split parts of a shard's index and migrate/merge to a 
 less loaded node.




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049705#comment-13049705
 ] 

Noble Paul commented on SOLR-1431:
--

Jason, 

The configuration which I have specified lets you do ShardHandler-specific
configuration. It goes well with the general Solr configuration.

{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- other params go here -->

    <shardHandler class="HttpShardHandler">
        <!-- To be implemented -->
        <int name="httpReadTimeOut">1000</int>
        <int name="httpConnTimeOut">5000</int>
    </shardHandler>
</requestHandler>
{code}


Creating a new instance per request is not wise.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch


 We'll abstract CommComponent in this issue.




[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049706#comment-13049706
 ] 

Robert Muir commented on SOLR-2583:
---

Are you sure real floats are actually needed?
Why not use compactbytearray with smallfloat encoding?

It would also be good to measure performance... doesn't a HashMap have to box
*per-docid* into an Integer for lookup?
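
To make the boxing concern concrete, here is a small sketch with a hypothetical `LookupCost` class that contrasts the two lookups. It measures nothing; it only shows where the conversions happen.

```java
import java.util.Map;

// Hypothetical helper contrasting a boxed map lookup with a primitive array read.
class LookupCost {
    static float fromMap(Map<Integer, Float> scores, int doc) {
        // 'doc' is autoboxed via Integer.valueOf(doc). Outside the small cached
        // range (-128..127 by default) this may allocate a new Integer per call,
        // unless the JIT manages to eliminate the allocation.
        Float v = scores.get(doc);
        // The Float result is unboxed again on return.
        return v == null ? 0f : v;
    }

    static float fromArray(float[] scores, int doc) {
        // Plain indexed read, no objects involved.
        return scores[doc];
    }
}
```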



 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring uses a lot of memory, depending on the number of documents in
 the index. The ExternalFileField (used for external scoring) uses
 FileFloatSource, where one FileFloatSource is created per external scoring
 file. FileFloatSource creates a float array sized to the number of
 docs (this is also done if the file to load is not found). If the scoring file
 has far fewer entries than there are docs in total, the
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc -> score, so that the map
 contains as many entries as there are scoring entries in the external file,
 but not more.




[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049709#comment-13049709
 ] 

Robert Muir commented on SOLR-2583:
---

bq. that uses a fixed size and an increasing number of puts

I'm not certain how realistic that is. Remember that behind the scenes
compactbytearray uses blocks, and if you touch every one (by putting every
Kth docid or something) then you are just testing the worst case.
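
A simplified illustration of why evenly spread puts are the worst case for a block-based structure. `BlockFloatArray` below is a hypothetical, lazily allocated block array written for this example; it is not the actual compact array implementation, and the block size is an assumption.

```java
// Hypothetical block-based float array: a block is only allocated on first write.
class BlockFloatArray {
    static final int BLOCK_SIZE = 1024; // assumed block size

    private final float[][] blocks;
    private int allocatedBlocks;

    BlockFloatArray(int size) {
        // Round up so the last, partially used block is covered as well.
        blocks = new float[(size + BLOCK_SIZE - 1) / BLOCK_SIZE][];
    }

    void put(int index, float value) {
        int b = index / BLOCK_SIZE;
        if (blocks[b] == null) {
            blocks[b] = new float[BLOCK_SIZE];
            allocatedBlocks++;
        }
        blocks[b][index % BLOCK_SIZE] = value;
    }

    float get(int index) {
        float[] block = blocks[index / BLOCK_SIZE];
        return block == null ? 0f : block[index % BLOCK_SIZE];
    }

    int allocatedBlocks() {
        return allocatedBlocks;
    }
}
```

With one million slots, 1.000 puts at every 1.000th index touch nearly every block, so almost the full array gets allocated, while 1.000 puts at indices 0..999 stay within a single block.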


 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring uses a lot of memory, depending on the number of documents in
 the index. The ExternalFileField (used for external scoring) uses
 FileFloatSource, where one FileFloatSource is created per external scoring
 file. FileFloatSource creates a float array sized to the number of
 docs (this is also done if the file to load is not found). If the scoring file
 has far fewer entries than there are docs in total, the
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc -> score, so that the map
 contains as many entries as there are scoring entries in the external file,
 but not more.




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049713#comment-13049713
 ] 

Koji Sekiguchi commented on SOLR-2593:
--

CoreAdminHandler uses action, not command.

 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot it would be desirable to split it out to another
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x%
 * fq=user:johndoe
 Example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Created] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Robert Muir (JIRA)
remove MultiTermQuery get/inc/clear totalNumberOfTerms
--

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3205.patch

This method is not correct if the index has more than one segment.
It's also not thread-safe, and it means calling query.rewrite() modifies
the original query.

All of these things add up to confusion. I think we should remove this
from MultiTermQuery; the only thing that uses it is the NRQ tests, which
conditionalize all the asserts anyway.
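
A toy illustration of the side effect being described. `ToyMultiTermQuery` is a hypothetical stand-in written for this example and does not reproduce Lucene's MultiTermQuery; it only shows how a counter kept on the query object gets mutated by rewriting.

```java
import java.util.List;

// Hypothetical stand-in for a query that counts terms as a side effect of rewrite.
class ToyMultiTermQuery {
    // Plain int field: concurrent rewrites can lose increments because
    // the read-modify-write below is not atomic.
    private int totalNumberOfTerms;

    int getTotalNumberOfTerms() {
        return totalNumberOfTerms;
    }

    void clearTotalNumberOfTerms() {
        totalNumberOfTerms = 0;
    }

    // Stand-in for rewrite(): "visits" the matching terms and mutates this query.
    List<String> rewrite(List<String> matchingTerms) {
        totalNumberOfTerms += matchingTerms.size();
        return matchingTerms;
    }
}
```

Rewriting the same query twice over two terms leaves the counter at 4 unless the caller remembers to clear it in between, which is the kind of confusion the issue refers to.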





[jira] [Updated] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3205:


Attachment: LUCENE-3205.patch

 remove MultiTermQuery get/inc/clear totalNumberOfTerms
 --

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3205.patch


 This method is not correct if the index has more than one segment.
 It's also not thread-safe, and it means calling query.rewrite() modifies
 the original query.
 All of these things add up to confusion. I think we should remove this
 from MultiTermQuery; the only thing that uses it is the NRQ tests, which
 conditionalize all the asserts anyway.




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049727#comment-13049727
 ] 

Peter Sturge commented on SOLR-2593:


This is a really great idea, thanks!
If it's possible, it would be cool to have config parameters to:
 create a new core
 overwrite an existing core
 rename an existing core, then create (rolling backup)
 merge with an existing core (ever-growing, but kind of an accessible 'archive' 
index)


 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot it would be desirable to split it out to another
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x%
 * fq=user:johndoe
 Example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Updated] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3205:


Fix Version/s: 4.0
   3.3

 remove MultiTermQuery get/inc/clear totalNumberOfTerms
 --

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3205.patch


 This method is not correct if the index has more than one segment.
 It's also not thread-safe, and it means calling query.rewrite() modifies
 the original query.
 All of these things add up to confusion. I think we should remove this
 from MultiTermQuery; the only thing that uses it is the NRQ tests, which
 conditionalize all the asserts anyway.




[jira] [Commented] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049733#comment-13049733
 ] 

Uwe Schindler commented on LUCENE-3205:
---

I am perfectly fine with removing it. For analysis and debugging of NRQ, it
would still be good to have something, but I suggest changing the tests (I will
simply request a TermsEnum and count terms, possibly on MultiTerms).

Should I take the issue and modify my tests?

 remove MultiTermQuery get/inc/clear totalNumberOfTerms
 --

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3205.patch


 This method is not correct if the index has more than one segment.
 It's also not thread-safe, and it means calling query.rewrite() modifies
 the original query.
 All of these things add up to confusion. I think we should remove this
 from MultiTermQuery; the only thing that uses it is the NRQ tests, which
 conditionalize all the asserts anyway.




[jira] [Commented] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049734#comment-13049734
 ] 

Robert Muir commented on LUCENE-3205:
-

yes, please do?

 remove MultiTermQuery get/inc/clear totalNumberOfTerms
 --

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3205.patch


 This method is not correct if the index has more than one segment.
 It's also not thread safe, and it means calling query.rewrite() modifies
 the original query. 
 All of these things add up to confusion. I think we should remove this 
 from MultiTermQuery; the only thing that uses it is the NRQ tests, which 
 conditionalize all the asserts anyway.




Re: SolrCloud: Automatic master failover

2011-06-15 Thread Yonik Seeley
On Wed, Jun 15, 2011 at 5:31 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 Hello,
 What is the status for automatic master failover (leader election) in
 SolrCloud? Is there an issue open? I'm interested in this and I've some time
 to take it up.

Awesome!  I'm hoping to find time next week myself to start doing more
on cloud stuff!
Do you mean master in the traditional sense (master in a whole index
replication sense), or
leader (where decisions for changing the configuration of a cluster get made)?

-Yonik
http://www.lucidimagination.com




Re: XmlCharFilter

2011-06-15 Thread Erick Erickson
Yonik's law of patches states:

A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all.

and what you've described sounds way better than that!

Anyway, I doubt you'll *ever* find someone on the dev list
*complain* about opening up a JIRA on something when you're
willing to attach a patch, especially one with unit tests

Although you might have to nudge people to follow up on it...

Best
Erick

On Tue, Jun 14, 2011 at 9:50 PM, Michael Sokolov soko...@ifactory.com wrote:
 I work with a lot of XML data sources and have needed to implement an
 analysis chain for Solr/Lucene that accepts XML. In the course of doing
 that, I found I needed something very much like HTMLCharFilter, but that
 does standard XML parsing (understands XML entities defined in an internal
 or external DTD, for example).  So I wrote XmlCharFilter, which uses the
 Woodstox XML parser (already used by Solr).  I think this could be useful
 for others, and it would be nice for me if it were committed here, so I'd
 like to contribute.  Should I open a JIRA for this?  Is there anybody that
 can spare the time to review?  It is basically one class (plus a factory
 class) and has a fairly complete set of tests.

 -Mike Sokolov
 Engineering Directory
 iFactory.com








Re: SolrCloud: Automatic master failover

2011-06-15 Thread Shalin Shekhar Mangar
On Wed, Jun 15, 2011 at 5:45 PM, Yonik Seeley yo...@lucidimagination.com wrote:


 Awesome!  I'm hoping to find time next week myself to start doing more
 on cloud stuff!
 Do you mean master in the traditional sense (master in a whole index
 replication sense), or
 leader (where decisions for changing the configuration of a cluster get
 made)?


In this particular case, I meant the traditional replication master. We'll
need a cluster leader for a sharded cloud setup but let's solve that
separately.
-- 
Regards,
Shalin Shekhar Mangar.


[jira] [Assigned] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-06-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3205:
-

Assignee: Uwe Schindler

 remove MultiTermQuery get/inc/clear totalNumberOfTerms
 --

 Key: LUCENE-3205
 URL: https://issues.apache.org/jira/browse/LUCENE-3205
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3205.patch


 This method is not correct if the index has more than one segment.
 It's also not thread safe, and it means calling query.rewrite() modifies
 the original query. 
 All of these things add up to confusion. I think we should remove this 
 from MultiTermQuery; the only thing that uses it is the NRQ tests, which 
 conditionalize all the asserts anyway.




Re: XmlCharFilter

2011-06-15 Thread Simon Willnauer
On Wed, Jun 15, 2011 at 2:24 PM, Erick Erickson erickerick...@gmail.com wrote:
 Yonik's law of patches states:

 A half-baked patch in Jira, with no documentation, no tests
 and no backwards compatibility is better than no patch at all.
+1

simon

 and what you've described sounds way better than that!

 Anyway, I doubt you'll *ever* find someone on the dev list
 *complain* about opening up a JIRA on something when you're
 willing to attach a patch, especially one with unit tests

 Although you might have to nudge people to follow up on it...

 Best
 Erick

 On Tue, Jun 14, 2011 at 9:50 PM, Michael Sokolov soko...@ifactory.com wrote:
 I work with a lot of XML data sources and have needed to implement an
 analysis chain for Solr/Lucene that accepts XML. In the course of doing
 that, I found I needed something very much like HTMLCharFilter, but that
 does standard XML parsing (understands XML entities defined in an internal
 or external DTD, for example).  So I wrote XmlCharFilter, which uses the
 Woodstox XML parser (already used by Solr).  I think this could be useful
 for others, and it would be nice for me if it were committed here, so I'd
 like to contribute.  Should I open a JIRA for this?  Is there anybody that
 can spare the time to review?  It is basically one class (plus a factory
 class) and has a fairly complete set of tests.

 -Mike Sokolov
 Engineering Directory
 iFactory.com











Re: XmlCharFilter

2011-06-15 Thread Mike Sokolov

OK - thanks for the encouragement, Erick; I'll open a JIRA then.

-Mike

On 06/15/2011 08:24 AM, Erick Erickson wrote:

Yonik's law of patches states:

A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all.

and what you've described sounds way better than that!

Anyway, I doubt you'll *ever* find someone on the dev list
*complain* about opening up a JIRA on something when you're
willing to attach a patch, especially one with unit tests

Although you might have to nudge people to follow up on it...

Best
Erick

On Tue, Jun 14, 2011 at 9:50 PM, Michael Sokolov soko...@ifactory.com wrote:
   

I work with a lot of XML data sources and have needed to implement an
analysis chain for Solr/Lucene that accepts XML. In the course of doing
that, I found I needed something very much like HTMLCharFilter, but that
does standard XML parsing (understands XML entities defined in an internal
or external DTD, for example).  So I wrote XmlCharFilter, which uses the
Woodstox XML parser (already used by Solr).  I think this could be useful
for others, and it would be nice for me if it were committed here, so I'd
like to contribute.  Should I open a JIRA for this?  Is there anybody that
can spare the time to review?  It is basically one class (plus a factory
class) and has a fairly complete set of tests.

-Mike Sokolov
Engineering Directory
iFactory.com




 


   





[jira] [Created] (SOLR-2597) XmlCharFilter

2011-06-15 Thread Mike Sokolov (JIRA)
XmlCharFilter
-

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov


This CharFilter processes incoming XML using the Woodstox parser, stripping all 
non-text content and remembering offsets, just like HTMLCharFilter, but 
respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
also provides the ability to exclude (and include) the content of certain named 
elements.

In order to compute character offsets properly when mixed line termination 
styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
&amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
currently in solr/lib.  The earlier versions of the parser could not report 
these entity events, so we couldn't tell the difference between < and &lt; 
and the offsets could be wrong.  The upgraded version is in the patch.
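
To make the stripping behaviour concrete, here is a minimal sketch that pulls text out of XML and skips excluded elements. It is an illustration under stated assumptions, not the patch: it uses the JDK's javax.xml.stream (StAX) interfaces rather than anything Woodstox-specific, it does not track offsets, and the class name XmlTextSketch and method extractText are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not the XmlCharFilter from the patch.
final class XmlTextSketch {

    // Returns the text content of the document, dropping markup and the
    // content of any element whose local name is in 'excluded'.
    static String extractText(String xml, Set<String> excluded) {
        StringBuilder text = new StringBuilder();
        int skipDepth = 0; // > 0 while inside an excluded element
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (skipDepth > 0 || excluded.contains(reader.getLocalName())) {
                        skipDepth++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (skipDepth > 0) {
                        skipDepth--;
                    }
                } else if (skipDepth == 0
                        && (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA)) {
                    // Entities such as &lt; arrive here already resolved.
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>A &amp; B</title><note>skip me</note>"
                + "<body>x &lt; y</body></doc>";
        System.out.println(extractText(xml, Collections.singleton("note")));
    }
}
```

In the output, "&lt;" (four characters in the source) has become "<" (one character). That length difference is why the filter needs the parser to report entity events before it can map output offsets back to the input.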




Re: [jira] [Created] (SOLR-2597) XmlCharFilter

2011-06-15 Thread Koji Sekiguchi

Did you mean Xml*Strip*CharFilter?

koji
--
http://www.rondhuit.com/en/

(11/06/15 22:12), Mike Sokolov (JIRA) wrote:

XmlCharFilter
-

  Key: SOLR-2597
  URL: https://issues.apache.org/jira/browse/SOLR-2597
  Project: Solr
   Issue Type: Improvement
   Components: Schema and Analysis
 Affects Versions: 4.0
 Reporter: Mike Sokolov


This CharFilter processes incoming XML using the Woodstox parser, stripping all 
non-text content and remembering offsets, just like HTMLCharFilter, but 
respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
also provides the ability to exclude (and include) the content of certain named 
elements.

In order to compute character offsets properly when mixed line termination styles are present (\r, \r\n), or when XML 
character entities (&lt;, &quot;, &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
currently in solr/lib.  The earlier versions of the parser could not report these entity events, so we couldn't tell 
the difference between < and &lt; and the offsets could be wrong.  The upgraded version 
is in the patch.










[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049786#comment-13049786
 ] 

Michael McCandless commented on LUCENE-3197:


Right, this has been the intended semantics of a background optimize for some 
time, ie, when it returns it only ensures that whatever was not optimized as of 
when it was called has been merged away.

This already works correctly for newly added docs, meaning if you continue 
adding docs / flushing new segments while the optimize runs, it knows that the 
newly flushed segments do not have to be merged away.

But for new deletions we are not handling it correctly, which leads to the 
forever running merges.
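
The intended semantics can be modelled with a small simulation. This is only a sketch under assumptions: Segment, optimize and the other names are hypothetical and bear no relation to the real IndexWriter or MergePolicy code. It contrasts a "done" check that looks at the current state of the index with one that only looks at the segments that existed when optimize was called.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an index as a list of segments.
final class OptimizeSketch {

    static final class Segment {
        final String name;
        boolean hasDeletions;
        Segment(String name, boolean hasDeletions) {
            this.name = name;
            this.hasDeletions = hasDeletions;
        }
    }

    static List<Segment> newIndex(String... names) {
        List<Segment> index = new ArrayList<Segment>();
        for (String name : names) {
            index.add(new Segment(name, false));
        }
        return index;
    }

    static boolean isOptimized(List<Segment> index) {
        return index.isEmpty() || (index.size() == 1 && !index.get(0).hasDeletions);
    }

    // Returns how many merges ran before optimize considered itself done,
    // capped at maxMerges so the "forever" case terminates in the demo.
    static int optimize(List<Segment> index, boolean useSnapshot,
                        boolean appKeepsDeleting, int maxMerges) {
        // Snapshot: only these segments have to be merged away.
        Set<Segment> toMergeAway = new HashSet<Segment>();
        if (!isOptimized(index)) {
            toMergeAway.addAll(index);
        }
        int merges = 0;
        while (merges < maxMerges) {
            boolean done = useSnapshot
                    ? Collections.disjoint(index, toMergeAway)
                    : isOptimized(index);
            if (done) {
                break;
            }
            // Merge everything into one fresh segment without deletions.
            Segment merged = new Segment("merged" + merges, false);
            index.clear();
            index.add(merged);
            merges++;
            // The application deletes a doc while/after the merge runs.
            if (appKeepsDeleting) {
                merged.hasDeletions = true;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        System.out.println(optimize(newIndex("a", "b"), false, true, 5)); // 5 (hits the cap)
        System.out.println(optimize(newIndex("a", "b"), true, true, 5));  // 1
    }
}
```

With the snapshot check, deletions that arrive after the call no longer keep the loop alive, which matches how newly flushed segments are already treated.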

 Optimize runs forever if you keep deleting docs at the same time
 

 Key: LUCENE-3197
 URL: https://issues.apache.org/jira/browse/LUCENE-3197
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.3, 4.0


 Because we cascade merges for an optimize... if you also delete documents 
 while the merges are running, then the merge policy will see the resulting 
 single segment as still not optimized (since it has pending deletes) and do a 
 single-segment merge, and will repeat indefinitely (as long as your app keeps 
 deleting docs).




Re: [jira] [Created] (SOLR-2597) XmlCharFilter

2011-06-15 Thread Mike Sokolov
Perhaps that name would be more consistent with HTMLStripCharFilter, 
yes, but it wasn't the one I was using.  Also, I meant to post a patch 
here, but left the important files on a machine which is inaccessible at 
the moment, so I will post this evening.


-Mike

On 06/15/2011 09:28 AM, Koji Sekiguchi wrote:

Did you mean Xml*Strip*CharFilter?

koji





[jira] [Created] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-15 Thread JIRA
exampledocs/books.json should use name instead of title
---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3


The file exampledocs/books.json currently contains two books. But they do not 
show up in the default solr/browse interface because they use "title" instead 
of "name", which the Velocity template does not show. Also we should include a 
few more books.




[jira] [Resolved] (SOLR-2554) RandomSortField values are cached in the FieldCache

2011-06-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2554.


   Resolution: Fixed
Fix Version/s: 3.3

 RandomSortField values are cached in the FieldCache
 ---

 Key: SOLR-2554
 URL: https://issues.apache.org/jira/browse/SOLR-2554
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Vadim Geshel
 Fix For: 3.3


 The values of RandomSortField get cached in the FieldCache. When using many 
 RandomSortFields over time, this leads to running out of memory.
 This may be one of the cases already covered in SOLR- but I'm not sure.




[jira] [Updated] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2598:
--

Attachment: (was: SOLR-2589.patch)

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also we should include 
 a few more books




[jira] [Updated] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2598:
--

Attachment: SOLR-2589.patch

Attaching patch which changes title to name and adds two more books to the json 
file

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also we should include 
 a few more books




[jira] [Updated] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2598:
--

Attachment: SOLR-2598.patch

Attaching patch which renames title to name and adds two more books

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also we should include 
 a few more books




[jira] [Resolved] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure

2011-06-15 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3190.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

fixed in rev 1136086

 TestStressIndexing2 testMultiConfig failure
 ---

 Key: LUCENE-3190
 URL: https://issues.apache.org/jira/browse/LUCENE-3190
 Project: Lucene - Java
  Issue Type: Bug
Reporter: selckin
Assignee: Simon Willnauer
 Attachments: LUCENE-3190.patch


 trunk: r1134311
 reproducible
 {code}
 [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
 [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec
 [junit] 
 [junit] - Standard Error -
 [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush 
 mem: 395100 active: 65808
 [junit] at 
 org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
 [junit] at 
 org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
 [junit] at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
 [junit] at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
 [junit] at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
 [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
 -Dtestmethod=testMultiConfig 
 -Dtests.seed=2571834029692482827:-8116419692655152763
 [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
 -Dtestmethod=testMultiConfig 
 -Dtests.seed=2571834029692482827:-8116419692655152763
 [junit] The following exceptions were thrown by threads:
 [junit] *** Thread: Thread-0 ***
 [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: 
 ram was 460908 expected: 408216 flush mem: 395100 active: 65808
 [junit] at junit.framework.Assert.fail(Assert.java:47)
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
 [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, 
 f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, 
 f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, 
 f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, 
 f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), 
 f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, 
 f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, 
 timezone=Pacific/Palau
 [junit] NOTE: all tests run in this JVM:
 [junit] [TestStressIndexing2]
 [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
 (64-bit)/cpus=8,threads=1,free=133324528,total=158400512
 [junit] -  ---
 [junit] Testcase: 
 testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
 [junit] r1.numDocs()=17 vs r2.numDocs()=16
 [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs 
 r2.numDocs()=16
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308)
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278)
 [junit] at 
 org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
 [junit] 
 [junit] 
 [junit] Testcase: 
 testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
 [junit] Some threads threw uncaught exceptions!
 [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
 exceptions!
 [junit] at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
 [junit] 
 [junit] 
 [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED
 {code}





[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049908#comment-13049908
 ] 

Michael McCandless commented on LUCENE-3191:


For 3.x, I think we should make an exception to back-compat and break the API 
(changing FieldComp.value(..) to return T not Comparable; changing 
FieldDoc.fields from Comparable[] to Object[]).  I'll advertise the break in 
CHANGES.

 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.




[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049928#comment-13049928
 ] 

Uwe Schindler commented on LUCENE-3191:
---

I think this has little impact on users. There are three cases:

- People using FieldDoc.fields[] would always cast the return type, so a simple 
recompile should be fine
- People writing their own FieldComparators must change the return type of getValue() 
and maybe add generics (not required)
- People that don't implement compareValue() will also be fine, as the default 
impl casts to Comparable and that will have the same behaviour

The 3.x impl just has to fix FieldDocSortedHitQueue to use compareValue() and 
remove the negation for scores.
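
The three cases above can be sketched with a tiny generic comparator. This is a hypothetical model, not Lucene's FieldComparator: the names SlotComparator, IntSlots and NullableStrings are invented, and compareValue follows the spelling used in this comment. It shows a value accessor typed as T, a default comparison that casts to Comparable, and an override for values that may be null.

```java
// Hypothetical model of the API shape discussed in this issue.
final class CompareValueSketch {

    abstract static class SlotComparator<T> {
        // Returns the sort value held in the given slot, typed as T
        // instead of Comparable.
        abstract T value(int slot);

        // Default: assumes non-null values that implement Comparable,
        // so existing comparators keep their old behaviour.
        @SuppressWarnings("unchecked")
        int compareValue(T first, T second) {
            return ((Comparable<T>) first).compareTo(second);
        }
    }

    // Relies on the default compareValue.
    static final class IntSlots extends SlotComparator<Integer> {
        private final int[] slots;
        IntSlots(int[] slots) { this.slots = slots; }
        @Override Integer value(int slot) { return Integer.valueOf(slots[slot]); }
    }

    // Values may be null, so the default cast-and-compare would fail;
    // here null sorts before any non-null value (an arbitrary choice).
    static final class NullableStrings extends SlotComparator<String> {
        private final String[] slots;
        NullableStrings(String[] slots) { this.slots = slots; }
        @Override String value(int slot) { return slots[slot]; }
        @Override int compareValue(String first, String second) {
            if (first == null) {
                return second == null ? 0 : -1;
            }
            if (second == null) {
                return 1;
            }
            return first.compareTo(second);
        }
    }

    public static void main(String[] args) {
        SlotComparator<Integer> ints = new IntSlots(new int[] {7, 3});
        System.out.println(ints.compareValue(ints.value(0), ints.value(1)) > 0); // true
        SlotComparator<String> strs = new NullableStrings(new String[] {null, "b"});
        System.out.println(strs.compareValue(strs.value(0), strs.value(1)) < 0); // true
    }
}
```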

 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.




[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3191:
---

Attachment: LUCENE-3191-3x.patch

Patch for merging back to 3.x.

 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.




[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049932#comment-13049932
 ] 

Uwe Schindler commented on LUCENE-3191:
---

Patch looks good, let the BackwardsPoliceman think about some possibilities to 
lower the risk of breaking code. Of course nothing sophisticated...

 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.




[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values

2011-06-15 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049937#comment-13049937
 ] 

Digy commented on LUCENENET-417:


Maybe something like this

doc.Add(new Field(name,-
doc.Add(new Field(metadata,-
doc.Add(new Field(content,part1-
doc.Add(new Field(content,part2-

doc.Add(new Field(content,partN-


DIGY

 implement streams as field values
 -

 Key: LUCENENET-417
 URL: https://issues.apache.org/jira/browse/LUCENENET-417
 Project: Lucene.Net
  Issue Type: New Feature
  Components: Lucene.Net Core
Reporter: Christopher Currens
 Attachments: StreamValues.patch


 Adding binary values to a field is an expensive operation, as the whole 
 binary data must be loaded into memory and then written to the index.  Adding 
 the ability to use a stream instead of a byte array could not only speed up 
 the indexing process, but also reduce the memory footprint.
 -Java lucene has the ability to use a TextReader to both analyze and store 
 text in the index.-  Lucene.NET lacks the ability to store string data in the 
 index via streams. This should be a feature added into Lucene .NET as well.  
 My thoughts are to add another Field constructor, that is Field(string name, 
 System.IO.Stream stream, System.Text.Encoding encoding), that will allow the 
 text to be analyzed and stored into the index.
 Comments about this approach are greatly appreciated.





[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3191:
---

Attachment: LUCENE-3191.patch

Small further patch for trunk:

  * Simplifies the API by moving shardIndex onto ScoreDoc

  * Fixes TopDocs.merge to return TopFieldDocs if the Sort != null

  * A couple FieldComparators must override compareValue because the
values may be null.
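
For readers who have not looked at the patch, merging per-shard results is essentially a k-way merge that remembers which shard each hit came from. The following is a minimal sketch under assumptions, not the patch itself: it sorts by score only (no arbitrary Sort), the names MergeSketch, Hit and merge are hypothetical, and breaking ties by the lower shard index is an assumption made here for determinism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model; each shard's hits are given as scores sorted descending,
// and a hit's "doc" is simply its position within its shard.
final class MergeSketch {

    static final class Hit {
        final int shardIndex; // which shard produced the hit
        final int doc;        // shard-local position of the hit
        final float score;
        Hit(int shardIndex, int doc, float score) {
            this.shardIndex = shardIndex;
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> merge(int topN, final float[][] shardScores) {
        // Queue entries are {shard, position}; best score first,
        // ties broken by lower shard index (an assumption of this sketch).
        PriorityQueue<int[]> queue = new PriorityQueue<int[]>((a, b) -> {
            int cmp = Float.compare(shardScores[b[0]][b[1]], shardScores[a[0]][a[1]]);
            return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
        });
        for (int shard = 0; shard < shardScores.length; shard++) {
            if (shardScores[shard].length > 0) {
                queue.add(new int[] {shard, 0});
            }
        }
        List<Hit> merged = new ArrayList<Hit>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] top = queue.poll();
            int shard = top[0];
            int pos = top[1];
            merged.add(new Hit(shard, pos, shardScores[shard][pos]));
            // Advance this shard's cursor to its next-best hit.
            if (pos + 1 < shardScores[shard].length) {
                queue.add(new int[] {shard, pos + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        float[][] shards = { {0.9f, 0.4f}, {0.8f, 0.7f} };
        for (Hit hit : merge(3, shards)) {
            System.out.println("shard=" + hit.shardIndex + " doc=" + hit.doc
                    + " score=" + hit.score);
        }
    }
}
```

Keeping the shard index on the hit, as the first bullet describes, is what lets a caller go back to the right shard to fetch the document.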


 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch


 It's not easy today to merge TopDocs (e.g. those produced by multiple shards) 
 while supporting an arbitrary Sort.
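
As a rough illustration of the kind of merge this issue describes (not the patch itself), the sketch below merges per-shard hit lists, each already sorted by descending score, into a global top N with a priority queue. The names ShardMergeSketch, Hit, and merge are hypothetical stand-ins and are not Lucene's API. Only score ordering is handled here, whereas the issue also asks for arbitrary Sort support.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-ins for illustration; these are NOT Lucene classes.
final class ShardMergeSketch {

    // One hit: the shard it came from, its doc id within that shard, and its score.
    static final class Hit {
        final int shardIndex;
        final int doc;
        final float score;

        Hit(int shardIndex, int doc, float score) {
            this.shardIndex = shardIndex;
            this.doc = doc;
            this.score = score;
        }
    }

    // Cursor into one shard's already-sorted hit list.
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit current() {
            return hits.get(pos);
        }
    }

    // Merge per-shard lists (each sorted by descending score) into the global top N.
    static List<Hit> merge(int topN, List<List<Hit>> shards) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(b.current().score, a.current().score);
            // Tie-break on shard index so the result is deterministic.
            return byScore != 0
                    ? byScore
                    : Integer.compare(a.current().shardIndex, b.current().shardIndex);
        });
        for (List<Hit> shard : shards) {
            if (!shard.isEmpty()) {
                queue.add(new Cursor(shard));
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            Cursor best = queue.poll();
            merged.add(best.current());
            best.pos++;
            if (best.pos < best.hits.size()) {
                queue.add(best); // re-insert the cursor at its next hit
            }
        }
        return merged;
    }
}
```

Keeping the shard index on each hit is what lets the caller map a merged hit back to the shard that produced it.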




[jira] [Updated] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3197:
---

Attachment: LUCENE-3197.patch

Patch.


 Optimize runs forever if you keep deleting docs at the same time
 

 Key: LUCENE-3197
 URL: https://issues.apache.org/jira/browse/LUCENE-3197
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3197.patch


 Because we cascade merges for an optimize... if you also delete documents 
 while the merges are running, then the merge policy will see the resulting 
 single segment as still not optimized (since it has pending deletes) and do a 
 single-segment merge, and will repeat indefinitely (as long as your app keeps 
 deleting docs).




[jira] [Created] (SOLR-2599) FieldCopy Update Processor

2011-06-15 Thread JIRA
FieldCopy Update Processor
--

 Key: SOLR-2599
 URL: https://issues.apache.org/jira/browse/SOLR-2599
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl


Need an UpdateProcessor which can copy and move fields




[jira] [Updated] (SOLR-2597) XmlCharFilter

2011-06-15 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated SOLR-2597:
---

Attachment: SOLR-2597.patch

I tried to include the upgraded Woodstox jars, but I don't think I figured out 
how to put binaries in the patch.  What's needed are: 
http://repository.codehaus.org/org/codehaus/woodstox/woodstox-core-asl/4.1.1/woodstox-core-asl-4.1.1.jar
 and 
http://repository.codehaus.org/org/codehaus/woodstox/stax2-api/3.1.1/stax2-api-3.1.1.jar
which replace the existing wstx-asl-xxx.jar. 

 XmlCharFilter
 -

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
 Attachments: SOLR-2597.patch


 This CharFilter processes incoming XML using the Woodstox parser, stripping 
 all non-text content and remembering offsets, just like HTMLCharFilter, but 
 respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
 also provides the ability to exclude (and include) the content of certain 
 named elements.
 In order to compute character offsets properly when mixed line termination 
 styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
 &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
 currently in solr/lib.  The earlier versions of the parser could not report 
 these entity events, so we couldn't tell the difference between < and 
 &lt; and the offsets could be wrong.  The upgraded version is in the patch.
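
To make the offset problem concrete, here is a minimal sketch (not the XmlCharFilter from the patch, and not using Woodstox) that decodes a few predefined entities and records, for each output character, where it started in the input. EntityOffsetSketch and decode are hypothetical names. An entity such as &lt; occupies four input characters but yields one output character, so every later offset shifts by three unless it is tracked.

```java
import java.util.Arrays;

// Hypothetical illustration; this is not the XmlCharFilter from the patch.
final class EntityOffsetSketch {
    private static final String[] ENTITIES = {"&lt;", "&gt;", "&amp;", "&quot;"};
    private static final char[] CHARS = {'<', '>', '&', '"'};

    // Appends the decoded text to 'out' (expected to be empty) and returns, for each
    // decoded character, the offset in 'in' where that character started.
    static int[] decode(String in, StringBuilder out) {
        int[] offsets = new int[in.length()];
        int i = 0;
        while (i < in.length()) {
            int matched = -1;
            for (int k = 0; k < ENTITIES.length; k++) {
                if (in.startsWith(ENTITIES[k], i)) {
                    matched = k;
                    break;
                }
            }
            offsets[out.length()] = i; // remember where this output char came from
            if (matched >= 0) {
                out.append(CHARS[matched]);
                i += ENTITIES[matched].length();
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return Arrays.copyOf(offsets, out.length());
    }
}
```

If the parser reports a literal < and an &lt; reference identically, the filter cannot know whether to advance the input offset by one or by four, which is the ambiguity the description refers to.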




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050137#comment-13050137
 ] 

Hoss Man commented on SOLR-2593:


bq. If it's possible, it would be cool to have config parameters to:

...those seem like they should be discrete actions that can be taken after the 
split has happened.  the simplest thing is to have a split action that _just_ 
creates a new core with the docs selected either using the fq or by random 
selection, and then use other CoreAdmin actions for the other stuff: rename, 
swap, swap+delete (the old one), merge ... merge is really the only one we 
don't have at a core level yet (i think)



 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot it would be desirable to split it out to another 
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x% 
 * fq=user:johndoe
 example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

2011-06-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050139#comment-13050139
 ] 

Hoss Man commented on SOLR-2593:


one thing to think about when talking about the API is how the implementation 
will actually work.

the fq type option is basically going to require making a full copy of the 
index and then deleting by query. (unless i'm missing something) but for people 
who don't care how the index is partitioned a more efficient approach could 
probably happen by working at the segment level -- let the user say "split off 
a hunk of at least 20% but no more than 50%" and then you can look at 
individual segments and doc counts and see if it's possible to just move 
segments around (and maybe only do the copy+deleteByQuery logic on a single 
segment).
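
The segment-level idea in this comment can be sketched as a simple greedy pass over per-segment doc counts. This is only an illustration under stated assumptions: SegmentSplitSketch and pickSegments are hypothetical names, the percentages are passed as integers, and the greedy heuristic may miss a valid subset that an exhaustive search would find. An empty result stands for "fall back to copy+deleteByQuery".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of picking whole segments to move; not Solr code.
final class SegmentSplitSketch {

    // Returns the indices of segments to move, or an empty list if this greedy pass
    // finds no subset whose doc total lies between minPercent and maxPercent of all docs.
    static List<Integer> pickSegments(int[] docCounts, int minPercent, int maxPercent) {
        long total = 0;
        for (int count : docCounts) {
            total += count;
        }
        long min = (total * minPercent + 99) / 100; // round the lower bound up
        long max = total * maxPercent / 100;        // round the upper bound down

        // Visit segments from largest to smallest.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docCounts.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingInt((Integer i) -> docCounts[i]).reversed());

        List<Integer> picked = new ArrayList<>();
        long sum = 0;
        for (int i : order) {
            if (sum >= min) {
                break; // already inside the requested range
            }
            if (sum + docCounts[i] <= max) {
                picked.add(i);
                sum += docCounts[i];
            }
        }
        return sum >= min ? picked : new ArrayList<>();
    }
}
```

For doc counts {60, 25, 10, 5} and a 20% to 50% range, the 60-doc segment is too large to move, and the 25-doc segment alone already satisfies the range.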


 A new core admin command 'split' for splitting index
 

 Key: SOLR-2593
 URL: https://issues.apache.org/jira/browse/SOLR-2593
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
 Fix For: 4.0


 If an index is too large/hot it would be desirable to split it out to another 
 core.
 This core may eventually be replicated out to another host.
 There can be multiple strategies:
 * random split of x or x% 
 * fq=user:johndoe
 example:
 command=split&split=20percent&newcore=my_new_index
 or
 command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Resolved] (SOLR-2596) Enhance CoreAdmin mergeindexes to use a core's index as the source

2011-06-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2596.


   Resolution: Duplicate
Fix Version/s: (was: 4.0)

dup of SOLR-1331

 Enhance CoreAdmin mergeindexes to use a core's index as the source
 --

 Key: SOLR-2596
 URL: https://issues.apache.org/jira/browse/SOLR-2596
 Project: Solr
  Issue Type: Improvement
  Components: multicore, update
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar

 Enhance CoreAdmin mergeindexes to use a core's index as the source. Right now 
 the mergeindexes command accepts a list of index directories on the local 
 disk which is not very convenient.




[jira] [Created] (SOLR-2600) ensure example schema.xml has some mention/explanation of per field similarity vs similarityprovider vs (global) similarity

2011-06-15 Thread Hoss Man (JIRA)
ensure example schema.xml has some mention/explanation of per field similarity 
vs similarityprovider vs (global) similarity
---

 Key: SOLR-2600
 URL: https://issues.apache.org/jira/browse/SOLR-2600
 Project: Solr
  Issue Type: Task
  Components: documentation
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.0


when SOLR-2338 was committed, there wasn't yet a clear understanding of how much 
the new per-field similarity feature (vs a custom similarity provider (vs a 
global similarity factory)) should be advertised in the example configs, and 
what type of usage should be encouraged/promoted.

it's likely that by the time 4.0 is released, new language-specific field types 
will already demonstrate these features, and no additional artificial usages 
of them will be needed, but one way or another we should ensure that they are 
either demoed or mentioned in comments.




[JENKINS] Lucene-3.x - Build # 409 - Failure

2011-06-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-3.x/409/

1 tests failed.
FAILED:  org.apache.lucene.util.fst.TestFSTs.testBigSet

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:156)
at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:126)
at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:118)
at org.apache.lucene.util.fst.Builder.compilePrevTail(Builder.java:204)
at org.apache.lucene.util.fst.Builder.add(Builder.java:321)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:463)
at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:211)
at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:944)
at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:964)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186)




Build Log (for compile errors):
[...truncated 12477 lines...]






[jira] [Commented] (SOLR-2597) XmlCharFilter

2011-06-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050184#comment-13050184
 ] 

Hoss Man commented on SOLR-2597:


Mike: thanks for the patch!

as Koji mentioned on the mailing list, might want to consider naming this 
XmlStripCharFilter ... that was my first opinion, but reading the docs the 
include and exclude options definitely make it a bit more generic, so i'm 
leaning towards the opinion that XmlCharFilter is better.

(there's an argument to be made that we should have an XmlStripCharFilter that 
only removes pi/comments/whitespace and resolves entities, and then a distinct 
XmlTagCharFilter that does the include/exclude -- but i'm guessing that would 
be less efficient since this makes it possible to do in one pass, and anyone 
who wants include/exclude at the tag level is almost certainly going to want 
the stripping/entities as well)

skimming the patch i'm +1 except for the new Random in the test case ... if 
you take a look at the existing test cases you'll see how you can hook into the 
solr test framework to get random values that are consistent with a global seed 
-- that way if a test fails, it will report which seed was used and people can 
reproduce it using system properties.
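
The reasoning behind a global seed can be shown with a small standalone sketch. It does not use the Solr test framework's actual hooks; SeededRandomSketch and the system property name sketch.seed are hypothetical and exist only for this illustration. The point is that all random test data derives from one reported seed, so a failing run can be repeated exactly.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; the real Solr/Lucene test framework has its own mechanism.
final class SeededRandomSketch {
    // Hypothetical property name, used only for this illustration.
    static final String SEED_PROPERTY = "sketch.seed";

    // Use the seed from the system property if it is set, otherwise pick a fresh one.
    static long resolveSeed() {
        String configured = System.getProperty(SEED_PROPERTY);
        return configured != null ? Long.parseLong(configured) : System.nanoTime();
    }

    // Random test data derived only from the seed, so the same seed gives the same data.
    static int[] randomInts(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = resolveSeed();
        // Report the seed so a failing run can be repeated with -Dsketch.seed=<seed>.
        System.out.println("seed=" + seed);
        System.out.println(Arrays.toString(randomInts(seed, 5, 100)));
    }
}
```

A bare new Random() in a test hides the seed, so a failure that depends on the generated values cannot be reproduced on demand.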

would also be nice to have a test of the Factory (using a schema.xml 
declaration) but that's not nearly as important.

and of course: would be great if the xml policeman uwe could review.

bq. I tried to include the upgraded Woodstox jars, but I don't think I figured 
how to put binaries in the patch actually.

it's not possible, so don't worry about it.  the important thing is noting in a 
comment (like you did) exactly what new/upgraded jars are needed.


 XmlCharFilter
 -

 Key: SOLR-2597
 URL: https://issues.apache.org/jira/browse/SOLR-2597
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
 Attachments: SOLR-2597.patch


 This CharFilter processes incoming XML using the Woodstox parser, stripping 
 all non-text content and remembering offsets, just like HTMLCharFilter, but 
 respecting XML conventions like XML entities defined in a DTD.  XmlCharFilter 
 also provides the ability to exclude (and include) the content of certain 
 named elements.
 In order to compute character offsets properly when mixed line termination 
 styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, 
 &amp;) are present, we require a newer version of Woodstox (4.1.1) than is 
 currently in solr/lib.  The earlier versions of the parser could not report 
 these entity events, so we couldn't tell the difference between < and 
 &lt; and the offsets could be wrong.  The upgraded version is in the patch.




[JENKINS] Lucene-trunk - Build # 1596 - Still Failing

2011-06-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1596/

No tests ran.

Build Log (for compile errors):
[...truncated 11265 lines...]






[jira] [Updated] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1

2011-06-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2509:
---

Description: 
Hi,

I'm a French user of SOLR and I've encountered a problem since I installed 
SOLR 3.1.

I've got an error with this query: 
cle_frbr:LYSROUGE1149-73190

*SEE COMMENTS BELOW*

I tried escaping the minus char and the query worked:
cle_frbr:LYSROUGE1149(BACKSLASH)-73190

But, strangely, if I change one letter in my query it works:
cle_frbr:LASROUGE1149-73190


I've tested the same query on SOLR 1.4 and it works!

Can someone test the query on the next line on SOLR 3.1 and tell me if they 
have the same problem? 
yourfield:LYSROUGE1149-73190

Where does the problem come from?

Thank you in advance for your help.

Tom

  was:
Hi,

I'm a French user of SOLR and I've encountered a problem since I installed 
SOLR 3.1.

I've got an error with this query: 
cle_frbr:LYSROUGE1149-73190

The error is :
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:157)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


I tried escaping the minus char and the query worked:
cle_frbr:LYSROUGE1149(BACKSLASH)-73190

But, strangely, if I change one letter in my query it works:
cle_frbr:LASROUGE1149-73190


I've tested the same query on SOLR 1.4 and it works!

Can someone test the query on the next line on SOLR 3.1 and tell me if they 
have the same problem? 
yourfield:LYSROUGE1149-73190

Where does the problem come from?

Thank you in advance for your help.

Tom

Summary: spellcheck: StringIndexOutOfBoundsException: String index out 
of range: -1  (was: String index out of range: -1)

Moved original stack trace out of description for brevity...

{noformat}
The error is :
HTTP ERROR 500

Problem accessing /solr/select. Reason:

String index out of range: -1

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:797)
at java.lang.StringBuilder.replace(StringBuilder.java:271)
at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:131)
at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
at 

[jira] [Commented] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1

2011-06-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050212#comment-13050212
 ] 

Hoss Man commented on SOLR-2509:


the stack trace is different, but based on the fact that it has to do with 
using spellcheck.collate and - in the query this smells like it might be 
related to SOLR-1630

 spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
 --

 Key: SOLR-2509
 URL: https://issues.apache.org/jira/browse/SOLR-2509
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: Debian Lenny
 JAVA Version 1.6.0_20
Reporter: Thomas Gambier
Priority: Blocker

 Hi,
 I'm a French user of SOLR and I've encountered a problem since I installed 
 SOLR 3.1.
 I've got an error with this query: 
 cle_frbr:LYSROUGE1149-73190
 *SEE COMMENTS BELOW*
 I tried escaping the minus char and the query worked:
 cle_frbr:LYSROUGE1149(BACKSLASH)-73190
 But, strangely, if I change one letter in my query it works:
 cle_frbr:LASROUGE1149-73190
 I've tested the same query on SOLR 1.4 and it works!
 Can someone test the query on the next line on SOLR 3.1 and tell me if 
 they have the same problem? 
 yourfield:LYSROUGE1149-73190
 Where does the problem come from?
 Thank you in advance for your help.
 Tom
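
The escaping workaround mentioned in the report can be sketched as a small helper that puts a backslash in front of query-syntax characters in a term value. QueryEscapeSketch is a hypothetical, hand-rolled stand-in and its character list is an assumption written out by hand. SolrJ provides ClientUtils.escapeQueryChars for the same purpose, and that is probably the better choice in real client code. This only sidesteps the error; it does not address the underlying spellcheck collation bug.

```java
// Hypothetical hand-rolled helper for illustration; not part of Solr.
final class QueryEscapeSketch {
    // Characters given special meaning by the query syntax; listed here by hand.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Escapes a term value (not a whole field:value clause, since ':' is escaped too).
    static String escape(String term) {
        StringBuilder escaped = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // Build the clause by escaping only the value part.
        System.out.println("cle_frbr:" + escape("LYSROUGE1149-73190"));
    }
}
```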




[jira] [Assigned] (SOLR-1331) Support merging multiple cores

2011-06-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-1331:
---

Assignee: Shalin Shekhar Mangar

 Support merging multiple cores
 --

 Key: SOLR-1331
 URL: https://issues.apache.org/jira/browse/SOLR-1331
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 3.3


 There should be a provision to merge one core with another. It should be 
 possible to create a core, add documents to it and then just merge it into 
 the main core which is serving requests. This way, the user will not need to 
 know the filesystem as it is needed for SOLR-1051
