ImportError: cannot import name Library, during installing PyLucene
Hi everyone! I have spent five hours trying to fix this problem, but I can't. While installing PyLucene following http://lucene.apache.org/pylucene/install.html, I ran into an error like the following:

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library

sanghee-m:jcc sanghee$ python setup.py build --debug
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library
sanghee-m:jcc sanghee$

I can't find Library either. How can I solve this problem? Can you let me know where I should check when I get this kind of error? I use setuptools 1.1.6 and pylucene-4.4.0-1. -- SangHee Kim http://goo.gl/LnpDX
Re: [VOTE] Release PyLucene 4.5.0-1
On Fri, 11 Oct 2013, Steve Rowe wrote: I really have no idea where to start looking to figure out what's happening - I'm not a big python user - any ideas? Would it be useful to package up my make'd directory and send it to you? I don't know yet. Do you know which version of setuptools you have installed ? I'm currently battling an issue with the third generation setuptools, v 1.1.6. If you don't have something like 0.6something or 0.7 installed, please try that (for lack of any better ideas, sorry). Andi.. Steve On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote: On Thu, 10 Oct 2013, Steve Rowe wrote: Meant to send to the mailing lists: Begin forwarded message: From: Steve Rowe sar...@gmail.com Subject: Re: [VOTE] Release PyLucene 4.5.0-1 Date: October 10, 2013 3:18:50 AM EDT To: Andi Vajda va...@apache.org Andi, I thought I'd run 'make' and 'sudo make install' in two steps, so I checked, and bash 'history' agreed: 586 vi Makefile 587 make 588 sudo make install I tried again, first rm -rf'ing the unpacked distribution, then unpacking, (skipping the jcc 'make' and 'sudo make install' this time), editing the Makefile, then running 'make', then 'sudo make install', and I got the same error - I suppose this is the most salient line: The problem is that I can't even reproduce the error. You're not the first one to report it but it usually goes away :-( Stuck. Andi.. 
- No local packages or download links found for lucene==4.5.0 - Steve On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote: Hi Steve, On Thu, 10 Oct 2013, Steve Rowe wrote: After make'ing and installing jcc (no setup.py changes required); uncommenting the first Mac OS X 10.6 section in Makefile (I have OS X 10.8.5, with stock Python 2.7.2 and Oracle Java 1.7.0_25); and finally make'ing pylucene: 'sudo make install' fails - here's the tail end of the output: - writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt creating dist creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 'build/bdist.macosx-10.8-x86_64/egg' to it removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it) Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg creating /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to /Library/Python/2.7/site-packages Removing lucene 4.4.0 from easy-install.pth file Adding lucene 4.5.0 to easy-install.pth file Installed /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Processing dependencies for lucene==4.5.0 Searching for lucene==4.5.0 Reading http://pypi.python.org/simple/lucene/ Couldn't find index page for 'lucene' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ No local packages or download links found for lucene==4.5.0 error: Could not find suitable distribution for Requirement.parse('lucene==4.5.0') This error has been a problem for a while. You need to make, then make install, in two steps. Otherwise, when 'make install' in pylucene from clean, this error seems to happen. I don't know of a fix. Andi.. 
make: *** [install] Error 1 - I've included the entire 'sudo make install' output here: https://paste.apache.org/8gAF Steve On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.5.0-1 release tracking the recent release of Apache Lucene 4.5.0 is ready. A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES PyLucene 4.5.0 is built with JCC 2.17 included in these release artifacts: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.5.0-1. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: [VOTE] Release PyLucene 4.5.0-1
Andi, I really have no idea where to start looking to figure out what's happening - I'm not a big python user - any ideas? Would it be useful to package up my make'd directory and send it to you? Steve On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote: On Thu, 10 Oct 2013, Steve Rowe wrote: Meant to send to the mailing lists: Begin forwarded message: From: Steve Rowe sar...@gmail.com Subject: Re: [VOTE] Release PyLucene 4.5.0-1 Date: October 10, 2013 3:18:50 AM EDT To: Andi Vajda va...@apache.org Andi, I thought I'd run 'make' and 'sudo make install' in two steps, so I checked, and bash 'history' agreed: 586 vi Makefile 587 make 588 sudo make install I tried again, first rm -rf'ing the unpacked distribution, then unpacking, (skipping the jcc 'make' and 'sudo make install' this time), editing the Makefile, then running 'make', then 'sudo make install', and I got the same error - I suppose this is the most salient line: The problem is that I can't even reproduce the error. You're not the first one to report it but it usually goes away :-( Stuck. Andi.. 
- No local packages or download links found for lucene==4.5.0 - Steve On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote: Hi Steve, On Thu, 10 Oct 2013, Steve Rowe wrote: After make'ing and installing jcc (no setup.py changes required); uncommenting the first Mac OS X 10.6 section in Makefile (I have OS X 10.8.5, with stock Python 2.7.2 and Oracle Java 1.7.0_25); and finally make'ing pylucene: 'sudo make install' fails - here's the tail end of the output: - writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt creating dist creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 'build/bdist.macosx-10.8-x86_64/egg' to it removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it) Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg creating /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to /Library/Python/2.7/site-packages Removing lucene 4.4.0 from easy-install.pth file Adding lucene 4.5.0 to easy-install.pth file Installed /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Processing dependencies for lucene==4.5.0 Searching for lucene==4.5.0 Reading http://pypi.python.org/simple/lucene/ Couldn't find index page for 'lucene' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ No local packages or download links found for lucene==4.5.0 error: Could not find suitable distribution for Requirement.parse('lucene==4.5.0') This error has been a problem for a while. You need to make, then make install, in two steps. Otherwise, when 'make install' in pylucene from clean, this error seems to happen. I don't know of a fix. Andi.. 
make: *** [install] Error 1 - I've included the entire 'sudo make install' output here: https://paste.apache.org/8gAF Steve On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.5.0-1 release tracking the recent release of Apache Lucene 4.5.0 is ready. A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES PyLucene 4.5.0 is built with JCC 2.17 included in these release artifacts: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.5.0-1. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: ImportError: cannot import name Library, during installing PyLucene
On Fri, 11 Oct 2013, SangHee Kim wrote: Hi everyone! I have spent five hours trying to fix this problem, but I can't. While installing PyLucene following http://lucene.apache.org/pylucene/install.html, I ran into an error like the following:

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library

Indeed, there are two problems with setuptools 1.1.6, apparently:
1. the Library class is only accessible via setuptools.extension
2. the logic in setuptools.command.build_ext patching darwin-specific options for building a shared library into _CONFIG_VARS is broken

I added code to JCC's setup.py to work around both issues. This is checked into rev 1531420 in pylucene's trunk. Please refresh your copy of JCC from pylucene's trunk (still called 2.17), rebuild and reinstall it, and try your pylucene 4.4.0 build again. Please let me know if this solves the problem for you as well. Andi..
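Andi's first fix amounts to an import fallback. A minimal sketch of that kind of compatibility shim follows; the actual code committed to JCC's setup.py (rev 1531420) may differ:

```python
# Sketch of the compatibility shim for setuptools 1.1.6 described
# above: newer setuptools no longer exposes Library at the top level,
# so fall back to setuptools.extension. Illustration only, not the
# exact code in JCC's setup.py.
try:
    from setuptools import Library
except ImportError:
    from setuptools.extension import Library
```

With this in place, `from setuptools import Library` failing on a newer setuptools no longer aborts the build.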
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792375#comment-13792375 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531202 from [~rcmuir] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531202 ] LUCENE-5269: Fix NGramTokenFilter length filtering TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5269. - Resolution: Fixed TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset
[ https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792396#comment-13792396 ] Uwe Schindler commented on LUCENE-5277: --- Is there any issue that will use the new ctor? As the current ctor is unused, why not simply remove it and leave adding the new one to an issue that really needs it? Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset --- Key: LUCENE-5277 URL: https://issues.apache.org/jira/browse/LUCENE-5277 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5277.patch FixedBitSet copy constructor is redundant the way it is now -- one can call FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). I think it will be useful to add a numBits parameter to that method to allow growing/shrinking the new bitset, while copying all relevant bits from the passed one.
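The proposed copy-with-resize semantics can be sketched in a few lines; this is a toy stand-in (a Python list of booleans), not Lucene's packed-longs FixedBitSet implementation:

```python
# Toy sketch of the proposed FixedBitSet(source, numBits) semantics:
# copy all relevant bits from the source while growing (zero-filled)
# or shrinking (truncated) the new bitset.
def copy_bits(src_bits, num_bits):
    return [src_bits[i] if i < len(src_bits) else False
            for i in range(num_bits)]
```

Shrinking drops the high bits; growing pads with cleared bits, which matches "copying all relevant bits from the passed one".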
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792402#comment-13792402 ] Uwe Schindler commented on LUCENE-5269: --- This is so crazy! Why did we never hit this combination before? Thanks for fixing, although I see the CodePointLengthFilter not really as a bug fix, it is more a new feature! Maybe explicitly add this as a new feature to changes.txt? TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792424#comment-13792424 ] Robert Muir commented on LUCENE-5269: - I didn't want new features mixed with bugfixes, really :( But in my opinion this was the simplest way to solve the problem: to just add a filter like this and have it use that instead of LengthFilter. I think it would be weird to see new features in a 4.5.1? TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792429#comment-13792429 ] Robert Muir commented on LUCENE-5269: - {quote} This is so crazy! Why did we never hit this combination before? {quote} This combination is especially good at finding the bug, here's why:

{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge-ngram has min=2 max=94; it's basically brute forcing every token size. Then the shingle filter makes tons of tokens with positionIncrement=0, so it makes it easy for the (previously buggy NGramTokenFilter with the wrong length filter) to misclassify tokens with its logic expecting codepoints, and emit an initial token with posinc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
  ...
  posIncAtt.setPositionIncrement(curPosInc);
{code}

TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
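The root cause Robert describes, logic expecting codepoints applied to a length filter, comes down to two counts that disagree whenever a token contains supplementary characters. A small illustration, using Python as a stand-in for Java's String.length() (UTF-16 code units) versus codePointCount():

```python
# 'a' + an emoji outside the Basic Multilingual Plane + 'b'.
s = "a\U0001F600b"
codepoints = len(s)                             # Python counts codepoints: 3
utf16_units = len(s.encode("utf-16-le")) // 2   # Java's String.length(): 4
# A filter comparing a codepoint-based gram size against a UTF-16
# length misclassifies any token containing supplementary characters,
# which is what CodePointLengthFilter avoids.
```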
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5338: Description: Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide. was: Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Then the sub-shard dedicated to documents for route key 'A!' can be scaled separately. Specifying the source shard is not required here because the route key is enough to figure it out. Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310.patch Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h The only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from cluster state.
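The DELETEREPLICA flow described above can be sketched as follows; the names and the dict-based cluster state are illustrative, not Solr's actual implementation:

```python
# Sketch of the proposed DELETEREPLICA flow: if the replica is active,
# fire a core UNLOAD first (which per the description also cleans up
# the cluster state); if it is down, just drop its entry so no
# unreferenced node lingers in the cluster state.
def delete_replica(cluster_state, collection, shard, replica, unload_core):
    replicas = cluster_state[collection][shard]
    if replicas[replica] == "active":
        unload_core(replica)  # live replica: UNLOAD the core
    # in either case the replica's entry ends up removed
    del replicas[replica]
```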
[jira] [Updated] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5260: - Attachment: LUCENE-5260.patch Uploaded Patch: - changed the input to lookup.build to take TermFreqPayloadIterator instead of TermFreqIterator - made all suggesters compatible with TermFreqPayloadIterator (but error if payload is present but cannot be used) - nuked all implementations of TermFreq and made them work with TermFreqPayload instead (except for SortedTermFreqIteratorWrapper) - got rid of all the references to termFreqIter Still todo: - actually nuke TermFreqIterator - change the names of the implementations to reflect that they are implementations of TermFreqPayloadIter - add tests to ensure that all the implementations work with payload - support payloads in SortedTermFreqIteratorWrapper Make older Suggesters more accepting of TermFreqPayloadIterator --- Key: LUCENE-5260 URL: https://issues.apache.org/jira/browse/LUCENE-5260 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5260.patch As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would be nice to make the older suggesters accepting of TermFreqPayloadIterator and throw an exception if payload is found (if it cannot be used). This will also allow us to nuke most of the other interfaces for BytesRefIterator.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792461#comment-13792461 ] Uwe Schindler commented on LUCENE-5269: --- bq. I didn't want new features mixed with bugfixes really I agree! But now we have the new feature, so I just asked to add this as a separate entry in CHANGES.txt under New features, just the new filter, nothing more. TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (SOLR-5290) Warming up using search logs.
[ https://issues.apache.org/jira/browse/SOLR-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792471#comment-13792471 ] Minoru Osuka commented on SOLR-5290: The patch includes test code. Warming up using search logs. - Key: SOLR-5290 URL: https://issues.apache.org/jira/browse/SOLR-5290 Project: Solr Issue Type: Wish Components: search Affects Versions: 4.4 Reporter: Minoru Osuka Priority: Minor Attachments: SOLR-5290.patch It is possible to warm up the cache automatically in the newSearcher event, but it is impossible to do so in the firstSearcher event because there is no old searcher. We describe queries in solrconfig.xml if we want them cached in the firstSearcher event, like this:

{code:xml}
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
</listener>
{code}

This setting is very static. I want dynamic queries in the firstSearcher event when restarting Solr, so I paid attention to past search logs. If there are past search logs, it should be possible to warm up the cache automatically in the firstSearcher event, like the autowarming of the cache in the newSearcher event. I have created QueryLogSenderListener, which extends QuerySenderListener. Sample definition in solrconfig.xml:
- directory : Specify the Solr log directory. (Required)
- regex : Describe the regular expression of the log. (Required)
- encoding : Specify the Solr log encoding. (Default : UTF-8)
- count : Specify the number of log entries to process. (Default : 100)
- paths : Specify the request handler names to process.
- exclude_params : Specify the request parameters to exclude.

{code:xml}
<!-- Warming up using search logs. -->
<listener event="firstSearcher" class="solr.QueryLogSenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
  <str name="directory">logs</str>
  <str name="encoding">UTF-8</str>
  <str name="regex"><![CDATA[^(?<level>[\w]+)\s+\-\s+(?<timestamp>[\d\-\s\.:]+);\s+(?<class>[\w\.\_\$]+);\s+\[(?<core>.+)\]\s+webapp=(?<webapp>.+)\s+path=(?<path>.+)\s+params=\{(?<params>.*)\}\s+hits=(?<hits>\d+)\s+status=(?<status>\d+)\s+QTime=(?<qtime>\d+).*]]></str>
  <arr name="paths">
    <str>/select</str>
  </arr>
  <int name="count">100</int>
  <arr name="exclude_params">
    <str>indent</str>
    <str>_</str>
  </arr>
</listener>
{code}

I'd like to propose this feature.
[jira] [Created] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
dejie Chang created SOLR-5339: - Summary: solr-core-4.4's ip is not right when the os is centos 5.6 sometimes Key: SOLR-5339 URL: https://issues.apache.org/jira/browse/SOLR-5339 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.4 Environment: centos 5.6 Reporter: dejie Chang Priority: Critical when I install the solr-cloud on the centos5.6 . t
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5338: Attachment: SOLR-5338.patch Changes: * Introduces two new methods in CompositeIdRouter {code} public List<Range> partitionRangeByKey(String key, Range range) {code} and {code} public Range routeKeyHashRange(String routeKey) {code} * The collection split action accepts a new parameter 'split.key' * The parent slice is found and its range is partitioned according to split.key * We re-use the logic introduced in SOLR-5300 to do the actual splitting. Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Attachments: SOLR-5338.patch Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
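The partitioning in the SOLR-5338 example can be sketched as follows; the function name echoes partitionRangeByKey, but this is an illustration with inclusive integer bounds, not Solr's CompositeIdRouter code:

```python
# Sketch of partitioning a shard's hash range around a route key's
# hash range: splitting shard range [0, 20] around route-key range
# [12, 15] yields [0, 11], [12, 15] and [16, 20], where the middle
# range becomes the sub-shard dedicated to the route key.
def partition_range_by_key(shard_lo, shard_hi, key_lo, key_hi):
    ranges = []
    if key_lo > shard_lo:
        ranges.append((shard_lo, key_lo - 1))  # range before the key
    ranges.append((key_lo, key_hi))            # the dedicated sub-shard
    if key_hi < shard_hi:
        ranges.append((key_hi + 1, shard_hi))  # range after the key
    return ranges
```

If the route key's range touches either end of the shard range, only two sub-ranges are produced.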
[jira] [Updated] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
[ https://issues.apache.org/jira/browse/SOLR-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dejie Chang updated SOLR-5339: -- Description: When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP displayed on http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. I found it is because of hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. Sometimes this method does not return the correct IP, so we should not trust it; I think on Linux we should not use this method. (was: when I install the solr-cloud on the centos5.6 . t) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes Key: SOLR-5339 URL: https://issues.apache.org/jira/browse/SOLR-5339 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.4 Environment: centos 5.6 Reporter: dejie Chang Priority: Critical When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP displayed on http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. I found it is because of hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. Sometimes this method does not return the correct IP, so we should not trust it; I think on Linux we should not use this method.
[jira] [Commented] (SOLR-5320) Multi level compositeId router
[ https://issues.apache.org/jira/browse/SOLR-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792478#comment-13792478 ] Anshum Gupta commented on SOLR-5320: A 3-level composite id routing to begin with is what I think would be good. I'd use 8 bits each from the first 2 components of the key and 16 bits from the last component. Functionally, this should work along similar lines to the current 2-level composite id routing. Multi level compositeId router -- Key: SOLR-5320 URL: https://issues.apache.org/jira/browse/SOLR-5320 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Anshum Gupta Original Estimate: 336h Remaining Estimate: 336h This would enable multi level routing as compared to the 2 level routing available as of now. On the usage bit, here's an example: Document Id: myapp!dummyuser!doc myapp!dummyuser! can be used as the shard key for searching content for dummyuser. myapp! can be used for searching across all users of myapp. I am looking at either a 3 (or 4) level routing. The 32 bit hash would then comprise 8x4 components from each part (in case of 4 level).
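The 8 + 8 + 16 bit allocation Anshum proposes can be sketched as a toy hash composition. Solr's router actually uses MurmurHash3 and the exact composition may differ; md5 here is only a stand-in for illustration:

```python
import hashlib

# Toy sketch of 3-level composite-id hashing: 8 bits from each of the
# first two key components and 16 bits from the last, composed into a
# 32-bit hash so that documents sharing "app!" or "app!user!" prefixes
# land in nearby hash ranges.
def _h32(s):
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def composite_hash(doc_id):
    app, user, rest = doc_id.split("!", 2)  # e.g. "myapp!dummyuser!doc"
    return (
        ((_h32(app) >> 24) << 24)              # top 8 bits: app component
        | (((_h32(user) >> 24) & 0xFF) << 16)  # next 8 bits: user component
        | (_h32(rest) & 0xFFFF)                # low 16 bits: doc component
    )
```

With this scheme, a "myapp!" route key pins the top 8 bits and a "myapp!dummyuser!" route key pins the top 16, which is what makes per-app and per-user range queries possible.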
[jira] [Updated] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5308: Attachment: (was: SOLR-5308.patch) Split all documents of a route key into another collection -- Key: SOLR-5308 URL: https://issues.apache.org/jira/browse/SOLR-5308 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Enable SolrCloud users to split out a set of documents from a source collection into another collection. This will be useful in multi-tenant environments. This feature will make it possible to split a tenant out of a collection and put them into their own collection which can be scaled separately. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 407 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/407/ 1 tests failed. REGRESSION: org.apache.lucene.index.Test2BPostings.test Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) at org.apache.lucene.store.BufferedIndexOutput.init(BufferedIndexOutput.java:50) at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:365) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:171) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:80) at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4408) at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506) at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:470) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1523) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1193) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1174) at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) Build Log: [...truncated 655 lines...] [junit4] Suite: org.apache.lucene.index.Test2BPostings [junit4] 2 NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory. 
[junit4] 2 NOTE: reproduce with: ant test -Dtestcase=Test2BPostings -Dtests.method=test -Dtests.seed=D8D3920C725BF71C -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt -Dtests.locale=en_IN -Dtests.timezone=America/Puerto_Rico -Dtests.file.encoding=US-ASCII [junit4] ERROR408s J0 | Test2BPostings.test [junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space [junit4]at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) [junit4]at org.apache.lucene.store.BufferedIndexOutput.init(BufferedIndexOutput.java:50) [junit4]at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:365) [junit4]at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) [junit4]at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) [junit4]at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) [junit4]at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) [junit4]at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) [junit4]at
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310-1.patch The testcases still fail occasionally Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310-1.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: (was: SOLR-5310-1.patch) Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310.patch Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
ok maybe updating the JDK would be a good idea :) On Fri, Oct 11, 2013 at 2:46 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...]
RE: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Hihi, FYI: I have a compilation unit here (non-Lucene) that also segfaults on JDK 7.0u25 if you don't do ant clean before. If there are already existing class files and only modified ones are recompiled, it always segfaults. Reproducible, but I have no idea what causes this. :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: simon.willna...@gmail.com [mailto:simon.willna...@gmail.com] On Behalf Of Simon Willnauer Sent: Friday, October 11, 2013 2:50 PM Cc: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure! ok maybe updating the JDK would be a good idea :) On Fri, Oct 11, 2013 at 2:46 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...]
[jira] [Commented] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792600#comment-13792600 ] Shalin Shekhar Mangar commented on SOLR-5338: - [~ysee...@gmail.com] - Would you mind reviewing the new CompositeIdRouter methods? Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Attachments: SOLR-5338.patch Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash ranges [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
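The range arithmetic in that example is straightforward to sketch: carve the route key's hash range out of the parent shard's range, keeping any remainder on either side as extra sub-ranges. The method below illustrates the idea; it is not the actual CompositeIdRouter code, and it assumes the key range is fully contained in the parent range (the unsupported spanning case from the description).

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {
    /**
     * Splits the parent range [pStart, pEnd] around a contained route-key
     * range [kStart, kEnd], returning up to three inclusive sub-ranges.
     */
    public static List<int[]> splitByKeyRange(int pStart, int pEnd, int kStart, int kEnd) {
        List<int[]> ranges = new ArrayList<>();
        if (kStart > pStart) ranges.add(new int[] { pStart, kStart - 1 });
        ranges.add(new int[] { kStart, kEnd }); // dedicated sub-shard for the route key
        if (kEnd < pEnd) ranges.add(new int[] { kEnd + 1, pEnd });
        return ranges;
    }
}
```

For the example in the description, `splitByKeyRange(0, 20, 12, 15)` yields the three ranges [0,11], [12,15] and [16,20]; if the key range touches either end of the parent range, only two sub-ranges are produced.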
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch Fixed a bug regarding ignoreCase in the attached patch. add NGramSynonymTokenizer - Key: LUCENE-5252 URL: https://issues.apache.org/jira/browse/LUCENE-5252 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Koji Sekiguchi Priority: Minor Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting ABC, DEFG w/ expand=true and N = 2 (2-gram). # There is no consensus (I think :-) on how we assign offsets to generated synonym tokens DE, EF and FG when expanding source tokens AB and BC. # If the query pattern looks like ABCY, it cannot be matched even if there is a document …ABCY… in the index when autoGeneratePhraseQueries is set to true, because there is no CY token (but GY is there) in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * Immediately before and after the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, Z and 1 are the extra tokens. 
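The token streams in those tables can be simulated in a few lines. The sketch below is a standalone illustration of the proposed behavior — it is not the patch's implementation, and it ignores offsets and position increments (in the real tokenizer the extra tokens and synonym expansions would carry posInc=0):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NGramSynonymSketch {
    /**
     * Simulates the proposed stream: registered words are emitted whole
     * (followed by their synonyms), plain segments are n-grammed, and
     * short "extra" tokens are emitted at synonym boundaries.
     */
    public static List<String> tokenize(String text, Map<String, List<String>> synGroups, int n) {
        List<String> out = new ArrayList<>();
        int i = 0;
        boolean afterSyn = false;
        while (i < text.length()) {
            String matched = null;
            for (String w : synGroups.keySet()) {
                if (text.startsWith(w, i)) { matched = w; break; }
            }
            if (matched != null) {
                out.addAll(synGroups.get(matched)); // word plus its synonyms
                i += matched.length();
                afterSyn = true;
            } else {
                int j = i; // scan plain segment up to the next registered word
                outer:
                while (j < text.length()) {
                    for (String w : synGroups.keySet()) {
                        if (text.startsWith(w, j)) break outer;
                    }
                    j++;
                }
                String seg = text.substring(i, j);
                if (afterSyn && n > 1 && !seg.isEmpty()) // extra token after a synonym
                    out.add(seg.substring(0, Math.min(n - 1, seg.length())));
                for (int k = 0; k + n <= seg.length(); k++) // ordinary n-grams
                    out.add(seg.substring(k, k + n));
                if (j < text.length() && n > 1 && !seg.isEmpty()) // extra token before a synonym
                    out.add(seg.substring(Math.max(0, seg.length() - (n - 1))));
                i = j;
                afterSyn = false;
            }
        }
        return out;
    }
}
```

With a group {ABC, DEFG} and n = 2, this reproduces the table rows: "ABC" becomes ABC/DEFG, and "XYZABC123" becomes XY/YZ/Z/ABC/DEFG/1/12/23.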
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792661#comment-13792661 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531313 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531313 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792663#comment-13792663 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531315 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531315 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792662#comment-13792662 ] Michael McCandless commented on LUCENE-5260: Thanks Areek, patch looks great! I like the hasPayloads() up-front introspection. In UnsortedTermFreqIteratorWrapper.payload(), why do we set currentOrd as a side effect? Shouldn't next() already do that? Maybe, we should instead assert currentOrd == ords[curPos]? Also, can we break that sneaky currentOrd assignment in next into its own line before? Make older Suggesters more accepting of TermFreqPayloadIterator --- Key: LUCENE-5260 URL: https://issues.apache.org/jira/browse/LUCENE-5260 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5260.patch As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would be nice to make the older suggesters accepting of TermFreqPayloadIterator and throw an exception if payload is found (if it cannot be used). This will also allow us to nuke most of the other interfaces for BytesRefIterator. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792671#comment-13792671 ] Mark Miller commented on SOLR-5325: --- Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792671#comment-13792671 ] Mark Miller edited comment on SOLR-5325 at 10/11/13 2:50 PM: - Added some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. was (Author: markrmil...@gmail.com): Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: LUCENE-5274-4.patch Reworked to remove the dependency on the query parser and most of the analyzer dependency, and to fix errors with phrases. It'll need to lose the rest of the analyzer dependency and have more test cases, in addition to any other concerns raised in the review. Teach fast FastVectorHighlighter to highlight child fields with parent fields --- Key: LUCENE-5274 URL: https://issues.apache.org/jira/browse/LUCENE-5274 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Nik Everett Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5274-4.patch, LUCENE-5274.patch I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792679#comment-13792679 ] Mark Miller commented on SOLR-5199: --- Hey Jessica - if we can confirm this is the same issue as SOLR-5325, we can close this as a duplicate. Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue events. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader).
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792684#comment-13792684 ] Mark Miller commented on SOLR-5325: --- I'm still kind of surprised this would happen - we should be retrying on connection loss up to expiration - at which point we would no longer be the leader. Perhaps the retry window is a little short or something, and perhaps that is part of why it is more difficult for me to reproduce in a test. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).
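The retry-until-expiration pattern being described can be sketched generically: keep retrying the operation on failure until a deadline tied to the session expiration would be exceeded, then rethrow. This is an illustration of the pattern, not Solr's actual ZooKeeper retry code:

```java
import java.util.function.Supplier;

public class RetryUntilDeadline {
    /**
     * Retries op on failure, pausing between attempts, until another
     * pause would push past deadlineMillis (epoch time); then rethrows.
     */
    public static <T> T retry(Supplier<T> op, long deadlineMillis, long pauseMillis)
            throws InterruptedException {
        while (true) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Out of time: give up and surface the last failure.
                if (System.currentTimeMillis() + pauseMillis > deadlineMillis) throw e;
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

The "retry padding" discussed in the commits corresponds to how far past the nominal session timeout the deadline is placed: too little padding and a transient connection loss (e.g. a slow ZooKeeper leader election) exhausts the retries even though the session would have survived.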
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792688#comment-13792688 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531323 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531323 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792689#comment-13792689 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531324 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531324 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792692#comment-13792692 ] Shalin Shekhar Mangar commented on SOLR-5308: - For splitting a single source shard into a single target collection/shard by a route key such as: {code} /admin/collections?action=migrate&collection=collection1&split.key=A!&shard=shardX&target.collection=collection2&target.shard=shardY {code} A rough strategy could be to: # Create a new core X on the source # Create a new core Y on the target # Ask the target core to buffer updates # Start forwarding updates for the route key received by the source shard to the target collection # Split the source shard into the new core X # Ask Y to replicate fully from X # Core Admin merge Y into the target core # Ask the target core to replay the buffered updates Split all documents of a route key into another collection -- Key: SOLR-5308 URL: https://issues.apache.org/jira/browse/SOLR-5308 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Enable SolrCloud users to split out a set of documents from a source collection into another collection. This will be useful in multi-tenant environments. This feature will make it possible to split a tenant out of a collection and put them into their own collection which can be scaled separately.
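Steps 3 and 8 of that strategy (buffer updates on the target, then replay them after the copy) are what keep the migration lossless while documents continue to arrive. A minimal model of that buffering behavior — purely illustrative, not Solr's actual UpdateLog — looks like:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BufferingCoreSketch {
    private final List<String> applied = new ArrayList<>();
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean buffering = false;

    /** Step 3: hold incoming updates instead of applying them. */
    public void startBuffering() { buffering = true; }

    /** An incoming update: held back while buffering, applied otherwise. */
    public void update(String doc) {
        if (buffering) buffer.add(doc); else applied.add(doc);
    }

    /** Step 8: apply everything buffered during the migration, in order. */
    public void replayBuffered() {
        while (!buffer.isEmpty()) applied.add(buffer.poll());
        buffering = false;
    }

    public List<String> applied() { return applied; }
}
```

Because replay preserves arrival order, updates that raced with the bulk copy are applied exactly once and after it, which is what makes the merge in step 7 safe.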
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792695#comment-13792695 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531327 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531327 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792694#comment-13792694 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531325 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531325 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4824) Fuzzy / Faceting results are changed after ingestion of documents past a certain number
[ https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792742#comment-13792742 ] Lakshmi Venkataswamy commented on SOLR-4824: I have tested 4.5.0 version and the same behavior has been observed. So we are staying with 3.6 in production for now. Fuzzy / Faceting results are changed after ingestion of documents past a certain number Key: SOLR-4824 URL: https://issues.apache.org/jira/browse/SOLR-4824 Project: Solr Issue Type: Bug Affects Versions: 4.2, 4.3 Environment: Ubuntu 12.04 LTS 12.04.2 jre1.7.0_17 jboss-as-7.1.1.Final Reporter: Lakshmi Venkataswamy In upgrading from SOLR 3.6 to 4.2/4.3 and comparing results on fuzzy queries, I found that after a certain number of documents were ingested the fuzzy query had drastically lower number of results. We have approximately 18,000 documents per day and after ingesting approximately 40 days of documents, the next incremental day of documents results in a lower number of results of a fuzzy search. 
The query http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort= produces the following result before the threshold is crossed:

<response>
  <lst name="responseHeader">
    <int name="status">0</int><int name="QTime">2349</int>
    <lst name="params">
      <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
      <str name="q">cc:worde~1</str><str name="facet.field">date</str>
    </lst>
  </lst>
  <result name="response" numFound="362803" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="date">
        <int name="2012-12-31">2866</int><int name="2013-01-01">11372</int><int name="2013-01-02">11514</int><int name="2013-01-03">12015</int>
        <int name="2013-01-04">11746</int><int name="2013-01-05">10853</int><int name="2013-01-06">11053</int><int name="2013-01-07">11815</int>
        <int name="2013-01-08">11427</int><int name="2013-01-09">11475</int><int name="2013-01-10">11461</int><int name="2013-01-11">12058</int>
        <int name="2013-01-12">11335</int><int name="2013-01-13">12039</int><int name="2013-01-14">12064</int><int name="2013-01-15">12234</int>
        <int name="2013-01-16">12545</int><int name="2013-01-17">11766</int><int name="2013-01-18">12197</int><int name="2013-01-19">11414</int>
        <int name="2013-01-20">11633</int><int name="2013-01-21">12863</int><int name="2013-01-22">12378</int><int name="2013-01-23">11947</int>
        <int name="2013-01-24">11822</int><int name="2013-01-25">11882</int><int name="2013-01-26">10474</int><int name="2013-01-27">11051</int>
        <int name="2013-01-28">11776</int><int name="2013-01-29">11957</int><int name="2013-01-30">11260</int><int name="2013-01-31">8511</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>

Once the 40-days-of-documents threshold is crossed, the results drop as shown below for the same query:

<response>
  <lst name="responseHeader">
    <int name="status">0</int><int name="QTime">2</int>
    <lst name="params">
      <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
      <str name="q">cc:worde~1</str><str name="facet.field">date</str>
    </lst>
  </lst>
  <result name="response" numFound="1338" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="date">
        <int name="2012-12-31">0</int><int name="2013-01-01">41</int><int name="2013-01-02">21</int><int name="2013-01-03">24</int>
        <int name="2013-01-04">19</int><int name="2013-01-05">9</int><int name="2013-01-06">11</int><int name="2013-01-07">17</int>
        <int name="2013-01-08">14</int><int name="2013-01-09">24</int><int name="2013-01-10">43</int><int name="2013-01-11">14</int>
        <int name="2013-01-12">52</int><int name="2013-01-13">57</int><int name="2013-01-14">25</int><int name="2013-01-15">17</int>
        <int name="2013-01-16">34</int><int name="2013-01-17">11</int><int name="2013-01-18">16</int><int name="2013-01-19">121</int>
        <int name="2013-01-20">33</int><int name="2013-01-21">26</int><int name="2013-01-22">59</int><int name="2013-01-23">27</int>
        <int name="2013-01-24">10</int><int name="2013-01-25">9</int><int name="2013-01-26">6</int><int name="2013-01-27">16</int>
        <int name="2013-01-28">11</int><int name="2013-01-29">15</int><int name="2013-01-30">21</int><int name="2013-01-31">109</int>
        <int name="2013-02-01">11</int><int name="2013-02-02">7</int><int name="2013-02-03">10</int><int name="2013-02-04">8</int>
        <int name="2013-02-05">13</int><int name="2013-02-06">75</int><int name="2013-02-07">77</int><int name="2013-02-08">31</int>
        <int name="2013-02-09">35</int><int name="2013-02-10">22</int><int name="2013-02-11">18</int><int name="2013-02-12">11</int>
        <int name="2013-02-13">68</int><int name="2013-02-14">40</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>

I have also tested this with different months of data and have seen the same issue around the number of documents. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
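For reference, `cc:worde~1` asks for terms within Levenshtein edit distance 1 of `worde`. Which terms qualify can be checked with a small sketch (illustrative of the matching semantics only; it says nothing about why the hit counts changed between versions, and the candidate terms are made up):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Terms within edit distance 1 of "worde" would match cc:worde~1
candidates = ["worde", "words", "word", "wordes", "world", "warden"]
matches = [t for t in candidates if levenshtein("worde", t) <= 1]
# matches → ["worde", "words", "word", "wordes"]
```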
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792778#comment-13792778 ] Jessica Cheng commented on SOLR-5199: - Sorry, I only saw this once and I didn't have time to investigate, so I don't know what the cause is. SOLR-5325 definitely sounds similar so I'll close this issue now. Thanks! Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue event. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETEname=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng closed SOLR-5199. --- Resolution: Duplicate Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue event. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETEname=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5273) Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions
[ https://issues.apache.org/jira/browse/LUCENE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792800#comment-13792800 ] ASF subversion and git services commented on LUCENE-5273: - Commit 1531354 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1531354 ] LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions. Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions - Key: LUCENE-5273 URL: https://issues.apache.org/jira/browse/LUCENE-5273 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6 Attachments: LUCENE-5273.patch As mentioned in various issues (e.g. LUCENE-3655, LUCENE-3885, SOLR-4766), we release multiple versions of the same artifact: binary Maven artifacts are not identical to the ones in the Lucene and Solr binary distributions, and the Lucene jars in the Solr binary distribution, including within the war, are not identical to the ones in the Lucene binary distribution. This is bad. It's (probably always?) not horribly bad, since the differences all appear to be caused by the build re-creating manifests and re-building jars and the Solr war from their constituents at various points in the release build process; as a result, manifest timestamp attributes, as well as archive metadata (at least constituent timestamps, maybe other things?), differ each time a jar is rebuilt. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792824#comment-13792824 ] Mark Miller commented on SOLR-5323: --- I also think this was a mistake - I don't know that we need another solr.home type thing to address it though. The root of the issue is that the clustering is not really lazy loading clustering - and the current policy is to lazy load the contrib modules - and that is because of the component. I think Erik is on to the right path with lazy SearchComponents. I think that if the only request handlers that refer to a search component are lazy, they should probably also init lazily. I have not looked into how hard that is to do, but it seems like the correct fix to bring clustering in line with the other contribs. I also think the whole enabled flag we had is no good. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792834#comment-13792834 ] Dawid Weiss commented on SOLR-5323: --- I can revert to lazy-loading, not a problem. But this isn't solving the relative paths issue at all. Like I mentioned there were several times when I had to pass an example preconfigured solr configuration to somebody -- this always required that person to put the content of the example under a specific directory in Solr distribution, otherwise things wouldn't work because of relative paths. It was a pain to explain why this step is needed and to enforce... I ended up just copying the required JARs into the example. This seems wrong somehow -- if it's a solr distribution then there should be a way to reference contribs in a way that allows people to have their stuff in any folder hierarchy? What do you think? Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792845#comment-13792845 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531368 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531368 ] LUCENE-5269: satisfy the policeman TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792846#comment-13792846 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531369 ] LUCENE-5269: satisfy the policeman TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792850#comment-13792850 ] Mark Miller commented on SOLR-5323: --- I just think anything with the relative paths is a separate issue. You can use any hierarchy - you just have to change those paths. I'm all for that being improved somehow, but the issue here seems to be: Solr contrib modules are lazy loaded so that if you don't use them, you can delete any of them from the dist package layout and things still work. Or you can not delete them and if you try and use them, things work. Clustering now violates that. It's not really clusterings fault, it seems to more be a limitation of the search component. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
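The lazy-loading policy described above — don't touch a contrib class until the first request that needs it — can be sketched generically. A Python illustration (Solr's actual lazy request-handler machinery is Java and differs in detail; the class and method names here are made up):

```python
class LazyComponent:
    """Defer creating the real component until it is first requested."""

    def __init__(self, loader):
        self._loader = loader      # callable that loads/instantiates the class
        self._instance = None

    def get(self):
        # The expensive part (class lookup, jar resolution, ...) happens only
        # here, on first use, so missing jars don't break startup.
        if self._instance is None:
            self._instance = self._loader()
        return self._instance
```

A component wrapped this way lets the server start even when the contrib jars are deleted; loading fails only if a request actually reaches the component.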
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792859#comment-13792859 ] Dawid Weiss commented on SOLR-5323: --- Ok, I will revert the changes from SOLR-4708. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792884#comment-13792884 ] ASF subversion and git services commented on LUCENE-5275: - Commit 1531376 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531376 ] LUCENE-5275: Change AttributeSource.toString to display the current state of attributes Fix AttributeSource.toString() -- Key: LUCENE-5275 URL: https://issues.apache.org/jira/browse/LUCENE-5275 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5275.patch, LUCENE-5275.patch Its currently just Object.toString, e.g.: org.apache.lucene.analysis.en.PorterStemFilter@8a32165c But I think we should make it more useful, to end users trying to see what their chain is doing, and to make SOPs easier when debugging: {code} EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT); try (TokenStream ts = analyzer.tokenStream(body, Its 2013, let's fix this already!)) { ts.reset(); while (ts.incrementToken()) { System.out.println(ts.toString()); } ts.end(); } {code} Proposed output: {noformat} PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=NUM,keyword=false PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=ALPHANUM,keyword=false {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Attachment: SOLR-5323.patch Patch reverting (portions) of SOLR-4708. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Fix Version/s: 4.5.1 Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792895#comment-13792895 ] ASF subversion and git services commented on SOLR-4708: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Enable ClusteringComponent by default - Key: SOLR-4708 URL: https://issues.apache.org/jira/browse/SOLR-4708 Project: Solr Issue Type: Task Reporter: Erik Hatcher Assignee: Dawid Weiss Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-4708.patch, SOLR-4708.patch In the past, the ClusteringComponent used to rely on 3rd party JARs not available from a Solr distro. This is no longer the case, but the /browse UI and other references still had the clustering component disabled in the example with an awkward system property way to enable it. Let's remove all of that unnecessary stuff and just enable it as it works out of the box now. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792894#comment-13792894 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792897#comment-13792897 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: (was: LUCENE-5274.patch)

Teach fast FastVectorHighlighter to highlight child fields with parent fields

Key: LUCENE-5274
URL: https://issues.apache.org/jira/browse/LUCENE-5274
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor

I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em>. Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
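The <em> markup in the example above is the FastVectorHighlighter's default pre/post tag pair. As a deliberately simplistic illustration of the intended output (this is not the highlighter's real multi-field, offset-driven logic), wrapping matched terms looks like:

```java
import java.util.Set;
import java.util.StringJoiner;

public class EmHighlight {
    /**
     * Wrap whitespace-separated tokens that match a query term in <em>
     * tags. A toy stand-in for what FastVectorHighlighter produces from
     * term-vector positions and offsets.
     */
    static String highlight(String text, Set<String> queryTerms) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.split(" ")) {
            out.add(queryTerms.contains(token) ? "<em>" + token + "</em>" : token);
        }
        return out.toString();
    }
}
```

For example, highlight("running with scissors", Set.of("running", "scissors")) yields "<em>running</em> with <em>scissors</em>", matching the snippet in the issue description.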
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792898#comment-13792898 ]

ASF subversion and git services commented on SOLR-4708:

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)

Enable ClusteringComponent by default

Key: SOLR-4708
URL: https://issues.apache.org/jira/browse/SOLR-4708
Project: Solr
Issue Type: Task
Reporter: Erik Hatcher
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.5, 5.0
Attachments: SOLR-4708.patch, SOLR-4708.patch

In the past, the ClusteringComponent relied on 3rd-party JARs not available from a Solr distro. This is no longer the case, but the /browse UI and other references still had the clustering component disabled in the example, with an awkward system-property way to enable it. Let's remove all of that unnecessary stuff and just enable it, as it works out of the box now.
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: (was: LUCENE-5274-4.patch)
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: LUCENE-5274.patch

New version of the patch. This one works a lot better with phrases and even works on fields that have the same source but different tokenizers. It still makes highlighting depend on the analysis module to pick up PerFieldAnalyzerWrapper. I think all the new code this adds to FieldPhraseList deserves a unit test of its own, but I'm not in the frame of mind to write one at the moment, so I'll have to come back to it later.
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792903#comment-13792903 ]

ASF subversion and git services commented on SOLR-4708:

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792902#comment-13792902 ]

ASF subversion and git services commented on SOLR-5323:

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)
[jira] [Resolved] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-5323.
Resolution: Fixed

Applied to branch_4x, lucene_solr_4_5 and trunk.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792907#comment-13792907 ]

Robert Muir commented on LUCENE-5274:

Why would a highlighter improvement require MockTokenizer changes?
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792912#comment-13792912 ]

ASF subversion and git services commented on LUCENE-5275:

Commit 1531381 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531381 ]
LUCENE-5275: Change AttributeSource.toString to display the current state of attributes

Fix AttributeSource.toString()

Key: LUCENE-5275
URL: https://issues.apache.org/jira/browse/LUCENE-5275
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5275.patch, LUCENE-5275.patch

It's currently just Object.toString, e.g.: org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
But I think we should make it more useful, to end users trying to see what their chain is doing, and to make SOPs easier when debugging:
{code}
EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
  ts.reset();
  while (ts.incrementToken()) {
    System.out.println(ts.toString());
  }
  ts.end();
}
{code}
Proposed output:
{noformat}
PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=NUM,keyword=false
PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=ALPHANUM,keyword=false
{noformat}
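The proposed output prints each attribute's current state, including the term bytes rendered as hex, e.g. bytes=[69 74] for "it". A small sketch of just that formatting (assuming UTF-8 term bytes; this is not the reflection-based code the patch actually uses):

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class TermBytesFormat {
    /** Format raw term bytes the way the proposed toString() shows them:
     *  lowercase hex digits, space-separated, in square brackets. */
    static String hex(byte[] bytes) {
        StringJoiner out = new StringJoiner(" ", "[", "]");
        for (byte b : bytes) {
            out.add(String.format("%02x", b));
        }
        return out.toString();
    }

    /** Convenience overload: UTF-8 encode a term and format its bytes. */
    static String hexOf(String term) {
        return hex(term.getBytes(StandardCharsets.UTF_8));
    }
}
```

hexOf("it") yields "[69 74]" and hexOf("2013") yields "[32 30 31 33]", matching the sample lines above.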
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792911#comment-13792911 ]

Jessica Cheng commented on SOLR-4816:

I think the latest patch:
{code}
-if (request instanceof IsUpdateRequest && updatesToLeaders) {
+if (request instanceof IsUpdateRequest) {
{code}
removed the effect of the updatesToLeaders variable. Looking at http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup it's not used anywhere to make a decision anymore.

Add document routing to CloudSolrServer

Key: SOLR-4816
URL: https://issues.apache.org/jira/browse/SOLR-4816
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
Fix For: 4.5, 5.0
Attachments: RequestTask-removal.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch

This issue adds the following enhancements to CloudSolrServer's update logic:
1) Document routing: Updates are routed directly to the correct shard leader, eliminating document routing at the server.
2) Optional parallel update execution: Updates for each shard are executed in a separate thread so parallel indexing can occur across the cluster.
These enhancements should allow for near-linear scalability of indexing throughput.
Usage:
{code}
CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setParallelUpdates(true);
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 0);
doc1.addField("a_t", "hello1");
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", 2);
doc2.addField("a_t", "hello2");
UpdateRequest request = new UpdateRequest();
request.add(doc1);
request.add(doc2);
request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
NamedList response = cloudClient.request(request);
// Returns a backwards-compatible condensed response.
// To get a more detailed response, down-cast to RouteResponse:
CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
{code}
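Client-side document routing means the client picks the target shard itself, deterministically, from the document id. As a rough stand-in for the real router (SolrCloud's compositeId router hashes the id with MurmurHash3 and matches it against each shard's hash range; the modulo below is only an illustration, and the class name is made up):

```java
public class IdRouter {
    /**
     * Map a document id to one of numShards shards deterministically.
     * Illustrative only: real SolrCloud routing uses MurmurHash3 over
     * the collection's hash-range ring, not hashCode() % numShards.
     */
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even for negative hashCodes
        return Math.floorMod(docId.hashCode(), numShards);
    }
}
```

Because every client computes the same shard for the same id, updates can go straight to that shard's leader instead of being forwarded a second time inside the cluster, which is where the near-linear indexing scalability comes from.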
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792913#comment-13792913 ]

Nik Everett commented on LUCENE-5274:

Hey, forgot to mention that. MockTokenizer seems to throw away the character after the end of each token even if that character is the valid start of the next token. This comes up because I wanted to tokenize strings in a simplistic way to test that the highlighter can handle different tokenizers, and it just wasn't working right. So I fixed MockTokenizer, but I did it in a pretty brutal way. I'm happy to move the change to another bug and improve it, but testing the highlighter change without it is a bit painful.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792921#comment-13792921 ]

Robert Muir commented on LUCENE-5274:

If you suspect there is a bug in MockTokenizer, please open a separate issue for that. MockTokenizer is used by, like, thousands of tests :)
[jira] [Resolved] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-5275.
Resolution: Fixed
Fix Version/s: 5.0, 4.6
[jira] [Resolved] (SOLR-4073) Overseer will miss operations in some cases for OverseerCollectionProcessor
[ https://issues.apache.org/jira/browse/SOLR-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved SOLR-4073.
Resolution: Duplicate
Fix Version/s: (was: 4.6)

Overseer will miss operations in some cases for OverseerCollectionProcessor

Key: SOLR-4073
URL: https://issues.apache.org/jira/browse/SOLR-4073
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2
Environment: Solr cloud
Reporter: Raintung Li
Assignee: Mark Miller
Attachments: patch-4073
Original Estimate: 168h
Remaining Estimate: 168h

One overseer disconnects from ZooKeeper, but its overseer thread is still handling request (A) from the DistributedQueue. Example: the old overseer thread reconnects to ZooKeeper and tries to remove the top request with workQueue.remove(). Meanwhile, another server takes over the overseer role because the old overseer disconnected. The new overseer thread handles request (A) again, removes request (A) from the queue, and then tries to get the next top request (B, not yet fetched). At this point the old overseer reconnects to ZooKeeper and removes the top request from the queue. The top request is now B, so it is removed by the old overseer server. The new overseer server never processes request B because it was deleted by the old overseer, so request B's operations are lost. At minimum, distributedQueue.peek() should return the request's ID so that workQueue.remove(ID) removes that specific request, not whatever is at the top.
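The suggested fix is for peek() to hand back an ID so the consumer later removes exactly the entry it processed, not whatever happens to be at the head. A minimal in-memory sketch of that contract (names are illustrative; this is not ZooKeeper's DistributedQueue API):

```java
import java.util.LinkedHashMap;

public class IdWorkQueue {
    // insertion-ordered map: the head of the queue is the first entry
    private final LinkedHashMap<String, String> items = new LinkedHashMap<>();

    void offer(String id, String payload) {
        items.put(id, payload);
    }

    /** Return the id of the head entry, or null if the queue is empty. */
    String peekId() {
        return items.isEmpty() ? null : items.keySet().iterator().next();
    }

    /**
     * Remove exactly the entry we peeked, by id. Safe even if another
     * consumer already removed it: this returns false instead of
     * deleting an unrelated head entry, which is the race described
     * in the issue above.
     */
    boolean removeById(String id) {
        return items.remove(id) != null;
    }
}
```

With a plain remove() that always drops the head, the stale overseer's late removal deletes request B; with removeById, its second removal of request A is a harmless no-op.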
[jira] [Updated] (LUCENE-5265) Make BlockPackedWriter constructor take an acceptable overhead ratio
[ https://issues.apache.org/jira/browse/LUCENE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5265:
Attachment: LUCENE-5265.patch

Here is a patch.

Make BlockPackedWriter constructor take an acceptable overhead ratio

Key: LUCENE-5265
URL: https://issues.apache.org/jira/browse/LUCENE-5265
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor
Attachments: LUCENE-5265.patch

Follow-up of http://search-lucene.com/m/SjmSW1CZYuZ1
MemoryDocValuesFormat takes an acceptable overhead ratio but it is only used when doing table compression. It should be used for all compression methods, especially DELTA_COMPRESSED, whose encoding is based on BlockPackedWriter.
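An acceptable overhead ratio trades space for decode speed: a packed writer may round the exact bits-per-value up to a faster, byte-aligned width as long as the wasted space stays within the ratio. A sketch of that decision (illustrative only; it mirrors the idea behind PackedInts' format selection, not Lucene's exact code):

```java
public class OverheadRatio {
    /** Minimum bits needed to store any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Round the exact width up to a byte-aligned one (8/16/32/64) when
     * the extra space stays within acceptableOverheadRatio; otherwise
     * keep the compact width.
     */
    static int chooseBitsPerValue(int exactBits, float acceptableOverheadRatio) {
        float maxBits = exactBits * (1f + acceptableOverheadRatio);
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (aligned >= exactBits && aligned <= maxBits) {
                return aligned; // faster to decode, still within the allowed overhead
            }
        }
        return exactBits;
    }
}
```

With deltas up to 1000, bitsRequired gives 10; a ratio of 0 keeps the compact 10 bits, while a generous ratio (say 0.7) rounds up to 16 so values decode with plain byte-aligned reads.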
[jira] [Commented] (LUCENE-5266) Optimization of the direct PackedInts readers
[ https://issues.apache.org/jira/browse/LUCENE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792948#comment-13792948 ]

Adrien Grand commented on LUCENE-5266:

bq. The only caveat is the encoding would need to ensure there is always an extra 2 bytes at the end.

There are some places (codecs) where I encode many short sequences consecutively, so I care about not wasting extra bytes, but if this proves to help performance, I think it shouldn't be too hard to add the ability to have extra bytes at the end of the stream (I'm thinking about adding a new PackedInts.Format to the enum, but there might be other options).

Optimization of the direct PackedInts readers

Key: LUCENE-5266
URL: https://issues.apache.org/jira/browse/LUCENE-5266
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5266.patch, LUCENE-5266.patch

Given that the initial focus for PackedInts readers was more on in-memory readers (for storing stuff like the mapping from old to new doc IDs at merging time), I never spent time trying to optimize the direct readers, although it could be beneficial now that they are used for disk-based doc values.
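A direct PackedInts reader decodes the i-th value straight from the underlying bytes at a computed bit offset. A bit-at-a-time reference sketch of that decoding (real readers pull whole bytes or longs per bits-per-value case for speed, which is exactly why a couple of guaranteed trailing bytes, as discussed above, would simplify them):

```java
public class PackedRead {
    /**
     * Decode value #index from big-endian bit-packed storage with a
     * fixed number of bits per value. Reference implementation: one
     * bit per iteration, simple but slow compared to word-at-a-time
     * reads.
     */
    static long get(byte[] blocks, int index, int bitsPerValue) {
        long bitPos = (long) index * bitsPerValue;
        long value = 0;
        for (int i = 0; i < bitsPerValue; i++) {
            long p = bitPos + i;
            int bit = (blocks[(int) (p >>> 3)] >>> (7 - (int) (p & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }
}
```

Packing 5 and 3 at 4 bits per value yields the single byte 0x53; get(blocks, 0, 4) recovers 5 and get(blocks, 1, 4) recovers 3.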
[jira] [Created] (SOLR-5340) Add support for named snapshots
Mike Schrag created SOLR-5340:

Summary: Add support for named snapshots
Key: SOLR-5340
URL: https://issues.apache.org/jira/browse/SOLR-5340
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Affects Versions: 4.5
Reporter: Mike Schrag

It would be really nice if Solr supported named snapshots. Right now, if you snapshot a SolrCloud cluster, every node potentially records a slightly different timestamp. Correlating those back together to effectively restore the entire cluster to a consistent snapshot is pretty tedious.
[jira] [Created] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
Nik Everett created LUCENE-5278:

Summary: MockTokenizer throws away the character right after a token even if it is a valid start to a new token
Key: LUCENE-5278
URL: https://issues.apache.org/jira/browse/LUCENE-5278
Project: Lucene - Core
Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial

MockTokenizer throws away the character right after a token even if it is a valid start to a new token. You won't see this unless you build a tokenizer that can recognize every character, like with new RegExp(".") or new RegExp("..."). Changing this behaviour seems to break a number of tests.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792974#comment-13792974 ]

Nik Everett commented on LUCENE-5274:

Filed LUCENE-5278.
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5278:
Attachment: LUCENE-5278.patch

This patch fixes the behaviour from my perspective but breaks a bunch of other tests.
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792993#comment-13792993 ]

Robert Muir commented on LUCENE-5278:

I think I understand what you want: it makes sense. The only reason it's the way it is today is because this thing historically came from CharTokenizer (see the isTokenChar?). But it would be better if you could e.g. make a pattern like ([A-Z][a-z]+) and for it to actually break FooBar into Foo, Bar rather than throwing out Bar altogether. I'll dig into this!
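The difference between the current and the desired behaviour can be shown with plain java.util.regex (a sketch only; MockTokenizer itself is automaton-driven, not built on Matcher): resuming the scan at the character right after a match recovers Bar, while skipping one extra character, the analogue of the reported bug, loses it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    /** Desired behaviour: resume scanning at the char right after each token. */
    static List<String> tokenize(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Analogue of the reported bug: the char after each token is discarded. */
    static List<String> tokenizeDroppingBoundary(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos <= input.length() && m.find(pos)) {
            tokens.add(m.group());
            pos = m.end() + 1; // skips the character right after the token
        }
        return tokens;
    }
}
```

With the pattern [A-Z][a-z]+, tokenize("FooBar", ...) gives [Foo, Bar] while tokenizeDroppingBoundary gives only [Foo], because the 'B' that starts the second token was consumed as a boundary.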
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793000#comment-13793000 ] Robert Muir commented on LUCENE-5274: - Thanks Nik: I can help with that one! Another question: about the MergedIterator :) I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). This thing has limitations (it's currently only used by IndexWriter for buffered deletes; it's basically like a MultiTerms over an Iterator). For example, each iterator it consumes should not have duplicate values according to its compareTo(): it's not clear to me that WeightedPhraseInfo behaves this way:
* what if you have a synonym of dog sitting on top of cat with the same boost factor... it's a duplicate according to that compareTo, but the text is different.
* what if the synonym is just dog with posinc=0 stacked on top of itself (which is totally valid to do)...
Perhaps highlighting can make use of it, but it's unclear to me that it's really following the contract. Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where it's inconsistent with equals()... I think this is a little dangerous. If it turns out we can reuse it: great! But I think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior, in a way that won't hurt performance, and so on?
Teach fast FastVectorHighlighter to highlight child fields with parent fields --- Key: LUCENE-5274 URL: https://issues.apache.org/jira/browse/LUCENE-5274 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Nik Everett Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5274.patch I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
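The sorted-union behavior Robert attributes to MergedIterator can be sketched with a priority queue (hypothetical class and method names, not the actual Lucene code). Equal values coming from different sources are emitted only once, which is why a duplicate within a single source would violate the contract being discussed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a sorted-union merge over pre-sorted inputs.
// Equal values from different sources are emitted once, mirroring the
// MultiTerms-style union behavior described in the thread.
class SortedUnionSketch {
    static List<Integer> union(List<List<Integer>> sortedInputs) {
        // each queue entry is { value, sourceIndex, positionInSource }
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        for (int i = 0; i < sortedInputs.size(); i++) {
            if (!sortedInputs.get(i).isEmpty()) {
                pq.add(new int[] { sortedInputs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            // skip duplicates of the last emitted value (the union step)
            if (out.isEmpty() || out.get(out.size() - 1) != top[0]) {
                out.add(top[0]);
            }
            int next = top[2] + 1;
            List<Integer> src = sortedInputs.get(top[1]);
            if (next < src.size()) {
                pq.add(new int[] { src.get(next), top[1], next });
            }
        }
        return out;
    }
}
```

For example, merging [1, 3, 5] and [3, 4] yields [1, 3, 4, 5]: the second 3 is swallowed by the union, which is exactly the behavior that would drop a stacked synonym.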
[jira] [Assigned] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-5278: --- Assignee: Robert Muir
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793018#comment-13793018 ] Nik Everett commented on LUCENE-5274: -
{quote} I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). {quote}
Sure! I'm more used to Guava's tools so I think I was lulled into a false sense of recognition. No chance of updating to a modern version of Guava? :)
{quote} This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way {quote}
Yikes! I didn't catch that, but now that you point it out it is right there in the docs and I should have. WeightedPhraseInfo doesn't behave that way and
{quote} Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. {quote}
I agree on the inconsistent-with-equals point. I can either fix that or use a Comparator for sorting both WeightedPhraseInfo and Toffs. That'd require a MergeSorter that can take one but
{quote} If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on? {quote}
Makes sense to me.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793029#comment-13793029 ] Robert Muir commented on LUCENE-5274: -
{quote} Sure! I'm more used to Guava's tools so I think I was lulled into a false sense of recognition. No chance of updating to a modern version of Guava? {quote}
There is no Lucene dependency on Guava. I don't think we should introduce one, and it wouldn't solve the issues I mentioned anyway (e.g. Comparable inconsistent with equals and such). It would only add 2.1MB of bloated, unnecessary syntactic sugar (sorry, that's just my opinion on it; I think it's useless). We should keep our third-party dependencies minimal and necessary so that any app using Lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them onto compatible versions.
[jira] [Updated] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5027: - Attachment: SOLR-5027.patch Field Collapsing PostFilter --- Key: SOLR-5027 URL: https://issues.apache.org/jira/browse/SOLR-5027 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.6, 5.0 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch This ticket introduces the *CollapsingQParserPlugin* The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. This is a high performance alternative to standard Solr field collapsing (with *ngroups*) when the number of distinct groups in the result set is high. For example, in one performance test, a search with 10 million full results and 1 million collapsed groups: Standard grouping with ngroups: 17 seconds. CollapsingQParserPlugin: 300 milliseconds. Sample syntax:
Collapse based on the highest scoring document:
{code} fq={!collapse field=field_name} {code}
Collapse based on the min value of a numeric field:
{code} fq={!collapse field=field_name min=field_name} {code}
Collapse based on the max value of a numeric field:
{code} fq={!collapse field=field_name max=field_name} {code}
Collapse with a null policy:
{code} fq={!collapse field=field_name nullPolicy=null_policy} {code}
There are three null policies:
ignore: removes docs with a null value in the collapse field (default).
expand: treats each doc with a null value in the collapse field as a separate group.
collapse: collapses all docs with a null value into a single group using either highest score, or min/max.
The CollapsingQParserPlugin also fully supports the QueryElevationComponent. *Note:* The July 16 patch also includes an ExpandComponent that expands the collapsed groups for the current search result page. This functionality will be moved to its own ticket.
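For illustration, the three null policies above can be sketched in plain Java (hypothetical code, not the actual CollapsingQParserPlugin): keep the top-scoring doc per group, handling null groups according to the policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collapse-by-highest-score with the three null
// policies described above: "ignore", "expand", and "collapse".
class CollapseSketch {
    record Doc(String group, double score) {}

    static List<Doc> collapse(List<Doc> docs, String nullPolicy) {
        Map<String, Doc> best = new LinkedHashMap<>();
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs) {
            String key = d.group();
            if (key == null) {
                switch (nullPolicy) {
                    case "ignore":
                        continue;               // drop docs with a null group
                    case "expand":
                        result.add(d);          // each null doc is its own group
                        continue;
                    case "collapse":
                        key = "\0null";         // all nulls form one group
                }
            }
            // keep the highest-scoring doc per group
            best.merge(key, d, (a, b) -> a.score() >= b.score() ? a : b);
        }
        result.addAll(best.values());
        return result;
    }
}
```

With docs {a:1.0, a:2.0, null:3.0}, "ignore" returns one doc (a:2.0), while "expand" and "collapse" each return two.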
[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793035#comment-13793035 ] Joel Bernstein commented on SOLR-5027: -- Patch that passes precommit for trunk
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793037#comment-13793037 ] Bill Bell commented on LUCENE-5212: --- It appears this happens on 7u40 64-bit too. See https://bugs.openjdk.java.net/browse/JDK-8024830 Am I reading this wrong? Start failing around hs24-b21:
[junit4] # SIGSEGV (0xb) at pc=0xfd7ff91d9f7d, pid=23810, tid=343
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode solaris-amd64 )
[junit4] # Problematic frame:
[junit4] # J org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
[junit4] #
Note, first 7u40 build b01 has hs24-b24. Next, I will try to find the changeset. java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: crashFaster2.0.patch, crashFaster.patch, hs_err_pid32714.log, jenkins.txt
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793038#comment-13793038 ] Nik Everett commented on LUCENE-5274: -
{quote} There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions. {quote}
I figured that was the reasoning and I don't intend to argue with it. In this case it would provide a method to merge sorted iterators just like MergedIterator, only without the caveats around duplication, but I'm happy to work around it. Guava certainly wouldn't fix my forgetting equals and hashCode.
[jira] [Created] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
Michael McCandless created LUCENE-5279: -- Summary: Don't use recursion in DisjunctionSumScorer.countMatches Key: LUCENE-5279 URL: https://issues.apache.org/jira/browse/LUCENE-5279 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I noticed the TODO in there, to not use recursion, so I fixed it to just use a private queue ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
[ https://issues.apache.org/jira/browse/LUCENE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5279: --- Attachment: LUCENE-5279.patch Patch. However, it seems to be slower, testing on full Wikipedia en:
{noformat}
Report after iter 10:
Task              QPS base  StdDev   QPS comp  StdDev    Pct diff
OrHighLow            14.44  (7.7%)     12.48  (4.7%)  -13.6% ( -24% -  -1%)
OrHighHigh            5.56  (6.2%)      4.86  (4.4%)  -12.6% ( -21% -  -2%)
OrHighMed            18.62  (6.7%)     16.29  (4.4%)  -12.5% ( -22% -  -1%)
AndHighLow          398.09  (1.6%)    390.34  (2.3%)   -1.9% (  -5% -   1%)
OrNotHighLow        374.60  (1.7%)    369.61  (1.7%)   -1.3% (  -4% -   2%)
Fuzzy1               67.10  (2.1%)     66.41  (2.2%)   -1.0% (  -5% -   3%)
OrNotHighMed         51.68  (1.7%)     51.37  (1.5%)   -0.6% (  -3% -   2%)
Fuzzy2               46.73  (2.8%)     46.45  (2.6%)   -0.6% (  -5% -   4%)
OrHighNotLow         20.05  (3.5%)     19.96  (5.0%)   -0.5% (  -8% -   8%)
OrHighNotMed         27.15  (3.2%)     27.05  (4.8%)   -0.3% (  -8% -   7%)
OrNotHighHigh         7.72  (3.2%)      7.70  (4.7%)   -0.3% (  -7% -   7%)
OrHighNotHigh         9.81  (3.0%)      9.79  (4.5%)   -0.1% (  -7% -   7%)
LowSloppyPhrase      43.83  (1.9%)     43.89  (2.1%)    0.2% (  -3% -   4%)
IntNRQ                3.49  (4.5%)      3.50  (4.1%)    0.2% (  -8% -   9%)
Prefix3              70.74  (2.7%)     71.01  (2.4%)    0.4% (  -4% -   5%)
HighTerm             65.33  (3.0%)     65.62 (13.5%)    0.4% ( -15% -  17%)
MedSloppyPhrase       3.47  (3.5%)      3.49  (4.7%)    0.6% (  -7% -   9%)
LowPhrase            13.06  (1.5%)     13.14  (2.0%)    0.6% (  -2% -   4%)
Wildcard             16.71  (2.9%)     16.82  (2.2%)    0.7% (  -4% -   5%)
MedTerm             100.90  (2.5%)    101.71 (10.4%)    0.8% ( -11% -  14%)
LowTerm             311.85  (1.4%)    314.53  (6.4%)    0.9% (  -6% -   8%)
HighSpanNear          8.06  (5.1%)      8.13  (5.9%)    0.9% (  -9% -  12%)
Respell              48.00  (2.3%)     48.45  (2.8%)    0.9% (  -4% -   6%)
HighSloppyPhrase      3.40  (4.1%)      3.43  (6.6%)    1.0% (  -9% -  12%)
AndHighMed           34.14  (1.6%)     34.52  (1.7%)    1.1% (  -2% -   4%)
AndHighHigh          28.15  (1.7%)     28.48  (1.7%)    1.2% (  -2% -   4%)
MedSpanNear          30.62  (2.8%)     31.07  (3.2%)    1.5% (  -4% -   7%)
LowSpanNear          10.30  (2.6%)     10.48  (2.9%)    1.7% (  -3% -   7%)
MedPhrase           195.60  (5.1%)    201.44  (6.6%)    3.0% (  -8% -  15%)
HighPhrase            4.17  (5.6%)      4.34  (6.9%)    4.0% (  -8% -  17%)
{noformat}
So ...
I don't plan on pursuing it any further, but wanted to open the issue in case anybody wants to try ...
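The recursion-to-queue rewrite discussed in this issue can be sketched generically (hypothetical tree type, not the actual DisjunctionSumScorer): an explicit stack produces the same count without deep call stacks.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: counting matching leaves/nodes of a binary tree,
// first recursively, then with an explicit stack (the LUCENE-5279 idea).
class CountMatchesSketch {
    record Node(Node left, Node right, boolean matches) {}

    static int countRecursive(Node n) {
        if (n == null) return 0;
        return (n.matches() ? 1 : 0) + countRecursive(n.left()) + countRecursive(n.right());
    }

    static int countIterative(Node root) {
        if (root == null) return 0;
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.matches()) count++;
            // children go on the explicit stack instead of the call stack
            if (n.left() != null) stack.push(n.left());
            if (n.right() != null) stack.push(n.right());
        }
        return count;
    }
}
```

Both traversals visit every node exactly once, so the two counts always agree; as the benchmark above suggests, removing recursion is not automatically faster.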
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch Nice patch Nik! I think this is ready: I tweaked variable names and rearranged stuff (e.g. I use -1 instead of Integer so we aren't boxing, and a few other things). I also added some unit tests. The main issues why tests were failing with your original patch:
* reset() needed to clear the buffer variables.
* the state machine needed a particular extra check when emitting a token: e.g. if you make a regex of .., but you send it abcde, the tokens should be ab, cd, but not e. So when we end on a partial match, we have to check that we are in an accept state.
* term-limit-exceeded is a special case (versus the last character being in a reject state)
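The accept-state corner case in the second bullet can be sketched with a tiny two-state DFA for the pattern .. (hypothetical code, not the actual MockTokenizer): ending the input on a partial match must not emit a token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the accept-state check: a 2-state DFA for the
// pattern "..". State 0 (an even number of buffered chars) is the accept
// state; ending input in state 1 means a partial match that is dropped.
class AcceptStateSketch {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int state = 0; // 0 = accept, 1 = partial match
        for (char c : input.toCharArray()) {
            buf.append(c);
            state = 1 - state;
            if (state == 0) {            // reached an accept state: emit
                tokens.add(buf.toString());
                buf.setLength(0);
            }
        }
        // end of input while in a non-accept state: drop the partial token
        return tokens;
    }
}
```

So tokenize("abcde") yields ab and cd, and the trailing e, which never reaches the accept state, is discarded.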
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch Added a few more tests to TestMockAnalyzer so all these crazy corner cases are caught there rather than while debugging other tests :)
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793204#comment-13793204 ] Robert Muir commented on LUCENE-5274: - Yeah, I guess for me it's not a caveat at all, but a feature :) We need to iterate a sorted union for stuff in the index like terms and fields, so they appear as if they exist only once. The Guava one isn't doing a union operation but just simply maintaining compareTo() order...
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793205#comment-13793205 ] ASF subversion and git services commented on LUCENE-5278: - Commit 1531479 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531479 ] LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works better with custom regular expressions
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793206#comment-13793206 ] Robert Muir commented on LUCENE-5278: - I committed this to trunk: I did a lot of testing locally, but I want to let Jenkins have its way with it for a few hours before backporting to branch_4x.
help in getting sort to work on an indexed binary field
Hi, We added a custom field type to allow an indexed binary field type that supports search (exact match), prefix search, and sort as an unsigned-bytes lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want, and even though the name of the comparator mentions UTF8, it doesn't actually assume so and just does byte-level operations, so it's good. However, when we do this across different nodes, we run into an issue in QueryComponent.doFieldSortValues:
// Must do the same conversion when sorting by a
// String field in Lucene, which returns the terms
// data as BytesRef:
if (val instanceof BytesRef) {
  UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
  field.setStringValue(spare.toString());
  val = ft.toObject(field);
}
UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF8. I did a hack where I specified our own field comparator to be ByteBuffer based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] is not Comparable... Any advice is appreciated! Thanks, Jessica
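A minimal sketch of the unsigned-bytes lexicographical compare Jessica describes (hypothetical helper, not the Solr code): Java bytes are signed, so each byte must be masked with 0xFF before comparing, otherwise values >= 0x80 sort before 0x7F.

```java
// Hypothetical helper: lexicographic compare of byte arrays as unsigned
// bytes, the ordering that UTF8SortedAsUnicodeComparator-style byte-level
// comparison provides.
class UnsignedBytesSketch {
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // mask to treat each byte as unsigned (0..255)
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // on a common prefix, shorter sorts first
    }
}
```

Note that 0x80 compares greater than 0x7F here, whereas a naive signed compare of the raw bytes would invert that order.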
[jira] [Commented] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213 ] Yonik Seeley commented on SOLR-5330: So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset + " val.length=" + val.length + " new.offset=" + seg.tempBR.offset + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}
And it detects tons of sharing (the returned BytesRef still pointing to the same byte[]) of course... but the thing is, it never generates an invalid result. Calling next() on the term enum never changes the bytes that were previously pointed to... it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy. PerSegmentSingleValuedFaceting overwrites facet values -- Key: SOLR-5330 URL: https://issues.apache.org/jira/browse/SOLR-5330 Project: Solr Issue Type: Bug Affects Versions: 4.2.1 Reporter: Michael Froh Assignee: Yonik Seeley Attachments: solr-5330.patch I recently tried enabling facet.method=fcs for one of my indexes and found a significant performance improvement (with a large index, many facet values, and near-realtime updates). Unfortunately, the results were also wrong. Specifically, some facet values were being partially overwritten by other facet values. (That is, if I expected facet values like abcdef and 123, I would get a value like 123def.) Debugging through the code, it looks like the problem was in PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, when BytesRef val is shallow-copied from the temporary per-segment BytesRef. The byte array assigned to val is shared with the byte array for seg.tempBR, and is overwritten a few lines down by the call to seg.tenum.next().
I managed to fix it locally by replacing the shallow copy with a deep copy. While I encountered this problem on Solr 4.2.1, I see that the code is identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I believe this bug still exists.
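The corruption described above (abcdef becoming 123def) can be reproduced with a minimal stand-in for BytesRef (hypothetical class, not the Lucene one): a shallow copy shares the underlying byte[] and sees later writes, while a deep copy does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for BytesRef: a (bytes, offset, length) view.
class BytesViewSketch {
    byte[] bytes;
    int offset, length;

    BytesViewSketch(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // shallow copy: shares the underlying array, so later writes leak in
    BytesViewSketch shallowCopy() {
        return new BytesViewSketch(bytes, offset, length);
    }

    // deep copy: snapshots the referenced bytes
    BytesViewSketch deepCopy() {
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        return new BytesViewSketch(copy, 0, length);
    }

    String asString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

If the shared buffer is overwritten after the shallow copy is taken (as seg.tenum.next() could do to seg.tempBR's array), the shallow view changes underneath you while the deep copy keeps the original value.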
[jira] [Comment Edited] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213 ]

Yonik Seeley edited comment on SOLR-5330 at 10/12/13 2:30 AM:

So I instrumented the faceting code like so:

{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset
      + " val.length=" + val.length
      + " new.offset=" + seg.tempBR.offset
      + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned BytesRef still pointing to the same byte[]), of course... but the thing is, it never generates an invalid result. Calling next() on the term enum never changes the bytes that were previously pointed to; it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy.

Example output:

{code}
##SHARING DETECTED: val.offset=1 val.length=4 new.offset=6 new.length=4
{code}
PerSegmentSingleValuedFaceting overwrites facet values -- Key: SOLR-5330 URL: https://issues.apache.org/jira/browse/SOLR-5330 Project: Solr Issue Type: Bug Affects Versions: 4.2.1 Reporter: Michael Froh Assignee: Yonik Seeley Attachments: solr-5330.patch
[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset
[ https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793217#comment-13793217 ]

Shai Erera commented on LUCENE-5277:

I thought of that... it started in LUCENE-5248, where I want to keep a growable bitset alongside the docs/values arrays to mark whether a document has an updated value or not (following Rob's idea). When I implemented that using OpenBitSet, I discovered the bug and opened LUCENE-5272. As I worked on fixing the bug, I realized OBS has other issues as well and thought that perhaps I could use FixedBitSet, only growing it by copying its array. This is doable even without the ctor, since I can call getBits() and do it like this:

{code}
FixedBitSet newBits = new FixedBitSet(17); // new capacity
System.arraycopy(oldBits.getBits(), 0, newBits.getBits(), 0, oldBits.getBits().length);
{code}

I then noticed there is already a ctor in FixedBitSet which copies another FBS, so I thought just to improve it. It seems more intuitive to do that than let users figure out they can grow a FixedBitSet like above.

Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset -- Key: LUCENE-5277 URL: https://issues.apache.org/jira/browse/LUCENE-5277 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5277.patch

FixedBitSet's copy constructor is redundant the way it is now -- one can call FBS.clone() to achieve the same (and indeed, no code in Lucene calls this ctor). I think it will be useful to add a numBits parameter to that constructor to allow growing/shrinking the new bitset while copying all relevant bits from the passed one.
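The grow-by-arraycopy idiom, and the proposed numBits-taking copy constructor, can be sketched with a hypothetical minimal bitset. MiniBitSet below is illustrative only, not the real FixedBitSet:

```java
// Hypothetical minimal bitset backed by a long[], one word per 64 bits,
// sketching the proposed copy-constructor-with-numBits from this issue.
class MiniBitSet {
    final long[] bits;

    MiniBitSet(int numBits) {
        bits = new long[(numBits + 63) >>> 6]; // round up to whole 64-bit words
    }

    // The proposed ctor: copy `other`, but size the new set to numBits.
    MiniBitSet(MiniBitSet other, int numBits) {
        this(numBits);
        System.arraycopy(other.bits, 0, bits, 0, Math.min(other.bits.length, bits.length));
    }

    void set(int i)    { bits[i >> 6] |= 1L << (i & 63); }
    boolean get(int i) { return (bits[i >> 6] & (1L << (i & 63))) != 0; }

    public static void main(String[] args) {
        MiniBitSet old = new MiniBitSet(10);
        old.set(3);
        old.set(9);
        // Grow from 10 to 17 bits; all previously set bits survive the copy.
        MiniBitSet grown = new MiniBitSet(old, 17);
        System.out.println(grown.get(3) + " " + grown.get(9) + " " + grown.get(4));
    }
}
```

Note that when shrinking, a real implementation would also need to clear any bits beyond the new numBits in the last copied word; this sketch omits that detail.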
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793221#comment-13793221 ]

Shai Erera commented on LUCENE-5248:

bq. Do we have test coverage of updating with null (deleting the update from the document)?

We have TestNDVUpdates.testUnsetValue and testUnsetAllValues, though we don't have a test which unsets a value while a document is merging. We have tests that cover updating a value (not unsetting) while it is merging. I guess I can modify them to unset as well, but will then need to improve the test to use docsWithField. I'll look into it.

bq. So if there are two terms in a row with the same field (which does not exist) won't we hit NPE?

Good catch! You're right, I had another {{if (termsEnum == null) continue}} but I removed it since I thought the above if takes care of that. I added a unit test which reproduces it, and the fix. Will commit on LUCENE-5189.

Improve the data structure used in ReaderAndLiveDocs to hold the updates -- Key: LUCENE-5248 URL: https://issues.apache.org/jira/browse/LUCENE-5248 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch

Currently ReaderAndLiveDocs holds the updates in two structures:

+Map<String,Map<Integer,Long>>+ holds a mapping from each field to all docs that were updated and their values. This structure is updated when applyDeletes is called, and needs to satisfy several requirements:
# Un-ordered writes: if a field f is updated by two terms, termA and termB, in that order, and termA affects doc=100 and termB doc=2, then the updates are applied in that order, meaning we cannot rely on updates coming in order.
# Same document may be updated multiple times, either by the same term (e.g. several calls to IW.updateNDV) or by different terms. Last update wins.
# Sequential read: when writing the updates to the Directory (fieldsConsumer), we iterate on the docs in order and for each one check if it's updated; if not, we pull its value from the current DV.
# A single update may affect several million documents, and therefore needs to be efficient w.r.t. memory consumption.

+Map<Integer,Map<String,Long>>+ holds a mapping from a document to all the fields in which it was updated and the updated value for each field. This is used by IW.commitMergedDeletes to apply the updates that came in while the segment was merging. The requirements this structure needs to satisfy are:
# Access in doc order: this is how commitMergedDeletes works.
# One-pass: we visit a document once (currently), so if we can, it's better to know all the fields in which it was updated.

The updates are applied to the merged ReaderAndLiveDocs (where they are stored in the first structure mentioned above). Comments with proposals will follow next.
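A toy model of the two structures above and the "last update wins" behavior under un-ordered writes. Names here are illustrative only; the real ReaderAndLiveDocs uses more memory-efficient representations than nested HashMaps:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two update-holding structures described in the issue.
class UpdatesSketch {
    // field -> (docID -> value): filled by applyDeletes, read sequentially per field.
    static final Map<String, Map<Integer, Long>> byField = new HashMap<>();
    // docID -> (field -> value): read in doc order by commitMergedDeletes.
    static final Map<Integer, Map<String, Long>> byDoc = new HashMap<>();

    static void update(String field, int doc, long value) {
        byField.computeIfAbsent(field, f -> new HashMap<>()).put(doc, value);
        byDoc.computeIfAbsent(doc, d -> new HashMap<>()).put(field, value);
    }

    public static void main(String[] args) {
        // Un-ordered writes: termA affects doc=100, then termB affects doc=2.
        update("price", 100, 5L);
        update("price", 2, 3L);
        // Same doc updated again: last update wins, the Map.put overwrites.
        update("price", 100, 7L);

        System.out.println(byField.get("price").get(100)); // 7
        System.out.println(byDoc.get(100).get("price"));   // 7
    }
}
```

Because both views are plain maps keyed on (field, doc) in opposite nesting orders, applying updates in arrival order automatically satisfies the last-update-wins requirement without any sorting.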
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793224#comment-13793224 ]

Shai Erera commented on LUCENE-5248:

bq. I added a unit test which reproduces it, and the fix. Will commit on LUCENE-5189.

Sorry, it's a bug introduced in this patch, so I'll fix it here.

Improve the data structure used in ReaderAndLiveDocs to hold the updates -- Key: LUCENE-5248 URL: https://issues.apache.org/jira/browse/LUCENE-5248 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch