[jira] [Commented] (SOLR-4165) Queries blocked when stopping and starting a node
[ https://issues.apache.org/jira/browse/SOLR-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551878#comment-13551878 ]

Markus Jelsma commented on SOLR-4165:

Hi Mark, this is for standard stops. On shutdown the cluster can stall very briefly, a matter of 1 or 2 seconds at most in our case. On start-up the problem is more serious.

Queries blocked when stopping and starting a node

Key: SOLR-4165
URL: https://issues.apache.org/jira/browse/SOLR-4165
Project: Solr
Issue Type: Bug
Components: search, SolrCloud
Affects Versions: 5.0
Environment: 5.0-SNAPSHOT 1366361:1420056M - markus - 2012-12-11 11:52:06
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
Fix For: 4.1, 5.0

Our 10-node test cluster (10 shards, 20 cores) briefly blocks incoming queries when a node is stopped gracefully, and blocks queries again for at least a few seconds when the node is started. We're using siege to send roughly 10 queries per second to a pair of load balancers. Those load balancers ping (admin/ping) each node every few hundred milliseconds. The ping queries continue to operate normally while requests to our main request handler are blocked. A manual request directly to a live Solr node is also blocked for the same duration. There are no errors logged, but it is clear that the entire cluster blocks queries as soon as the starting node is reading its config from ZooKeeper, likely even slightly earlier. The blocking time when stopping a node varies between 1 and 5 seconds. The blocking time when starting a node varies between 10 and 30 seconds. The blocked queries come rushing in again after a queue of ping requests is served. The ping request sets the main request handler via the qt parameter.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3735) Relocate the example mime-to-extension mapping
[ https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-3735:

Fix Version/s: 4.1

Went ahead and merged this to 4.x (4.1+) in order to minimize diffs (especially for something minor like this) from trunk to 4.x.

Relocate the example mime-to-extension mapping

Key: SOLR-3735
URL: https://issues.apache.org/jira/browse/SOLR-3735
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 4.0-BETA, 4.0
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
Fix For: 4.1, 5.0
Attachments: SOLR-3735.patch

A mime-to-extension mapping was added to VelocityResponseWriter recently. This really belongs in the templates themselves, not in VrW, as it is specific to the example search results and not meant for all VrW templates.
[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping
[ https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551893#comment-13551893 ]

Commit Tag Bot commented on SOLR-3735:

[branch_4x commit] Erik Hatcher
http://svn.apache.org/viewvc?view=revision&revision=1432410
SOLR-3735: Relocate the example mime-to-extension mapping (merge from trunk)
[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping
[ https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551911#comment-13551911 ]

Commit Tag Bot commented on SOLR-3735:

[trunk commit] Erik Hatcher
http://svn.apache.org/viewvc?view=revision&revision=1432411
SOLR-3735: merged to 4x, so adjust CHANGES
[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551929#comment-13551929 ]

Robert Muir commented on LUCENE-4678:

{quote}
I'll commit only to trunk for now ... and backport to 4.2 once 4.1 branches and once this has baked some in trunk ...
{quote}

+1 ... the copyBytes is frightening though!

What do you think of the FST.BytesReader -> FSTBytesReader change? I'm just thinking it causes a lot of API noise (you can see it in the patch). Unfortunately lots of users have to create this thing to pass to methods on FST (e.g. findTargetArc). So if we kept it as FST.BytesReader they would be largely unaffected?

FST should use paged byte[] instead of single contiguous byte[]

Key: LUCENE-4678
URL: https://issues.apache.org/jira/browse/LUCENE-4678
Project: Lucene - Core
Issue Type: Improvement
Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.2, 5.0
Attachments: LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch

The single byte[] we use today has several limitations, e.g. it limits us to 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and it causes big RAM spikes during building when the array has to grow. I took basically the same approach as LUCENE-3298, but I want to break out this patch separately from changing all int -> long for 2.1 GB support.
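The paged-byte[] idea discussed above can be sketched roughly as follows. This is a hypothetical illustration, not Lucene's actual BytesStore: the class and method names are made up, and the real implementation also handles block copying, reverse readers, and more. The point is only that growth allocates one new fixed-size block instead of reallocating and copying a single huge array, and that total capacity is no longer capped by one array's ~2.1 GB limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a paged byte store: fixed-size blocks addressed by
// a long position, split into (block index, offset) with shifts and masks.
public class PagedBytesSketch {
    private final int blockBits;   // log2 of the block size
    private final int blockSize;
    private final int blockMask;
    private final List<byte[]> blocks = new ArrayList<>();
    private long position = 0;

    PagedBytesSketch(int blockBits) {
        this.blockBits = blockBits;
        this.blockSize = 1 << blockBits;
        this.blockMask = blockSize - 1;
    }

    // Appending a byte only ever allocates one new block; no full-array copy.
    void writeByte(byte b) {
        int blockIndex = (int) (position >>> blockBits);
        if (blockIndex == blocks.size()) {
            blocks.add(new byte[blockSize]);
        }
        blocks.get(blockIndex)[(int) (position & blockMask)] = b;
        position++;
    }

    byte readByte(long pos) {
        return blocks.get((int) (pos >>> blockBits))[(int) (pos & blockMask)];
    }

    public static void main(String[] args) {
        PagedBytesSketch bytes = new PagedBytesSketch(2); // tiny 4-byte blocks for demo
        for (int i = 0; i < 10; i++) {
            bytes.writeByte((byte) i);
        }
        System.out.println(bytes.readByte(7)); // reads across a block boundary
    }
}
```

With a realistic block size (Lucene-style stores use blocks on the order of tens of kilobytes), the per-read shift-and-mask cost is small compared to the copy-free growth this buys during building.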
[jira] [Created] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
Michael McCandless created LUCENE-4682:

Summary: Reduce wasted bytes in FST due to array arcs
Key: LUCENE-4682
URL: https://issues.apache.org/jira/browse/LUCENE-4682
Project: Lucene - Core
Issue Type: Improvement
Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor

When a node is close to the root, or it has many outgoing arcs, the FST writes the arcs as an array (each arc gets N bytes), so we can e.g. binary-search on lookup. The problem is N is set to the max(numBytesPerArc), so if you have an outlier arc, e.g. with a big output, you can waste many bytes for all the other arcs that didn't need so many bytes.

I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 1535612 = ~18% wasted. It would be nice to reduce this.

One thing we could do without packing is: in addNode, if we detect that the number of wasted bytes is above some threshold, then don't do the expansion.

Another thing, if we are packing: we could record stats in the first pass about which nodes wasted the most, and then in the second pass (pack) we could set the threshold based on the top X% nodes that waste ...

Another idea is maybe to deref large outputs, so that the numBytesPerArc is more uniform ...
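The waste described above is simple to quantify: every arc slot is padded to the width of the widest arc. A small sketch (hypothetical code, not Lucene's; the method name is made up) makes the outlier effect concrete:

```java
// Illustrates why N = max(numBytesPerArc) wastes bytes: one outlier arc
// forces every other slot in the array to be padded to its width.
public class ArcWasteDemo {
    // Bytes wasted when each arc slot is padded to the widest arc.
    static int wastedBytes(int[] bytesPerArc) {
        int max = 0, total = 0;
        for (int b : bytesPerArc) {
            max = Math.max(max, b);
            total += b;
        }
        return max * bytesPerArc.length - total;
    }

    public static void main(String[] args) {
        // Four small arcs plus one outlier carrying a big output:
        int[] arcs = {2, 2, 3, 2, 10};
        System.out.println(wastedBytes(arcs)); // 10*5 - 19 = 31 bytes wasted
    }
}
```

In this toy node, the single 10-byte arc inflates the array from 19 bytes of payload to 50 bytes of storage, which is exactly the pattern behind the ~18% waste measured for Kuromoji.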
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551932#comment-13551932 ]

Michael McCandless commented on LUCENE-4682:

A couple more ideas:

* Since the root arc is [usually?] cached ... we [usually] shouldn't make the root node into an array?
* The building process sometimes has freedom in where the outputs are pushed ... so in theory we could push the outputs forwards if it would mean fewer wasted bytes on the prior node ... this would be a tricky optimization problem I think.
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551934#comment-13551934 ]

Michael McCandless commented on LUCENE-4682:

Maybe we should just tighten up the FST thresholds for when we make an array arc:

{noformat}
/** @see #shouldExpand(UnCompiledNode) */
final static int FIXED_ARRAY_SHALLOW_DISTANCE = 3; // 0 = only root node.

/** @see #shouldExpand(UnCompiledNode) */
final static int FIXED_ARRAY_NUM_ARCS_SHALLOW = 5;

/** @see #shouldExpand(UnCompiledNode) */
final static int FIXED_ARRAY_NUM_ARCS_DEEP = 10;
{noformat}

When I print out the waste, it's generally the smaller nodes that have higher proportional waste:

{noformat}
[java] waste: 44 numArcs=16 perArc=2.75
[java] waste: 20 numArcs=11 perArc=1.8181819
[java] waste: 13 numArcs=5 perArc=2.6
[java] waste: 20 numArcs=12 perArc=1.666
[java] waste: 60 numArcs=20 perArc=3.0
[java] waste: 0 numArcs=5 perArc=0.0
[java] waste: 48 numArcs=15 perArc=3.2
[java] waste: 16 numArcs=5 perArc=3.2
[java] waste: 20 numArcs=6 perArc=3.333
[java] waste: 8 numArcs=6 perArc=1.334
[java] waste: 24 numArcs=8 perArc=3.0
[java] waste: 32 numArcs=9 perArc=3.556
[java] waste: 17 numArcs=7 perArc=2.4285715
[java] waste: 13 numArcs=5 perArc=2.6
[java] waste: 17 numArcs=6 perArc=2.833
[java] waste: 28 numArcs=8 perArc=3.5
[java] waste: 20 numArcs=16 perArc=1.25
[java] waste: 44 numArcs=15 perArc=2.934
[java] waste: 28 numArcs=13 perArc=2.1538463
[java] waste: 28 numArcs=15 perArc=1.867
{noformat}
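The three constants above drive a depth/fan-out decision of roughly the following shape. This is a simplified sketch of the kind of check shouldExpand performs, not the exact Lucene code (the real method inspects the UnCompiledNode and builder state rather than taking two ints):

```java
// Simplified sketch of the array-arc expansion decision the constants control:
// expand when the node is shallow with at least a few arcs, or has many arcs
// at any depth.
public class ExpandHeuristic {
    static final int FIXED_ARRAY_SHALLOW_DISTANCE = 3; // 0 = only root node.
    static final int FIXED_ARRAY_NUM_ARCS_SHALLOW = 5;
    static final int FIXED_ARRAY_NUM_ARCS_DEEP = 10;

    static boolean shouldExpand(int depth, int numArcs) {
        return (depth <= FIXED_ARRAY_SHALLOW_DISTANCE
                    && numArcs >= FIXED_ARRAY_NUM_ARCS_SHALLOW)
                || numArcs >= FIXED_ARRAY_NUM_ARCS_DEEP;
    }

    public static void main(String[] args) {
        System.out.println(shouldExpand(1, 6));  // shallow node, enough arcs
        System.out.println(shouldExpand(8, 6));  // deep node, few arcs
        System.out.println(shouldExpand(8, 12)); // many arcs, any depth
    }
}
```

Tightening the thresholds means raising the arc-count minimums or shrinking the shallow distance, so fewer of the small, high-proportional-waste nodes in the listing above get expanded.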
[jira] [Updated] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4682:

Attachment: kuromoji.wasted.bytes.txt

Shows the wasted bytes ... one line per node whose arcs were turned into an array, sorted by net bytes wasted.
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551938#comment-13551938 ]

Robert Muir commented on LUCENE-4682:

As an experiment I turned off array arcs for Kuromoji in my trunk checkout.

FST before:
[java] 53645 nodes, 253185 arcs, 1535612 bytes... done
after:
[java] 53645 nodes, 253185 arcs, 1228816 bytes... done

JAR before:
-rw-rw-r-- 1 rmuir rmuir 4581420 Jan 12 09:56 lucene-analyzers-kuromoji-4.1-SNAPSHOT.jar
after:
-rw-rw-r-- 1 rmuir rmuir 4306792 Jan 12 09:56 lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551939#comment-13551939 ]

Michael McCandless commented on LUCENE-4682:

Even more than the 271,187 I measured (20% smaller FST), I think because the FST is now smaller we use fewer bytes writing the delta-coded node addresses ...
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551940#comment-13551940 ]

Robert Muir commented on LUCENE-4682:

In the fixedArray case:

{code}
// write a false first arc:
writer.writeByte(ARCS_AS_FIXED_ARRAY);
writer.writeVInt(nodeIn.numArcs);
// placeholder -- we'll come back and write the number
// of bytes per arc (int) here:
// TODO: we could make this a vInt instead
writer.writeInt(0);
fixedArrayStart = writer.getPosition();
{code}

I think we should actually make that TODO line a writeByte. If it turns out the max arc size is over 255, I think we should just not encode as array arcs (just save our position before we write ARCS_AS_FIXED_ARRAY, rewind to that, and encode normally). This would reduce the overhead of array arcs, but also maybe prevent some worst cases causing waste as a side effect.
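The suggestion above can be sketched as follows. Everything here is hypothetical illustration, not Lucene's encoder: the flag value, helper name, and zero-filled arc payloads are made up. It shows only the control flow being proposed: reserve a single byte for bytes-per-arc, and fall back to plain (unpadded) encoding when the widest arc would not fit in that byte.

```java
import java.io.ByteArrayOutputStream;

// Sketch: one-byte bytes-per-arc header with a fallback to linear encoding
// when the widest arc exceeds 255 bytes.
public class OneByteHeaderDemo {
    static final int ARCS_AS_FIXED_ARRAY = 0x20; // made-up flag value

    static byte[] encode(int[] arcSizes) {
        int max = 0;
        for (int s : arcSizes) max = Math.max(max, s);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (max <= 255) {
            out.write(ARCS_AS_FIXED_ARRAY);
            out.write(arcSizes.length);      // numArcs (a vInt in reality)
            out.write(max);                  // one-byte bytes-per-arc header
            for (int s : arcSizes) {
                for (int i = 0; i < max; i++) out.write(0); // padded arc slot
            }
        } else {
            // "rewind and encode normally": no flag, no padding, no header
            for (int s : arcSizes) {
                for (int i = 0; i < s; i++) out.write(0);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(new int[]{2, 3}).length);  // 3 header + 2*3 slots = 9
        System.out.println(encode(new int[]{300}).length);   // fallback: just 300
    }
}
```

The trade-off is the one stated in the comment: a 3-byte saving per array node in the common case, and nodes with a huge outlier arc drop out of array encoding entirely, which also removes their padding waste.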
[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551941#comment-13551941 ]

Michael McCandless commented on LUCENE-4678:

bq. the copyBytes is frightening though!

I know! But hopefully the random test catches any problems w/ it ... jenkins will tell us.

bq. So if we kept it as FST.BytesReader they would be largely unaffected?

+1, I moved back to that ... no more noise ... I'll attach new patch shortly.
[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4678:

Attachment: LUCENE-4678.patch

New patch, move BytesReader back under FST. I think it's ready.
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551944#comment-13551944 ]

Dawid Weiss commented on LUCENE-4682:

bq. Even more than the 271,187 I measured (20% smaller FST), I think because the FST is now smaller we use fewer bytes writing the delta-coded node addresses

Yes, these things are all tightly coupled. Dawid
[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551947#comment-13551947 ]

Commit Tag Bot commented on LUCENE-4678:

[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1432459
LUCENE-4678: use paged byte[] under the hood for FST
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551950#comment-13551950 ]

Michael McCandless commented on LUCENE-4682:

Another datapoint: the FreeDB suggester (tool in luceneutil to create/test it) is a 1.05 GB FST, and has 87.5 MB wasted bytes (~8%).
[jira] [Commented] (LUCENE-4677) Use vInt to encode node addresses inside FST
[ https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551954#comment-13551954 ]

Commit Tag Bot commented on LUCENE-4677:

[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1432466
LUCENE-4677: use vInt not int to encode arc's target address in un-packed FSTs

Use vInt to encode node addresses inside FST

Key: LUCENE-4677
URL: https://issues.apache.org/jira/browse/LUCENE-4677
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.2, 5.0
Attachments: LUCENE-4677.patch, LUCENE-4677.patch, LUCENE-4677.patch

Today we use int, but towards enabling 2.1G sized FSTs, I'd like to make this vInt instead.
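The vInt encoding referenced by this commit stores 7 bits of payload per byte, setting the high bit on every byte except the last. A minimal sketch in the style of Lucene's DataOutput.writeVInt (the class and helper names here are made up for illustration) shows why small target addresses shrink from 4 fixed bytes to 1-2 bytes, at the cost of up to 5 bytes for the largest values:

```java
import java.io.ByteArrayOutputStream;

// Minimal vInt writer: 7 payload bits per byte, continuation flag in the
// high bit of every byte except the last.
public class VIntDemo {
    static byte[] writeVInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {          // more than 7 bits remain
            out.write((v & 0x7F) | 0x80);   // low 7 bits + continuation bit
            v >>>= 7;
        }
        out.write(v);                       // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVInt(5).length);        // 1 byte instead of 4
        System.out.println(writeVInt(300).length);      // 2 bytes
        System.out.println(writeVInt(1 << 28).length);  // 5 bytes (worst case)
    }
}
```

Since most arc targets in an FST are near the current node, the typical delta is small and the average cost lands well under 4 bytes, which is the saving the commit is after.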
[jira] [Commented] (SOLR-4287) Maven artifact file names do not match dist/ file names
[ https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551975#comment-13551975 ] Commit Tag Bot commented on SOLR-4287:

[trunk commit] Steven Rowe http://svn.apache.org/viewvc?view=revision&revision=1432483 SOLR-4287: Removed apache- prefix from Solr distribution and artifact filenames.

Maven artifact file names do not match dist/ file names
Key: SOLR-4287
URL: https://issues.apache.org/jira/browse/SOLR-4287
Project: Solr
Issue Type: Bug
Components: Build
Affects Versions: 4.0
Reporter: Ryan Ernst
Assignee: Steve Rowe
Priority: Blocker
Fix For: 4.1
Attachments: SOLR-4287_alternative.patch, SOLR-4287.patch

For the solr artifact, the war file name has the format solr-X.Y.Z.war: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr%7C4.0.0%7Cwar However, when building from source or downloading the dist/ built war file, it is named apache-solr-X.Y.Z.war. This should really be the same... Preferably the apache- could just be removed, since the lucene build does not appear to use the same convention.
[jira] [Commented] (SOLR-4287) Maven artifact file names do not match dist/ file names
[ https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551980#comment-13551980 ] Commit Tag Bot commented on SOLR-4287:

[branch_4x commit] Steven Rowe http://svn.apache.org/viewvc?view=revision&revision=1432486 SOLR-4287: Removed apache- prefix from Solr distribution and artifact filenames. (merged trunk r1432483)
lucene-solr pull request: Fix API web link for IndexDeletionPolicy (against...
GitHub user arafalov opened a pull request: https://github.com/apache/lucene-solr/pull/6 Fix API web link for IndexDeletionPolicy (against solr 4.x branch)

I did it for Lucene 4.0, as I am not sure where 4.1 will live. In any case, this is better than the currently-dead 3.5 link.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/arafalov/lucene-solr patch-1
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/6.patch

commit 5991153f26cc92c7cc5c95d6a1774eb3050b0643
Author: Alexandre Rafalovitch arafa...@gmail.com
Date: 2013-01-12T17:57:20Z
Fix API web link for IndexDeletionPolicy
I did it for Lucene 4.0, as I am not sure where 4.1 will live. In any case, this is better than the currently-dead 3.5 link.
lucene-solr pull request: Trivial documentation URL fix
Github user arafalov closed the pull request at: https://github.com/apache/lucene-solr/pull/5
[jira] [Resolved] (SOLR-4287) Maven artifact file names do not match dist/ file names
[ https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-4287.
Resolution: Fixed
Committed to trunk and branch_4x. Thanks Ryan!
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-4.x #212: POMs out of sync
The POMs really are out of sync:

-validate-maven-dependencies:
[licenses] MISSING sha1 checksum file for: /home/hudson/.m2/repository/org/apache/velocity/velocity/1.6.4/velocity-1.6.4.jar
[licenses] Scanned 32 JAR file(s) for licenses (in 0.14s.), 1 error(s).

I'll make an adjustment shortly. (I should also fix the log trimming regex for the Maven Jenkins jobs so that this error makes it into future failure emails.)

Steve

On Jan 12, 2013, at 12:06 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/212/ No tests ran. Build Log: [...truncated 11125 lines...]
[jira] [Commented] (SOLR-1028) Automatic core loading unloading for multicore
[ https://issues.apache.org/jira/browse/SOLR-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551991#comment-13551991 ] Steve Rowe commented on SOLR-1028:

Erick, can this issue be resolved?

Automatic core loading unloading for multicore
Key: SOLR-1028
URL: https://issues.apache.org/jira/browse/SOLR-1028
Project: Solr
Issue Type: New Feature
Components: multicore
Affects Versions: 4.0, 5.0
Reporter: Noble Paul
Assignee: Erick Erickson
Fix For: 4.1, 5.0
Attachments: jenkins.jpg, SOLR-1028.patch, SOLR-1028.patch, SOLR-1028_testnoise.patch

Use case: I have many small cores (say one per user) on a single Solr box. All the cores are not always needed, but when I need one I should be able to directly issue a search request, and the core must be STARTED automatically and the request served. This also requires an upper limit on the number of cores that should be loaded at any given point in time. If the limit is crossed, the CoreContainer must unload a core (preferably the least recently used core). There must be a choice of specifying some cores as fixed; these cores must never be unloaded.
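The use case amounts to an LRU cache with pinned entries. A hedged sketch of that policy, not Solr's actual CoreContainer API (class and method names here are illustrative):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Set;

// Sketch: load cores on demand, evict the least recently used one when a
// cap is exceeded, and never evict cores marked as "fixed".
class CoreCache {
    private final int maxLoaded;
    private final Set<String> fixed;
    // accessOrder = true gives LRU iteration order (oldest first).
    private final LinkedHashMap<String, Object> loaded =
        new LinkedHashMap<>(16, 0.75f, true);

    CoreCache(int maxLoaded, Set<String> fixed) {
        this.maxLoaded = maxLoaded;
        this.fixed = fixed;
    }

    Object getCore(String name) {
        Object core = loaded.get(name);  // touch: moves entry to MRU position
        if (core == null) {
            core = new Object();         // stand-in for actually loading a core
            loaded.put(name, core);
            evictIfNeeded();
        }
        return core;
    }

    private void evictIfNeeded() {
        Iterator<String> it = loaded.keySet().iterator();
        while (loaded.size() > maxLoaded && it.hasNext()) {
            if (!fixed.contains(it.next())) {
                it.remove();             // unload the LRU non-fixed core
            }
        }
    }

    int loadedCount() { return loaded.size(); }
}
```

For example, with a cap of 2 and core "a" fixed, requesting "a", "b", then "c" would unload "b", the least recently used core that is not pinned.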
[jira] [Created] (SOLR-4299) Failed with java.net.BindException Address already in use
Nithin Chacko Ninan created SOLR-4299:
Summary: Failed with java.net.BindException Address already in use
Key: SOLR-4299
URL: https://issues.apache.org/jira/browse/SOLR-4299
Project: Solr
Issue Type: Bug
Reporter: Nithin Chacko Ninan

Hello Team, We have configured Magento Solr search on our stage instance. While testing, we noticed that Solr is not working as expected. We searched the Solr configuration and used java -jar start.jar to check the port status. We noticed the above-mentioned issue (i.e. failed with java.net.BindException: Address already in use). Any comment or help will be appreciated. Thanks!
[jira] [Resolved] (LUCENE-2305) Introduce Version in more places long before 4.0
[ https://issues.apache.org/jira/browse/LUCENE-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2305. Resolution: Won't Fix Fix Version/s: (was: 4.2) (was: 5.0) 4.0 is out long ago :). And I don't think we need that issue if we want to add Version to more places. Introduce Version in more places long before 4.0 Key: LUCENE-2305 URL: https://issues.apache.org/jira/browse/LUCENE-2305 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Shai Erera We need to introduce Version in as many places as we can (wherever it makes sense of course), and preferably long before 4.0 (or shall I say 3.9?) is out. That way, we can have a bunch of deprecated API now, that will be gone in 4.0, rather than doing it one class at a time and never finish :). The purpose is to introduce Version wherever it is mandatory now, and also in places where we think it might be useful in the future (like most of our Analyzers, configured classes and configuration classes). I marked this issue for 3.1, though I don't expect it to end in 3.1. I still think it will be done one step at a time, perhaps for cluster of classes together. But on the other hand I don't want to mark it for 4.0.0 because that needs to be resolved much sooner. So if I had a 3.9 version defined, I'd mark it for 3.9. We can do several commits in one issue right? So this one can live for a while in JIRA, while we gradually convert more and more classes. The first candidate is InstantiatedIndexWriter which probably should take an IndexWriterConfig. While I converted the code to use IWC, I've noticed Instantiated defaults its maxFieldLength to the current default (10,000) which is deprecated. I couldn't change it for back-compat reasons. But we can upgrade it to accept IWC, and set to unlimited if the version is onOrAfter 3.1, otherwise stay w/ the deprecated default. 
if it's acceptable to have several commits in one issue, I can start w/ Instantiated, post a patch and then we can continue to more classes.
[jira] [Updated] (SOLR-4299) Failed with java.net.BindException Address already in use
[ https://issues.apache.org/jira/browse/SOLR-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nithin Chacko Ninan updated SOLR-4299:

Description: Hello Team, We have configured Magento Solr search on our stage instance. While testing, we noticed that Solr is not working as expected. We searched the Solr configuration and used java -jar start.jar to check the port status. We noticed the below-mentioned issue (i.e. failed with java.net.BindException: Address already in use). Any comment or help will be appreciated.

INFO: [] Registered new searcher Searcher@668db25b main
2013-01-12 19:36:50.223:WARN::failed SocketConnector@0.0.0.0:8983: java.net.BindException: Address already in use
2013-01-12 19:36:50.223:WARN::failed Server@7ca7700a: java.net.BindException: Address already in use
2013-01-12 19:36:50.223:WARN::EXCEPTION java.net.BindException: Address already in use
 at java.net.PlainSocketImpl.socketBind(Native Method)
 at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
 at java.net.ServerSocket.bind(ServerSocket.java:376)
 at java.net.ServerSocket.<init>(ServerSocket.java:237)
 at java.net.ServerSocket.<init>(ServerSocket.java:181)
 at org.mortbay.jetty.bio.SocketConnector.newServerSocket(SocketConnector.java:80)
 at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
 at org.mortbay.jetty.AbstractConnector.doStart(AbstractConnector.java:283)
 at org.mortbay.jetty.bio.SocketConnector.doStart(SocketConnector.java:147)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.Server.doStart(Server.java:235)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.mortbay.start.Main.invokeMain(Main.java:194)
 at org.mortbay.start.Main.start(Main.java:534)
 at org.mortbay.start.Main.start(Main.java:441)
 at org.mortbay.start.Main.main(Main.java:119)
thanks!
[jira] [Resolved] (SOLR-4299) Failed with java.net.BindException Address already in use
[ https://issues.apache.org/jira/browse/SOLR-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-4299.
Resolution: Invalid
Assignee: Steve Rowe

Please post questions about using Solr to the solr-user mailing list, rather than creating JIRA issues - see [http://lucene.apache.org/solr/discussion.html]. You might like the following, which I found by searching the interweb:
* http://stackoverflow.com/questions/6645253/solr-configuration
* http://javarevisited.blogspot.com/2011/12/address-already-use-jvm-bind-exception.html
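The trace in this issue simply means another process (often a previous Solr/Jetty instance that never exited) still holds port 8983. A hedged way to reproduce and check the condition from plain Java, independent of Solr:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // True if nothing currently holds the port (i.e. we could bind it).
    static boolean portFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // java.net.BindException: Address already in use
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hold an ephemeral port ourselves, then probe it: this is the same
        // failure mode Jetty hits when a stale process still owns 8983.
        try (ServerSocket holder = new ServerSocket(0)) {
            System.out.println(portFree(holder.getLocalPort()));
        }
    }
}
```

On Linux, `netstat -nlp | grep 8983` or `lsof -i :8983` identifies the process that should be stopped before restarting Solr.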
[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping
[ https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552040#comment-13552040 ] Commit Tag Bot commented on SOLR-3735:

[branch_4x commit] Steven Rowe http://svn.apache.org/viewvc?view=revision&revision=1432501 SOLR-3735: Maven configuration: upgrade velocity dependency from 1.6.4 to 1.7

Relocate the example mime-to-extension mapping
Key: SOLR-3735
URL: https://issues.apache.org/jira/browse/SOLR-3735
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 4.0-BETA, 4.0
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
Fix For: 4.1, 5.0
Attachments: SOLR-3735.patch

A mime-to-extension mapping was added to VelocityResponseWriter recently. This really belongs in the templates themselves, not in VrW, as it is specific to the example search results and not meant for all VrW templates.
[jira] [Updated] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4682:
Attachment: LUCENE-4682.patch

Mike can you try this patch on your corpus? It cuts us over to vInt for the maxBytesPerArc (saving 3 bytes for the unpacked case), and adds an acceptable overhead for array arcs (currently 1.25). For the kuromoji packed case, this seems to solve the waste:
[java] 53645 nodes, 253185 arcs, 1309077 bytes... done
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552057#comment-13552057 ] Michael McCandless commented on LUCENE-4682:

+1 This is much cleaner (write the header at the end). I built the AnalyzingSuggester for FreeDB: trunk is 1.046 GB and with the patch it's 0.917 GB = ~9% smaller!
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552061#comment-13552061 ] Robert Muir commented on LUCENE-4682:

I can clean up and commit the patch with the heuristic commented out (so we still get the cutover to vInt, which I think is an obvious win?). This way we can benchmark and make sure the heuristic is set appropriately / doesn't hurt performance?
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552063#comment-13552063 ] Michael McCandless commented on LUCENE-4682:

+1
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552065#comment-13552065 ] Dawid Weiss commented on LUCENE-4682:

+1. Nice.
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552067#comment-13552067 ] Uwe Schindler commented on LUCENE-4682:

+1
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552071#comment-13552071 ] Robert Muir commented on LUCENE-4682: - ok i committed the vInt for maxBytesPerArc, but left out the heuristic (so we still have the waste!!!) Here's the comment i added:
{code}
// TODO: try to avoid wasteful cases: disable doFixedArray in that case
/*
 * LUCENE-4682: what is a fair heuristic here?
 * It could involve some of these:
 *  1. how busy the node is: nodeIn.inputCount relative to frontier[0].inputCount?
 *  2. how much binSearch saves over scan: nodeIn.numArcs
 *  3. waste: numBytes vs numBytesExpanded
 *
 * the one below just looks at #3
if (doFixedArray) {
  // rough heuristic: make this 1.25 waste factor a parameter to the phd ctor
  int numBytes = lastArcStart - startAddress;
  int numBytesExpanded = maxBytesPerArc * nodeIn.numArcs;
  if (numBytesExpanded > numBytes * 1.25) {
    doFixedArray = false;
  }
}
*/
{code}
I think it would just be best to do some performance benchmarks and figure this out. I know all the kuromoji waste is at node.depth=1 exactly. Also I indexed all of geonames with this heuristic and it barely changed the FST size:
trunk: FST: 45296685, packedFST: 39083451
vInt maxBytesPerArc: FST: 45052386, packedFST: 39083451
vInt maxBytesPerArc + heuristic: FST: 44988400, packedFST: 39029108
So the waste and the heuristic don't affect all FSTs, only certain ones.
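The commented-out heuristic quoted in that comment boils down to a single comparison: skip the fixed-array encoding when padding would inflate the node beyond an acceptable waste factor. A standalone sketch of that comparison, with hypothetical names (this is not the committed Lucene code, which works on the builder's internal state):

```java
// Hypothetical standalone form of the waste heuristic: encode arcs as a
// fixed-size array only if the padded size stays within a waste factor of
// the un-padded size. Names are illustrative, not real Lucene APIs.
public class ArcArrayHeuristic {
    static boolean useFixedArray(int numBytes, int numArcs, int maxBytesPerArc,
                                 double acceptableWasteFactor) {
        int numBytesExpanded = maxBytesPerArc * numArcs;
        return numBytesExpanded <= numBytes * acceptableWasteFactor;
    }

    public static void main(String[] args) {
        // 10 arcs totalling 67 bytes; padding to 40 bytes/arc would need 400 bytes:
        System.out.println(useFixedArray(67, 10, 40, 1.25)); // too wasteful
        // 10 arcs totalling 38 bytes; padding to 4 bytes/arc needs only 40 bytes:
        System.out.println(useFixedArray(38, 10, 4, 1.25));  // acceptable
    }
}
```

This also shows why a float knob (as proposed later in the thread) is a natural generalization of the boolean allowArrayArcs: the waste factor becomes the tunable parameter.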
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552072#comment-13552072 ] Commit Tag Bot commented on LUCENE-4682: [trunk commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1432522 LUCENE-4682: vInt-encode maxBytesPerArc
[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4678: --- Attachment: LUCENE-4678.patch Patch, fixing FST.pack to not double-buffer again, using the new BytesStore.truncate method to roll back the last N bytes ... FST should use paged byte[] instead of single contiguous byte[] --- Key: LUCENE-4678 URL: https://issues.apache.org/jira/browse/LUCENE-4678 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 Attachments: LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch The single byte[] we use today has several limitations, eg it limits us to 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and it causes big RAM spikes during building when the array has to grow. I took basically the same approach as LUCENE-3298, but I want to break out this patch separately from changing all int -> long for 2.1 GB support.
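The paged byte[] store this issue proposes can be sketched minimally: a logical address is split into a block index and an offset within the block, so no single array ever needs to grow, and long addresses lift the 2.1 GB ceiling. A toy illustration under those assumptions; this is not the actual BytesStore class.

```java
import java.util.ArrayList;
import java.util.List;

// Toy paged byte[] store (illustrative only, not Lucene's BytesStore):
// addresses are split as blockIndex = addr >> BLOCK_BITS,
// offset = addr & BLOCK_MASK, so growth just appends a fixed-size block.
public class PagedBytes {
    static final int BLOCK_BITS = 15;              // 32 KB blocks
    static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    static final int BLOCK_MASK = BLOCK_SIZE - 1;

    final List<byte[]> blocks = new ArrayList<>();
    int upto = BLOCK_SIZE; // forces allocation on the first write

    void writeByte(byte b) {
        if (upto == BLOCK_SIZE) {       // current block full: allocate the next
            blocks.add(new byte[BLOCK_SIZE]);
            upto = 0;
        }
        blocks.get(blocks.size() - 1)[upto++] = b;
    }

    byte readByte(long address) {
        return blocks.get((int) (address >> BLOCK_BITS))[(int) (address & BLOCK_MASK)];
    }

    public static void main(String[] args) {
        PagedBytes store = new PagedBytes();
        for (int i = 0; i < 100_000; i++) store.writeByte((byte) i);
        System.out.println("blocks: " + store.blocks.size()); // ceil(100000 / 32768)
        System.out.println("last byte: " + store.readByte(99_999));
    }
}
```

Unlike a single contiguous array, growing never copies existing bytes, which is exactly the RAM-spike problem the issue describes.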
[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs
[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552075#comment-13552075 ] Robert Muir commented on LUCENE-4682: - Another simple idea: instead of boolean allowArrayArcs we just make this a float: acceptableArrayArcOverhead (or maybe a better name). you would pass 0 to disable array arcs completely (and we'd internally still have our boolean allowArrayArcs and not waste time computing stuff if this is actually <= 0)
[jira] [Updated] (LUCENE-4417) Re-Add the backwards compatibility tests to 4.1 branch
[ https://issues.apache.org/jira/browse/LUCENE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-4417: --- Priority: Blocker (was: Major) We shouldn't release 4.1 until at least lucene-core backwards tests are re-enabled. Re-Add the backwards compatibility tests to 4.1 branch -- Key: LUCENE-4417 URL: https://issues.apache.org/jira/browse/LUCENE-4417 Project: Lucene - Core Issue Type: Task Components: general/test Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.1 In 4.0 we have no backwards compatibility, but in 4.1 we must again ivy-retrieve the 4.0 JAR file and run the core tests again (like in 3.6). We may think about other modules, too, so all modules that must be backwards compatible should be added to this build. I will work on this once we have a release candidate in Maven Central. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2125) Ability to store and retrieve attributes in the inverted index
[ https://issues.apache.org/jira/browse/LUCENE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-2125: --- Fix Version/s: (was: 4.1) 4.2 Ability to store and retrieve attributes in the inverted index -- Key: LUCENE-2125 URL: https://issues.apache.org/jira/browse/LUCENE-2125 Project: Lucene - Core Issue Type: New Feature Components: core/index Affects Versions: 4.0-ALPHA Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: 4.2 Now that we have the cool attribute-based TokenStream API and also the great new flexible indexing features, the next logical step is to allow storing the attributes inline in the posting lists. Currently this is only supported for the PayloadAttribute. The flex search APIs already provide an AttributeSource, so there will be a very clean and performant symmetry. It should be seamlessly possible for the user to define a new attribute, add it to the TokenStream, and then retrieve it from the flex search APIs. What I'm planning to do is to add additional methods to the token attributes (e.g. by adding a new class TokenAttributeImpl, which extends AttributeImpl and is the super class of all impls in o.a.l.a.tokenattributes):
- void serialize(DataOutput)
- void deserialize(DataInput)
- boolean storeInIndex()
The indexer will only call the serialize method of a TokenAttributeImpl in case its storeInIndex() returns true. The big advantage here is the ease-of-use: A user can implement in one place everything necessary to add the attribute to the index. Btw: I'd like to introduce DataOutput and DataInput as super classes of IndexOutput and IndexInput. They will contain methods like readByte(), readVInt(), etc., but methods such as close(), getFilePointer() etc. will stay in the subclasses. Currently the payload concept is hardcoded in TermsHashPerField and FreqProxTermsWriterPerField.
These classes take care of copying the contents of the PayloadAttribute over into the intermediate in-memory postinglist representation and reading it again. Ideally these classes should not know about specific attributes, but only call serialize() on those attributes that shall be stored in the posting list. We also need to change the PositionsEnum and PositionsConsumer APIs to deal with attributes instead of payloads. I think the new codecs should all support storing attributes. Only the preflex one should be hardcoded to only take the PayloadAttribute into account. We'll possibly need another extension point that allows us to influence compression across multiple postings. Today we use the length-compression trick for the payloads: if the previous payload had the same length as the current one, we don't store the length explicitly again, but only set a bit in the shifted position VInt. Since often all payloads of one posting list have the same length, this results in effective compression. Now an advanced user might want to implement a similar encoding, where it's not enough to just control serialization of a single value, but where e.g. the previous position can be taken into account to decide how to encode a value. I'm not sure yet what this extension point should look like. Maybe the flex APIs are actually already sufficient. One major goal of this feature is performance: It ought to be more efficient to e.g. define an attribute that writes and reads a single VInt than storing that VInt as a payload. The payload has the overhead of converting the data into a byte array first. An attribute on the other hand should be able to call 'int value = dataInput.readVInt();' directly without the byte[] indirection. After this part is done I'd like to use a very similar approach for column-stride fields.
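The proposed serialize/deserialize/storeInIndex contract could look roughly like the sketch below. This is purely illustrative: WeightAttribute is a made-up attribute, and java.io's DataOutput/DataInput stand in for the Lucene classes the proposal would introduce, so the example compiles standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the proposed per-attribute (de)serialization contract.
// WeightAttribute is hypothetical; java.io streams stand in for the
// proposed Lucene DataOutput/DataInput.
public class WeightAttributeDemo {
    static class WeightAttribute {
        int weight;
        // The indexer would only serialize attributes that opt in:
        boolean storeInIndex() { return true; }
        void serialize(DataOutput out) throws IOException { out.writeInt(weight); }
        void deserialize(DataInput in) throws IOException { weight = in.readInt(); }
    }

    // Round-trip a value through bytes, as the indexer/codec would.
    static int roundTrip(int value) {
        try {
            WeightAttribute att = new WeightAttribute();
            att.weight = value;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            att.serialize(new DataOutputStream(bytes));
            WeightAttribute read = new WeightAttribute();
            read.deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return read.weight;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

The point of the proposal is that the attribute writes directly to the output stream, avoiding the intermediate byte[] a payload would require.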
[jira] [Updated] (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-1743: --- Fix Version/s: (was: 4.1) 4.2 MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS - Key: LUCENE-1743 URL: https://issues.apache.org/jira/browse/LUCENE-1743 Project: Lucene - Core Issue Type: Improvement Components: core/store Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.2 This is a followup to LUCENE-1741: Javadocs state (in FileChannel#map): "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory." MMapDirectory should get a user-configurable size parameter that is a lower limit for mmapping files. All files with a size < limit should be opened using a conventional IndexInput from SimpleFS or NIO (another configuration option for the fallback?).
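The proposed lower limit amounts to a simple size check at file-open time. A toy sketch with illustrative names (MMapCutoff and chooseInput are not real Lucene APIs; the actual implementation would dispatch between directory implementations rather than return a string):

```java
// Toy sketch of the proposed cutoff: mmap files at or above a configurable
// threshold, fall back to a conventional read for small files.
// Hypothetical names, not the Lucene API.
public class MMapCutoff {
    static String chooseInput(long fileLength, long minMMapSize) {
        return fileLength >= minMMapSize ? "mmap" : "niofs";
    }

    public static void main(String[] args) {
        long threshold = 64 * 1024; // e.g. only mmap files of 64 KB or more
        System.out.println(chooseInput(100, threshold));        // small file: niofs
        System.out.println(chooseInput(10_000_000, threshold)); // large file: mmap
    }
}
```

This matches the FileChannel#map guidance quoted above: the fixed cost of establishing a mapping only pays off for relatively large files.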
[jira] [Commented] (LUCENE-4246) Fix IndexWriter.close() to not commit or wait for pending merges
[ https://issues.apache.org/jira/browse/LUCENE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13552092#comment-13552092 ] Steve Rowe commented on LUCENE-4246: I'd like to push this to 4.2. Any objections? Fix IndexWriter.close() to not commit or wait for pending merges Key: LUCENE-4246 URL: https://issues.apache.org/jira/browse/LUCENE-4246 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-1689) supplementary character handling
[ https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-1689. Resolution: Fixed Fix Version/s: (was: 4.2) (was: 5.0) Resolving. Any remaining problems can be opened as separate issues. supplementary character handling Key: LUCENE-1689 URL: https://issues.apache.org/jira/browse/LUCENE-1689 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Robert Muir Priority: Minor Attachments: LUCENE-1689_lowercase_example.txt, LUCENE-1689.patch, LUCENE-1689.patch, LUCENE-1689.patch, testCurrentBehavior.txt for Java 5. Java 5 is based on unicode 4, which means variable-width encoding. supplementary character support should be fixed for code that works with char/char[] For example: StandardAnalyzer, SimpleAnalyzer, StopAnalyzer, etc should at least be changed so they don't actually remove suppl characters, or modified to look for surrogates and behave correctly. LowercaseFilter should be modified to lowercase suppl. characters correctly. CharTokenizer should either be deprecated or changed so that isTokenChar() and normalize() use int. in all of these cases code should remain optimized for the BMP case, and suppl characters should be the exception, but still work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3380) enable FileSwitchDirectory randomly in tests and fix compound-file/NoSuchDirectoryException bugs
[ https://issues.apache.org/jira/browse/LUCENE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-3380: --- Fix Version/s: (was: 4.1) 4.2 enable FileSwitchDirectory randomly in tests and fix compound-file/NoSuchDirectoryException bugs Key: LUCENE-3380 URL: https://issues.apache.org/jira/browse/LUCENE-3380 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.2 Attachments: LUCENE-3380.patch Looks like FileSwitchDirectory has the same bugs in it as LUCENE-3374. We should randomly enable this guy in tests and flush them all out the same way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-3888: --- Fix Version/s: (was: 4.1) 4.2 split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 4.2 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch The "did you mean?" feature using Lucene's spell checker cannot work well for the Japanese environment, unfortunately, and is a longstanding problem, because the logic needs comparatively long text to check spells, but for some languages (e.g. Japanese), most words are too short to use the spell checker. I think, for at least Japanese, things can be improved if we split off the spell check word and surface form in the spell check dictionary. Then we can use ReadingAttribute for spell checking but CharTermAttribute for suggesting, for example.
[jira] [Updated] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3298: --- Attachment: LUCENE-3298.patch Initial patch with int -> long in lots of places ... the Test2BFST is still running ... FST has hard limit max size of 2.1 GB - Key: LUCENE-3298 URL: https://issues.apache.org/jira/browse/LUCENE-3298 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-3298.patch, LUCENE-3298.patch, LUCENE-3298.patch The FST uses a single contiguous byte[] under the hood, which in java is indexed by int so we cannot grow this over Integer.MAX_VALUE. It also internally encodes references to this array as vInt. We could switch this to a paged byte[] and make the FST far larger. But I think this is low priority... I'm not going to work on it any time soon.
[jira] [Commented] (SOLR-4217) post.jar ignores -Dparams when -Durl is used
[ https://issues.apache.org/jira/browse/SOLR-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552103#comment-13552103 ] Alexandre Rafalovitch commented on SOLR-4217: - Would it be possible to fit this into 4.1? I am trying to use this for an example and it is very clunky with the current workaround: java -Dauto -Durl="http://localhost:8983/solr/multivalued/update?f.to.split=true&f.to.separator=;" -jar post.jar multivalued/multivalued.csv The example should be out after 4.1, but it will not wait until 4.2. The change should be trivial, something like:
{code}
urlStr = System.getProperty("url");
if (urlStr == null) {
  urlStr = SimplePostTool.appendParam(DEFAULT_POST_URL, params);
} else {
  urlStr = SimplePostTool.appendParam(urlStr, params);
}
{code}
I just don't have the environment setup to do a full patch myself yet. post.jar ignores -Dparams when -Durl is used Key: SOLR-4217 URL: https://issues.apache.org/jira/browse/SOLR-4217 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Reporter: Alexandre Rafalovitch Priority: Minor Fix For: 4.2, 5.0 When post.jar is used with a custom URL (e.g. for multi-core), it silently ignores the -Dparams flag and requires parameters to be appended directly to the -Durl value. The problem is the following code:
{code}
String params = System.getProperty("params", "");
urlStr = System.getProperty("url", SimplePostTool.appendParam(DEFAULT_POST_URL, params));
{code}
The workaround exists (by using -Durl="http://customurl?param1=value&param2=value"), but it is both undocumented as a special case and clunky as the URL and params may be coming from different places. It would be good to have this consistent.
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #213: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/213/
1 tests failed.
FAILED: org.apache.solr.cloud.SyncSliceTest.testDistribSearch
Error Message: shard1 should have just been set up to be inconsistent - but it's still consistent
Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be inconsistent - but it's still consistent
	at __randomizedtesting.SeedInfo.seed([400C776269C4BF8E:C1EAF97A1E9BDFB2]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertNotNull(Assert.java:526)
	at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
[jira] [Commented] (LUCENE-4417) Re-Add the backwards compatibility tests to 4.1 branch
[ https://issues.apache.org/jira/browse/LUCENE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552118#comment-13552118 ] Robert Muir commented on LUCENE-4417: - Seems pretty complicated, core tests depend upon things like test-framework and codecs, which have experimental APIs. (look at the 4.0 codebase if you don't believe me, whole experimental codecs have been folded into core functionality and removed, and so on). Even if we were to do this, i don't think it would be maintainable. For example, take issues that will seriously change the codec API like LUCENE-4547. I'd be the first to simply disable the whole thing rather than waste a bunch of time fixing outdated tests and experimental codecs from a previous release. I think it would be more bang for the buck to integrate an API comparison tool (like jdiff or whatever) that shows the breaks so we know what they are.