[jira] [Commented] (SOLR-4165) Queries blocked when stopping and starting a node

2013-01-12 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551878#comment-13551878
 ] 

Markus Jelsma commented on SOLR-4165:
-

Hi Mark, this is for standard stops. On shutdown the cluster can stall very 
briefly, a matter of 1 or 2 seconds at most in our case. On startup the 
problem is more serious.

 Queries blocked when stopping and starting a node
 -

 Key: SOLR-4165
 URL: https://issues.apache.org/jira/browse/SOLR-4165
 Project: Solr
  Issue Type: Bug
  Components: search, SolrCloud
Affects Versions: 5.0
 Environment: 5.0-SNAPSHOT 1366361:1420056M - markus - 2012-12-11 
 11:52:06
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.1, 5.0


 Our 10 node test cluster (10 shards, 20 cores) blocks incoming queries 
 briefly when a node is stopped gracefully and again blocks queries for at 
 least a few seconds when the node is started again.
 We're using siege to send roughly 10 queries per second to a pair of load 
 balancers. Those load balancers ping (admin/ping) each node every few hundred 
 milliseconds. The ping queries continue to operate normally while requests 
 to our main request handler are blocked. A manual request directly to 
 a live Solr node is also blocked for the same duration.
 There are no errors logged, but it is clear that the entire cluster 
 blocks queries as soon as the starting node is reading its config from 
 Zookeeper, likely even slightly earlier.
 The blocking time when stopping a node varies between 1 and 5 seconds. The 
 blocking time when starting a node varies between 10 and 30 seconds. The 
 blocked queries come rushing in again after a queue of ping requests is 
 served. The ping request sets the main request handler via the qt parameter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3735) Relocate the example mime-to-extension mapping

2013-01-12 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-3735:
---

Fix Version/s: 4.1

went ahead and merged this to 4.x (4.1+) in order to minimize diffs (especially 
something minor like this) from trunk to 4x.

 Relocate the example mime-to-extension mapping
 --

 Key: SOLR-3735
 URL: https://issues.apache.org/jira/browse/SOLR-3735
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA, 4.0
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: SOLR-3735.patch


 A mime-to-extension mapping was added to VelocityResponseWriter recently.  
 This really belongs in the templates themselves, not in VrW, as it is 
 specific to the example search results and not meant for all VrW templates.




[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551893#comment-13551893
 ] 

Commit Tag Bot commented on SOLR-3735:
--

[branch_4x commit] Erik Hatcher
http://svn.apache.org/viewvc?view=revision&revision=1432410

SOLR-3735: Relocate the example mime-to-extension mapping (merge from trunk)





[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551911#comment-13551911
 ] 

Commit Tag Bot commented on SOLR-3735:
--

[trunk commit] Erik Hatcher
http://svn.apache.org/viewvc?view=revision&revision=1432411

SOLR-3735: merged to 4x, so adjust CHANGES





[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551929#comment-13551929
 ] 

Robert Muir commented on LUCENE-4678:
-

{quote}
I'll commit only to trunk for now ... and backport to 4.2 once 4.1 branches and 
once this has baked some in trunk ...
{quote}

+1... the copyBytes is frightening though!

What do you think of the FST.BytesReader -> FSTBytesReader change? I'm just thinking 
it causes a lot of API noise (you can see it in the patch).
Unfortunately lots of users have to create this thing to pass to methods on FST 
(e.g. findTargetArc).

So if we kept it as FST.BytesReader they would be largely unaffected?

 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, eg it limits us to < 
 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out 
 this patch separately from changing all int -> long for > 2.1 GB support.
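The paged approach can be sketched as follows (a minimal illustration with hypothetical class and method names, not Lucene's actual BytesStore API): bytes live in fixed-size blocks allocated on demand, so growth never reallocates or copies the whole buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a paged byte store: bytes live in fixed-size
// blocks, so appending never reallocates or copies existing data.
// Names are illustrative, not Lucene's actual BytesStore API.
class PagedBytes {
  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private long length = 0;

  PagedBytes(int blockBits) {
    this.blockSize = 1 << blockBits;
  }

  void writeByte(byte b) {
    int block = (int) (length / blockSize);
    if (block == blocks.size()) {
      blocks.add(new byte[blockSize]);  // grow by one block, no copy
    }
    blocks.get(block)[(int) (length % blockSize)] = b;
    length++;
  }

  byte getByte(long pos) {
    return blocks.get((int) (pos / blockSize))[(int) (pos % blockSize)];
  }

  long length() { return length; }
}
```

This also avoids the RAM spike: the old contiguous scheme briefly holds both the old and the doubled array during growth, while the paged scheme only ever allocates one new block.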




[jira] [Created] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4682:
--

 Summary: Reduce wasted bytes in FST due to array arcs
 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor


When a node is close to the root, or it has many outgoing arcs, the FST writes 
the arcs as an array (each arc gets N bytes), so we can e.g. bin search on 
lookup.

The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
arc e.g. with a big output, you can waste many bytes for all the other arcs 
that didn't need so many bytes.

I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
1535612 = ~18% wasted.

It would be nice to reduce this.

One thing we could do without packing is: in addNode, if we detect that number 
of wasted bytes is above some threshold, then don't do the expansion.

Another thing, if we are packing: we could record stats in the first pass about 
which nodes wasted the most, and then in the second pass (pack) we could set 
the threshold based on the top X% nodes that waste ...

Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
more uniform ...
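The waste for a node is just numArcs * max(bytesPerArc) minus the bytes the arcs actually need, since every arc is padded to the widest arc's size so lookup can binary-search by index. As a small illustration (hypothetical helper, not builder code):

```java
// Sketch: waste of the fixed-array encoding for one node, given the
// serialized size of each outgoing arc. Every arc is padded out to the
// widest arc's size, so an outlier arc inflates all the others.
class ArcWaste {
  static int wastedBytes(int[] bytesPerArc) {
    int max = 0, total = 0;
    for (int b : bytesPerArc) {
      max = Math.max(max, b);
      total += b;
    }
    return bytesPerArc.length * max - total;
  }
}
```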




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551932#comment-13551932
 ] 

Michael McCandless commented on LUCENE-4682:


A couple more ideas:

  * Since the root arc is [usually?] cached ... we [usually] shouldn't make the 
root node into an array?

  * The building process sometimes has freedom in where the outputs are pushed 
... so in theory we could push the outputs forwards if it would mean fewer 
wasted bytes on the prior node ... this would be a tricky optimization problem 
I think.




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551934#comment-13551934
 ] 

Michael McCandless commented on LUCENE-4682:


Maybe we should just tighten up the FST thresholds for when we make an array 
arc:
{noformat}
  /**
   * @see #shouldExpand(UnCompiledNode)
   */
  final static int FIXED_ARRAY_SHALLOW_DISTANCE = 3; // 0 = only root node.

  /**
   * @see #shouldExpand(UnCompiledNode)
   */
  final static int FIXED_ARRAY_NUM_ARCS_SHALLOW = 5;

  /**
   * @see #shouldExpand(UnCompiledNode)
   */
  final static int FIXED_ARRAY_NUM_ARCS_DEEP = 10;
{noformat}

When I print out the waste, it's generally the smaller nodes that have higher 
proportional waste:
{noformat}
 [java] waste: 44 numArcs=16 perArc=2.75
 [java] waste: 20 numArcs=11 perArc=1.8181819
 [java] waste: 13 numArcs=5 perArc=2.6
 [java] waste: 20 numArcs=12 perArc=1.666
 [java] waste: 60 numArcs=20 perArc=3.0
 [java] waste: 0 numArcs=5 perArc=0.0
 [java] waste: 48 numArcs=15 perArc=3.2
 [java] waste: 16 numArcs=5 perArc=3.2
 [java] waste: 20 numArcs=6 perArc=3.333
 [java] waste: 8 numArcs=6 perArc=1.334
 [java] waste: 24 numArcs=8 perArc=3.0
 [java] waste: 32 numArcs=9 perArc=3.556
 [java] waste: 17 numArcs=7 perArc=2.4285715
 [java] waste: 13 numArcs=5 perArc=2.6
 [java] waste: 17 numArcs=6 perArc=2.833
 [java] waste: 28 numArcs=8 perArc=3.5
 [java] waste: 20 numArcs=16 perArc=1.25
 [java] waste: 44 numArcs=15 perArc=2.934
 [java] waste: 28 numArcs=13 perArc=2.1538463
 [java] waste: 28 numArcs=15 perArc=1.867
{noformat}
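For context, those constants drive a depth/arc-count heuristic of roughly this shape (a simplified sketch, not the exact builder code):

```java
// Simplified sketch of the shouldExpand heuristic the constants above
// drive: expand a node's arcs into a fixed array when the node is
// shallow (close to the root) with at least 5 arcs, or at any depth
// with at least 10 arcs. Tightening these would trade some lookup
// speed for less padding waste.
class ExpandHeuristic {
  static final int FIXED_ARRAY_SHALLOW_DISTANCE = 3; // 0 = only root node
  static final int FIXED_ARRAY_NUM_ARCS_SHALLOW = 5;
  static final int FIXED_ARRAY_NUM_ARCS_DEEP = 10;

  static boolean shouldExpand(int depth, int numArcs) {
    return (depth <= FIXED_ARRAY_SHALLOW_DISTANCE
            && numArcs >= FIXED_ARRAY_NUM_ARCS_SHALLOW)
        || numArcs >= FIXED_ARRAY_NUM_ARCS_DEEP;
  }
}
```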




[jira] [Updated] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4682:
---

Attachment: kuromoji.wasted.bytes.txt

Shows the wasted bytes ... one line per node whose arcs were turned into an 
array, sorted by net bytes wasted.




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551938#comment-13551938
 ] 

Robert Muir commented on LUCENE-4682:
-

As an experiment i turned off array arcs for kuromoji in my trunk checkout:

FST
before: [java]   53645 nodes, 253185 arcs, 1535612 bytes...   done
after:  [java]   53645 nodes, 253185 arcs, 1228816 bytes...   done

JAR
before: -rw-rw-r-- 1 rmuir rmuir 4581420 Jan 12 09:56 
lucene-analyzers-kuromoji-4.1-SNAPSHOT.jar
after:  -rw-rw-r-- 1 rmuir rmuir 4306792 Jan 12 09:56 
lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551939#comment-13551939
 ] 

Michael McCandless commented on LUCENE-4682:


The savings are even more than the 271,187 bytes I measured (the FST is 20% 
smaller): I think because the FST is now smaller, we use fewer bytes writing the 
delta-coded node addresses ...




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551940#comment-13551940
 ] 

Robert Muir commented on LUCENE-4682:
-

in the fixedArray case:
{code}
// write a false first arc:
writer.writeByte(ARCS_AS_FIXED_ARRAY);
writer.writeVInt(nodeIn.numArcs);
// placeholder -- we'll come back and write the number
// of bytes per arc (int) here:
// TODO: we could make this a vInt instead
writer.writeInt(0);
fixedArrayStart = writer.getPosition();
{code}

I think we should actually make that TODO line a writeByte.

If it turns out the max arcSize is > 255 i think we should just not encode as 
array arcs (just save our position before we write ARCS_AS_FIXED_ARRAY, rewind 
to that, and encode normally)

This would reduce the overhead of array-arcs, but also maybe prevent some worst 
cases causing waste as a side effect.
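A rough sketch of that fallback (illustrative marker value and encoding, not the real FST writer; the real builder rewinds an output stream rather than pre-measuring the arcs):

```java
import java.io.ByteArrayOutputStream;

// Sketch of the proposed fallback: encode the node as a fixed array
// with a one-byte bytes-per-arc header, but if the widest arc needs
// more than 255 bytes (too big for one byte), skip the array encoding
// entirely and write the arcs back-to-back with no padding.
// Marker value and layout are illustrative, not Lucene's actual format.
class ArrayArcFallback {
  static byte[] encodeNode(byte[][] arcs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int max = 0;
    for (byte[] a : arcs) max = Math.max(max, a.length);
    if (max <= 255) {
      out.write(0xFF);         // ARCS_AS_FIXED_ARRAY marker (illustrative)
      out.write(arcs.length);  // numArcs (assume < 128 for this sketch)
      out.write(max);          // bytes-per-arc as a single byte, not an int
      for (byte[] a : arcs) {
        out.write(a, 0, a.length);
        for (int pad = a.length; pad < max; pad++) out.write(0); // pad to max
      }
    } else {
      // "rewind" case: drop the array layout, encode arcs back-to-back
      for (byte[] a : arcs) out.write(a, 0, a.length);
    }
    return out.toByteArray();
  }
}
```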





[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551941#comment-13551941
 ] 

Michael McCandless commented on LUCENE-4678:


bq. the copyBytes is frightening though!

I know!  But hopefully the random test catches any problems w/ it ... jenkins 
will tell us.

bq. So if we kept it as FST.BytesReader they would be largely unaffected?

+1, I moved back to that ... no more noise ... I'll attach new patch shortly.




[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

New patch, move BytesReader back under FST.  I think it's ready.




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551944#comment-13551944
 ] 

Dawid Weiss commented on LUCENE-4682:
-

bq. Even more than the 271,187 I measured (20% smaller FST), I think because 
the FST is now smaller we use fewer bytes writing the delta-coded node 
addresses 

Yes, these things are all tightly coupled.

Dawid




[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551947#comment-13551947
 ] 

Commit Tag Bot commented on LUCENE-4678:


[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1432459

LUCENE-4678: use paged byte[] under the hood for FST





[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551950#comment-13551950
 ] 

Michael McCandless commented on LUCENE-4682:


Another datapoint: the FreeDB suggester (tool in luceneutil to create/test it) 
is 1.05 GB FST, and has 87.5 MB wasted bytes (~8%).

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt


 When a node is close to the root, or it has many outgoing arcs, the FST 
 writes the arcs as an array (each arc gets N bytes), so we can e.g. bin 
 search on lookup.
 The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
 arc e.g. with a big output, you can waste many bytes for all the other arcs 
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that 
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass 
 about which nodes wasted the most, and then in the second pass (pack) we 
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
 more uniform ...
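To make the waste concrete, here is a tiny Python sketch (illustrative only; the per-arc sizes are invented) of how a fixed-stride arc array pads every slot out to the widest arc:

```python
def array_arc_waste(arc_sizes):
    """Bytes wasted when a node's arcs are written as a fixed-stride array:
    every slot is padded out to N = max(numBytesPerArc)."""
    stride = max(arc_sizes)
    return stride * len(arc_sizes) - sum(arc_sizes)

# One outlier arc with a big output inflates every other slot:
sizes = [3, 3, 3, 3, 3, 3, 3, 12]   # hypothetical per-arc encoded sizes
waste = array_arc_waste(sizes)       # 8 slots * 12 bytes written, 33 needed
```

With these made-up numbers, 63 of 96 bytes are padding, which is the same shape of problem as the ~18% waste measured for Kuromoji's FST.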

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551954#comment-13551954
 ] 

Commit Tag Bot commented on LUCENE-4677:


[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1432466

LUCENE-4677: use vInt not int to encode arc's target address in un-packed FSTs


 Use vInt to encode node addresses inside FST
 

 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4677.patch, LUCENE-4677.patch, LUCENE-4677.patch


 Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make 
 this vInt instead.
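For reference, vInt is a little-endian variable-length encoding that stores 7 payload bits per byte and uses the high bit as a continuation flag, so small addresses cost 1-2 bytes instead of a fixed 4. A small Python sketch of the scheme (for non-negative ints; illustrative, not Lucene's Java code):

```python
def write_vint(value: int) -> bytes:
    """Encode a non-negative int as a variable-length int: 7 payload bits
    per byte, high bit set on every byte except the last."""
    out = bytearray()
    while value & ~0x7F:                 # more than 7 bits remain
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def read_vint(data: bytes) -> int:
    """Decode a vInt produced by write_vint."""
    value, shift = 0, 0
    for b in data:
        value |= (b & 0x7F) << shift
        if not b & 0x80:                 # last byte: continuation bit clear
            break
        shift += 7
    return value
```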

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4287) Maven artifact file names do not match dist/ file names

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551975#comment-13551975
 ] 

Commit Tag Bot commented on SOLR-4287:
--

[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1432483

SOLR-4287: Removed apache- prefix from Solr distribution and artifact 
filenames.


 Maven artifact file names do not match dist/ file names
 ---

 Key: SOLR-4287
 URL: https://issues.apache.org/jira/browse/SOLR-4287
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0
Reporter: Ryan Ernst
Assignee: Steve Rowe
Priority: Blocker
 Fix For: 4.1

 Attachments: SOLR-4287_alternative.patch, SOLR-4287.patch


 For the solr artifact, the war file name has the format solr-X.Y.Z.war.
 http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr%7C4.0.0%7Cwar
 However, when building from source or downloading the dist/ built war file, 
 it is named apache-solr-X.Y.Z.war.  This should really be the same...
 Preferably the apache- could just be removed, since the lucene build does 
 not appear to use the same convention.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4287) Maven artifact file names do not match dist/ file names

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551980#comment-13551980
 ] 

Commit Tag Bot commented on SOLR-4287:
--

[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1432486

SOLR-4287: Removed apache- prefix from Solr distribution and artifact 
filenames. (merged trunk r1432483)


 Maven artifact file names do not match dist/ file names
 ---

 Key: SOLR-4287
 URL: https://issues.apache.org/jira/browse/SOLR-4287
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0
Reporter: Ryan Ernst
Assignee: Steve Rowe
Priority: Blocker
 Fix For: 4.1

 Attachments: SOLR-4287_alternative.patch, SOLR-4287.patch


 For the solr artifact, the war file name has the format solr-X.Y.Z.war.
 http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr%7C4.0.0%7Cwar
 However, when building from source or downloading the dist/ built war file, 
 it is named apache-solr-X.Y.Z.war.  This should really be the same...
 Preferably the apache- could just be removed, since the lucene build does 
 not appear to use the same convention.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: Fix API web link for IndexDeletionPolicy (against...

2013-01-12 Thread arafalov
GitHub user arafalov opened a pull request:

https://github.com/apache/lucene-solr/pull/6

Fix API web link for IndexDeletionPolicy (against solr 4.x branch)

I did it for Lucene 4.0, as I am not sure where 4.1 will live.
In any case, this is better than currently-dead 3.5 link.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arafalov/lucene-solr patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/6.patch


commit 5991153f26cc92c7cc5c95d6a1774eb3050b0643
Author: Alexandre Rafalovitch arafa...@gmail.com
Date:   2013-01-12T17:57:20Z

Fix API web link for IndexDeletionPolicy

I did it for Lucene 4.0, as I am not sure where 4.1 will live.
In any case, this is better than currently-dead 3.5 link.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: Trivial documentation URL fix

2013-01-12 Thread arafalov
Github user arafalov closed the pull request at:

https://github.com/apache/lucene-solr/pull/5


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4287) Maven artifact file names do not match dist/ file names

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-4287.
--

Resolution: Fixed

Committed to trunk and branch_4x.

Thanks Ryan!

 Maven artifact file names do not match dist/ file names
 ---

 Key: SOLR-4287
 URL: https://issues.apache.org/jira/browse/SOLR-4287
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0
Reporter: Ryan Ernst
Assignee: Steve Rowe
Priority: Blocker
 Fix For: 4.1

 Attachments: SOLR-4287_alternative.patch, SOLR-4287.patch


 For the solr artifact, the war file name has the format solr-X.Y.Z.war.
 http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr%7C4.0.0%7Cwar
 However, when building from source or downloading the dist/ built war file, 
 it is named apache-solr-X.Y.Z.war.  This should really be the same...
 Preferably the apache- could just be removed, since the lucene build does 
 not appear to use the same convention.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS-MAVEN] Lucene-Solr-Maven-4.x #212: POMs out of sync

2013-01-12 Thread Steve Rowe
The POMs really are out of sync:

-
-validate-maven-dependencies:
 [licenses] MISSING sha1 checksum file for: 
/home/hudson/.m2/repository/org/apache/velocity/velocity/1.6.4/velocity-1.6.4.jar
 [licenses] Scanned 32 JAR file(s) for licenses (in 0.14s.), 1 error(s).
-

I'll make an adjustment shortly.

(I should also fix the log trimming regex for the Maven Jenkins jobs so that 
this error makes it into future failure emails.)

Steve

On Jan 12, 2013, at 12:06 PM, Apache Jenkins Server jenk...@builds.apache.org 
wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/212/
 
 No tests ran.
 
 Build Log:
 [...truncated 11125 lines...]
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1028) Automatic core loading unloading for multicore

2013-01-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551991#comment-13551991
 ] 

Steve Rowe commented on SOLR-1028:
--

Erick, can this issue be resolved?

 Automatic core loading unloading for multicore
 --

 Key: SOLR-1028
 URL: https://issues.apache.org/jira/browse/SOLR-1028
 Project: Solr
  Issue Type: New Feature
  Components: multicore
Affects Versions: 4.0, 5.0
Reporter: Noble Paul
Assignee: Erick Erickson
 Fix For: 4.1, 5.0

 Attachments: jenkins.jpg, SOLR-1028.patch, SOLR-1028.patch, 
 SOLR-1028_testnoise.patch


 usecase: I have many small cores (say one per user) on a single Solr box. 
 Not all of the cores are always needed, but when I need one I should be able 
 to directly issue a search request and the core must be STARTED automatically 
 and the request must be served.
 This also requires that I must have an upper limit on the number of cores that 
 should be loaded at any given point in time. If the limit is crossed, the 
 CoreContainer must unload a core (preferably the least recently used core). 
 There must be a choice of specifying some cores as fixed; these cores must 
 never be unloaded.
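The behaviour described above amounts to an LRU cache with pinned entries. A minimal Python sketch of the idea (the CoreCache name, loader callback, and pinned set are hypothetical, not Solr's actual API):

```python
from collections import OrderedDict

class CoreCache:
    """Bounded LRU cache of loaded cores. Cores named in `pinned`
    ("fixed" cores) are never evicted."""

    def __init__(self, limit, loader, pinned=()):
        self.limit = limit          # max cores loaded at once
        self.loader = loader        # callback that starts a core by name
        self.pinned = set(pinned)
        self.cores = OrderedDict()  # insertion order = LRU order

    def get(self, name):
        if name in self.cores:
            self.cores.move_to_end(name)   # mark as most recently used
            return self.cores[name]
        core = self.loader(name)           # transparently start the core
        self.cores[name] = core
        self._evict()
        return core

    def _evict(self):
        while len(self.cores) > self.limit:
            for name in self.cores:        # oldest (least recently used) first
                if name not in self.pinned:
                    del self.cores[name]   # unload the LRU non-pinned core
                    break
            else:
                break                      # everything remaining is pinned
```

A real implementation would also need to close cores safely while requests are in flight, which is the hard part of this issue.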

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4299) Failed with java.net.BindException Address already in use

2013-01-12 Thread Nithin Chacko Ninan (JIRA)
Nithin Chacko Ninan created SOLR-4299:
-

 Summary: Failed with java.net.BindException Address already in use
 Key: SOLR-4299
 URL: https://issues.apache.org/jira/browse/SOLR-4299
 Project: Solr
  Issue Type: Bug
Reporter: Nithin Chacko Ninan


Hello Team,

We have configured Magento Solr search on our staging instance. While testing, we 
noticed that Solr is not working as expected. We searched the Solr configuration 
and used java -jar start.jar to check the port status, and we noticed the 
above-mentioned issue (i.e. it failed with java.net.BindException: Address 
already in use).
Any comment or help will be appreciated.

thanks!
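The error itself just means another process (often a previous Solr/Jetty instance that never exited) is already bound to port 8983. A quick illustrative check in Python, run on the same machine, to see whether a port is already taken before starting Solr:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port --
    the condition that makes Jetty fail with java.net.BindException."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0
```

If this returns True for 8983, find and stop the old process (e.g. via lsof -i :8983 or netstat) instead of starting a second instance.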

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2305) Introduce Version in more places long before 4.0

2013-01-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2305.


   Resolution: Won't Fix
Fix Version/s: (was: 4.2)
   (was: 5.0)

4.0 came out long ago :).
And I don't think we need this issue if we want to add Version to more places.

 Introduce Version in more places long before 4.0
 

 Key: LUCENE-2305
 URL: https://issues.apache.org/jira/browse/LUCENE-2305
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Shai Erera

 We need to introduce Version in as many places as we can (wherever it makes 
 sense of course), and preferably long before 4.0 (or shall I say 3.9?) is 
 out. That way, we can have a bunch of deprecated API now, that will be gone 
 in 4.0, rather than doing it one class at a time and never finish :).
 The purpose is to introduce Version wherever it is mandatory now, and also in 
 places where we think it might be useful in the future (like most of our 
 Analyzers, configured classes and configuration classes).
 I marked this issue for 3.1, though I don't expect it to end in 3.1. I still 
 think it will be done one step at a time, perhaps for cluster of classes 
 together. But on the other hand I don't want to mark it for 4.0.0 because 
 that needs to be resolved much sooner. So if I had a 3.9 version defined, I'd 
 mark it for 3.9. We can do several commits in one issue right? So this one 
 can live for a while in JIRA, while we gradually convert more and more 
 classes.
 The first candidate is InstantiatedIndexWriter which probably should take an 
 IndexWriterConfig. While I converted the code to use IWC, I've noticed 
 Instantiated defaults its maxFieldLength to the current default (10,000) 
 which is deprecated. I couldn't change it for back-compat reasons. But we can 
 upgrade it to accept IWC, and set to unlimited if the version is onOrAfter 
 3.1, otherwise stay w/ the deprecated default.
 if it's acceptable to have several commits in one issue, I can start w/ 
 Instantiated, post a patch and then we can continue to more classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4299) Failed with java.net.BindException Address already in use

2013-01-12 Thread Nithin Chacko Ninan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nithin Chacko Ninan updated SOLR-4299:
--

Description: 
Hello Team,

We have configured Magento Solr search on our staging instance. While testing, we 
noticed that Solr is not working as expected. We searched the Solr configuration 
and used java -jar start.jar to check the port status, and we noticed the 
below-mentioned issue (i.e. it failed with java.net.BindException: Address 
already in use).
Any comment or help will be appreciated.



INFO: [] Registered new searcher Searcher@668db25b main
2013-01-12 19:36:50.223:WARN::failed SocketConnector@0.0.0.0:8983: 
java.net.BindException: Address already in use
2013-01-12 19:36:50.223:WARN::failed Server@7ca7700a: java.net.BindException: 
Address already in use
2013-01-12 19:36:50.223:WARN::EXCEPTION 
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at 
java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
at java.net.ServerSocket.bind(ServerSocket.java:376)
at java.net.ServerSocket.<init>(ServerSocket.java:237)
at java.net.ServerSocket.<init>(ServerSocket.java:181)
at 
org.mortbay.jetty.bio.SocketConnector.newServerSocket(SocketConnector.java:80)
at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
at 
org.mortbay.jetty.AbstractConnector.doStart(AbstractConnector.java:283)
at 
org.mortbay.jetty.bio.SocketConnector.doStart(SocketConnector.java:147)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.Server.doStart(Server.java:235)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
thanks!

  was:
Hello Team,

We have configured magetno solr search on our stage instance.While testing, we 
noticed that solr is not working as expected.we searched on solr confgiuration 
and we used java -jar start.jar to check the port status. we noticed the 
above mentioned issue (ie failed with java.net.BindException Address already in 
use).
Any comment or help will be appriciated.

thanks!


 Failed with java.net.BindException Address already in use
 -

 Key: SOLR-4299
 URL: https://issues.apache.org/jira/browse/SOLR-4299
 Project: Solr
  Issue Type: Bug
Reporter: Nithin Chacko Ninan

 Hello Team,
 We have configured Magento Solr search on our staging instance. While testing, 
 we noticed that Solr is not working as expected. We searched the Solr 
 configuration and used java -jar start.jar to check the port status, and we 
 noticed the below-mentioned issue (i.e. it failed with java.net.BindException: 
 Address already in use).
 Any comment or help will be appreciated.
 INFO: [] Registered new searcher Searcher@668db25b main
 2013-01-12 19:36:50.223:WARN::failed SocketConnector@0.0.0.0:8983: 
 java.net.BindException: Address already in use
 2013-01-12 19:36:50.223:WARN::failed Server@7ca7700a: java.net.BindException: 
 Address already in use
 2013-01-12 19:36:50.223:WARN::EXCEPTION 
 java.net.BindException: Address already in use
   at java.net.PlainSocketImpl.socketBind(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
   at java.net.ServerSocket.bind(ServerSocket.java:376)
   at java.net.ServerSocket.<init>(ServerSocket.java:237)
   at java.net.ServerSocket.<init>(ServerSocket.java:181)
   at 
 org.mortbay.jetty.bio.SocketConnector.newServerSocket(SocketConnector.java:80)
   at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
   at 
 org.mortbay.jetty.AbstractConnector.doStart(AbstractConnector.java:283)
   at 
 org.mortbay.jetty.bio.SocketConnector.doStart(SocketConnector.java:147)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.Server.doStart(Server.java:235)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

[jira] [Resolved] (SOLR-4299) Failed with java.net.BindException Address already in use

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-4299.
--

Resolution: Invalid
  Assignee: Steve Rowe

Please post questions about using Solr to the solr-user mailing list, rather 
than creating JIRA issues - see 
[http://lucene.apache.org/solr/discussion.html].

You might like the following, which I found by searching the interweb:

* http://stackoverflow.com/questions/6645253/solr-configuration
* 
http://javarevisited.blogspot.com/2011/12/address-already-use-jvm-bind-exception.html

 Failed with java.net.BindException Address already in use
 -

 Key: SOLR-4299
 URL: https://issues.apache.org/jira/browse/SOLR-4299
 Project: Solr
  Issue Type: Bug
Reporter: Nithin Chacko Ninan
Assignee: Steve Rowe

 Hello Team,
 We have configured Magento Solr search on our staging instance. While testing, 
 we noticed that Solr is not working as expected. We searched the Solr 
 configuration and used java -jar start.jar to check the port status, and we 
 noticed the below-mentioned issue (i.e. it failed with java.net.BindException: 
 Address already in use).
 Any comment or help will be appreciated.
 INFO: [] Registered new searcher Searcher@668db25b main
 2013-01-12 19:36:50.223:WARN::failed SocketConnector@0.0.0.0:8983: 
 java.net.BindException: Address already in use
 2013-01-12 19:36:50.223:WARN::failed Server@7ca7700a: java.net.BindException: 
 Address already in use
 2013-01-12 19:36:50.223:WARN::EXCEPTION 
 java.net.BindException: Address already in use
   at java.net.PlainSocketImpl.socketBind(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
   at java.net.ServerSocket.bind(ServerSocket.java:376)
   at java.net.ServerSocket.<init>(ServerSocket.java:237)
   at java.net.ServerSocket.<init>(ServerSocket.java:181)
   at 
 org.mortbay.jetty.bio.SocketConnector.newServerSocket(SocketConnector.java:80)
   at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
   at 
 org.mortbay.jetty.AbstractConnector.doStart(AbstractConnector.java:283)
   at 
 org.mortbay.jetty.bio.SocketConnector.doStart(SocketConnector.java:147)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.Server.doStart(Server.java:235)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.mortbay.start.Main.invokeMain(Main.java:194)
   at org.mortbay.start.Main.start(Main.java:534)
   at org.mortbay.start.Main.start(Main.java:441)
   at org.mortbay.start.Main.main(Main.java:119)
 thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3735) Relocate the example mime-to-extension mapping

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552040#comment-13552040
 ] 

Commit Tag Bot commented on SOLR-3735:
--

[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1432501

SOLR-3735: Maven configuration: upgrade velocity dependency from 1.6.4 to 1.7


 Relocate the example mime-to-extension mapping
 --

 Key: SOLR-3735
 URL: https://issues.apache.org/jira/browse/SOLR-3735
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA, 4.0
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: SOLR-3735.patch


 A mime-to-extension mapping was added to VelocityResponseWriter recently.  
 This really belongs in the templates themselves, not in VrW, as it is 
 specific to the example search results not meant for all VrW templates.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4682:


Attachment: LUCENE-4682.patch

Mike can you try this patch on your corpus?

It cuts us over to vint for the maxBytesPerArc (saving 3 bytes for the unpacked 
case), and adds an acceptable overhead for array arcs (currently 1.25).

For the kuromoji packed case, this seems to solve the waste:

 [java]   53645 nodes, 253185 arcs, 1309077 bytes...   done
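The acceptable-overhead idea can be sketched as a simple predicate (Python, illustrative only; 1.25 matches the factor mentioned above, but the real patch's logic differs in detail):

```python
def should_expand(arc_sizes, max_overhead=1.25):
    """Only lay a node's arcs out as a fixed-stride array (for binary
    search) if padding every slot to the widest arc keeps the total
    within an acceptable factor of the tightly packed size."""
    packed = sum(arc_sizes)
    expanded = max(arc_sizes) * len(arc_sizes)
    return expanded <= packed * max_overhead
```

Nodes with one outlier arc fall back to the packed layout, trading O(log n) lookup for the space savings.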


 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch


 When a node is close to the root, or it has many outgoing arcs, the FST 
 writes the arcs as an array (each arc gets N bytes), so we can e.g. bin 
 search on lookup.
 The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
 arc e.g. with a big output, you can waste many bytes for all the other arcs 
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that 
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass 
 about which nodes wasted the most, and then in the second pass (pack) we 
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
 more uniform ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552057#comment-13552057
 ] 

Michael McCandless commented on LUCENE-4682:


+1

This is much cleaner (writing the header at the end).

I built the AnalyzingSuggester for FreeDB: trunk is 1.046 GB and with patch 
it's 0.917 GB = ~9% smaller!

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch


 When a node is close to the root, or it has many outgoing arcs, the FST 
 writes the arcs as an array (each arc gets N bytes), so we can e.g. bin 
 search on lookup.
 The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
 arc e.g. with a big output, you can waste many bytes for all the other arcs 
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that 
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass 
 about which nodes wasted the most, and then in the second pass (pack) we 
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
 more uniform ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552061#comment-13552061
 ] 

Robert Muir commented on LUCENE-4682:
-

I can cleanup+commit the patch with the heuristic commented out (so we still 
get the cutover to vint, which i think is an obvious win?)

This way we can benchmark and make sure the heuristic is set 
appropriately/doesnt hurt performance?

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch


 When a node is close to the root, or it has many outgoing arcs, the FST 
 writes the arcs as an array (each arc gets N bytes), so we can e.g. bin 
 search on lookup.
 The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
 arc e.g. with a big output, you can waste many bytes for all the other arcs 
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that 
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass 
 about which nodes wasted the most, and then in the second pass (pack) we 
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
 more uniform ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552063#comment-13552063
 ] 

Michael McCandless commented on LUCENE-4682:


+1

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch


 When a node is close to the root, or it has many outgoing arcs, the FST 
 writes the arcs as an array (each arc gets N bytes), so we can e.g. bin 
 search on lookup.
 The problem is N is set to the max(numBytesPerArc), so if you have an outlier 
 arc e.g. with a big output, you can waste many bytes for all the other arcs 
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size 
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that 
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass 
 about which nodes wasted the most, and then in the second pass (pack) we 
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is 
 more uniform ...




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552065#comment-13552065
 ] 

Dawid Weiss commented on LUCENE-4682:
-

+1. Nice.


 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch






[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552067#comment-13552067
 ] 

Uwe Schindler commented on LUCENE-4682:
---

+1

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch






[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552071#comment-13552071
 ] 

Robert Muir commented on LUCENE-4682:
-

ok i committed the vInt for maxBytesPerArc, but left out the heuristic (so we 
still have the waste!!!)

Here's the comment i added:
{code}
// TODO: try to avoid wasteful cases: disable doFixedArray in that case
/* 
 * 
 * LUCENE-4682: what is a fair heuristic here?
 * It could involve some of these:
 * 1. how busy the node is: nodeIn.inputCount relative to frontier[0].inputCount?
 * 2. how much binSearch saves over scan: nodeIn.numArcs
 * 3. waste: numBytes vs numBytesExpanded
 * 
 * the one below just looks at #3
if (doFixedArray) {
  // rough heuristic: make this 1.25 waste factor a parameter to the phd ctor
  int numBytes = lastArcStart - startAddress;
  int numBytesExpanded = maxBytesPerArc * nodeIn.numArcs;
  if (numBytesExpanded > numBytes * 1.25) {
    doFixedArray = false;
  }
}
*/
{code}

I think it would just be best to do some performance benchmarks and figure this 
out.
I know all the kuromoji waste is at node.depth=1 exactly.

Also I indexed all of geonames with this heuristic and it barely changed the 
FST size:

trunk
FST: 45296685
packedFST: 39083451
vint maxBytesPerArc:
FST: 45052386
packedFST: 39083451
vint maxBytesPerArc+heuristic:
FST: 44988400
packedFST: 39029108

So the waste and the heuristic don't affect all FSTs, only certain ones.
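
The commented-out check amounts to a simple size comparison. A hedged sketch of that decision follows; the names and the 1.25 waste factor come from the comment above, but this is a standalone illustration, not the committed code:

```java
// Sketch of the disabled heuristic: skip the fixed-array expansion when the
// padded layout would exceed the packed layout by more than a waste factor.
public class ArrayArcHeuristic {
    static boolean useFixedArray(int numBytes, int maxBytesPerArc, int numArcs,
                                 double wasteFactor) {
        int numBytesExpanded = maxBytesPerArc * numArcs;
        return numBytesExpanded <= numBytes * wasteFactor;
    }

    public static void main(String[] args) {
        // outlier arc: 200 padded bytes vs 38 packed -> too wasteful, scan instead
        System.out.println(ArrayArcHeuristic.useFixedArray(38, 20, 10, 1.25)); // false
        // near-uniform arcs: 40 padded vs 38 packed -> acceptable, keep bin search
        System.out.println(ArrayArcHeuristic.useFixedArray(38, 4, 10, 1.25));  // true
    }
}
```

Passing a waste factor of 0 would disable array arcs entirely, which matches the later suggestion of replacing the boolean allowArrayArcs with a float parameter.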


 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch






[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552072#comment-13552072
 ] 

Commit Tag Bot commented on LUCENE-4682:


[trunk commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1432522

LUCENE-4682: vInt-encode maxBytesPerArc


 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch






[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

Patch, fixing FST.pack to not double-buffer again, using the new 
BytesStore.truncate method to roll back the last N bytes ...

 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch, 
 LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, eg it limits us to 
 < 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out 
 this patch separately from changing all int -> long for > 2.1 GB support.
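
The paged approach can be sketched as block/offset addressing: a position maps to a block via a shift and to an offset within that block via a mask. This is a minimal illustration under that idea, not the actual BytesStore implementation:

```java
// Minimal sketch of paged byte storage (not Lucene's BytesStore): the store
// grows by appending fixed-size blocks, so no giant contiguous byte[] is
// ever reallocated and copied.
import java.util.ArrayList;
import java.util.List;

public class PagedBytesSketch {
    private final int blockBits, blockSize, blockMask;
    private final List<byte[]> blocks = new ArrayList<>();
    private int upto; // write offset within the current (last) block

    PagedBytesSketch(int blockBits) {
        this.blockBits = blockBits;
        this.blockSize = 1 << blockBits;
        this.blockMask = blockSize - 1;
        blocks.add(new byte[blockSize]);
    }

    void writeByte(byte b) {
        if (upto == blockSize) { // current block full: append a new one
            blocks.add(new byte[blockSize]);
            upto = 0;
        }
        blocks.get(blocks.size() - 1)[upto++] = b;
    }

    byte readByte(long pos) {
        // high bits select the block, low bits the offset inside it
        return blocks.get((int) (pos >> blockBits))[(int) (pos & blockMask)];
    }

    public static void main(String[] args) {
        PagedBytesSketch store = new PagedBytesSketch(2); // tiny 4-byte blocks for illustration
        for (int i = 0; i < 10; i++) store.writeByte((byte) i);
        System.out.println(store.readByte(7)); // 7
    }
}
```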




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552075#comment-13552075
 ] 

Robert Muir commented on LUCENE-4682:
-

Another simple idea: instead of boolean allowArrayArcs we just make this a 
float: acceptableArrayArcOverhead (or maybe a better name).

you would pass 0 to disable array arcs completely (and we'd internally still 
have our boolean allowArrayArcs and not waste 
time computing stuff if this is actually <= 0)

 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch






[jira] [Updated] (LUCENE-4417) Re-Add the backwards compatibility tests to 4.1 branch

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4417:
---

Priority: Blocker  (was: Major)

We shouldn't release 4.1 until at least lucene-core backwards tests are 
re-enabled.

 Re-Add the backwards compatibility tests to 4.1 branch
 --

 Key: LUCENE-4417
 URL: https://issues.apache.org/jira/browse/LUCENE-4417
 Project: Lucene - Core
  Issue Type: Task
  Components: general/test
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.1


 In 4.0 we have no backwards compatibility, but in 4.1 we must again 
 ivy-retrieve the 4.0 JAR file and run the core tests again (like in 3.6). We 
 may think about other modules, too, so all modules that must be backwards 
 compatible should be added to this build.
 I will work on this once we have a release candidate in Maven Central.




[jira] [Updated] (LUCENE-2125) Ability to store and retrieve attributes in the inverted index

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-2125:
---

Fix Version/s: (was: 4.1)
   4.2

 Ability to store and retrieve attributes in the inverted index
 --

 Key: LUCENE-2125
 URL: https://issues.apache.org/jira/browse/LUCENE-2125
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Affects Versions: 4.0-ALPHA
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 4.2


 Now that we have the cool attribute-based TokenStream API and also the
 great new flexible indexing features, the next logical step is to
 allow storing the attributes inline in the posting lists. Currently
 this is only supported for the PayloadAttribute.
 The flex search APIs already provide an AttributeSource, so there will
 be a very clean and performant symmetry. It should be seamlessly
 possible for the user to define a new attribute, add it to the
 TokenStream, and then retrieve it from the flex search APIs.
 What I'm planning to do is to add additional methods to the token
 attributes (e.g. by adding a new class TokenAttributeImpl, which
 extends AttributeImpl and is the super class of all impls in
 o.a.l.a.tokenattributes):
 - void serialize(DataOutput)
 - void deserialize(DataInput)
 - boolean storeInIndex()
 The indexer will only call the serialize method of a
 TokenAttributeImpl in case its storeInIndex() returns true. 
 The big advantage here is the ease-of-use: A user can implement in one
 place everything necessary to add the attribute to the index.
 Btw: I'd like to introduce DataOutput and DataInput as super classes
 of IndexOutput and IndexInput. They will contain methods like
 readByte(), readVInt(), etc., but methods such as close(),
 getFilePointer() etc. will stay in the super classes.
 Currently the payload concept is hardcoded in 
 TermsHashPerField and FreqProxTermsWriterPerField. These classes take
 care of copying the contents of the PayloadAttribute over into the 
 intermediate in-memory postinglist representation and reading it
 again. Ideally these classes should not know about specific
 attributes, but only call serialize() on those attributes that shall 
 be stored in the posting list.
 We also need to change the PositionsEnum and PositionsConsumer APIs to
 deal with attributes instead of payloads.
 I think the new codecs should all support storing attributes. Only the
 preflex one should be hardcoded to only take the PayloadAttribute into
 account.
 We'll possibly need another extension point that allows us to influence 
 compression across multiple postings. Today we use the
 length-compression trick for the payloads: if the previous payload had
 the same length as the current one, we don't store the length
 explicitly again, but only set a bit in the shifted position VInt. Since
 often all payloads of one posting list have the same length, this
 results in effective compression.
 Now an advanced user might want to implement a similar encoding, where
 it's not enough to just control serialization of a single value, but
 where e.g. the previous position can be taken into account to decide
 how to encode a value. 
 I'm not sure yet how this extension point should look like. Maybe the
 flex APIs are actually already sufficient.
 One major goal of this feature is performance: It ought to be more 
 efficient to e.g. define an attribute that writes and reads a single 
 VInt than storing that VInt as a payload. The payload has the overhead
 of converting the data into a byte array first. An attribute on the other 
 hand should be able to call 'int value = dataInput.readVInt();' directly
 without the byte[] indirection.
 After this part is done I'd like to use a very similar approach for
 column-stride fields.
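
The proposed API shape can be sketched roughly as below. This is a hedged illustration only: the proposal's DataOutput/DataInput are Lucene classes still to be introduced, so the sketch substitutes java.io's interfaces, and IntAttribute is a hypothetical example attribute:

```java
// Hedged sketch of the proposed attribute-serialization API (names follow
// the proposal; java.io.DataOutput/DataInput stand in for the Lucene classes
// the proposal would introduce).
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class AttrSketch {
    interface IndexableAttribute {
        void serialize(DataOutput out) throws IOException;
        void deserialize(DataInput in) throws IOException;
        boolean storeInIndex(); // indexer serializes only when this is true
    }

    // hypothetical attribute carrying a single int value
    static class IntAttribute implements IndexableAttribute {
        int value;
        public void serialize(DataOutput out) throws IOException { out.writeInt(value); }
        public void deserialize(DataInput in) throws IOException { value = in.readInt(); }
        public boolean storeInIndex() { return true; }
    }

    public static void main(String[] args) throws IOException {
        IntAttribute a = new IntAttribute();
        a.value = 42;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        a.serialize(new DataOutputStream(bytes)); // what the indexer would do
        IntAttribute b = new IntAttribute();
        b.deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(b.value); // round-trips back to 42
    }
}
```

The point of the shape is the one stated above: the attribute reads and writes its value directly, with no intermediate byte[] payload conversion.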




[jira] [Updated] (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-1743:
---

Fix Version/s: (was: 4.1)
   4.2

 MMapDirectory should only mmap large files, small files should be opened 
 using SimpleFS/NIOFS
 -

 Key: LUCENE-1743
 URL: https://issues.apache.org/jira/browse/LUCENE-1743
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.2


 This is a followup to LUCENE-1741:
 Javadocs state (in FileChannel#map): For most operating systems, mapping a 
 file into memory is more expensive than reading or writing a few tens of 
 kilobytes of data via the usual read and write methods. From the standpoint 
 of performance it is generally only worth mapping relatively large files into 
 memory.
 MMapDirectory should get a user-configurable size parameter that is a lower 
 limit for mmapping files. All files with a size < limit should be opened using 
 a conventional IndexInput from SimpleFS or NIO (another configuration option 
 for the fallback?).
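
The decision itself is just a size cutoff. A sketch of that switch follows; the threshold value and the class names returned are placeholders for illustration, not the eventual configuration API:

```java
// Sketch of the proposed size cutoff: small files take the conventional read
// path, large files get memory-mapped. Names and threshold are hypothetical.
public class SwitchBySize {
    static String chooseInput(long fileSize, long mmapLimit) {
        // mapping small files costs more than it saves (per FileChannel#map docs)
        return fileSize < mmapLimit ? "NIOFSIndexInput" : "MMapIndexInput";
    }

    public static void main(String[] args) {
        long limit = 1 << 20; // 1 MB threshold, made-up value
        System.out.println(SwitchBySize.chooseInput(4096, limit));      // NIOFSIndexInput
        System.out.println(SwitchBySize.chooseInput(64L << 20, limit)); // MMapIndexInput
    }
}
```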




[jira] [Commented] (LUCENE-4246) Fix IndexWriter.close() to not commit or wait for pending merges

2013-01-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552092#comment-13552092
 ] 

Steve Rowe commented on LUCENE-4246:


I'd like to push this to 4.2.  Any objections?

 Fix IndexWriter.close() to not commit or wait for pending merges
 

 Key: LUCENE-4246
 URL: https://issues.apache.org/jira/browse/LUCENE-4246
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.1







[jira] [Resolved] (LUCENE-1689) supplementary character handling

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-1689.


   Resolution: Fixed
Fix Version/s: (was: 4.2)
   (was: 5.0)

Resolving.  Any remaining problems can be opened as separate issues.

 supplementary character handling
 

 Key: LUCENE-1689
 URL: https://issues.apache.org/jira/browse/LUCENE-1689
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1689_lowercase_example.txt, LUCENE-1689.patch, 
 LUCENE-1689.patch, LUCENE-1689.patch, testCurrentBehavior.txt


 for Java 5. Java 5 is based on unicode 4, which means variable-width encoding.
 supplementary character support should be fixed for code that works with 
 char/char[]
 For example:
 StandardAnalyzer, SimpleAnalyzer, StopAnalyzer, etc should at least be 
 changed so they don't actually remove suppl characters, or modified to look 
 for surrogates and behave correctly.
 LowercaseFilter should be modified to lowercase suppl. characters correctly.
 CharTokenizer should either be deprecated or changed so that isTokenChar() 
 and normalize() use int.
 in all of these cases code should remain optimized for the BMP case, and 
 suppl characters should be the exception, but still work.
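
The int-based handling the issue asks for boils down to iterating code points instead of chars. A small self-contained illustration (standard java.lang.Character API, not Lucene analyzer code):

```java
// Sketch: iterating code points instead of chars, so a supplementary
// character (outside the BMP, stored as a surrogate pair) is seen as one
// unit -- the behavior an int-based isTokenChar()/normalize() enables.
public class CodePointsDemo {
    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb"; // 'a', MUSICAL SYMBOL G CLEF (U+1D11E), 'b'
        System.out.println(s.length());                      // 4 chars (clef is a surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // full code point, not a lone surrogate
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp);    // advance 1 or 2 chars
        }
    }
}
```

A char-at-a-time loop over the same string would see the two halves of the surrogate pair separately, which is exactly how a filter ends up dropping or mangling supplementary characters.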




[jira] [Updated] (LUCENE-3380) enable FileSwitchDirectory randomly in tests and fix compound-file/NoSuchDirectoryException bugs

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-3380:
---

Fix Version/s: (was: 4.1)
   4.2

 enable FileSwitchDirectory randomly in tests and fix 
 compound-file/NoSuchDirectoryException bugs
 

 Key: LUCENE-3380
 URL: https://issues.apache.org/jira/browse/LUCENE-3380
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.2

 Attachments: LUCENE-3380.patch


 Looks like FileSwitchDirectory has the same bugs in it as LUCENE-3374.
 We should randomly enable this guy in tests and flush them all out the same 
 way.




[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary

2013-01-12 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-3888:
---

Fix Version/s: (was: 4.1)
   4.2

 split off the spell check word and surface form in spell check dictionary
 -

 Key: LUCENE-3888
 URL: https://issues.apache.org/jira/browse/LUCENE-3888
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 4.2

 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, 
 LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch


 The "did you mean?" feature by using Lucene's spell checker cannot work well 
 for Japanese environment unfortunately and is the longstanding problem, 
 because the logic needs comparatively long text to check spells, but for some 
 languages (e.g. Japanese), most words are too short to use the spell checker.
 I think, for at least Japanese, the things can be improved if we split off 
 the spell check word and surface form in the spell check dictionary. Then we 
 can use ReadingAttribute for spell checking but CharTermAttribute for 
 suggesting, for example.




[jira] [Updated] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3298:
---

Attachment: LUCENE-3298.patch

Initial patch with int -> long in lots of places ... the Test2BFST is still 
running ...

 FST has hard limit max size of 2.1 GB
 -

 Key: LUCENE-3298
 URL: https://issues.apache.org/jira/browse/LUCENE-3298
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3298.patch, LUCENE-3298.patch, LUCENE-3298.patch


 The FST uses a single contiguous byte[] under the hood, which in java is 
 indexed by int so we cannot grow this over Integer.MAX_VALUE.  It also 
 internally encodes references to this array as vInt.
 We could switch this to a paged byte[] and make the max size far larger.
 But I think this is low priority... I'm not going to work on it any time soon.




[jira] [Commented] (SOLR-4217) post.jar ignores -Dparams when -Durl is used

2013-01-12 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552103#comment-13552103
 ] 

Alexandre Rafalovitch commented on SOLR-4217:
-

Would it be possible to fit this into 4.1? I am trying to use this for an 
example and it is very clunky with the current workaround:
java -Dauto 
-Durl="http://localhost:8983/solr/multivalued/update?f.to.split=true&f.to.separator=;" 
 -jar post.jar multivalued/multivalued.csv 

The example should be out after 4.1, but it will not wait until 4.2

The change should be trivial, something like:
-
urlStr = System.getProperty("url");
if (urlStr == null)
{
  urlStr = SimplePostTool.appendParam(DEFAULT_POST_URL, params);
}
else
{
  urlStr = SimplePostTool.appendParam(urlStr, params);
}
-

I just don't have the environment setup to do full patch myself yet.

 post.jar ignores -Dparams when -Durl is used
 

 Key: SOLR-4217
 URL: https://issues.apache.org/jira/browse/SOLR-4217
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
Reporter: Alexandre Rafalovitch
Priority: Minor
 Fix For: 4.2, 5.0


 When post.jar is used with a custom URL (e.g. for multi-core), it silently 
 ignores -Dparams flag and requires parameters to be appended directly to 
 -Durl value.
 The problem is the following code:
 String params = System.getProperty("params", "");
 urlStr = System.getProperty("url", 
 SimplePostTool.appendParam(DEFAULT_POST_URL, params));
 The workaround exists (by using 
 -Durl="http://customurl?param1=value&param2=value"), but it is both 
 undocumented as a special case and clunky as Url and params may be coming 
 from different places. It would be good to have this consistent.




[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #213: POMs out of sync

2013-01-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/213/

1 tests failed.
FAILED:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 should have just been set up to be inconsistent - but it's still consistent

Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be inconsistent - but it's still consistent
    at __randomizedtesting.SeedInfo.seed([400C776269C4BF8E:C1EAF97A1E9BDFB2]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertNotNull(Assert.java:526)
    at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
    at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)

[jira] [Commented] (LUCENE-4417) Re-Add the backwards compatibility tests to 4.1 branch

2013-01-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552118#comment-13552118
 ] 

Robert Muir commented on LUCENE-4417:
-

This seems pretty complicated: the core tests depend on things like test-framework 
and codecs, which have experimental APIs. (Look at the 4.0 codebase if you don't 
believe me; whole experimental codecs have been folded into core functionality and 
removed, and so on.)

Even if we were to do this, I don't think it would be maintainable. For example, 
take issues that will seriously change the codec API, like LUCENE-4547.

I'd be the first to simply disable the whole thing rather than waste a bunch of 
time fixing outdated tests and experimental codecs from a previous release.

I think it would be more bang for the buck to integrate an API comparison tool 
(like jdiff or whatever) that shows the breaks, so we know what they are.
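
An invocation of such a comparison tool might look roughly like the following. 
This is only a sketch of the idea, not part of the thread's proposal: the tool 
(japicmp, a jdiff-style comparator), the jar file names, and the flags are all 
illustrative assumptions to be checked against the tool's own documentation.

```
# Hypothetical: diff the public API of two release jars and report only the
# binary-incompatible changes. Jar names and flags are illustrative.
java -jar japicmp.jar \
  --old lucene-core-4.0.0.jar \
  --new lucene-core-4.1.0.jar \
  --only-incompatible
```

The point is the workflow, not the particular tool: run it between two release 
artifacts and the breaks are enumerated mechanically instead of being discovered 
by maintaining a frozen copy of the old test suite.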


 Re-Add the backwards compatibility tests to 4.1 branch
 --

 Key: LUCENE-4417
 URL: https://issues.apache.org/jira/browse/LUCENE-4417
 Project: Lucene - Core
  Issue Type: Task
  Components: general/test
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.1


 In 4.0 we have no backwards compatibility tests, but in 4.1 we must once again 
 ivy-retrieve the 4.0 JAR file and run the core tests against it (like in 3.6). 
 We may want to consider other modules too, so every module that must stay 
 backwards compatible should be added to this build.
 I will work on this once we have a release candidate in Maven Central.
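
The retrieval step described above could be sketched as an Ant target using the 
Ivy tasks. This is a hypothetical fragment, not the actual Lucene build file: the 
target name, revision, and retrieve pattern are illustrative assumptions.

```
<!-- Hypothetical sketch: fetch the previous release jar for back-compat tests.
     Target name, revision, and pattern are illustrative, not the real build. -->
<target name="resolve-backcompat" xmlns:ivy="antlib:org.apache.ivy.ant">
  <ivy:resolve organisation="org.apache.lucene" module="lucene-core"
               revision="4.0.0" inline="true" transitive="false"/>
  <ivy:retrieve pattern="backcompat/lib/[artifact]-[revision].[ext]"/>
</target>
```

This only works once the release artifact is actually published, which is why the 
task is blocked on having a release candidate in Maven Central.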

--