[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6549 - Failure

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6549/

1 tests failed.
REGRESSION:  org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
at java.lang.StringBuffer.append(StringBuffer.java:337)
at 
java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
at 
org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
at 
org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
at 
org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)




Build Log (for compile errors):
[...truncated 5227 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-03-31 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013869#comment-13013869
 ] 

Dawid Weiss commented on LUCENE-3003:
-

For what it's worth, the instrumentation interface allows one to get exact 
allocation sizes of objects. I put together a small spike at 
https://github.com/dweiss/poligon/tree/master/instrumenter that measures the 
actual allocation size of byte[]. On my hotspot, 64-bit, this yields:

{noformat}
byte[0] takes 24 bytes.
byte[1] takes 32 bytes.
byte[2] takes 32 bytes.
byte[3] takes 32 bytes.
byte[4] takes 32 bytes.
byte[5] takes 32 bytes.
byte[6] takes 32 bytes.
byte[7] takes 32 bytes.
byte[8] takes 32 bytes.
byte[9] takes 40 bytes.
byte[10] takes 40 bytes.
byte[11] takes 40 bytes.
...
{noformat}

IBM's VM yields the same (64-bit), but the version of JRockit that I have 
(which may be an old one, but is 64-bit!) yields:

{noformat}
byte[0] takes 16 bytes.
byte[1] takes 24 bytes.
byte[2] takes 24 bytes.
byte[3] takes 24 bytes.
byte[4] takes 24 bytes.
byte[5] takes 24 bytes.
byte[6] takes 24 bytes.
byte[7] takes 24 bytes.
byte[8] takes 24 bytes.
byte[9] takes 32 bytes.
byte[10] takes 32 bytes.
byte[11] takes 32 bytes.
byte[12] takes 32 bytes.
byte[13] takes 32 bytes.
byte[14] takes 32 bytes.
byte[15] takes 32 bytes.
byte[16] takes 32 bytes.
byte[17] takes 40 bytes.
{noformat}

I don't have access to a 32-bit system right now, but if you're keen on checking, 
check out that GitHub repo and run:

{noformat}
cd instrumenter
mvn package
java -javaagent:target/instrumenter-0.1.0-SNAPSHOT.jar -version
{noformat}
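
For reference, the core of such an agent is tiny. Below is a minimal sketch (not the code from the repo above; the class name and manifest entry are made up) using the standard java.lang.instrument API, whose Instrumentation.getObjectSize() reports the implementation-specific shallow size, header and alignment padding included:

{code}
import java.lang.instrument.Instrumentation;

public class SizeAgent {
  // Wired up via "Premain-Class: SizeAgent" in the agent jar's manifest,
  // then run as: java -javaagent:size-agent.jar -version
  public static void premain(String args, Instrumentation inst) {
    for (int len = 0; len <= 17; len++) {
      System.out.println("byte[" + len + "] takes "
          + inst.getObjectSize(new byte[len]) + " bytes.");
    }
  }
}
{code}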

 Move UnInvertedField into Lucene core
 -

 Key: LUCENE-3003
 URL: https://issues.apache.org/jira/browse/LUCENE-3003
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3003.patch, LUCENE-3003.patch


 Solr's UnInvertedField lets you quickly look up all term ords for a
 given doc/field.
 Like FieldCache, it inverts the index to produce this, and creates a
 RAM-resident data structure holding the bits; but, unlike FieldCache,
 it can handle multiple values per doc, and, it does not hold the term
 bytes in RAM.  Rather, it holds only term ords, and then uses
 TermsEnum to resolve ord -> term.
 This is great eg for faceting, where you want to use int ords for all
 of your counting, and then only at the end you need to resolve the
 top N ords to their text.
 I think this is a useful core functionality, and we should move most
 of it into Lucene's core.  It's a good complement to FieldCache.  For
 this first baby step, I just move it into core and refactor Solr's
 usage of it.
 After this, as separate issues, I think there are some things we could
 explore/improve:
   * The first-pass that allocates lots of tiny byte[] looks like it
 could be inefficient.  Maybe we could use the byte slices from the
 indexer for this...
   * We can improve the RAM efficiency of the TermIndex: if the codec
 supports ords, and we are operating on one segment, we should just
 use it.  If not, we can use a more RAM-efficient data structure,
 eg an FST mapping to the ord.
   * We may be able to improve on the main byte[] representation by
 using packed ints instead of delta-vInt?
   * Eventually we should fold this ability into docvalues, ie we'd
 write the byte[] image at indexing time, and then loading would be
 fast, instead of uninverting

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



HTTP ERROR: 400

2011-03-31 Thread Deepak Singh
Getting error message while indexing file

HTTP ERROR: 400
ERROR:unknown field 'trapped'


Re: HTTP ERROR: 400

2011-03-31 Thread Danil ŢORIN
ERROR 500: unknown context

On Thu, Mar 31, 2011 at 10:21, Deepak Singh deep...@praumtech.com wrote:
 Getting error message while indexing file

 HTTP ERROR: 400
 ERROR:unknown field 'trapped'

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-236) Field collapsing

2011-03-31 Thread Yuriy Akopov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013878#comment-13013878
 ] 

Yuriy Akopov commented on SOLR-236:
---

Another question:

The patched version of .war starts and works as expected if I place the 
following simple instruction in solrconfig.xml:

<searchComponent name="collapse" 
    class="org.apache.solr.handler.component.CollapseComponent">
</searchComponent>

But if I add additional factories, as advised by the sample config, it 
produces an error when searching with collapsing turned on:

<searchComponent name="collapse" 
    class="org.apache.solr.handler.component.CollapseComponent">
  <collapseCollectorFactory 
      class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />
  <collapseCollectorFactory 
      class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />
  <collapseCollectorFactory 
      class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />
  <collapseCollectorFactory name="groupAggregatedData" 
      class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
    <function name="sum" 
        class="org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction"/>
    <function name="avg" 
        class="org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction"/>
    <function name="min" 
        class="org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction"/>
    <function name="max" 
        class="org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction"/>
  </collapseCollectorFactory>
</searchComponent>

So far it does what I expect from it without the additional factories mentioned, 
but it still bothers me that it fails when they're listed. Maybe I placed the 
libraries in the wrong place?

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: DocSetScoreCollector.java, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
 SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
 collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
 collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, solr-236.patch


 This patch includes a new feature called "field collapsing".
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection".
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.

[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/

3 tests failed.
REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280)
at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302)
at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345)
at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374)
at 
org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:165)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
at 
org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244)


REGRESSION:  org.apache.lucene.index.TestSegmentTermDocs.test

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)
at 
org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45)


REGRESSION:  
org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)




Build Log (for compile errors):
[...truncated 3276 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-03-31 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013907#comment-13013907
 ] 

David Mark Nemeskey commented on LUCENE-2959:
-

Robert: thanks for all the info! It's nice to see so much work has already been 
done. I plan to delve into it after the selection, and try to get other things 
out of the way until then, so that I can concentrate on GSoC during the summer.

I think the main point would be to make the addition of a new ranking function 
as easy as possible. At least a prototype implementation should be very 
straightforward, even at the expense of performance. Then, if the new method 
provides good results, the developer can go on to the lower level to squeeze 
more juice out of it. It's hard for me to discuss this without knowing the 
code, of course, but do you think it is possible?

Even though I added a Performance section to my proposal 
(http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1),
 I see now that it's probably more important than I believed it to be at first. 
I think I will follow your advice and concentrate on how to make BM25F fast. It 
may be a tougher nut to crack than DFR, as the latter has logarithms scattered 
all over it. However, the first thing that comes to mind is that the tf-BM25 
curve becomes almost flat very quickly (less so for a high k1 value, though). 
So it may be possible to pre-compute a tf map or array for a query.
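
To make the pre-computation idea concrete, here is a rough sketch (the helper and its parameters are ours, not Lucene's API): once the length norm is fixed, BM25's saturating tf component depends only on tf, so the first few values can be tabulated per query term:

{code}
// Sketch: BM25's tf saturation tf / (tf + K), with
// K = k1 * (1 - b + b * fieldLength / avgFieldLength).
// The curve flattens quickly, so small tf values dominate.
static float[] precomputeTfWeights(int maxTf, float k1, float b,
                                   float fieldLength, float avgFieldLength) {
  float K = k1 * (1 - b + b * fieldLength / avgFieldLength);
  float[] cache = new float[maxTf + 1];
  for (int tf = 0; tf <= maxTf; tf++) {
    cache[tf] = tf / (tf + K); // monotone, approaches 1 asymptotically
  }
  return cache;
}
{code}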

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Examples, Javadocs, Query/Scoring
Reporter: David Mark Nemeskey
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
 proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state of the art algorithms such as BM25. Moreover, the 
 architecture is tailored specifically to VSM, which makes the addition of new 
 ranking functions a non-trivial task.
 This project aims to bring state of the art ranking methods to Lucene and to 
 implement a query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-03-31 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013932#comment-13013932
 ] 

Dawid Weiss commented on SOLR-2378:
---

I didn't have time to take care of this until now, apologies. So, looking at 
Lookup#lookup(), I just wanted to clarify:

{code}
  /**
   * Look up a key and return possible completion for this key.
   * @param key lookup key. Depending on the implementation this may be
   * a prefix, misspelling, or even infix.
   * @param onlyMorePopular return only more popular results
   * @param num maximum number of results to return
   * @return a list of possible completions, with their relative weight (e.g. popularity)
   */
  public abstract List<LookupResult> lookup(String key, boolean onlyMorePopular, int num);
{code}

the onlyMorePopular means more popular than... what? I see TSTLookup and 
JaspellLookup (Andrzej, will you confirm, please?) sort matches in a priority 
queue by their associated value (frequency, I guess). This makes sense, but 
onlyMorePopular is misleading -- it should be called onlyMostPopular (those 
with native knowledge of English subtleties, speak up if I'm right here).

I also see and wanted to confirm -- the Dictionary can come from various 
sources, so we can't rely on the presence of the built-in Lucene automaton, can 
we? Even if I wanted to reuse it, there'd be no easy way to determine if it's a 
full automaton or a partial one (because of the gaps/trimming)... I think I'll 
just implement the solution by building the automaton from whatever Dictionary 
comes in and serializing/deserializing it similarly to TSTLookup.

Sounds ok?
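
(For illustration, the "sort by associated value" behavior described above boils down to something like this -- a plain-Java sketch, not the actual TSTLookup/JaspellLookup code, with a simplified LookupResult:)

{code}
import java.util.*;

class TopNByWeight {
  static class LookupResult {
    final String key;
    final float value; // the associated weight, e.g. frequency/popularity
    LookupResult(String key, float value) { this.key = key; this.value = value; }
  }

  static final Comparator<LookupResult> BY_VALUE = new Comparator<LookupResult>() {
    public int compare(LookupResult a, LookupResult b) {
      return Float.compare(a.value, b.value);
    }
  };

  /** Keep only the num highest-weighted completions, best first. */
  static List<LookupResult> topN(Iterable<LookupResult> matches, int num) {
    // least popular on top, so the queue always holds the best num seen so far
    PriorityQueue<LookupResult> pq = new PriorityQueue<LookupResult>(num, BY_VALUE);
    for (LookupResult r : matches) {
      pq.add(r);
      if (pq.size() > num) pq.poll(); // evict the least popular
    }
    List<LookupResult> out = new ArrayList<LookupResult>(pq);
    Collections.sort(out, Collections.reverseOrder(BY_VALUE));
    return out;
  }
}
{code}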





 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Automaton-based suggest lookup

2011-03-31 Thread Dawid Weiss
https://issues.apache.org/jira/browse/SOLR-2378

Andrzej, Mike, would you peek at my latest comment and confirm if I got
the API requirements right? I'll implement the FSA-based suggester
based on the trunk code layout for now and we can move it around later
if needed.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-03-31 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013933#comment-13013933
 ] 

Andrzej Bialecki  commented on SOLR-2378:
-

bq. I see TSTLookup and JaspellLookup (Andrzej, will you confirm, please?) 
sort matches in a priority queue by their associated value (frequency, I guess)

Correct. I agree that the name is so-so, I inherited it from the spellchecker 
API - feel free to change it.

bq. the Dictionary can come from various sources, ...

Yes. This is again a legacy of the Lucene SpellChecker API. I tried to extend 
this API in Solr without changing it in Lucene (see the *IteratorWrapper 
classes and TermFreqIterator) but ultimately it would be better to refactor 
this.

 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: HTTP ERROR: 400

2011-03-31 Thread Erick Erickson
Deepak:

1) Please put questions like this on the users list. This list is for
the development of Lucene and Solr.
2) Please provide context.

Best
Erick

On Thu, Mar 31, 2011 at 3:21 AM, Deepak Singh deep...@praumtech.com wrote:
 Getting error message while indexing file

 HTTP ERROR: 400
 ERROR:unknown field 'trapped'

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-03-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013944#comment-13013944
 ] 

Robert Muir commented on LUCENE-2959:
-

{quote}
I think the main point would be to make the addition of a new ranking function 
as easy as possible. At least a prototype implementation should be very 
straightforward, even at the expense of performance. Then, if the new method 
provides good results, the developer can go on to the lower level to squeeze 
more juice out of it. It's hard for me to discuss this without knowing the 
code, of course, but do you think it is possible?
{quote}

This sounds great! For example, you could extend the low-level API, gather 
every possible statistic that Lucene has, and present a high-level API that 
looks more like Terrier's scoring API (which I'm guessing is what researchers 
would prefer?), where they basically implement the scoring in one method with 
all the stats there.

So someone would extend this API to do prototyping; it would make it easier to 
experiment.

{quote}
I think I will follow your advice and concentrate on how to make BM25F fast.
{quote}

Actually, as far as BM25f goes, this one presents a few challenges (some 
already discussed on LUCENE-2091). 

To summarize:
* for any field, Lucene has a per-field terms dictionary that contains that 
term's docFreq. Computing BM25f's IDF would be challenging, because it 
wants a docFreq across all the fields. (It's not clear to me at a glance 
from the original paper whether this should be across only the fields in the 
query or across all the fields in the document, and whether a static schema is 
implied in this scoring system; in Lucene, document 1 can have 3 fields and 
document 2 can have 40 different ones, even with different properties.)
* the same issue applies to length normalization: Lucene has a field length 
but really no concept of document length. 

So I just wanted to mention that while it's possible here to apply a per-field 
TF boost before the non-linear TF saturation, it's not immediately clear how to 
adapt the BM25f formula to Lucene: how to combine these scores without using a 
(wasteful) catch-all field and some lying behind the scenes to force this 
catch-all field's length normalization and docFreq to be used.

Too many questions arise for BM25f and how it would fit with Lucene; for 
example, multiple fields can really mean anything, and having a field in 
Lucene doesn't mean at all that it was in your original document! For 
example, Solr users frequently use a copyField to take the content of one 
field and duplicate it to a different field (perhaps applying some processing). 
In terms of things like length normalization, it seems that a document length 
calculated as the sum across the fields would be wrong for many use cases.

I only wanted to recommend against this one because of this rather serious 
challenge; it seems it's something we might want to table at the moment: Lucene 
is changing fast, and as new capabilities arise we might realize there is a 
more elegant way to address this... but at the moment I think I would recommend 
starting with BM25.
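
(For the record, plain BM25's building blocks, in one common textbook form -- a sketch, not Lucene code; parameter names are ours:)

{code}
// Textbook BM25 contribution of a single term in a single field.
// N = doc count, df = docFreq, tf = term freq in the doc,
// dl = field length, avgdl = average field length.
static float bm25(int N, int df, int tf, float dl, float avgdl,
                  float k1, float b) {
  float idf = (float) Math.log(1 + (N - df + 0.5) / (df + 0.5));
  float K = k1 * (1 - b + b * dl / avgdl);
  return idf * (tf * (k1 + 1)) / (tf + K);
}
{code}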




 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Examples, Javadocs, Query/Scoring
Reporter: David Mark Nemeskey
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
 proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state of the art algorithms such as BM25. Moreover, the 
 architecture is tailored specifically to VSM, which makes the addition of new 
 ranking functions a non-trivial task.
 This project aims to bring state of the art ranking methods to Lucene and to 
 implement a query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-03-31 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013971#comment-13013971
 ] 

Upayavira commented on SOLR-2399:
-

https://github.com/upayavira/solr-admin/commit/a96d7b8bc63cb5ae6125c0a2c91302f553782ef2

 * added current time (note, it is the time the overall page was loaded, not 
"now").
 * fixed cwd. I've also added 'ms' to ping and made it stand out more.
 * fixed threaddump to make it work with multicore
 * moved java properties to global level
 * removed replication link - the info is already on the dashboard

IMO this is now ready for testing - folks, please try it on your browsers (I've 
seen it work on Firefox/Chrome on Linux and Firefox on Mac). Anyone able to try 
it on IE?

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.0


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 [This commit shows the 
 differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
  between the old/existing index.jsp and my new one (which is 
 copy-cut/paste'd from the existing one).
 Main Action takes place in 
 [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
  which is actually neither clean nor pretty .. just work-in-progress.
 Actually it's Work in Progress, so ... give it a try. It's developed with 
 Firefox as Browser, so, for a first impression .. please don't use _things_ 
 like Internet Explorer or so ;o
 Jan already suggested a bunch of good things, I'm sure there are more ideas 
 over there :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Upayavira


On Wed, 30 Mar 2011 12:00 -0400, Grant Ingersoll gsing...@apache.org
wrote:
 
 On Mar 30, 2011, at 9:19 AM, Robert Muir wrote:
 
  On Wed, Mar 30, 2011 at 8:22 AM, Grant Ingersoll gsing...@apache.org 
  wrote:
  (Long post, please bear with me and please read!)
  
  Now that we have the release done (I'm working through the publication 
  process now), I want to start the process of thinking about how we can 
  improve the release process.  As I see it, building the artifacts and 
  checking the legal items are now almost completely automated and testable 
  at earlier stages in the game.
  
  
  Thanks for writing this up. Here is my major beef with 2 concrete 
  suggestions:
  
  It seems the current process is that we all develop and develop and at
  some point we agree we want to try to release. At this point it's the
  RM's job to polish a turd, and no serious community participation
  takes place until an RC is actually produced: so it's a chicken-and-egg
  thing, perhaps with the RM even declaring publicly 'i dont expect this
  to actually pass, i'm just building this to make you guys look at it'.
  
  I think it's probably hard/impossible to force people to review this
  stuff before an RC; for some reason a VOTE seems to be the only thing
  that makes people take it seriously.
  
  But what we can do is ask ourselves, how did the codebase become a
  turd in the first place? Because at one point we released off the code
  and the packaging was correct, there weren't javadocs warnings, and
  there weren't licensing issues, etc.
  
  So I think an important step would be to try to make more of this
  continuous, in other words, we did all the work to fix up the
  codebase to make it releasable, lets implement things to enforce it
  stays this way. It seems we did this for some things (e.g. code
  correctness with the unit tests and licensing with the license
  checker) but there is more to do.
  
  A. implement the hudson-patch capability to vote -1 on patches that
  break things as soon as they go on the JIRA issues. this is really
  early feedback and I think will go a long way.
 
 +1.  I asked on builds@a.o if there was any standard way of doing this,
 or if there is a place someone can point me at to get this going.
 
 
  B. increase the scope of our 'ant test'/hudson runs to check more
  things. For example, it would be nice if they failed on javadocs
  warnings. It's insane if you think about it: we go to a ton of effort
  to implement really cruel and picky unit tests to verify the
  correctness of our code, but you can almost break the packaging and
  documentation completely and the build still passes.
 
 +1 on failing on javadocs.
 
 Also, what about code coverage?  We run all this Clover stuff, but how do
 we incorporate that into our dev. cycle?
 
  
  Anyway, we spend a lot of time on trying to make our code correct, but
  our build is a bit messy. I know if we look at the time we spend on
  search performance and correctness, and applied even 1% of this effort
  to our build system to make it fast, picky, and cleaner, that we
  would be in much better shape as a development team, with a faster
  compile/test/debug cycle to boot... I think there is a lot of
  low-hanging fruit here, and I think this thread has encouraged me to
  revisit the build and try to straighten some of this out.
 
 Yeah, our build is a bit messy, lots of recursion.  I'm still not totally
 happy w/ how license checking is hooked in.

Are you willing to say more? I have a little time, and have done a lot
of work with Ant. Maybe I could help.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974
 ] 

Simon Willnauer commented on LUCENE-2573:
-

I ran a couple of benchmarks with interesting results. The graph below shows 
documents per second for the RT branch with DWPT, yielding very good IO/CPU 
utilization; overall throughput is much better than trunk's.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png! 
Yet, when we look at trunk, the peak performance is much better on trunk than on 
DWPT. The reason for that, I think, is that we flush concurrently, which takes at 
most one thread out of the loop; those are the little drops in docs/sec. This 
does not yet explain the constantly lower max indexing rate. I suspect that this 
is at least influenced by the fact that flushing is very, very CPU intensive. At 
the same time, CMS might kick in way more often, since we are writing more 
segments, which are also smaller compared to trunk. Eventually, I need to run a 
profiler and see what is going on.
!http://people.apache.org/~simonw/Trunk_dps.png! 

Interesting is that besides the nice CPU utilization we also have nearly 
perfect IO utilization. The graph below shows that we are consistently using IO 
to flush segments. The width of the bars shows the time it took to flush a 
single DWPT; there is almost no overlap.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png! 

Overall those are super results! Good job everybody!

simon

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?
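
(The linear steps above are plain interpolation between the two water marks; a hypothetical sketch, not the patch's code:)

{code}
// For n DWPTs, space the flush thresholds evenly between the
// low (90%) and high (110%) water marks of the configured buffer.
static double[] flushThresholdsMB(double ramBufferMB, int n) {
  double low = 0.90 * ramBufferMB, high = 1.10 * ramBufferMB;
  double[] marks = new double[n];
  if (n == 1) { marks[0] = high; return marks; }
  for (int i = 0; i < n; i++) {
    marks[i] = low + i * (high - low) / (n - 1);
  }
  return marks; // n = 5, 100MB buffer => 90, 95, 100, 105, 110
}
{code}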

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Robert Muir
On Thu, Mar 31, 2011 at 9:40 AM, Upayavira u...@odoko.co.uk wrote:

 Are you willing to say more? I have a little time, and have done a lot
 of work with Ant. Maybe I could help.

 Upayavira

Thanks, there is some followup discussion on this JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2002

The prototype patch I refer to in the comments, where the Solr build
system is changed to extend Lucene's, is the latest _merged.patch on
the issue: 
https://issues.apache.org/jira/secure/attachment/12456811/SOLR-2002_merged.patch

(Additionally as sort of a followup there are more comments/ideas
about additional things we could do besides just refactoring the build
system to be faster and simpler)

As a first step I think the patch needs to be brought up to trunk (it
gets out of date fast). I mentioned on the issue we can simply create
a branch to make coordination easier. A branch might seem silly for a
thing like this, but it would at least allow us to work together and
people could contribute parts (e.g. PMD integration or something)
without having to juggle huge out of sync patches.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Handling wildcard search containing special characters (unicode)

2011-03-31 Thread Patrick ALLAERT
Hello,

Facing a Solr issue, I have been told that queries with a term like:
Kiinteistösih*
will not match the Finnish word Kiinteistösihteeri, and that it's a
known limitation of Lucene.
Instead, using the word directly, without a wildcard, works.

Do you confirm this is a known limitation/bug?
If so, do you have any registered issue about it?

Searching the ML archive and the issue tracker in both the SOLR and LUCENE
projects didn't provide me a pointer to this problem.

One of the references I found on the web talking about this problem is:
http://forum.compass-project.org/message.jspa?messageID=227709
But again, no pointer to a discussion or issue.

Thanks in advance for your help,
Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Handling wildcard search containing special characters (unicode)

2011-03-31 Thread Robert Muir
On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT
patrick.alla...@gmail.com wrote:
 Hello,

 Facing a Solr issue, I have been told that queries with a term like:
 Kiinteistösih*
 will not match the Finnish word Kiinteistösihteeri and that it's a
 known limitation of Lucene.
 Instead, using the word directly, without wildcard, works.

 Do you confirm this is a known limitation/bug?
 If so, do you have any registered issue about it?

this isn't the case; there's no unicode limitation here.

more likely, your analyzer is configured to lowercase text, so in the
index Kiinteistösihteeri is really kiinteistösihteeri.
in other words, try kiinteistösih* and see how that works.
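
To make that concrete: wildcard terms bypass the analyzer, so they must
match the indexed (lowercased) token. A tiny sketch against the Lucene
query API -- the field name is made up:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    class WildcardCaseDemo {
      // LowerCaseFilter indexed "kiinteistösihteeri", so the first query
      // finds nothing while the second one matches.
      static Query misses = new WildcardQuery(new Term("body", "Kiinteistösih*"));
      static Query hits   = new WildcardQuery(new Term("body", "kiinteistösih*"));
    }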

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-03-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013986#comment-13013986
 ] 

Yonik Seeley commented on LUCENE-3003:
--

Thanks Dawid, this suggests that we could round up to the 8-byte boundary for 
free.
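
(Concretely: since the VM pads allocations to 8-byte boundaries anyway, a requested byte[] length can be grown to fill the padding at no cost. A hypothetical sketch; the 24-byte array header matches the 64-bit hotspot numbers Dawid posted:)

{code}
// Largest array length that fits in the same padded allocation.
// headerBytes is VM-specific (24 on the 64-bit hotspot measured above).
static int roundUpArrayLength(int requestedLength, int headerBytes) {
  int raw = headerBytes + requestedLength;
  int padded = (raw + 7) & ~7; // align to the next 8-byte boundary
  return padded - headerBytes; // e.g. requesting 9 yields 16 for free
}
{code}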


 Move UnInvertedField into Lucene core
 -

 Key: LUCENE-3003
 URL: https://issues.apache.org/jira/browse/LUCENE-3003
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3003.patch, LUCENE-3003.patch


 Solr's UnInvertedField lets you quickly look up all term ords for a
 given doc/field.
 Like FieldCache, it inverts the index to produce this, and creates a
 RAM-resident data structure holding the bits; but, unlike FieldCache,
 it can handle multiple values per doc, and, it does not hold the term
 bytes in RAM.  Rather, it holds only term ords, and then uses
 TermsEnum to resolve ord -> term.
 This is great eg for faceting, where you want to use int ords for all
 of your counting, and then only at the end you need to resolve the
 top N ords to their text.
 I think this is a useful core functionality, and we should move most
 of it into Lucene's core.  It's a good complement to FieldCache.  For
 this first baby step, I just move it into core and refactor Solr's
 usage of it.
 After this, as separate issues, I think there are some things we could
 explore/improve:
   * The first-pass that allocates lots of tiny byte[] looks like it
 could be inefficient.  Maybe we could use the byte slices from the
 indexer for this...
   * We can improve the RAM efficiency of the TermIndex: if the codec
 supports ords, and we are operating on one segment, we should just
 use it.  If not, we can use a more RAM-efficient data structure,
 eg an FST mapping to the ord.
   * We may be able to improve on the main byte[] representation by
 using packed ints instead of delta-vInt?
   * Eventually we should fold this ability into docvalues, ie we'd
 write the byte[] image at indexing time, and then loading would be
 fast, instead of uninverting

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Jason Rutherglen
Dr
On Mar 31, 2011 9:44 AM, Simon Willnauer (JIRA) j...@apache.org wrote:

 [
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974]

 Simon Willnauer commented on LUCENE-2573:
 -

 I ran a couple of benchmarks with interesting results. The graph below shows
documents per second for the RT branch with DWPT, yielding very good IO/CPU
utilization; overall throughput is much better than trunk's.
 !http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png!
 Yet, when we look at trunk, the peak performance is much better on trunk
than on DWPT. The reason for that, I think, is that we flush concurrently,
which takes at most one thread out of the loop; those are the little drops
in docs/sec. This does not yet explain the constantly lower max indexing
rate. I suspect that this is at least influenced by the fact that flushing
is very, very CPU intensive. At the same time, CMS might kick in way more
often, since we are writing more segments, which are also smaller compared
to trunk. Eventually, I need to run a profiler and see what is going on.
 !http://people.apache.org/~simonw/Trunk_dps.png!

 Interesting is that besides the nice CPU utilization we also have nearly
perfect IO utilization. The graph below shows that we are consistently using
IO to flush segments. The width of the bars shows the time it took to flush a
single DWPT; there is almost no overlap.
 !http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png!

 Overall those are super results! Good job everybody!

 simon

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
 Issue Type: Improvement
 Components: Index
 Reporter: Michael Busch
 Assignee: Simon Willnauer
 Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total
consumed RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a
tiered approach:
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark: E.g. when 5 DWPTs
are used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values
explicitly using total values (e.g. low water mark at 120MB, high water mark
at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
config method and use something like 90% and 110% for the water marks?

 --
 This message is automatically generated by JIRA.
 For more information on JIRA, see: http://www.atlassian.com/software/jira

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #75: POMs out of sync

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/75/

1 tests failed.
FAILED:  
org.apache.solr.schema.TestICUCollationField.org.apache.solr.schema.TestICUCollationField

Error Message:
Cannot find resource: solr-analysis-extras/conf/solrconfig-icucollate.xml

Stack Trace:
java.lang.RuntimeException: Cannot find resource: 
solr-analysis-extras/conf/solrconfig-icucollate.xml
at org.apache.solr.SolrTestCaseJ4.getFile(SolrTestCaseJ4.java:1056)
at 
org.apache.solr.schema.TestICUCollationField.setupSolrHome(TestICUCollationField.java:77)
at 
org.apache.solr.schema.TestICUCollationField.beforeClass(TestICUCollationField.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:146)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at $Proxy0.invoke(Unknown Source)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)




Build Log (for compile errors):
[...truncated 18422 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3006) Javadocs warnings should fail the build

2011-03-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014011#comment-13014011
 ] 

Steven Rowe commented on LUCENE-3006:
-

bq. This patch eliminates javadoc warnings on trunk under Sun JDK 1.5.0_22 and 
1.6.0_21 for Lucene, and for just 1.6.0_21 on Solr.

Committed:
- r1087319: trunk
- r1087329: branch_3x

On branch_3x, under both Sun JDK 1.5.0_22 and 1.6.0_21, there are no javadoc 
warnings for either Solr or Lucene.

 Javadocs warnings should fail the build
 ---

 Key: LUCENE-3006
 URL: https://issues.apache.org/jira/browse/LUCENE-3006
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.2, 4.0
Reporter: Grant Ingersoll
 Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, 
 LUCENE-3006.patch, LUCENE-3006.patch


 We should fail the build when there are javadocs warnings, as this should not 
 be the Release Manager's job to fix all at once right before the release.
 See 
 http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Using contrib Lucene Benchmark with Solr

2011-03-31 Thread Burton-West, Tom
Thanks Robert and Grant,

Does this need a separate JIRA issue dealing specifically with the ability of 
benchmark to read Solr config settings, or is it subsumed in LUCENE-2845? Or 
should I just add a comment to LUCENE-2845?

Tom
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Wednesday, March 30, 2011 7:56 PM
To: dev@lucene.apache.org
Subject: Re: Using contrib Lucene Benchmark with Solr

On Wed, Mar 30, 2011 at 4:49 PM, Burton-West, Tom tburt...@umich.edu wrote:
 I would like to be able to use the Lucene Benchmark code with Solr to run
 some indexing tests.  It would be nice if Lucene Benchmark could read
 Solr configuration rather than having to translate my filter chain and other
 parameters into Lucene.   Would it be appropriate to open a JIRA issue for
 this or is this something that doesn’t really make any sense?


I think it makes great sense; we moved the benchmarking facility to a
top-level module so we can do this:
https://issues.apache.org/jira/browse/LUCENE-2845, but we didn't
actually add any integration yet.

I've been in this exact same situation too when trying to use the
benchmark package, and I'd sure like to see better solr integration
with the benchmarking package myself.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014016#comment-13014016
 ] 

Jason Rutherglen commented on LUCENE-2573:
--

bq. influenced due to the fact that flushing is very very CPU intensive

Do you think this is due mostly to the vInt decoding?  We're not interleaving 
postings on flush with this patch, so the CPU consumption should be somewhat 
lower.

bq. At the same time CMS might kick in way more often since we are writing more 
segments which are also smaller compared to trunk

That's probably the more likely case.  In general, we may be able to default to 
a higher overall RAM buffer size, and perhaps there won't be the degradation in 
indexing performance that there is with trunk?  In the future with RT we could 
get fancy and selectively merge segments as we're flushing, if writing larger 
segments is important.  

I'd personally prefer to write out 1-2 GB segments, and limit the number of 
DWPTs to 2-3, mainly for servers that are concurrently indexing and searching 
(eg, the RT use case).  I think the current default number of thread states is 
a bit high.  
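
(For reference, the vInt decoding in question is the classic variable-byte scheme -- 7 payload bits per byte, high bit as continuation flag. A sketch of the idea, not the actual IndexInput.readVInt source:)

{code}
// Decode one vInt from buf; pos[0] is an in/out cursor.
static int readVInt(byte[] buf, int[] pos) {
  byte b = buf[pos[0]++];
  int value = b & 0x7F;
  for (int shift = 7; (b & 0x80) != 0; shift += 7) {
    b = buf[pos[0]++];
    value |= (b & 0x7F) << shift;
  }
  return value;
}
{code}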

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure

2011-03-31 Thread Simon Willnauer
This one is weird; seems like there is a synchronized missing on
FieldInfoBiMap#containsConsistent.

I'll try to reproduce first.

simon
On Thu, Mar 31, 2011 at 11:37 AM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/

 3 tests failed.
 REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

 Error Message:
 null

 Stack Trace:
 junit.framework.AssertionFailedError
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280)
        at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302)
        at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345)
        at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374)
        at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:165)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
        at 
 org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244)


 REGRESSION:  org.apache.lucene.index.TestSegmentTermDocs.test

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)
        at 
 org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45)


 REGRESSION:  
 org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)




 Build Log (for compile errors):
 [...truncated 3276 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Grant Ingersoll
Other things to add:

1. Managing our website is a big pain in the butt.  Why do we need to publish 
PDFs again?  We really need to get on the new CMS.
2. Copying/moving the artifacts to the release area could be automated, too

At the end of the day, #1 below is what strikes me as the biggest impediment to 
releases.

 
 
 -Original Message-
 From: ext Grant Ingersoll [mailto:gsing...@apache.org] 
 Sent: Wednesday, March 30, 2011 8:22 AM
 To: dev@lucene.apache.org
 Subject: Brainstorming on Improving the Release Process
 
 (Long post, please bear with me and please read!)
 
 Now that we have the release done (I'm working through the publication 
 process now), I want to start the process of thinking about how we can 
 improve the release process.  As I see it, building the artifacts and 
 checking the legal items are now almost completely automated and testable at 
 earlier stages in the game. 
 
 We have kept saying we want to release more often, but we have never defined 
 actionable steps with which we can get there.  Goals without actionable steps 
 are useless.
 
 So, with that in mind, I'd like to brainstorm on how we can improve things a 
 bit more.  Several us acted as RM this time around, so I think we have some 
 common, shared knowledge to take advantage of this time as opposed to in the 
 past where one person mostly just did the release in the background and then 
 we all voted.
 
 So, let's start with what we have right:
 
 1. The Ant process for building a release candidate for both Lucene and Solr 
 is almost identical now and fairly straightforward.
 2. I think the feature freeze is a good thing, although it is a bit too long 
 perhaps.
 3. Pretty good documentation on the steps involved to branch, etc.
 4. The new license validation stuff is a start for enforcing licensing up 
 front more effectively.  What else can we validate up front in terms of 
 packaging?
 5. We have an awesome test infrastructure now.  I think it is safe to say 
 that this version of Lucene is easily the most tested version we have ever 
 shipped.
 
 Things I see that can be improved, and these are only suggestions:
 
 1.  We need to define the Minimum Effective Dose (MED - 
 http://gizmodo.com/#!5709902/4+hour-body-the-principle-of-the-minimum-effective-dose)
  for producing a quality release.  Nothing more, nothing less.  I think one 
 of our biggest problems is we don't know when we are done.  It's this 
 loosey-goosey "we all agree" notion, but that's silly.  It's software, we 
 should be able to test almost all of the artifacts for certain attributes and 
 then release when they pass.  If we get something wrong, put in a test for it 
 in the next release.  The old saying about perfect being the enemy of great 
 applies here.
 
 In other words, we don't have well defined things that we all are looking for 
 when vetting a release candidate, other than what the ASF requires.  Look at 
 the last few vote threads or for any of the previous threads.  It's obvious 
 that we have a large variety of people doing a large variety of things when 
 it comes to testing the candidates.  For instance, I do the following:
  a. check sigs., md5 hashes, etc. 
  b. run the demos, 
  c. run the Solr example and index some content, 
  d. check over the LICENSE, NOTICE, CHANGES files
  e. Check the overall packaging, etc. is reasonable
  f. I run them through my training code
 
 Others clearly do many other things.  Many of you have your own benchmark 
 tests you run, others read over every last bit of documentation, others still 
 put the RC into their own application and test it.  All of this is good, but 
 the problem is it is not _shared_ until the actual RC is up, and it is not 
 repeatable (not that all of it can be).  If you have benchmark code/tests 
 that you run on an RC that doesn't involve proprietary code, why isn't it 
 donated to the project so that we can all use it?  That way we don't have to 
 wait until your -1 at the 11th hour to realize the RC is not good.  I 
 personally don't care whether it's python or perl or whatever.  Something 
 that works is better than nothing.  For instance, right now some of the 
 committers have an Apache Extras project going for benchmarking.  Can we get 
 this running on ASF resources on a regular basis?  If it's a computing 
 resource issue, let's go to Infrastructure and ask for resources.  
 Infrastructure has repeatedly said that if a project needs resources, it should put 
 together a proposal of what it wants.  I bet we could get budget to spin up 
 an EC2 instance once a week, run those long running tests (Test2B and other 
 benchmarks) and then report back.  All of that can be automated.
 
 Also, please think hard about whether the things you test can be automated 
 and built into our test suite or at least run nightly or something on Jenkins 
 and then donated.  I know reading documentation can't, but what else?  
 For instance, could we auto-generate the file 

Re: Using contrib Lucene Benchmark with Solr

2011-03-31 Thread Robert Muir
On Thu, Mar 31, 2011 at 11:24 AM, Burton-West, Tom tburt...@umich.edu wrote:
 Thanks Robert and Grant,

 Does this need a separate JIRA issue dealing specifically with the ability of 
 benchmark to read Solr config settings, or is it subsumed in LUCENE-2845? or 
 should I just add a comment to LUCENE-2845?


I think full integration with Solr might be a lot of work, so I would
start with opening an issue to address your particular itch (e.g.
benchmarking an Analyzer that's instantiated from a Solr schema).

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Grant Ingersoll

On Mar 31, 2011, at 11:51 AM, Marvin Humphrey wrote:

 On Thu, Mar 31, 2011 at 11:45:53AM -0400, Grant Ingersoll wrote:
 Why do we need to publish PDFs again?  
 
 IIRC, publishing PDFs is the default in Forrest.  It might have been a passive
 choice.

Yeah, it is.  I know.  Just one more thing to worry about when it is broken.

I think we need to simplify across a lot of our processes and get back to what 
I said earlier, "Minimum Effective Dose", when it comes to builds, releases, etc.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene.apache.org download link lucene/solr?

2011-03-31 Thread Christopher St John
On the front page, in the announcement:

  News 31 March 2011 - Lucene Core 3.1 and Solr 3.1 Available The Lucene
  PMC is... after Solr 1.4.1. Lucene can be downloaded from:

The Lucene download link says /java but actually points to /solr.

-cks

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014046#comment-13014046
 ] 

Michael Busch commented on LUCENE-2573:
---

Thanks, Simon, for running the benchmarks! Good results overall, even though 
it's puzzling why flushing would be CPU intensive.

We should probably do some profiling to figure out where the time is spent. I 
can probably do that Sunday, but feel free to beat me to it :)

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
 LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?
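
 For illustration, a rough sketch of the linear interpolation described above
 (class and method names here are hypothetical, not from any patch):
{code:java}
// Hypothetical sketch of the tiered thresholds described above: with n
// active DWPTs, the i-th DWPT (i = 0..n-1) flushes once total RAM use
// crosses lowWaterMark + i * (highWaterMark - lowWaterMark) / (n - 1).
public class TieredFlushThresholds {
  private final double lowMB;   // low water mark, e.g. 90% of allowed RAM
  private final double highMB;  // high water mark, e.g. 110% of allowed RAM

  public TieredFlushThresholds(double lowMB, double highMB) {
    this.lowMB = lowMB;
    this.highMB = highMB;
  }

  /** RAM threshold (in MB) at which the i-th of n DWPTs should flush. */
  public double thresholdMB(int i, int n) {
    if (n <= 1) {
      return lowMB;
    }
    return lowMB + i * (highMB - lowMB) / (n - 1);
  }
}
{code}
 With setRAMBufferSizeMB(100) and the 90%/110% marks above, five DWPTs would
 flush at 90, 95, 100, 105 and 110 MB, matching the example.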

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: lucene.apache.org download link lucene/solr?

2011-03-31 Thread Robert Muir
On Thu, Mar 31, 2011 at 12:07 PM, Christopher St John
ckstj...@gmail.com wrote:
 On the front page, in the announcement:

  News 31 March 2011 - Lucene Core 3.1 and Solr 3.1 Available The Lucene
  PMC is... after Solr 1.4.1. Lucene can be downloaded from:

 The Lucene download link says /java but actually points to /solr.


thank you!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Handling wildcard search containing special characters (unicode)

2011-03-31 Thread Patrick ALLAERT
2011/3/31 Robert Muir rcm...@gmail.com:
 On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT
 patrick.alla...@gmail.com wrote:
 Hello,

 Facing a Solr issue, I have been told that queries with a term like:
 Kiinteistösih*
 will not match the Finnish word Kiinteistösihteeri and that it's a
 known limitation of Lucene.
 Instead, using the word directly, without wildcard, works.

 Do you confirm this a known limitation/bug?
 If so do you have any registered issue about that?

 this isn't the case, there's no unicode limitation here.

 more likely, your analyzer is configured to lowercase text, so in the
 index Kiinteistösihteeri is really kiinteistösihteeri
 in other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:
kiinteistösih*

but it doesn't show me the intended result.

I have found the reason why: it is because of the
ISOLatin1AccentFilterFactory filter, which is present for both the
index and query analyzers.
Searching with:
kiinteistosih*
did the trick.

One question remains now: why should I lowercase terms containing a
wildcard and do the ISO Latin1 accent conversion myself while I do
have:
<analyzer type="query">
...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
...
for the corresponding fieldType?
I would have guessed it would do it for me.

Your reply helped me a lot understanding what's going on.
Thank you very much for your participation!

Patrick
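
P.S. For anyone else hitting this, a minimal sketch of the behavior against
the Lucene 3.x QueryParser API (the field name and analyzer are only
illustrative): wildcard terms are never run through the analyzer chain; at
most the parser lowercases them.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class WildcardAnalysisDemo {
  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser(Version.LUCENE_31, "text",
        new StandardAnalyzer(Version.LUCENE_31));
    // A plain term goes through the full analyzer chain (lowercasing etc.):
    Query analyzed = qp.parse("Kiinteistösihteeri");
    // A wildcard term skips the analyzer; the parser only lowercases it,
    // since lowercaseExpandedTerms defaults to true.  Accent folding
    // (ISOLatin1AccentFilter) is never applied, hence kiinteistosih*.
    qp.setLowercaseExpandedTerms(true);
    Query wildcard = qp.parse("Kiinteistösih*");
    System.out.println(analyzed + " / " + wildcard);
  }
}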

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2981:


Attachment: LUCENE-2981.patch

patch file implementing grant's suggestions.

 Review and potentially remove unused/unsupported Contribs
 -

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2981.patch


 Some of our contribs appear to be lacking for development/support or are 
 missing tests.  We should review whether they are even pertinent these days 
 and potentially deprecate and remove them.
 One of the things we did in Mahout when bringing in Colt code was to mark all 
 code that didn't have tests as @deprecated and then we removed the 
 deprecation once tests were added.  Those that didn't get tests added over 
 about a 6 mos. period of time were removed.
 I would suggest taking a hard look at:
 ant
 db
 lucli
 swing
 (spatial should be gutted to some extent and moved to modules)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014057#comment-13014057
 ] 

Ryan McKinley commented on LUCENE-2981:
---

+1 for 4.0
-0 for 3.2

 Review and potentially remove unused/unsupported Contribs
 -

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2981.patch


 Some of our contribs appear to be lacking for development/support or are 
 missing tests.  We should review whether they are even pertinent these days 
 and potentially deprecate and remove them.
 One of the things we did in Mahout when bringing in Colt code was to mark all 
 code that didn't have tests as @deprecated and then we removed the 
 deprecation once tests were added.  Those that didn't get tests added over 
 about a 6 mos. period of time were removed.
 I would suggest taking a hard look at:
 ant
 db
 lucli
 swing
 (spatial should be gutted to some extent and moved to modules)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014060#comment-13014060
 ] 

Grant Ingersoll commented on LUCENE-2981:
-

+1 for 4.0

I'm fine w/ 3.2, too, FWIW.  I can't remember the last time someone submitted a 
patch or even reported a bug on any of these or even asked about them on user@.

 Review and potentially remove unused/unsupported Contribs
 -

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2981.patch


 Some of our contribs appear to be lacking for development/support or are 
 missing tests.  We should review whether they are even pertinent these days 
 and potentially deprecate and remove them.
 One of the things we did in Mahout when bringing in Colt code was to mark all 
 code that didn't have tests as @deprecated and then we removed the 
 deprecation once tests were added.  Those that didn't get tests added over 
 about a 6 mos. period of time were removed.
 I would suggest taking a hard look at:
 ant
 db
 lucli
 swing
 (spatial should be gutted to some extent and moved to modules)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3010) Add the ability for the Lucene Benchmarkcode to read Solr configuration information for testing Analyzer/Filter Chains

2011-03-31 Thread Tom Burton-West (JIRA)
Add the ability for the  Lucene Benchmarkcode to read Solr configuration 
information for testing Analyzer/Filter Chains
---

 Key: LUCENE-3010
 URL: https://issues.apache.org/jira/browse/LUCENE-3010
 Project: Lucene - Java
  Issue Type: Wish
  Components: contrib/benchmark
Reporter: Tom Burton-West
Priority: Trivial


I would like to be able to use the Lucene Benchmark code in Lucene contrib with 
Solr to run some indexing tests.  It would be nice if Lucene Benchmark could 
read my Solr configuration rather than having to translate my filter chain and 
other parameters into Lucene java code.  This relates to LUCENE-2845.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3010) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains

2011-03-31 Thread Tom Burton-West (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Burton-West updated LUCENE-3010:


Summary: Add the ability for the  Lucene Benchmark code to read Solr 
configuration information for testing Analyzer/Filter Chains  (was: Add the 
ability for the  Lucene Benchmarkcode to read Solr configuration information 
for testing Analyzer/Filter Chains)

 Add the ability for the  Lucene Benchmark code to read Solr configuration 
 information for testing Analyzer/Filter Chains
 

 Key: LUCENE-3010
 URL: https://issues.apache.org/jira/browse/LUCENE-3010
 Project: Lucene - Java
  Issue Type: Wish
  Components: contrib/benchmark
Reporter: Tom Burton-West
Priority: Trivial

 I would like to be able to use the Lucene Benchmark code in Lucene contrib 
 with Solr to run some indexing tests.  It would be nice if Lucene Benchmark 
 could read my Solr configuration rather than having to translate my filter 
 chain and other parameters into Lucene java code.  This relates to 
 LUCENE-2845.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Andi Vajda (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014070#comment-13014070
 ] 

Andi Vajda commented on LUCENE-2981:


Unless there are users, I'm +1 for removing db anytime.
The last time I fixed something there, it was for the Java version of db, a 
contribution by someone else I haven't heard from in years.
I haven't heard from any users with questions or bug reports in a long time 
either.

 Review and potentially remove unused/unsupported Contribs
 -

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2981.patch


 Some of our contribs appear to be lacking for development/support or are 
 missing tests.  We should review whether they are even pertinent these days 
 and potentially deprecate and remove them.
 One of the things we did in Mahout when bringing in Colt code was to mark all 
 code that didn't have tests as @deprecated and then we removed the 
 deprecation once tests were added.  Those that didn't get tests added over 
 about a 6 mos. period of time were removed.
 I would suggest taking a hard look at:
 ant
 db
 lucli
 swing
 (spatial should be gutted to some extent and moved to modules)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6567 - Failure

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6567/

All tests passed

Build Log (for compile errors):
[...truncated 47 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6568/

2 tests failed.
FAILED:  init.org.apache.lucene.util.TestBitVector

Error Message:
org.apache.lucene.util.TestBitVector

Stack Trace:
java.lang.ClassNotFoundException: org.apache.lucene.util.TestBitVector
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)


FAILED:  init.org.apache.lucene.util.TestFieldCacheSanityChecker

Error Message:
org.apache.lucene.util.TestFieldCacheSanityChecker

Stack Trace:
java.lang.ClassNotFoundException: 
org.apache.lucene.util.TestFieldCacheSanityChecker
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)




Build Log (for compile errors):
[...truncated 47 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014108#comment-13014108
 ] 

Earwin Burrfoot commented on LUCENE-2981:
-

Bye-bye, DB. Few things can compete with it in pointlessness.

 Review and potentially remove unused/unsupported Contribs
 -

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2981.patch


 Some of our contribs appear to be lacking for development/support or are 
 missing tests.  We should review whether they are even pertinent these days 
 and potentially deprecate and remove them.
 One of the things we did in Mahout when bringing in Colt code was to mark all 
 code that didn't have tests as @deprecated and then we removed the 
 deprecation once tests were added.  Those that didn't get tests added over 
 about a 6 mos. period of time were removed.
 I would suggest taking a hard look at:
 ant
 db
 lucli
 swing
 (spatial should be gutted to some extent and moved to modules)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing

2011-03-31 Thread Uwe Schindler
I killed hanging java processes!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
 Sent: Thursday, March 31, 2011 8:10 PM
 To: dev@lucene.apache.org
 Subject: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing
 
 Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-
 3.x/6568/
 
 2 tests failed.
 FAILED:  init.org.apache.lucene.util.TestBitVector
 
 Error Message:
 org.apache.lucene.util.TestBitVector
 
 Stack Trace:
 java.lang.ClassNotFoundException: org.apache.lucene.util.TestBitVector
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
 
 
 FAILED:  init.org.apache.lucene.util.TestFieldCacheSanityChecker
 
 Error Message:
 org.apache.lucene.util.TestFieldCacheSanityChecker
 
 Stack Trace:
 java.lang.ClassNotFoundException:
 org.apache.lucene.util.TestFieldCacheSanityChecker
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
 
 
 
 
 Build Log (for compile errors):
 [...truncated 47 lines...]
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-391) Luke.Net for Lucene.Net

2011-03-31 Thread Sergey Mirvoda (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014116#comment-13014116
 ] 

Sergey Mirvoda commented on LUCENENET-391:
--

Notice, guys: we renamed the project.

FYI
the latest version works very well on Mono.

 Luke.Net for Lucene.Net
 ---

 Key: LUCENENET-391
 URL: https://issues.apache.org/jira/browse/LUCENENET-391
 Project: Lucene.Net
  Issue Type: New Feature
  Components: Lucene.Net Contrib
Reporter: Pasha Bizhan
Assignee: Sergey Mirvoda
Priority: Minor
  Labels: Luke.Net
 Fix For: Lucene.Net 2.9.4

 Attachments: luke-net-bin.zip, luke-net-src.zip


 Create a port of Java Luke to .NET for use with Lucene.Net
 See attachments for a 1.4 compatible version or 
 https://bitbucket.org/thoward/luke.net-incbuating for a partial 
 implementation that is 2.9.2 compatible. 
 The attached version was contributed by Pasha Bizhan, and the bitbucket 
 version was contributed by Aaron Powell (above version is a fork, original at 
 https://bitbucket.org/slace/luke.net). If source code from either is used, a 
 software grant must be provided from the original authors. 
 The final version should be 2.9.4 compatible and implement most or all 
 features of Java Luke 1.0.1 (see http://code.google.com/p/luke/ ). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[Lucene.Net] [jira] [Commented] (LUCENENET-397) Resolution of the legal issues

2011-03-31 Thread Sergey Mirvoda (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014117#comment-13014117
 ] 

Sergey Mirvoda commented on LUCENENET-397:
--

We decided to rename the project and re-implement it from scratch as much as 
possible, but based on top of Pasha's work.

 Resolution of the legal issues
 --

 Key: LUCENENET-397
 URL: https://issues.apache.org/jira/browse/LUCENENET-397
 Project: Lucene.Net
  Issue Type: Sub-task
  Components: Lucene.Net Contrib
Reporter: Scott Lombard
Assignee: Troy Howard
Priority: Blocker
  Labels: Luke.Net
 Fix For: Lucene.Net 2.9.4


 Resolution of the legal issues around ingesting the code into Lucene.Net. 
 Coordinate with Aaron Powell to obtain software grant paperwork.
 Per Stefan Bodewig (Incubating Mentor):
 All it takes is:
 * attach the code to a JIRA ticket.
 * have software grants signed by all contributors to the original code
  base.
 * write a single page for the Incubator site
 * start a vote on Incubator general and wait for 72 hours.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Apache Lucene 3.1.0 is available

2011-03-31 Thread Grant Ingersoll
March 2011, Apache Lucene 3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.1.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at 
http://www.apache.org/dyn/closer.cgi/lucene/java (see note below).
See the CHANGES.txt
file included with the release for a full list of details.

Lucene 3.1 Release Highlights

* Numerous performance improvements: faster exact PhraseQuery; merging
  favors segments with deletions; primary key lookup is faster;
  IndexWriter.addIndexes(Directory[]) uses file copy instead of
  merging; various Directory performance improvements; compound file
  is dynamically turned off for large segments; fully deleted segments
  are dropped on commit; faster snowball analyzers (in contrib);
  ConcurrentMergeScheduler is more careful about setting priority of
  merge threads.

* ReusableAnalyzerBase makes it easier to reuse TokenStreams
  correctly.

* Improved Analysis capabilities: Improved Unicode support, including
  Unicode 4, more friendly term handling (CharTermAttribute), easier
  object reuse and better support for protected words in lossy token
  filters (e.g. stemmers).

* ConstantScoreQuery now allows directly wrapping a Query.

* IndexWriter is now configured with a new separate builder API,
  IndexWriterConfig.  You can now control IndexWriter's previously
  fixed internal thread limit by calling setMaxThreadStates.

* IndexWriter.getReader is replaced by IndexReader.open(IndexWriter).
  In addition you can now specify whether deletes should be resolved
  when you open an NRT reader.

* MultiSearcher is deprecated; ParallelMultiSearcher has been
  absorbed directly into IndexSearcher.

* On 64bit Windows and Solaris JVMs, MMapDirectory is now the
  default implementation (returned by FSDirectory.open).
  MMapDirectory also enables unmapping if the JVM supports it.

* New TotalHitCountCollector just counts total number of hits.

* ReaderFinishedListener API enables external caches to evict entries
  once a segment is finished.
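
As a quick illustration of the IndexWriterConfig and NRT changes above
(a sketch only; the path and analyzer choice are arbitrary):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Nrt31Demo {
  public static void main(String[] args) throws Exception {
    // FSDirectory.open returns MMapDirectory on 64-bit Windows/Solaris JVMs.
    Directory dir = FSDirectory.open(new File("/tmp/index"));
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
        new StandardAnalyzer(Version.LUCENE_31));
    IndexWriter writer = new IndexWriter(dir, conf);
    // IndexWriter.getReader is replaced by IndexReader.open(IndexWriter);
    // the boolean controls whether deletes are resolved in the NRT reader.
    IndexReader nrt = IndexReader.open(writer, true);
    System.out.println("numDocs=" + nrt.numDocs());
    nrt.close();
    writer.close();
  }
}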

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases.  It is possible that the mirror you are using may not 
have replicated the release yet.  If that is the case, please try another 
mirror.  This also goes for Maven access.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Apache Solr 3.1.0 available

2011-03-31 Thread Grant Ingersoll
March 2011, Apache Solr 3.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.1.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release is
available for immediate download at 
http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below).
See the CHANGES.txt file included with the release for a full list of
details as well as instructions on upgrading.

What's in a Version? 

The version number for Solr 3.1 was chosen to reflect the merge of
development with Lucene, which is currently also on 3.1.  Going
forward, we expect the Solr version to be the same as the Lucene
version.  Solr 3.1 contains Lucene 3.1 and is the release after Solr 1.4.1.

Solr 3.1 Release Highlights

* Numeric range facets (similar to date faceting).

* New spatial search, including spatial filtering, boosting and sorting 
capabilities.

* Example Velocity driven search UI at http://localhost:8983/solr/browse

* A new termvector-based highlighter

* Extended dismax (edismax) query parser which addresses some
  missing features in the dismax query parser along with some
  extensions.

* Several more components now support distributed mode:
  TermsComponent, SpellCheckComponent.

* A new Auto Suggest component.

* Ability to sort by functions.

* JSON document indexing

* CSV response format

* Apache UIMA integration for metadata extraction

* Leverages Lucene 3.1 and its inherent optimizations and bug fixes
  as well as new analysis capabilities.

* Numerous improvements, bug fixes, and optimizations.
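
For example, the new spatial filtering can be exercised from SolrJ
(a sketch against the 3.1 example setup; the "store" field and the
point/radius are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GeofiltDemo {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    // Point-radius filter, new in 3.1: keep documents within d km of pt.
    q.addFilterQuery("{!geofilt sfield=store pt=45.15,-93.85 d=5}");
    QueryResponse rsp = solr.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}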

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases.  It is possible that the mirror you are using may not 
have replicated the release yet.  If that is the case, please try another 
mirror.  This also goes for Maven access.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Brainstorming on Improving the Release Process

2011-03-31 Thread Upayavira


On Thu, 31 Mar 2011 09:51 -0400, Robert Muir rcm...@gmail.com wrote:
 On Thu, Mar 31, 2011 at 9:40 AM, Upayavira u...@odoko.co.uk wrote:
 
  Are you willing to say more? I have a little time, and have done a lot
  of work with Ant. Maybe I could help.
 
  Upayavira
 
 Thanks, there is some followup discussion on this JIRA issue:
 https://issues.apache.org/jira/browse/SOLR-2002
 
 The prototype patch I refer to in the comments, where the solr build
 system is changed to extend lucene's, is the latest _merged.patch on
 the issue:
 https://issues.apache.org/jira/secure/attachment/12456811/SOLR-2002_merged.patch
 
 (Additionally as sort of a followup there are more comments/ideas
 about additional things we could do besides just refactoring the build
 system to be faster and simpler)
 
 As a first step I think the patch needs to be brought up to trunk (it
 gets out of date fast). I mentioned on the issue we can simply create
 a branch to make coordination easier. A branch might seem silly for a
 thing like this, but it would at least allow us to work together and
 people could contribute parts (e.g. PMD integration or something)
 without having to juggle huge out of sync patches.

Thx. I'll take a look in the (uk) morning.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas

2011-03-31 Thread David Smiley (JIRA)
Add assertQScore() to SolrTestCaseJ4 to account for small deltas 
-

 Key: SOLR-2451
 URL: https://issues.apache.org/jira/browse/SOLR-2451
 Project: Solr
  Issue Type: Improvement
Affects Versions: Next
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2451_assertQScore.patch

Attached is a patch that adds the following method to SolrTestCaseJ4:  (just 
javadoc & signature shown)
{code:java}
  /**
   * Validates that the document at the specified index in the results has the 
specified score, within 0.0001.
   */
  public static void assertQScore(SolrQueryRequest req, int docIdx, float 
targetScore) {
{code}

This is especially useful for geospatial in which slightly different precision 
deltas might occur when different geospatial indexing strategies are 
used, assuming the score is some geospatial distance.  This patch makes a 
simple modification to DistanceFunctionTest to use it.
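
The method body isn't shown above; a minimal sketch of the delta comparison
such a helper boils down to (hypothetical, not the attached patch):

{code:java}
// The essence: a JUnit float comparison with an epsilon instead of
// exact equality, using the 0.0001 tolerance from the javadoc.
static void assertScoreWithin(float expectedScore, float actualScore) {
  final float DELTA = 0.0001f;
  org.junit.Assert.assertEquals("score off by more than " + DELTA,
      expectedScore, actualScore, DELTA);
}
{code}

The real helper presumably also pulls the docIdx-th score out of the request's
result DocList before comparing; that lookup is omitted here.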

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas

2011-03-31 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2451:
---

Attachment: SOLR-2451_assertQScore.patch

 Add assertQScore() to SolrTestCaseJ4 to account for small deltas 
 -

 Key: SOLR-2451
 URL: https://issues.apache.org/jira/browse/SOLR-2451
 Project: Solr
  Issue Type: Improvement
Affects Versions: Next
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2451_assertQScore.patch


 Attached is a patch that adds the following method to SolrTestCaseJ4:  (just 
 javadoc & signature shown)
 {code:java}
   /**
    * Validates that the document at the specified index in the results has
    * the specified score, within 0.0001.
    */
   public static void assertQScore(SolrQueryRequest req, int docIdx, float
       targetScore) {
 {code}
 This is especially useful for geospatial in which slightly different 
 precision deltas might occur when different geospatial indexing 
 strategies are used, assuming the score is some geospatial distance.  This 
 patch makes a simple modification to DistanceFunctionTest to use it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3006) Javadocs warnings should fail the build

2011-03-31 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-3006:


Attachment: LUCENE-3006.patch

Here's the patch I just committed.

 Javadocs warnings should fail the build
 ---

 Key: LUCENE-3006
 URL: https://issues.apache.org/jira/browse/LUCENE-3006
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.2, 4.0
Reporter: Grant Ingersoll
 Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, 
 LUCENE-3006.patch, LUCENE-3006.patch, LUCENE-3006.patch


 We should fail the build when there are javadocs warnings, as this should not 
 be the Release Manager's job to fix all at once right before the release.
 See 
 http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure

2011-03-31 Thread Simon Willnauer
I just committed a fix for this

simon

On Thu, Mar 31, 2011 at 5:28 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This one is weird; seems like there is a synchronized missing on
 FieldInfoBiMap#containsConsistent.

 I'll try to reproduce first.

 simon
 On Thu, Mar 31, 2011 at 11:37 AM, Apache Hudson Server
 hud...@hudson.apache.org wrote:
 Build: 
 https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/

 3 tests failed.
 REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

 Error Message:
 null

 Stack Trace:
 junit.framework.AssertionFailedError
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280)
        at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302)
        at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345)
        at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374)
        at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:165)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
        at 
 org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244)


 REGRESSION:  org.apache.lucene.index.TestSegmentTermDocs.test

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)
        at 
 org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45)


 REGRESSION:  
 org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)




 Build Log (for compile errors):
 [...truncated 3276 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-31 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2338.
---

   Resolution: Fixed
Fix Version/s: 4.0

Committed revision 1087430.

Thanks hoss and yonik for feedback.

 improved per-field similarity integration into schema.xml
 -

 Key: SOLR-2338
 URL: https://issues.apache.org/jira/browse/SOLR-2338
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2338.patch, SOLR-2338.patch, SOLR-2338.patch


 Currently since LUCENE-2236, we can enable Similarity per-field, but in 
 schema.xml there is only a 'global' factory
 for the SimilarityProvider.
 In my opinion this is too low-level because to customize Similarity on a 
 per-field basis, you have to set your own
 CustomSimilarityProvider with <similarity class="..."/> and manage the 
 per-field mapping yourself in java code.
 Instead I think it would be better if you just specify the Similarity in the 
 FieldType, like after <analyzer>.
 As far as the example, one idea from LUCENE-1360 was to make a "short_text" 
 or "metadata_text" used by the
 various metadata fields in the example that has better norm quantization for 
 its shortness...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2061) Generate jar containing test classes.

2011-03-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014188#comment-13014188
 ] 

Robert Muir commented on SOLR-2061:
---

I think this issue just needs the maven parts to be resynced to the fact that 
lucene's tests-framework jar was renamed?

 Generate jar containing test classes.
 -

 Key: SOLR-2061
 URL: https://issues.apache.org/jira/browse/SOLR-2061
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1
Reporter: Drew Farris
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, 
 SOLR-2061.patch


 Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate 
 and deploy a jar containing the test classes so other projects could write 
 unit tests using the framework in Solr. 
 This may take care of SOLR-717 as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2061) Generate jar containing test classes.

2011-03-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2061:
-

Assignee: Steven Rowe  (was: Robert Muir)

 Generate jar containing test classes.
 -

 Key: SOLR-2061
 URL: https://issues.apache.org/jira/browse/SOLR-2061
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1
Reporter: Drew Farris
Assignee: Steven Rowe
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, 
 SOLR-2061.patch


 Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate 
 and deploy a jar containing the test classes so other projects could write 
 unit tests using the framework in Solr. 
 This may take care of SOLR-717 as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6584 - Failure

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6584/

3 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleJettyTest.testCommitWithin

Error Message:
expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:365)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)


REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:183)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:242)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:216)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)
Caused by: java.util.concurrent.TimeoutException: Could not connect to 
ZooKeeper 127.0.0.1:16662/solr within 5000 ms
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:69)
at org.apache.solr.cloud.ZkController.<init>(ZkController.java:104)
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:164)


REGRESSION:  org.apache.solr.cloud.ZkControllerTest.testUploadToCloud

Error Message:
KeeperErrorCode = ConnectionLoss for /configs/config1/synonyms.txt

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /configs/config1/synonyms.txt
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:384)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:410)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:520)
at 
org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:191)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)




Build Log (for compile errors):
[...truncated 8836 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-31 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch

Attached is a new patch. The highlights are:
 * Requires the latest Solr trunk -- probably anything in the last few months: 
If this is ultimately going to get committed then this needed to happen.  There 
are only some slight differences so if you really need an earlier trunk then 
I'm sure you'll figure it out.
 * Adds support for sorting, including multi-value: Use the existing geodist() 
function query with a lat-lon constant and a reference to your geohash based 
field. Note that this works by loading all points from the field into memory, 
resolving each underlying full-length geohash into the lat  lon into a data 
structure which is a ListPoint2D[].  This is improved over Bill's patch, 
surely, but it could use some optimization.  It's not optimized for the 
single-value case either; that's a definite TODO.
 * Polygon/WKT features have been omitted due to LGPL licensing concerns of 
JTS. I've left hooks for their implementation to make it easy to add back this 
capability that already existed. You'll easily figure it out if you are so inclined.  
I might add this as a patch shortly (not to be committed) when I get some time; 
but longer term it will re-surface under a separate project.  Don't worry; 
it'll be painless to use if you need it.
 * This might be controversial but as part of this patch, I removed the 
ghhsin() and geohash() function queries. Their presence was confusing; I simply 
don't see what point there is to them now that this patch fleshes out the 
geohash capability.
 * I decided to pre-register my SpatialGeoHashFilterQParser as "geohashfilt", 
instead of requiring you to do so in solrconfig.xml.  You could use "geofilt" 
for point-radius queries, but I prefer this one since I can specify the bbox 
explicitly.

There are a few slight changes to GeoHashPrefixFilter that crept in from 
unfinished work (notably tying sorting to filtering in an efficient way), but they 
are harmless.

Bill, thanks for kick-starting the multi-value sorting. I re-used most of your 
code.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.
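
 (A toy sketch of the two cell operations this scheme rests on -- not the
 attached filter, just the geohash arithmetic described above:)
{code:java}
// Geohash uses this base-32 alphabet; appending one character splits a
// cell into 32 children (a 4x8 or 8x4 grid, alternating with depth).
public class GeoHashCells {
  static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

  /** The 32 child cells of a cell, one level finer. */
  static java.util.List<String> subdivide(String cell) {
    java.util.List<String> children = new java.util.ArrayList<String>(32);
    for (int i = 0; i < BASE32.length(); i++) {
      children.add(cell + BASE32.charAt(i));
    }
    return children;
  }

  /** Two cells overlap iff one geohash is a prefix of the other. */
  static boolean intersects(String a, String b) {
    return a.startsWith(b) || b.startsWith(a);
  }
}
{code}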

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6590 - Failure

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6590/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.ZkSolrClientTest.testConnect

Error Message:
Could not connect to ZooKeeper 127.0.0.1:39750/solr within 3 ms

Stack Trace:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:39750/solr within 3 ms
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:121)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:84)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:65)
at 
org.apache.solr.cloud.ZkSolrClientTest.testConnect(ZkSolrClientTest.java:43)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)




Build Log (for compile errors):
[...truncated 8701 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6591 - Still Failing

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6591/

1 tests failed.
REGRESSION:  org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator

Error Message:
expected:[,䀘䀌  䰁䨀€@ 䀀 ကࠀЀ] but was:[foobar]

Stack Trace:
at 
org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.assertEqualCollation(TestPerfTasksLogic.java:969)
at 
org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator(TestPerfTasksLogic.java:939)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)




Build Log (for compile errors):
[...truncated 6348 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6592 - Still Failing

2011-03-31 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6592/

1 tests failed.
FAILED:  org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator

Error Message:
expected:[,䀘䀌  䰁䨀€@ 䀀 ကࠀЀ] but was:[foobar]

Stack Trace:
at 
org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.assertEqualCollation(TestPerfTasksLogic.java:969)
at 
org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator(TestPerfTasksLogic.java:939)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149)




Build Log (for compile errors):
[...truncated 6359 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing

2011-03-31 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014355#comment-13014355
 ] 

Hoss Man commented on LUCENE-3009:
--

I question whether we really need to bother with binary lucene / module tar/zip 
artifacts -- if we only had source release packages, then the build.xml files 
make it clear exactly what the dependencies for each piece of code are.

For Solr, a large percentage of the user base doesn't know anything about java 
-- so it definitely makes sense to have artifacts with precompiled jars; but if 
you're using the java libraries directly, you're a java programmer, and you 
should be able to run ant compile on a src release (or use maven to fetch the 
published jars with poms that link to the appropriate dependencies)
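
For example, a downstream build could pull a contrib and its transitive 
dependencies straight from the repository; the coordinates below are 
illustrative, and whether a given contrib/version is actually published is an 
assumption to verify:

{noformat}
<!-- illustrative coordinates; verify the artifact and version exist -->
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-benchmark</artifactId>
  <version>3.1.0</version>
</dependency>
{noformat}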

 binary packaging: lucene modules/contribs that depend on jars are confusing 
 

 Key: LUCENE-3009
 URL: https://issues.apache.org/jira/browse/LUCENE-3009
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.2, 4.0


 In the binary release, I noticed lucene contribs (for example benchmark)
 that rely upon jar files don't include them, nor do they have a README telling
 you they depend upon them, nor is there any hint they actually have any
 dependencies at all!
 We should improve this either by including the jars you need or by including
 a README.txt telling you what a particular module/contrib depends upon.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing

2011-03-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014357#comment-13014357
 ] 

Robert Muir commented on LUCENE-3009:
-

When I brought up the idea of source code only, it didn't seem too popular.

That being said, if we go source code only, the maven stuff should be 
source-code only too.


 binary packaging: lucene modules/contribs that depend on jars are confusing 
 

 Key: LUCENE-3009
 URL: https://issues.apache.org/jira/browse/LUCENE-3009
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.2, 4.0


 In the binary release, I noticed lucene contribs (for example benchmark)
 that rely upon jar files don't include them, nor do they have a README telling
 you they depend upon them, nor is there any hint they actually have any
 dependencies at all!
 We should improve this either by including the jars you need or by including
 a README.txt telling you what a particular module/contrib depends upon.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing

2011-03-31 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014362#comment-13014362
 ] 

Steven Rowe commented on LUCENE-3009:
-

bq. the maven stuff should be source-code only too.

-1.  (mutually exclusive concepts)

 binary packaging: lucene modules/contribs that depend on jars are confusing 
 

 Key: LUCENE-3009
 URL: https://issues.apache.org/jira/browse/LUCENE-3009
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.2, 4.0


 In the binary release, I noticed lucene contribs (for example benchmark)
 that rely upon jar files don't include them, nor do they have a README telling
 you they depend upon them, nor is there any hint they actually have any
 dependencies at all!
 We should improve this either by including the jars you need or by including
 a README.txt telling you what a particular module/contrib depends upon.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-03-31 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-3003:


Attachment: byte_size_32-bit-openjdk6.txt

Attached: 32-bit results
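
For context, a minimal sketch of how such per-object sizes can be measured 
with a -javaagent (the class below is hypothetical; the attached numbers come 
from the instrumenter linked earlier in the thread):

{noformat}
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; requires Premain-Class set in the jar manifest.
public class SizeAgent {
  public static void premain(String args, Instrumentation inst) {
    for (int len = 0; len <= 17; len++) {
      // getObjectSize returns an implementation-specific approximation
      // of the storage consumed by the object, in bytes.
      System.out.println("byte[" + len + "] takes "
          + inst.getObjectSize(new byte[len]) + " bytes.");
    }
  }
}
{noformat}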

 Move UnInvertedField into Lucene core
 -

 Key: LUCENE-3003
 URL: https://issues.apache.org/jira/browse/LUCENE-3003
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3003.patch, LUCENE-3003.patch, 
 byte_size_32-bit-openjdk6.txt


 Solr's UnInvertedField lets you quickly look up all term ords for a
 given doc/field.
 Like FieldCache, it inverts the index to produce this, and creates a
 RAM-resident data structure holding the bits; but, unlike FieldCache,
 it can handle multiple values per doc, and it does not hold the term
 bytes in RAM.  Rather, it holds only term ords, and then uses
 TermsEnum to resolve ord -> term.
 This is great eg for faceting, where you want to use int ords for all
 of your counting, and then only at the end you need to resolve the
 top N ords to their text.
 I think this is a useful core functionality, and we should move most
 of it into Lucene's core.  It's a good complement to FieldCache.  For
 this first baby step, I just move it into core and refactor Solr's
 usage of it.
 After this, as separate issues, I think there are some things we could
 explore/improve:
   * The first pass that allocates lots of tiny byte[] looks like it
 could be inefficient.  Maybe we could use the byte slices from the
 indexer for this...
   * We can improve the RAM efficiency of the TermIndex: if the codec
 supports ords, and we are operating on one segment, we should just
 use it.  If not, we can use a more RAM-efficient data structure,
 eg an FST mapping to the ord.
   * We may be able to improve on the main byte[] representation by
 using packed ints instead of delta-vInt?
   * Eventually we should fold this ability into docvalues, ie we'd
 write the byte[] image at indexing time, and then loading would be
 fast, instead of uninverting
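
 As a sketch of the ord-based faceting pattern described above (self-contained
 and illustrative only; the real UnInvertedField API differs):
{noformat}
import java.util.Arrays;

public class OrdFacetSketch {
  public static void main(String[] args) {
    // stand-ins: per-doc term ords (multi-valued) and the ord -> term table
    int[][] docOrds = { {0, 2}, {1}, {2}, {0, 2, 3} };
    String[] ordToTerm = { "red", "green", "blue", "teal" };

    // all counting happens on cheap int ords
    int[] counts = new int[ordToTerm.length];
    for (int[] ords : docOrds) {
      for (int ord : ords) {
        counts[ord]++;
      }
    }

    // only the top-N ords are resolved back to term text (the costly step)
    Integer[] order = new Integer[counts.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, (a, b) -> counts[b] - counts[a]);
    for (int i = 0; i < 2; i++) {
      System.out.println(ordToTerm[order[i]] + " : " + counts[order[i]]);
    }
  }
}
{noformat}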

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2061) Generate jar containing test classes.

2011-03-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2061:
--

Attachment: SOLR-2061.patch

This patch brings the Maven aspects up to snuff. 

All tests pass under Ant and Maven.  {{generate-maven-artifacts}} generates the 
test-framework jars, and they are signed by {{sign-artifacts}}.

Unless there are objections, I'll commit this tomorrow, then backport to 
branch_3x.
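
Downstream projects would then consume the test classes along these lines; the 
coordinates below are assumptions for illustration, and the patch defines the 
actual artifact name and packaging:

{noformat}
<!-- assumed coordinates; see the patch for the real artifact/classifier -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-test-framework</artifactId>
  <version>3.2-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
{noformat}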

 Generate jar containing test classes.
 -

 Key: SOLR-2061
 URL: https://issues.apache.org/jira/browse/SOLR-2061
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1
Reporter: Drew Farris
Assignee: Steven Rowe
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, 
 SOLR-2061.patch, SOLR-2061.patch


 Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate 
 and deploy a jar containing the test classes so other projects could write 
 unit tests using the framework in Solr. 
 This may take care of SOLR-717 as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org