[Lucene.Net] [jira] [Updated] (LUCENENET-427) Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234)

2011-06-27 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy updated LUCENENET-427:
---

Attachment: FastVectorHighlighter.patch

 Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234)
 ---

 Key: LUCENENET-427
 URL: https://issues.apache.org/jira/browse/LUCENENET-427
 Project: Lucene.Net
  Issue Type: Improvement
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g
Reporter: Digy
Priority: Minor
 Fix For: Lucene.Net 2.9.4g

 Attachments: FastVectorHighlighter.patch


 https://issues.apache.org/jira/browse/LUCENE-3234

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Lucene.Net] [jira] [Resolved] (LUCENENET-427) Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234)

2011-06-27 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-427.


Resolution: Fixed

Committed 






[jira] [Commented] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055347#comment-13055347
 ] 

Koji Sekiguchi commented on LUCENE-3243:


Thank you for the proposal and patch! There are a few things I don't understand:

* What is the position offset? Isn't it just a position?
* Why is the position offset a String?
* Why do you need setPositionOffset()? I don't understand the implementation of 
the method... it appends the argument position to the current position.

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene
 Attachments: LUCENE-3243.patch.diff


 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs (term offsets) class only stores the start and end offsets.
 To get the position offset, I added the position offset information to the Toffs 
 and FieldPhraseList classes.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055364#comment-13055364
 ] 

Koji Sekiguchi commented on LUCENE-1889:


Patch looks really good!

bq. To handle RangeQuery, you'd need to add another such data structure: it 
would probably be best to introduce some new abstraction to represent all of 
these query-proxies.

Would you like to try this one? :)

bq. It seemed a less useful case to me anyway since we don't usually use range 
queries in the context of full text; more often they come up in structured 
metadata? Curious if you have requests for that?

I don't have a requirement for highlighting range queries, or even wildcard, 
prefix, and regexp queries, because I'm using FVH to highlight terms in N-gram 
fields, and these MultiTermQueries are not ideal for N-grams. But if FVH could 
cover range queries, it would be nicer for users.

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1889.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?




[jira] [Commented] (LUCENE-826) Language detector

2011-06-27 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055374#comment-13055374
 ] 

Jan Høydahl commented on LUCENE-826:


Reviving this issue - it would be interesting to arrive at a proposal on whether 
this code could replace Tika's existing languageIdentifier. We still need to 
solve the case of small texts. I'm thinking of a hybrid solution where we fall 
back to a dictionary-based detector for small texts, e.g. based on Ooo dictionaries.

 Language detector
 -

 Key: LUCENE-826
 URL: https://issues.apache.org/jira/browse/LUCENE-826
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Karl Wettin
Assignee: Karl Wettin
 Attachments: ld.tar.gz, ld.tar.gz


 A formula 1A token/ngram-based language detector. Requires a paragraph of 
 text to avoid false positive classifications. 
 Depends on contrib/analyzers/ngrams for tokenization, Weka for classification 
 (logistic support vector models), feature selection, and normalization of token 
 frequencies.  Optionally Wikipedia and NekoHTML for training data harvesting.
 Initialized like this:
 {code}
 LanguageRoot root = new LanguageRoot(new File("documentClassifier/language root"));
 root.addBranch("uralic");
 root.addBranch("fino-ugric", "uralic");
 root.addBranch("ugric", "uralic");
 root.addLanguage("fino-ugric", "fin", "finnish", "fi", "Suomi");
 root.addBranch("proto-indo european");
 root.addBranch("germanic", "proto-indo european");
 root.addBranch("northern germanic", "germanic");
 root.addLanguage("northern germanic", "dan", "danish", "da", "Danmark");
 root.addLanguage("northern germanic", "nor", "norwegian", "no", "Norge");
 root.addLanguage("northern germanic", "swe", "swedish", "sv", "Sverige");
 root.addBranch("west germanic", "germanic");
 root.addLanguage("west germanic", "eng", "english", "en", "UK");
 root.mkdirs();
 LanguageClassifier classifier = new LanguageClassifier(root);
 if (!new File(root.getDataPath(), "trainingData.arff").exists()) {
   classifier.compileTrainingData(); // from wikipedia
 }
 classifier.buildClassifier();
 {code}
 The training set built from Wikipedia consists of the pages describing the home 
 country of each registered language, in the language to train. The above example 
 passes this test:
 (testEquals is the same as assertEquals, just not required. Only one of them 
 fails, see comment.)
 {code}
 assertEquals("swe", classifier.classify(sweden_in_swedish).getISO());
 testEquals("swe", classifier.classify(norway_in_swedish).getISO());
 testEquals("swe", classifier.classify(denmark_in_swedish).getISO());
 testEquals("swe", classifier.classify(finland_in_swedish).getISO());
 testEquals("swe", classifier.classify(uk_in_swedish).getISO());
 testEquals("nor", classifier.classify(sweden_in_norwegian).getISO());
 assertEquals("nor", classifier.classify(norway_in_norwegian).getISO());
 testEquals("nor", classifier.classify(denmark_in_norwegian).getISO());
 testEquals("nor", classifier.classify(finland_in_norwegian).getISO());
 testEquals("nor", classifier.classify(uk_in_norwegian).getISO());
 testEquals("fin", classifier.classify(sweden_in_finnish).getISO());
 testEquals("fin", classifier.classify(norway_in_finnish).getISO());
 testEquals("fin", classifier.classify(denmark_in_finnish).getISO());
 assertEquals("fin", classifier.classify(finland_in_finnish).getISO());
 testEquals("fin", classifier.classify(uk_in_finnish).getISO());
 testEquals("dan", classifier.classify(sweden_in_danish).getISO());
 // it is ok that this fails. dan and nor are very similar, and the
 // document about norway in danish is very small.
 testEquals("dan", classifier.classify(norway_in_danish).getISO());
 assertEquals("dan", classifier.classify(denmark_in_danish).getISO());
 testEquals("dan", classifier.classify(finland_in_danish).getISO());
 testEquals("dan", classifier.classify(uk_in_danish).getISO());
 testEquals("eng", classifier.classify(sweden_in_english).getISO());
 testEquals("eng", classifier.classify(norway_in_english).getISO());
 testEquals("eng", classifier.classify(denmark_in_english).getISO());
 testEquals("eng", classifier.classify(finland_in_english).getISO());
 assertEquals("eng", classifier.classify(uk_in_english).getISO());
 {code}
 I don't know how well it works on lots of languages, but this fits my needs 
 for now. I'll try to do more work on considering the language trees when 
 classifying.
 It takes a bit of time and RAM to build the training data, so the patch 
 contains a pre-compiled arff-file.


[jira] [Commented] (SOLR-2614) stats with pivot

2011-06-27 Thread pengyao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055396#comment-13055396
 ] 

pengyao commented on SOLR-2614:
---

Could somebody help me or give me some suggestions?

Or is it easy to patch?

Thanks very much.

 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.0


  Is it possible to get stats (like Stats Component: min ,max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in client application to do all combinations of facets values
 will be to slow because there is a lot of combinations.
  Thanks a lot!
 This is very important, because counts alone are not enough in some cases.
 Please add stats with pivot in Solr 4.0.
 Thanks a lot




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055402#comment-13055402
 ] 

Toke Eskildsen commented on LUCENE-3079:


This is quite a different design from the quarter-baked one I've proposed with 
SOLR-2412 (which is really just a thin wrapper around LUCENE-2369). While 
maintaining a sidecar index makes the workflow more complicated, I would expect 
it to be beneficial for re-open speed and scalability.

Technical note: For hierarchical faceting, I find that it is possible to avoid 
storing all levels in the hierarchy. By maintaining two numbers for each tag, 
denoting the tag's level and the level of the previous tag that matches, only 
the relevant tags need to be indexed (full explanation at 
https://sbdevel.wordpress.com/2010/10/05/fast-hierarchical-faceting/).
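The two-number encoding described above can be sketched like this (purely illustrative; the class and method names are my own and not part of SOLR-2412 or LUCENE-2369): for each tag, in sorted path order, store its level and the number of leading path elements it shares with the previous tag.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two-number encoding from the linked blog post: for each
// hierarchical tag (e.g. "A/B/C"), store (level, prevMatchLevel), where
// prevMatchLevel is the number of leading path elements shared with the
// previously indexed tag. Only the concrete paths are stored; the
// intermediate levels are implied by the match counts.
public class TagEncoder {
    public static List<int[]> encode(List<String> sortedPaths) {
        List<int[]> out = new ArrayList<>();
        String[] prev = new String[0];
        for (String path : sortedPaths) {
            String[] parts = path.split("/");
            int match = 0;
            while (match < parts.length && match < prev.length
                    && parts[match].equals(prev[match])) {
                match++;
            }
            out.add(new int[] { parts.length, match });
            prev = parts;
        }
        return out;
    }
}
```

For the tags A/B/C, A/B/D, A/E this yields (3,0), (3,2), (2,1), from which a counter can reconstruct the hierarchy without A or A/B ever being indexed as separate tags.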

Kudos for contributing solid code. I am looking forward to seeing the patch.

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Resolved] (LUCENE-3231) Add fixed size DocValues int variants & expose Arrays where possible

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3231.
-

   Resolution: Fixed
 Assignee: Simon Willnauer
Lucene Fields: [New, Patch Available]  (was: [New])

 Add fixed size DocValues int variants & expose Arrays where possible
 

 Key: LUCENE-3231
 URL: https://issues.apache.org/jira/browse/LUCENE-3231
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3231.patch, LUCENE-3231.patch


 Currently we only have a variable bit-packed ints implementation. For flexible 
 scoring or loading field caches it is desirable to have fixed int 
 implementations for 8, 16, 32 and 64 bits. 




[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-27 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055437#comment-13055437
 ] 

Koji Sekiguchi commented on SOLR-2583:
--

I'd like the feature as I'm using ExternalFileField a lot!

bq. what do you say regarding the suggestion to use HashMap up to ~5.5% and 
above that using the float[]?

Looking at your test, I think it is reasonable. But I'd like to use 
CompactByteArray. I saw it win over HashMap and float[] at 5% and above in 
my test.

How about introducing compact=yes (default is no and float[] is used) with 
sparse=yes/no/auto?
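The density-based switch being discussed could look roughly like this (a sketch only: the class name, the exact placement of the ~5% threshold, and the zero default for missing docs are my assumptions, and CompactByteArray is left out):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the Solr FileFloatSource code): use a sparse
// map below a density threshold, a dense float[] above it. The ~5%
// threshold comes from the measurements discussed in the comments.
public class ExternalScores {
    private static final double DENSE_THRESHOLD = 0.05;
    private final float[] dense;              // used when many docs have scores
    private final Map<Integer, Float> sparse; // used when few docs have scores

    public ExternalScores(Map<Integer, Float> scores, int maxDoc) {
        if ((double) scores.size() / maxDoc >= DENSE_THRESHOLD) {
            dense = new float[maxDoc];
            for (Map.Entry<Integer, Float> e : scores.entrySet()) {
                dense[e.getKey()] = e.getValue();
            }
            sparse = null;
        } else {
            dense = null;
            sparse = new HashMap<>(scores);
        }
    }

    public float get(int docId) {
        if (dense != null) return dense[docId];
        Float v = sparse.get(docId);
        return v == null ? 0f : v;
    }

    public boolean isDense() { return dense != null; }
}
```

A compact=yes / sparse=auto configuration, as suggested above, would amount to choosing which branch of the constructor runs.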

 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring consumes a lot of memory, depending on the number of documents 
 in the index. The ExternalFileField (used for external scoring) uses 
 FileFloatSource, where one FileFloatSource is created per external scoring 
 file. FileFloatSource creates a float array with the size of the number of 
 docs (this is also done if the file to load is not found). If there are far 
 fewer entries in the scoring file than there are docs in total, the big float 
 array wastes a lot of memory.
 This could be optimized by using a map of doc → score, so that the map 
 contains only as many entries as there are scoring entries in the external 
 file, but not more.




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055443#comment-13055443
 ] 

Shai Erera commented on LUCENE-3079:


Thanks Toke for the pointer. I think it's very interesting. We've actually 
explored in the past storing just the category/leaf, instead of the entire 
hierarchy, in the document. The search response time was much slower than what 
I reported above (nearly a 2x slowdown). While storing the entire hierarchy 
indeed consumes more space, it performs better at search time; we figure 
that space today is cheap, and search apps are usually more interested in 
faster search response times and willing to spend some more time at the 
indexing and analysis stages.

Nevertheless, the link you provided proposes an interesting way to manage the 
hierarchy, and I think it's worth exploring at some point. Could be that it 
will perform better than how we managed it when we indexed just the leaf 
category for each document. We'd also need to see how to update the taxonomy on 
the go. For example, it describes that for A/B/C you know that its level is 3 
(that's easy) and that the previous category/tag that matches (P) is A. But 
what if at some point A/B is added to a document? What happens to the data 
indexed for the doc with A/B/C, whose previous matching category is now A/B? 
It's not clear to me, but it could be that I've missed the description in the 
proposal.

I am very close to uploading the patch. Hopefully I'll upload it by the end of 
my day.





[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055452#comment-13055452
 ] 

Robert Muir commented on LUCENE-1889:
-

{quote}
A possible issue is that regex support will differ from RegexpQuery, but I 
think? that Java's is a superset, so should be ok, but I'm not sure about this 
one.
{quote}

Actually, these are totally different syntaxes!

An alternative way to flatten these multitermqueries could be to implement 
o.a.l.index.Terms with what is in the term vector... then you could rewrite 
them with their own code.

trying to generate an equivalent string pattern could be a little problematic, 
for example wildcard supports escaped terms (and could contain other characters 
that are java.util.regex syntax characters but not wildcard syntax characters), 
the regex syntax is different, etc.
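To make the escaping pitfall concrete, here is a sketch of the string-pattern conversion being cautioned against, for a wildcard pattern with '*', '?', and '\' escapes (an illustrative helper, not a Lucene API): every literal character has to be regex-quoted, and even then this only covers wildcard syntax, not the differing regex syntax.

```java
import java.util.regex.Pattern;

// Sketch of the "equivalent string pattern" approach the comment warns
// about: converting a wildcard pattern to java.util.regex. Every literal
// character must be quoted, because characters like '.' or '+' are regex
// metacharacters but plain text in wildcard syntax.
public class WildcardToRegex {
    public static Pattern convert(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append(".");
            } else if (c == '\\' && i + 1 < wildcard.length()) {
                // escaped wildcard character: treat the next char literally
                re.append(Pattern.quote(String.valueOf(wildcard.charAt(++i))));
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }
}
```

Matching against a union of automata (or going the Terms/rewriteMethod route), as suggested above, avoids this translation entirely.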

if you still decide you want to do it this way though, i would use 
o.a.l.util.automaton instead of java.util.regex? Besides being faster, this is 
internally what these queries are using anyway, so you can convert them with 
for example WildcardQuery.toAutomaton(). Then, union these and match against 
the union'ed machine instead of a List.

But personally i would look at going the Terms/rewriteMethod route if possible, 
this way all multitermqueries will just work.






RE: [VOTE] release 3.3 (take two)

2011-06-27 Thread Uwe Schindler
Hi,

This time, all is fine:

- Smoke tests with tests.iter=100,tests.multiplicator=100,tests.nightly on Sun 
Java 1.5.0_22 Solaris-64 passed for Lucene-Core (using Lucene-Src package, so 
even compiles fine). Lucene-Contrib failed somehow, but with iter=1 passed. One 
core dump on testing contrib/analysis-common/ (Portuguese), seems Java 5 bug 
happens sometimes (unfortunately log & hs_err is gone), not reproducible. So 
all fine, don't want to accuse Java 5 - but policeman is angry and wants an 
expensive ticket + driver license removed from Java 5 :-)
- All signatures are fine, signatures are all from Robert Muir who I know 
personally: find . -name '*.asc' | xargs -L1 gpg --verify
- Artifact META-INFs are correct, versions and revno are correct
- Lucene-core-3.3.0.jar from Maven plugged into PANGAEA worked successfully 
without recompiling. No Hotspot issues - MMap is fine
- Extracted Solr src package, ant test with iter=1 and multiplicator=100 and 
nightly passes from root folder (includes lucene tests)
- Extracted Lucene and Solr binary packages and checked contents for 
completeness (Licenses, Javadocs,...) - fine! Solr.WAR file contains correct 
artifact versions; unfortunately the Lucene jar files are not available in the 
dist folder inside the Solr binary - is that intended?

Small issue:

- systemrequirements.html: The JUNIT version as requirement is ahm, ah, huho 
very old. We should remove that in future, as JUnit is bundled with src package.

So here is my PMC +1 !

Uwe
- Generics|Java5|Signature|Manifest Policeman with PMC vote -

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Sunday, June 26, 2011 5:12 PM
 To: dev@lucene.apache.org
 Subject: [VOTE] release 3.3 (take two)
 
 Artifacts here:
 
 http://s.apache.org/lusolr330rc1
 
 working release notes here:
 
 http://wiki.apache.org/lucene-java/ReleaseNote33
 http://wiki.apache.org/solr/ReleaseNote33
 
 To see the changes between the previous release candidate (rc0):
 svn diff -r 1139028:1139775
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
 
 Here is my +1
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org






[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055480#comment-13055480
 ] 

Toke Eskildsen commented on LUCENE-3079:


SOLR-2412/LUCENE-2369 were created with the trade-offs of (relatively) long 
startup, low memory, and high performance: When the index is (re)opened, the 
hierarchy is analyzed by iterating the terms (it could be offloaded to 
index-time, but it is still iterate-the-entire-term-list after each change). 
This does not play well with real-time, but should be a nice fit for large 
indexes with low update rate.

As for speed, my theory is that the sparser hierarchy (only the concrete paths) 
wins due to less counting, but without another solution to compare against it 
has so far remained a theory. There are some measurements at 
https://sbdevel.wordpress.com/2010/10/11/hierarchical-faceting/ but I find that 
for hierarchical faceting, small changes to test-setups can easily have vast 
implications on performance, so they are not comparable to your 
million-document test.





[jira] [Updated] (LUCENE-3217) Improve DocValues merging

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3217:


Attachment: LUCENE-3217.patch

here is a patch for int variant. All fixed int variants are merged without 
loading them into memory and bulk merged if no deleted docs are present.
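The idea can be sketched in isolation (names here are illustrative, not Lucene's actual merge API): each segment's values are streamed straight to the merged output, skipping deleted documents, so no segment's values are ever buffered in memory.

```java
import java.util.List;
import java.util.function.IntConsumer;

public class DocValuesMergeSketch {
    // Stream per-segment values to the merged output one document at a time,
    // skipping deleted docs; a null deletions array means the segment can be
    // copied through on the fast "bulk" path.
    static void merge(List<int[]> segmentValues, List<boolean[]> deletedDocs, IntConsumer out) {
        for (int s = 0; s < segmentValues.size(); s++) {
            int[] values = segmentValues.get(s);
            boolean[] deleted = deletedDocs.get(s);
            for (int doc = 0; doc < values.length; doc++) {
                if (deleted == null || !deleted[doc]) {
                    out.accept(values[doc]);  // write-through; nothing is accumulated
                }
            }
        }
    }
}
```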

 Improve DocValues merging
 -

 Key: LUCENE-3217
 URL: https://issues.apache.org/jira/browse/LUCENE-3217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3217.patch


 Some DocValues impl. still load all values from merged segments into memory 
 during merge. For efficiency we should merge them on the fly without 
 buffering in memory




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055506#comment-13055506
 ] 

Michael McCandless commented on LUCENE-3179:


Patch looks good Uwe -- thanks!

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179-long-ntz.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, 
 LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .
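A minimal standalone sketch of the operation over a raw long[] bitset (the real OpenBitSet implementation differs in details such as bounds handling): mask off the bits above the starting index, then scan words downward for the highest remaining set bit.

```java
public class PrevSetBitSketch {
    // Returns the index of the highest set bit <= index, or -1 if none exists.
    static int prevSetBit(long[] words, int index) {
        int w = index >> 6;                                   // word containing 'index'
        long word = words[w] & (-1L >>> (63 - (index & 63))); // keep only bits at or below 'index'
        while (true) {
            if (word != 0) {
                return (w << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
            if (--w < 0) {
                return -1;                                    // no set bit at or below 'index'
            }
            word = words[w];
        }
    }
}
```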




[jira] [Commented] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055510#comment-13055510
 ] 

Michael McCandless commented on LUCENE-3240:


Looks great Chris!

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch, LUCENE-3240.patch, LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055526#comment-13055526
 ] 

Chris Male commented on LUCENE-3079:


Great contribution Shai.  What about putting it into a branch? I think it 
really does need a thorough review before we put it into trunk.

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055532#comment-13055532
 ] 

Shai Erera commented on LUCENE-3079:


We can put it in a branch for trunk, in case we plan to refactor the code right 
away (at first I just thought to get it to compile against trunk). I thought 
that at first people would like to get hands on experience with it, before we 
discuss changes and refactoring. I mean, this code can really be released with 
Lucene's next 3x release. And since everything is @lucene.experimental, and is 
in its own separate contrib/module, I don't think a branch will ease off the 
review or refactoring process?

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Issue Comment Edited] (LUCENE-3079) Facetiing module

2011-06-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055532#comment-13055532
 ] 

Shai Erera edited comment on LUCENE-3079 at 6/27/11 1:24 PM:
-

We can put it in a branch for trunk, in case we plan to refactor the code right 
away (at first I just thought to get it to compile against trunk). I thought 
that at first people would like to get hands on experience with it, before we 
discuss changes and refactoring. I mean, this code can really be released with 
Lucene's next 3x release. And since everything is @lucene.experimental, and is 
in its own separate contrib/module, I don't think a branch will ease off the 
review or refactoring process?

I guess what I'm aiming for is for our users to get this feature soon. And I'm 
afraid that putting it in a branch will only delay it.

  was (Author: shaie):
We can put it in a branch for trunk, in case we plan to refactor the code 
right away (at first I just thought to get it to compile against trunk). I 
thought that at first people would like to get hands on experience with it, 
before we discuss changes and refactoring. I mean, this code can really be 
released with Lucene's next 3x release. And since everything is 
@lucene.experimental, and is in its own separate contrib/module, I don't think 
a branch will ease off the review or refactoring process?
  
 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-06-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055544#comment-13055544
 ] 

Michael McCandless commented on LUCENE-1536:


bq. My question: Do we really need to make the delDocs inverse in this issue?

I agree, let's break this (inverting delDocs/skipDocs) into a new issue and do 
it first, then come back to this issue.  There's still more work to do here, eg 
the bits should be stored inverted too (and the sparse encoding flipped).

bq.  The method name getNotDeletedDocs() should also be getVisibleDocs() or 
similar [I don't like double negation].

+1 for getVisibleDocs -- I also don't like double negation!

bq. In general, reversing the delDocs might be a good idea, but we should do it 
separate and hard (not allow both variants implemented by IndexReader & Co.).

I agree it must be hard cutover -- no more getDelDocs, and getVisibleDocs is 
abstract in IR.

bq. About the impls: FieldCacheRangeFilter can also implement getBits() 
directly as FieldCache is random access. It should just return an own Bits impl 
for the DocIdSet that checks the filtering in get(index).

Ahh, right: FCRF has no trouble being random access, and it can re-use the 
already created matchDoc in the subclasses.
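Since a field cache is just a per-document array, the random-access case can be sketched standalone (names are illustrative, not Lucene's actual API): the filter returns a Bits view whose get() evaluates the range predicate lazily per document, with no iterator involved.

```java
public class RandomAccessFilterSketch {
    // Stand-in for the random-access Bits abstraction discussed above.
    interface Bits {
        boolean get(int docId);
    }

    // A Bits view over cached per-document values that checks the range
    // predicate in get(), mirroring how a FieldCache-backed range filter
    // could expose getBits() directly.
    static Bits rangeBits(int[] cachedValues, int lowerInclusive, int upperInclusive) {
        return docId -> cachedValues[docId] >= lowerInclusive
                     && cachedValues[docId] <= upperInclusive;
    }
}
```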

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
  AND 3 AND 4.  "u s" means "united states" (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method "high" means I use random-access filter API in
 IndexSearcher's main loop.  Method "low" means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055546#comment-13055546
 ] 

Mike Sokolov commented on LUCENE-1889:
--

Robert: Thanks that sounds like good advice. I wasn't completely happy with 
that Pattern list anyway; really still just feeling my way around Lucene and 
trying random things at this point a bit.  I wonder if you could comment on 
this possible other idea, following up on Mike M's quote above:

I tried hacking up SpanScorer to see if I could get positions out of it using a 
custom Collector, but found that by the time a doc was reported, SpanScorer had 
already iterated over and dropped the positions.  I was thinking of adding a 
Collector.collectSpans(int start, int end), and having SpanScorer call it (it 
would be an empty function in Collector proper) or something like that.  At 
this point I'm wondering if it might be possible to rewrite many queries as 
some kind of SpanQuery (using a visitor), without the need to actually alter 
all the Query implementations.  Is there a better way?

I was also thinking it might be possible to capture and re-use positions 
gathered during the initial scoring episode rather than having to re-score 
during highlighting, but I guess that's a separate issue.

Koji: Thanks for the review, but it sounds like some more iteration is needed 
here; for sure on RegExpQuery.  I probably should have tested that a bit more 
carefully, although the one thing I tried (character classes) seems to work the 
same.

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1889.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?




[jira] [Updated] (LUCENE-3079) Facetiing module

2011-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3079:


Attachment: LUCENE-3079.patch

just some trivial test modifications so the tests work with an unmodified 
LuceneTestCase:

* in some cases, if an assertion failed it would print the seed... but LTC does 
this.
* in other tests, the test wanted to repeat a random sequence, but instead of 
exposing LTC internals, the test just grabs random.nextLong, makes a new Random 
from this, and then resets it with .setSeed.
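The repeatable-random idiom described above can be shown in a few lines: draw a seed from the shared Random, build a local Random from it, and call setSeed with the same value to replay an identical sequence, all without exposing test-framework internals.

```java
import java.util.Random;

public class RepeatableRandomSketch {
    // Draws 'count' values, rewinds via setSeed, and draws them again;
    // the two halves of the returned array are identical by construction.
    static long[] twoRuns(Random source, int count) {
        long seed = source.nextLong();      // capture a seed without touching internals
        Random local = new Random(seed);
        long[] values = new long[count * 2];
        for (int i = 0; i < count; i++) values[i] = local.nextLong();
        local.setSeed(seed);                // rewind: the sequence repeats exactly
        for (int i = 0; i < count; i++) values[count + i] = local.nextLong();
        return values;
    }
}
```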


 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055570#comment-13055570
 ] 

Robert Muir commented on LUCENE-3079:
-

{quote}
I guess what I'm aiming for is for our users to get this feature soon. And I'm 
afraid that putting it in a branch will only delay it.
{quote}

+1

My suggestion:
# commit to branch 3.x with @experimental.
# next, do a fast port to trunk; this doesn't mean heavy refactoring to take 
advantage of things like docvalues, just get it working correctly on trunk's 
APIs.
# finally, close this issue and do improvements as normal, backporting 
whichever ones are easy and make sense, like any other issue.


 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch, LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055580#comment-13055580
 ] 

Robert Muir commented on LUCENE-1889:
-

Hi Mike, Simon has an issue open to make a lot of what you are talking about 
wrt positions easier:
LUCENE-2878

In my opinion once LUCENE-2878 is resolved, we may want to then consider adding 
the capability for a codec to encode the offset deltas in parallel with the 
positions (so its just a stream of delta-encoded integers you read in parallel 
with the positions for things like highlighting). 

Then, highlighting would not require term vectors anymore right? I think this 
would be much faster and more efficient without the space waste of term 
vectors, and we could prototype such a thing by encoding these ourselves into 
the payloads... which is close to the same, but I think ultimately optionally 
supporting offsets this way will be better especially with block-oriented 
compression algorithms.
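The encoding itself is straightforward; this is an illustrative sketch only (no such codec exists yet): start offsets are delta-encoded into a parallel stream of small gaps that a highlighter could read alongside the positions, instead of loading full term vectors.

```java
public class OffsetDeltaSketch {
    // Delta-encode absolute start offsets into a parallel stream of gaps,
    // which are small and friendly to block-oriented compression.
    static int[] encodeDeltas(int[] startOffsets) {
        int[] deltas = new int[startOffsets.length];
        int previous = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            deltas[i] = startOffsets[i] - previous;
            previous = startOffsets[i];
        }
        return deltas;
    }

    // Decode by accumulating the gaps back into absolute offsets.
    static int[] decodeDeltas(int[] deltas) {
        int[] offsets = new int[deltas.length];
        int previous = 0;
        for (int i = 0; i < deltas.length; i++) {
            previous += deltas[i];
            offsets[i] = previous;
        }
        return offsets;
    }
}
```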



 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1889.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?




[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #161: POMs out of sync

2011-06-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/161/

No tests ran.

Build Log (for compile errors):
[...truncated 14056 lines...]






[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055581#comment-13055581
 ] 

Robert Muir commented on LUCENE-1536:
-

{quote}
+1 for getVisibleDocs – I also don't like double negation!
{quote}

I agree... getVisibleDocs() or another alternative would be getLiveDocs()

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
  AND 3 AND 4.  "u s" means "united states" (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method "high" means I use random-access filter API in
 IndexSearcher's main loop.  Method "low" means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-3217) Improve DocValues merging

2011-06-27 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055582#comment-13055582
 ] 

Simon Willnauer commented on LUCENE-3217:
-

I am going to commit this part of the patch soon if nobody objects.

 Improve DocValues merging
 -

 Key: LUCENE-3217
 URL: https://issues.apache.org/jira/browse/LUCENE-3217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3217.patch


 Some DocValues impl. still load all values from merged segments into memory 
 during merge. For efficiency we should merge them on the fly without 
 buffering in memory




[jira] [Updated] (LUCENE-3079) Facetiing module

2011-06-27 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-3079:
---

Attachment: LUCENE-3079-dev-tools.patch

Thanks Robert for the fix. This indeed looks better than patching LTC!

Patch for dev-tools only, this time w/ Maven 
support too. I hope it works well :).

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
 LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055584#comment-13055584
 ] 

Shai Erera commented on LUCENE-3079:


{quote}
My suggestion:

1. commit to branch 3.x with @experimental.
2. next, do a fast port to trunk, this doesnt mean heavy refactoring to take 
advantage of things like docvalues, just get it working correctly on trunk's 
APIs.
3. finally, close this issue and do improvements as normal, backporting 
whichever ones are easy and make sense, like any other issue.
{quote}

I agree. I'll give it a day or two before I commit, unless everyone agrees it 
can be committed today, in which case I'll happily press the button :).

 Facetiing module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079-dev-tools.patch, LUCENE-3079.patch, 
 LUCENE-3079.patch, LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-27 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Added the information-based model framework of Clinchant and Gaussier.

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:




[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055599#comment-13055599
 ] 

Mike Sokolov commented on LUCENE-1889:
--

Ah, I see - that's awesome, thanks, had no idea.  Yeah - I had been thinking 
about matching positions-offsets using the existing term vectors, which was 
going to be kind of unpleasant; you have to iterate by term, which you don't 
care about, and scan for a matching position.

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1889.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?
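The flatten()-via-getQuery() idea in the bullet above can be sketched without Lucene; all class names here are simplified stand-ins for the real contrib/highlighter and query types, not the actual API:

```java
import java.util.ArrayList;
import java.util.List;

abstract class Query {}

class TermQuery extends Query {
  final String term;
  TermQuery(String term) { this.term = term; }
}

// Stand-in for Lucene's FilteredQuery: wraps an inner query plus a filter.
class FilteredQuery extends Query {
  private final Query inner;
  FilteredQuery(Query inner) { this.inner = inner; }
  Query getQuery() { return inner; }
}

class Flattener {
  // Unwrap FilteredQuery via getQuery() and recurse, so the inner
  // query still reaches the highlighter; keep leaf queries as-is.
  static void flatten(Query q, List<Query> out) {
    if (q instanceof FilteredQuery) {
      flatten(((FilteredQuery) q).getQuery(), out);
    } else {
      out.add(q);
    }
  }
}
```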




[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055603#comment-13055603
 ] 

Robert Muir commented on LUCENE-1889:
-

well I think Simon might be looking for feedback on LUCENE-2878, which would 
allow you to get at the positions and corresponding payloads.

So as an experiment close to what you describe, you could play with his patch, 
make a TokenFilter that copies whatever offset info highlighting needs into the 
payload (OffsetAsPayloadFilter or something), and try to make a quick-n-dirty 
highlighter that uses it?

It would be interesting to see what the performance is like from this versus 
the term vectors, besides working with all queries :)
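The offset-into-payload idea could look roughly like this, stripped of the actual Lucene TokenFilter/attribute machinery. This sketches only the payload encoding a hypothetical OffsetAsPayloadFilter would perform per token; the real filter, attribute, and payload classes are assumed:

```java
// Pack a token's start/end character offsets into an 8-byte payload
// (big-endian), the way a TokenFilter could before indexing, so a
// positions-based highlighter can recover offsets without term vectors.
class OffsetPayload {
  static byte[] encode(int start, int end) {
    byte[] b = new byte[8];
    for (int i = 0; i < 4; i++) {
      b[i] = (byte) (start >>> (24 - 8 * i));      // bytes 0-3: start offset
      b[4 + i] = (byte) (end >>> (24 - 8 * i));    // bytes 4-7: end offset
    }
    return b;
  }

  static int[] decode(byte[] b) {
    int start = 0, end = 0;
    for (int i = 0; i < 4; i++) {
      start = (start << 8) | (b[i] & 0xFF);
      end = (end << 8) | (b[4 + i] & 0xFF);
    }
    return new int[] { start, end };
  }
}
```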

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1889.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: CustomSolrHighlighter.java

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene
 Attachments: CustomSolrHighlighter.java, LUCENE-3243.patch.diff


 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs(Term offsets) class only stores the start and end offset.
 To get the position offset, I added the position offset information in Toffs 
 and FieldPhraseList class.




[jira] [Created] (LUCENE-3246) Invert IR.getDelDocs - IR.getLiveDocs

2011-06-27 Thread Michael McCandless (JIRA)
Invert IR.getDelDocs -> IR.getLiveDocs
--

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


Spinoff from LUCENE-1536, where we need to fix the low level filtering
we do for deleted docs to match Filters (ie, a set bit means the doc
is accepted) so that filters can be pushed all the way down to the
enums when possible/appropriate.

This change also inverts the meaning of the first arg to
TermsEnum.docs/AndPositions (renamed from skipDocs to liveDocs).





[jira] [Commented] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055610#comment-13055610
 ] 

Jahangir Anwari commented on LUCENE-3243:
-

Hi Koji,

Sorry for not elaborating more on our requirements and our implementation. 
Basically, for every search result we needed the position (word offset) 
information of the search hits in the document. On the search result page, this 
position offset information was embedded in the search result links. When the 
user clicked on a search link, we would highlight the search terms at the 
target page using JavaScript and the position offset information.

To return the position offset information along with the highlighted snippet we 
created a CustomSolrHighlighter (attached). Depending on the type of query, the 
custom highlighter returns the position offset information:

# Non-phrase query: Using FieldTermStack we return the term position offset for 
the terms in the query.
# Phrase query: Using the WeightedFragInfo.fragInfos we return the term 
position offset for the terms in the query.

But currently the Toffs (term offsets) class only stores the start and end 
offset, so we updated it to store the position information as well.

Answers to your questions:

* *What is the position offset? Isn't it just a position?*
Yes, it is just the position.

* *Why is the position offset a String?*
Since for phrase queries (e.g. "divine knowledge") the position-gap between 
terms == 1, WeightedPhraseInfo would only store the startOffset (i.e. 12) of the 
first term of the phrase and the endOffset (i.e. 29) of the last term.
{code} 

[startOffset, endOffset]
divine knowledge: [(12,29)]
{code}
But as we needed the position information (i.e. 5,6) of all the terms, it required 
storing the positions of the terms of a phrase query as a String. 
{code}
[startOffset, endOffset, positions]
divine knowledge: [(12,29, [5,6])]

{code}
* *Why do you need setPositionOffset()?*
setPositionOffset() is used to store the positions of consecutive terms of a 
phrase query. For every term of the phrase query it just appends the argument 
position to the current positions (i.e. [5,6]). 
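A minimal sketch of what the extended Toffs might look like, dependency-free; the field and method names follow the comment above but are assumptions, not the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

// Toffs normally holds only start/end character offsets of a match;
// this sketch additionally accumulates the term positions of a phrase.
class Toffs {
  private final int startOffset;
  private final int endOffset;
  private final List<Integer> positions = new ArrayList<>();

  Toffs(int startOffset, int endOffset, int firstPosition) {
    this.startOffset = startOffset;
    this.endOffset = endOffset;
    positions.add(firstPosition);
  }

  // Appends the next term's position, e.g. 6 after 5 for "divine knowledge".
  void setPositionOffset(int position) {
    positions.add(position);
  }

  // Renders like "(12,29, [5, 6])", mirroring the example above.
  @Override
  public String toString() {
    return "(" + startOffset + "," + endOffset + ", " + positions + ")";
  }
}
```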

P.S. In order to be able to override the doHighlightingByFastVectorHighlighter() 
method in CustomSolrHighlighter we had to change the access modifiers of 
alternateField() and getSolrFragmentsBuilder() to protected.

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene
 Attachments: CustomSolrHighlighter.java, LUCENE-3243.patch.diff


 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs(Term offsets) class only stores the start and end offset.
 To get the position offset, I added the position offset information in Toffs 
 and FieldPhraseList class.




[jira] [Updated] (LUCENE-3246) Invert IR.getDelDocs - IR.getLiveDocs

2011-06-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3246:
---

Attachment: LUCENE-3246.patch

Initial patch, pulled out of LUCENE-1536, plus 1) renamed IR.getNotDeletedDocs 
to IR.getLiveDocs, and 2) fixed IR to force subclasses to override this 
(removing getDeletedDocs).

I think this is close, but the one thing remaining is to fix the IR impls to 
properly invert their del docs (now they create a NotDocs wrapper around 
their current bitsets).
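The "NotDocs wrapper" mentioned above can be as simple as an inverting decorator over a Bits-style random-access bitset. The Bits interface below is a simplified stand-in for Lucene's org.apache.lucene.util.Bits; the class name is an assumption:

```java
// Random-access bit view, as used for deleted/live docs.
interface Bits {
  boolean get(int index);
  int length();
}

// Wraps a "deleted docs" Bits and presents it as "live docs":
// a set bit now means the doc is accepted, matching Filter semantics.
final class NotBits implements Bits {
  private final Bits in;

  NotBits(Bits in) {
    this.in = in;
  }

  @Override
  public boolean get(int index) {
    return !in.get(index);  // invert: deleted -> not live
  }

  @Override
  public int length() {
    return in.length();
  }
}
```

As the comment notes, this wrapper is a stopgap; properly inverting the stored bitsets avoids the extra indirection per lookup.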

 Invert IR.getDelDocs -> IR.getLiveDocs
 --

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3246.patch


 Spinoff from LUCENE-1536, where we need to fix the low level filtering
 we do for deleted docs to match Filters (ie, a set bit means the doc
 is accepted) so that filters can be pushed all the way down to the
 enums when possible/appropriate.
 This change also inverts the meaning of the first arg to
 TermsEnum.docs/AndPositions (renamed from skipDocs to liveDocs).




[jira] [Issue Comment Edited] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-27 Thread Jahangir Anwari (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055610#comment-13055610
 ] 

Jahangir Anwari edited comment on LUCENE-3243 at 6/27/11 4:01 PM:
--

Hi Koji,

Sorry for not elaborating more on our requirements and our implementation. 
Basically, for every search result we needed the position (word offset) 
information of the search hits in the document. On the search result page, this 
position offset information was embedded in the search result links. When the 
user clicked on a search link, we would highlight the search terms at the 
target page using JavaScript and the position offset information.

To return the position offset information along with the highlighted snippet we 
created a CustomSolrHighlighter (attached). Depending on the type of query, the 
custom highlighter returns the position offset information:

# Non-phrase query: Using FieldTermStack we return the term position offset for 
the terms in the query.
# Phrase query: Using the WeightedFragInfo.fragInfos we return the term 
position offset for the terms in the query.

But currently the Toffs (term offsets) class only stores the start and end 
offset, so we updated it to store the position information as well.

Answers to your questions:

* *What is the position offset? Isn't it just a position?*
Yes, it is just the position.

* *Why is the position offset a String?*
Since for phrase queries (e.g. "divine knowledge") the position-gap between 
terms == 1, WeightedPhraseInfo would only store the startOffset (i.e. 12) of the 
first term of the phrase and the endOffset (i.e. 29) of the last term.
{code} 

[startOffset, endOffset]
divine knowledge: [(12,29)]
{code}
But as we needed the position information (i.e. 5,6) of all the terms, it required 
storing the positions of the terms of a phrase query as a String. 
{code}
[startOffset, endOffset, positions]
divine knowledge: [(12,29, [5,6])]

{code}
* *Why do you need setPositionOffset()?*
setPositionOffset() is used to store the positions of consecutive terms of a 
phrase query. For every term of the phrase query it just appends the argument 
position to the current positions (i.e. [5,6]). 

Example output:

{code}
<lst name="/book/title/pg15">
  <arr name="para">
    <str>un of <strong class="highlight">divine knowledge</strong> and
understanding, and become the recipients of a grace that is infinite and </str>
  </arr>
  <str name="positionOffsets">80,81,118,119</str>
</lst>
{code}


P.S. In order to be able to override the doHighlightingByFastVectorHighlighter() 
method in CustomSolrHighlighter we had to change the access modifiers of 
alternateField() and getSolrFragmentsBuilder() to protected.


RE: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #161: POMs out of sync

2011-06-27 Thread Steven A Rowe
This was the same misspelled common module problem.  I should have run both 
'ant generate-maven-artifacts' *and* 'mvn install' when I committed the 
(partial) fix last time...

Anyway, again I've committed a fix.  Going over to Jenkins now to run the trunk 
maven build again.  13th try's the charm?

 -Original Message-
 From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
 Sent: Monday, June 27, 2011 10:37 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #161: POMs out of sync
 
 Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/161/
 
 No tests ran.
 
 Build Log (for compile errors):
 [...truncated 14056 lines...]
 
 
 



[jira] [Assigned] (LUCENE-3247) Update CompoundFile format on the website

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3247:
---

Assignee: Simon Willnauer

 Update CompoundFile format on the website
 -

 Key: LUCENE-3247
 URL: https://issues.apache.org/jira/browse/LUCENE-3247
 Project: Lucene - Java
  Issue Type: Task
  Components: general/website
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0


 Since we changed the compound file format recently, we should update the website 
 accordingly.




[jira] [Created] (LUCENE-3247) Update CompoundFile format on the website

2011-06-27 Thread Simon Willnauer (JIRA)
Update CompoundFile format on the website
-

 Key: LUCENE-3247
 URL: https://issues.apache.org/jira/browse/LUCENE-3247
 Project: Lucene - Java
  Issue Type: Task
  Components: general/website
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0


Since we changed the compound file format recently, we should update the website 
accordingly.




[jira] [Updated] (LUCENE-3247) Update CompoundFile format on the website

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3247:


Attachment: LUCENE-3247.patch

Here is a patch.

 Update CompoundFile format on the website
 -

 Key: LUCENE-3247
 URL: https://issues.apache.org/jira/browse/LUCENE-3247
 Project: Lucene - Java
  Issue Type: Task
  Components: general/website
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3247.patch


 Since we changed the compound file format recently, we should update the website 
 accordingly.




Updating the website

2011-06-27 Thread Simon Willnauer
hey folks,

I tried to update the website yesterday and ran into some problems
with permissions etc. I talked to the infra guys, who helped me
fix it. Yet, the fact that we are relying on Grant's cron job bugs me a
little. It seems that we are doing things not the Apache way, where you
just go into
people.apache.org:/www/lucene.apache.org and run svn update. We still
export stuff from certain svn paths into that directory via
/home/gsingers/bin/exportLuceneDocs.sh
I wonder if we can achieve the same thing by using something like svn
externals (maybe I am wrong here), or whether we should change the layout
of the website so we can simply run svn update there?

Simon




[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3216:


Attachment: LUCENE-3216.patch

Next iteration, this time fixing most of the Byte variants to only write/open 
one file at a time. The Straight variants are still missing.

 Store DocValues per segment instead of per field
 

 Key: LUCENE-3216
 URL: https://issues.apache.org/jira/browse/LUCENE-3216
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3216.patch, LUCENE-3216_floats.patch


 Currently we store docvalues per field, which results in at least one 
 file per field that uses docvalues (or at most two per field per segment, 
 depending on the impl.). Yet we should try to pack docvalues into 
 a single file by default if possible. To enable this we need to hold all 
 docvalues in memory during indexing and write them to disk once we flush a segment. 




[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-27 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055649#comment-13055649
 ] 

Simon Willnauer commented on LUCENE-2793:
-

bq. Should IOContext and MergeInfo be in oal.store not .index?
+1

bq. I think SegmentMerger should receive an IOCtx from its caller, and
Yeah, I think we should pass the IOContext in via the ctor. Yet, for 
IW#addIndexes you can simply build a best-effort IOContext like:
{code}
int numDocs = 0;
for (IndexReader indexReader : readers) {
  numDocs += indexReader.numDocs();
}
final IOContext context = new IOContext(new MergeInfo(numDocs, -1, true, false));
{code}

bq. I think on flush IOContext should include num docs and estimated
+1, I think that is good, no?

bq. Somehow, lucene/contrib/demo/data is deleted on the branch. We should check 
if anything else is missing!
oh man... I will check

You use new IOContext(Context.FLUSH) and new IOContext(Context.READ) in your 
patch, but we have some statics like IOContext.READ; maybe we need FLUSH too?

for the tests I think we should start randomizing the IOContext. I think you 
should add a newIOContext(Random random) to LuceneTestCase and get the context 
from there in a unit test. At the end of the day we should see the same behavior 
whatever context you pass in, right?

simon
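The proposed newIOContext(Random) helper could look roughly like this sketch (IOContext is stood in for by a plain enum here; the real class carries more state):

```java
import java.util.Random;

// Hypothetical sketch of the proposed LuceneTestCase#newIOContext helper:
// pick a random context so tests exercise every code path, on the premise
// that behavior must be identical whatever context is passed in.
class IOContextRandomizer {
    enum Context { DEFAULT, READ, FLUSH, MERGE }

    static Context newIOContext(Random random) {
        Context[] all = Context.values();
        return all[random.nextInt(all.length)];
    }
}
```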






 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
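 The IOContext idea sketched in this description could look roughly like the following (hypothetical class and values, not the committed API): one value object carrying the buffer size plus access-pattern hints, passed to createOutput/openInput instead of a bare readBufferSize.

```java
// Hypothetical sketch of the IOContext concept described above.
class IOContextSketch {
    enum Hint { DEFAULT, SEQUENTIAL, DIRECT }

    final int bufferSize;
    final Hint hint;
    final boolean isMerge;

    IOContextSketch(int bufferSize, Hint hint, boolean isMerge) {
        this.bufferSize = bufferSize;
        this.hint = hint;
        this.isMerge = isMerge;
    }

    // Merges read large sequential chunks, so use a bigger buffer
    // and hint sequential access (enabling DIRECT/SEQUENTIAL dirs).
    static IOContextSketch forMerge() {
        return new IOContextSketch(64 * 1024, Hint.SEQUENTIAL, true);
    }

    // Searches do small random reads; keep the buffer small.
    static IOContextSketch forSearch() {
        return new IOContextSketch(1024, Hint.DEFAULT, false);
    }
}
```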




[jira] [Created] (LUCENE-3248) In BufferedIndexInput to cleanup the bufferSize variable passed down to it as the default bufferSize(BUFFER_SIZE) is always used

2011-06-27 Thread Varun Thacker (JIRA)
In BufferedIndexInput to cleanup the bufferSize variable passed down to it as 
the default bufferSize(BUFFER_SIZE) is always used


 Key: LUCENE-3248
 URL: https://issues.apache.org/jira/browse/LUCENE-3248
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 4.0, IOContext branch
Reporter: Varun Thacker


After adding IOContext (LUCENE-2793) we can optimize the sizes of all the 
buffers accordingly. This patch would clean up all the unused bufferSize 
variables.




[jira] [Resolved] (LUCENE-3247) Update CompoundFile format on the website

2011-06-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3247.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

committed to trunk and backported to 3.x

 Update CompoundFile format on the website
 -

 Key: LUCENE-3247
 URL: https://issues.apache.org/jira/browse/LUCENE-3247
 Project: Lucene - Java
  Issue Type: Task
  Components: general/website
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3247.patch


 since we changed the compound file format lately we should update the website 
 accordingly




[VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Simon Willnauer
This issue has been discussed on various occasions and lately on
LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

The main reasons for this have been discussed on the issue but let me
put them out here too:

- Lack of testing on Jenkins with Java 5
- Java 5 reached end of life a long time ago, so it is totally
unmaintained, which means for us that bugs have to either be
hacked around, tests disabled, or warnings placed, but some things simply
cannot be fixed... we cannot actually support something that is no
longer maintained: we do find JRE bugs
(http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
that bugs actually get fixed: we cannot do everything with hacks.
- due to Java 5 we have legitimate performance hits like 20% slower grouping speed.

For reference please read through the issue mentioned above.

A lot of the committers seem to be on the same page here to drop Java
5 support so I am calling out an official vote.

All Lucene 3.x releases will remain with Java 5 support; this vote is
for trunk only.


Here is my +1

Simon




[jira] [Updated] (LUCENE-3246) Invert IR.getDelDocs - IR.getLiveDocs

2011-06-27 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3246:
--

Attachment: LUCENE-3246-IndexSplitters.patch

Hi Mike,

some work for you: I removed the nocommits in both contrib IndexSplitters. Now 
only NotBits usage in core is left over, right?

 Invert IR.getDelDocs - IR.getLiveDocs
 --

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3246-IndexSplitters.patch, LUCENE-3246.patch


 Spinoff from LUCENE-1536, where we need to fix the low level filtering
 we do for deleted docs to match Filters (ie, a set bit means the doc
 is accepted) so that filters can be pushed all the way down to the
 enums when possible/appropriate.
 This change also inverts the meaning of the first arg to
 TermsEnum.docs/AndPositions (renames it from skipDocs to liveDocs).




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Dawid Weiss
My big +1.

D.

On Mon, Jun 27, 2011 at 7:38 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 [...]





RE: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Uwe Schindler
My +1 for trunk :-)

I will change hudson scripts once this vote passes!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
 Sent: Monday, June 27, 2011 7:38 PM
 To: dev@lucene.apache.org
 Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
 
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 
 [...]






RE: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Steven A Rowe
+1

 -Original Message-
 From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
 Sent: Monday, June 27, 2011 1:38 PM
 To: dev@lucene.apache.org
 Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
 
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 
 [...]



[jira] [Commented] (LUCENE-3246) Invert IR.getDelDocs - IR.getLiveDocs

2011-06-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055683#comment-13055683
 ] 

Michael McCandless commented on LUCENE-3246:


Awesome, thanks Uwe!  I'll work on SR cutting over to live docs on disk...

 Invert IR.getDelDocs - IR.getLiveDocs
 --

 Key: LUCENE-3246
 URL: https://issues.apache.org/jira/browse/LUCENE-3246
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3246-IndexSplitters.patch, LUCENE-3246.patch


 Spinoff from LUCENE-1536, where we need to fix the low level filtering
 we do for deleted docs to match Filters (ie, a set bit means the doc
 is accepted) so that filters can be pushed all the way down to the
 enums when possible/appropriate.
 This change also inverts the meaning of the first arg to
 TermsEnum.docs/AndPositions (renames it from skipDocs to liveDocs).




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Yonik Seeley
+1, never thought I'd see the day ;-)

-Yonik
http://www.lucidimagination.com


On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 [...]






Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Dawid Weiss
 +1, never thought I'd see the day ;-)

We should run Mike's super-duper graphical visualizations of
average/stddev query speeds for:

- latest SUN 1.5,
- trunk with latest SUN 1.6,
- trunk with latest SUN 1.6 after upgrades to use 1.6-specific
infrastructure (Arrays.copyOf, bit fiddling intrinsics).

This would be interesting and maybe inspiring for folks still willing
to keep 1.5 support in place ;)

Dawid

P.S. s/SUN/Oracle/g...




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Mark Miller
+1

On Jun 27, 2011, at 1:38 PM, Simon Willnauer wrote:

 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 
 [...]
 

- Mark Miller
lucidimagination.com












Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Ryan McKinley
+1



On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 [...]






[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-27 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Here's a cut with a first implementation of the CSLM and AIA terms 
dictionaries.  

I think we're ready to benchmark writes.

 Realtime terms dictionary
 -

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3245.patch, LUCENE-3245.patch, LUCENE-3245.patch


 For LUCENE-2312 we need a realtime terms dictionary.  While 
 ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
 overhead which can impact GC collection times and heap memory usage.  
 If we implement a skip list that uses primitive backing arrays, we can 
 hopefully have a data structure that is [as] fast and memory efficient.
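 The memory argument above (compact backing arrays instead of one node object per entry) can be illustrated with a minimal hypothetical sketch. This is not the patch's skip list, and String[] is used here for brevity where the patch would use primitive arrays; a binary search stands in for the skip-list probe.

```java
import java.util.Arrays;

// Illustration only: terms kept in two parallel arrays instead of a
// ConcurrentSkipListMap node per term, trading object overhead for
// compact storage that is friendlier to the GC.
class PackedSortedTerms {
    private final String[] terms;   // sorted term texts
    private final long[] ords;      // parallel payload, e.g. term ordinals

    PackedSortedTerms(String[] sortedTerms, long[] ords) {
        this.terms = sortedTerms;
        this.ords = ords;
    }

    // Binary search over the packed array; a skip list would probe
    // its levels here instead.
    long lookup(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? ords[idx] : -1;
    }
}
```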




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9121 - Failure

2011-06-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9121/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestAddIndexes.testAddIndexesWithRollback

Error Message:
file 1.fnx was already written to

Stack Trace:
java.io.IOException: file 1.fnx was already written to
        at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:347)
        at org.apache.lucene.index.SegmentInfos.writeGlobalFieldMap(SegmentInfos.java:817)
        at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:305)
        at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:813)
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3789)
        at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2649)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2720)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1074)
        at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2041)
        at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1964)
        at org.apache.lucene.index.TestAddIndexes.testAddIndexesWithRollback(TestAddIndexes.java:929)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)




Build Log (for compile errors):
[...truncated 3426 lines...]






[jira] [Commented] (SOLR-2619) two sfields in geospatial search

2011-06-27 Thread jose rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055720#comment-13055720
 ] 

jose rodriguez commented on SOLR-2619:
--

Hi David, thanks for your reply.

When I said it works for me, it is because I had tried hundreds of other 
possibilities without success.

I was trying to run it all from q=_query_:{} _query_:{}, with very long 
queries, etc.

If I understand correctly, I could have both {!geofilt} clauses in q?

Is there a better way to do my query, q={!geofilt 
sfield=location_1}&fq={!geofilt sfield=location_2}, in this case?

Thanks.



 two sfields in geospatial search
 

 Key: SOLR-2619
 URL: https://issues.apache.org/jira/browse/SOLR-2619
 Project: Solr
  Issue Type: Wish
  Components: clients - php
Affects Versions: 3.2
 Environment: Using with drupal
Reporter: jose rodriguez
 Fix For: 3.2


 Is it possible to create a query with two sfields (geospatial search)? I 
 mean two different pt and d values for each field.
 If I need a from - to range, then I need fields around the from coordinate 
 and around the to coordinates.
 Thanks.




[jira] [Updated] (LUCENE-2341) explore morfologik integration

2011-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Dybizbański updated LUCENE-2341:
---

Attachment: morfologik-polish-1.5.2.jar
morfologik-stemming-1.5.2.jar
morfologik-fsa-1.5.2.jar
LUCENE-2341.diff

Dawid, as you suggested, I've changed the interface to MorfologikAnalyzer and 
MorfologikFilter to account for the changes in Morfologik 1.5.2, namely the 
multiple dictionaries.
Both those classes' constructors now accept a PolishStemmer.DICTIONARY (instead 
of a languageCode String as in the previous patch). A PolishStemmer object is 
instantiated by MorfologikFilter, so each invocation of 
MorfologikAnalyzer.createComponents (which instantiates MorfologikFilter) is 
coupled with an individual instance of PolishStemmer.
This way, sharing a MorfologikAnalyzer between separate threads is safe (even 
though MorfologikFilter itself isn't thread-safe), provided each thread obtains 
its own TokenStreamComponents through ReusableAnalyzerBase.createComponents (is 
this always the case? looking at other filters, they don't look thread-safe 
either...)


 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, 
 morfologik-fsa-1.5.2.jar, morfologik-polish-1.5.2.jar, 
 morfologik-stemming-1.5.0.jar, morfologik-stemming-1.5.2.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.




[jira] [Issue Comment Edited] (LUCENE-2341) explore morfologik integration

2011-06-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055729#comment-13055729
 ] 

Michał Dybizbański edited comment on LUCENE-2341 at 6/27/11 8:19 PM:
-

Dawid, as you suggested, I've changed the interface to MorfologikAnalyzer and 
MorfologikFilter to account for the changes in Morfologik 1.5.2, namely the 
multiple dictionaries.
Both those classes' constructors now accept a PolishStemmer.DICTIONARY (instead 
of a languageCode String as in the previous patch). A PolishStemmer object is 
instantiated by MorfologikFilter, so each invocation of 
MorfologikAnalyzer.createComponents (which instantiates MorfologikFilter) is 
coupled with an individual instance of PolishStemmer.
This way, sharing a MorfologikAnalyzer between separate threads is safe (even 
though MorfologikFilter itself isn't thread-safe), provided each thread obtains 
its own TokenStreamComponents through ReusableAnalyzerBase.createComponents (is 
this always the case? looking at other filters, they don't look thread-safe 
either...)


  was (Author: michcio):
David, as you suggested, I've changed the interface to MorfologikAnalyzer 
and MorfologikFilter to account for the changes in Morfologik 1.5.2, namely the 
multiple dictionaries.
Both those classes' constructors now accept a PolishStemmer.DICTIONARY (instead 
of languageCode String as in previous patch). A PolishStemmer object is 
instantiated by MorfologikFilter, so each invocation of 
MorfologikAnalyzer.createComponents (which instantiates MorfologikFilter) is 
coupled with an individual instance of PolishStemmer.
This way, sharing a MorfologikAnalyzer by separate threads is safe (even though 
MorfologikFilter itself isn't thread-safe) provided each thread obtains its own 
TokenStreamComponents through ReusableAnalyzerBase.createComponents (is this 
always the case ? looking at other filters, thay don't look thread-safe neither 
..)

  
 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, 
 morfologik-fsa-1.5.2.jar, morfologik-polish-1.5.2.jar, 
 morfologik-stemming-1.5.0.jar, morfologik-stemming-1.5.2.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.




[jira] [Updated] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-27 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-3228:
-

Attachment: LUCENE-3228.patch

rmuir: here's a rough patch showing how the linkoffline stuff works (as far 
as i understand it anyway).

some quick testing didn't turn up any problems, but i didn't test the 
modules/contribs usage of invoke-javadoc.

there may be cleanup we want to do too - for now i avoided adding more sys 
properties for the package-list dirs, but maybe we want them? i dunno. there's 
also some existing instances of the link tag that look totally bogus and 
broken (see the WTF comments i added), but i didn't test what changes if i 
remove them.

this patch should also fix SOLR-2439 (use relative links for lucene jdocs from 
solr jdocs).

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-3228.patch


 Currently, we fail on all javadocs warnings.
 However, you get a warning if it cannot download the package-list from sun.com
 So I think we should allow you optionally set a sysprop using linkoffline.
 Then we would get much less hudson fake failures
 I feel like Mike opened an issue for this already but I cannot find it.




[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-27 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055737#comment-13055737
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. Looking at your test, I think it is reasonable. But I'd like to use 
CompactByteArray. I saw it wins over HashMap and float[] when 5% and above in 
my test.

Can you share your test code or something similar? Perhaps you can just fork 
https://github.com/magro/lucene-solr/ and add an appropriate test that reflects 
your data?

 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring eats a lot of memory, depending on the number of documents 
 in the index. The ExternalFileField (used for external scoring) uses 
 FileFloatSource, where one FileFloatSource is created per external scoring 
 file. FileFloatSource creates a float array with the size of the number of 
 docs (this is also done if the file to load is not found). If there are far 
 fewer entries in the scoring file than there are docs in total, the 
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc to score, so that the map 
 contains as many entries as there are scoring entries in the external file, 
 but not more.
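 The map-based alternative described above can be sketched as follows (hypothetical class, not the attached patch): store only the docs that actually have an external score, and fall back to a default for everything else, so memory scales with the scoring file rather than with maxDoc.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sparse doc -> score storage instead of a dense
// float[maxDoc]. Memory is proportional to the number of entries in
// the external scoring file, not the index size.
class SparseScores {
    private final Map<Integer, Float> scores = new HashMap<>();
    private final float defaultScore;

    SparseScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    void put(int docId, float score) {
        scores.put(docId, score);
    }

    // Docs absent from the external file get the default score.
    float get(int docId) {
        return scores.getOrDefault(docId, defaultScore);
    }
}
```

 The trade-off is per-entry boxing overhead in HashMap; the thread also discusses CompactByteArray-style structures that avoid it once the entry density grows past a few percent.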




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Jan Høydahl
+1

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 27. juni 2011, at 19.38, Simon Willnauer wrote:

 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 
 The main reasons for this have been discussed on the issue but let me
 put them out here too:
 
 - Lack of testing on Jenkins with Java 5
 - Java 5 reached its end of life a long time ago, so Java 5 is
 totally unmaintained. For us that means bugs have to either be
 hacked around, tests disabled, or warnings placed, but some things simply
 cannot be fixed... we cannot actually support something that is no
 longer maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 that bugs actually get fixed: we cannot do everything with hacks.
 - due to Java 5 we take legitimate performance hits like 20% slower grouping speed.
 
 For reference please read through the issue mentioned above.
 
 A lot of the committers seem to be on the same page here to drop Java
 5 support, so I am calling out an official vote.
 
 All Lucene 3.x releases will remain with Java 5 support; this vote is
 for trunk only.
 
 
 Here is my +1
 
 Simon
 



Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Michael McCandless
+1

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 The main reasons for this have been discussed on the issue but let me
 put them out here too:

 - Lack of testing on Jenkins with Java 5
 - Java 5 reached its end of life a long time ago, so Java 5 is
 totally unmaintained. For us that means bugs have to either be
 hacked around, tests disabled, or warnings placed, but some things simply
 cannot be fixed... we cannot actually support something that is no
 longer maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 that bugs actually get fixed: we cannot do everything with hacks.
 - due to Java 5 we take legitimate performance hits like 20% slower grouping speed.

 For reference please read through the issue mentioned above.

 A lot of the committers seem to be on the same page here to drop Java
 5 support, so I am calling out an official vote.

 All Lucene 3.x releases will remain with Java 5 support; this vote is
 for trunk only.


 Here is my +1

 Simon




[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9126 - Failure

2011-06-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9126/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestAddIndexes.testAddIndexesWithRollback

Error Message:
MockDirectoryWrapper: cannot close: there are still open files: {_co.cfs=1}

Stack Trace:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 
open files: {_co.cfs=1}
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:483)
at 
org.apache.lucene.index.TestAddIndexes$RunAddIndexesThreads.closeDir(TestAddIndexes.java:693)
at 
org.apache.lucene.index.TestAddIndexes.testAddIndexesWithRollback(TestAddIndexes.java:924)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
Caused by: java.lang.RuntimeException: unclosed IndexOutput: _co.cfs
at 
org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:410)
at 
org.apache.lucene.store.MockCompoundFileDirectoryWrapper.init(MockCompoundFileDirectoryWrapper.java:39)
at 
org.apache.lucene.store.MockDirectoryWrapper.createCompoundOutput(MockDirectoryWrapper.java:439)
at 
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:128)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3101)
at 
org.apache.lucene.index.TestAddIndexes$CommitAndAddIndexes3.doBody(TestAddIndexes.java:839)
at 
org.apache.lucene.index.TestAddIndexes$RunAddIndexesThreads$1.run(TestAddIndexes.java:667)




Build Log (for compile errors):
[...truncated 6497 lines...]






[jira] [Commented] (SOLR-2619) two sfields in geospatial search

2011-06-27 Thread jose rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055788#comment-13055788
 ] 

jose rodriguez commented on SOLR-2619:
--

Excuse my English, David. What I meant is that I didn't find a way to 
put it all into q= without using fq.

I was reading about the possibility of writing it using nested queries: 
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

But everything I tried was without success.

And if it is possible to use nested queries, is it better in this case than my 
option using fq?

I'm a newbie with Solr.

 two sfields in geospatial search
 

 Key: SOLR-2619
 URL: https://issues.apache.org/jira/browse/SOLR-2619
 Project: Solr
  Issue Type: Wish
  Components: clients - php
Affects Versions: 3.2
 Environment: Using with drupal
Reporter: jose rodriguez
 Fix For: 3.2


 Is it possible to create a query with two sfields (geospatial search)? That 
 is, with two different pt and d values for each field.
 If I need a from - to search, then I need fields around the from coordinate 
 and around the to coordinate.
 Thanks.




[jira] [Commented] (SOLR-2366) Facet Range Gaps

2011-06-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055793#comment-13055793
 ] 

Hoss Man commented on SOLR-2366:


bq. Guess my main point with the examples was to suggest that a 
facet.range.spec should not require facet.range.start and facet.range.end, but 
that the first and last values in the spec list should be taken as start and 
end, instead of requiring start and end in addition. ...

bq. Simply document that facet.range.spec is mutually exclusive to the 
parameters gap,start,end and other.

I respect your argument, but I think if this new spec param is going to be 
mutually exclusive of facet.range.other as well as all of the existing 
mandatory facet.range params (facet.range.gap, facet.range.start, and 
facet.range.end), then it seems like what you're describing really shouldn't be 
an extension of facet.range at all ... it sounds like it should be some completely 
distinct type of faceting (sequence faceting?) with its own params and 
section in the response.  ie...

{noformat}
facet.seq=fieldName
f.fieldName.facet.seq.spec=0,5,25,50,100,200,400,*
f.fieldName.facet.seq.include=edge
{noformat}

(where facet.seq.include has the same semantics as facet.range.include ... except I 
don't think edge makes sense at all w/o the other param concept ... need to 
think it through more)

Otherwise it could get really confusing for users trying to understand what 
facet.range.* params do/don't make sense if they start using facet.range.gap 
and then switch to facet.range.spec (or vice-versa) ... ie: how come I'm not 
getting the before/after ranges when I use 
'facet.range.spec=0,5,25,50&facet.range.other=after'?



 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma-separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates 
 different-sized buckets.  If the buckets don't evenly divide up the space, 
 then the size of the last bucket specified is used to fill out the 
 remaining space (not sure on this).
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400
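The proposed semantics can be illustrated with a small boundary computation. This is a hypothetical sketch of the behavior described above (names invented, not Solr code): each gap in the list sizes one bucket, the last gap repeats once the list is exhausted, and the final bucket is clipped at facet.range.end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of variable-width facet.range.gap values:
// boundaries(0, 400, {5,25,50,100}) produces the bucket edges
// 0, 5, 30, 80, 180, 280, 380, 400 from the example in the issue.
class FacetRangeGaps {
    static List<Long> boundaries(long start, long end, long[] gaps) {
        List<Long> bounds = new ArrayList<>();
        bounds.add(start);
        long current = start;
        int i = 0;
        while (current < end) {
            // Reuse the last gap once the list runs out.
            long gap = gaps[Math.min(i, gaps.length - 1)];
            // Clip the final bucket at the range end.
            current = Math.min(current + gap, end);
            bounds.add(current);
            i++;
        }
        return bounds;
    }
}
```

With a single-element gap list this degenerates to the current fixed-gap behavior, which matches the backwards-compatibility requirement stated above.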




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Sanne Grinovero
+1

Sanne

2011/6/27 Michael McCandless luc...@mikemccandless.com:
 +1

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 The main reasons for this have been discussed on the issue but let me
 put them out here too:

 - Lack of testing on Jenkins with Java 5
 - Java 5 reached its end of life a long time ago, so Java 5 is
 totally unmaintained. For us that means bugs have to either be
 hacked around, tests disabled, or warnings placed, but some things simply
 cannot be fixed... we cannot actually support something that is no
 longer maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 that bugs actually get fixed: we cannot do everything with hacks.
 - due to Java 5 we take legitimate performance hits like 20% slower grouping 
 speed.

 For reference please read through the issue mentioned above.

 A lot of the committers seem to be on the same page here to drop Java
 5 support, so I am calling out an official vote.

 All Lucene 3.x releases will remain with Java 5 support; this vote is
 for trunk only.


 Here is my +1

 Simon




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Chris Male
+1

On Tue, Jun 28, 2011 at 10:04 AM, Sanne Grinovero sanne.grinov...@gmail.com
 wrote:

 +1

 Sanne

 2011/6/27 Michael McCandless luc...@mikemccandless.com:
  +1
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
  simon.willna...@googlemail.com wrote:
  This issue has been discussed on various occasions and lately on
  LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 
  The main reasons for this have been discussed on the issue but let me
  put them out here too:
 
  - Lack of testing on Jenkins with Java 5
  - Java 5 reached its end of life a long time ago, so Java 5 is
  totally unmaintained. For us that means bugs have to either be
  hacked around, tests disabled, or warnings placed, but some things simply
  cannot be fixed... we cannot actually support something that is no
  longer maintained: we do find JRE bugs
  (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
  that bugs actually get fixed: we cannot do everything with hacks.
  - due to Java 5 we take legitimate performance hits like 20% slower grouping
  speed.
 
  For reference please read through the issue mentioned above.
 
  A lot of the committers seem to be on the same page here to drop Java
  5 support, so I am calling out an official vote.
 
  All Lucene 3.x releases will remain with Java 5 support; this vote is
  for trunk only.
 
 
  Here is my +1
 
  Simon
 




-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


[jira] [Resolved] (SOLR-2619) two sfields in geospatial search

2011-06-27 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2619.


   Resolution: Invalid
Fix Version/s: (was: 3.2)

There doesn't seem to actually be a concrete improvement/bug identified here.

jose: if you are having difficulty understanding/using Solr features, please 
start by posting a detailed question explaining your usecase/problem to the 
solr-user mailing list:

http://wiki.apache.org/solr/UsingMailingLists

 two sfields in geospatial search
 

 Key: SOLR-2619
 URL: https://issues.apache.org/jira/browse/SOLR-2619
 Project: Solr
  Issue Type: Wish
  Components: clients - php
Affects Versions: 3.2
 Environment: Using with drupal
Reporter: jose rodriguez

 Is it possible to create a query with two sfields (geospatial search)? That 
 is, with two different pt and d values for each field.
 If I need a from - to search, then I need fields around the from coordinate 
 and around the to coordinate.
 Thanks.




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Chris Hostetter

+1

: Date: Mon, 27 Jun 2011 19:38:08 +0200
: From: Simon Willnauer simon.willna...@googlemail.com
: Reply-To: dev@lucene.apache.org, simon.willna...@gmail.com
: To: dev@lucene.apache.org
: Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
: 
: This issue has been discussed on various occasions and lately on
: LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
: 
: The main reasons for this have been discussed on the issue but let me
: put them out here too:
: 
: - Lack of testing on Jenkins with Java 5
: - Java 5 reached its end of life a long time ago, so Java 5 is
: totally unmaintained. For us that means bugs have to either be
: hacked around, tests disabled, or warnings placed, but some things simply
: cannot be fixed... we cannot actually support something that is no
: longer maintained: we do find JRE bugs
: (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
: that bugs actually get fixed: we cannot do everything with hacks.
: - due to Java 5 we take legitimate performance hits like 20% slower grouping speed.
: 
: For reference please read through the issue mentioned above.
: 
: A lot of the committers seem to be on the same page here to drop Java
: 5 support, so I am calling out an official vote.
: 
: All Lucene 3.x releases will remain with Java 5 support; this vote is
: for trunk only.
: 
: 
: Here is my +1
: 
: Simon
: 
: 
: 

-Hoss




[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056084#comment-13056084
 ] 

Robert Muir commented on LUCENE-3228:
-

I am glad you had the same WTF. Although the ant docs say it's ok to use both, 
the current tasks in e.g. lucene have both the link attribute and the nested 
link-without-href (WTF), and as I tried mixing linkoffline in different ways, it 
would appear to work, until I changed the link to javaBROKENURL.sun.com/, 
etc.

I think we should go with this patch so we aren't downloading this junk 
anymore, as it causes false build failures. The only trick I can think of is how 
to ensure lucene source releases build by themselves without reaching back to 
dev-tools (I think this is broken on trunk at the moment, but it does work on 
3.x right now).

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-3228.patch


 Currently, we fail on all javadocs warnings.
 However, you get a warning if it cannot download the package-list from sun.com
 So I think we should allow you optionally set a sysprop using linkoffline.
 Then we would get much less hudson fake failures
 I feel like Mike opened an issue for this already but I cannot find it.




[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-27 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I have made the necessary changes, though I might have missed changing a 
couple of test cases to a random IOContext. 

I wanted to put it out so that you all can have a look as soon as possible. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
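The IOContext idea can be sketched as a small value object passed to Directory.createOutput/openInput so a directory implementation can pick buffer sizes or O_DIRECT behavior per use. This is a hypothetical illustration of the design described above, not the patch under review; field and factory names are invented.

```java
// Hypothetical sketch of an IOContext-style value object: it bundles the
// read buffer size with access-pattern hints so that merging and searching
// can open the same files with different I/O settings.
class IOContextSketch {
    enum Usage { MERGE, SEARCH, FLUSH }

    final Usage usage;
    final int readBufferSize;
    final boolean direct;      // hint: bypass the OS buffer cache
    final boolean sequential;  // hint: sequential access pattern

    IOContextSketch(Usage usage, int readBufferSize,
                    boolean direct, boolean sequential) {
        this.usage = usage;
        this.readBufferSize = readBufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    // Merges read large sequential runs, so a bigger buffer and
    // DIRECT/SEQUENTIAL hints make sense there.
    static IOContextSketch forMerge() {
        return new IOContextSketch(Usage.MERGE, 64 * 1024, true, true);
    }

    // Searches keep a small buffer and the default caching behavior.
    static IOContextSketch forSearch() {
        return new IOContextSketch(Usage.SEARCH, 1024, false, false);
    }
}
```

With such an object threaded through Directory.createOutput and openInput, a DirectIOLinuxDirectory could honor direct/sequential only for merge contexts, which is the behavior the issue description calls for.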




[jira] [Resolved] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3240.


Resolution: Fixed
  Assignee: Chris Male

Committed revision 1140379.

I'll open a separate task to move the impls.

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch, LUCENE-3240.patch, LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.




[jira] [Created] (LUCENE-3249) Move Solr's FunctionQuery impls to Queries Module

2011-06-27 Thread Chris Male (JIRA)
Move Solr's FunctionQuery impls to Queries Module
-

 Key: LUCENE-3249
 URL: https://issues.apache.org/jira/browse/LUCENE-3249
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: Chris Male


Now that we have the main interfaces in the Queries module, we can move the 
actual impls over.

Impls that won't be moved are:

function/distance/* (to be moved to a spatial module)
function/FileFloatSource.java (depends on Solr's Schema, data directories and 
exposes a RequestHandler)




[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-27 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056224#comment-13056224
 ] 

Koji Sekiguchi commented on SOLR-2583:
--

I didn't save the test snippet because I wrote it outside my office (I used 
a stranger's PC). What I did was just use CompactByteArray instead of 
CompactFloatArray in your FileFloatSourceMemoryTest.java.


 Make external scoring more efficient (ExternalFileField, FileFloatSource)
 -

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor
 Attachments: FileFloatSource.java.patch, patch.txt


 External scoring eats a lot of memory, depending on the number of documents in 
 the index. The ExternalFileField (used for external scoring) uses 
 FileFloatSource, where one FileFloatSource is created per external scoring 
 file. FileFloatSource creates a float array with the size of the number of 
 docs (this is also done if the file to load is not found). If there are far 
 fewer entries in the scoring file than there are docs in total, the 
 big float array wastes a lot of memory.
 This could be optimized by using a map of doc - score, so that the map 
 contains as many entries as there are scoring entries in the external file, 
 but not more.




[jira] [Created] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Robert Muir (JIRA)
remove contrib/misc and contrib/wordnet's dependencies on analyzers module
--

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir


These contribs don't actually analyze any text.

After this patch, only the contrib/demo relies upon the analyzers module... we 
can separately try to figure that one out (I don't think any of these lucene 
contribs needs to reach back into modules/)






[jira] [Updated] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3250:


Attachment: LUCENE-3250.patch

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3250.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Commented] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056236#comment-13056236
 ] 

Chris Male commented on LUCENE-3250:


+1

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3250.patch, LUCENE-3250_suggest.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Commented] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056244#comment-13056244
 ] 

Robert Muir commented on LUCENE-3250:
-

ok, I'll commit this soon. If anyone wants to take care of the IntelliJ/Maven 
deps, please go for it (Eclipse is one huge megaproject with all the jars in the 
classpath, so it does not know about these things).

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3250.patch, LUCENE-3250_suggest.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Resolved] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3250.
-

   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Robert Muir

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3250.patch, LUCENE-3250_suggest.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Commented] (LUCENE-2950) Modules under top-level modules/ directory should be included in lucene's build targets, e.g. 'package-tgz', 'package-tgz-src', and 'javadocs'

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056245#comment-13056245
 ] 

Robert Muir commented on LUCENE-2950:
-

just following up: the only thing in lucene reaching back into modules right 
now is contrib/demo...

 Modules under top-level modules/ directory should be included in lucene's 
 build targets, e.g. 'package-tgz', 'package-tgz-src', and 'javadocs'
 --

 Key: LUCENE-2950
 URL: https://issues.apache.org/jira/browse/LUCENE-2950
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Affects Versions: 4.0
Reporter: Steven Rowe
Priority: Blocker
 Fix For: 4.0


 Lucene's top level {{modules/}} directory is not included in the binary or 
 source release distribution Ant targets {{package-tgz}} and 
 {{package-tgz-src}}, or in {{javadocs}}, in {{lucene/build.xml}}.  (However, 
 these targets do include Lucene contribs.)
 This issue is visible via the nightly Jenkins (formerly Hudson) job named 
 Lucene-trunk, which publishes binary and source artifacts, using 
 {{package-tgz}} and {{package-tgz-src}}, as well as javadocs using the 
 {{javadocs}} target, all run from the top-level {{lucene/}} directory.




[jira] [Commented] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056246#comment-13056246
 ] 

Chris Male commented on LUCENE-3250:


I'll sort out the IntelliJ and Maven deps in a moment.

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3250.patch, LUCENE-3250_suggest.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Updated] (LUCENE-3249) Move Solr's FunctionQuery impls to Queries Module

2011-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3249:
---

Attachment: LUCENE-3249.patch

Patch which moves the impls.

Compiles and tests pass.

I'd like to commit this in the next day or so.

 Move Solr's FunctionQuery impls to Queries Module
 -

 Key: LUCENE-3249
 URL: https://issues.apache.org/jira/browse/LUCENE-3249
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3249.patch


 Now that we have the main interfaces in the Queries module, we can move the 
 actual impls over.
 Impls that won't be moved are:
 function/distance/* (to be moved to a spatial module)
 function/FileFloatSource.java (depends on Solr's Schema, data directories and 
 exposes a RequestHandler)




[jira] [Commented] (LUCENE-3249) Move Solr's FunctionQuery impls to Queries Module

2011-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056251#comment-13056251
 ] 

Chris Male commented on LUCENE-3249:


Commands for the patch:

{code}
svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function/valuesource
svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function/docvalues
svn move solr/src/java/org/apache/solr/search/function/*Function.java modules/queries/src/java/org/apache/lucene/queries/function/valuesource/
svn move solr/src/java/org/apache/solr/search/function/*FieldSource.java modules/queries/src/java/org/apache/lucene/queries/function/valuesource
svn move solr/src/java/org/apache/solr/search/function/*ValueSource.java modules/queries/src/java/org/apache/lucene/queries/function/valuesource
svn move solr/src/java/org/apache/solr/search/function/*CacheSource.java modules/queries/src/java/org/apache/lucene/queries/function/valuesource
svn move solr/src/java/org/apache/solr/search/function/ConstNumberSource.java modules/queries/src/java/org/apache/lucene/queries/function/valuesource
svn move solr/src/java/org/apache/solr/search/function/*DocValues.java modules/queries/src/java/org/apache/lucene/queries/function/docvalues
{code}

 Move Solr's FunctionQuery impls to Queries Module
 -

 Key: LUCENE-3249
 URL: https://issues.apache.org/jira/browse/LUCENE-3249
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3249.patch


 Now that we have the main interfaces in the Queries module, we can move the 
 actual impls over.
 Impls that won't be moved are:
 function/distance/* (to be moved to a spatial module)
 function/FileFloatSource.java (depends on Solr's Schema, data directories and 
 exposes a RequestHandler)




[jira] [Reopened] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs

2011-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reopened LUCENE-3191:
-


Reopening: this code in SlowCollatedStringComparator is totally broken:
{noformat}
  @Override
  public int compareValues(BytesRef first, BytesRef second) {
if (first == null) {
  if (second == null) {
return 0;
  }
  return -1;
} else if (second == null) {
  return 1;
} else {
  return collator.compare(first, second);
}
  }
{noformat}

I haven't tracked this issue closely enough to understand what's going on here, but you cannot 
pass BytesRefs to collator.compare. If this code is ever reached (and looking 
at the test I wrote for this thing, it's unclear whether this code is even 
necessary?!), it *will* throw ClassCastException:
http://download.oracle.com/javase/1.5.0/docs/api/java/text/Collator.html#compare(java.lang.Object, java.lang.Object)
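For reference, a minimal sketch of the likely fix: decode the UTF-8 bytes to String before handing them to the collator. The BytesRef here is a simplified stand-in for illustration, not Lucene's class, and this is not the committed fix.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

public class CollatorCompareSketch {

    // Simplified stand-in for Lucene's BytesRef: a UTF-8 byte payload.
    static class BytesRef {
        final byte[] bytes;
        BytesRef(String s) { bytes = s.getBytes(StandardCharsets.UTF_8); }
        String utf8ToString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    static final Collator collator = Collator.getInstance(Locale.ENGLISH);

    // Null-safe comparison that decodes to String first.
    // Collator.compare(Object, Object) casts its arguments to String,
    // so passing a BytesRef directly throws ClassCastException.
    static int compareValues(BytesRef first, BytesRef second) {
        if (first == null) {
            return second == null ? 0 : -1;
        } else if (second == null) {
            return 1;
        } else {
            return collator.compare(first.utf8ToString(), second.utf8ToString());
        }
    }

    public static void main(String[] args) {
        System.out.println(compareValues(new BytesRef("apple"), new BytesRef("banana")) < 0);
        System.out.println(compareValues(null, null) == 0);
    }
}
```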


 Add TopDocs.merge to merge multiple TopDocs
 ---

 Key: LUCENE-3191
 URL: https://issues.apache.org/jira/browse/LUCENE-3191
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3191-3x.patch, LUCENE-3191.patch, 
 LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch


 It's not easy today to merge TopDocs, eg produced by multiple shards,
 supporting arbitrary Sort.
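The kind of merge the issue describes can be sketched as a k-way merge over per-shard results, assuming each shard's hits arrive sorted by descending score. The Hit type is an illustrative stand-in; this is not Lucene's TopDocs.merge implementation, which also supports arbitrary Sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopDocsMergeSketch {

    // Hypothetical shard hit: (shardIndex, docId, score) -- illustrative only.
    static class Hit {
        final int shard, doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard; this.doc = doc; this.score = score;
        }
    }

    // k-way merge of per-shard hit lists already sorted by descending
    // score, keeping the global top-n. Ties are broken by shard index.
    static List<Hit> merge(int n, List<List<Hit>> shards) {
        // Queue entries are {shardIndex, positionInShard}.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
            int cmp = Float.compare(shards.get(b[0]).get(b[1]).score,
                                    shards.get(a[0]).get(a[1]).score);
            return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) pq.add(new int[] {s, 0});
        }
        List<Hit> out = new ArrayList<>();
        while (out.size() < n && !pq.isEmpty()) {
            int[] top = pq.poll();
            out.add(shards.get(top[0]).get(top[1]));
            if (top[1] + 1 < shards.get(top[0]).size()) {
                pq.add(new int[] {top[0], top[1] + 1});  // advance within that shard
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit(0, 1, 0.9f), new Hit(0, 2, 0.4f)),
            List.of(new Hit(1, 7, 0.8f), new Hit(1, 3, 0.5f)));
        for (Hit h : merge(3, shards)) System.out.println(h.shard + ":" + h.doc);
    }
}
```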




[jira] [Updated] (LUCENE-3250) remove contrib/misc and contrib/wordnet's dependencies on analyzers module

2011-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3250:
---

Attachment: LUCENE-3250.patch

Patch which fixes the deps for Maven and IntelliJ.  Also fixes incorrect 
IntelliJ dependencies on the common module where it should be analysis-common.

I'll commit.

 remove contrib/misc and contrib/wordnet's dependencies on analyzers module
 --

 Key: LUCENE-3250
 URL: https://issues.apache.org/jira/browse/LUCENE-3250
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3250.patch, LUCENE-3250.patch, 
 LUCENE-3250_suggest.patch


 These contribs don't actually analyze any text.
 After this patch, only the contrib/demo relies upon the analyzers module... 
 we can separately try to figure that one out (I don't think any of these 
 lucene contribs needs to reach back into modules/)




[jira] [Commented] (LUCENE-2950) Modules under top-level modules/ directory should be included in lucene's build targets, e.g. 'package-tgz', 'package-tgz-src', and 'javadocs'

2011-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056259#comment-13056259
 ] 

Chris Male commented on LUCENE-2950:


The xml-query-parser demo also reaches back to StandardAnalyzer.  Does this get 
included in the packaging?

 Modules under top-level modules/ directory should be included in lucene's 
 build targets, e.g. 'package-tgz', 'package-tgz-src', and 'javadocs'
 --

 Key: LUCENE-2950
 URL: https://issues.apache.org/jira/browse/LUCENE-2950
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Affects Versions: 4.0
Reporter: Steven Rowe
Priority: Blocker
 Fix For: 4.0


 Lucene's top level {{modules/}} directory is not included in the binary or 
 source release distribution Ant targets {{package-tgz}} and 
 {{package-tgz-src}}, or in {{javadocs}}, in {{lucene/build.xml}}.  (However, 
 these targets do include Lucene contribs.)
 This issue is visible via the nightly Jenkins (formerly Hudson) job named 
 Lucene-trunk, which publishes binary and source artifacts, using 
 {{package-tgz}} and {{package-tgz-src}}, as well as javadocs using the 
 {{javadocs}} target, all run from the top-level {{lucene/}} directory.




[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056261#comment-13056261
 ] 

Robert Muir commented on LUCENE-2341:
-

{quote}
provided each thread obtains its own TokenStreamComponents through 
ReusableAnalyzerBase.createComponents (is this always the case? looking at 
other filters, they don't look thread-safe either...)
{quote}

yes, it's the case that Analyzer/ReusableAnalyzerBase takes care of this with a 
ThreadLocal, as long as each thread only needs to use one TokenStream at a time 
(which is true for all Lucene consumers), see:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/analysis/Analyzer.java
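The per-thread reuse described above can be sketched with a plain ThreadLocal. The names here are illustrative stand-ins, not Lucene's actual fields or API: each thread lazily creates its components once and then reuses them, so the components themselves never need to be thread-safe.

```java
public class TokenStreamReuseSketch {

    // Illustrative stand-in for TokenStreamComponents; not Lucene's class.
    static class Components {
        final String owner = Thread.currentThread().getName();
    }

    // Per-thread storage, in the spirit of Analyzer's internal ThreadLocal.
    static final ThreadLocal<Components> stored = new ThreadLocal<>();

    static Components components() {
        Components c = stored.get();
        if (c == null) {
            c = new Components();   // createComponents() would run here, once per thread
            stored.set(c);
        }
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        Components a = components();
        Components b = components();
        System.out.println(a == b);         // true: reused on the same thread

        final Components[] other = new Components[1];
        Thread t = new Thread(() -> other[0] = components());
        t.start();
        t.join();
        System.out.println(a == other[0]);  // false: each thread has its own
    }
}
```

Note the constraint mentioned above: this only works if each thread uses one token stream at a time, since a second request on the same thread hands back the same instance.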


 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, 
 morfologik-fsa-1.5.2.jar, morfologik-polish-1.5.2.jar, 
 morfologik-stemming-1.5.0.jar, morfologik-stemming-1.5.2.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Shai Erera
+1

On Tue, Jun 28, 2011 at 1:50 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 +1

 : Date: Mon, 27 Jun 2011 19:38:08 +0200
 : From: Simon Willnauer simon.willna...@googlemail.com
 : Reply-To: dev@lucene.apache.org, simon.willna...@gmail.com
 : To: dev@lucene.apache.org
 : Subject: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
 :
 : This issue has been discussed on various occasions and lately on
 : LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)
 :
 : The main reasons for this have been discussed on the issue but let me
 : put them out here too:
 :
 : - Lack of testing on Jenkins with Java 5
 : - Java 5 reached its end of life a long time ago, so Java 5 is
 : totally unmaintained, which means for us that bugs have to either be
 : hacked around, tests disabled, or warnings placed, but some things simply
 : cannot be fixed... we cannot actually support something that is no
 : longer maintained: we do find JRE bugs
 : (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 : that bugs actually get fixed: we cannot do everything with hacks.
 : - Due to Java 5 we have legitimate performance hits, like 20% slower
 : grouping speed.
 :
 : For reference please read through the issue mentioned above.
 :
 : A lot of the committers seem to be on the same page here to drop Java
 : 5 support so I am calling out an official vote.
 :
 : all Lucene 3.x releases will remain with Java 5 support this vote is
 : for trunk only.
 :
 :
 : Here is my +1
 :
 : Simon
 :
 : -
 : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 : For additional commands, e-mail: dev-h...@lucene.apache.org
 :
 :

 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Created] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-06-27 Thread Alexey Serba (JIRA)
Solr JMX MBeans do not survive core reloads
---

 Key: SOLR-2623
 URL: https://issues.apache.org/jira/browse/SOLR-2623
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 3.2, 3.1, 1.4.1, 1.4
Reporter: Alexey Serba
Priority: Minor


Solr JMX MBeans do not survive core reloads

{noformat:title=Steps to reproduce}
sh cd example
sh vi multicore/core0/conf/solrconfig.xml # enable jmx
sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
start.jar
sh echo 'open 8842 # 8842 is java pid
 domain solr/core0
 beans
 ' | java -jar jmxterm-1.0-alpha-4-uber.jar

solr/core0:id=core0,type=core
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
...
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
sh curl 'http://localhost:8983/solr/admin/cores?action=RELOADcore=core0'
sh echo 'open 8842 # 8842 is java pid
 domain solr/core0
 beans
 ' | java -jar jmxterm-1.0-alpha-4-uber.jar
# there's only one bean left after Solr core reload
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
main
{noformat}

The root cause of this is Solr core reload behavior:
# create new core (which overwrites existing registered MBeans)
# register new core and close old one (we remove/un-register MBeans on 
oldCore.close)

The correct sequence is:
# unregister MBeans from old core
# create and register new core
# close old core without touching MBeans
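The corrected sequence can be sketched with the JDK's own MBeanServer. The Demo bean and ObjectName below are illustrative stand-ins, not Solr's actual beans; the point is only the ordering of register/unregister around the reload.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CoreReloadSketch {

    // Standard MBean: interface name must be <ClassName>MBean.
    public interface DemoMBean { String getName(); }

    public static class Demo implements DemoMBean {
        private final String name;
        public Demo(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName mirroring Solr's "solr/core0" domain.
        ObjectName name = new ObjectName("solr/core0:id=demo,type=core");

        server.registerMBean(new Demo("oldCore"), name);

        // The corrected reload sequence from the description:
        server.unregisterMBean(name);                    // 1. unregister old core's beans
        server.registerMBean(new Demo("newCore"), name); // 2. register the new core's beans
        // 3. oldCore.close() runs afterwards and must NOT touch MBeans,
        //    otherwise it would unregister the new core's beans.

        System.out.println(server.getAttribute(name, "Name"));
    }
}
```

Registering the new core first (the current behavior) either collides with the old registration or gets wiped when oldCore.close() unregisters by ObjectName, which is why only the searcher bean survives the reload.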




Issues with Grouping

2011-06-27 Thread Bill Bell
The trunk has issues with grouping (NPE). I get this with or without
f.hgid_i1.facet.numFacetTerms, 1...


I think it has to do with an NPE in grouping in 4.0; it fails on other code as well.
Thoughts?

[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test
-Dtestcase=NumFacetTermsFacetsTest
-Dtestmethod=testNumFacetTermsFacetCounts
-Dtests.seed=3921835369594659663:-3219730304883530389
[junit] *** BEGIN
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCount
s: Insane FieldCache usage(s) ***
[junit] SUBREADER: Found caches for descendants of
DirectoryReader(segments_3 _0(4.0):C6)+hgid_i1
[junit] 'DirectoryReader(segments_3 _0(4.0):C6)'='hgid_i1',class
org.apache.lucene.search.FieldCache$DocTermsIndex,org.apache.lucene.search.
cache.DocTermsIndexCreator@603bb3eb=org.apache.lucene.search.cache.DocTerm
sIndexCreator$DocTermsIndexImpl#1026179434 (size =~ 372 bytes)
[junit] 

'org.apache.lucene.index.SegmentCoreReaders@7e8905bd'='hgid_i1',int,org.a
pache.lucene.search.cache.IntValuesCreator@30781822=org.apache.lucene.sear
ch.cache.CachedArray$IntValues#291172425 (size =~ 92 bytes)
[junit] 
[junit] *** END
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCount
s: Insane FieldCache usage(s) ***
[junit] -  ---
[junit] Testcase:
testNumFacetTermsFacetCounts(org.apache.solr.request.NumFacetTermsFacetsTes
t): FAILED
[junit] 
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCount
s: Insane FieldCache usage(s) found expected:0 but was:1
[junit] junit.framework.AssertionFailedError:
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCount
s: Insane FieldCache usage(s) found expected:0 but was:1
[junit] at 
org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.
java:725)
[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620)
[junit] at 
org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneT
estCase.java:1430)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneT
estCase.java:1348)
[junit] 


assertQ("check group and facet counts with numFacetTerms=1",
    req("q", "id:[1 TO 6]"
        ,"indent", "on"
        ,"facet", "true"
        ,"group", "true"
        ,"group.field", "hgid_i1"
        ,"f.hgid_i1.facet.limit", "-1"
        ,"f.hgid_i1.facet.mincount", "1"
        ,"f.hgid_i1.facet.numFacetTerms", "1"
        ,"facet.field", "hgid_i1"
    )
    ,"*[count(//arr[@name='groups'])=1]"
    ,"*[count(//lst[@name='facet_fields']/lst[@name='hgid_i1']/int)=1]" // there is 1 unique item
    ,"//lst[@name='hgid_i1']/int[@name='numFacetTerms'][.='4']"
);






[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-06-27 Thread Alexey Serba (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056281#comment-13056281
 ] 

Alexey Serba commented on SOLR-2623:


Related bug report in solr mailing list - 
http://www.lucidimagination.com/search/document/f109d695b7e5d2ae/weird_issue_with_solr_and_jconsole_jmx

 Solr JMX MBeans do not survive core reloads
 ---

 Key: SOLR-2623
 URL: https://issues.apache.org/jira/browse/SOLR-2623
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 1.4, 1.4.1, 3.1, 3.2
Reporter: Alexey Serba
Priority: Minor

 Solr JMX MBeans do not survive core reloads
 {noformat:title=Steps to reproduce}
 sh cd example
 sh vi multicore/core0/conf/solrconfig.xml # enable jmx
 sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
 start.jar
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 
 solr/core0:id=core0,type=core
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
 ...
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
 solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
 sh curl 'http://localhost:8983/solr/admin/cores?action=RELOADcore=core0'
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 # there's only one bean left after Solr core reload
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
 main
 {noformat}
 The root cause of this is Solr core reload behavior:
 # create new core (which overwrites existing registered MBeans)
 # register new core and close old one (we remove/un-register MBeans on 
 oldCore.close)
 The correct sequence is:
 # unregister MBeans from old core
 # create and register new core
 # close old core without touching MBeans




[jira] [Updated] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-06-27 Thread Alexey Serba (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serba updated SOLR-2623:
---

Attachment: SOLR-2623.patch

Added test

 Solr JMX MBeans do not survive core reloads
 ---

 Key: SOLR-2623
 URL: https://issues.apache.org/jira/browse/SOLR-2623
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 1.4, 1.4.1, 3.1, 3.2
Reporter: Alexey Serba
Priority: Minor
 Attachments: SOLR-2623.patch


 Solr JMX MBeans do not survive core reloads
 {noformat:title=Steps to reproduce}
 sh cd example
 sh vi multicore/core0/conf/solrconfig.xml # enable jmx
 sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
 start.jar
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 
 solr/core0:id=core0,type=core
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
 ...
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
 solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
 sh curl 'http://localhost:8983/solr/admin/cores?action=RELOADcore=core0'
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 # there's only one bean left after Solr core reload
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
 main
 {noformat}
 The root cause of this is Solr core reload behavior:
 # create new core (which overwrites existing registered MBeans)
 # register new core and close old one (we remove/un-register MBeans on 
 oldCore.close)
 The correct sequence is:
 # unregister MBeans from old core
 # create and register new core
 # close old core without touching MBeans




Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Shalin Shekhar Mangar
+1

On Mon, Jun 27, 2011 at 11:08 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 The main reasons for this have been discussed on the issue but let me
 put them out here too:

 - Lack of testing on Jenkins with Java 5
 - Java 5 reached its end of life a long time ago, so Java 5 is
 totally unmaintained, which means for us that bugs have to either be
 hacked around, tests disabled, or warnings placed, but some things simply
 cannot be fixed... we cannot actually support something that is no
 longer maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 that bugs actually get fixed: we cannot do everything with hacks.
 - Due to Java 5 we have legitimate performance hits, like 20% slower grouping speed.

 For reference please read through the issue mentioned above.

 A lot of the committers seem to be on the same page here to drop Java
 5 support so I am calling out an official vote.

 all Lucene 3.x releases will remain with Java 5 support this vote is
 for trunk only.


 Here is my +1

 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Regards,
Shalin Shekhar Mangar.




[jira] [Assigned] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-06-27 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-2623:
---

Assignee: Shalin Shekhar Mangar

 Solr JMX MBeans do not survive core reloads
 ---

 Key: SOLR-2623
 URL: https://issues.apache.org/jira/browse/SOLR-2623
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 1.4, 1.4.1, 3.1, 3.2
Reporter: Alexey Serba
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch


 Solr JMX MBeans do not survive core reloads
 {noformat:title=Steps to reproduce}
 sh cd example
 sh vi multicore/core0/conf/solrconfig.xml # enable jmx
 sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
 start.jar
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 
 solr/core0:id=core0,type=core
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
 solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
 solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
 ...
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
 solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
 sh curl 'http://localhost:8983/solr/admin/cores?action=RELOADcore=core0'
 sh echo 'open 8842 # 8842 is java pid
  domain solr/core0
  beans
  ' | java -jar jmxterm-1.0-alpha-4-uber.jar
 # there's only one bean left after Solr core reload
 solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
 main
 {noformat}
 The root cause of this is Solr core reload behavior:
 # create new core (which overwrites existing registered MBeans)
 # register new core and close old one (we remove/un-register MBeans on 
 oldCore.close)
 The correct sequence is:
 # unregister MBeans from old core
 # create and register new core
 # close old core without touching MBeans
