[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638543#action_12638543 ] Michael McCandless commented on LUCENE-1410: Paul, in decompress I added "inputSize = -1" at the top, so that the header is re-read. I need this so I can re-use a single PFor instance during decompress. > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Other >Reporter: Paul Elschot >Priority: Minor > Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch, > TestPFor2.java, TestPFor2.java, TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638564#action_12638564 ] Paul Elschot commented on LUCENE-1410: -- Did you also move to relative addressing in the buffer? Another question: I suppose the place to add this initially would be in IndexOutput and IndexInput? In that case it would make sense to reserve (some bits of) the first byte in the compressed buffer for the compression method, and use these bits there to call PFor or another (de)compression method.
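For illustration only, reserving some bits of the first byte for the compression method might look roughly like the sketch below. The class name, the method ids, and the choice of the top two bits are all assumptions made for this example; nothing in the attached patches is known to use this layout.

```java
public class BlockHeaderSketch {
    // Hypothetical method ids; the issue does not define these values.
    static final int METHOD_VINT = 0;
    static final int METHOD_PFOR = 1;

    // Assumption: the top 2 bits of the first byte carry the method id,
    // leaving the low 6 bits for method-specific header data.
    static final int METHOD_SHIFT = 6;
    static final int PAYLOAD_MASK = (1 << METHOD_SHIFT) - 1;

    static byte packHeader(int method, int payload) {
        if (method < 0 || method > 3) {
            throw new IllegalArgumentException("method id must fit in 2 bits");
        }
        if (payload < 0 || payload > PAYLOAD_MASK) {
            throw new IllegalArgumentException("payload must fit in 6 bits");
        }
        return (byte) ((method << METHOD_SHIFT) | payload);
    }

    static int method(byte header) {
        // Mask to an unsigned value first so the sign bit does not leak in.
        return (header & 0xFF) >>> METHOD_SHIFT;
    }

    static int payload(byte header) {
        return header & PAYLOAD_MASK;
    }

    // Stand-in for dispatching to a real decompressor.
    static String decoderFor(byte header) {
        switch (method(header)) {
            case METHOD_VINT: return "vint";
            case METHOD_PFOR: return "pfor";
            default: throw new IllegalStateException("unknown compression method");
        }
    }

    public static void main(String[] args) {
        byte header = packHeader(METHOD_PFOR, 17);
        System.out.println(decoderFor(header) + " payload=" + payload(header));
    }
}
```

A reader would inspect the first byte of each block and branch on method(header) before handing the rest of the block to the matching decoder.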
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638573#action_12638573 ] Michael McCandless commented on LUCENE-1410: Another thing that bit me was the bufferByteSize(): if this returns something that's not 0 mod 4, you must increase it to the next multiple of 4 otherwise you will lose data since ByteBuffer is big endian by default. We should test little endian to see if performance changes (on different CPUs). bq. Did you also move to relative addressing in the buffer? No I haven't done that, but I think we should. I believe it's faster. I'm trying now to get a rudimentary test working for TermQuery using pfor. {quote} Another question: I suppose the place to add this initially would be in IndexOutput and IndexInput? In that case it would make sense to reserve (some bits of) the first byte in the compressed buffer for the compression method, and use these bits there to call PFor or another (de)compression method. {quote} This gets into flexible indexing... Ideally we do this in a pluggable way, so that PFor is just one such plugin, simple vInts is another, etc. I could see a compression layer living "above" IndexInput/Output, since logically how you encode an int block into bytes is independent from the means of storage. But: such an abstraction may hurt performance too much since during read it would entail an extra buffer copy. So maybe we should just add methods to IndexInput/Output, or, make a new IntBlockInput/Output. Also, some things you now store in the header of each block should presumably move to the start of the file instead (eg the compression method), or if we move to a separate "schema" file that can record which compressor was used per file, we'd put this there. So I'm not yet exactly sure how we should tie this in "for real"... 
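As a rough illustration of the bufferByteSize() point above: an int view over a ByteBuffer only covers whole 4-byte groups, so a byte size that is not a multiple of 4 leaves trailing bytes outside the view. The sketch below is not taken from the patch; the class and helper names are made up for the example, and it assumes a non-negative size.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class BufferSizeSketch {
    // Round a non-negative byte count up to the next multiple of 4.
    static int roundUpToMultipleOf4(int byteSize) {
        return (byteSize + 3) & ~3;
    }

    public static void main(String[] args) {
        int needed = 6; // hypothetical value returned by bufferByteSize()

        // With 6 bytes the int view holds one int; the last 2 bytes are not covered.
        IntBuffer tooSmall = ByteBuffer.allocate(needed).asIntBuffer();
        System.out.println("ints in unpadded buffer: " + tooSmall.capacity());

        // After rounding up, every byte belongs to some int in the view.
        int padded = roundUpToMultipleOf4(needed);
        IntBuffer roomy = ByteBuffer.allocate(padded).asIntBuffer();
        System.out.println("ints in padded buffer: " + roomy.capacity());

        // A new ByteBuffer is big endian; little endian has to be requested explicitly.
        ByteBuffer little = ByteBuffer.allocate(padded).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("default order: " + ByteBuffer.allocate(padded).order());
        System.out.println("requested order: " + little.order());
    }
}
```

Switching the order, as in the last lines, is the kind of change one could benchmark on different CPUs.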
[jira] Created: (LUCENE-1417) Allowing for distance measures that incorporate frequency/popularity for SuggestWord comparison
Allowing for distance measures that incorporate frequency/popularity for SuggestWord comparison --- Key: LUCENE-1417 URL: https://issues.apache.org/jira/browse/LUCENE-1417 Project: Lucene - Java Issue Type: Improvement Components: contrib/spellchecker Affects Versions: 2.4 Reporter: Jason Rennie Spelling suggestions are currently ordered first by a string edit distance measure, then by popularity/frequency. This limits the ability of popularity/frequency to affect suggestions. I think it would be better for the distance measure to accept popularity/frequency as an argument and provide a distance/score that incorporates any popularity/frequency considerations. I.e. change StringDistance.getDistance to accept an additional argument: frequency of the potential suggestion. The new SuggestWord.compareTo function would only order by score. We could achieve the existing behavior by adding a small inverse frequency value to the distances.
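A minimal sketch of the proposal, written without any Lucene classes so it stands alone. The FrequencyAwareDistance interface and the rank helper are hypothetical names, and the sketch assumes that a lower value means a closer match, which may not be the convention the real StringDistance uses. The "existing behavior" variant adds an inverse-frequency term that stays below 1, so it can only break ties between equal integer edit distances.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FrequencyAwareSuggestSketch {
    // Hypothetical interface: a distance that also sees the candidate's frequency.
    interface FrequencyAwareDistance {
        double getDistance(String target, String candidate, int frequency);
    }

    // Plain Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Edit distance first, frequency second: the added term lies in (0, 0.5],
    // shrinks as frequency grows, and never outweighs a full edit.
    static final FrequencyAwareDistance EDIT_THEN_FREQUENCY =
        (target, candidate, frequency) ->
            levenshtein(target, candidate) + 1.0 / (2.0 + Math.max(0, frequency));

    // Order candidates by the single combined score only.
    static List<String> rank(String target, String[] words, int[] freqs,
                             FrequencyAwareDistance distance) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < words.length; i++) order.add(i);
        order.sort(Comparator.comparingDouble(
            (Integer i) -> distance.getDistance(target, words[i], freqs[i])));
        List<String> ranked = new ArrayList<>();
        for (int i : order) ranked.add(words[i]);
        return ranked;
    }

    public static void main(String[] args) {
        String[] words = {"lucine", "lucone", "luke"};
        int[] freqs = {3, 50, 1000};
        System.out.println(rank("lucene", words, freqs, EDIT_THEN_FREQUENCY));
    }
}
```

A different FrequencyAwareDistance could weight frequency more heavily, which is the flexibility the issue asks for.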
[jira] Assigned: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses
[ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned LUCENE-1415: Assignee: Yonik Seeley > MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr > Cache misses > - > > Key: LUCENE-1415 > URL: https://issues.apache.org/jira/browse/LUCENE-1415 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.4 >Reporter: Todd Feak >Assignee: Yonik Seeley > Attachments: LUCENE-1415.patch, LUCENE-1415.patch, > MultiPhraseQuery.java, MultiPhraseQueryTest.java > > > I found this while hunting for the cause of Solr Cache misses. > The MultiPhraseQuery class hashCode() implementation is non-deterministic. It > uses termArrays.hashCode() in the computation. The contents of that ArrayList > are actually arrays themselves, which return their reference ID as a hashCode > instead of returning a hashCode which is based on the contents of the array. > I would suggest an implementation involving the Arrays.hashCode() method. > I will try to submit a patch soon, off for today.
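The effect described in the report can be reproduced without Lucene. In the sketch below, String[] stands in for the Term[] entries of termArrays, and contentHash is a hypothetical helper showing one way to use Arrays.hashCode(); it is not the committed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TermArraysHashSketch {
    // Combine per-array content hashes instead of relying on List.hashCode(),
    // which calls hashCode() on each array and so gets an identity-based value.
    static int contentHash(List<String[]> termArrays) {
        int h = 1;
        for (String[] terms : termArrays) {
            h = 31 * h + Arrays.hashCode(terms);
        }
        return h;
    }

    public static void main(String[] args) {
        List<String[]> first = new ArrayList<>();
        first.add(new String[] {"quick", "fast"});
        first.add(new String[] {"fox"});

        // Same contents, but freshly allocated arrays.
        List<String[]> second = new ArrayList<>();
        second.add(new String[] {"quick", "fast"});
        second.add(new String[] {"fox"});

        // Identity-based: usually differs between the two lists and between runs.
        System.out.println("List.hashCode() equal: " + (first.hashCode() == second.hashCode()));
        // Content-based: equal whenever the array contents are equal.
        System.out.println("contentHash equal: " + (contentHash(first) == contentHash(second)));
    }
}
```

A cache keyed on the query needs the second kind of hash, since two equal queries built at different times must land in the same bucket.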
[jira] Resolved: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses
[ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved LUCENE-1415. -- Resolution: Fixed Thanks, I just committed this.
[jira] Created: (LUCENE-1418) QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null
QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null

Key: LUCENE-1418
URL: https://issues.apache.org/jira/browse/LUCENE-1418
Project: Lucene - Java
Issue Type: Bug
Components: QueryParser
Affects Versions: 2.4
Environment: CentOS 5.2 (probably any applies)
Reporter: Alexei Dets
Priority: Minor

If QueryParser was constructed using the "QueryParser(String f, Analyzer a)" constructor and f is null, then QueryParser can fail with a NullPointerException while parsing some queries that _do_ contain a field name but have unbalanced parentheses.

Example 1:
Query: field:(expr1) expr2)
Result:
java.lang.NullPointerException
    at org.apache.lucene.index.Term.<init>(Term.java:50)
    at org.apache.lucene.index.Term.<init>(Term.java:36)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1324)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)

Example 2:
Query: field:(expr1) "expr2")
Result:
java.lang.NullPointerException
    at org.apache.lucene.index.Term.<init>(Term.java:50)
    at org.apache.lucene.index.Term.<init>(Term.java:36)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:612)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1459)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)

Workaround: pass an empty string to the constructor as the default field name. In this case the QueryParser.parse method will throw ParseException (the expected result, because the query string is wrong) instead of NullPointerException.

It is not obvious to me how to fix this, so I'll describe my use case; maybe I'm doing something completely wrong. Basically I have a set of per-field queries entered by the user and need to programmatically construct (after some preprocessing) one real Lucene query combined from these user-entered per-field subqueries. To achieve this I basically do the following (simplified a bit):

    // I'll always provide a field name in the query string, as it is different each time and I don't have any default
    QueryParser parser = new QueryParser(null, analyzer);
    BooleanQuery query = new BooleanQuery();
    Query subQuery1 = parser.parse(field1 + ":(" + queryString1 + ')');
    // operator = BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT or BooleanClause.Occur.SHOULD
    query.add(subQuery1, operator1);
    Query subQuery2 = parser.parse(field2 + ":(" + queryString2 + ')');
    query.add(subQuery2, operator2);
    Query subQuery3 = parser.parse(field3 + ":(" + queryString3 + ')');
    query.add(subQuery3, operator3);
    ...

IMHO either the QueryParser constructor should be changed to throw NullPointerException/IllegalArgumentException when a null field is passed (and the API documentation updated), or the QueryParser.parse behavior should be fixed to correctly throw ParseException instead of NullPointerException.

Also IMHO _public_ setField/getField methods on QueryParser (that set/get the field) would be a great help in use cases like mine:

    // or add a constructor with analyzer _only_ for such cases
    QueryParser parser = new QueryParser(null, analyzer);
    BooleanQuery query = new BooleanQuery();
    parser.setField(field1);
    Query subQuery1 = parser.parse(queryString1);
    query.add(subQuery1, operator1);
    parser.setField(field2);
    Query subQuery2 = parser.parse(queryString2);
    query.add(subQuery2, operator2);
    ...
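To make the two suggestions in LUCENE-1418 concrete, here is a small stand-alone sketch of a fail-fast null check combined with public setField/getField accessors. FieldScopedParserSketch is a hypothetical stand-in, not the Lucene QueryParser, and its scope method only builds the field-qualified query string rather than parsing anything.

```java
public class FieldScopedParserSketch {
    private String field;

    // Reject a null default field at construction time instead of failing
    // later, deep inside parsing, with a NullPointerException.
    public FieldScopedParserSketch(String field) {
        setField(field);
    }

    // Public setter as proposed in the issue; applies the same null check.
    public void setField(String field) {
        if (field == null) {
            throw new IllegalArgumentException("default field must not be null");
        }
        this.field = field;
    }

    public String getField() {
        return field;
    }

    // Stand-in for parse(): wraps the user's text in the current field,
    // mirroring the field + ":(" + queryString + ')' pattern from the report.
    public String scope(String queryString) {
        return field + ":(" + queryString + ")";
    }

    public static void main(String[] args) {
        FieldScopedParserSketch parser = new FieldScopedParserSketch("title");
        System.out.println(parser.scope("lucene search"));

        parser.setField("body");
        System.out.println(parser.scope("pfor compression"));

        try {
            new FieldScopedParserSketch(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this the mistake surfaces where the parser is created, which is easier to diagnose than a stack trace from inside Term.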