Re: How to get Term Positions?
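I tried the same thing today and am happy to share a snippet with you. It walks every term of one field and prints the document ID, the term, and each position of that term (Lucene/Solr 4.x API; req is the current SolrQueryRequest):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.DocsAndPositionsEnum;
    import org.apache.lucene.index.Fields;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.CharsRef;
    import org.apache.solr.schema.SchemaField;

    SchemaField field = req.getSchema().getFields().get("field_name");
    AtomicReader ar = req.getSearcher().getAtomicReader();
    AtomicReaderContext context = ar.getContext();
    final Fields fields = context.reader().fields();
    final Terms terms = fields.terms("field_name");
    final TermsEnum termsEnum = terms.iterator(null);
    // Accept every document, including deleted ones;
    // pass ar.getLiveDocs() instead to skip deleted documents.
    Bits acceptDocs = new Bits.MatchAllBits(ar.maxDoc());
    BytesRef bytes;
    while ((bytes = termsEnum.next()) != null) {
        // Convert the indexed term bytes into a readable string.
        CharsRef chars = new CharsRef();
        field.getType().indexedToReadable(bytes, chars);
        final DocsAndPositionsEnum postings = termsEnum.docsAndPositions(
                acceptDocs, null, DocsAndPositionsEnum.FLAG_PAYLOADS);
        if (postings == null) {
            continue; // the field was indexed without positions
        }
        List<Integer> docIds = new ArrayList<Integer>();
        int docId;
        while ((docId = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            docIds.add(docId);
            // freq() is the number of occurrences of the term in this document.
            int freq = postings.freq();
            for (int i = 0; i < freq; i++) {
                int nextPosition = postings.nextPosition();
                String str = docId + "\t" + chars.toString() + "\t" + nextPosition;
                System.out.println(str);
            }
        }
    }

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-Term-Positions-tp477519p4052608.html
Sent from the Solr - User mailing list archive at Nabble.com.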
Re: How to get Term Positions?
If you're going to spend time mucking with TermPositions, you should just spend your time working with SpanQuery, as that is what I understand you to be asking about. AIUI, you want to be able to get at the positions in the document where the query matched. This is exactly what SpanQuery and its derivatives do: they do all the work that you would otherwise have to do yourself with the TermPositions class.

On Mar 12, 2010, at 6:38 PM, MitchK wrote:

> [...]
> If I could retrieve the given information, this would be great - even if it
> forces me to iterate over the document where the term occurs. Lucene's
> TermPositions class seems to be a good place to start, doesn't it? What do
> you think? [1]
> [...]
> [1] http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/store/instantiated/InstantiatedTermPositions.html

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
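To make concrete what a span-style match reports, here is a toy sketch in plain Java. It is not Lucene code and not how SpanQuery is implemented; the class NearSpans and the method orderedNear are hypothetical names chosen for illustration. It assumes the per-document positions of two terms are already known and returns the spans where the second term follows the first with at most `slop` positions in between. The end of each span is exclusive (one past the last matched position).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a quadratic scan over two position lists.
final class NearSpans {

    /** Returns {start, end} pairs (end exclusive) where a position from
     *  `second` follows a position from `first` with at most `slop`
     *  positions in between. */
    static List<int[]> orderedNear(List<Integer> first, List<Integer> second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int a : first) {
            for (int b : second) {
                int gap = b - a - 1; // positions strictly between the two terms
                if (gap >= 0 && gap <= slop) {
                    spans.add(new int[] {a, b + 1});
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Positions taken from the "foo"/"bar" example in this thread.
        List<Integer> foo = Arrays.asList(5, 20, 50);
        List<Integer> bar = Arrays.asList(4, 21);
        for (int[] span : orderedNear(foo, bar, 0)) {
            // prints "match at [20, 22)"
            System.out.println("match at [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

With slop 0 this behaves like an exact phrase: only "foo" at 20 followed by "bar" at 21 qualifies. A real span query additionally iterates over documents and handles more than two clauses.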
Re: How to get Term Positions?
Thank you both for your responses.

However, I am not familiar enough with Solr, or even with Lucene. So, at the moment, I have no real idea of what payloads are (I can't even translate the word...). The manual says something about "metadata", but it does not say what metadata is meant.

Given how little experience I have with Lucene and Solr, I think it would be a better idea to first read something like "Lucene in Action" before trying to customize (or contribute to) Lucene/Solr at such a level.

Is anyone currently working on the tickets? It seems there has been no time to do so.

Last but not least, I want to add something productive to my question: the document that may describe the solution to my problem.

http://lucene.apache.org/java/3_0_1/fileformats.html#Positions

To quote:
"PositionDelta is, if payloads are disabled for the term's field, the difference between the position of the current occurrence in the document and the previous occurrence (or zero, if this is the first occurrence in this document)."

If I could retrieve this information, that would be great, even if it forces me to iterate over the document where the term occurs. Lucene's TermPositions class seems to be a good place to start, doesn't it? What do you think? [1]

Integrating Lucene-based work into Solr is another question... I think one needs a map showing which class is usually called by which class, but that is really another topic :).

[1] http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/store/instantiated/InstantiatedTermPositions.html

Thank you!
- Mitch
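The PositionDelta definition quoted above can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic under the stated case (payloads disabled): the class name PositionDeltas and both method names are hypothetical, and the byte-level encoding that the file format uses for the integers is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of delta-encoded term positions (no payloads).
final class PositionDeltas {

    /** Turns ascending absolute positions into deltas; the first delta is
     *  taken relative to zero, as in the quoted definition. */
    static List<Integer> encode(List<Integer> positions) {
        List<Integer> deltas = new ArrayList<>();
        int previous = 0;
        for (int position : positions) {
            deltas.add(position - previous);
            previous = position;
        }
        return deltas;
    }

    /** Recovers absolute positions by summing the deltas. */
    static List<Integer> decode(List<Integer> deltas) {
        List<Integer> positions = new ArrayList<>();
        int current = 0;
        for (int delta : deltas) {
            current += delta;
            positions.add(current);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Integer> deltas = encode(Arrays.asList(5, 20, 50));
        System.out.println(deltas);         // prints [5, 15, 30]
        System.out.println(decode(deltas)); // prints [5, 20, 50]
    }
}
```

Application code does not normally decode these deltas itself; Lucene's position enumerators return absolute positions.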
Re: How to get Term Positions?
I have also contributed a small reward for whoever completes this task:
http://nextsprocket.com/tasks/solr-1337-spans-and-payloads-query-support-asf-jira

Feel free to add to the reward if you need this done too!

Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com

On 3/12/10 2:14 PM, Grant Ingersoll wrote:

> OK, you need https://issues.apache.org/jira/browse/SOLR-1337 and its
> related item: https://issues.apache.org/jira/browse/SOLR-1485
>
> Unfortunately, neither is implemented yet.
Re: How to get Term Positions?
OK, you need https://issues.apache.org/jira/browse/SOLR-1337 and its related item: https://issues.apache.org/jira/browse/SOLR-1485

Unfortunately, neither is implemented yet.

On Mar 12, 2010, at 1:36 PM, MitchK wrote:

> Thanks for your response, Grant!
>
> Imagine you are searching for "foo".
> "foo" occurs in doc1 three times. It is the 5th, the 20th, and the 50th
> term in the document.
> I want to get these positions.
>
> Of course, if I am searching for "foo bar" and "bar" occurs at the 4th and
> the 21st position, I also want to know that. I am not sure, but I think this
> is what you mean by "per doc basis", right?
>
> Since I need the TermPosition at scoring time, TermVectorComponent seems to
> be no option in this case, or do you think it could be one, if I create such
> vectors at index time?
Re: How to get Term Positions?
Thanks for your response, Grant!

Imagine you are searching for "foo". "foo" occurs in doc1 three times: it is the 5th, the 20th, and the 50th term in the document. I want to get these positions.

Of course, if I am searching for "foo bar" and "bar" occurs at the 4th and the 21st position, I also want to know that. I am not sure, but I think this is what you mean by "per doc basis", right?

Since I need the TermPosition at scoring time, TermVectorComponent does not seem to be an option in this case. Or do you think it could be one, if I create such vectors at index time?
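The "per doc basis" lookup described above can be pictured with a toy positional inverted index in plain Java. This is a sketch for illustration only, not Lucene's or Solr's data structure: the class PositionalIndex and its methods are hypothetical, tokenization is a simple whitespace split, and positions are counted from zero.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: term -> (docId -> ascending token positions).
final class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Splits the text on whitespace and records each token's position. */
    void addDocument(int docId, String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Returns the positions of `term` in `docId`, or an empty list. */
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null) {
            return Collections.emptyList();
        }
        List<Integer> found = byDoc.get(docId);
        return found == null ? Collections.<Integer>emptyList() : found;
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.addDocument(1, "the quick foo bar jumped over foo");
        System.out.println(index.positions("foo", 1)); // prints [2, 6]
        System.out.println(index.positions("bar", 1)); // prints [3]
    }
}
```

The positions are stored per term and per document inside the index itself, which is why they can in principle be read without a term vector.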
Re: How to get Term Positions?
Which TermPositions do you want? On a per-document basis, or just in general for the index? I think the TermsComponent could add the latter. The former is only possible via TermVectors.

-Grant

On Mar 12, 2010, at 12:46 PM, MitchK wrote:

> Hello community,
>
> is it possible to get TermPositions without a TermVector? If yes, how can I
> do so?
> If such a feature is not yet implemented in Solr, I would be interested in
> how to do so with Lucene.
>
> I don't want to use a TermVector, because I have read somewhere that Lucene
> stores the TermPosition in its inverted index, but I don't know how to
> retrieve it.
>
> Any suggestions?
>
> Thank you!
> - Mitch