Re: Return Lucene DocId in Solr Results
: Ahhh, you're already down in Lucene. That makes things easier...
:
: See TermDocs. Particularly seek(Term). That'll directly access the indexed
: unique key rather than having to form a bunch of queries.

You should also sort your keys lexicographically before you loop over them. That will let you reuse the same Term enumerator and always seek forward (a single pass).

-Hoss
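A minimal sketch of the sort-then-seek-forward idea, not Lucene code: a sorted String[] stands in for the index's term dictionary and the array position stands in for the doc id. The class name SortedKeyLookup and the sample keys are made up for illustration. The point is only that, once the keys are sorted, one shared cursor can serve every key without ever moving backwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: a sorted String[] stands in for the term dictionary,
// and the array position stands in for the Lucene doc id.
class SortedKeyLookup {

    /** Returns the position of every key found in sortedTerms, scanning forward only. */
    static int[] lookup(String[] sortedTerms, List<String> keys) {
        List<String> sortedKeys = new ArrayList<>(keys);
        Collections.sort(sortedKeys);              // sort first so the cursor never moves back

        int[] found = new int[sortedKeys.size()];
        int count = 0;
        int cursor = 0;                            // one cursor, reused for every key
        for (String key : sortedKeys) {
            // skip terms that sort before the key we are looking for
            while (cursor < sortedTerms.length && sortedTerms[cursor].compareTo(key) < 0) {
                cursor++;
            }
            if (cursor < sortedTerms.length && sortedTerms[cursor].equals(key)) {
                found[count++] = cursor;
            }
        }
        return Arrays.copyOf(found, count);        // drop the slots of keys that were not found
    }

    public static void main(String[] args) {
        String[] terms = {"a1", "b2", "c3", "d4", "e5"};
        int[] ids = lookup(terms, Arrays.asList("d4", "a1", "zz", "c3"));
        System.out.println(Arrays.toString(ids));
    }
}
```

Because keys that are not found are dropped from the result, no default zero entries end up mixed in with real ids.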
RE: Return Lucene DocId in Solr Results
I know the doc ids from one core have nothing to do with the other. My plan was to take the docId returned in the Solr results from the first core and store it in the second core, so that the second core would know about the first core's doc ids. Then, when the Filter in the first core queries the second core, the data returned would include the docId of the first-core document that each result relates to.

I have backed off from this approach. I now have a user-defined primary key in the firstCore, which is stored as the reference in the secondCore. When the filter performs the search, it queries the firstCore for each primary key and gets the Lucene docId from the returned doc.

Thanks,
Steve

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 02 December 2010 02:19
To: solr-user@lucene.apache.org
Subject: Re: Return Lucene DocId in Solr Results

On the face of it, this doesn't make sense, so perhaps you can explain a bit. The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do...

Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem? This seems like it may be an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> Hi,
>
> I was wondering how I would go about getting the lucene docid included in the results from a solr query?
>
> I've built a QueryParser to query another solr instance and join the results of the two instances through the use of a Filter. The Filter needs the lucene docid to work. This is the only bit I'm missing right now.
>
> Thanks,
> Steve
Re: Return Lucene DocId in Solr Results
Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong...

Your user-defined unique key is definitely a better way to go. There are some tricks you could try if there are performance issues.

Best
Erick

On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> I know the doc ids from one core have nothing to do with the other. [...]
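To make the fragility concrete, here is a toy sketch, not Lucene code. It assumes a doc id is simply a document's position in a list, and it models "delete plus optimize" as dropping the deleted slot and renumbering the rest. The class name DocIdShift and the res-N keys are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a doc id is treated as a position in a list, and
// "delete + optimize" is modelled as removing the slot and renumbering.
class DocIdShift {

    /** Returns the surviving keys in order; the new doc id of a key is its list index. */
    static List<String> deleteAndCompact(List<String> keysByDocId, String deletedKey) {
        List<String> compacted = new ArrayList<>();
        for (String key : keysByDocId) {
            if (!key.equals(deletedKey)) {
                compacted.add(key);
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("res-1", "res-2", "res-3");
        int storedDocId = before.indexOf("res-3");   // what a second core would have stored

        List<String> after = deleteAndCompact(before, "res-1");
        System.out.println("stored doc id for res-3:  " + storedDocId);
        System.out.println("current doc id for res-3: " + after.indexOf("res-3"));
        System.out.println("stored id still in range: " + (storedDocId < after.size()));
    }
}
```

The key res-3 survives unchanged while its position moves, which is why storing the unique key rather than the doc id is the more robust reference.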
RE: Return Lucene DocId in Solr Results
I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds.

Here's what I did to get it working, where favsBeans are objects returned from a query of the second core, but there is probably a better way to do it:

    private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, List<Favorites> favsBeans) throws ParseException {
        // open the core and get the data directory
        String indexDir = req.getCore().getIndexDir();
        FSDirectory index = null;
        try {
            index = FSDirectory.open(new File(indexDir));
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: " + indexDir + " " + e.getMessage());
        }

        int[] docIds = new int[favsBeans.size()];
        int i = 0;
        for (Favorites favBean : favsBeans) {
            // one query per primary key
            String pkQueryString = "resourceId:" + favBean.getResourceId();
            Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, "resourceId", new StandardAnalyzer()).parse(pkQueryString);

            IndexSearcher searcher = null;
            TopScoreDocCollector collector = null;
            try {
                searcher = new IndexSearcher(index, true);
                collector = TopScoreDocCollector.create(1, true);
                searcher.search(pkQuery, collector);
            } catch (IOException e) {
                throw new ParseException("IOException, cannot search the index at: " + indexDir + " " + e.getMessage());
            }

            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            // check the length before reading hits[0], since the array is empty when nothing matches
            if (hits != null && hits.length > 0) {
                docIds[i] = hits[0].doc;
                i++;
            }
        }
        Arrays.sort(docIds);
        return docIds;
    }

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 02 December 2010 13:46
To: solr-user@lucene.apache.org
Subject: Re: Return Lucene DocId in Solr Results

Sounds good, especially because your old scenario was fragile. [...]
Re: Return Lucene DocId in Solr Results
Ahhh, you're already down in Lucene. That makes things easier...

See TermDocs. Particularly seek(Term). That'll directly access the indexed unique key rather than having to form a bunch of queries.

Best
Erick

On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> I would be interested in hearing about some ways to improve the algorithm. [...]
RE: Return Lucene DocId in Solr Results
I must be missing something, as I'm getting an NPE on the line:

    docIds[i] = termDocs.doc();

Here's what I came up with:

    private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, List<Favorites> favsBeans) throws ParseException {
        // open the core and get the data directory
        String indexDir = req.getCore().getIndexDir();
        FSDirectory indexDirectory = null;
        try {
            indexDirectory = FSDirectory.open(new File(indexDir));
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: " + indexDir + " " + e.getMessage());
        }

        //String pkQueryString = "resourceId:" + favBean.getResourceId();
        //Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, "resourceId", new StandardAnalyzer()).parse(pkQueryString);

        IndexSearcher searcher = null;
        TopScoreDocCollector collector = null;
        IndexReader indexReader = null;
        TermDocs termDocs = null;

        try {
            searcher = new IndexSearcher(indexDirectory, true);
            indexReader = new FilterIndexReader(searcher.getIndexReader());
            termDocs = indexReader.termDocs();
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: " + indexDir + " " + e.getMessage());
        }

        int[] docIds = new int[favsBeans.size()];
        int i = 0;
        for (Favorites favBean : favsBeans) {
            Term term = new Term("resourceId", favBean.getResourceId());
            try {
                termDocs.seek(term);
                docIds[i] = termDocs.doc();   // the NPE is thrown here
            } catch (IOException e) {
                throw new ParseException("IOException, cannot seek to the primary key " + favBean.getResourceId() + " in : " + indexDir + " " + e.getMessage());
            }
            //ScoreDoc[] hits = collector.topDocs().scoreDocs;
            //if(hits != null && hits[0] != null) {
            i++;
            //}
        }
        Arrays.sort(docIds);
        return docIds;
    }

Thanks,
Steve

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 02 December 2010 14:20
To: solr-user@lucene.apache.org
Subject: Re: Return Lucene DocId in Solr Results

Ahhh, you're already down in Lucene. That makes things easier... [...]
Re: Return Lucene DocId in Solr Results
You have to call termDocs.next() after termDocs.seek(). Something like:

    termDocs.seek(term);
    if (termDocs.next()) {
        // means there was a term/doc matching and your references should be valid.
    }

On Thu, Dec 2, 2010 at 10:22 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> I must be missing something as I'm getting a NPE on the line: docIds[i] = termDocs.doc(); [...]
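The calling order can be illustrated with a small stand-in class. This is a sketch under loud assumptions: ToyTermDocs and firstDocId are hypothetical names, not Lucene classes, and the toy throws an IllegalStateException where the real code in the thread saw an NPE. It only mimics the contract described above: seek() selects a term, next() moves onto a matching document, and only then is doc() meaningful.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these are NOT Lucene classes. ToyTermDocs just mimics the
// "seek, then next, then doc" calling order described in the reply above.
class ToyTermDocs {
    private final Map<String, int[]> postings;  // term -> doc ids containing it
    private int[] current;                      // postings of the sought term, or null
    private int pos = -1;                       // -1: not positioned on a document yet

    ToyTermDocs(Map<String, int[]> postings) {
        this.postings = postings;
    }

    void seek(String term) {
        current = postings.get(term);
        pos = -1;                               // seek alone does not land on a document
    }

    boolean next() {
        if (current == null || pos + 1 >= current.length) {
            return false;                       // term missing or postings exhausted
        }
        pos++;
        return true;
    }

    int doc() {
        if (pos < 0) {
            throw new IllegalStateException("call next() before doc()");
        }
        return current[pos];
    }

    /** Returns the first doc id for key, or -1 when the key is not indexed. */
    static int firstDocId(ToyTermDocs termDocs, String key) {
        termDocs.seek(key);
        return termDocs.next() ? termDocs.doc() : -1;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("res-7", new int[] {42});
        ToyTermDocs termDocs = new ToyTermDocs(postings);
        System.out.println(firstDocId(termDocs, "res-7"));    // key present
        System.out.println(firstDocId(termDocs, "missing"));  // key absent
    }
}
```

One consequence for the loop in the quoted method: if a key has no match and the slot is left untouched, the int array keeps its default 0 there, which is itself a plausible doc id. Counting only the hits and trimming the array afterwards avoids that.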
Re: Return Lucene DocId in Solr Results
Take this with a sizeable grain of salt, as I haven't actually tried doing this. But you might try using an IndexReader, which it looks like you can get from this class:
http://lucene.apache.org/solr/api/org/apache/solr/core/StandardIndexReaderFactory.html

sasank

On Tue, Nov 30, 2010 at 6:45 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> Hmm, I found some similar queries on stackoverflow and they did not recommend exposing the lucene docId. [...]
Re: Return Lucene DocId in Solr Results
On the face of it, this doesn't make sense, so perhaps you can explain a bit. The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do...

Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem? This seems like it may be an XY problem, see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven <steven.lohr...@hmhpub.com> wrote:

> Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? [...]
RE: Return Lucene DocId in Solr Results
Hmm, I found some similar queries on stackoverflow and they did not recommend exposing the lucene docId. So, I guess my question becomes: what is the best way, from within my custom QParser, to take a list of solr primary keys (that were retrieved from elsewhere) and turn them into docIds? I also saw something about caching them using a Field Cache - how would I do that?

Thanks,
Steve

-----Original Message-----
From: Lohrenz, Steven [mailto:steven.lohr...@hmhpub.com]
Sent: 30 November 2010 11:57
To: solr-user@lucene.apache.org
Subject: Return Lucene DocId in Solr Results

Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? [...]
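One possible reading of the caching question, sketched with plain Java collections rather than Lucene's own classes. It assumes you can obtain an array of unique-key values indexed by doc id, which is roughly the shape a field cache hands back. The class name KeyToDocIdCache and the sample keys are hypothetical. The array is inverted once into a key-to-docId map, so each later lookup is a hash probe instead of a query.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: keyByDocId is assumed to hold each document's unique key at
// the index of its doc id, with null for documents that have no key.
class KeyToDocIdCache {

    /** Inverts a docId -> key array into a key -> docId map. */
    static Map<String, Integer> build(String[] keyByDocId) {
        Map<String, Integer> docIdByKey = new HashMap<>();
        for (int docId = 0; docId < keyByDocId.length; docId++) {
            String key = keyByDocId[docId];
            if (key != null) {
                docIdByKey.put(key, docId);
            }
        }
        return docIdByKey;
    }

    public static void main(String[] args) {
        String[] keyByDocId = {"res-10", null, "res-30"};
        Map<String, Integer> cache = build(keyByDocId);
        System.out.println(cache.get("res-30"));   // doc id of a known key
        System.out.println(cache.get("res-99"));   // null for an unknown key
    }
}
```

Since doc ids can shift after deletions and an optimize, a map like this would only be valid for the index snapshot it was built from and would need rebuilding whenever the reader is reopened.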