Re: Regarding DrillDown search
Dear Shai,

Thank you for the quick response :)

I have checked with PrefixQuery and a Term, and it is working fine. But I think I cannot pass multiple CategoryPaths to it, so I am calling the DrillDown.term() method N times, once for each CategoryPath in the list.

I have one more question. When I get the FacetResult, I get only the count of documents that matched the CategoryPath. Is there any way to also get the Document objects along with the count, so that I know the file names? For example, the files (the file name is the title field in the Document) which have the same Author in the FacetResult.

I believe I have read something about this in one of your answers. There you explained it like this: "Categories will be listed to the user, and when the user clicks a category we have to do a DrillDown search to get further results."

I just want to know whether we can get the document names in the first facet search itself, when we get the count (number of hits) of documents along with the FacetResult. Is there any solution available already, or what can I do for that?

Kindly guide me :)

Thank you for all your support.

Regards,
Jebarlin.R

On Mon, Feb 10, 2014 at 1:28 PM, Shai Erera wrote:
> Hi
>
> If you want to drill-down on first name only, then you have several
> options:
>
> 1) Index Author/First, Author/Last, Author/First_Last as facets on the
> document. This is the faster approach, but bloats the index. Also, if you
> index the author Author/Jebarlin, Author/Robertson and
> Author/Jebarlin_Robertson, it still won't allow you to execute a query
> Author/Jebar.
>
> 2) You should modify the query to be a PrefixQuery, as if the user chose to
> search Author/Jeral*. You can do that with DrillDown.term() to create a
> Term($facets, Author/Jeral) (NOTE: you shouldn't pass '*' as part of the
> CategoryPath) and then construct your own PrefixQuery with that Term.
>
> Hope that helps,
> Shai
>
> On Mon, Feb 10, 2014 at 6:21 AM, Jebarlin Robertson wrote:
> > Dear Shai,
> >
> > I have one doubt in DrillDown search: when I search with a CategoryPath
> > of author, it gives me the result only if I give the accurate full name.
> > Is there any way to get the result even if I give only the first or last
> > name? Can you help me to search like (*contains* the word in Facet
> > search), if the latest API or any other API supports it?
> >
> > Thank You
> >
> > --
> > Thanks & Regards,
> > Jebarlin Robertson.R
> > GSM: 91-9538106181.
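To make option 2 in the quoted reply more concrete, here is a minimal sketch of how a prefix lookup over facet terms behaves. It is a plain-Java model built on a sorted set, not Lucene code: the class FacetPrefixModel, its methods, the sample terms and the '/' separator are all assumptions made up for illustration. It suggests why the prefix Author/Jebar finds the full-name term, while a last-name prefix finds nothing unless the last name is also indexed as its own path (option 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Plain-Java model of prefix matching over facet terms; not Lucene code. */
final class FacetPrefixModel {

    /** Returns every term that starts with the given prefix, in sorted order. */
    static List<String> matchPrefix(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix, then scan until the prefix stops matching.
        // In a sorted set all terms sharing a prefix are contiguous, so we can stop early.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    /** Hypothetical sample terms: one drill-down term per indexed category path. */
    static SortedSet<String> sampleTerms() {
        return new TreeSet<>(Arrays.asList(
                "Author/Jebarlin_Robertson",
                "Author/Shai_Erera",
                "Title/Love"));
    }
}
```

With these sample terms, matchPrefix(sampleTerms(), "Author/Jebar") returns the single full-name term, and matchPrefix(sampleTerms(), "Author/Robertson") returns an empty list, because no indexed term starts with the last name.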
Re: Regarding DrillDown search
Hi

You will need to build a BooleanQuery which comprises a list of PrefixQuery clauses. The relation between the PrefixQuery clauses should be OR or AND, as you see fit (I believe OR?).

In order to get the documents' attributes, you should execute searcher.search() with e.g. a MultiCollector which wraps a FacetsCollector and a TopScoreDocCollector. Then, after .search() has finished, you should pull the facet results from the FacetsCollector instance and the document results from the TopScoreDocCollector instance. Something like (I hope it compiles in 3.6! :)):

    TopScoreDocCollector tsdc = TopScoreDocCollector.create(...);
    FacetsCollector fc = FacetsCollector.create(...);
    searcher.search(query, MultiCollector.wrap(tsdc, fc));

    List<FacetResult> facetResults = fc.getFacetResults();
    TopDocs topDocs = tsdc.topDocs();

Something like that..

Shai

On Mon, Feb 10, 2014 at 1:57 PM, Jebarlin Robertson wrote:
> I have checked with PrefixQuery and a Term, and it is working fine. But I
> think I cannot pass multiple CategoryPaths to it.
> [...]
> Is there any way to also get the Document objects along with the count,
> so that I know the file names?
> [...]
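The two ideas in this reply (OR-ing several prefixes, and feeding one search pass into two collectors) can be sketched in plain Java as follows. This is only a model of the pattern under stated assumptions, not the Lucene API: DocCollector, CountCollector, HitCollector, wrap() and search() are hypothetical names, and documents are represented simply as lists of category strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a single search pass feeding two collectors; not Lucene code. */
final class OnePassModel {

    /** Hypothetical stand-in for a collector: receives each matching document once. */
    interface DocCollector {
        void collect(int docId, List<String> categories);
    }

    /** Counts matching documents per category, like a facet count. */
    static final class CountCollector implements DocCollector {
        final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void collect(int docId, List<String> categories) {
            for (String c : categories) {
                counts.merge(c, 1, Integer::sum);
            }
        }
    }

    /** Remembers the ids of matching documents, like a hit list. */
    static final class HitCollector implements DocCollector {
        final List<Integer> docIds = new ArrayList<>();

        @Override
        public void collect(int docId, List<String> categories) {
            docIds.add(docId);
        }
    }

    /** Fans every collected document out to all wrapped collectors. */
    static DocCollector wrap(final DocCollector... collectors) {
        return new DocCollector() {
            @Override
            public void collect(int docId, List<String> categories) {
                for (DocCollector c : collectors) {
                    c.collect(docId, categories);
                }
            }
        };
    }

    /** OR semantics: a document matches if any category starts with any prefix. */
    static boolean matchesAny(List<String> categories, List<String> prefixes) {
        for (String c : categories) {
            for (String p : prefixes) {
                if (c.startsWith(p)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Visits each document once; docs.get(i) holds the categories of document i. */
    static void search(List<List<String>> docs, List<String> prefixes, DocCollector out) {
        for (int docId = 0; docId < docs.size(); docId++) {
            if (matchesAny(docs.get(docId), prefixes)) {
                out.collect(docId, docs.get(docId));
            }
        }
    }
}
```

Calling search(docs, prefixes, wrap(counts, hits)) fills both collectors in the same pass, so the counts and the hit list always describe the same set of matching documents.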
Re: Regarding DrillDown search
Hi Shai,

Thanks.

I am already building a BooleanQuery from a list of PrefixQuery clauses in the same way. I think I confused you, sorry :)

I am using the same code as above to get the resulting documents. I am getting the TopDocs and retrieving the Documents as well. If I did not even try that much, you would kill me :D

But my question was different. From the list of FacetResult I get only the counts (the number of document hits) in each category after iterating the list. I believe that getLevel() of FacetNode returns the number of hits, i.e. the number of documents that fall into the particular category. I need to know, from the FacetResult object, which documents fall under each category.

I hope you understand my question :)

Thank you :)

--
Jebarlin

On Mon, Feb 10, 2014 at 9:09 PM, Shai Erera wrote:
> In order to get documents' attributes you should execute searcher.search()
> w/ e.g. MultiCollector which wraps a FacetsCollector and
> TopScoreDocCollector.
> [...]
Re: Regarding DrillDown search
Ahh I see ... so given a single FacetResultNode, you would like to know which documents contributed to its weight (count in your case). This is not available immediately, that's why you need to do a drill-down query. So if you return the user a list of categories, when he clicks one of them, you perform a drill-down query on that category and retrieve all the associated documents.

May I ask why you need to know the list of documents given a FacetResultNode?

Basically, in the 3.6 API it's not so simple to do what you want in one pass, but in the 4.x API (especially the upcoming 4.7) it should be very easy: when you traverse the list of matching documents, besides reading the list of categories associated with each one, you also store a map Category -> List of documents. This isn't very cheap though ...

So I guess it would be good if I understood why you need to know which documents contributed to which category before the results are returned to the user.

Shai

On Mon, Feb 10, 2014 at 3:16 PM, Jebarlin Robertson wrote:
> But my question was different. From the list of FacetResult I get only
> the counts (the number of document hits) in each category after iterating
> the list.
> [...]
> I need to know, from the FacetResult object, which documents fall under
> each category.
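The "map Category -> List" idea from this reply can be sketched as below. This is a plain-Java model and an assumption about the shape of the data, not the Lucene 4.x API: CategoryDocsModel, docsByCategory() and the categoriesOfDoc lookup are hypothetical. It also hints at the cost mentioned in the reply, since the map holds one list entry per (document, category) pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of recording which documents contributed to each category. */
final class CategoryDocsModel {

    /**
     * matchingDocIds: ids of the documents that matched the query, in collection order.
     * categoriesOfDoc: hypothetical lookup from a doc id to its categories.
     */
    static Map<String, List<Integer>> docsByCategory(
            List<Integer> matchingDocIds, Map<Integer, List<String>> categoriesOfDoc) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Integer docId : matchingDocIds) {
            List<String> categories = categoriesOfDoc.get(docId);
            if (categories == null) {
                continue; // this document has no categories
            }
            for (String category : categories) {
                List<Integer> docs = result.get(category);
                if (docs == null) {
                    docs = new ArrayList<>();
                    result.put(category, docs);
                }
                docs.add(docId);
            }
        }
        return result;
    }
}
```

The facet count of a category is then just the size of its list, so the counts and the contributing documents come out of the same traversal.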
Faceting and Query-time Joins
Hello,

tl;dr: I'd like to know how to do faceting over the result set of a query-time join (JoinUtil). If it's not currently supported by Lucene, I'd appreciate some pointers about what needs to be done.

I'm working on a greenfield project with Lucene 4.6. The application treats its primary objects as a collection of child records. The child records are of different types and, unfortunately, are not available all at once (ruling out BlockJoinQuery). As the child records roll into the system for indexing, they're represented as Lucene Document objects that have the primary key of the parent object as a field. The child records themselves never change, so there's no need for re-indexing. I can use query-time joins on the parent ID field. So far, so good.

The problem is that I also very much want to have faceting pertaining to the parent objects. Googling around the past couple of days hasn't revealed much discussion of how to combine facets with query-time joins (except "nope": http://search-lucene.com/m/QTPadBcnv1).

Is it possible to combine these two features with the above constraints? If so, how? If not in Lucene 4.6, is there related work in trunk?

One thing I was thinking about last night is that it wouldn't seem to be too hard to do the faceting for this case by using updatable NumericDocValues on a dummy parent object, since that shouldn't require re-indexing.

TIA,

Jon

--
Jon Stewart, Principal
(646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA
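To pin down what "faceting over the result set of a query-time join" would have to compute, here is a minimal plain-Java model. It does not use Lucene or its join module, and it says nothing about whether Lucene 4.6 supports this; JoinFacetModel, Child, joinToParents() and facetCounts() are hypothetical names, and the substring match stands in for a real query. The point it illustrates is that facet counts must be taken over the distinct parents reached by the join, not over the matching child records.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java model of faceting over the parents reached by a query-time join. */
final class JoinFacetModel {

    /** Hypothetical child record: carries the primary key of its parent. */
    static final class Child {
        final String parentId;
        final String text;

        Child(String parentId, String text) {
            this.parentId = parentId;
            this.text = text;
        }
    }

    /** Join step: collect the distinct parent ids of all children containing the term. */
    static Set<String> joinToParents(List<Child> children, String term) {
        Set<String> parentIds = new LinkedHashSet<>();
        for (Child c : children) {
            if (c.text.contains(term)) {
                parentIds.add(c.parentId);
            }
        }
        return parentIds;
    }

    /** Facet step: count joined parents per facet value (parentFacet maps id to value). */
    static Map<String, Integer> facetCounts(Set<String> parentIds, Map<String, String> parentFacet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : parentIds) {
            String value = parentFacet.get(id);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A parent with several matching children is counted once, because the join step collapses children into a set of parent ids before any facet value is counted.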
Re: Regarding DrillDown search
Hi Shai,

Thanks for the explanation :)

For my requirement, I just want to display the list of result documents to the user. In the facet search case, I already have the result documents in TopDocs, and the FacetResults have only the count of documents that contributed to each category.

According to my understanding: suppose I query for the word "Love". I do a facet search and get 4 documents (files) as matching results from the TopScoreDocCollector as TopDocs, and I get the FacetResult from the FacetsCollector. The FacetResultNode gives me only the category values and the count of how many of those 4 result documents fall under each category (maybe by Author or another provided category).

I feel it would be good to get the category association together with the result documents, as I already have the document list from the TopScoreDocCollector.

I can also do a DrillDown search by selecting each category, but in my case I just want to display the 4 result documents first and then show them category-wise, e.g. 2 documents by the same Author, etc.

As per my requirement, I am doing the DrillDown search by asking the user to provide things such as the title of the document, the author of the document, etc. as an advanced search option.

---
Jebarlin Robertson.R

On Mon, Feb 10, 2014 at 10:30 PM, Shai Erera wrote:
> May I ask why do you need to know the list of documents given a
> FacetResultNode?
> [...]
Re: Regarding DrillDown search
What you want sounds more like grouping than faceting?

So e.g. if you have an Author field with values A1, A2, A3, and the user searches for 'love', then if I understand correctly, you want to display something like:

    Author/A1
      Doc1
      Doc2
    Author/A2
      Doc3
      Doc4
    Author/A3
      Doc5
      Doc6

Is that right?

Whereas today your result page looks like this:

    Facets        Results
    ------        -------
    Author        Doc1_Title
      A1 (4)      Doc1_Highlight
      A2 (3)
      A3 (1)      Doc2_Title
                  Doc2_Highlight
                  +++
                  ...

(Forgive my lack of creativity :)).

If you're not interested in join, and just want to add to each document its Author facet in the results pane, then I suggest you add another stored field (only stored, not indexed) with the category value. And then you could display:

    Facets        Results
    ------        -------
    Author        Doc1_Title
      A1 (4)      Doc1_Highlight
      A2 (3)      Author: A1
      A3 (1)
                  Doc2_Title
                  Doc2_Highlight
                  Author: A2
                  +++
                  ...

Did I understand properly?

Shai

On Mon, Feb 10, 2014 at 4:51 PM, Jebarlin Robertson wrote:
> I can also do a DrillDown search by selecting each category, but in my
> case I just want to display the 4 result documents first and then show
> them category-wise, e.g. 2 documents by the same Author, etc.
> [...]
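Once each hit carries its author as a stored value, the grouped display described in this reply (documents listed under Author/A1, Author/A2, ...) can be built on the client side. The following is a minimal plain-Java sketch under that assumption; GroupByAuthorModel, Hit and groupTitlesByAuthor() are hypothetical names, and reading the stored fields from the index is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of grouping search hits by a stored author value; not Lucene code. */
final class GroupByAuthorModel {

    /** Hypothetical hit: the title and author as read from the document's stored fields. */
    static final class Hit {
        final String title;
        final String author;

        Hit(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    /** Groups titles by author, keeping authors and titles in hit (rank) order. */
    static Map<String, List<String>> groupTitlesByAuthor(List<Hit> hits) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Hit hit : hits) {
            List<String> titles = groups.get(hit.author);
            if (titles == null) {
                titles = new ArrayList<>();
                groups.put(hit.author, titles);
            }
            titles.add(hit.title);
        }
        return groups;
    }
}
```

A LinkedHashMap is used so that the author whose document ranks highest appears first. Note that this only groups the hits that were actually retrieved, so a group may hold fewer documents than the facet count shown for the same author.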
Re: Regarding DrillDown search
Hi Shai,

Yes, that is exactly the way I want to display it.

Then I will go with the stored field approach.

It is not about lack of creativity; I may not have explained it to you properly :)

Thank you for all the support :)

On Tue, Feb 11, 2014 at 12:23 AM, Shai Erera wrote:
> If you're not interested in join, and just want to add to each document its
> Author facet in the results pane, then I suggest you add another stored
> field (only stored, not indexed) with the category value.
> [...]
Re: Regarding DrillDown search
You're welcome. And I suggest that you upgrade to 4.7 as soon as it's out! :)

Shai

On Mon, Feb 10, 2014 at 5:48 PM, Jebarlin Robertson wrote:

> Hi Shai,
>
> Yeah exactly the same way I want to display.
>
> Then I will do the same way of stored field.
>
> It is not about lack of creativity, I might not have explained it to you in the
> proper way :)
>
> Thank you for all the support :)
>
>
> On Tue, Feb 11, 2014 at 12:23 AM, Shai Erera wrote:
>
> > What you want sounds like grouping more than faceting?
> >
> > So e.g. if you have an Author field with values A1, A2, A3, and the user
> > searches for 'love',
> > then if I understand correctly, you want to display something like:
> >
> > Author/A1
> >     Doc1
> >     Doc2
> > Author/A2
> >     Doc3
> >     Doc4
> > Author/A3
> >     Doc5
> >     Doc6
> >
> > Is that right?
> >
> >
> > Whereas today your result page looks like this:
> >
> > Facets      Results
> > ------      -------
> > Author      Doc1_Title
> >   A1 (4)    Doc1_Highlight
> >   A2 (3)
> >   A3 (1)    Doc2_Title
> >             Doc2_Highlight
> >             +++
> >             ...
> >
> > (Forgive my lack of creativity :)).
> >
> > If you're not interested in join, and just want to add to each document its
> > Author facet in the results pane, then I suggest you add another stored
> > field (only stored, not indexed) with the category value. And then you
> > could display:
> >
> > Facets      Results
> > ------      -------
> > Author      Doc1_Title
> >   A1 (4)    Doc1_Highlight
> >   A2 (3)    Author: A1
> >   A3 (1)
> >             Doc2_Title
> >             Doc2_Highlight
> >             Author: A2
> >             +++
> >             ...
> >
> > Did I understand properly?
> >
> > Shai
> >
> > On Mon, Feb 10, 2014 at 4:51 PM, Jebarlin Robertson wrote:
> >
> > > Hi Shai,
> > >
> > > Thanks for the explanation :)
> > >
> > > For my requirement, I just want to display the list of resulted documents
> > > to the user.
> > > In Facet search case also, I already have the resulted documents list > in > > > TopDoc and the FacetResults have only the count of documents > contributed > > to > > > each Catagory, > > > > > > According to my understanding, > > > > > > Suppose I query for the word "Love", Now I do Facet Search and gets 4 > > > (Files) documents as matched results from TopScoreDocCollector as > TopDocs > > > and I will get the FacetResult from the FacetCollector. > > > And the FacetResultsNode gives me only the values of the category and > the > > > count of how many documents falls under same category (May be by Author > > or > > > other provided categories ) among the 4 resulted documents only. > > > > > > I feel, It will be good if I get the category association with the > > resulted > > > documents, as I have the document list already from > TopScoreDocCollector. > > > > > > I can do DrillDown Search also by selecting each category, But in my > > case I > > > just want to display the 4 documents result first and then category > wise, > > > suppose 2 documents by the same Author etc > > > > > > As per my requirement, I am doing DrillDown Search by asking the user > to > > > provide such as title of the docment, author of the document, etc... as > > > advanced search option. > > > > > > --- > > > Jebarlin Robertson.R > > > > > > > > > > > > On Mon, Feb 10, 2014 at 10:30 PM, Shai Erera wrote: > > > > > > > Ahh I see ... so given a single FacetResultNode, you would like to > know > > > > which documents contributed to its weight (count in your case). This > is > > > not > > > > available immediately, that's why you need to do a drill-down query. > So > > > if > > > > you return the user a list of categories, when he clicks one of them, > > you > > > > perform a drill-down query on that category and retrieve all the > > > associated > > > > documents. > > > > > > > > May I ask why do you need to know the list of documents given a > > > > FacetResultNode? 
> > > > > > > > Basically in the 3.6 API it's kind of not so simple to do what you > want > > > in > > > > one-pass, but in the 4.x API (especially the upcoming 4.7) it should > be > > > > very easy -- when you traverse the list of matching documents, > besides > > > only > > > > reading the list of categories associated with it, you also store a > map > > > > Category -> List. This isn't very cheap though ... > > > > > > > > So I guess it would be good if I understand why do you need to know > > which > > > > documents contributed to which category, before the results are > > returned > > > to > > > > the user. > > > > > > > > Shai > > > > > > > > > > > > On Mon, Feb 10, 2014 at 3:16 PM, Jebarlin Robertson < > > jebar...@gmail.com > > > > >wrote: > > > > > > > > > Hi Shai, > > > > > > > > > > Thanks, > > > > > > > > > > I am using the same way of BooleanQuery only with list of > PrefixQuery > > > > only. > > > > >
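The one-pass idea discussed in this thread (keeping a map from each category to the documents that fall under it while walking the matching documents) can be sketched in plain Java. This is only an illustration, not the Lucene facet API: the `Hit` class and its `title` and `author` values are hypothetical stand-ins for stored fields read from each hit in TopDocs, for example the stored-only author field suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups search hits by a stored category value (here: author). */
final class GroupHitsByAuthor {

    /** Hypothetical stand-in for the stored fields of one matching document. */
    static final class Hit {
        final String title;
        final String author;

        Hit(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    /** Builds a map of author -> titles, keeping authors in first-seen order. */
    static Map<String, List<String>> group(List<Hit> hits) {
        Map<String, List<String>> byAuthor = new LinkedHashMap<String, List<String>>();
        for (Hit hit : hits) {
            List<String> titles = byAuthor.get(hit.author);
            if (titles == null) {
                titles = new ArrayList<String>();
                byAuthor.put(hit.author, titles);
            }
            titles.add(hit.title);
        }
        return byAuthor;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("Doc1", "A1"), new Hit("Doc2", "A1"),
                new Hit("Doc3", "A2"), new Hit("Doc4", "A2"));
        for (Map.Entry<String, List<String>> e : group(hits).entrySet()) {
            // The size of each list is the number of retrieved hits in that category.
            System.out.println("Author/" + e.getKey() + " (" + e.getValue().size() + ")");
            for (String title : e.getValue()) {
                System.out.println("    " + title);
            }
        }
    }
}
```

Note that this only covers the hits actually retrieved, whereas facet counts are computed over all matching documents, so the two numbers agree only when every match is collected. As Shai points out, holding such a map for large result sets is not cheap.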
Re: Faceting and Query-time Joins
Are you faceting on parent values or child values?

Parent values should be easy; child values are not.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Feb 10, 2014 at 9:05 AM, Jon Stewart wrote:
> Hello,
>
> tl;dr: I'd like to know how to do faceting over the result set of a
> query-time join (JoinUtil). If it's not currently supported by
> Lucene, I'd appreciate some pointers about what needs to be done.
>
> I'm working on a greenfields project with Lucene 4.6. The application
> treats its primary objects as a collection of child records. The child
> records are of different types and, unfortunately, are not available
> all at once (ruling out BlockJoinQuery). As the child records roll
> into the system for indexing, they're represented as Lucene Document
> objects that have the primary key of the parent object as a field. The
> child records themselves never change, so there's no need for
> re-indexing. I can use query-time joins on the parent ID field. So
> far, so good.
>
> The problem is that I also very much want to have faceting pertaining
> to the parent objects. Googling around the past couple days hasn't
> revealed much discussion of how to combine facets with query-time
> joins (except "nope": http://search-lucene.com/m/QTPadBcnv1). Is it
> possible to combine these two features with the above constraints? If
> so, how? If not in Lucene 4.6, is there related work in trunk? One
> thing I was thinking about last night is that it wouldn't seem to be
> too hard to do the faceting for this case by using update-able
> NumericDocValue on a dummy parent object, since that shouldn't require
> re-indexing.
>
> TIA,
>
> Jon
> --
> Jon Stewart, Principal
> (646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
New to Apache Lucene: Need help in querying data - text with wildCards
I have a log-analyzer application that uses Apache Lucene to index its data. I store only the message in the index (the other fields of my object are not stored). I am not using any database, so I store the message in Lucene even though it is huge, but I delete this data weekly and start indexing afresh.

I have created a domain object to make indexing and retrieving my data with Lucene easier. It has these fields:

- className (fully qualified class name with package, for example com.domain.infrastructure.MyClass)
- messageType (for example: xml, log message, exception)
- logLevel
- timestamp (stored as a Long)
- logMessage (contains text and special characters such as <, [, { etc.)

The main purpose is to retrieve logMessage based on a user request. A few scenarios:

Case 1: The user requests a SOAP message (messageType:XML) at a particular time (timestamp: longVariable).
Case 2: The user requests a particular message (messageType: logMessage) at a particular time (timestamp:longVariable) from a particular class (className:com.businessdomain.layer.MyClass).
Case 3: The user requests a particular message (messageType: Exception) at a log level (logLevel: DEBUG) at a particular time (timestamp:longVariable).

Currently I am indexing data like this:

    document.add(new StringField("className", logsVO.getClassName(), Field.Store.NO));
    document.add(new StringField("logLevel", logsVO.getLogLevel(), Field.Store.NO));
    document.add(new TextField("logMessage", logsVO.getLogMessage(), Field.Store.YES));
    document.add(new StringField("messageType", logsVO.getMessageType().toString(), Field.Store.NO));
    document.add(new NumericDocValuesField("path", logsVO.hashCode()));
    document.add(new LongField("timeStamp", logsVO.getTimeStamp().getTime(), Field.Store.NO));

An actual log line looks like this:

    2013-12-19 15:53:42.379 [server.startup : 0] DEBUG o.a.commons.digester3.Digester - [ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop 'com.domain.ec.util.mapper.node.SomeClass'

Here:

- 2013-12-19 15:53:42.379 is the timestamp
- [server.startup : 0] is a part I will ignore
- DEBUG is the logLevel
- o.a.commons.digester3.Digester is the className
- [ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop 'com.domain.ec.util.mapper.node.SomeClass' is my logMessage

Now to my problem: I have tried PhraseQuery, BooleanQuery and WildcardQuery, but the only time I get results is when I search for a small string like "pop" (from the logMessage above). In all other cases, where the search text contains special characters, I get no results. Can anyone suggest what pattern I should use to satisfy the three user requests described above?

I appreciate your help in this regard.

--
View this message in context: http://lucene.472066.n3.nabble.com/New-to-Apache-Lucene-Need-help-in-querying-data-text-with-wildCards-tp4116515.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
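The field breakdown described in the message above could be done with a regular expression before the values are handed to Lucene. The following is a minimal sketch in plain Java with no Lucene calls; the class name `LogLineParser`, the pattern itself, and the choice of UTC when converting the timestamp to a long are all assumptions for illustration and are not taken from the original post.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits one log line into the fields named in the post. */
final class LogLineParser {

    /** Holder for the extracted values (hypothetical, for illustration only). */
    static final class Parsed {
        final long timestamp;      // epoch milliseconds, assuming the log is in UTC
        final String logLevel;
        final String className;
        final String logMessage;

        Parsed(long timestamp, String logLevel, String className, String logMessage) {
            this.timestamp = timestamp;
            this.logLevel = logLevel;
            this.className = className;
            this.logMessage = logMessage;
        }
    }

    // timestamp, [thread info] (ignored), level, class name, " - ", message
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) "
                    + "\\[[^\\]]*\\] (\\S+)\\s+(\\S+) - (.*)$",
            Pattern.DOTALL);

    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    static Parsed parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        long millis = LocalDateTime.parse(m.group(1), TS)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
        return new Parsed(millis, m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        Parsed p = parse("2013-12-19 15:53:42.379 [server.startup : 0] DEBUG "
                + "o.a.commons.digester3.Digester - [ObjectCreateRule]"
                + "{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop "
                + "'com.domain.ec.util.mapper.node.SomeClass'");
        System.out.println(p.timestamp);
        System.out.println(p.logLevel);
        System.out.println(p.className);
        System.out.println(p.logMessage);
    }
}
```

Keeping the level and class name as separate exact-match values fits the StringField usage shown in the post, while the free-text remainder is what goes through the analyzer.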
Re: New to Apache Lucene: Need help in querying data - text with wildCards
Likely your analyzer (which one are you using?) is breaking up your text into tokens you don't expect?

If you use QueryParser, passing the same analyzer, then it will also tokenize your query into the same tokens, and you should get the expected hits.

But you may need your own analyzer to "properly" (by your definition) tokenize the log messages...

Mike McCandless

http://blog.mikemccandless.com
Re: New to Apache Lucene: Need help in querying data - text with wildCards
Hi Michael,

Thank you very much for your response. Before I try your suggestion (writing my own analyzer), I should say that I am currently just using StandardAnalyzer, as in the example in the Lucene documentation. Here is my code that writes to the index:

[code]
File indexDirFile = new File(this.indexDir);
Directory dir = FSDirectory.open(indexDirFile);

/**
 * Used by certain classes to match version compatibility across releases of Lucene.
 * WARNING: When changing the version parameter that you supply to components in Lucene,
 * do not simply change the version at search-time, but instead also adjust your
 * indexing code to match, and re-index.
 */
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);

/**
 * Holds all the configuration that is used to create an IndexWriter.
 * Once IndexWriter has been created with this object, changes to this
 * object will not affect the IndexWriter instance. For that, use
 * LiveIndexWriterConfig that is returned from IndexWriter.getConfig().
 */
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
if ((overrideOpenModeFlag && openModeFlag) || create) {
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
this.writer = new IndexWriter(dir, iwc);
[code]

and my IndexSearcher is:

[code]
IndexReader indexReader = null;
IndexSearcher indexSearcher = null;
try {
    File indexDirFile = new File(this.indexDir);
    Directory dir = FSDirectory.open(indexDirFile);
    indexReader = DirectoryReader.open(dir);
    indexSearcher = new IndexSearcher(indexReader);
} catch (IOException ioe) {
    ioe.printStackTrace();
}
this.indexSearcher = indexSearcher;
[code]

Please advise.

--
View this message in context: http://lucene.472066.n3.nabble.com/New-to-Apache-Lucene-Need-help-in-querying-data-text-with-wildCards-tp4116515p4116519.html
Re: Faceting and Query-time Joins
Child values.

There really isn't a true parent, other than to refer to the collection of children. However, the children are all of different types and I'd expect that a given facet would only pertain to a given child, i.e., you're not going to get a second child which involves the same facet.

A bit unusual is that I care more about indexing performance and less about query latency. I don't want queries that take too long, of course, but a second or two or three is fine, and I don't expect much, if any, concurrency (nor is adding RAM :-). So having to do a little more work at query time isn't that big of a concern if I can avoid re-indexing.

Jon

--
Jon Stewart, Principal
(646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA
[Suggestions Required] 110 Concurrency users indexing on Lucene dont finish in 200 ms.
Group -

We have an indexing and searching service (using Lucene 4.0) implemented over REST as part of our framework. All the related modules will use it to publish data and make it available to the UI. Every REST call that our service receives is subject to a proxy timeout of 200 ms.

We recently started concurrency tests on the service, beginning with indexing. It has to handle a minimum of 110 users, each publishing JSON data of 10, 50 or 100 documents. The documents are not huge like the ones you use for your performance tests. We mostly call updateDocument(), as we do not know whether a document is already indexed or not.

With 110 users/threads and a ramp-up time of 0 seconds, the test fails randomly because some of the threads/REST calls do not complete within 200 ms.

We tried the following:

1. maxBufferSize (RAM buffer size) of 500 MB.
2. DWPT count set to 10.
3. Calling updateDocument in a separate thread, as suggested in one of your posts on the internet. It doesn't seem to help.
4. TieredMergePolicy has been set. Unfortunately this does not help, as the maxBufferSize is not hit while we index, so no merge happens yet.
5. We have yet to try MMapDirectory.

Are there any other suggestions or experiences that you can share with us? Are there any benchmark tests that you have done on concurrency? Please help us.

-Vidhya
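One possible reading of item 3 above (moving updateDocument off the request thread) is a bounded queue drained by a single background worker, so that the REST call only has to enqueue within its time budget. The sketch below is plain Java and makes several assumptions that are not in the original message: the `AsyncIndexQueue` class and the `Sink` interface are hypothetical, `Sink` stands in for a call such as IndexWriter.updateDocument, and it assumes the service is allowed to acknowledge a request before the document is actually indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hands documents from request threads to one background indexing thread. */
final class AsyncIndexQueue {

    /** Hypothetical stand-in for a call such as IndexWriter.updateDocument. */
    interface Sink {
        void update(String id, String json) throws Exception;
    }

    // Sentinel that tells the worker to stop after draining earlier items.
    private static final String[] STOP = new String[0];

    private final BlockingQueue<String[]> queue;
    private final Thread worker;

    AsyncIndexQueue(int capacity, final Sink sink) {
        this.queue = new ArrayBlockingQueue<String[]>(capacity);
        this.worker = new Thread(new Runnable() {
            @Override
            public void run() {
                drain(sink);
            }
        }, "index-worker");
        this.worker.start();
    }

    private void drain(Sink sink) {
        try {
            while (true) {
                String[] item = queue.take();
                if (item == STOP) {
                    return;
                }
                try {
                    sink.update(item[0], item[1]);
                } catch (Exception e) {
                    e.printStackTrace(); // a real service would log and retry or report
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns false if the queue stayed full for timeoutMs, so the caller can reject the request. */
    boolean submit(String id, String json, long timeoutMs) {
        try {
            return queue.offer(new String[] {id, json}, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Waits until everything submitted so far has been passed to the sink. */
    void close() {
        try {
            queue.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        final List<String> indexed = Collections.synchronizedList(new ArrayList<String>());
        AsyncIndexQueue q = new AsyncIndexQueue(1000, new Sink() {
            @Override
            public void update(String id, String json) {
                indexed.add(id);
            }
        });
        System.out.println(q.submit("doc-1", "{\"title\":\"a\"}", 150));
        q.close();
        System.out.println(indexed);
    }
}
```

The trade-off is that a successful response no longer means the document is searchable or durable, and a full queue has to be surfaced to the caller as an error. Whether this helps with the 200 ms limit depends on where the time is actually being spent, which the original message does not establish.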