Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-05 Thread ani...@ekkitab
OK, sorry for not explaining my problem clearly earlier. We have around 5 fields in each document: ID, ISBN, author, title and the category the book falls under. (You are right about point 3: we are indeed storing multiple genres against the book, which means 1 book, 1 doc.) doc.add(new Fie

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-05 Thread Anshum
Hi Anish, So am I getting something wrong here? You said "I have created a search index on book Id , title ,and author from a database of books which fall under various categories." so those are 3 fields, right? 1. How do you filter the doc types (as in the genres) at search time? Do you even need

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Zhangchi, Thanks for your reply. We have about 3 million records (different ISBNs) in the database and slightly more documents than that, and we wouldn't want to dedupe at indexing time, because one book (one ISBN) can be available under 2 or more categories (like fiction, comics &

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Ian, Thanks for your reply. We had actually tried what you suggested first, and it wasn't working, so I was hoping for some sample code. But then we found out that the field on which we wanted the duplicate filter applied was not actually indexed when adding it to the document

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread zhangchi
I think you should check the index first, using lukeall to see whether it contains duplicate books. On Thu, 04 Mar 2010 20:43:26 +0800, ani...@ekkitab wrote: Hi there, Could someone help me with the usage of DuplicateFilters. Here is my problem: I have created a search index on book

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread Ian Lea
If the field you want to use for deduping is ISBN, create a DuplicateFilter using whatever your ISBN field name is as the field name and pass that to one of the search methods that takes a filter. If your index is large I'd be worried about performance and would look at deduping at indexing time i
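
For the "dedupe at indexing time" route mentioned above, one possible approach is to group the database rows by ISBN before building documents, so that each ISBN yields a single entry carrying all of its categories. The sketch below is plain Java and does not touch the Lucene API; the class name `IsbnDedupe`, the method `mergeByIsbn`, and the `{isbn, category}` row layout are assumptions made for illustration, not anything taken from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Lucene): collapses (isbn, category) rows
// so that each ISBN appears once with the set of all its categories.
final class IsbnDedupe {

    // Each row is assumed to be {isbn, category}; insertion order is preserved.
    static Map<String, Set<String>> mergeByIsbn(List<String[]> rows) {
        Map<String, Set<String>> byIsbn = new LinkedHashMap<>();
        for (String[] row : rows) {
            String isbn = row[0];
            String category = row[1];
            // Create the category set on first sight of an ISBN, then add to it.
            byIsbn.computeIfAbsent(isbn, k -> new LinkedHashSet<>()).add(category);
        }
        return byIsbn;
    }

    public static void main(String[] args) {
        // Made-up sample rows: the first book is listed under two categories.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"isbn-1", "fiction"});
        rows.add(new String[] {"isbn-1", "comics"});
        rows.add(new String[] {"isbn-2", "fiction"});

        // One entry per ISBN; each entry would become one indexed document.
        for (Map.Entry<String, Set<String>> e : mergeByIsbn(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each map entry would then be turned into one document, with the category field added once per category, so no filter is needed at search time. Whether this fits depends on how categories are filtered at query time, which the thread leaves open.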