Re: Multiple Values -Structured?
multiValued fields retain their order, for the record.

Erik

On Sep 4, 2007, at 12:37 AM, Jed Reynolds wrote:

> One of the difficulties that you're going to find with multi-valued
> fields is that they are an unordered collection without relation. If you
> have a document with a list of editors and revisions, the two fields
> have no inherent correlation unless your application can extract it from
> the data itself.
>
>   [doc]
>     [id]123[/id]
>     [str name=name]hello world[/str]
>     [array name=editor]
>       [str name=editor]Fred[/str]
>       [str name=editor]Bob[/str]
>     [/array]
>     [array name=revisiondate]
>       [date name=revisiondate]2006-01-01T00:00:00Z[/date]
>       [date name=revisiondate]2006-01-02T00:00:00Z[/date]
>     [/array]
>   [/doc]
>
> If your application can decipher that and do a slice on it showing a
> revision...then brilliant! But if the multi-value fields are out of
> order, that might make a significant difference.
>
> I would create a document per revision and take advantage of the range
> queries and sorting available at the query level.
>
> Jed
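If the values do come back in the order they were added, as Erik says, a client could pair the two lists by position. The following is only a sketch of that idea: the dictionary mirrors Jed's example document, and `pair_revisions` is a hypothetical helper, not part of Solr or any client library.

```python
# Hypothetical client-side representation of the example document above.
doc = {
    "id": "123",
    "name": "hello world",
    "editor": ["Fred", "Bob"],
    "revisiondate": ["2006-01-01T00:00:00Z", "2006-01-02T00:00:00Z"],
}


def pair_revisions(doc):
    """Pair editors with revision dates by position.

    This assumes both multi-valued fields were filled in the same order
    and that the stored order is preserved.
    """
    editors = doc.get("editor", [])
    dates = doc.get("revisiondate", [])
    if len(editors) != len(dates):
        # A missing value in either list would silently shift the pairing.
        raise ValueError("editor and revisiondate lists differ in length")
    return list(zip(editors, dates))


revisions = pair_revisions(doc)
```

Note that this only helps when reading a stored document back. It does nothing for query-time matching, which still treats the two fields independently.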
Re: Multiple Values -Structured?
You could index both a compound field and the components separately. This could be simplified by sending the value in once, in the compound format:

  review,1 Jan 2007
  revision, 2 Jan 200

And then use a copyField with a regex tokenizer to extract and index the date into a separate field. You could index the type separately via the same mechanism.

-Yonik

On 9/3/07, Bharani wrote:

> Hi,
>
> I have two sets of documents:
>
> 1) Primary document
> 2) Occurrences of the primary document
>
> Since there is no such thing as a join, I can either
>
> a) post the primary document with the occurrences as a multi-valued
>    field, or
> b) post the primary document once for every occurrence, i.e. the classic
>    de-normalized route.
>
> My problem with option a): this works great as long as the occurrence is
> a single field, but if I have a group of fields that describes the
> occurrence, then the search returns wrong results because of the nature
> of text search, i.e.
>
>   [date]1 Jan 2007[/date] [type]review[/type]
>   [date]2 Jan 2007[/date] [type]revision[/type]
>
> If I search for 2 Jan 2007 and [date]1 Jan 2007[/date] I will get a hit
> (which is wrong), because there is no grouping of fields to associate
> date and type as one unit. If I merge them into one entity then I can't
> use range queries for the date.
>
> Option b): this would result in a large number of documents, and even if
> I index only and do not store, I still have to deal with duplicate hits,
> because all I want is the primary document.
>
> Is there a better approach to the problem?
>
> Thanks
> Bharani
>
> --
> View this message in context: http://www.nabble.com/Multiple-Values--Structured--tf4370282.html#a12456399
> Sent from the Solr - User mailing list archive at Nabble.com.
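To make the compound format concrete, here is a small client-side sketch of splitting a "type,date" value into its components. It is not the copyField and regex tokenizer setup Yonik describes (that lives in the Solr schema); it only illustrates the same extraction. The field names `occurrence`, `occurrence_type` and `occurrence_date` are made up for the example, and the input values are assumed to be complete dates as in Bharani's message.

```python
from datetime import datetime


def split_compound(value):
    """Split a compound 'type,date' value into separately indexable parts."""
    kind, _, raw_date = value.partition(",")
    # '%d %b %Y' parses dates such as '1 Jan 2007' (English month names).
    parsed = datetime.strptime(raw_date.strip(), "%d %b %Y")
    return {
        "occurrence": value,                # compound value, kept as sent
        "occurrence_type": kind.strip(),    # e.g. 'review'
        "occurrence_date": parsed.strftime("%Y-%m-%dT00:00:00Z"),
    }


fields = [split_compound(v) for v in ["review,1 Jan 2007", "revision,2 Jan 2007"]]
```

Converting the date to the ISO form used elsewhere in this thread is what makes it usable for range queries once it sits in its own field.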
Re: Multiple Values -Structured?
Thanks Yonik - I didn't know that before. But I am not sure how I can use range queries on this compound field so that I don't get the wrong result.

-Bharani

Yonik Seeley wrote:

> You could index both a compound field and the components separately.
> This could be simplified by sending the value in once, in the compound
> format:
>
>   review,1 Jan 2007
>   revision, 2 Jan 200
>
> And then use a copyField with a regex tokenizer to extract and index the
> date into a separate field. You could index the type separately via the
> same mechanism.
>
> -Yonik
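The wrong result Bharani is worried about can be modelled in a few lines. This is a plain-Python sketch of the matching logic, not of how Solr evaluates queries: `independent_match` treats type and date as two unrelated multi-valued fields, while `paired_match` keeps each occurrence together as one unit. Both function names are hypothetical.

```python
from datetime import date

# The two occurrences from the original question, kept as (type, date) units.
occurrences = [
    ("review", date(2007, 1, 1)),
    ("revision", date(2007, 1, 2)),
]

# The same data flattened into two unrelated multi-valued fields.
types = [t for t, _ in occurrences]
dates = [d for _, d in occurrences]


def independent_match(types, dates, wanted_type, start, end):
    """Match type and date range separately, as two unrelated fields."""
    return wanted_type in types and any(start <= d <= end for d in dates)


def paired_match(occurrences, wanted_type, start, end):
    """Match only when one single occurrence satisfies both conditions."""
    return any(t == wanted_type and start <= d <= end for t, d in occurrences)


# Ask for a review dated 2 Jan 2007; no such occurrence exists.
false_hit = independent_match(types, dates, "review", date(2007, 1, 2), date(2007, 1, 2))
real_hit = paired_match(occurrences, "review", date(2007, 1, 2), date(2007, 1, 2))
```

Splitting the compound value into separate fields gives range queries on the date, but on its own it still behaves like `independent_match`. Keeping the association requires either one document per occurrence or a check against the compound value.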
Re: Multiple Values -Structured?
No, size is not an issue, at least for now. But I am thinking of implementing some sort of duplicate removal based on a field. I happened to look at this thread:

http://www.nabble.com/Group-results-by-field--tf3683765.html#a10296394

Tom mentions some changes to the code to do that, so I was thinking along those lines too. Any idea how you can do this without changes to Solr?

Thanks
Bharani

Jed Reynolds wrote:

> Are you concerned about the size of your index?
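One way to remove duplicates based on a field without touching Solr is to collapse the hits in the client after the search returns. The sketch below assumes each de-normalized hit carries a `primary_id` field pointing back at its primary document; that field name and the `collapse_by_field` helper are inventions for this example.

```python
def collapse_by_field(hits, field):
    """Keep the first hit for each distinct value of `field`.

    Hits are assumed to arrive in ranked order, so the first hit seen for a
    primary document is also its best-ranked occurrence.
    """
    seen = set()
    unique = []
    for hit in hits:
        key = hit[field]
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


# Hypothetical result list: two occurrences of document 123, one of 456.
hits = [
    {"id": "123-0", "primary_id": "123", "type": "review"},
    {"id": "456-0", "primary_id": "456", "type": "review"},
    {"id": "123-1", "primary_id": "123", "type": "revision"},
]
primary_hits = collapse_by_field(hits, "primary_id")
```

The drawback is paging: a page of ten raw hits may collapse to fewer primary documents, so the client has to over-fetch or keep requesting rows until it has enough distinct ones.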
Re: Multiple Values -Structured?
Bharani wrote:

> Is there a better approach to the problem?

Are you concerned about the size of your index?

One of the difficulties that you're going to find with multi-valued fields is that they are an unordered collection without relation. If you have a document with a list of editors and revisions, the two fields have no inherent correlation unless your application can extract it from the data itself.

  [doc]
    [id]123[/id]
    [str name=name]hello world[/str]
    [array name=editor]
      [str name=editor]Fred[/str]
      [str name=editor]Bob[/str]
    [/array]
    [array name=revisiondate]
      [date name=revisiondate]2006-01-01T00:00:00Z[/date]
      [date name=revisiondate]2006-01-02T00:00:00Z[/date]
    [/array]
  [/doc]

If your application can decipher that and do a slice on it showing a revision...then brilliant! But if the multi-value fields are out of order, that might make a significant difference.

I would create a document per revision and take advantage of the range queries and sorting available at the query level.

Jed
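A minimal sketch of the document-per-revision approach, using the editors and dates from the example above. The `denormalize` helper, the `primary_id` field and the "id-n" key scheme are assumptions made for illustration. The filter and sort at the end only imitate in Python what a date range query and a sort parameter would do on the server.

```python
def denormalize(primary, occurrences):
    """Build one flat document per occurrence of a primary document."""
    docs = []
    for n, occurrence in enumerate(occurrences):
        doc = dict(primary)                   # copy the shared fields
        doc["primary_id"] = primary["id"]     # pointer back to the primary
        doc["id"] = f"{primary['id']}-{n}"    # unique key per occurrence
        doc.update(occurrence)                # single-valued editor and date
        docs.append(doc)
    return docs


docs = denormalize(
    {"id": "123", "name": "hello world"},
    [
        {"editor": "Fred", "revisiondate": "2006-01-01T00:00:00Z"},
        {"editor": "Bob", "revisiondate": "2006-01-02T00:00:00Z"},
    ],
)

# ISO-8601 UTC timestamps of equal length compare correctly as strings.
in_range = sorted(
    (d for d in docs
     if "2006-01-02T00:00:00Z" <= d["revisiondate"] <= "2006-01-31T00:00:00Z"),
    key=lambda d: d["revisiondate"],
)
```

Because every document now holds exactly one editor and one date, a hit can only come from a real revision. The cost is the one Bharani raised: several hits can point at the same primary document.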