Re: Multiple Values -Structured?

2007-09-04 Thread Erik Hatcher

multiValued fields retain their order, for the record.

Erik


On Sep 4, 2007, at 12:37 AM, Jed Reynolds wrote:
One of the difficulties that you're going to find with multi-valued  
fields is that they are an unordered collection without relation.  
If you have a document with a list of editors and revisions, the  
two fields have no inherent correlation unless your application can  
extract it from the data itself.


[doc]
   [id]123[/id]
   [str name=name]hello world[/str]
   [array name=editor]
      [str name=editor]Fred[/str]
      [str name=editor]Bob[/str]
   [/array]
   [array name=revisiondate]
      [date name=revisiondate]2006-01-01T00:00:00Z[/date]
      [date name=revisiondate]2006-01-02T00:00:00Z[/date]
   [/array]
[/doc]

If your application can decipher that and do a slice on it showing  
a revision...then brilliant! But if the multi-value fields are out  
of order, that might make a significant difference.


I would create a document per revision and take advantage of range  
queries and sorting available at the query level.





Jed




Re: Multiple Values -Structured?

2007-09-04 Thread Yonik Seeley
You could index both a compound field and the components separately.
This could be simplified by sending the value in once as the compound format:
  review,1 Jan 2007
  revision, 2 Jan 200
And then use a copyField with a regex tokenizer to extract and index
the date into a separate field.  You could index the type separately
via the same mechanism.

-Yonik
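
To illustrate the compound format above, here is a minimal client-side
sketch in Python of splitting a value such as "review,1 Jan 2007" into its
type and date components. It is not Solr configuration: the pattern, the
helper name split_compound, and the "%d %b %Y" date format are assumptions
made for the example, and the month abbreviation parsing assumes an English
locale.

```python
import re
from datetime import datetime

# Hypothetical pattern for the compound "type,date" format, e.g. "review,1 Jan 2007".
COMPOUND = re.compile(r"^\s*(?P<type>[^,]+?)\s*,\s*(?P<date>.+?)\s*$")


def split_compound(value):
    """Split a compound value into (type, ISO 8601 date string)."""
    m = COMPOUND.match(value)
    if m is None:
        raise ValueError("not a compound 'type,date' value: %r" % value)
    # Assumed input date format: day, abbreviated month name, four-digit year.
    parsed = datetime.strptime(m.group("date"), "%d %b %Y")
    return m.group("type"), parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


review = split_compound("review,1 Jan 2007")
revision = split_compound("revision, 2 Jan 2007")
```

The idea is that the extracted date can live in its own date field for
range queries, while the compound value stays available for matching type
and date as one unit.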

On 9/3/07, Bharani [EMAIL PROTECTED] wrote:

 Hi,

 I have two sets of documents:

 1) Primary document
 2) Occurrences of the primary document

 Since there is no such thing as a join, I can either

 a) Post the primary document with the occurrences as a multi-valued field
  or
 b) Post the primary document once for every occurrence, i.e. the classic
 de-normalized route

 My problem with

 Option a) This works great as long as the occurrence is a single field, but
 if I have a group of fields that describes the occurrence, then the search
 returns wrong results because of the nature of text search,

 i.e. <date>1 Jan 2007</date>
 <type>review</type>

 <date>2 Jan 2007</date>
 <type>revision</type>

 If I search for 2 Jan 2007 and <date>1 Jan 2007</date> I will get a hit
 (which is wrong) because there is no grouping of fields to associate date
 and type as one unit. If I merge them into one entity, then I can't use
 range queries on the date.

 Option b) This would result in a large number of documents, and even if I
 index only and do not store, I still have to deal with duplicate hits,
 because all I want is the primary document.


 Is there a better approach to the problem?

 Thanks
 Bharani


 --
 View this message in context: 
 http://www.nabble.com/Multiple-Values--Structured--tf4370282.html#a12456399
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Multiple Values -Structured?

2007-09-04 Thread Bharani

Thanks Yonik - I didn't know that before. But I am not sure how I can use
range queries on this compound field so that I don't get the wrong result.

-Bharani
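
The wrong result in question can be reproduced outside Solr. The following
is a minimal Python sketch that only models the matching logic: one
function tests the type and date values independently, the other requires
both conditions to hold for the same occurrence. All names here
(occurrences, matches_flattened, matches_grouped) are hypothetical, and the
sketch says nothing about how Solr evaluates queries internally.

```python
from datetime import date

# Hypothetical occurrences of one primary document, as (type, date) pairs.
occurrences = [
    ("review", date(2007, 1, 1)),
    ("revision", date(2007, 1, 2)),
]

# Flattened view: what two independent multi-valued fields would hold.
types = [t for t, _ in occurrences]
dates = [d for _, d in occurrences]


def matches_flattened(wanted_type, start, end):
    """Test each multi-valued field on its own; values are not tied together."""
    return wanted_type in types and any(start <= d <= end for d in dates)


def matches_grouped(wanted_type, start, end):
    """Require the type and the date range to hold for the same occurrence."""
    return any(t == wanted_type and start <= d <= end for t, d in occurrences)


# There is no review dated 2 Jan 2007 in the data above.
day = date(2007, 1, 2)
false_hit = matches_flattened("review", day, day)  # matches across occurrences
real_hit = matches_grouped("review", day, day)     # no single occurrence fits
```

Whatever indexing scheme is chosen, it has to behave like the grouped
check: the type and the date range must be satisfied by one occurrence.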


Yonik Seeley wrote:
 
 You could index both a compound field and the components separately.
 This could be simplified by sending the value in once as the compound
 format:
   review,1 Jan 2007
   revision, 2 Jan 200
 And then use a copyField with a regex tokenizer to extract and index
 the date into a separate field.  You could index the type separately
 via the same mechanism.
 
 -Yonik
 
 
 




Re: Multiple Values -Structured?

2007-09-04 Thread Bharani

No, size is not an issue, at least for now. But I am thinking of implementing
some sort of duplicate removal based on a field. I happened to look at this
thread:
http://www.nabble.com/Group-results-by-field--tf3683765.html#a10296394

Tom mentions some changes to the code to do that, so I was thinking along those
lines too. Any idea how you can do this without changes to Solr?

Thanks
Bharani
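
One option that needs no changes to Solr is to remove the duplicates in the
client after the results come back. A minimal Python sketch, assuming the
hits arrive as a list of dicts ordered by relevance and that each hit
carries a field identifying its primary document; the field name primary_id
and the helper collapse_by_field are hypothetical.

```python
def collapse_by_field(hits, field):
    """Keep only the first (best-ranked) hit for each distinct field value."""
    seen = set()
    unique = []
    for hit in hits:
        key = hit[field]
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


# Hypothetical result list: two occurrences of primary document 123, one of 456.
hits = [
    {"id": "123-0", "primary_id": "123"},
    {"id": "456-0", "primary_id": "456"},
    {"id": "123-1", "primary_id": "123"},
]
collapsed = collapse_by_field(hits, "primary_id")
```

The trade-off is that paging and hit counts are computed before the
collapse, so a page may end up with fewer results than requested.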



Jed Reynolds-2 wrote:
 
 
 Are you concerned about the size of your index?
 
 One of the difficulties that you're going to find with multi-valued 
 fields is that they are an unordered collection without relation. If you 
 have a document with a list of editors and revisions, the two fields 
 have no inherent correlation unless your application can extract it from 
 the data itself.
 
 [doc]
    [id]123[/id]
    [str name=name]hello world[/str]
    [array name=editor]
       [str name=editor]Fred[/str]
       [str name=editor]Bob[/str]
    [/array]
    [array name=revisiondate]
       [date name=revisiondate]2006-01-01T00:00:00Z[/date]
       [date name=revisiondate]2006-01-02T00:00:00Z[/date]
    [/array]
 [/doc]
 
 If your application can decipher that and do a slice on it showing a 
 revision...then brilliant! But if the multi-value fields are out of 
 order, that might make a significant difference.
 
 I would create a document per revision and take advantage of range 
 queries and sorting available at the query level.
 
 
 
 
 Jed
 
 




Re: Multiple Values -Structured?

2007-09-03 Thread Jed Reynolds

Bharani wrote:

Hi,

I have two sets of documents:

1) Primary document
2) Occurrences of the primary document

Since there is no such thing as a join, I can either

a) Post the primary document with the occurrences as a multi-valued field
 or
b) Post the primary document once for every occurrence, i.e. the classic
de-normalized route

My problem with

Option a) This works great as long as the occurrence is a single field, but
if I have a group of fields that describes the occurrence, then the search
returns wrong results because of the nature of text search,

i.e. <date>1 Jan 2007</date>
<type>review</type>

<date>2 Jan 2007</date>
<type>revision</type>

If I search for 2 Jan 2007 and <date>1 Jan 2007</date> I will get a hit
(which is wrong) because there is no grouping of fields to associate date
and type as one unit. If I merge them into one entity, then I can't use
range queries on the date.

Option b) This would result in a large number of documents, and even if I
index only and do not store, I still have to deal with duplicate hits,
because all I want is the primary document.


Is there a better approach to the problem?
  


Are you concerned about the size of your index?

One of the difficulties that you're going to find with multi-valued 
fields is that they are an unordered collection without relation. If you 
have a document with a list of editors and revisions, the two fields 
have no inherent correlation unless your application can extract it from 
the data itself.


[doc]
   [id]123[/id]
   [str name=name]hello world[/str]
   [array name=editor]
      [str name=editor]Fred[/str]
      [str name=editor]Bob[/str]
   [/array]
   [array name=revisiondate]
      [date name=revisiondate]2006-01-01T00:00:00Z[/date]
      [date name=revisiondate]2006-01-02T00:00:00Z[/date]
   [/array]
[/doc]

If your application can decipher that and do a slice on it showing a 
revision...then brilliant! But if the multi-value fields are out of 
order, that might make a significant difference.
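
A minimal sketch of such a slice in application code (Python), assuming
both fields come back in the order they were indexed and that the i-th
editor belongs to the i-th revision date. That pairing is an assumption
about the data, not something the index records, and the helper name
revisions is hypothetical.

```python
# The example document above, as the application might receive it.
doc = {
    "id": "123",
    "name": "hello world",
    "editor": ["Fred", "Bob"],
    "revisiondate": ["2006-01-01T00:00:00Z", "2006-01-02T00:00:00Z"],
}


def revisions(doc):
    """Pair editors with revision dates by position."""
    editors = doc["editor"]
    dates = doc["revisiondate"]
    if len(editors) != len(dates):
        # Without equal lengths the positional pairing is meaningless.
        raise ValueError("editor and revisiondate must have the same number of values")
    return list(zip(editors, dates))


second = revisions(doc)[1]
```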


I would create a document per revision and take advantage of range 
queries and sorting available at the query level.





Jed
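
A minimal sketch of the document-per-revision layout, in Python and
independent of Solr. It flattens one primary document into one flat
document per revision and then applies a date range filter with sorting,
standing in for what a range query and sort would do at query time. The
field names primary_id and occurrences, the generated ids, and both helper
functions are assumptions made for the example.

```python
from datetime import date


def explode(primary):
    """Return one flat document per occurrence of a primary document."""
    docs = []
    for i, occ in enumerate(primary["occurrences"]):
        docs.append({
            "id": "%s-%d" % (primary["id"], i),  # hypothetical per-revision id
            "primary_id": primary["id"],         # link back to the primary document
            "name": primary["name"],
            "editor": occ["editor"],
            "revisiondate": occ["revisiondate"],
        })
    return docs


def range_query(docs, start, end):
    """Filter by an inclusive date range and sort by revision date."""
    hits = [d for d in docs if start <= d["revisiondate"] <= end]
    return sorted(hits, key=lambda d: d["revisiondate"])


primary = {
    "id": "123",
    "name": "hello world",
    "occurrences": [
        {"editor": "Fred", "revisiondate": date(2006, 1, 1)},
        {"editor": "Bob", "revisiondate": date(2006, 1, 2)},
    ],
}
docs = explode(primary)
hits = range_query(docs, date(2006, 1, 2), date(2006, 1, 31))
```

Each flat document keeps editor and revision date together, so the range
filter cannot match a date from one revision against an editor from
another.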