Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-10 Thread Uri Boness
All work and progress on this patch is done under the JIRA issue: 
https://issues.apache.org/jira/browse/SOLR-236



R. Tan wrote:

The patch which will be committed soon will add this functionality.




Where can I follow the progress of this patch?



Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-10 Thread Uri Boness

The current patch definitely supports faceting both before and after collapsing.

Stephen Weiss wrote:
I just noticed this and it reminded me of an issue I've had with 
collapsed faceting with an older version of the patch in Solr 1.3.  
Would it be possible, if we can get the terms for all the collapsed 
documents on a field, to then facet each collapsed document on the 
unique terms it has collectively?  What I mean is for example:


Doc 1, 2, 3 collapse together on some other field

Doc 1 is the main document and has the colors blue and red
Doc 2 has red
Doc 3 has green

For the purposes of faceting, it would be ideal in our case for 
faceting on color to count one each for blue, red, and green on this 
document (the user drills down on this value to yet another collapsed 
set).  Right now, when you facet after collapse you just get blue and 
red (green is dropped because it collapses out).  To the user it makes 
the counts seem inaccurate, like they're missing something.  Instead 
we facet before collapsing and get an inflated value (which ticks 2 
for red - but when you drill down, you still only get 1 because Doc 1 
and Doc 2 collapse together again).  Either way it's not ideal.


At the time (many months ago) there was no way to account for this but 
it sounds like this patch could make it possible, maybe.


Thanks!

--
Steve






Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-09 Thread R. Tan

 The patch which will be committed soon will add this functionality.


Where can I follow the progress of this patch?



Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-09 Thread Stephen Weiss
I just noticed this and it reminded me of an issue I've had with  
collapsed faceting with an older version of the patch in Solr 1.3.   
Would it be possible, if we can get the terms for all the collapsed  
documents on a field, to then facet each collapsed document on the  
unique terms it has collectively?  What I mean is for example:


Doc 1, 2, 3 collapse together on some other field

Doc 1 is the main document and has the colors blue and red
Doc 2 has red
Doc 3 has green

For the purposes of faceting, it would be ideal in our case for  
faceting on color to count one each for blue, red, and green on this  
document (the user drills down on this value to yet another collapsed  
set).  Right now, when you facet after collapse you just get blue and  
red (green is dropped because it collapses out).  To the user it makes  
the counts seem inaccurate, like they're missing something.  Instead  
we facet before collapsing and get an inflated value (which ticks 2  
for red - but when you drill down, you still only get 1 because Doc 1  
and Doc 2 collapse together again).  Either way it's not ideal.
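
To make the three counting strategies concrete, here is a minimal Python sketch of the example above. It is plain Python over in-memory dicts, not Solr code and not how the patch is implemented; the field names `group` and `colors` and all function names are made up for illustration, and the first document of each group is assumed to be the main (master) document.

```python
from collections import Counter

# The example from the message: three documents that collapse into one group.
# Assumption: documents arrive in ranked order, so the first one seen in a
# group is treated as the main (master) document.
docs = [
    {"id": 1, "group": "g1", "colors": ["blue", "red"]},  # main document
    {"id": 2, "group": "g1", "colors": ["red"]},
    {"id": 3, "group": "g1", "colors": ["green"]},
]


def facet_before_collapse(docs):
    """Count every document, collapsed or not (inflates shared terms)."""
    return Counter(color for doc in docs for color in doc["colors"])


def facet_after_collapse(docs):
    """Count only the main document of each group (drops hidden terms)."""
    masters = {}
    for doc in docs:
        masters.setdefault(doc["group"], doc)
    return Counter(c for doc in masters.values() for c in doc["colors"])


def facet_group_union(docs):
    """Count each distinct term once per group (the behaviour asked for)."""
    terms_per_group = {}
    for doc in docs:
        terms_per_group.setdefault(doc["group"], set()).update(doc["colors"])
    return Counter(c for terms in terms_per_group.values() for c in terms)
```

With the three documents above, counting before collapsing gives red a count of 2, counting after collapsing loses green, and the per-group union counts blue, red and green once each.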


At the time (many months ago) there was no way to account for this but  
it sounds like this patch could make it possible, maybe.


Thanks!

--
Steve

On Sep 5, 2009, at 5:57 AM, Uri Boness wrote:

There's work being done on the patch right now that will let you ask
for specific field values of the collapsed documents using a dedicated
request parameter. This work is not yet part of the latest patch, but
it will be very soon. There is of course a drawback as well: the set
of collapsed documents can be very large (depending on your data, of
course), in which case the returned result, which includes the field
values, can be rather large too, and that will hurt performance. This
is why the feature will be enabled only if you specify the extra
parameter. By default no field values will be returned.


AFAIK, the latest patch should work fine with the latest build.
Martijn (who is the main maintainer of this patch) tries to keep it
up to date with the latest builds. But I guess the safest way is to
work with the nightly build from the same date as the latest patch
(though I would first give it a try with the latest build).


BTW, this is not an official suggestion from the Solr development
team, but if you ask me, if you have to choose now between 1.3 and
1.4-dev, I would go for the latter. 1.4 is supposed to be released in
the upcoming week or two and it brings loads of bug fixes,
enhancements and extra functionality. But again, this is my personal
suggestion.


cheers,
Uri




Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-07 Thread Uri Boness


Great. Nice site and very similar to my requirements.

thanks.


So, right now, you get all field values by default?

Right now, no field values are returned for the collapsed documents. The
patch which will be committed soon will add this functionality.



Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread Uri Boness
There's work being done on the patch right now that will let you ask
for specific field values of the collapsed documents using a dedicated
request parameter. This work is not yet part of the latest patch, but it
will be very soon. There is of course a drawback as well: the set of
collapsed documents can be very large (depending on your data, of
course), in which case the returned result, which includes the field
values, can be rather large too, and that will hurt performance. This is
why the feature will be enabled only if you specify the extra parameter.
By default no field values will be returned.
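
As a rough illustration of that opt-in behaviour, here is a small Python sketch. It is not the patch's code, and the thread does not name the actual request parameter, so `collapsed_fields` and `build_collapse_info` are hypothetical names. The point is only that the per-group payload stays small unless the caller explicitly asks for field values.

```python
def build_collapse_info(master, collapsed_docs, collapsed_fields=None):
    """Describe one collapsed group in a response.

    `collapsed_fields` stands in for the dedicated request parameter
    (hypothetical name). When it is missing or empty, only a count is
    returned and no field values of the collapsed documents are included.
    """
    info = {
        "master_id": master["id"],
        "collapse_count": len(collapsed_docs),
    }
    if collapsed_fields:
        # Only the requested fields are copied, which bounds the response
        # size per document but not the number of collapsed documents.
        info["collapsed_docs"] = [
            {field: doc[field] for field in collapsed_fields if field in doc}
            for doc in collapsed_docs
        ]
    return info


master = {"id": 1, "name": "7-eleven Main St", "state": "NY"}
collapsed = [
    {"id": 2, "name": "7-eleven 5th Ave", "state": "NY"},
    {"id": 3, "name": "7-eleven Elm St", "state": "NJ"},
]

default_info = build_collapse_info(master, collapsed)
with_names = build_collapse_info(master, collapsed, ["name"])
```

Here `default_info` carries only the master id and the count, while `with_names` additionally lists the `name` of every collapsed document, which is the part that can grow large.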


AFAIK, the latest patch should work fine with the latest build. Martijn
(who is the main maintainer of this patch) tries to keep it up to date
with the latest builds. But I guess the safest way is to work with the
nightly build from the same date as the latest patch (though I would
first give it a try with the latest build).


BTW, this is not an official suggestion from the Solr development team,
but if you ask me, if you have to choose now between 1.3 and 1.4-dev,
I would go for the latter. 1.4 is supposed to be released in the upcoming
week or two and it brings loads of bug fixes, enhancements and extra
functionality. But again, this is my personal suggestion.


cheers,
Uri

R. Tan wrote:

Okay. Thanks for the insight into how it works in general. Without
having tried it myself: are the field values of the collapsed documents
also part of the results data?
What is the latest build that is safe to use in a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:

  

The collapsed documents are represented by one master document, which can
be part of the normal search result (the doc list), so pagination just works
as expected, meaning it takes only the returned documents into account
(ignoring the collapsed ones). As for the scoring, the master document is
actually the document with the highest score in the collapsed group.
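
A minimal Python sketch of that behaviour, under the assumption that collapsing happens over an in-memory list of scored hits. The names `collapse_by_field` and `page` and the sample data are illustrative only and do not come from the patch.

```python
def collapse_by_field(results, field):
    """Keep the highest-scoring document per field value as the master.

    Returns the masters sorted by score (descending) and, per field value,
    the number of documents hidden behind the master.
    """
    best = {}
    hidden = {}
    for doc in results:
        key = doc[field]
        if key not in best:
            best[key] = doc
            hidden[key] = 0
        else:
            hidden[key] += 1
            if doc["score"] > best[key]["score"]:
                best[key] = doc
    masters = sorted(best.values(), key=lambda d: d["score"], reverse=True)
    return masters, hidden


def page(masters, start, rows):
    """Paginate over master documents only; collapsed ones are ignored."""
    return masters[start:start + rows]


results = [
    {"id": "a1", "business": "7-eleven", "score": 2.0},
    {"id": "b1", "business": "bank", "score": 1.5},
    {"id": "a2", "business": "7-eleven", "score": 3.0},
    {"id": "c1", "business": "cafe", "score": 1.0},
]

masters, hidden = collapse_by_field(results, "business")
first_page = page(masters, 0, 2)
second_page = page(masters, 2, 2)
```

Because `start` and `rows` index into the list of masters, a page of two rows always shows two distinct businesses, however many locations were collapsed behind each of them.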

As for Solr 1.3 compatibility... well... it's very hard to tell. The latest
patches are certainly *not* 1.3 compatible (I think they also depend on
some changes in Lucene which are not available for Solr 1.3). I guess you'll
have to try some of the old patches, but I'm not sure about their stability.

cheers,
Uri


R. Tan wrote:



Thanks Uri. How do paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:



  

Development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing: adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each
other in the otherwise-non-collapsed result set. The latter (non-adjacent)
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There's currently a discussion
about extending this support so that, in addition to collapsing the
documents, extra information will be returned for the collapsed documents
(see the discussion on the issue page).
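
The difference between the two flavors can be sketched in a few lines of Python. This is only an illustration over an already-ranked list of hits, not the patch's implementation, and the function names and sample data are made up.

```python
from itertools import groupby


def collapse_adjacent(ranked, field):
    """Collapse only runs of neighbouring documents with the same value."""
    return [next(run) for _, run in groupby(ranked, key=lambda d: d[field])]


def collapse_non_adjacent(ranked, field):
    """Collapse all documents with the same value, wherever they appear."""
    seen = set()
    kept = []
    for doc in ranked:
        if doc[field] not in seen:
            seen.add(doc[field])
            kept.append(doc)
    return kept


# Hits in the order the uncollapsed result set would return them.
ranked = [
    {"id": "A1", "business": "a"},
    {"id": "A2", "business": "a"},
    {"id": "B1", "business": "b"},
    {"id": "A3", "business": "a"},
    {"id": "B2", "business": "b"},
]

adjacent_ids = [d["id"] for d in collapse_adjacent(ranked, "business")]
non_adjacent_ids = [d["id"] for d in collapse_non_adjacent(ranked, "business")]
```

In this sample, adjacent collapsing folds A2 into A1 but leaves A3 and B2 visible because they are separated from their groups, whereas non-adjacent collapsing leaves exactly one document per business.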

Uri


R. Tan wrote:





I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:





  

Hi Solrers,
I would like to get your opinion on how best to approach a search
requirement that I have. The scenario is that I have a set of business
listings that may be grouped under one parent business (such as 7-eleven
having several locations). On the results page, I only want 7-eleven to
show up once, but also show how many locations matched the query (facet
filtered by state, for example) and maybe a preview of some of the
locations.

Searching for the business name is straightforward, but the locations
within a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but it will still basically be
the same thing and repeat results with the same business name.

Any advice?

Thanks,
R








  
  


  


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread R. Tan
Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
your advice. We're still early in development and 1.4 would be a good
choice. I hope I can get field collapsing to work with my requirements. Do
you know any live site using field collapsing already?


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread Uri Boness
You can check out http://www.ilocal.nl. If you search for a bank in 
Amsterdam then you'll see that a lot of the results are collapsed. For 
this we used an older version of this patch (which works on 1.3) but a 
lot has changed since then. We're currently using this patch on another 
project, but it's not live yet.


Uri

R. Tan wrote:

Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
your advice. We're still early in development and 1.4 would be a good
choice. I hope I can get field collapsing to work with my requirements. Do
you know any live site using field collapsing already?

On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness ubon...@gmail.com wrote:

  

There's work on the patch that is being done now which will enable you to
ask for specific field values of the collapsed documents using a dedicated
request parameter. This work is not committed yet to the latest patch, but
will be very soon. There is of course a drawback to that as well, the
collapsed documents set can be very large (depends on your data of course)
in which case the returned result which includes the fields values can be
rather large, which will impact performance, this is why this feature will
be enabled only if you specify this extra parameter - by default no field
values will be returned.

AFAIK, the latest patch should work fine with the latest build. Martijn
(who is the main maintainer of this patch) tries to keep it up to date
with the latest builds. But I guess the safest way is to work with the
nightly build of the same date as the latest patch (though I would give it a
try first with the latest build).

BTW, it's not an official suggestion from the Solr development team, but if
you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would
go for the latter. 1.4 is supposed to be released in the upcoming week or two
and it brings loads of bug fixes, enhancements and extra functionality. But
again, this is my personal suggestion.


cheers,
Uri

R. Tan wrote:



Okay. Thanks for giving an insight on how it works in general. Without
trying it myself, are the field values for the collapsed ones also part of
the results data?
What is the latest build that is safe to use on a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:



  

The collapsed documents are represented by one master document which can
be part of the normal search result (the doc list), so pagination just works
as expected, meaning taking only the returned documents into account (ignoring
the collapsed ones). As for the scoring, the master document is actually
the document with the highest score in the collapsed group.

As for Solr 1.3 compatibility... well... it's very hard to tell. All the latest
patches are certainly *not* 1.3 compatible (I think they also depend on
some changes in Lucene which are not available for Solr 1.3). I guess you'll
have to try some of the old patches, but I'm not sure about their stability.

cheers,
Uri


R. Tan wrote:





Thanks Uri. How do paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:





  

Development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing, adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each other
in the otherwise-non-collapsed result set. The latter (non-adjacent) one
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There's currently discussion
about extending this support so that, in addition to collapsing the documents,
extra information will be returned for the collapsed documents (see the
discussion on the issue page).

Uri


R. Tan wrote:







I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:







  

Hi Solrers,
I would like to get your opinion on how to best approach a search
requirement that I have. The scenario is I have a set of business listings
that may be grouped into one parent business (such as 7-eleven having several
locations). On the results page, I only want 7-eleven to show up once but
also show how many locations matched the query (facet filtered by state, for
example) and maybe a preview of some of the locations.

Searching for the business name is straightforward but the locations within
a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but

Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread R. Tan
Great. Nice site and very similar to my requirements.

 There's work on the patch that is being done now which will enable you to
 ask for specific field values of the collapsed documents using a dedicated
 request parameter.


So, right now, you get all field values by default?


On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness ubon...@gmail.com wrote:

 You can check out http://www.ilocal.nl. If you search for a bank in
 Amsterdam then you'll see that a lot of the results are collapsed. For this
 we used an older version of this patch (which works on 1.3) but a lot has
 changed since then. We're currently using this patch on another project, but
 it's not live yet.


 Uri

 R. Tan wrote:

 Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
 your advice. We're still early in development and 1.4 would be a good
 choice. I hope I can get field collapsing to work with my requirements. Do
 you know any live site using field collapsing already?

 On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness ubon...@gmail.com wrote:



 There's work being done on the patch now which will enable you to
 ask for specific field values of the collapsed documents using a dedicated
 request parameter. This work is not committed yet to the latest patch, but
 will be very soon. There is of course a drawback to that as well: the
 set of collapsed documents can be very large (depending on your data, of
 course), in which case the returned result, which includes the field values,
 can be rather large and will impact performance. This is why this feature
 will be enabled only if you specify this extra parameter; by default no
 field values will be returned.

 AFAIK, the latest patch should work fine with the latest build. Martijn
 (who is the main maintainer of this patch) tries to keep it up to date
 with the latest builds. But I guess the safest way is to work with the
 nightly build of the same date as the latest patch (though I would give it a
 try first with the latest build).

 BTW, it's not an official suggestion from the Solr development team, but if
 you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would
 go for the latter. 1.4 is supposed to be released in the upcoming week or two
 and it brings loads of bug fixes, enhancements and extra functionality. But
 again, this is my personal suggestion.


 cheers,
 Uri

 R. Tan wrote:



 Okay. Thanks for giving an insight on how it works in general. Without
 trying it myself, are the field values for the collapsed ones also part of
 the results data?
 What is the latest build that is safe to use on a production environment?
 I'd probably go for that and use field collapsing.

 Thank you very much.


 On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:





 The collapsed documents are represented by one master document which can
 be part of the normal search result (the doc list), so pagination just works
 as expected, meaning taking only the returned documents into account (ignoring
 the collapsed ones). As for the scoring, the master document is actually
 the document with the highest score in the collapsed group.

 As for Solr 1.3 compatibility... well... it's very hard to tell. All the latest
 patches are certainly *not* 1.3 compatible (I think they also depend on
 some changes in Lucene which are not available for Solr 1.3). I guess you'll
 have to try some of the old patches, but I'm not sure about their stability.

 cheers,
 Uri


 R. Tan wrote:





 Thanks Uri. How do paging and scoring work when using field collapsing?
 What patch works with 1.3? Is it production ready?

 R


 On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:







 Development on this patch is quite active. It works well for a single
 Solr instance, but distributed search (i.e. shards) is not yet supported.
 Using this patch you can group search results based on a specific field.
 There are two flavors of field collapsing, adjacent and non-adjacent. The
 former collapses only documents which happen to be located next to each other
 in the otherwise-non-collapsed result set. The latter (non-adjacent) one
 collapses all documents with the same field value (regardless of their
 position in the otherwise-non-collapsed result set). Note that
 non-adjacent performs better than adjacent. There's currently discussion
 about extending this support so that, in addition to collapsing the documents,
 extra information will be returned for the collapsed documents (see the
 discussion on the issue page).

 Uri


 R. Tan wrote:







 I think this is what I'm looking for. What is the status of this patch?

 On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:









 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business listings
 that may be grouped into one parent business (such as 7-eleven having
 

Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-04 Thread R. Tan
Okay. Thanks for giving an insight on how it works in general. Without
trying it myself, are the field values for the collapsed ones also part of
the results data?
What is the latest build that is safe to use on a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:

 The collapsed documents are represented by one master document which can
 be part of the normal search result (the doc list), so pagination just works
 as expected, meaning taking only the returned documents into account (ignoring
 the collapsed ones). As for the scoring, the master document is actually
 the document with the highest score in the collapsed group.

 As for Solr 1.3 compatibility... well... it's very hard to tell. All the latest
 patches are certainly *not* 1.3 compatible (I think they also depend on
 some changes in Lucene which are not available for Solr 1.3). I guess you'll
 have to try some of the old patches, but I'm not sure about their stability.

 cheers,
 Uri


 R. Tan wrote:

 Thanks Uri. How do paging and scoring work when using field collapsing?
 What patch works with 1.3? Is it production ready?

 R


 On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:



 Development on this patch is quite active. It works well for a single
 Solr instance, but distributed search (i.e. shards) is not yet supported.
 Using this patch you can group search results based on a specific field.
 There are two flavors of field collapsing, adjacent and non-adjacent. The
 former collapses only documents which happen to be located next to each other
 in the otherwise-non-collapsed result set. The latter (non-adjacent) one
 collapses all documents with the same field value (regardless of their
 position in the otherwise-non-collapsed result set). Note that
 non-adjacent performs better than adjacent. There's currently discussion
 about extending this support so that, in addition to collapsing the documents,
 extra information will be returned for the collapsed documents (see the
 discussion on the issue page).

 Uri


 R. Tan wrote:



 I think this is what I'm looking for. What is the status of this patch?

 On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:





 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business listings
 that may be grouped into one parent business (such as 7-eleven having several
 locations). On the results page, I only want 7-eleven to show up once but
 also show how many locations matched the query (facet filtered by state, for
 example) and maybe a preview of some of the locations.

 Searching for the business name is straightforward but the locations within
 a result are quite tricky. I can do the opposite, searching for the
 locations and faceting on business names, but it will still basically be the
 same thing and repeat results with the same business name.

 Any advice?

 Thanks,
 R














Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-04 Thread R. Tan
Anybody using it on a public site? Would love to see some live examples.

On Sat, Sep 5, 2009 at 12:50 AM, R. Tan tanrihae...@gmail.com wrote:

 Okay. Thanks for giving an insight on how it works in general. Without
 trying it myself, are the field values for the collapsed ones also part of
 the results data?
 What is the latest build that is safe to use on a production environment?
 I'd probably go for that and use field collapsing.

 Thank you very much.


 On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:

 The collapsed documents are represented by one master document which can
 be part of the normal search result (the doc list), so pagination just works
 as expected, meaning taking only the returned documents into account (ignoring
 the collapsed ones). As for the scoring, the master document is actually
 the document with the highest score in the collapsed group.

 As for Solr 1.3 compatibility... well... it's very hard to tell. All the
 latest patches are certainly *not* 1.3 compatible (I think they also
 depend on some changes in Lucene which are not available for Solr 1.3). I
 guess you'll have to try some of the old patches, but I'm not sure about
 their stability.

 cheers,
 Uri


 R. Tan wrote:

 Thanks Uri. How do paging and scoring work when using field collapsing?
 What patch works with 1.3? Is it production ready?

 R


 On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:



 Development on this patch is quite active. It works well for a single
 Solr instance, but distributed search (i.e. shards) is not yet supported.
 Using this patch you can group search results based on a specific field.
 There are two flavors of field collapsing, adjacent and non-adjacent. The
 former collapses only documents which happen to be located next to each other
 in the otherwise-non-collapsed result set. The latter (non-adjacent) one
 collapses all documents with the same field value (regardless of their
 position in the otherwise-non-collapsed result set). Note that
 non-adjacent performs better than adjacent. There's currently discussion
 about extending this support so that, in addition to collapsing the documents,
 extra information will be returned for the collapsed documents (see the
 discussion on the issue page).

 Uri


 R. Tan wrote:



 I think this is what I'm looking for. What is the status of this patch?

 On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:





 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business listings
 that may be grouped into one parent business (such as 7-eleven having several
 locations). On the results page, I only want 7-eleven to show up once but
 also show how many locations matched the query (facet filtered by state, for
 example) and maybe a preview of some of the locations.

 Searching for the business name is straightforward but the locations within
 a result are quite tricky. I can do the opposite, searching for the
 locations and faceting on business names, but it will still basically be the
 same thing and repeat results with the same business name.

 Any advice?

 Thanks,
 R















Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread Uri Boness
Development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing, adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each other
in the otherwise-non-collapsed result set. The latter (non-adjacent) one
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There's currently discussion
about extending this support so that, in addition to collapsing the documents,
extra information will be returned for the collapsed documents (see the
discussion on the issue page).
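
To make the difference between the two flavors concrete, here is a minimal
Python sketch. It is not code from the patch: the function names
(`collapse_adjacent`, `collapse_non_adjacent`), the `collapsed_count` key and
the sample documents are made up for illustration, and it says nothing about
how either mode performs inside Solr.

```python
from itertools import groupby


def collapse_adjacent(docs, field):
    """Collapse only runs of neighbouring docs that share the same field value."""
    collapsed = []
    for _, run in groupby(docs, key=lambda d: d[field]):
        run = list(run)
        # Keep the first doc of each run; the rest of the run is collapsed.
        collapsed.append({"doc": run[0], "collapsed_count": len(run) - 1})
    return collapsed


def collapse_non_adjacent(docs, field):
    """Collapse every doc sharing a field value, wherever it sits in the list."""
    groups = {}  # dicts keep insertion order, so result order is preserved
    for doc in docs:
        value = doc[field]
        if value in groups:
            groups[value]["collapsed_count"] += 1
        else:
            groups[value] = {"doc": doc, "collapsed_count": 0}
    return list(groups.values())


# Hypothetical result list, already in ranking order.
results = [
    {"id": 1, "business": "7-eleven"},
    {"id": 2, "business": "7-eleven"},
    {"id": 3, "business": "bank"},
    {"id": 4, "business": "7-eleven"},
]
adjacent = collapse_adjacent(results, "business")          # keeps ids 1, 3, 4
non_adjacent = collapse_non_adjacent(results, "business")  # keeps ids 1, 3
```

In the adjacent case document 4 survives because a different business sits
between it and the first run; in the non-adjacent case it is folded into the
first 7-eleven entry.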


Uri

R. Tan wrote:

I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:

  

Hi Solrers,
I would like to get your opinion on how to best approach a search
requirement that I have. The scenario is I have a set of business listings
that may be grouped into one parent business (such as 7-eleven having several
locations). On the results page, I only want 7-eleven to show up once but
also show how many locations matched the query (facet filtered by state, for
example) and maybe a preview of some of the locations.

Searching for the business name is straightforward but the locations within
a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but it will still basically be the
same thing and repeat results with the same business name.

Any advice?

Thanks,
R




  


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread R. Tan
Thanks Uri. How do paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:

 Development on this patch is quite active. It works well for a single
 Solr instance, but distributed search (i.e. shards) is not yet supported.
 Using this patch you can group search results based on a specific field.
 There are two flavors of field collapsing, adjacent and non-adjacent. The
 former collapses only documents which happen to be located next to each other
 in the otherwise-non-collapsed result set. The latter (non-adjacent) one
 collapses all documents with the same field value (regardless of their
 position in the otherwise-non-collapsed result set). Note that
 non-adjacent performs better than adjacent. There's currently discussion
 about extending this support so that, in addition to collapsing the documents,
 extra information will be returned for the collapsed documents (see the
 discussion on the issue page).

 Uri


 R. Tan wrote:

 I think this is what I'm looking for. What is the status of this patch?

 On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:



 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business listings
 that may be grouped into one parent business (such as 7-eleven having several
 locations). On the results page, I only want 7-eleven to show up once but
 also show how many locations matched the query (facet filtered by state, for
 example) and maybe a preview of some of the locations.

 Searching for the business name is straightforward but the locations within
 a result are quite tricky. I can do the opposite, searching for the
 locations and faceting on business names, but it will still basically be the
 same thing and repeat results with the same business name.

 Any advice?

 Thanks,
 R









Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-03 Thread Uri Boness
The collapsed documents are represented by one master document which
can be part of the normal search result (the doc list), so pagination
just works as expected, meaning taking only the returned documents into
account (ignoring the collapsed ones). As for the scoring, the master
document is actually the document with the highest score in the
collapsed group.
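
As a rough illustration of the two points above (master document = highest
score in its group, pagination counts master documents only), here is a small
Python sketch. It is not taken from the patch; `collapse_by_top_score`,
`page` and the sample scores are hypothetical.

```python
def collapse_by_top_score(docs, field):
    """Return one master document per field value: the highest-scoring one."""
    masters = {}
    for doc in docs:
        value = doc[field]
        best = masters.get(value)
        if best is None or doc["score"] > best["score"]:
            masters[value] = doc
    # Order the masters by score, as a normal result list would be.
    return sorted(masters.values(), key=lambda d: d["score"], reverse=True)


def page(masters, start, rows):
    """Paginate over master documents only; collapsed docs take up no slots."""
    return masters[start:start + rows]


# Hypothetical scored matches for one query.
docs = [
    {"id": 1, "business": "a", "score": 0.5},
    {"id": 2, "business": "a", "score": 0.9},
    {"id": 3, "business": "b", "score": 0.7},
    {"id": 4, "business": "c", "score": 0.2},
]
masters = collapse_by_top_score(docs, "business")
first_page = page(masters, start=0, rows=2)
second_page = page(masters, start=2, rows=2)
```

Four matching documents become three masters here, so with two rows per page
the second page holds a single result.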


As for Solr 1.3 compatibility... well... it's very hard to tell. All the
latest patches are certainly *not* 1.3 compatible (I think they also
depend on some changes in Lucene which are not available for Solr
1.3). I guess you'll have to try some of the old patches, but I'm not
sure about their stability.


cheers,
Uri

R. Tan wrote:

Thanks Uri. How do paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:

  

Development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing, adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each other
in the otherwise-non-collapsed result set. The latter (non-adjacent) one
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There's currently discussion
about extending this support so that, in addition to collapsing the documents,
extra information will be returned for the collapsed documents (see the
discussion on the issue page).

Uri


R. Tan wrote:



I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:



  

Hi Solrers,
I would like to get your opinion on how to best approach a search
requirement that I have. The scenario is I have a set of business listings
that may be grouped into one parent business (such as 7-eleven having several
locations). On the results page, I only want 7-eleven to show up once but
also show how many locations matched the query (facet filtered by state, for
example) and maybe a preview of some of the locations.

Searching for the business name is straightforward but the locations within
a result are quite tricky. I can do the opposite, searching for the
locations and faceting on business names, but it will still basically be the
same thing and repeat results with the same business name.

Any advice?

Thanks,
R






  


  


Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-02 Thread R. Tan
I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote:

 Hi Solrers,
 I would like to get your opinion on how to best approach a search
 requirement that I have. The scenario is I have a set of business listings
 that may be grouped into one parent business (such as 7-eleven having several
 locations). On the results page, I only want 7-eleven to show up once but
 also show how many locations matched the query (facet filtered by state, for
 example) and maybe a preview of some of the locations.

 Searching for the business name is straightforward but the locations within
 a result are quite tricky. I can do the opposite, searching for the
 locations and faceting on business names, but it will still basically be the
 same thing and repeat results with the same business name.

 Any advice?

 Thanks,
 R