Re: One item, multiple fields, and range queries

2007-01-13 Thread Jeff Rodenburg

Thanks Yonik.


1) model a single document as a single event at a single place with a start
and end date.

This was my first approach, but at presentation time I need to display the
event once -- with multiple start/end dates and locations beneath it.

Is treating the given event uniqueId as a facet the way to go?

thanks,
jeff
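
One way to reconcile option (1) with the display requirement is to index one document per occurrence and regroup the hits by event id in the application. What follows is a minimal sketch in plain Python rather than Solr code; the field names (`event_id`, `start`, `end`, `lat`, `lon`), the helper functions, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-occurrence documents: one event, one place, one date range each.
DOCS = [
    {"event_id": 123, "start": date(2007, 1, 13), "end": date(2007, 1, 14),
     "lat": 47.61, "lon": -122.33},
    {"event_id": 123, "start": date(2007, 1, 20), "end": date(2007, 1, 21),
     "lat": 45.52, "lon": -122.68},
    {"event_id": 456, "start": date(2007, 1, 13), "end": date(2007, 1, 13),
     "lat": 47.60, "lon": -122.30},
]


def search(docs, day_from, day_to, south, north, west, east):
    """Keep documents whose date range overlaps [day_from, day_to]
    and whose location falls inside the bounding box."""
    return [
        d for d in docs
        if d["start"] <= day_to and d["end"] >= day_from
        and south <= d["lat"] <= north
        and west <= d["lon"] <= east
    ]


def group_by_event(hits):
    """Collect matching occurrences under their shared event id for display."""
    grouped = defaultdict(list)
    for d in hits:
        grouped[d["event_id"]].append(d)
    return dict(grouped)


hits = search(DOCS, date(2007, 1, 13), date(2007, 1, 21),
              south=47.0, north=48.0, west=-123.0, east=-122.0)
events = group_by_event(hits)

for event_id, occurrences in sorted(events.items()):
    print(event_id, [(o["start"], o["lat"], o["lon"]) for o in occurrences])
```

Because every document carries its own dates and coordinates, the range conditions always apply to the same occurrence; the grouping step only affects presentation.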


On 1/12/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 1/12/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
 I'm stuck with a query issue that at present seems unresolvable.  Hoping the
 community has some insight into this.

 My index contains events that have multiple beginning/ending date ranges and
 multiple locations.  For example, event A (uniqueId = 123) occurs every
 weekend, sometimes in one location, sometimes in many locations.  Dates have
 a beginning and ending date, and locations have a latitude and longitude.  I
 need to query for the set of events for a given area, where area =
 bounding box.  So, a single event has multiple beginning and ending dates
 and multiple locations.

 So, the beginning date, ending date, latitude and longitude values only
 apply collectively as a unit.  However, I need to do range queries on both
 the dates and the lat/long values.

1) model a single document as a single event at a single place with a
   start and end date.
  OR
2) use multivalued fields as correlated vectors, so the first start date
   corresponds to the first end date, which corresponds to the first lat
   and long value.  You get them all back in a query though, so your app
   would need to do extra work to sort out which matched.

I'd do (1) if you can... it's simpler.

-Yonik
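
The "extra work" in option (2) can be sketched as follows. This is an illustration in plain Python under stated assumptions, not Solr code: the document layout and the `matching_occurrences` helper are hypothetical. The idea is to zip the parallel multivalued fields back into per-occurrence tuples and re-check the ranges on each tuple, since a range query over independent multivalued fields cannot tell which positions matched.

```python
from datetime import date

# Hypothetical stored document for option (2): position i in every list
# describes the same occurrence of the event.
doc = {
    "uniqueId": 123,
    "start": [date(2007, 1, 13), date(2007, 1, 20)],
    "end":   [date(2007, 1, 14), date(2007, 1, 21)],
    "lat":   [47.61, 45.52],
    "lon":   [-122.33, -122.68],
}


def matching_occurrences(doc, day_from, day_to, south, north, west, east):
    """Re-check the date and bounding-box conditions per occurrence."""
    occurrences = zip(doc["start"], doc["end"], doc["lat"], doc["lon"])
    return [
        (start, end, lat, lon)
        for start, end, lat, lon in occurrences
        if start <= day_to and end >= day_from
        and south <= lat <= north
        and west <= lon <= east
    ]


matches = matching_occurrences(
    doc, date(2007, 1, 13), date(2007, 1, 21),
    south=47.0, north=48.0, west=-123.0, east=-122.0,
)
print(matches)
```

An empty result here means the document was returned only because values from different occurrences happened to satisfy the ranges, so the application would drop it.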



Re: listing/enumerating field information

2007-01-13 Thread J.J. Larrea
At 5:06 AM -0500 1/12/07, Erik Hatcher wrote:
What the user-interface needs is a way to ask Solr for terms that begin with a 
specified prefix, as the user types.   Paging via start/rows is necessary, and 
also sorting by frequency given some specified constraints.  I like the 
start/end term idea also, though I can't think of a scenario in my application 
where this would be different than having a prefix parameter.  If I want all 
the 1860's, prefix=186&field=year, for example.

I also have exactly this requirement: Paging through the terms (and getting the 
document count for each term) optionally limited to those matching a supplied 
prefix (there can be thousands of terms for a prefix so start/rows is 
absolutely necessary even when prefixing). Choosing whether terms were sorted 
by index-order or document-count order would be a plus.

I would love to have this be provided by an extension to the Faceting logic, as 
suggested by Yonik and Hoss, incorporating the non-query pathway raised by Erik:
  - Assemble the list of term/frequency pairs for a field either by tallying 
the term references found in a DocList, or by using the term frequency 
information found in the index (optimization for non-query case)
  - Apply a criterion (RegExp based would obviously be most flexible -- no need 
for full Lucene query syntax -- but prefix-only might be an optimization that 
could be applied in the non-query case) to filter the terms, either during 
assembly or post-facto.
  - Apply the faceting criteria (e.g. facet.zeros, though facet.mincount would 
have been a more flexible option in all cases)
  - Optionally pass through the BoundedTreeSet/PriorityQueue mechanism to sort 
by frequency and in that case optionally keep only the top facet.limit terms
  - Cache the results with the query (including a special key for the non-query 
case) so paging could be done without any requerying, retallying, or resorting
  - Return in the response a subrange of the list
  - Naturally allow the full complement of response encodings
  - (Am I missing anything?)
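
The steps above can be sketched roughly as follows. This is plain Python over an in-memory list of documents, not Solr or Lucene code; the function name `list_terms` and its parameters (`prefix`, `mincount`, `sort_by_count`, `start`, `rows`) are hypothetical, and the caching step is left out.

```python
from collections import Counter


def list_terms(docs, field, prefix="", mincount=1,
               sort_by_count=True, start=0, rows=10):
    """Return one page of (term, document_count) pairs for a field."""
    # Tally: each document counts once per distinct term it contains.
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))

    # Filter by prefix and apply the minimum-count criterion.
    pairs = [(term, n) for term, n in counts.items()
             if term.startswith(prefix) and n >= mincount]

    # Sort by descending count (ties broken by term) or by term order.
    if sort_by_count:
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    else:
        pairs.sort()

    # Page through the result with start/rows.
    return pairs[start:start + rows]


DOCS = [
    {"year": "1860"},
    {"year": "1861"},
    {"year": "1861"},
    {"year": "1870"},
    {"year": ["1865", "1861"]},
]

page = list_terms(DOCS, "year", prefix="186", rows=2)
print(page)  # [('1861', 3), ('1860', 1)]
```

A real implementation would avoid re-tallying on every page, which is where the caching step in the list comes in.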

While a commendable endeavor, this is a fair bit of work, and it may take a 
while before someone (perhaps me even) steps up to the plate, for performance 
if not functional considerations.  So IMHO it would also be worthwhile to craft 
a simpler index-only version.

I would be thrilled if this just magically appeared in Solr's codebase before 
I have a chance to build it. :)

Well, after my current deadline (next week) passes, this functionality is on my 
task list for my next milestone... so I'd be equally elated if I didn't have 
to write it myself. :-)

And adding 2 cents to the other topic in this thread...

As for Hoss's suggestion of a Stats handler - I still hold the opinion that 
all of the admin JSPs really ought to be first class request handlers that go 
through the whole ResponseWriter stuff, so I can get all of that great 
capability in Ruby format instead of XML. 

Agreed in principle, though I'm an XML-person.

As it is, to build a Ruby API to Solr and provide access to the stats, there 
has to be two different parsing mechanisms.  I know he meant index stats, not 
Solr admin stats, but it reminded me of the XML pain I'm going to feel in 
solrb to add Solr stats :)

I am happy to merely be a spectator of the Rubification of SOLR!

Also,

On Jan 11, 2007, at 3:13 PM, Yonik Seeley wrote:
 Attempting to enumerate
all of the values for a field could be dangerous

We do it for faceting :-)  But we don't drag it all into memory at once...

Not entirely true: The FieldCache pathway of faceting single-valued fields does 
just that.  In some cases I've set multivalued=true even when it's not accurate 
in order to force the cached-filter pathway.

- J.J.