RE: Modeling openinghours using multipoints

2012-12-10 Thread David Smiley (@MITRE.org)
Maybe it would? I don't completely get your drift.  But you're talking about a 
user writing a bunch of custom code to build, save, and query the bitmap 
whereas working on top of existing functionality seems to me a lot more 
maintainable on the user's part.
~ David



RE: Modeling openinghours using multipoints

2012-12-10 Thread David Smiley (@MITRE.org)
Mikhail,
A join of any nature should be a last resort compared to using a single index 
(when that's possible), especially if there is minimal to no denormalization of 
data.  In this specific case, if the average document had 200 temporal ranges 
to index (100 days out, 2 per day), a join-based solution would put 200+1 
documents in the index for each original document.  That's an explosion of the 
document count by 200x!  Yoyzah!  Obviously what we're discussing here, modeling 
numeric ranges as x-y points, has its limits -- namely that the spatial module 
is currently limited to 2 dimensions.  It's plausible to see it generalized, but 
I don't think it'll scale well beyond 4-5 dimensions.  I recall a research paper 
talking about multi-dimensional numeric indexes seriously breaking down at about 6.
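
The document-count arithmetic above can be sketched as follows. This is only an illustration: the 50,000 figure is taken from the question asked elsewhere in this thread, and the function name is hypothetical.

```python
def index_size(parents: int, ranges_per_parent: int, join_based: bool) -> int:
    """Documents in the index: a join stores one child per range plus the parent."""
    if join_based:
        return parents * (ranges_per_parent + 1)
    # Multi-valued point field: every range lives inside the parent document.
    return parents

print(index_size(50_000, 200, join_based=True))   # 10050000
print(index_size(50_000, 200, join_based=False))  # 50000
```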

~ David



Re: Modeling openinghours using multipoints

2012-12-10 Thread Lance Norskog
Bit maps can be done with a separate term for each bit. You search for 
all of the terms in the bit range you want.
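
A minimal in-memory sketch of the term-per-bit idea, assuming hourly quantization. The term format (h0009 and so on) and the function names are hypothetical; in Solr the two checks would correspond to an AND or an OR over term queries on one multi-valued field.

```python
def hour_terms(open_hour: int, close_hour: int) -> set:
    """One term per hour in the half-open range [open_hour, close_hour)."""
    return {f"h{h:04d}" for h in range(open_hour, close_hour)}

def open_entire(doc_terms: set, start: int, end: int) -> bool:
    """Every hour of the query range must be present (AND of all terms)."""
    return hour_terms(start, end) <= doc_terms

def open_some(doc_terms: set, start: int, end: int) -> bool:
    """At least one hour of the query range is present (OR of all terms)."""
    return bool(hour_terms(start, end) & doc_terms)

# A business open 09:00-12:00 and 13:00-18:00.
doc = hour_terms(9, 12) | hour_terms(13, 18)
print(open_entire(doc, 10, 12))  # True
print(open_entire(doc, 11, 14))  # False: closed from 12:00 to 13:00
print(open_some(doc, 11, 14))    # True
```

Note that the number of terms per document, and per query, grows with the length of the spans, which is the trade-off against the two-value point encoding.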



Re: Modeling openinghours using multipoints

2012-12-09 Thread Erick Erickson
Thanks for the discussion, I've added this to my bag of tricks, way cool!

Erick


On Sat, Dec 8, 2012 at 10:52 PM, britske gbr...@gmail.com wrote:

 Brilliant! Got some great ideas for this. Indeed, all sorts of use cases
 that involve multiple temporal ranges could benefit.

 E.g., another guy on Stack Overflow asked me about this some days ago. He
 wants to model multiple temporary offers per product (free shopping for
 Christmas, 20% discount for Black Friday, etc.). All possible with this
 out of the box. Factor in 'offer category' in x and y as well for some
 extra powerful querying.

 Yup, I'm enthusiastic about it, which I'm sure you can tell :)

 Thanks a lot David,

 Cheers,
 Geert-Jan




Re: Modeling openinghours using multipoints

2012-12-09 Thread Lance Norskog
If these are not raw times, but quantized on-the-hour, would it be
faster to create a bit map of hours and then query across the bit
maps?


Re: Modeling openinghours using multipoints

2012-12-09 Thread Mikhail Khludnev
Colleagues,
What are the benefits of this approach in contrast to block join?

Thanks

Re: Modeling openinghours using multipoints

2012-12-08 Thread David Smiley (@MITRE.org)
Hello again Geert-Jan!

What you're trying to do is indeed possible with Solr 4 out of the box. 
Other terminology people use for this is multi-value time duration.  This
creative solution is a pure application of spatial without the geospatial
notion -- we're not using an earth or other sphere model -- it's a flat
plane.  So no need to make reference to longitude & latitude, it's x & y.

I would put opening time into x, and closing time into y.  To express a
point, use "x y" (x space y), and supply this as a string to your
SpatialRecursivePrefixTreeFieldType-based field for indexing.  You can give
it multiple values and it will work correctly; this is one of RPT's main
features that set it apart from Solr 3 spatial.  To query for a business
that is open during at least some part of a given time duration, say 6-8
o'clock, the query would look like openDuration:Intersects(minX minY maxX
maxY), with 0 for minX (always), 6 for minY (start time), 8 for maxX (end
time), and the largest possible value for maxY.  You wouldn't actually use 6
& 8; you'd use the number of 15-minute intervals since your epoch for this
equivalent time span.
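
A rough sketch of the encoding just described, not code from the thread. The epoch date, the function names, and the maxTime value of 9600 are assumptions for illustration; the query string follows the form given in the email, and depending on the Solr version the Intersects(...) part may need to be wrapped in double quotes.

```python
from datetime import datetime, timezone

# Hypothetical epoch: the earliest instant the index has to represent.
EPOCH = datetime(2012, 12, 10, tzinfo=timezone.utc)
SLOT_SECONDS = 15 * 60  # one slot is 15 minutes

def to_slot(moment: datetime) -> int:
    """Whole 15-minute intervals elapsed between EPOCH and moment."""
    return int((moment - EPOCH).total_seconds()) // SLOT_SECONDS

def index_point(opens: datetime, closes: datetime) -> str:
    """Field value for one span: 'x y' with x = opening and y = closing slot."""
    return f"{to_slot(opens)} {to_slot(closes)}"

def intersects_query(field: str, start: datetime, end: datetime, max_time: int) -> str:
    """Rectangle 'minX minY maxX maxY' filled in as '0 start end maxTime'."""
    return f"{field}:Intersects(0 {to_slot(start)} {to_slot(end)} {max_time})"

def at(hour: int, minute: int = 0) -> datetime:
    """A time of day on the epoch date (helper for the example only)."""
    return datetime(2012, 12, 10, hour, minute, tzinfo=timezone.utc)

print(index_point(at(9), at(17, 30)))                         # 36 70
print(intersects_query("openDuration", at(6), at(8), 9600))   # openDuration:Intersects(0 24 32 9600)
```

A multi-valued field would receive one such "x y" string per open/close span of the business.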

You'll need to configure the field correctly: geo="false", worldBounds="0 0
maxTime maxTime" (substituting an appropriate value for maxTime based on your
unit of time, i.e. the number of 15-minute intervals you need), and
distErrPct="0" (full precision).
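
As a small worked example of choosing maxTime: the 100-day horizon below is an assumption, picked because it yields the 9600 discrete values mentioned elsewhere in this thread. The function names are hypothetical.

```python
SLOTS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute slots in a day

def max_time(days_ahead: int) -> int:
    """Largest slot number needed to cover days_ahead days from the epoch."""
    return days_ahead * SLOTS_PER_DAY

def world_bounds(days_ahead: int) -> str:
    """Value for the worldBounds attribute: 'minX minY maxX maxY'."""
    m = max_time(days_ahead)
    return f"0 0 {m} {m}"

print(max_time(100))      # 9600
print(world_bounds(100))  # 0 0 9600 9600
```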

Let me know how this works for you.

~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-tp4025336p4025359.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Modeling openinghours using multipoints

2012-12-08 Thread David Smiley (@MITRE.org)
britske wrote
 That's seriously awesome!
 
 Some change in the query though:
 You described: "To query for a business that is open during at least some
 part of a given time duration".
 I want to query for a business that is open during at least the entire
 given time duration.
 
 Feels like a small difference but probably isn't (I'm still wrapping my
 head around the intersect query, I must admit).

So this would be a slightly different rectangle query.  Interestingly, you
simply swap the location in the rectangle where you put the start and end
time.  In summary:

Indexed span CONTAINS query span:
minX minY maxX maxY -> 0 end start *

Indexed span INTERSECTS (i.e. OVERLAPS) query span:
minX minY maxX maxY -> 0 start end *

Indexed span WITHIN query span:
minX minY maxX maxY -> start 0 * end

I'm using '*' here to denote the max possible value.  At some point I may
add that as a feature.
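
The three rectangles above can be checked with a small Python sketch.  It
models the indexed span as the point (opens, closes) and tests plain
point-in-rectangle membership; it does not call Solr.  MAX_TIME stands in for
'*', the function names are hypothetical, and treating the rectangle edges as
inclusive is an assumption about boundary behaviour.

```python
MAX_TIME = 9600  # stands in for '*', the largest value on either axis


def query_rectangle(predicate: str, start: int, end: int):
    """Return (minX, minY, maxX, maxY) for the given predicate."""
    if predicate == "contains":      # indexed span contains query span
        return (0, end, start, MAX_TIME)
    if predicate == "intersects":    # indexed span overlaps query span
        return (0, start, end, MAX_TIME)
    if predicate == "within":        # indexed span lies within query span
        return (start, 0, MAX_TIME, end)
    raise ValueError(f"unknown predicate: {predicate}")


def point_in_rectangle(x: int, y: int, rect) -> bool:
    """Inclusive point-in-rectangle test on a flat plane."""
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


def matches(predicate: str, opens: int, closes: int, start: int, end: int) -> bool:
    """True if the indexed point (opens, closes) falls inside the rectangle."""
    rect = query_rectangle(predicate, start, end)
    return point_in_rectangle(opens, closes, rect)
```

For example, a span open from slot 20 to slot 40 contains and intersects the
query span 24-32 but is not within it: the point (20, 40) has x <= 24 and
y >= 32, which is exactly the CONTAINS rectangle.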

That was a fun exercise!  I give you credit for prodding me in this direction,
as I'm not sure this use of spatial would have occurred to me otherwise.


britske wrote
 Moreover, any indication on performance? Should, say, 50,000 docs with
 about 100-200 points each (1 or 2 open-close spans per day) be ok? (I know
 'your mileage may vary' etc., but just a guesstimate :)

You should have absolutely no problem.  The real clincher in your favor is
the fact that you only need 9600 discrete time values (so you said), not
Long.MAX_VALUE.  Using Long.MAX_VALUE would simply not be possible with the
current implementation because it uses doubles, which have a 52-bit
mantissa, not the 64 bits that would be required to be a complete substitute
for any time/date.  Even given the 52 bits, a quad SpatialPrefixTree with
maxLevels=52 would probably not perform well or might fail; not sure.
Eventually, when I have time to work on an implementation that can be based
on a configurable number of grid cells (not unlike how you can configure
precisionStep on the Trie numeric fields), 52 should be no problem.
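
The precision limit can be illustrated with a short Python sketch (Python
floats are 64-bit doubles).  The function name survives_double is
hypothetical.  With the 52 stored mantissa bits plus the implicit leading
bit, every integer up to 2**53 is exact; beyond that, gaps appear.

```python
LONG_MAX = 2**63 - 1   # the value of Java's Long.MAX_VALUE


def survives_double(value: int) -> bool:
    """True if the integer is unchanged after conversion to a double."""
    return int(float(value)) == value


small_axis_ok = survives_double(9600)        # well inside the exact range
edge_ok = survives_double(2**53)             # last point of the exact range
past_edge_ok = survives_double(2**53 + 1)    # rounds to a neighbouring value
long_max_ok = survives_double(LONG_MAX)      # rounds up to 2**63
```

A time axis of a few thousand 15-minute slots is therefore nowhere near the
limit, while arbitrary long timestamps would lose their low-order bits.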

I'll have to remember to refer back to this email on the approach if I
create a field type that wraps this functionality.

~ David


britske wrote
 Again, this looks good!
 Geert-Jan
 

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-tp4025336p4025434.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Modeling openinghours using multipoints

2012-12-08 Thread britske
Brilliant! Got some great ideas for this. Indeed, all sorts of use cases which
use multiple temporal ranges could benefit.

E.g.: another guy on Stack Overflow asked me about this some days ago. He wants
to model multiple temporary offers per product (free shopping for Christmas,
20% discount for Black Friday, etc.). All possible with this out of the box.
Factor in 'offer category' in x and y as well for some extra powerful
querying.

Yup, I'm enthusiastic about it, which I'm sure you can tell :)

Thanks a lot David,

Cheers,
Geert-Jan 



Sent from my iPhone


--
View this message in context: