Re: Solr: How to index range-pair fields?

2015-08-22 Thread Alexandre Rafalovitch
Sorry Venkat, this is pushing beyond my immediate knowledge. You'd
just need to experiment.

But the document still looks a bit wrong. Specifically, I don't
understand where those extra 366 values are coming from. Each value
should be just a two-dimensional coordinate: the first number for the
start of the range, the second for the end. You seem to have 2 extra
useless ones.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/




Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
I can't find the discussion/presentation about it (from about 2 years
ago), but basically you can use a LatLong geographic field to do this.

You represent the start date/time on the X axis and the end date/time on
the Y axis. Then, for search, you intersect it with a rectangle built
from the dates you want to check.

Hopefully this is enough for you to go on.
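
To make the geometry concrete, here is a minimal Python sketch (plain Python, not Solr code) of how each kind of range check turns into a rectangle over (start, end) points. The names `Rect`, `query_rect`, `matches` and `to_filter`, and the 0..366 default bounds, are illustrative assumptions. The `Intersects(minX minY maxX maxY)` filter string follows the form described on the SpatialForTimeDurations wiki page and should be verified against your Solr version.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    min_x: float  # lowest allowed range start
    min_y: float  # lowest allowed range end
    max_x: float  # highest allowed range start
    max_y: float  # highest allowed range end


def query_rect(q_start, q_end, mode, lo=0, hi=366):
    """Rectangle matching indexed (start, end) points for a query range.

    lo/hi are the world bounds of the field (assumed 0..366 here).
    """
    if mode == "contains":      # indexed range covers the whole query range
        return Rect(lo, q_end, q_start, hi)
    if mode == "intersects":    # indexed range overlaps the query range
        return Rect(lo, q_start, q_end, hi)
    if mode == "within":        # indexed range lies inside the query range
        return Rect(q_start, q_start, q_end, q_end)
    raise ValueError(f"unknown mode: {mode}")


def matches(point, rect):
    """True if the (start, end) point falls inside the rectangle."""
    x, y = point
    return rect.min_x <= x <= rect.max_x and rect.min_y <= y <= rect.max_y


def to_filter(field, rect):
    """Format the rectangle as a filter string (assumed syntax, see above)."""
    return (f'{field}:"Intersects({rect.min_x} {rect.min_y} '
            f'{rect.max_x} {rect.max_y})"')
```

For example, an absence indexed as the point (1, 15) matches a "contains" check for days 5 to 10, because its start is at or before 5 and its end is at or after 10.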

Regards,
   Alex.





Re: Solr: How to index range-pair fields?

2015-08-21 Thread vaedama
Alexandre,

What would the data type look like?

Currently, this is what I have:

<fieldType name="days_of_year"
  class="solr.SpatialRecursivePrefixTreeFieldType"
  geo="false"
  worldBounds="0 0 366 366"
  distErrPct="0"
  maxDistErr="0.0009"
  units="degrees"
/>

   <field name="state" type="string" indexed="true" stored="true"
multiValued="true"/>
   <field name="presentDays" type="days_of_year" indexed="true"
stored="true" multiValued="true"/>
   <field name="absentDays" type="days_of_year" indexed="true" stored="true"
multiValued="true"/>

This is how I am indexing each record:
for each student:
    get the presence/absence period list
    for each presence/absence period:
        get the state (either presence or absence) and add the value to
        the *state* field inside the doc
        if the state is absence, add the absence period to the
        *absentDays* field
        if the state is presence, add the presence period to the
        *presentDays* field
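
A minimal Python sketch of the loop above, for one student. It assumes each period has already been converted to numbers and is indexed as a single two-number point string ("start end"). The helper name `build_doc` is hypothetical, and the point string format is an assumption to check against your Solr version.

```python
def build_doc(student_id, periods):
    """Build one document from (state, start, end) periods.

    Each period becomes a single "start end" point string (assumed format).
    """
    field_for = {"absent": "absentDays", "present": "presentDays"}
    doc = {"id": student_id, "state": [], "absentDays": [], "presentDays": []}
    for state, start, end in periods:
        # Record each state once, in the order it is first seen.
        if state not in doc["state"]:
            doc["state"].append(state)
        doc[field_for[state]].append(f"{start} {end}")
    return doc
```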

So, for student X (taken from my previous msg):

Student X was `absent` between dates: 

 Jan 1, 2015 and Jan 15, 2015 
 Feb 13, 2015 and Feb 16, 2015 
 March 19, 2015 and March 25, 2015 

Also X was `present` between dates: 

 Jan 25, 2015 and Jan 30, 2015 
 Feb 1, 2015 and Feb 12, 2015 


This is what I think my student record would look like. Does it look
correct?

{
  id: X,
  state: [absent, present]
  presentDays: [ [01 15 366 366], [13, 16, 366, 366], [19, 25, 366, 366] ]
  absentDays: [ [25, 30, 366, 366],  [1, 12, 366, 366] ]
}
  

Also, how would I represent the year in this case:
Student Y was absent from Jan 1, *2012* to Feb 1, *2015*?

I would appreciate it if you could provide an example of how to modify my
fieldType definition to store timestamp-level granularity.

Thanks,
Sudheer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224526.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
On 21 August 2015 at 15:32, vaedama sudheer.u...@gmail.com wrote:
 presentDays: [ [01 15 366 366], [13, 16, 366, 366], [19, 25, 366, 366] ]

This does not look right. Your January 1 2015 should map to a single
number, representing 'X' in the coordinates. Your January 15 2015
should map to another number, representing Y in the coordinates.

That's why the world bounds are 0-366 (366 being the maximum number of
days in a year, ignoring the specific year).

So, if you ignore the year, January 1 is '1', January 15 is '15',
February 1 is '32', etc.

If you don't ignore the year, you need to factor it in somehow, perhaps
as a day offset from a particular start position, e.g. 2010. You would
need to do some date math during indexing, either in the client or in
the UpdateRequestProcessor.
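
As an illustration of the client-side date math, here is a minimal Python sketch. The function names and the 2010-01-01 origin are assumptions chosen to match the example above, not anything prescribed by Solr.

```python
from datetime import date


def day_of_year(d):
    """1-based day number within the year (January 1 -> 1)."""
    return d.timetuple().tm_yday


def days_since(d, origin=date(2010, 1, 1)):
    """Whole days between an assumed origin date and d (origin -> 0)."""
    return (d - origin).days
```

With a day offset like this, a range that crosses years (say Jan 1, 2012 to Feb 1, 2015) becomes an ordinary pair of increasing numbers, but the world bounds then have to cover the largest offset you expect rather than 366.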

Regards,
   Alex.


Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
These look right.

Then, you just play around with the mapping. Your dates-to-coordinates
mapping can be as granular as you want, as long as the values fit into
the data type. And with this being a school, your epochs might be
smaller (e.g. semesters) and kept as a separate number.
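
One way to read this, sketched in Python: measure each timestamp in seconds from a chosen origin instead of from 1970, so the coordinates stay small. The `ORIGIN` value and the function names are hypothetical, and the sample timestamps are the ones quoted elsewhere in this thread (treated as UTC).

```python
from datetime import datetime, timezone

# Hypothetical origin: start of the period being tracked (assumed UTC).
ORIGIN = datetime(2015, 1, 1, tzinfo=timezone.utc)


def to_offset(epoch_seconds, origin=ORIGIN):
    """Seconds elapsed between the origin and a Unix timestamp."""
    return epoch_seconds - int(origin.timestamp())


def to_point(start_epoch, end_epoch, origin=ORIGIN):
    """(x, y) pair for one period, both measured from the origin."""
    return to_offset(start_epoch, origin), to_offset(end_epoch, origin)
```

The world bounds would then need to cover the largest offset (a 366-day year is 31,622,400 seconds). Whether the field's precision settings cope with that range is something to test rather than assume.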

Regards,
   Alex.





Re: Solr: How to index range-pair fields?

2015-08-21 Thread Erick Erickson
You can always index to a second field with date math, or
even pull out the day as you're indexing.

Best,
Erick



Re: Solr: How to index range-pair fields?

2015-08-21 Thread vaedama
Hello Erick,

Thanks for your reply. 

You can always index to a second field with date math, or 
even pull out the day as you're indexing. 

What would this second field look like? Can you please provide an example
of both the fieldType definition and the field definition?

Also, please tell me how my query would fit into this. Specifically, my
use-case is CONTAINS (I do not care about INTERSECT or WITHIN).

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: How to index range-pair fields?

2015-08-21 Thread vaedama
Hi Alexandre,

Thanks for your reply!

I guess these are the links that you were referring to :)

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3c1354991310424-4025359.p...@n3.nabble.com%3E
https://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

That would work if the time ranges were days_of_year. But I want to also
maintain the timestamp level granularity. Apologies for not mentioning that
in my earlier email.

Thanks,
Venkat Sudheer Reddy Aedama



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224508.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: How to index range-pair fields?

2015-08-21 Thread vaedama
Alexandre,

Fantastic answer! I think having a start position would work nicely with my
use-case :) Also I would prefer to do the date Math during indexing.

*Question # 1:* Can you please tell me if this doc looks correct (given that
I am not yet bothered about factoring in year into my use-case) ?

Student X was `absent` between dates:

 Jan 1, 2015 and Jan 15, 2015 
 Feb 13, 2015 and Feb 16, 2015 (Feb 13 is the 44th day of the year 2015 and
Feb 16 is the 47th day)
 March 19, 2015 and March 25, 2015

Also X was `present` between dates:

 Jan 25, 2015 and Jan 30, 2015
 Feb 1, 2015 and Feb 12, 2015

{
  id: X,
  state: [absent, present],
  absentDays: [ [1, 15, 366, 366], [44, 47, 366, 366], [78, 84, 366, 366] ],
  presentDays: [ [25, 30, 366, 366], [32, 43, 366, 366] ]
} 

*Question #2:*

Since I need timestamp level granularity, what is the appropriate way to
store the field ?

Student X was `absent` between epoch times:

 1420104600 (9:30 AM, Jan 1 2015) and 1421341200 (5:00 PM, Jan 15, 2015)

Is it possible to change *worldBounds* to take a polygon structure where I
can represent millisecond level granularity ?

Thanks in advance,
Venkat Sudheer Reddy Aedama




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr: How to index range-pair fields?

2015-08-20 Thread vaedama
My scenario is something like this:

I have a students database. I want to query all the students who were either
`absent` or `present` during a particular `date-range`.

For example:

Student X was `absent` between dates:

 Jan 1, 2015 and Jan 15, 2015
 Feb 13, 2015 and Feb 16, 2015
 March 19, 2015 and March 25, 2015

Also X was `present` between dates:

 Jan 25, 2015 and Jan 30, 2015
 Feb 1, 2015 and Feb 12, 2015

(Other days were either school holidays or the teacher was either
lazy/forgot to take the attendance ;)

If the date range was only a single-valued field then this approach would
work:
http://stackoverflow.com/questions/25246204/solr-query-for-documents-whose-from-to-date-range-contains-the-user-input.
I have multiple-date ranges for each student, so this would not work for my
use-case.

Lucene 5.0 has support for `DateRangeField`
(http://lucene.apache.org/solr/5_0_0/solr-core/index.html?org/apache/solr/schema/DateRangeField.html),
which is perfect for my use-case, but I cannot upgrade to 5.0 yet! I am on
Lucene 4.1.0. David Smiley had mentioned that it would be ported to 4.x, but
I guess it never happened (https://issues.apache.org/jira/browse/SOLR-6103).
I can try porting this patch myself, but I would like to know what it takes
and hear opinions.

So basically, I need to maintain relationship between the start and end
dates for each of the `state`s (absence or presence). So I thought I would
need to index the fields as pairs as mentioned here:
http://grokbase.com/t/lucene/solr-user/128r96vwz6/how-do-i-represent-a-group-of-customer-key-value-pairs

I guess my schema would look like:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>

<field name="state" type="string" indexed="true" stored="true"
multiValued="true"/>
<dynamicField name="presenceStartTime_*" type="tdate" indexed="true"
stored="true"/>
<dynamicField name="presenceEndTime_*" type="tdate" indexed="true"
stored="true"/>
<dynamicField name="absenceStartTime_*" type="tdate" indexed="true"
stored="true"/>
<dynamicField name="absenceEndTime_*" type="tdate" indexed="true"
stored="true"/>

**Question #1:** Does this look correct ? 

**Question #2:** What are the ramifications if I use `tlong` instead of
`tdate` ? My `tlong` type looks like this:

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
omitNorms="true" positionIncrementGap="0"/>

**Question #3:** So in this case, for the query "get all the students who
were absent during a date range", would the query look something like
this?

(state: absent) AND 
(absenceStartTime1: givenLowerBoundDate) AND
(absenceStartTime2: givenLowerBoundDate) AND
(absenceStartTime3: givenLowerBoundDate) AND
(absenceEndTime1: givenUpperBoundDate) AND
(absenceEndTime2: givenUpperBoundDate) AND
(absenceEndTime3: givenUpperBoundDate)


This would work only if I knew beforehand that there were 3 date ranges in
which the student was absent, and there is no way to query all dynamic
fields with wildcards, according to
http://stackoverflow.com/questions/6213184/solr-search-query-for-dynamic-fields-indexed

**Question #4:** The workaround mentioned in one of the answers in that
question did not look terrible but seemed a bit complicated. Is there a
better alternative for solving this problem in Solr ?

Of course, I would be highly interested in any other better approaches.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369.html
Sent from the Solr - User mailing list archive at Nabble.com.