How to import data with a different date format

2010-09-08 Thread Rico Lelina
Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names, 
so I had to modify schema.xml in the conf directory). So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory, 
including the Z for Zulu/UTC time, according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico



RE: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
No. The DateField [1] will not accept it any other way. You could, however, 
fool your boss and dump your dates in an ordinary string field. But then you 
cannot use some of the nice date features.

 

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 



Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
That was my first thought :-) But it would be nice to be able to do date 
queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.




RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Your format (MM/DD/YYYY) is not compatible. 
 

 


Re: How to import data with a different date format

2010-09-08 Thread Erick Erickson
I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1) Can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant.
2) Use DIH and DateFormatTransformer, see:
   http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   You can walk a directory importing all the XML files with
   FileDataSource.
3) You could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the "string" field type, which is NOT tokenized.
You usually want "text" unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later <G>

Best
Erick





Re: Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
It will work. The original data is in XML format. I have an XSLT that 
transforms the data into the same format as that in exampledocs: 
<add><doc><field name="...">...</field></doc>...</add>.





Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly 
easy in XSLT) and then adding T00:00:00Z to it.

Thanks.
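
For anyone doing the same conversion outside XSLT, here is a minimal Python sketch of the step described above. It is only an illustration: the function name to_solr_date is made up for this example, and it assumes every input really is MM/DD/YYYY with no time component.

```python
from datetime import datetime


def to_solr_date(mdy: str) -> str:
    """Convert an 'MM/DD/YYYY' string to 'YYYY-MM-DDT00:00:00Z'.

    Hypothetical helper for illustration; it assumes the source value
    carries no time of day, so midnight UTC is appended.
    """
    # strptime validates the value, so an impossible date such as
    # '13/45/2010' raises ValueError instead of being indexed.
    parsed = datetime.strptime(mdy, "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


if __name__ == "__main__":
    print(to_solr_date("09/08/2010"))
```

Parsing and re-formatting, rather than shuffling substrings, means malformed source dates fail loudly at export time rather than at indexing time.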








RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Ah, that answers Erick's question. And mine ;) 
 





Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
Just throwing it out there, I'd consider a different approach for an 
actual real app, although it might not be easier to get up quickly. (For 
quickly, yeah, I'd just store it as a string, more on that at bottom).


If none of your dates have times, they're all just full days, I'm not 
sure you really need the date type at all.


Convert the date to a number-of-days-since-epoch integer.  (Most languages 
will have a way to do this, but I don't know about pure XSLT.)  Store 
_that_ in a Solr 1.4 'int' field.  On top of that, make it a 'tint' 
(non-zero precision) for faster range queries.


But now your actual interface will have to convert from number of days 
since epoch to a displayable date. (And if you allow user input, 
convert the input to number-of-days-since-epoch before making a range 
query or fq, but you'd have to do that anyway even with solr dates, 
users aren't going to be entering W3CDate raw, I don't think).
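
A rough sketch of that round trip in Python, assuming the Unix epoch (1970-01-01) as day zero. The helper names are invented for this example and are not part of Solr.

```python
from datetime import date, timedelta

# Assumption: day 0 is the Unix epoch. Any fixed reference date works
# as long as indexing and querying agree on it.
EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Number of whole days between the epoch and d (what gets indexed)."""
    return (d - EPOCH).days


def from_epoch_days(days: int) -> date:
    """Inverse conversion, for turning a stored value back into a date."""
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    stored = to_epoch_days(date(2010, 9, 8))
    print(stored, from_epoch_days(stored).isoformat())
```

The same to_epoch_days call would be applied to user-entered dates before building a range query or fq.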


That is probably the most efficient way to have solr handle it -- using 
an actual date field type gives you a lot more precision than you need, 
which is going to hurt performance on range queries. You can 
compensate for that with a trie date, sure, but if you don't really need that 
precision to begin with, why use it?  Also, the extra precision can end 
up doing unexpected things and making it easier to have bugs (for range 
queries on that high-precision stuff, you need to make sure your start 
date has 00:00:00 set and your end date has 23:59:59 set, to do what you 
probably expect). If you aren't going to use the extra precision, it makes 
everything a lot simpler to not use a date field.
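
To make the start/end pitfall concrete, here is a small Python sketch that builds a whole-day range query against a real date field. The field name pub_date and the function name are placeholders; only the field:[start TO end] range syntax is Solr's.

```python
from datetime import date


def day_range_query(field: str, start: date, end: date) -> str:
    """Build an inclusive Solr range query covering whole days.

    The start is pinned to 00:00:00 and the end to 23:59:59 so that
    documents dated anywhere on the end day still match.
    """
    return (
        f"{field}:[{start.isoformat()}T00:00:00Z"
        f" TO {end.isoformat()}T23:59:59Z]"
    )


if __name__ == "__main__":
    # 'pub_date' is a made-up field name for this example.
    print(day_range_query("pub_date", date(2010, 9, 1), date(2010, 9, 8)))
```

If the end bound were sent as 2010-09-08T00:00:00Z instead, anything indexed later on 8 September would silently drop out of the results.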


Alternately, for your "get this done quick" method, yeah, I'd just store 
it as a string. With a string exactly as you've specified (MM/DD/YYYY), 
sorting and range queries won't work how you'd want.  But if you can make 
it a string of the format yyyy/mm/dd instead (always two-digit month and 
day), then you can even sort and do range queries on your string dates. 
For the quick and dirty prototype, I'd just do that.  In fact, while 
this might make range queries and sorting _slightly_ slower than if you 
use an int or a tint, this might really be good enough even for a real 
app (hey, it's what lots of people did before the trie-based fields 
existed).
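
A quick way to see why the field order matters for string dates is to sort a few sample values in Python. The sample dates and the helper name mdy_to_ymd are made up for illustration.

```python
def mdy_to_ymd(mdy: str) -> str:
    """Rewrite 'MM/DD/YYYY' as zero-padded 'YYYY/MM/DD'."""
    month, day, year = mdy.split("/")
    return f"{year}/{int(month):02d}/{int(day):02d}"


samples = ["12/31/2009", "01/15/2010", "09/08/2010"]

# Plain string sort on MM/DD/YYYY groups by month first, so the
# 2009 date lands last even though it is the earliest.
sorted_mdy = sorted(samples)

# With the most significant field first, string order and
# chronological order agree.
sorted_ymd = sorted(mdy_to_ymd(s) for s in samples)

if __name__ == "__main__":
    print(sorted_mdy)
    print(sorted_ymd)
```

A string range query only compares characters left to right in the same way, which is why the same reordering makes ranges behave as well.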


Jonathan






  


Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
I'm really thinking, once you convert to YYYY-MM-DD anyway, you might be 
better off just sticking this in a string field, rather than using a 
date field at all. The extra precision in the date field is going to 
make things confusing later, I predict. Especially for a quick and dirty 
prototype, I'd just use a string.


Solr is not an RDBMS; our learned behavior of always trying to normalize 
everything and define the field 'right' often is not the right way to go 
with solr/lucene.


Jonathan







  


Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind



Erick Erickson wrote:

I have no clue how SOLR-savvy you are, so pardon if this is something you
already know. But lots of people trip up over the "string" field type,
which is NOT tokenized. You usually want "text" unless it's some sort of
ID. So it might be worth it to do some searching earlier rather than
later <G>


Why would you want to tokenize a yyyy-mm-dd value?

I'm liking the 'string' type.  If you do yyyy-mm-dd, then you can even 
sort properly, and range query with endpoints also specified as 
yyyy-mm-dd, no?


Okay, I'll stop spamming the thread now, heh.

Jonathan



Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
I'm doing something similar for dates/times/timestamps.

What I'm actually trying to do is find the appointments that 'now' falls 
within (each appointment is a date/time 'from' and 'to' pair, i.e. timestamps).

Fairly simple search of:

   What items have a start time BEFORE now, and an end time AFTER now?
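
One way such a search could be expressed, sketched in Python and assuming the start and end are indexed as numeric Unix timestamps. The field names start_ts and end_ts and the function name are hypothetical; the query uses Solr's standard field:[a TO b] range syntax with * as an open bound.

```python
import time
from typing import Optional


def active_now_query(start_field: str, end_field: str,
                     now: Optional[int] = None) -> str:
    """Build a query for items whose [start, end] interval contains 'now'.

    'now' defaults to the current Unix time in seconds. Square brackets
    are inclusive, so an item starting or ending exactly at 'now' matches.
    """
    if now is None:
        now = int(time.time())
    return f"{start_field}:[* TO {now}] AND {end_field}:[{now} TO *]"


if __name__ == "__main__":
    # Field names are placeholders for whatever the schema defines.
    print(active_now_query("start_ts", "end_ts", now=1283904000))
```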

My thoughts were to store either:
  Unix timestamps as BIGINTs (64 bit), or
  ISO_DATE ISO_TIME strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?



Dennis Gearon




Re: How to import data with a different date format

2010-09-08 Thread Erick Erickson
That was a general comment on SOLR string types. Mostly I wanted to
prompt Rico to try some searching before getting too hung up on indexing
refinements. I'd far rather demo a prototype being able to say "Dates don't
work yet, but you can search" than "searching is broken to pieces, but
dates work fine!".

FWIW
Erick





Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
So the standard 'int' field in Solr 1.4 is a trie-based field, 
although the example 'int' type in the default schema.xml has its 
precision (precisionStep) set to 0, which means it's not really doing 
trie things. If you set the precision to something greater than 0, as in 
the default example 'tint' type, then it's really using 'trie' 
functionality.  'trie' functionality speeds up range queries by putting 
each value into 'buckets' (my own term), per the precision specified, so 
solr has to do less to grab all values within a certain range.
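
The 'buckets' idea can be pictured with a toy Python sketch. This is NOT how Lucene actually encodes trie terms; it only illustrates that, with a non-zero precision step, each value is also indexed under coarser prefixes shared with its neighbours. The function name and the 32-bit width are assumptions for the example, and it handles non-negative integers only.

```python
from typing import List, Tuple


def trie_terms(value: int, precision_step: int,
               bits: int = 32) -> List[Tuple[int, int]]:
    """Return (shift, prefix) pairs a trie-style index might store.

    Toy model: a step of 0 means only the full-precision value is kept;
    otherwise the value is also stored with its low bits dropped in
    steps of precision_step, giving progressively coarser buckets.
    """
    if precision_step <= 0:
        return [(0, value)]
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    # Two adjacent day numbers share every bucket above full precision.
    print(trie_terms(14860, 8))
    print(trie_terms(14861, 8))
    print(trie_terms(14860, 0))
```

A range query can then match a few coarse buckets in the middle of the range and fall back to full-precision terms only at the edges, which is where the speed-up comes from.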


That's all tint/non-zero-precision-trie does: speed up range queries. 
Your use case involves range queries though, so it's worth 
investigating.  If you use a string or other textual type for sorting or 
range queries, you need to make sure your values sort the way you want 
them to as strings. But yyyy-mm-dd will.


More on trie: 
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/


I think there probably won't be much of a difference at query time 
between non-trie int and string, although I'm not sure, and it may 
depend on the nature of your data and queries.   Using a trie int will 
be faster for (and only for) range queries, if you have a lot of data. 
(There are some cases, depending on the data and the nature of your 
queries, where the overhead of a non-zero-precision trie may outweigh 
the hypothetical gain, but generally it's faster). 

I don't think there should be any appreciable difference between how 
long a non-trie int or a string will take to index -- at least as far as 
solr is concerned, if your app preparing the documents for solr takes 
longer to prepare one than another, that's another story. An actual trie 
(non-zero-precision) theoretically has indexing-time overhead, but I 
doubt it would be noticeable, unless you have a really really lean mean 
indexing setup where every microsecond counts.


Jonathan

Dennis Gearon wrote:

I'm doing something similar for dates/times/timestamps.

I'm actually trying to find which appointments (date/time 'from' and 'to' 
combos, i.e. timestamps) 'now' falls within.

Fairly simple search of:

   What items have a start time BEFORE now, and an end time AFTER now?
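
One way that search could be written as a Solr query string, sketched in Python. The field names start_ts and end_ts are hypothetical, the values are assumed to be unix timestamps stored in numeric fields, and the square brackets make both bounds inclusive.

```python
import time


def active_now_query(start_field, end_field, now=None):
    """Build a query for items whose start <= now <= end."""
    now = int(time.time()) if now is None else int(now)
    return "%s:[* TO %d] AND %s:[%d TO *]" % (start_field, now, end_field, now)


# Fixed 'now' so the result is reproducible.
q = active_now_query("start_ts", "end_ts", now=1283904000)
```
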

My thoughts were to store:
  unix time stamp BIGINTS (64 bit)
  ISO_DATE ISO_TIME strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?



Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

  

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: How to import data with a different date format
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, September 8, 2010, 10:27 AM
Just throwing it out there, I'd
consider a different approach for an actual real app,
although it might not be easier to get up quickly. (For
quickly, yeah, I'd just store it as a string, more on that
at bottom).

If none of your dates have times, they're all just full
days, I'm not sure you really need the date type at all.

Convert the date to number-of-days since epoch
integer.  (Most languages will have a way to do this,
but I don't know about pure XSLT).  Store _that_ in a
1.4 'int' field.  On top of that, make it a tint
(precision non-zero) for faster range queries.
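
A minimal sketch of that conversion in Python, assuming the source dates are MM/DD/YYYY as in the original question and taking the epoch to be 1970-01-01. The function names are made up for illustration.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)


def mdy_to_days(mdy):
    """'MM/DD/YYYY' -> whole days since 1970-01-01, suitable for an int field."""
    d = datetime.strptime(mdy, "%m/%d/%Y").date()
    return (d - EPOCH).days


def days_to_mdy(days):
    """Whole days since 1970-01-01 -> 'MM/DD/YYYY' for display."""
    return (EPOCH + timedelta(days=days)).strftime("%m/%d/%Y")
```

The second function is the display-side conversion the interface would need.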

But now your actual interface will have to convert from
number of days since epoch to a displayable date. (And if
you allow user input, convert the input to
number-of-days-since-epoch before making a range query or
fq, but you'd have to do that anyway even with solr dates,
users aren't going to be entering W3CDate raw, I don't
think).

That is probably the most efficient way to have solr handle
it -- using an actual date field type gives you a lot more
precision than you need, which is going to hurt performance
on range queries. Which you can compensate for with trie
date sure, but if you don't really need that precision to
begin with, why use it?  Also the extra precision can
end up doing unexpected things and making it easier to have
bugs (range queries on that high precision stuff, you need
to make sure your start date has 00:00:00 set and your end
date has 23:59:59 set, to do what you probably expect). If
you aren't going to use the extra precision, makes
everything a lot simpler to not use a date field.
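
For the start/end time point above, a rough Python sketch of building a range query against a real date field with whole-day endpoints. The field name pub_date is hypothetical and the inputs are assumed to be MM/DD/YYYY.

```python
from datetime import datetime


def day_range_query(field, start_mdy, end_mdy):
    """Range query covering whole days: start at 00:00:00, end at 23:59:59."""
    start = datetime.strptime(start_mdy, "%m/%d/%Y").strftime("%Y-%m-%d")
    end = datetime.strptime(end_mdy, "%m/%d/%Y").strftime("%Y-%m-%d")
    return "%s:[%sT00:00:00Z TO %sT23:59:59Z]" % (field, start, end)


q = day_range_query("pub_date", "09/01/2010", "09/08/2010")
```
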

Alternately, for your "get this done quick" method, yeah,
I'd just store it as a string. With a string exactly as
you've specified, sorting and range queries won't work how
you'd want.  But if you can make it a string of the
format yyyy/mm/dd instead (always two-digit month and
day), then you can even sort and do range queries on your
string dates. For the quick and dirty prototype, I'd just do
that.  In fact, while this might make range queries and
sorting _slightly_ slower than if you use an int

Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the 
field of choice for timestamps, right? 

Seems like the fastest choice.

I will have to read up on it.

Seems like my original choice to use unix timestamp as storage in my SQL 
database, vs native Postgres timestamp, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other columns in the search 
index for display, 'date', 'time'. That is, unless I force the user's 
javascript to generate the time and date from the unix timestamp. hmm.
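
If the display columns are generated at indexing time, the conversion is small. A Python sketch, assuming UTC and a made-up helper name; the same thing could be done in PHP or in the browser.

```python
from datetime import datetime, timezone


def display_fields(unix_ts):
    """Split a unix timestamp into (date, time) display strings, in UTC."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d"), dt.strftime("%H:%M:%S")


date_str, time_str = display_fields(1283904000)
```
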
  
Dennis Gearon


Re: How to import data with a different date format

2010-09-08 Thread Chris Hostetter

: If none of your dates have times, they're all just full days, I'm not sure you
: really need the date type at all.
: 
: Convert the date to number-of-days since epoch integer.  (Most languages will
: have a way to do this, but I don't know about pure XSLT).  Store _that_ in a
: 1.4 'int' field.  On top of that, make it a tint (precision non-zero) for
: faster range queries.

There's really no advantage to doing this over using the TrieDateField 
(available in Solr 1.4).  It's essentially how it's implemented under the 
covers (you can pick the precision just like TrieInt) except that:

1) it uses a long instead of an int
2) it supports DateMath expressions
3) it supports Date Faceting
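
Going the date-field route means the MM/DD/YYYY values from the original question have to be rewritten into the full UTC form before indexing. A rough Python sketch, assuming midnight UTC for dates that carry no time; the function name and the field name pub_date are hypothetical.

```python
from datetime import datetime


def to_solr_date(mdy):
    """'MM/DD/YYYY' -> full UTC date string with midnight as the time."""
    return datetime.strptime(mdy, "%m/%d/%Y").strftime("%Y-%m-%dT%H:%M:%SZ")


# DateMath example: everything from the start of today to the start of tomorrow.
TODAY_FILTER = "pub_date:[NOW/DAY TO NOW/DAY+1DAY]"
```
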

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind

Solr 1.4 was the first tagged release with trie fields.

And Solr 1.4+ also includes a 'date' field based on 'trie' just for 
dates.  If your dates are actually going to include hour/minute/second, 
not just calendar day-of-month, then I'd definitely use the built-in 
solr trie date field. That's what it's for: it will do the translation 
from calendar date-time to integer for you (in both directions), and add 
trie buckets for fast range querying too.


I was suggesting that just using 'int' might be simpler if you don't 
need hour/minute/second precision, but are just storing year-month-day. 
If you've got hour-minute-second too, no reason not to use Solr's date 
type, and lots of reasons to do so.


Jonathan


Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
I already have the issue of how to store between different databases, 
languages, platforms, and frameworks.

Settling on LONGINT/unix timestamp solves the problem on all fronts.

I may even send them to the browser and have the JScript convert them to 
date/times (maybe ;-)

So, it's *nix timestamp or bust!

Dennis Gearon
