How to import data with a different date format
Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data.

One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory, including the Z for Zulu/UTC time, according to the doc. Is there a way to specify the date format?

Thanks very much. Rico
RE: How to import data with a different date format
No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

-Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format
Is there a way to specify the date format?
Re: How to import data with a different date format
That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z. Thanks.

- Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 11:34:32 AM Subject: RE: How to import data with a different date format
You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.
RE: Re: How to import data with a different date format
Your format (MM/DD/YYYY) is not compatible.

-Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:03 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format
I guess when I export the data I can just add 00:00:00Z.
Re: How to import data with a different date format
I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest.

However, if you absolutely MUST have functioning dates, there are three options I can think of:
1 can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant
2 use DIH and DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer. You can walk a directory importing all the XML files with FileDataSource.
3 you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus.

I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID. So it might be worth it to do some searching earlier rather than later <G>

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.
Re: Re: How to import data with a different date format
It will work. The original data is in XML format. I have an XSLT that transforms the data into the same format as that in exampledocs: <add><doc><field name="...">...</field></doc>...</add>.

- Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:06:39 PM Subject: RE: Re: How to import data with a different date format
Your format (MM/DD/YYYY) is not compatible.
Re: How to import data with a different date format
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks.

- Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:09:55 PM Subject: Re: How to import data with a different date format
1 can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant
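For anyone who would rather do this conversion outside XSLT, here is a minimal sketch in Python of the same idea: reorder MM/DD/YYYY into YYYY-MM-DD and append T00:00:00Z. The function name `to_solr_date` is made up for illustration, and the sketch assumes every input really is a plain MM/DD/YYYY date with no time part.

```python
from datetime import datetime


def to_solr_date(mdy: str) -> str:
    """Turn 'MM/DD/YYYY' into 'YYYY-MM-DDT00:00:00Z'.

    Hypothetical helper, not part of Solr. strptime raises ValueError
    for anything that is not a valid MM/DD/YYYY date, so bad records
    fail loudly instead of being indexed with a wrong value.
    """
    parsed = datetime.strptime(mdy.strip(), "%m/%d/%Y")
    # The source data has no time of day, so midnight UTC is assumed.
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


if __name__ == "__main__":
    print(to_solr_date("09/08/2010"))
```

Parsing the value instead of just shuffling substrings means an impossible date such as 02/30/2010 is rejected before it reaches the index.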
RE: Re: How to import data with a different date format
Ah, that answers Erick's question. And mine ;)

-Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:25 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks.
Re: How to import data with a different date format
Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom.)

If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to a number-of-days-since-epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT.) Store _that_ in a Solr 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries.

But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates; users aren't going to be entering W3CDate raw, I don't think.)

That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. Which you can compensate for with trie date, sure, but if you don't really need that precision to begin with, why use it? Also, the extra precision can end up doing unexpected things and making it easier to have bugs (for range queries on that high-precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field.

Alternately, for your get-this-done-quick method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format yyyy/mm/dd instead (always four-digit year, two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a tint, this might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed).

Jonathan

Erick Erickson wrote:
However, if you absolutely MUST have functioning dates, there are three options I can think of
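To make the days-since-epoch suggestion above concrete, here is a rough Python sketch of both directions: MM/DD/YYYY to an integer for indexing, and the integer back to a displayable date. The epoch of 1970-01-01 and the helper names are assumptions for illustration; any fixed epoch would do as long as indexing and display agree on it.

```python
from datetime import date, timedelta

# Assumed epoch; nothing in Solr requires this particular choice.
EPOCH = date(1970, 1, 1)


def to_epoch_days(mdy: str) -> int:
    """Convert 'MM/DD/YYYY' to whole days since EPOCH (for an int/tint field)."""
    month, day, year = (int(part) for part in mdy.split("/"))
    return (date(year, month, day) - EPOCH).days


def from_epoch_days(days: int) -> str:
    """Convert a stored day count back to 'YYYY-MM-DD' for display."""
    return (EPOCH + timedelta(days=days)).isoformat()


if __name__ == "__main__":
    stored = to_epoch_days("09/08/2010")
    print(stored, from_epoch_days(stored))
```

The same `to_epoch_days` conversion would have to be applied to user-entered range endpoints before building a range query, which is the extra interface work mentioned above.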
Re: How to import data with a different date format
I'm really thinking, once you convert to YYYY-MM-DD anyway, you might be better off just sticking this in a string field, rather than using a date field at all. The extra precision in the date field is going to make things confusing later, I predict. Especially for a quick and dirty prototype, I'd just use a string.

Solr is not an rdbms; our learned behavior to always try to normalize everything and define the field 'right' often is not the right way to go with solr/lucene.

Jonathan

Rico Lelina wrote:
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks.
Re: How to import data with a different date format
Erick Erickson wrote:
I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID. So it might be worth it to do some searching earlier rather than later <G>

Why would you want to tokenize a yyyy-mm-dd value? I'm liking the 'string' type. If you do yyyy-mm-dd, then you can even sort properly, and range query with endpoints also specified as yyyy-mm-dd, no?

Okay, I'll stop spamming the thread now, heh.

Jonathan
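The point about yyyy-mm-dd strings sorting properly can be checked with plain string comparison. This small Python sketch uses made-up sample dates and a hypothetical `in_range` helper that only mimics an inclusive range on an untokenized string field; it says nothing about how Solr implements it.

```python
# Made-up sample values: the same three dates in both layouts.
us_style = ["12/31/2009", "01/15/2010", "09/08/2010"]
iso_style = ["2009-12-31", "2010-01-15", "2010-09-08"]

# Plain lexicographic sort, which is how string values compare.
sorted_us = sorted(us_style)    # month-first: the 2009 date ends up last
sorted_iso = sorted(iso_style)  # year-first: matches chronological order


def in_range(value: str, start: str, end: str) -> bool:
    """Inclusive range check by string comparison (hypothetical helper)."""
    return start <= value <= end


if __name__ == "__main__":
    print(sorted_us)
    print(sorted_iso)
    print(in_range("2010-01-15", "2010-01-01", "2010-01-31"))
```

This only works because every value has a four-digit year and zero-padded month and day; a value like 2010-9-8 would break the ordering.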
Re: How to import data with a different date format
I'm doing something similar for dates/times/timestamps. What I'm actually trying to do is find which appointments (date/time 'from' and 'to' combos, i.e. timestamps) have a range that 'now' falls within. Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now?

My thoughts were to store either:
unix time stamp BIGINTs (64 bit), or
ISO_DATE and ISO_TIME strings

Which is going to be faster for:
1/ Indexing?
2/ Searching?

How does the 'tint' field mentioned below apply?

Dennis Gearon
Signature Warning
EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:
From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM
Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries.
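For the "start BEFORE now, end AFTER now" search described above, one way to express it is a pair of open-ended range clauses over numeric timestamp fields. The Python sketch below only builds the query string; the field names `start_ts` and `end_ts` are hypothetical and would need to match whatever the schema defines. Note that square brackets make both ranges inclusive, so an appointment starting or ending exactly at 'now' also matches.

```python
import time
from typing import Optional


def active_now_query(start_field: str = "start_ts",
                     end_field: str = "end_ts",
                     now: Optional[int] = None) -> str:
    """Build a query for items whose [start, end] interval contains 'now'.

    Field names are illustrative assumptions; 'now' is Unix time in seconds.
    """
    if now is None:
        now = int(time.time())
    # start <= now AND end >= now, written as two open-ended ranges.
    return f"{start_field}:[* TO {now}] AND {end_field}:[{now} TO *]"


if __name__ == "__main__":
    print(active_now_query(now=1283968800))
```

Because both clauses are range queries, this is exactly the kind of search where a trie field with a non-zero precision setting is supposed to help.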
Re: How to import data with a different date format
That was a general comment on SOLR string types. Mostly I wanted to prompt Rico to try some searching before getting too hung up on indexing refinements. I'd far rather demo a prototype being able to say "Dates don't work yet, but you can search" than "searching is broken to pieces, but dates work fine!".

FWIW
Erick

On Wed, Sep 8, 2010 at 1:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
Why would you want to tokenize a yyyy-mm-dd value? I'm liking the 'string' type.
Re: How to import data with a different date format
So the standard 'int' field in Solr 1.4 is a trie based field, although the example int type in the default solrconfig.xml has a precision set to 0, which means it's not really doing trie things. If you set the precision to something greater than 0, as in the default example tint type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so solr has to do less to grab all values within a certain range. That's all tint/non-zero-precision-trie does, speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But -mm-dd will. More on trie: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster). I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index -- at least as far as solr is concerned, if your app preparing the documents for solr takes longer to prepare one than another, that's another story. An actual trie (non-zero-precision) theoretically has indexing-time overhead, but I doubt it would be noticeable, unless you have a really really lean mean indexing setup where ever microsecond counts. Jonathan Dennis Gearon wrote: I'm doing something similar for dates/times/timestamps. 
I'm actually trying to do, 'now' is within the range of what appointments(date/time from and to combos, i.e. timestamps). Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now? My thoughts were to store: unix time stamp BIGINTS (64 bit) ISO_DATE ISO_TIME strings Which is going to be faster: 1/ Indexing? 2/ Searching? How does the 'tint' field mentioned below apply? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom). If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to number-of-days since epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries. But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates, users aren't going to be entering W3CDate raw, I don't think). That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. 
Which you can compensate for with a trie date, sure, but if you don't really need that precision to begin with, why use it? Also the extra precision can end up doing unexpected things and making it easier to have bugs (for range queries on that high-precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field.

Alternately, for your get-this-done-quick method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format yyyy/mm/dd instead (always a four-digit year and two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int
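To make the two suggestions in this message concrete, here is a minimal Python sketch of both conversions: MM/DD/YYYY to a days-since-epoch integer, and MM/DD/YYYY to a sortable yyyy/mm/dd string. The function names are made up for illustration, and the sketch assumes the source dates are plain calendar days with no time zone attached.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)  # Unix epoch, used as day zero


def parse_us_date(text):
    """Parse an MM/DD/YYYY string (zero padding optional) into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def days_since_epoch(text):
    """Integer suitable for an int/tint field: whole days since 1970-01-01."""
    return (parse_us_date(text) - EPOCH).days


def days_to_display(days):
    """Reverse conversion for the user interface: integer back to MM/DD/YYYY."""
    return (EPOCH + timedelta(days=days)).strftime("%m/%d/%Y")


def sortable_string(text):
    """String that sorts chronologically when compared as plain text."""
    return parse_us_date(text).strftime("%Y/%m/%d")


if __name__ == "__main__":
    raw = ["12/01/2009", "9/8/2010", "01/15/2010"]
    print(sorted(raw))                               # text order, not date order
    print(sorted(sortable_string(d) for d in raw))   # chronological order
    print(days_since_epoch("09/08/2010"))            # 14860
```

The same conversion has to be applied to user input before it goes into a range query or fq, so keeping it in one small helper avoids the index and the query side drifting apart.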
Re: How to import data with a different date format
So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right? And nothing date/timestamp related has come out since, making 'trie'/INT the field of choice for timestamps, right? Seems like the fastest choice. I will have to read up on it. Seems like my original choice to use unix timestamp as storage in my SQL database, vs native Postgres timestamp, will make everything easier between: PHP Symfony Postgres Solr It's probably going to be a good idea to store two other columns in the search index for display, 'date', 'time'. That is, unless I force the user's javascript to generate the time and date from the unix timestamp. hmm. Dennis Gearon
Re: How to import data with a different date format
: If none of your dates have times, they're all just full days, I'm not sure you
: really need the date type at all.
:
: Convert the date to number-of-days since epoch integer. (Most languages will
: have a way to do this, but I don't know about pure XSLT). Store _that_ in a
: 1.4 'int' field. On top of that, make it a tint (precision non-zero) for
: faster range queries.

There's really no advantage to doing this over using the TrieDateField (available in Solr 1.4). It's essentially how it's implemented under the covers (you can pick the precision just like TrieInt) except that:

1) it uses a long instead of an int
2) it supports DateMath expressions
3) it supports Date Faceting

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
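If you go the date-field route instead, the import step has to rewrite each MM/DD/YYYY value into the full UTC form that Solr expects (the one with the trailing Z). A rough Python sketch of that rewrite follows, plus a whole-day range clause using the 00:00:00 / 23:59:59 bounds discussed earlier in the thread. The field name `event_date` and both function names are hypothetical, and treating every date as midnight UTC is an assumption that may not fit your data.

```python
from datetime import datetime


def to_solr_date(text):
    """Convert MM/DD/YYYY to a full UTC timestamp string at midnight."""
    day = datetime.strptime(text, "%m/%d/%Y")
    return day.strftime("%Y-%m-%dT%H:%M:%SZ")


def whole_day_clause(field, text):
    """Range clause covering one calendar day, both ends inclusive."""
    day = datetime.strptime(text, "%m/%d/%Y")
    start = day.strftime("%Y-%m-%dT00:00:00Z")
    end = day.strftime("%Y-%m-%dT23:59:59Z")
    return "%s:[%s TO %s]" % (field, start, end)


if __name__ == "__main__":
    print(to_solr_date("09/08/2010"))
    # 2010-09-08T00:00:00Z
    print(whole_day_clause("event_date", "09/08/2010"))
    # event_date:[2010-09-08T00:00:00Z TO 2010-09-08T23:59:59Z]
```

Doing this in whatever script feeds Solr (or in a pre-pass before the XSLT) keeps the schema on a real date type, so DateMath and date faceting stay available.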
Re: How to import data with a different date format
Solr 1.4 was the first tagged release with trie fields. And Solr 1.4+ also includes a 'date' field based on 'trie' just for dates. If your dates are actually going to include hour/minute/second, not just calendar day-of-month, then I'd definitely use the built in solr trie date field, that's what it's for, will do the translation from calendar date-time to integer for you (in both directions), and add trie buckets for fast range querying too. I was suggesting that just using 'int' might be simpler if you don't need hour/minute/second precision, but are just storing year-month-day. If you've got hour-minute-second too, no reason not to use Solr's date type, and lots of reasons to do so. Jonathan
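For anyone wondering what the trie 'buckets' buy you, here is a simplified Python illustration of the idea: each value is indexed at several precisions, so a range can be covered by a handful of coarse prefix terms plus a few exact ones at the edges. This is only a toy model for non-negative integers, not Solr's or Lucene's actual code; `split_range`, `covered_values` and the parameter names are invented for the example.

```python
def split_range(lo, hi, step, bits=32):
    """Cover the inclusive range [lo, hi] with (shift, prefix) terms.

    A term (shift, prefix) stands for every value v with v >> shift == prefix.
    `step` plays the role of the precision step: bits dropped per level.
    Only non-negative integers below 2**bits are handled in this sketch.
    """
    terms = []
    shift = 0
    while True:
        diff = 1 << (shift + step)
        mask = ((1 << step) - 1) << shift
        has_lower = (lo & mask) != 0      # lo is not aligned to the next level
        has_upper = (hi & mask) != mask   # hi does not end a next-level bucket
        next_lo = ((lo + diff) if has_lower else lo) & ~mask
        next_hi = ((hi - diff) if has_upper else hi) & ~mask
        if shift + step >= bits or next_lo > next_hi:
            # Nothing left for coarser levels: finish at this precision.
            terms.extend((shift, p) for p in range(lo >> shift, (hi >> shift) + 1))
            return terms
        if has_lower:  # ragged lower edge, covered at the current precision
            terms.extend((shift, p)
                         for p in range(lo >> shift, ((lo | mask) >> shift) + 1))
        if has_upper:  # ragged upper edge, covered at the current precision
            terms.extend((shift, p)
                         for p in range((hi & ~mask) >> shift, (hi >> shift) + 1))
        lo, hi = next_lo, next_hi
        shift += step


def covered_values(terms):
    """Expand (shift, prefix) terms back into the set of values they match."""
    values = set()
    for shift, prefix in terms:
        start = prefix << shift
        values.update(range(start, start + (1 << shift)))
    return values


if __name__ == "__main__":
    terms = split_range(20, 5000, step=4, bits=16)
    print(len(terms), "terms cover", 5000 - 20 + 1, "distinct values")
```

With a step of 0 there is only the full-precision level, so a range query has to visit every distinct value in the range; that is the difference between the example int and tint types.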
Re: How to import data with a different date format
I already have the issue of how to store between different databases, languages, platforms, and frameworks. Settling on LONGINT/unix timestamp solves the problem on all fronts. I may even send them to the browser and have the JScript convert them to date/times (maybe ;-) So, it's *nix timestamp or bust! Dennis Gearon
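For the unix-timestamp approach, the 'which appointments are active right now' search comes down to two open-ended range clauses. Below is a small Python sketch that builds such a query string and also derives the separate display date and time from a timestamp. The field names `start_ts` and `end_ts` and the function names are placeholders, not anything from a real schema, and display values are rendered in UTC as an assumption.

```python
import time
from datetime import datetime, timezone


def active_at_query(now_ts, start_field="start_ts", end_field="end_ts"):
    """Query matching documents with start <= now_ts <= end.

    Square brackets make both range ends inclusive, so an appointment
    starting or ending exactly at now_ts also matches.
    """
    return "%s:[* TO %d] AND %s:[%d TO *]" % (start_field, now_ts, end_field, now_ts)


def display_parts(ts):
    """Split a unix timestamp into (date, time) strings in UTC for display."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d"), moment.strftime("%H:%M:%S")


if __name__ == "__main__":
    print(active_at_query(int(time.time())))
    print(display_parts(1283904000))  # ('2010-09-08', '00:00:00')
```

Computing the display strings at index time (as extra stored fields) or at render time in the browser are both workable; the helper above is the same either way.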