Re: Solr Terms and Date field issues
On Fri, 6 May 2011 09:25:11 -0400, erickerick...@gmail.com wrote:

OK, I'm reaching a little here, but I think there's a pretty good chance this is the issue you're seeing. Sure hope somebody jumps in and corrects me if I'm wrong (hint hint)... I haven't delved into the actual Trie code; this is just from looking with TermsComponent and Luke. Using Solr 1.4.1, BTW.

What you're seeing is a consequence of the trie field type with a precisionStep other than 0. Trie fields with precisionStep > 0 add extra terms to the index to allow more efficient range queries. A hint about this is that your 5 documents with the tdate type produce 16 tokens rather than just 5. If you try your experiment with the date type (which is a trie type with precisionStep=0), you'll see exactly what you expect.

So the long and short of it is that Solr's working as expected, and you can use your index without worrying. But if you're trying to do some lower-level term walking, you'll either have to filter stuff out, copy your dates to something with precisionStep=0 and use that field or

Best
Erick
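To make the "extra terms" explanation concrete, here is a minimal Python sketch of the idea. It is not Lucene's actual term encoding: it only clears the low-order bits of a millisecond timestamp in multiples of the precision step, which is the effect a trie field has on the values it indexes. The precisionStep of 6 is an assumption (the thread does not state the value used), and the helper names `to_millis`, `to_iso` and `trie_values` are hypothetical. Under that assumption, the values it produces for 2011-05-04T02:01:32.928Z line up with the unexpected dates in the original post, including the run of 1970-01-01 entries.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(iso_utc: str) -> int:
    """Parse 'YYYY-MM-DDTHH:MM:SS.mmmZ' into milliseconds since the epoch."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
    return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)


def to_iso(millis: int) -> str:
    """Format milliseconds since the epoch as a UTC timestamp with 3 decimals."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis % 1000:03d}Z"


def trie_values(millis: int, precision_step: int = 6, bits: int = 64):
    """Return (shift, value) pairs: the full-precision value (shift 0) plus one
    lower-precision copy per step, each with its low `shift` bits cleared.

    precision_step=6 is an assumed setting, not one confirmed in the thread.
    """
    return [(shift, (millis >> shift) << shift)
            for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    sample = to_millis("2011-05-04T02:01:32.928Z")
    for shift, value in trie_values(sample):
        print(f"shift={shift:2d}  {to_iso(value)}")
```

Every document whose timestamp falls in the same bucket shares the same lower-precision value, which would explain why those terms carry a high document count and why the highest shifts all collapse to the epoch. Note that this sketch always prints three decimal places, whereas Solr trims trailing zeros (19.200 appears as 19.2 in the terms output).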
RE: Solr Terms and Date field issues
Thanks Erick and Ahmet, that helps.
Re: Solr Terms and Date field issues
On Thu, 5 May 2011 19:08:31 -0400, erickerick...@gmail.com wrote:

Hmmm, this is puzzling. If you could come up with a couple of XML files and a schema that illustrate this, I'll see what I can see...

Thanks,
Erick
RE: Solr Terms and Date field issues
Please find attached the schema and some test data (test.xml).

Thanks for looking into this.
Viswa

test.xml:

  <add>
    <doc>
      <field name="fullTextLog">I suspected the same, and setup a test instance to reproduce this</field>
    </doc>
    <doc>
      <field name="fullTextLog">The date field I used is setup to capture indexing time, in other words the schema has a default value of NOW</field>
    </doc>
    <doc>
      <field name="fullTextLog">However, I have reproduced this issue with fields which do not have defaults too.</field>
    </doc>
    <doc>
      <field name="fullTextLog"> Lorem Ipsum is simply dummy text of the printing and typesetting industry</field>
    </doc>
    <doc>
      <field name="fullTextLog">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</field>
    </doc>
  </add>

schema.xml:

  <?xml version="1.0" encoding="UTF-8" ?>
  <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
  -->
  <!--
   This is the Solr schema file. This file should be named "schema.xml" and
   should be in the conf directory under the solr home
   (i.e. ./solr/conf/schema.xml by default)
   or located where the classloader for the Solr webapp can find it.

   This example schema is the recommended starting point for users.
   It should be kept correct and concise, usable out-of-the-box.

   For more information, on how to customize this file, please see
   http://wiki.apache.org/solr/SchemaXml

   PERFORMANCE NOTE: this schema includes many optional features and should not
   be used for benchmarking. To improve
RE: Solr Terms and Date field issues
On Thu, May 5, 2011 at 9:08 PM, Ahmet Arslan iori...@yahoo.com wrote:

It is okay to see weird things in admin/schema.jsp or the terms component with trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/

If you really need the terms component, consider using copyField (tdate to string type).
Solr Terms and Date field issues
On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:

Hello,

The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data. Please see the sample data below.

I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine.

Thanks
Viswa

Results from Terms component:

  <int name="2011-05-04T02:01:32.928Z">3479</int>
  <int name="2011-05-04T02:00:19.2Z">3479</int>
  <int name="2011-05-03T22:34:58.432Z">3479</int>
  <int name="2011-04-23T01:36:14.336Z">3479</int>
  <int name="2009-03-13T13:23:01.248Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="2011-05-04T02:01:34.592Z">265</int>

Result from facet component, rounded by seconds:

  <lst name="InsertTime">
    <int name="2011-05-04T02:01:32Z">1</int>
    <int name="2011-05-04T02:01:33Z">1148</int>
    <int name="2011-05-04T02:01:34Z">2333</int>
    <str name="gap">+1SECOND</str>
    <date name="start">2011-05-03T06:14:14Z</date>
    <date name="end">2011-05-04T06:14:14Z</date>
  </lst>
Re: Solr Terms and Date field issues
On Wed, 4 May 2011 08:17:39 -0400, erickerick...@gmail.com wrote:

Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index, it's a problem.

I'm also going to guess that you're not really deleting the documents you think you are. Are you committing after the deletes?

Best
Erick
RE: Solr Terms and Date field issues
On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:

Erick,

I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW. However, I have reproduced this issue with fields which do not have defaults too.

On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked to see if the Luke handler has any different terms, and it did not.

Thanks
Viswa