Re: Need advice: what Word/Excel/PowerPoint lib to use?
Many thanks to everybody for the interesting info.

Regards, and have a nice day,
J.

sergiu gordea [EMAIL PROTECTED] wrote on 25.10.2004 17:05:

> POI, of course, for open source. There are also some commercial products based on POI.
>
> For Word, consider textmining.org.
> For XLS, POI does anything you need.
> For PowerPoint there is one commercial product (it's about $1000), but you can also find some source code in the archives.
>
> All the best,
> Sergiu
>
> [EMAIL PROTECTED] wrote:
>
>> Hello all, I need a piece of advice/experience again. What MS Word/Excel/PowerPoint parsers (written in Java) would you recommend?
>>
>> Thanks in advance,
>> J.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Need advice: what pdf lib to use?
OK, but even in this case parsing the doc would not be a violation, because what we actually need for Lucene is just a collection of terms. That has nothing to do with printing or copying pieces of _text_. As long as you provide a method that returns just a Document (I mean a Lucene document), the permissions specified by the author of the PDF document are respected.

Ben Litchfield [EMAIL PROTECTED] wrote on 25.10.2004 17:59:

> In order to write software that consumes PDF documents you must agree to a list of conditions. One of those conditions is that permissions specified by the author of the PDF document are respected. PDFBox complies with this statement; if there is software that does not, then it is in violation of copyright law.
>
> That being said, PDFBox is open source, so a user could make modifications to the source code or, since it is a PDF library, use it to change the permissions on a document.
>
> Ben
>
> On Mon, 25 Oct 2004 [EMAIL PROTECTED] wrote:
>
>> Yes Ben, you are right. This would be correct functionality from a technical perspective. But look at it my way, with the eyes of an application programmer reporting to the big boss that c. 30% of the docs we cope with could not be indexed because of this stupid limitation. Neither he nor I have any influence on the PDF owners, or any idea what made them create files with document security set.
>>
>> In short, if you could also implement this incorrect functionality the way the closed-source guys did, it would be really great! As far as sponsoring is concerned, I would be ready to hack it (or at least to try) even for 1/3 of that fortune :)))
>>
>> J.
>>
>> Ben Litchfield [EMAIL PROTECTED] wrote on 25.10.2004 14:02:
>>
>>> PDFBox does not 'stumble' when it gives that message; that is correct functionality if that permission is not allowed.
>>>
>>> If your company is willing to pay a 'fortune', why not sponsor a change to an open source project for half a fortune?
>>>
>>> Ben
>>> http://www.pdfbox.org
>>>
>>> On Mon, 25 Oct 2004 [EMAIL PROTECTED] wrote:
>>>
>>>> PDFBox also stumbles, with a java.io.IOException with the message "You do not have permission to extract text", when the doc is copy/print protected. I have now tested the Snowtide commercial product and it looks like it could process these files as well. Performance was also not so bad. Unfortunately the test result cannot be considered 100% conclusive, because the free version processed just the first 8 pages. After all, this product costs a fortune (as long as the company is ready to pay I don't really mind :))
>>>>
>>>> J.
>>>>
>>>> Robert Newson [EMAIL PROTECTED] wrote on 24.10.2004 17:44:
>>>>
>>>>> [EMAIL PROTECTED] wrote:
>>>>>
>>>>>> Hello all, I need a piece of advice/experience. What PDF parser (written in Java) would you recommend? I have now played with PDFBox-0.6.7a and would not say I was too satisfied with it. On certain PDFs (not well formatted, but still readable with Acrobat) it ran into a dead loop (this I could fix in code), and on one file it produced an out-of-memory error and killed the JVM :( (this problem I could not identify yet). The performance was not too great either: it took c. 19 h to index 13000 files (c. 3.5 GB).
>>>>>>
>>>>>> Regards,
>>>>>> J.
>>>>>
>>>>> On the specific problem of the dead loop, I reported an instance of this to Ben a week or so ago and he has fixed it in the latest nightlies. I expect an official release will include this bugfix soon. The file in question was unreadable with any PDF software I have, but someone managed to create it somehow...
>>>>>
>>>>> http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832
>>>>>
>>>>> I've found PDFBox to be pretty good. The only time I get problems is with corrupted or egregiously bad PDF files.
>>>>>
>>>>> B.
Large number of documents
Hi,

I have just started looking at Lucene and am not an experienced Java user, but from what I've been reading this search tool should manage large numbers of documents. I'm wondering if anyone has experience using Lucene on a large number of documents. I need to be able to index and search through 20-30 million documents of around 8 KB each. They are all simple text documents with some attributes on which to restrict the search results.

Any feedback would be appreciated.

Best regards,
Gard Arneson Haugen
Re: BooleanQuery - TooManyClauses
On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:

> Why is there a limit on the number of clauses? And is there any harm in setting MaxClauseCount to Integer.MAX_VALUE?

The harm is in performance and resource utilization. Rather than do this, though, read on...

> I'm using a Range Query on a field that represents dates and getting a BooleanQuery$TooManyClauses exception. This is the query: +/article/createddateiso8601:[2003010100 TO 2003123199]

Do you really need to do ranges down to that time level? Or are you really just concerned with the date? If you indexed using YYYYMMDD instead, there would only be a maximum of 365 terms in that range, whereas you've got zillions (OK, I was too lazy to do the math! But far more than 1,024).

I recommend changing how you index dates, or at least using a different field for queries that do not need to concern themselves with the timestamp aspect.

Erik
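The math Erik skipped can be roughed out: a non-leap year holds at most 365 distinct day-level terms, but up to 365 x 24 = 8,760 distinct hour-level terms, which is already well past the 1,024 clauses mentioned above. The following is only a minimal sketch using plain Java strings, not Lucene code; the class name `DateTermCount` and the sample values are made up for illustration. It counts how many distinct terms a set of timestamps produces when truncated to a given length (8 characters for a day such as 20030101, 10 for an hour such as 2003010109).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: estimates how many distinct
// terms a range query could expand to at a given date granularity.
class DateTermCount {

    /** Counts distinct terms after truncating each timestamp to prefixLength characters. */
    static int distinctTerms(List<String> timestamps, int prefixLength) {
        Set<String> terms = new TreeSet<>();
        for (String ts : timestamps) {
            // Guard against timestamps shorter than the requested prefix.
            terms.add(ts.substring(0, Math.min(prefixLength, ts.length())));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Made-up sample values: three documents on one day, one on the next.
        List<String> sample = Arrays.asList(
                "2003010109", "2003010114", "2003010123", "2003010208");
        System.out.println("hour-level terms: " + distinctTerms(sample, 10));
        System.out.println("day-level terms:  " + distinctTerms(sample, 8));
    }
}
```

The coarser the indexed term, the fewer distinct terms fall inside a range, which is the whole point of indexing the day separately from the full timestamp.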
Re: Large number of documents
Hello Gard,

This is certainly doable; it just depends on your hardware, complexity of queries, frequency of queries, and such. There is a benchmark page on the Lucene site that you may want to check to get some ideas.

Otis
RE: BooleanQuery - TooManyClauses
> On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:
>
>> Why is there a limit on the number of clauses? And is there any harm in setting MaxClauseCount to Integer.MAX_VALUE?
>
> The harm is in performance and resource utilization. Rather than do this, though, read on...
>
>> I'm using a Range Query on a field that represents dates and getting a BooleanQuery$TooManyClauses exception. This is the query: +/article/createddateiso8601:[2003010100 TO 2003123199]
>
> Do you really need to do ranges down to that time level? Or are you really just concerned with the date? If you indexed using YYYYMMDD instead, there would only be a maximum of 365 terms in that range, whereas you've got zillions (OK, I was too lazy to do the math! But far more than 1,024).

I need to do range searches. They are part of the requirements and, even worse, the range can be as big as 10 years for now. It will get bigger. I'm indexing using the YYYYMMDDHHmmssZ format and, as you said, there will be more than just 365 terms per year. This number changes every day as new documents are indexed daily. The only limit I can see is the number of documents that were indexed. I guess maxClauseCount can't be more than the number of indexed documents.

> I recommend changing how you index dates, or at least using a different field for queries that do not need to concern themselves with the timestamp aspect.

What do you mean by changing how the dates are indexed? By the way, this field is indexed as a string.

> Erik

Ross

This communication is intended solely for the addressee and is confidential and not for third party unauthorized distribution.
Re: BooleanQuery - TooManyClauses
I think what Erik's asking is whether you can live with expressing your indexed date in the form YYYYMMDD, without the hour and minute extension. That will sharply reduce the number of range query expansion terms.

If you're using the timestamp as a unique identifier, you might consider creating two fields, one for the unique identifier (YYYYMMDDHHmmssZ) and one for the date (YYYYMMDD), and only use the range on the date field (not on the timestamp field).

Regards,
Terry
Re: Exception in thread main java.lang.NoClassDefFoundError
Hi Rob,

I noticed that you are using org.apache.lucene.demos, where the package name is just org.apache.lucene.demo.

Regards,
CG

On Mon, 25 Oct 2004 21:54:38 +0100, Rob Hailey [EMAIL PROTECTED] wrote:

> I am using Lucene version 1.4.2 but am consistently getting an error when I run this:
>
> java -verbose -classpath /Users/rob/Desktop/lucene/lucene.jar:/Users/rob/Desktop/lucene/lucene-demos.jar:. org.apache.lucene.demos.IndexFiles /Users/rob/Desktop/lucene/src/
>
> The error I get is:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/demos/IndexFiles
>
> Can someone please help? I have tried on both Mac OS X (Panther) and Windows XP, both with the latest JVM, but I get the same error message. Thanks.
>
> The JVM version is:
>
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-141.3)
> Java HotSpot(TM) Client VM (build 1.4.2-38, mixed mode)
>
> The verbose error message is:
>
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/classes.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/ui.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/laf.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/sunrsasign.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/jsse.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/jce.jar]
> [Opened /System/Library/Frameworks/JavaVM.framework/Versions/1.4.2/Classes/charsets.jar]
> [Loaded java.lang.Object from shared objects file]
> [Loaded java.io.Serializable from shared objects file]
> [Loaded java.lang.Comparable from shared objects file]
> [Loaded java.lang.CharSequence from shared objects file]
> [Loaded java.lang.String from shared objects file]
> [Loaded java.lang.Class from shared objects file]
> [Loaded java.lang.Cloneable from shared objects file]
> [Loaded java.lang.ClassLoader from shared objects file]
> [Loaded java.lang.System from shared objects file]
> [Loaded java.lang.Throwable from shared objects file]
> [Loaded java.lang.Error from shared objects file]
> [Loaded java.lang.ThreadDeath from shared objects file]
> [Loaded java.lang.Exception from shared objects file]
> [Loaded java.lang.RuntimeException from shared objects file]
> [Loaded java.security.ProtectionDomain from shared objects file]
> [Loaded java.security.AccessControlContext from shared objects file]
> [Loaded java.lang.ClassNotFoundException from shared objects file]
> [Loaded java.lang.LinkageError from shared objects file]
> [Loaded java.lang.NoClassDefFoundError from shared objects file]
> [Loaded java.lang.ClassCastException from shared objects file]
> [Loaded java.lang.ArrayStoreException from shared objects file]
> [Loaded java.lang.VirtualMachineError from shared objects file]
> [Loaded java.lang.OutOfMemoryError from shared objects file]
> [Loaded java.lang.StackOverflowError from shared objects file]
> [Loaded java.lang.ref.Reference from shared objects file]
> [Loaded java.lang.ref.SoftReference from shared objects file]
> [Loaded java.lang.ref.WeakReference from shared objects file]
> [Loaded java.lang.ref.FinalReference from shared objects file]
> [Loaded java.lang.ref.PhantomReference from shared objects file]
> [Loaded java.lang.ref.Finalizer from shared objects file]
> [Loaded java.lang.Runnable from shared objects file]
> [Loaded java.lang.Thread from shared objects file]
> [Loaded java.lang.ThreadGroup from shared objects file]
> [Loaded java.util.Dictionary from shared objects file]
> [Loaded java.util.Map from shared objects file]
> [Loaded java.util.Hashtable from shared objects file]
> [Loaded java.util.Properties from shared objects file]
> [Loaded java.lang.reflect.AccessibleObject from shared objects file]
> [Loaded java.lang.reflect.Member from shared objects file]
> [Loaded java.lang.reflect.Field from shared objects file]
> [Loaded java.lang.reflect.Method from shared objects file]
> [Loaded java.lang.reflect.Constructor from shared objects file]
> [Loaded sun.reflect.MagicAccessorImpl from shared objects file]
> [Loaded sun.reflect.MethodAccessor from shared objects file]
> [Loaded sun.reflect.MethodAccessorImpl from shared objects file]
> [Loaded sun.reflect.ConstructorAccessor from shared objects file]
> [Loaded sun.reflect.ConstructorAccessorImpl from shared objects file]
> [Loaded sun.reflect.DelegatingClassLoader from shared objects file]
> [Loaded java.util.Collection from shared objects file]
> [Loaded java.util.AbstractCollection from shared objects file]
> [Loaded java.util.List from shared objects file]
> [Loaded java.util.AbstractList from shared objects file]
> [Loaded java.util.RandomAccess from shared objects file]
> [Loaded java.util.Vector from shared objects file]
> [Loaded java.lang.StringBuffer from shared objects file]
> [Loaded java.nio.Buffer from shared objects file]
> [Loaded sun.misc.AtomicLong from shared objects file]
> [Loaded sun.misc.AtomicLongCSImpl from shared objects file]
> [Loaded
RE: BooleanQuery - TooManyClauses
OK, I got that part: to limit the clause count, limit the range. In my case, replace the timestamp with the date, and if it gets too big again, replace YYYYMMDD with YYYYMM and later with YYYY. And that of course includes fixing the old files every time so they have the new field. I was actually looking for a more robust solution, but this should do for now.

Thanks,
Ross
RE: Aliasing problem
Looks like you produced a PhraseQuery rather than a BooleanQuery. You want +GAME:(doom3 3 doom)

Chuck

-----Original Message-----
From: Abhay Saswade [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 26, 2004 10:22 AM
To: [EMAIL PROTECTED]
Subject: Aliasing problem

> Hi,
>
> One document in my index contains the term 'doom 3' (indexed, tokenized, stored). How can I match the term doom3 with that document?
>
> I tried the following, but no luck:
> I have written an alias filter which returns 2 more tokens for doom3: 3 and doom
> I construct the query +GAME:doom3
> QueryParser returns +GAME:doom3 3 doom
>
> I am using StandardTokenizer. Is my approach correct, or am I missing something? Any help highly appreciated.
>
> Thanks in advance,
> Abhay
Re: Aliasing problem
On Tuesday 26 October 2004 19:22, Abhay Saswade wrote:

> I tried following but no luck
> I have written alias filter which returns 2 more tokens for doom3 as 3 and doom
> I construct query +GAME:doom3
> QueryParser returns +GAME:doom3 3 doom

Your approach is correct, but QueryParser doesn't yet support analyzers which return more than one token at a position. There's already a patch about this in the bug tracking system.

Regards
Daniel

--
http://www.danielnaber.de
Re: BooleanQuery - TooManyClauses
On Oct 26, 2004, at 1:55 PM, Angelov, Rossen wrote:

> OK, I got that part: to limit the clause count, limit the range. In my case, replace the timestamp with the date, and if it gets too big again, replace YYYYMMDD with YYYYMM and later with YYYY. And that of course includes fixing the old files every time so they have the new field. I was actually looking for a more robust solution, but this should do for now.

More robust, as in does not require re-indexing? This is one of the tricky things about making a search engine: having fast searches, yet reserving the right to change how you query at a later time without re-indexing. Unfortunately it doesn't work that way. You have to consider the types of queries that will be made in order to index appropriately. Changes in the types of queries may necessitate a re-index to accommodate them.

You may want to go ahead and index one field as YYYYMMDD, another as YYYY, and possibly another as YYYYMM.

You could also utilize a Filter for constraining searches based on a date range. QueryFilter is one option, or you could write a custom one that selects the appropriate documents.

Erik
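Indexing the same date at several granularities comes down to deriving a few prefixes of the timestamp at indexing time. The following is a minimal sketch in plain Java, assuming the timestamp is a digit string that starts with a four-digit year, then month, then day; the class name `DateFields` and the field names `timestamp`, `day`, `month`, and `year` are made up for illustration and do not come from Lucene or from the thread. Each map entry would then be added to the Lucene document as its own field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: derives coarser date fields
// from one timestamp so that each kind of query can use the coarsest
// field that still answers it.
class DateFields {

    /** Returns field name -> value pairs for a timestamp that starts with year, month, day digits. */
    static Map<String, String> fieldsFor(String timestamp) {
        if (timestamp == null || timestamp.length() < 8) {
            throw new IllegalArgumentException("timestamp needs at least 8 leading date digits");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", timestamp);            // full value, e.g. as a unique identifier
        fields.put("day", timestamp.substring(0, 8));  // year + month + day
        fields.put("month", timestamp.substring(0, 6)); // year + month
        fields.put("year", timestamp.substring(0, 4));  // year only
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample timestamp without a zone suffix.
        System.out.println(fieldsFor("20041026135500"));
    }
}
```

Adding these fields up front costs some index space, but it avoids a re-index later when a new kind of range query turns up.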
RE: BooleanQuery - TooManyClauses
Even if you need to be able to search on ranges that include the time, you could benefit from adding a few extra fields to your documents. For example, add a date field and an hour field. If the user then specifies a range between 2001-08-10 11:00 and 2004-10-11 13:00, you break it up behind the scenes into three parts as follows:

- a query on the date field alone, testing on the range 2001-08-11 to 2004-10-10 (i.e. all dates fully within the date range)
  => max number of clauses = max number of dates in your documents
- a query on the hour field for the first date
  => max number of clauses = 24
- a query on the hour field for the last date
  => max number of clauses = 24

(You'll need a special case if the start and end happen to be on the same date, of course.)

I'm not that familiar with the QueryParser syntax yet, but it should look something like this (note the use of curly brackets for the exclusive date ranges):

(date:{20010810 TO 20041011}) OR (+date:20010810 +time:[11 TO ]) OR (+date:20041011 +time:{ TO 13})

If you need even more fine-grained ranges, you can extend this idea by adding more fields (at the cost of making the generated query even more complex).

You can already add the separate fields to your documents even if you don't use them yet...

Regards,
Luc
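The three-part split described in this message can be sketched as a small string builder. This is only an illustration in plain Java, under several assumptions that are not in the original mail: the field names `date` and `hour` are hypothetical, hours are stored as two-digit strings from 00 to 23, the end hour is treated as exclusive, and the class `SplitDateRange` is made up here. It produces a query string in the style shown in the message and does not call any Lucene API.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits a date+hour range into
// a date-only part plus two hour parts for the boundary days.
class SplitDateRange {

    /**
     * Builds a query string covering startDay startHour:00 (inclusive)
     * up to endDay endHour:00 (exclusive). Days are YYYYMMDD strings.
     */
    static String build(String startDay, int startHour, String endDay, int endHour) {
        if (startDay.compareTo(endDay) > 0) {
            throw new IllegalArgumentException("start day is after end day");
        }
        if (startDay.equals(endDay)) {
            // Special case: both boundaries on the same date.
            if (endHour <= startHour) {
                throw new IllegalArgumentException("empty hour range");
            }
            return "+date:" + startDay + " +hour:[" + hh(startHour) + " TO " + hh(endHour - 1) + "]";
        }
        StringBuilder q = new StringBuilder();
        // Part 1: all dates strictly between the two boundary days.
        q.append("(date:{").append(startDay).append(" TO ").append(endDay).append("})");
        // Part 2: the remaining hours of the first day.
        q.append(" OR (+date:").append(startDay)
         .append(" +hour:[").append(hh(startHour)).append(" TO 23])");
        // Part 3: the hours of the last day before endHour (none if endHour is 0).
        if (endHour > 0) {
            q.append(" OR (+date:").append(endDay)
             .append(" +hour:[00 TO ").append(hh(endHour - 1)).append("])");
        }
        return q.toString();
    }

    /** Formats an hour as two digits so that string ordering matches numeric ordering. */
    static String hh(int hour) {
        return String.format(Locale.ROOT, "%02d", hour);
    }

    public static void main(String[] args) {
        System.out.println(build("20010810", 11, "20041011", 13));
    }
}
```

With this split, the two hour clauses expand to at most 24 terms each, so only the date-only part grows with the size of the range.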