RE: Indexing very large files.

2008-02-23 Thread Jon Lehto
All, A while back I was running into an issue with a Java heap out of memory error while indexing large files. I figured out that was my own error due

Re: Indexing very large files.

2008-02-23 Thread David Thibault
All, A while back I was running into an issue with a Java heap out of memory error while indexing large files. I figured out that was my own error due to a misconfiguration

Re: Indexing very large files.

2008-02-21 Thread David Thibault
All, A while back I was running into an issue with a Java heap out of memory error while indexing large files. I figured out that was my own error due to a misconfiguration of my Netbeans memory settings. However, now that is fixed and I have stumbled upon a new error. When trying to upload

Re: Indexing very large files.

2008-01-16 Thread David Thibault
All, I just found a thread about this on the mailing list archives because I'm troubleshooting the same problem. The kicker is that it doesn't take such large files to kill the StringBuilder. I have discovered the following: By using a text file made up of 3,443,464 bytes or less, I get no

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory. i.e. -Xmx. In raw Lucene, I've indexed 240M files Best Erick On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED] wrote: All, I just found a thread about this on the
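
Erick's suggestion, starting the JVM with more memory, looks like this on the command line. A minimal sketch; the jar name and heap values are illustrative, not from the thread:

```shell
# Sketch: start a standalone Java indexing client with a larger heap.
# -Xms sets the initial heap size, -Xmx the maximum; tune to your machine.
java -Xms256m -Xmx1024m -jar my-indexer.jar /path/to/large/files
```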

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
P.S. Lucene by default limits the maximum field length to 10K tokens, so you have to bump that for large files. Erick On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote: I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory.
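
The 10K-token default Erick mentions is controlled by maxFieldLength in solrconfig.xml. A hedged sketch of locating and raising it; the config path is assumed from the Solr 1.2 example layout, and the replacement value is illustrative:

```shell
# Find the current setting in the example Solr config.
grep -n '<maxFieldLength>' solr/conf/solrconfig.xml
# Raise it well past the 10000-token default (here to Integer.MAX_VALUE)
# so large documents are not silently truncated at indexing time.
sed -i 's|<maxFieldLength>10000</maxFieldLength>|<maxFieldLength>2147483647</maxFieldLength>|' \
  solr/conf/solrconfig.xml
```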

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I think your PS might do the trick. My JVM doesn't seem to be the issue, because I've set it to -Xmx512m -Xms256m. I will track down the solr config parameter you mentioned and try that. Thanks for the quick response! Dave On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote: P.S. Lucene by

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I tried raising the maxFieldLength under mainIndex as well as indexDefaults and still no luck. I'm trying to upload a text file that is about 8 MB in size. I think the following stack trace still points to some sort of overflowed String issue. Thoughts? Solr returned an

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
The PS really wasn't related to your OOM, and raising that shouldn't have changed the behavior. All that happens if you go beyond 10,000 tokens is that the rest gets thrown away. But we're beyond my real knowledge level about SOLR, so I'll defer to others. A very quick-n-dirty test as to whether

Re: Indexing very large files.

2008-01-16 Thread Walter Underwood
This error means that the JVM has run out of heap space. Increase the heap space. That is an option on the java command. I set my heap to 200 Meg and do it this way with Tomcat 6: JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh wunder On 1/16/08 8:33 AM, David Thibault [EMAIL PROTECTED] wrote:
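
Walter's one-liner sets the heap only for that single startup invocation. A slightly fuller sketch for Tomcat 6, with paths and values illustrative:

```shell
# Export JAVA_OPTS so Tomcat's startup script picks up the heap limit
# for this shell session and any restarts from it.
export JAVA_OPTS="-Xms512m -Xmx1024m"
tomcat/bin/startup.sh

# Or inline for a single start, as in Walter's message:
JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh
```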

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Nice signature...=) On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote: The PS really wasn't related to your OOM, and raising that shouldn't have changed the behavior. All that happens if you go beyond 10,000 tokens is that the rest gets thrown away. But we're beyond my real knowledge level

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Walter and all, I had been bumping up the heap for my Java app (running outside of Tomcat) but I hadn't yet tried bumping up my Tomcat heap. That seems to have helped me upload the 8MB file, but it's crashing while uploading a 32MB file now. I just bumped tomcat to 1024MB of heap, so I'm not sure

Re: Indexing very large files.

2008-01-16 Thread David Thibault
OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max. For some reason Walter's suggestion helped me get past the 8MB file upload to Solr but it's still choking on a 32MB file. Is there a way to set per-webapp JVM settings in tomcat, or is the overall tomcat JVM sufficient to set?
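
On the per-webapp question: Tomcat runs every webapp inside the same JVM, so there is no per-webapp heap setting; the container-wide options are what count. One conventional place to put them is a setenv script, which catalina.sh sources if present (a sketch; the values are illustrative):

```shell
# $CATALINA_HOME/bin/setenv.sh (created by the admin; sourced by catalina.sh)
# All webapps share this one JVM, so the heap set here bounds them all.
CATALINA_OPTS="-Xms1024m -Xmx1500m"
export CATALINA_OPTS
```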

RE: Indexing very large files.

2008-01-16 Thread Timothy Wonil Lee
OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max. For some reason Walter's

Re: Indexing very large files.

2008-01-16 Thread Yonik Seeley
From your stack trace, it looks like it's your client running out of memory, right? SimplePostTool was meant as a command-line replacement to curl to remove that dependency, not as a recommended way to talk to Solr. -Yonik On Jan 16, 2008 4:29 PM, David Thibault [EMAIL PROTECTED] wrote: OK, I
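
Yonik's point is that the out-of-memory error is on the client side: SimplePostTool buffers the document in the client JVM's heap, whereas curl streams the file from disk. A hedged sketch against a default Solr install (URL, port, and file name assumed):

```shell
# Stream a large document to Solr's update handler without loading it
# into a client-side JVM heap; curl reads the file incrementally.
curl 'http://localhost:8983/solr/update' \
  -H 'Content-Type: text/xml; charset=utf-8' \
  --data-binary @large-doc.xml

# Commit so the document becomes searchable.
curl 'http://localhost:8983/solr/update' --data-binary '<commit/>'
```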

Re: Indexing very large files.

2008-01-16 Thread Otis Gospodnetic
Walter and all, I had been bumping up the heap for my Java app (running outside of Tomcat) but I hadn't yet tried bumping up my Tomcat heap. That seems to have helped me upload the 8MB file

Re: Indexing very large files.

2007-09-07 Thread Brian Carmalt
Lance Norskog schrieb: Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog It is a rare use case, but could become relevant for us. I was told to explore the possibilities, and that's what I'm doing. :) Since I haven't heard any suggestions as to how

Re: Indexing very large files.

2007-09-07 Thread Walter Underwood
Legal discovery can have requirements like this. --wunder On 9/7/07 4:47 AM, Brian Carmalt [EMAIL PROTECTED] wrote: Lance Norskog schrieb: Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog It is a rare use case, but could become relevant for

Re: Indexing very large files.

2007-09-07 Thread Mike Klaas
On 7-Sep-07, at 4:47 AM, Brian Carmalt wrote: Lance Norskog schrieb: Now I'm curious: what is the use case for documents this large? It is a rare use case, but could become relevant for us. I was told to explore the possibilities, and that's what I'm doing. :) Since I haven't heard any

Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Yonik Seeley schrieb: On 9/5/07, Brian Carmalt [EMAIL PROTECTED] wrote: I've been trying to index a 300MB file to solr 1.2. I keep getting out of memory heap errors. 300MB of what... a single 300MB document? Or does that file represent multiple documents in XML or CSV format? -Yonik

Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Hello again, I run Solr on Tomcat under windows and use the tomcat monitor to start the service. I have set the minimum heap size to be 512MB and the maximum to be 1024MB. The system has 2 Gigs of ram. The error that I get after sending approximately 300 MB is: java.lang.OutOfMemoryError:

Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 08:55 +0200, Brian Carmalt wrote: Hello again, I run Solr on Tomcat under windows and use the tomcat monitor to start the service. I have set the minimum heap size to be 512MB and the maximum to be 1024MB. The system has 2 Gigs of ram. The error that I get after

Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Moin Thorsten, I am using Solr 1.2.0. I'll try the svn version out and see if that helps. Thanks, Brian Which version of solr do you use? http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup The trunk version of the

Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 11:26 +0200, Brian Carmalt wrote: Hallo again, I checked out the solr source and built the 1.3-dev version and then I tried to index the same file to the new server. I do get a different exception trace, but the result is the same. java.lang.OutOfMemoryError: Java

RE: Indexing very large files.

2007-09-06 Thread Lance Norskog
Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog

Re: Indexing very large files.

2007-09-05 Thread Norberto Meijome
On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt [EMAIL PROTECTED] wrote: I've been trying to index a 300MB file to solr 1.2. I keep getting out of memory heap errors. Even on an empty index with one Gig of vm memory it still won't work. Hi Brian, VM != heap memory. VM = OS memory heap
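
Norberto's distinction matters for sizing: the OS-level process footprint ("VM memory") includes heap, thread stacks, and native allocations, while -Xmx caps only the Java heap. A hedged sketch to confirm what heap cap a JVM actually sees, using a tiny throwaway class (the class name is made up for illustration):

```shell
# Write, compile, and run a one-off class that reports the JVM's heap cap.
cat > HeapCheck.java <<'EOF'
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx cap, not the OS process size.
        System.out.println(Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB max heap");
    }
}
EOF
javac HeapCheck.java
java -Xmx512m HeapCheck
```

Note the printed value may be slightly below 512 MB, since some collectors reserve part of the configured heap.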