OK, here is what I did.
I split 5 million pages over 8 segments. They are all indexed, and when I
go to load them, segments 0-3 work fine. The only problem I had with
segment 3 was that the fetcher seemed to stall while parsing PDF files.
When I load anything beyond segment 3, I get this:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
javax.servlet.ServletException
    org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:825)
    org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:758)
    org.apache.jsp.search_jsp._jspService(search_jsp.java:495)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:324)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
root cause
java.lang.OutOfMemoryError
note The full stack trace of the root cause is available in the Apache Tomcat/5.0.28 logs.
Apache Tomcat/5.0.28
These are the segments that work:
19M segments/20040829092114/fetchlist
21M segments/20040829092114/fetcher
244M segments/20040829092114/content
82M segments/20040829092114/parse_text
143M segments/20040829092114/parse_data
119M segments/20040829092114/index
625M segments/20040829092114
26M segments/20040829122947/fetchlist
32M segments/20040829122947/fetcher
646M segments/20040829122947/content
217M segments/20040829122947/parse_text
358M segments/20040829122947/parse_data
297M segments/20040829122947/index
1.6G segments/20040829122947
70M segments/20040829124357/fetchlist
84M segments/20040829124357/fetcher
2.2G segments/20040829124357/content
811M segments/20040829124357/parse_text
1.5G segments/20040829124357/parse_data
1.1G segments/20040829124357/index
5.6G segments/20040829124357
90M segments/20040829130541/fetchlist
102M segments/20040829130541/fetcher
977M segments/20040829130541/content
301M segments/20040829130541/parse_text
547M segments/20040829130541/parse_data
428M segments/20040829130541/index
2.4G segments/20040829130541
28M segments/20040829212107/fetchlist
34M segments/20040829212107/fetcher
757M segments/20040829212107/content
256M segments/20040829212107/parse_text
502M segments/20040829212107/parse_data
344M segments/20040829212107/index
1.9G segments/20040829212107
28M segments/20040829225928/fetchlist
33M segments/20040829225928/fetcher
694M segments/20040829225928/content
243M segments/20040829225928/parse_text
398M segments/20040829225928/parse_data
329M segments/20040829225928/index
1.7G segments/20040829225928
29M segments/20040830042947/fetchlist
35M segments/20040830042947/fetcher
890M segments/20040830042947/content
292M segments/20040830042947/parse_text
673M segments/20040830042947/parse_data
376M segments/20040830042947/index
2.3G segments/20040830042947
29M segments/20040830043001/fetchlist
35M segments/20040830043001/fetcher
893M segments/20040830043001/content
293M segments/20040830043001/parse_text
678M segments/20040830043001/parse_data
377M segments/20040830043001/index
2.3G segments/20040830043001
29M segments/20040830065943/fetchlist
35M segments/20040830065943/fetcher
872M segments/20040830065943/content
292M segments/20040830065943/parse_text
661M segments/20040830065943/parse_data
377M segments/20040830065943/index
2.3G segments/20040830065943
154M segments/20040830111830/fetchlist
183M segments/20040830111830/fetcher
5.1G segments/20040830111830/content
1.7G segments/20040830111830/parse_text
3.9G segments/20040830111830/parse_data
2.2G segments/20040830111830/index
13G segments/20040830111830
12M segments/20040904035557-0/fetchlist
15M segments/20040904035557-0/fetcher
316M segments/20040904035557-0/content
122M segments/20040904035557-0/parse_text
203M segments/20040904035557-0/parse_data
163M segments/20040904035557-0/index
829M segments/20040904035557-0
13M segments/20040904035557-1/fetchlist
16M segments/20040904035557-1/fetcher
333M segments/20040904035557-1/content
132M segments/20040904035557-1/parse_text
210M segments/20040904035557-1/parse_data
177M segments/20040904035557-1/index
878M segments/20040904035557-1
12M segments/20040904035557-2/fetchlist
15M segments/20040904035557-2/fetcher
328M segments/20040904035557-2/content
126M segments/20040904035557-2/parse_text
207M segments/20040904035557-2/parse_data
170M segments/20040904035557-2/index
855M segments/20040904035557-2
73M segments/20040905133243-0/fetchlist
87M segments/20040905133243-0/fetcher
2.0G segments/20040905133243-0/content
762M segments/20040905133243-0/parse_text
1.4G segments/20040905133243-0/parse_data
974M segments/20040905133243-0/index
5.3G segments/20040905133243-0
73M segments/20040905133243-1/fetchlist
87M segments/20040905133243-1/fetcher
2.0G segments/20040905133243-1/content
746M segments/20040905133243-1/parse_text
1.4G segments/20040905133243-1/parse_data
958M segments/20040905133243-1/index
5.2G segments/20040905133243-1
72M segments/20040905133243-2/fetchlist
86M segments/20040905133243-2/fetcher
1.9G segments/20040905133243-2/content
722M segments/20040905133243-2/parse_text
1.4G segments/20040905133243-2/parse_data
935M segments/20040905133243-2/index
5.0G segments/20040905133243-2
71M segments/20040905133243-3/fetchlist
84M segments/20040905133243-3/fetcher
1.9G segments/20040905133243-3/content
727M segments/20040905133243-3/parse_text
1.3G segments/20040905133243-3/parse_data
926M segments/20040905133243-3/index
5.0G segments/20040905133243-3
57G segments
When I merge this, the index directory comes to right around 10 GB.
Is there a limit on the size of the /index directory? Am I missing
something simple?
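As a sanity check on that 10 GB figure, here is a minimal Python sketch that adds up the per-segment index sizes from the listing above. The sizes are copied by hand from the du output, and I am assuming du's M and G suffixes are powers of 1024; the function names are just made up for this example.

```python
# Per-segment "index" sizes, copied by hand from the du listing above.
INDEX_SIZES = [
    "119M", "297M", "1.1G", "428M", "344M", "329M", "376M", "377M", "377M",
    "2.2G", "163M", "177M", "170M", "974M", "958M", "935M", "926M",
]

# Assumption: du -h style suffixes, powers of 1024, expressed here in MiB.
UNITS_IN_MIB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(size: str) -> float:
    """Convert a du-style size such as '1.1G' to MiB."""
    return float(size[:-1]) * UNITS_IN_MIB[size[-1]]


def total_gib(sizes) -> float:
    """Sum a list of du-style sizes and return the total in GiB."""
    return sum(to_mib(s) for s in sizes) / 1024


if __name__ == "__main__":
    print(f"{len(INDEX_SIZES)} segment indexes, "
          f"{total_gib(INDEX_SIZES):.1f}G in total")
```

The total comes out at roughly 10G, so the merged index is about the size I would expect from the individual segment indexes.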
Thanks!
J
