Hi Karl,
Thank you very much for your reply. MCF processed all the items in the large 
list with no errors when I switched to PostgreSQL. Your suggestion is very 
helpful. Thank you for your suggestion.
Best regards,
Cheng

Date: Fri, 8 Apr 2016 06:05:38 -0400
Subject: Re: Sharepoint 2013 Crawling a large list
From: [email protected]
To: [email protected]

Hi Cheng,
That is a pretty impressively messed up system!
Let's start with what we know and then go on to what we don't.
The "Remote procedure exception" error is due to an org.apache.axis.AxisFault 
exception that is apparently not coming from the server.  That's pretty weird 
in its own right.  Equally weird is the NPE coming from within HttpClient 
during NTLM processing.  Unfortunately we aren't seeing the actual stack traces 
themselves, which would allow us to figure out what was happening; instead you 
are getting ArrayIndexOutOfBounds and NullPointerExceptions doing basic things 
like array copying (!).
Can you include one or two of the actual traces (with line numbers)?
My sense is that (a) you are using a non-standard JVM that is (b) running out 
of memory, but not throwing an out of memory exception when that happens.  
Rather, it's failing to allocate memory that it needs and then blowing up.  It's 
running out of memory most likely because (c) you are using HSQLDB, and HSQLDB 
is keeping its database tables in memory, which is what it does.
I would recommend either (1) give MCF more memory, or (2) better yet, switch to 
Postgresql.  And if this keeps happening under either scenario, please include 
a few of the full traces so I can make better sense of the problem.
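[Editorial sketch, not part of the original message.] One way to check the memory diagnosis above is to see how much heap the JVM actually has available. The class below is a minimal illustration that uses only the standard java.lang.Runtime API; the class name HeapCheck and the helper usedFraction are hypothetical and are not part of ManifoldCF.

```java
// Hypothetical helper, not part of ManifoldCF: reports JVM heap usage.
class HeapCheck {
    /** Returns the fraction of the maximum heap that is in use (0.0 to 1.0). */
    public static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / (double) maxBytes;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                      // upper bound the JVM will try to use
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        System.out.printf("Max heap: %d MB, used: %d MB (%.0f%% of max)%n",
                max / MB, used / MB, 100.0 * usedFraction(used, max));
    }
}
```

The maximum heap is normally raised with the standard JVM option -Xmx (for example -Xmx2g). Where that option goes depends on how MCF is started in a given installation, so treat the exact location as something to confirm locally.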
Please let us know what happens.
Thanks,
Karl

On Fri, Apr 8, 2016 at 3:32 AM, Cheng Zeng <[email protected]> wrote:

Hi,
I am trying to extract web pages and attachments from SharePoint 2013 and 
upload the data to Solr for indexing. 
I have installed the SharePoint plugin on the SharePoint 2013 server and have 
been able to use ManifoldCF to fetch items from lists with fewer than 160 
items. My problem is that a few lists have more than 4,900 items. When 
ManifoldCF crawls these large lists, it starts to process items very slowly 
and seems to stop working after 2,100 items have been processed. I tried to 
slow down the upload to the Solr instance by forcing the worker thread to 
sleep for 3 seconds after every 50 items added to the pipeline. I tried 
slowing it down several times, but ManifoldCF still starts to process items 
very slowly as soon as 2,100 items in the list have been processed. 
ManifoldCF starts to slow down around 30 minutes after the crawling job 
starts, and the following errors are logged.
WARN 2016-04-08 12:29:14,762 (Worker thread '19') - Service interruption reported for job 1460088455222 connection 'SharepointRepoistoryConn': Remote procedure exception: ; nested exception is:
        java.lang.ArrayIndexOutOfBoundsException
FATAL 2016-04-08 12:29:14,777 (Worker thread '28') - Error tossed: null
java.lang.NullPointerException
FATAL 2016-04-08 12:30:37,611 (Worker thread '29') - Error tossed: null
java.lang.NullPointerException

The log is attached.  If someone could help me, I would really appreciate it.
Best regards,
Cheng                                     