Hi Karl,
Thank you very much for your reply. After I switched to PostgreSQL, MCF
processed all the items in the large list with no errors. Your suggestion was
very helpful. Best regards,
Cheng
Date: Fri, 8 Apr 2016 06:05:38 -0400
Subject: Re: Sharepoint 2013 Crawling a large list
From: [email protected]
To: [email protected]
Hi Cheng,
That is a pretty impressively messed up system!
Let's start with what we know and then go on to what we don't.
The "Remote procedure exception" error is due to an org.apache.axis.AxisFault
exception that is not apparently coming from the server. That's pretty weird
in its own right. Equally weird is the NPE coming from within HttpClient
during NTLM processing. Unfortunately we aren't seeing the actual stack traces
themselves, which would allow us to figure out what was happening; instead you
are getting ArrayIndexOutOfBounds and NullPointerExceptions doing basic things
like array copying (!).
Can you include one or two of the actual traces (with line numbers)?
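If the full traces aren't making it into manifoldcf.log, you can turn the
logging up in properties.xml; something like the following should do it (I'm
assuming connector-level debugging is what we want here; adjust as needed):

    <!-- In properties.xml: log connector activity, including exception traces -->
    <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>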
My sense is that (a) you are using a non-standard JVM that is (b) running out
of memory, but not throwing an OutOfMemoryError when that happens. Rather,
it's blowing up because memory it needs is not being allocated. It's most
likely running out of memory because (c) you are using HSQLDB, which keeps
its database tables entirely in memory.
I would recommend that you either (1) give MCF more memory, or (2) better yet,
switch to PostgreSQL. If this keeps happening under either scenario, please
include a few of the full traces so I can make better sense of the problem.
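For (1), if you're running the quick-start, raising the heap is just a matter
of the -Xmx flag, e.g. 'java -Xmx2048m -jar start.jar'. For (2), the database
is selected in properties.xml; here's a rough sketch, assuming a local
PostgreSQL install (the name/username/password values are placeholders you'd
replace with your own):

    <!-- In properties.xml: switch MCF from HSQLDB to PostgreSQL -->
    <property name="org.apache.manifoldcf.databaseimplementationclass"
              value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
    <property name="org.apache.manifoldcf.database.name" value="dbname"/>
    <property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
    <property name="org.apache.manifoldcf.database.password" value="password"/>

You'd then need to (re)create the database and schema on the PostgreSQL side
before starting a crawl.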
Please let us know what happens.
Thanks,
Karl
On Fri, Apr 8, 2016 at 3:32 AM, Cheng Zeng <[email protected]> wrote:
Hi,
I am trying to extract web pages and attachments from SharePoint 2013 and
upload the data to Solr for indexing.
I have installed the SharePoint plugin on the SharePoint 2013 server and have
been able to use ManifoldCF to fetch items from lists with fewer than 160
items. My problem is that a few lists have more than 4,900 items. When
ManifoldCF crawls these large lists, it processes items very slowly and seems
to stop working after about 2,100 items have been processed. I tried to slow
down the rate at which items are uploaded to the Solr instance by forcing the
worker thread to sleep for 3 seconds after every 50 items were added to the
pipeline (sketched below). I tried several different rates, but ManifoldCF
still slows to a crawl once about 2,100 items in the list have been processed.
This happens around 30 minutes after the crawling job starts, and the
following errors are tossed:
WARN 2016-04-08 12:29:14,762 (Worker thread '19') - Service interruption
reported for job 1460088455222 connection 'SharepointRepoistoryConn': Remote
procedure exception: ; nested exception is:
        java.lang.ArrayIndexOutOfBoundsException
FATAL 2016-04-08 12:29:14,777 (Worker thread '28') - Error tossed: null
java.lang.NullPointerException
FATAL 2016-04-08 12:30:37,611 (Worker thread '29') - Error tossed: null
java.lang.NullPointerException
The log is attached. If someone could help me, I would really appreciate it.
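For reference, the throttling I added is roughly the following (simplified;
the method, list, and pipeline names are placeholders for my actual code):

    // Simplified sketch of the throttling I added around the Solr upload.
    // 'items' and 'pipeline' stand in for my actual objects.
    void uploadWithThrottle(java.util.List<Object> items,
                            java.util.function.Consumer<Object> pipeline)
            throws InterruptedException {
        int count = 0;
        for (Object item : items) {
            pipeline.accept(item);   // hand the item to the Solr upload pipeline
            if (++count % 50 == 0) {
                Thread.sleep(3000);  // pause 3 seconds after every 50 items
            }
        }
    }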
Best regards,
Cheng