Hi,
I'm sending this to the user mailing list in case anybody else has the same
issue with the JCIFS connector.
Regards,
Kambiz Niktabar
----- Forwarded Message -----
From: Karl Wright <[email protected]>
To: Kambiz Niktabar <[email protected]>
Sent: Monday, January 26, 2015 12:58 PM
Subject: Re: Slow performance of Windows Share connector
Thanks for the update!
It would be great if you could post this to the user list; other people may
encounter similar problems.
Karl
On Mon, Jan 26, 2015 at 6:25 AM, Kambiz Niktabar <[email protected]> wrote:
Hi Karl,
As promised, I wanted to inform you about the result of this case. Looking at
the Wireshark capture, I noticed many errors complaining about a duplicate
domain name. I then changed the "Authentication domain" on the Server tab of
the repository connection to our pre-Windows 2000 domain name, and now it works
perfectly fine.
Regards,
Kambiz
From: Karl Wright <[email protected]>
To: Kambiz Niktabar <[email protected]>
Sent: Friday, January 23, 2015 1:18 PM
Subject: Re: Slow performance of Windows Share connector
Hi Kambiz,
The "access" time includes the fetching of the document up to the time spent
sending the document to the outputs.
If you are crawling the local file system through JCIFS, and you are still
writing data locally, then clearly the output connection is not involved.
My suspicion is that, because CIFS is involved under Windows, it's possible
that you are indeed going through the network even though both source and
destination are local. You can readily figure this out using Wireshark, and
see what packets are going in and out of that machine during crawling.
I should also state that, in my experience, the CIFS protocol is relatively
fragile, because it is multiplexed. That means that when any one virtual
connection has errors, multiple connections must be dropped and retried.
Windows implementations of CIFS, likewise, are not very good at handling large
numbers of virtual connections simultaneously. If you have a max connection
count that is set too big, then, you might have errors you are unaware of.
My suggestion: First, look at the log to see if there are any errors.
Second: lower the maximum number of JCIFS repository connections to between 2
and 5.
Third: Verify that you are not doing something funny with the network, using
Wireshark.
As far as performance of the CIFS connector is concerned, that's a function
wholly of the JCIFS library and the CIFS server. It is what it is, therefore,
and there's not a lot you can do about it, other than to make sure there are no
obvious bottlenecks in the network or errors in the log.
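
The first suggestion above (checking the log for errors) could be scripted
roughly as follows. This is only a minimal sketch: the markers it searches for
and the sample lines are assumptions for illustration, not the actual
ManifoldCF log format, so adjust them to what your log really contains.

```python
import re

# Assumed markers for error entries; not taken from the ManifoldCF docs.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")


def find_error_lines(lines):
    """Return the lines (without trailing newline) that match ERROR_PATTERN."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


# Hypothetical sample lines, made up purely to exercise the function.
sample = [
    "INFO  crawl thread started\n",
    "ERROR share fetch failed, will retry\n",
    "WARN  slow response from server\n",
    "jcifs.smb.SmbException: connection reset\n",
]

for hit in find_error_lines(sample):
    print(hit)
```

To scan a real file, pass an open file object instead of the sample list, for
example `find_error_lines(open(path_to_log))`, where `path_to_log` is whatever
location your installation logs to.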
Karl
On Fri, Jan 23, 2015 at 6:52 AM, Kambiz Niktabar <[email protected]> wrote:
Thanks for your prompt reply. The snapshot I sent you is from the test that
crawls documents on the local disk with File System as the output connector
(outputting into a folder on the local disk too), so no switch is involved in
this test. I tried crawling the same folder with a File System repository
connection and output to Solr, and it was very quick, so it seems to be
something related to the JCIFS connector. What kind of performance do you get
with the JCIFS connector (docs/sec)?
P.S. What exactly does the "access" time mean? Is it the time the connector
takes to read and fetch the content into %USERPROFILE%\Local Settings\Temp?
Regards,
Kambiz
From: Karl Wright <[email protected]>
To: Kambiz Niktabar <[email protected]>
Sent: Friday, January 23, 2015 12:23 PM
Subject: Re: Slow performance of Windows Share connector
From your simple history, dividing the size of the document by the time it
takes to fetch it, I get a pretty constant number (about 70 bytes per
millisecond, or 70K bytes per second, on average). The longer the file,
though, the slower it gets. It looks to me like you are crawling through an
internet switch somewhere that is throttling your fetches. Popular behavior
for such switches these days is to have fetches start off being fast, but then
progressively slow down to some minimum speed as more data is transmitted.
Obviously the point is to conserve bandwidth.
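
The division described above can be reproduced with a few lines of Python.
This is a rough sketch only: the (size, time) pairs are hypothetical
placeholders, not values from the attached report, and the function names are
made up for illustration.

```python
def throughput_bytes_per_ms(size_bytes, elapsed_ms):
    """Document size divided by the access time reported in the history."""
    return size_bytes / elapsed_ms


def bytes_per_second(rate_bytes_per_ms):
    """Convert bytes per millisecond to bytes per second."""
    return rate_bytes_per_ms * 1000


# Hypothetical (size in bytes, access time in ms) rows for illustration.
history = [
    (70_000, 1_000),
    (350_000, 5_200),
    (1_400_000, 24_000),
]

for size, elapsed in history:
    rate = throughput_bytes_per_ms(size, elapsed)
    print(f"{size} bytes in {elapsed} ms: {rate:.1f} bytes/ms, "
          f"{bytes_per_second(rate):.0f} bytes/s")
```

A rate that falls as the files get larger would match the slowdown pattern
described above.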
Karl
On Fri, Jan 23, 2015 at 6:15 AM, Kambiz Niktabar <[email protected]> wrote:
Hi Karl,
I am facing a performance issue with the Windows Share (JCIFS) connector.
Crawling a folder containing PDF and Word documents (with images in the files)
takes a long time. The following scenarios have been tested:
1- Testing with Solr and File System connectors in separate jobs; the results
were almost the same.
2- Copying the documents to the local disk of the ManifoldCF server; this made
no difference, so it couldn't be a network issue.
Looking at the simple history report (for the scenario with documents on the
local disk and File System as the output connector), I noticed that the access
time for some documents is extremely long (check the attached snapshot). As it
shows, there is not always a direct relation between the size of the file and
the access time. Do you have any idea what could be the reason for the slow
performance?
Regards,
Kambiz Niktabar