Hi,
I'm sending this to the user mailing list in case anybody else has the same 
issue with the JCIFS connector.
Regards,
Kambiz Niktabar
    ----- Forwarded Message -----
  From: Karl Wright <[email protected]>
 To: Kambiz Niktabar <[email protected]> 
 Sent: Monday, January 26, 2015 12:58 PM
 Subject: Re: Slow performance of Windows Share connector
   
Thanks for the update!
It would be great if you could post this to the user list; other people may 
encounter similar problems.

Karl

On Mon, Jan 26, 2015 at 6:25 AM, Kambiz Niktabar <[email protected]> wrote:

Hi Karl,
As promised, I wanted to let you know how this case turned out. Looking at the 
Wireshark capture, I noticed many errors complaining about a duplicate domain 
name. I then changed the "Authentication domain" on the Server tab of the 
repository connection to our pre-Windows 2000 domain name, and now it works 
perfectly.
Regards,
Kambiz
      From: Karl Wright <[email protected]>
 To: Kambiz Niktabar <[email protected]> 
 Sent: Friday, January 23, 2015 1:18 PM
 Subject: Re: Slow performance of Windows Share connector
   
Hi Kambiz,

The "access" time covers everything from fetching the document through sending 
the document to the outputs.
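A minimal sketch of what that definition implies, assuming the reading that the 
clock starts before the fetch and stops after the hand-off to the outputs. The 
`fetch` and `sendToOutputs` stages are hypothetical stand-ins, not ManifoldCF 
interfaces. A fast fetch followed by a slow output still produces a long access 
time.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AccessTimeSketch {
    /**
     * Returns elapsed milliseconds covering both the fetch and the hand-off to outputs.
     * Both stages are hypothetical stand-ins, not ManifoldCF interfaces.
     */
    static long timeAccess(Supplier<byte[]> fetch, Consumer<byte[]> sendToOutputs) {
        long start = System.nanoTime();
        byte[] document = fetch.get();     // reading the document from the repository
        sendToOutputs.accept(document);    // delivering it to the output connection(s)
        return (System.nanoTime() - start) / 1_000_000L;
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // A fast fetch followed by a slow (simulated 50 ms) output still yields a long access time.
        long ms = timeAccess(() -> new byte[1024], doc -> sleepQuietly(50));
        System.out.println("access time (ms): " + ms);
    }
}
```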

If you are crawling the local file system through JCIFS, and you are still 
writing data locally, then clearly the output connection is not involved.

My suspicion is that, because CIFS is involved under Windows, you may indeed be 
going through the network even though both source and destination are local. 
You can readily figure this out using Wireshark by watching which packets go in 
and out of that machine during crawling.

I should also say that, in my experience, the CIFS protocol is relatively 
fragile because it is multiplexed. That means that when any one virtual 
connection has errors, multiple connections must be dropped and retried. 
Windows implementations of CIFS, likewise, are not very good at handling large 
numbers of simultaneous virtual connections. If your maximum connection count 
is set too high, then, you might have errors you are unaware of.

My suggestions:
First, look at the log to see if there are any errors.
Second, lower the maximum number of JCIFS repository connections to between 2 
and 5.
Third, use Wireshark to verify that you are not doing something funny with the 
network.
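To illustrate what a lower limit buys you, here is a minimal sketch that caps 
simultaneous fetches with `java.util.concurrent.Semaphore`. It is not how 
ManifoldCF enforces its connection limit; the class name, the limit of 3 (inside 
the suggested range of 2 to 5), and the simulated 20 ms fetch are all 
hypothetical. However many worker threads are running, no more than the 
configured number of fetches are in flight at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedFetcher {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedFetcher(int maxConnections) {
        // One permit per allowed simultaneous connection.
        this.permits = new Semaphore(maxConnections);
    }

    /** Runs one (hypothetical) fetch while holding a connection permit. */
    void fetch(Runnable work) throws InterruptedException {
        permits.acquire(); // blocks while maxConnections fetches are already in flight
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            work.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    /** Runs `tasks` simulated fetches on `threads` worker threads; returns the peak concurrency. */
    static int demo(int maxConnections, int threads, int tasks) {
        BoundedFetcher fetcher = new BoundedFetcher(maxConnections);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                fetcher.fetch(() -> sleepQuietly(20)); // simulated 20 ms fetch
                return null;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fetcher.peakConcurrency();
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Ten worker threads, but never more than three fetches at once.
        System.out.println("peak concurrent fetches: " + demo(3, 10, 30));
    }
}
```

The design choice is to throttle at the point where a connection is used rather 
than by shrinking the thread pool, so other work can proceed while fetches wait 
for a permit.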

As far as performance of the CIFS connector is concerned, that is wholly a 
function of the jcifs library and the CIFS server. It is what it is, therefore, 
and there's not a lot you can do about it, other than to make sure there are no 
obvious bottlenecks in the network and no errors in the log.

Karl

On Fri, Jan 23, 2015 at 6:52 AM, Kambiz Niktabar <[email protected]> wrote:

Thanks for your prompt reply. The snapshot I sent you is from the test that 
crawls documents on the local disk with the File System output connector 
(writing into a folder on the local disk as well), so no switch is involved in 
that test. I also tested the same folder with a File System repository 
connection and output to Solr, and it was very quick, so the problem seems to be 
related to the JCIFS connector. What kind of performance (docs/sec) do you get 
with the JCIFS connector?
P.S. What exactly does the "access" time mean? Is it the time the connector 
spends reading and fetching the content into %USERPROFILE%\Local Settings\Temp?
Regards,
Kambiz
      From: Karl Wright <[email protected]>
 To: Kambiz Niktabar <[email protected]> 
 Sent: Friday, January 23, 2015 12:23 PM
 Subject: Re: Slow performance of Windows Share connector
   
From your Simple History report, dividing the size of each document by the time 
it takes to fetch it, I get a pretty constant number (about 70 bytes per 
millisecond, or 70K bytes per second, on average). The longer the file, 
though, the slower it gets. It looks to me like you are crawling through an 
internet switch somewhere that is throttling your fetches. A popular behavior 
for such switches these days is to start fetches off fast and then 
progressively slow them down to some minimum speed as more data is transmitted. 
Obviously the point is to conserve bandwidth.
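The arithmetic behind that estimate: size divided by fetch time gives bytes per 
millisecond, and multiplying by 1,000 gives bytes per second. A minimal Java 
sketch, using a hypothetical 700,000-byte document fetched in 10,000 ms (an 
illustrative value, not an entry from the attached history):

```java
public class FetchRate {
    /** Bytes per millisecond for one Simple History entry (size in bytes, time in ms). */
    static double bytesPerMs(long sizeBytes, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return (double) sizeBytes / elapsedMs;
    }

    /** Converts bytes/ms to bytes/s (1 s = 1000 ms). */
    static double bytesPerSecond(double bytesPerMs) {
        return bytesPerMs * 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical entry: a 700,000-byte document that took 10,000 ms to fetch.
        double rate = bytesPerMs(700_000L, 10_000L);
        System.out.println(rate + " bytes/ms = " + bytesPerSecond(rate) + " bytes/s");
    }
}
```

Running `main` prints `70.0 bytes/ms = 70000.0 bytes/s`. Computing this per entry 
is also a quick way to spot documents whose rate falls well below the rest.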

Karl

On Fri, Jan 23, 2015 at 6:15 AM, Kambiz Niktabar <[email protected]> wrote:

Hi Karl,
I am facing a performance issue with the Windows Share (JCIFS) connector. 
Crawling a folder that contains PDF and Word documents (with images in the 
files) takes a long time. The following scenarios have been tested:
1- Testing with the Solr and File System output connectors in separate jobs, but 
the results were almost the same.
2- Copying the documents onto the local disk of the ManifoldCF server, but there 
was no difference, so it couldn't be a network issue.
Looking at the Simple History report (for the scenario with documents on the 
local disk and the File System output connector), I noticed that the access 
time for some documents is extremely long (see the attached snapshot). As it 
shows, there is not always a direct relation between the size of the file and 
the access time. Do you have any idea what could be causing the slow 
performance?
Regards,
Kambiz Niktabar