Hi Timo,

I've taken a deep look at the SearchBlox code and found a significant problem. I've created a patch for you to address it, although it is not the final fix. The patch should work on either 2.1 or 1.9. See CONNECTORS-1198 for complete details.
Please let me know ASAP if the patch does not solve your immediate problem, since I will be making other changes to the connector to bring it in line with ManifoldCF standards.

Karl

On Fri, May 8, 2015 at 8:01 PM, Karl Wright <[email protected]> wrote:

> That error is what I was afraid of.
>
> We need the complete exception trace. Can you find that and create a
> ticket, including the complete trace?
>
> My apologies; the searchblox connector is a contribution which obviously
> still has bugs. With the trace though I should be able to get you a patch.
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: Timo Selvaraj
> Sent: 5/8/2015 6:46 PM
> To: Karl Wright
> Cc: [email protected]
> Subject: Re: File system continuous crawl settings
>
> Hi Karl,
>
> The only error message which seems to be continuously thrown in manifold
> log is:
>
> FATAL 2015-05-08 18:42:47,043 (Worker thread '40') - Error tossed: null
> java.lang.NullPointerException
>
> I do notice that the file that needs to be deleted is shown under the Queue
> Status report and keeps jumping between “Processing” and “About to Process”
> statuses every 30 seconds.
>
> Timo
>
> On May 8, 2015, at 1:40 PM, Karl Wright <[email protected]> wrote:
>
> Hi Timo,
>
> As I said, I don't think your configuration is the source of the delete
> issue. I suspect the searchblox connector.
>
> In the absence of a thread dump, can you look for exceptions in the
> manifoldcf log?
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: Timo Selvaraj
> Sent: 5/8/2015 10:06 AM
> To: [email protected]
> Subject: Re: File system continuous crawl settings
>
> When I change the settings to the following, updated or modified documents
> are now indexed but deleting the documents that are removed is still an
> issue:
>
> Schedule type: Rescan documents dynamically
> Minimum recrawl interval: 5 minutes
> Maximum recrawl interval: 10 minutes
> Expiration interval: Infinity
> Reseed interval: 60 minutes
> No scheduled run times
> Maximum hop count for link type 'child': Unlimited
> Hop count mode: Delete unreachable documents
>
> Do I need to set the reseed interval to Infinity?
>
> Any thoughts?
>
> On May 8, 2015, at 6:18 AM, Karl Wright <[email protected]> wrote:
>
> I just tried your configuration here. A deleted document in the file
> system was indeed picked up as expected.
>
> I did notice that your "expiration" setting is, essentially, cleaning out
> documents at a rapid clip. With this setting, documents will be expired
> before they are recrawled. You probably want one strategy or the other but
> not both.
>
> As for why a deleted document is "stuck" in Processing: the only thing I
> can think of is that the output connection you've chosen is having trouble
> deleting the document from the index. What output connector are you using?
>
> Karl
>
> On Fri, May 8, 2015 at 4:36 AM, Timo Selvaraj <[email protected]>
> wrote:
>
>> Hi,
>>
>> We are testing the continuous crawl feature for file system connector on
>> a small folder to test if new documents are added to the folder, missing
>> documents removed and modified documents updated are handled by the
>> continuous crawl job:
>>
>> Here are the settings we use:
>>
>> Schedule type: Rescan documents dynamically
>> Minimum recrawl interval: 5 minutes
>> Maximum recrawl interval: 10 minutes
>> Expiration interval: 5 minutes
>> Reseed interval: 10 minutes
>> No scheduled run times
>> Maximum hop count for link type 'child': Unlimited
>> Hop count mode: Delete unreachable documents
>>
>> Adding new documents seem to be getting picked up by the job however
>> removal of a document or update to a document are not being picked up.
>>
>> Am I missing any settings for the deletions or updates? I do see the
>> document that has been removed is showing as Processing under Queue Status
>> and others are showing as Waiting for Processing.
>>
>> Any idea what setting is missing for the deletes/updates to be recognized
>> and re-indexed?
>>
>> Thanks,
>> Timo
