Hah. Exactly the kind of configuration difference I was expecting. Whatever it is, it's showing up as a list.
I'll open a ticket and propose a patch; let's see if that gets us past this. The ticket is CONNECTORS-812. I should have a patch in a few minutes, attached to the ticket.

Karl


On Mon, Nov 18, 2013 at 8:41 PM, Mark Libucha <[email protected]> wrote:

> Seems to be a SP-internal thing.
>
> http://msdn.microsoft.com/en-us/library/aa661294.ASPX
>
> Mark
>
>
> On Mon, Nov 18, 2013 at 5:39 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Mark,
>>
>> Is "Cache Profiles" a list in your SharePoint? If not, what is it?
>>
>> Karl
>>
>>
>> On Mon, Nov 18, 2013 at 8:37 PM, Mark Libucha <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> It's not the first problem you mentioned. I don't have a site specified
>>> in my SP connection. But it could well be the misconfigured IIS issue...
>>>
>>> Here's what I get with your modified log message:
>>>
>>> ERROR 2013-11-18 20:35:47,440 (Worker thread '7') - Exception tossed: Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>
>>> On Mon, Nov 18, 2013 at 5:29 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> The exception is very helpful.
>>>>
>>>> I've seen this before. I know of two ways it can happen.
>>>>
>>>> First way: your Repository Connection is not actually pointing at the
>>>> SharePoint root, but rather a subsite of the root. That usually messes
>>>> things up pretty well - and it's not easy to detect in the connector
>>>> properly either. You must point at the actual root, not a subsite, and
>>>> use the criteria to limit what you include.
>>>>
>>>> Second way: your SharePoint instance has a misconfigured IIS, which is
>>>> mapping paths in ways that are unexpected.
>>>>
>>>> There may be other ways that this can happen; SharePoint has a myriad
>>>> different configuration options and it is possible your instance has one
>>>> that is not something we've ever seen before. If you think that is what
>>>> is happening, change this line:
>>>>
>>>>     throw new ManifoldCFException("Expected path to start with /Lists/");
>>>>
>>>> to:
>>>>
>>>>     throw new ManifoldCFException("Expected path to start with /Lists/, saw: '"+relPath+"'");
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Nov 18, 2013 at 8:20 PM, Mark Libucha <[email protected]> wrote:
>>>>
>>>>> Screen shot attached. Using 4.1, SharePoint 2010.
>>>>>
>>>>> Throws this exception:
>>>>>
>>>>> ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception tossed: Expected path to start with /Lists/
>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/
>>>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)
>>>>>
>>>>> I added a debug log message to the SharePoint crawler so the line
>>>>> number may be off by 1 or 2...
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> First, what version of ManifoldCF are you using? 1.3 has some bugs
>>>>>> where lists are concerned.
>>>>>>
>>>>>> Second, I've recently and repeatedly run exactly this crawl against a
>>>>>> site that one of our ManifoldCF users set up in Amazon, so I know it
>>>>>> works properly. So now the question is to determine exactly what you
>>>>>> are doing that is not correct.
>>>>>>
>>>>>> If you want to crawl just lists, you will nevertheless need to enter
>>>>>> both a Site match and a List match. Otherwise you will get nothing,
>>>>>> because no sites can be crawled.
>>>>>>
>>>>>> To enter ANY of the rules I specified above, type a "*" in the
>>>>>> type-in box, then select "Add Text". Then, select one of
>>>>>> "File", "Site", "List", or "Library" from the pulldown, and then click
>>>>>> the "Add new Rule" button. The Metadata tab works similarly.
>>>>>>
>>>>>> If you want me to verify you have done this correctly, please include
>>>>>> a screen shot of the job's View page.
>>>>>>
>>>>>> If this still isn't helping you, please include a screen shot of the
>>>>>> Simple History report after you have run a crawl.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <[email protected]> wrote:
>>>>>>
>>>>>>> I've seen this issue come up before, but I'd like to hear more about
>>>>>>> it (Karl), if there is more to say about it...
>>>>>>>
>>>>>>> Why isn't there an option to crawl an entire SharePoint site? I mean
>>>>>>> it's awesome that the UI gives us the option of drilling down
>>>>>>> dynamically and specifying exactly which parts we want crawled, but
>>>>>>> isn't the default case for most users to just crawl the whole thing?
>>>>>>>
>>>>>>> So, why exactly is this not an option, and would adding that
>>>>>>> functionality (I would be volunteering to try this) be feasible?
>>>>>>>
>>>>>>> On a more specific level, Karl wrote this in an earlier thread:
>>>>>>>
>>>>>>> <quote>
>>>>>>> For SharePoint, if you want to crawl everything beneath your root
>>>>>>> site, the simplest way is to define 4 rules:
>>>>>>> (1) SITE rule "/*"
>>>>>>> (2) LIST rule "/*"
>>>>>>> (3) LIBRARY rule "/*"
>>>>>>> (4) FILE rule "/*"
>>>>>>> </quote>
>>>>>>>
>>>>>>> I haven't been able to get this to work. It only seems to get files.
>>>>>>>
>>>>>>> Limiting the scope to just Lists, when I use "/*" and specify List,
>>>>>>> I get nothing crawled. Also tried "/Lists/*". Still nothing.
>>>>>>>
>>>>>>> Maybe I'm not specifying the Metadata correctly? Could you expand on
>>>>>>> this, Karl? What exactly needs to be specified to crawl all Lists? If
>>>>>>> I can get that to work I can probably figure out the rest of it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Mark
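
For readers following the thread, the diagnostic change discussed above (including the offending path in the "Expected path to start with /Lists/" message) can be sketched in isolation. This is only an illustration, not the connector's actual code: the class name `ListPathCheck`, the method `stripListsPrefix`, and the use of `IllegalArgumentException` in place of `ManifoldCFException` are all assumptions made so the example runs without ManifoldCF on the classpath.

```java
// Hypothetical standalone sketch; NOT the real SharePointRepository code.
// IllegalArgumentException stands in for ManifoldCFException.
class ListPathCheck {
    static final String LISTS_PREFIX = "/Lists/";

    // Returns the part of relPath after "/Lists/". If the prefix is missing,
    // throws with the offending path included so the log shows what was seen.
    static String stripListsPrefix(String relPath) {
        if (relPath == null || !relPath.startsWith(LISTS_PREFIX)) {
            throw new IllegalArgumentException(
                "Expected path to start with /Lists/, saw: '" + relPath + "'");
        }
        return relPath.substring(LISTS_PREFIX.length());
    }

    public static void main(String[] args) {
        // A path under /Lists/ passes the check.
        System.out.println(stripListsPrefix("/Lists/Announcements/1_.000"));
        // The path reported in this thread fails it, and the message says why.
        try {
            stripListsPrefix("/Cache Profiles/1_.000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the path in the message, a report like the one in this thread immediately shows that the item lives under "/Cache Profiles/" rather than "/Lists/", which is what pointed to a configuration difference.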
