Hi Mark,

The patch removed the exception toss entirely, so I don't think you applied it right.
Can you do the following:

cd trunk
svn revert connectors/sharepoint/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharepoint/SharePointRepository.java
svn patch CONNECTORS-812.patch
ant clean build

Thanks!
Karl

On Mon, Nov 18, 2013 at 9:27 PM, Mark Libucha <[email protected]> wrote:

> I *think* I applied the patch correctly. Got a new error:
>
> ERROR 2013-11-18 21:25:47,994 (Worker thread '1') - Exception tossed:
> Expected path to start with /Lists/, saw: '/Relationships List/1_.000'
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path
> to start with /Lists/, saw: '/Relationships List/1_.000'
>
> http://msdn.microsoft.com/en-us/library/ff798514.aspx
>
> Mark
>
> On Mon, Nov 18, 2013 at 5:53 PM, Karl Wright <[email protected]> wrote:
>
>> Ok, patch attached.
>>
>> One of two things will happen with this patch:
>> (1) It will work
>> (2) It will crawl to completion but not get any list rows
>>
>> If it is the latter, it means that SharePoint operating in this mode
>> REPLACES the list items with some funky cache URL, rather than
>> augmenting them. So please send me the log output if that happens.
>>
>> Thanks,
>> Karl
>>
>> On Mon, Nov 18, 2013 at 8:45 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hah. Exactly the kind of configuration difference I was expecting.
>>> Whatever it is, it's showing up as a list.
>>>
>>> I'll open a ticket, and propose a patch; let's see if that gets us
>>> past this.
>>>
>>> The ticket is CONNECTORS-812. I should have a patch in a few minutes,
>>> attached to the ticket.
>>>
>>> Karl
>>>
>>> On Mon, Nov 18, 2013 at 8:41 PM, Mark Libucha <[email protected]> wrote:
>>>
>>>> Seems to be a SP-internal thing.
>>>>
>>>> http://msdn.microsoft.com/en-us/library/aa661294.ASPX
>>>>
>>>> Mark
>>>>
>>>> On Mon, Nov 18, 2013 at 5:39 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Is "Cache Profiles" a list in your SharePoint? If not, what is it?
>>>>>
>>>>> Karl
>>>>>
>>>>> On Mon, Nov 18, 2013 at 8:37 PM, Mark Libucha <[email protected]> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> It's not the first problem you mentioned. I don't have a site
>>>>>> specified in my SP connection. But it could well be the
>>>>>> misconfigured IIS issue...
>>>>>>
>>>>>> Here's what I get with your modified log message:
>>>>>>
>>>>>> ERROR 2013-11-18 20:35:47,440 (Worker thread '7') - Exception tossed:
>>>>>> Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>>> path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Mon, Nov 18, 2013 at 5:29 PM, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> The exception is very helpful.
>>>>>>>
>>>>>>> I've seen this before. I know of two ways it can happen.
>>>>>>>
>>>>>>> First way: your Repository Connection is not actually pointing at
>>>>>>> the SharePoint root, but rather a subsite of the root. That usually
>>>>>>> messes things up pretty well - and it's not easy to detect in the
>>>>>>> connector properly either. You must point at the actual root, not
>>>>>>> a subsite, and use the criteria to limit what you include.
>>>>>>>
>>>>>>> Second way: your SharePoint instance has a misconfigured IIS, which
>>>>>>> is mapping paths in ways that are unexpected.
>>>>>>>
>>>>>>> There may be other ways that this can happen; SharePoint has a
>>>>>>> myriad different configuration options and it is possible your
>>>>>>> instance has one that is not something we've ever seen before.
>>>>>>> If you think that is what is happening, change this line:
>>>>>>>
>>>>>>>   throw new ManifoldCFException("Expected path to start with /Lists/");
>>>>>>>
>>>>>>> to:
>>>>>>>
>>>>>>>   throw new ManifoldCFException("Expected path to start with /Lists/, saw: '"+relPath+"'");
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Mon, Nov 18, 2013 at 8:20 PM, Mark Libucha <[email protected]> wrote:
>>>>>>>
>>>>>>>> Screen shot attached. Using 4.1, SharePoint 2010.
>>>>>>>>
>>>>>>>> Throws this exception:
>>>>>>>>
>>>>>>>> ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception
>>>>>>>> tossed: Expected path to start with /Lists/
>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected
>>>>>>>> path to start with /Lists/
>>>>>>>>   at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)
>>>>>>>>
>>>>>>>> I added a debug log message to the SharePoint crawler so the line
>>>>>>>> number may be off by 1 or 2...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark,
>>>>>>>>>
>>>>>>>>> First, what version of ManifoldCF are you using? 1.3 has some
>>>>>>>>> bugs where lists are concerned.
>>>>>>>>>
>>>>>>>>> Second, I've recently and repeatedly run exactly this crawl
>>>>>>>>> against a site that one of our ManifoldCF users set up in Amazon,
>>>>>>>>> so I know it works properly. So now the question is to determine
>>>>>>>>> exactly what you are doing that is not correct.
>>>>>>>>>
>>>>>>>>> If you want to crawl just lists, you will nevertheless need to
>>>>>>>>> enter both a Site match and a List match. Otherwise you will get
>>>>>>>>> nothing, because no sites can be crawled.
>>>>>>>>>
>>>>>>>>> To enter ANY of the rules I specified above, type a "*" in the
>>>>>>>>> type-in box, then select "Add Text". Then, select one of
>>>>>>>>> "File", "Site", "List", or "Library" from the pulldown, and then
>>>>>>>>> click the "Add new Rule" button. The Metadata tab works similarly.
>>>>>>>>>
>>>>>>>>> If you want me to verify you have done this correctly, please
>>>>>>>>> include a screen shot of the job's View page.
>>>>>>>>>
>>>>>>>>> If this still isn't helping you, please include a screen shot of
>>>>>>>>> the Simple History report after you have run a crawl.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I've seen this issue come up before, but I'd like to hear more
>>>>>>>>>> about it (Karl), if there is more to say about it...
>>>>>>>>>>
>>>>>>>>>> Why isn't there an option to crawl an entire SharePoint site? I
>>>>>>>>>> mean it's awesome that the UI gives us the option of drilling
>>>>>>>>>> down dynamically and specifying exactly which parts we want
>>>>>>>>>> crawled, but isn't the default case for most users to just crawl
>>>>>>>>>> the whole thing?
>>>>>>>>>>
>>>>>>>>>> So, why exactly is this not an option, and would adding that
>>>>>>>>>> functionality (I would be volunteering to try this) be feasible?
>>>>>>>>>>
>>>>>>>>>> On a more specific level, Karl wrote this in an earlier thread:
>>>>>>>>>>
>>>>>>>>>> <quote>
>>>>>>>>>> For SharePoint, if you want to crawl everything beneath your
>>>>>>>>>> root site, the simplest way is to define 4 rules:
>>>>>>>>>> (1) SITE rule "/*"
>>>>>>>>>> (2) LIST rule "/*"
>>>>>>>>>> (3) LIBRARY rule "/*"
>>>>>>>>>> (4) FILE rule "/*"
>>>>>>>>>> </quote>
>>>>>>>>>>
>>>>>>>>>> I haven't been able to get this to work. It only seems to get files.
>>>>>>>>>>
>>>>>>>>>> Limiting the scope to just Lists, when I use "/*" and specify
>>>>>>>>>> List, I get nothing crawled. Also tried "/Lists/*". Still nothing.
>>>>>>>>>>
>>>>>>>>>> Maybe I'm not specifying the Metadata correctly? Could you expand
>>>>>>>>>> on this Karl? What exactly needs to be specified to crawl all
>>>>>>>>>> Lists? If I can get that to work I can probably figure out the
>>>>>>>>>> rest of it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Mark
