I *think* I applied the patch correctly. Got a new error:

ERROR 2013-11-18 21:25:47,994 (Worker thread '1') - Exception tossed: Expected path to start with /Lists/, saw: '/Relationships List/1_.000'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/, saw: '/Relationships List/1_.000'

http://msdn.microsoft.com/en-us/library/ff798514.aspx

Mark

On Mon, Nov 18, 2013 at 5:53 PM, Karl Wright <[email protected]> wrote:

> Ok, patch attached.
>
> One of two things will happen with this patch:
> (1) It will work
> (2) It will crawl to completion but not get any list rows
>
> If it is the latter, it means that SharePoint operating in this mode
> REPLACES the list items with some funky cache URL, rather than augmenting
> them. So please send me the log output if that happens.
>
> Thanks,
> Karl
>
> On Mon, Nov 18, 2013 at 8:45 PM, Karl Wright <[email protected]> wrote:
>
>> Hah. Exactly the kind of configuration difference I was expecting.
>> Whatever it is, it's showing up as a list.
>>
>> I'll open a ticket, and propose a patch; let's see if that gets us past
>> this.
>>
>> The ticket is CONNECTORS-812. I should have a patch in a few minutes,
>> attached to the ticket.
>>
>> Karl
>>
>> On Mon, Nov 18, 2013 at 8:41 PM, Mark Libucha <[email protected]> wrote:
>>
>>> Seems to be a SP-internal thing.
>>>
>>> http://msdn.microsoft.com/en-us/library/aa661294.ASPX
>>>
>>> Mark
>>>
>>> On Mon, Nov 18, 2013 at 5:39 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Is "Cache Profiles" a list in your SharePoint? If not, what is it?
>>>>
>>>> Karl
>>>>
>>>> On Mon, Nov 18, 2013 at 8:37 PM, Mark Libucha <[email protected]> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> It's not the first problem you mentioned. I don't have a site
>>>>> specified in my SP connection. But it could well be the misconfigured
>>>>> IIS issue...
>>>>>
>>>>> Here's what I get with your modified log message:
>>>>>
>>>>> ERROR 2013-11-18 20:35:47,440 (Worker thread '7') - Exception tossed: Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/, saw: '/Cache Profiles/1_.000'
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mark
>>>>>
>>>>> On Mon, Nov 18, 2013 at 5:29 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> The exception is very helpful.
>>>>>>
>>>>>> I've seen this before. I know of two ways it can happen.
>>>>>>
>>>>>> First way: your Repository Connection is not actually pointing at the
>>>>>> SharePoint root, but rather a subsite of the root. That usually messes
>>>>>> things up pretty well - and it's not easy to detect in the connector
>>>>>> properly either. You must point at the actual root, not a subsite, and
>>>>>> use the criteria to limit what you include.
>>>>>>
>>>>>> Second way: your SharePoint instance has a misconfigured IIS, which
>>>>>> is mapping paths in ways that are unexpected.
>>>>>>
>>>>>> There may be other ways that this can happen; SharePoint has a myriad
>>>>>> different configuration options and it is possible your instance has
>>>>>> one that is not something we've ever seen before. If you think that is
>>>>>> what is happening, change this line:
>>>>>>
>>>>>>     throw new ManifoldCFException("Expected path to start with /Lists/");
>>>>>>
>>>>>> to:
>>>>>>
>>>>>>     throw new ManifoldCFException("Expected path to start with /Lists/, saw: '"+relPath+"'");
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Mon, Nov 18, 2013 at 8:20 PM, Mark Libucha <[email protected]> wrote:
>>>>>>
>>>>>>> Screen shot attached. Using 4.1, SharePoint 2010.
>>>>>>>
>>>>>>> Throws this exception:
>>>>>>>
>>>>>>> ERROR 2013-11-18 20:12:58,058 (Worker thread '13') - Exception tossed: Expected path to start with /Lists/
>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Expected path to start with /Lists/
>>>>>>>     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository$ListItemStream.addFile(SharePointRepository.java:2255)
>>>>>>>
>>>>>>> I added a debug log message to the SharePoint crawler so the line
>>>>>>> number may be off by 1 or 2...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Mon, Nov 18, 2013 at 4:59 PM, Karl Wright <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>>
>>>>>>>> First, what version of ManifoldCF are you using? 1.3 has some bugs
>>>>>>>> where lists are concerned.
>>>>>>>>
>>>>>>>> Second, I've recently and repeatedly run exactly this crawl against
>>>>>>>> a site that one of our ManifoldCF users set up in Amazon, so I know
>>>>>>>> it works properly. So now the question is to determine exactly what
>>>>>>>> you are doing that is not correct.
>>>>>>>>
>>>>>>>> If you want to crawl just lists, you will nevertheless need to
>>>>>>>> enter both a Site match and a List match. Otherwise you will get
>>>>>>>> nothing, because no sites can be crawled.
>>>>>>>>
>>>>>>>> To enter ANY of the rules I specified above, type a "*" in the
>>>>>>>> type-in box, then select "Add Text". Then, select one of
>>>>>>>> "File", "Site", "List", or "Library" from the pulldown, and then
>>>>>>>> click the "Add new Rule" button. The Metadata tab works similarly.
>>>>>>>>
>>>>>>>> If you want me to verify you have done this correctly, please
>>>>>>>> include a screen shot of the job's View page.
>>>>>>>>
>>>>>>>> If this still isn't helping you, please include a screen shot of
>>>>>>>> the Simple History report after you have run a crawl.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Mon, Nov 18, 2013 at 7:49 PM, Mark Libucha <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I've seen this issue come up before, but I'd like to hear more
>>>>>>>>> about it (Karl), if there is more to say about it...
>>>>>>>>>
>>>>>>>>> Why isn't there an option to crawl an entire SharePoint site? I
>>>>>>>>> mean it's awesome that the UI gives us the option of drilling down
>>>>>>>>> dynamically and specifying exactly which parts we want crawled, but
>>>>>>>>> isn't the default case for most users to just crawl the whole thing?
>>>>>>>>>
>>>>>>>>> So, why exactly is this not an option, and would adding that
>>>>>>>>> functionality (I would be volunteering to try this) be feasible?
>>>>>>>>>
>>>>>>>>> On a more specific level, Karl wrote this in an earlier thread:
>>>>>>>>>
>>>>>>>>> <quote>
>>>>>>>>> For SharePoint, if you want to crawl everything beneath your root
>>>>>>>>> site, the simplest way is to define 4 rules:
>>>>>>>>> (1) SITE rule "/*"
>>>>>>>>> (2) LIST rule "/*"
>>>>>>>>> (3) LIBRARY rule "/*"
>>>>>>>>> (4) FILE rule "/*"
>>>>>>>>> </quote>
>>>>>>>>>
>>>>>>>>> I haven't been able to get this to work. It only seems to get files.
>>>>>>>>>
>>>>>>>>> Limiting the scope to just Lists, when I use "/*" and specify
>>>>>>>>> List, I get nothing crawled. Also tried "/Lists/*". Still nothing.
>>>>>>>>>
>>>>>>>>> Maybe I'm not specifying the Metadata correctly? Could you expand
>>>>>>>>> on this, Karl? What exactly needs to be specified to crawl all
>>>>>>>>> Lists? If I can get that to work I can probably figure out the rest
>>>>>>>>> of it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mark

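For anyone following along, the check behind this exception can be sketched roughly as below. This is only an illustration, not the connector's actual code: the class name ListPathCheck, the method stripListsPrefix, and the use of IllegalArgumentException in place of ManifoldCFException are all assumptions made so the example stands alone, and the thread does not show what the real addFile does with the path after the check. The path "/Lists/Announcements/1_.000" is made up; "/Cache Profiles/1_.000" is the one from the log above.

```java
// Hypothetical stand-in for the path check discussed in this thread.
// The real connector throws ManifoldCFException; IllegalArgumentException
// is used here only to keep the sketch self-contained.
final class ListPathCheck {
    private static final String LISTS_PREFIX = "/Lists/";

    // Returns the part of relPath after "/Lists/", or throws with a message
    // that includes the offending path (the change Karl suggested).
    // Stripping the prefix is an assumption about what happens next.
    static String stripListsPrefix(String relPath) {
        if (!relPath.startsWith(LISTS_PREFIX)) {
            throw new IllegalArgumentException(
                "Expected path to start with " + LISTS_PREFIX
                    + ", saw: '" + relPath + "'");
        }
        return relPath.substring(LISTS_PREFIX.length());
    }

    public static void main(String[] args) {
        // Made-up path that satisfies the check.
        System.out.println(stripListsPrefix("/Lists/Announcements/1_.000"));
        try {
            // Path reported in the log earlier in this thread.
            stripListsPrefix("/Cache Profiles/1_.000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Including the rejected path in the message is what made it possible to see that "Cache Profiles" and "Relationships List" were being returned as lists outside /Lists/.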