Hi Deepa,

For a start, it sounds like you are making some fairly basic errors here.
For example, you should be thinking in terms of exactly which pages constitute login pages, and identifying those; if properly identified, these will NEVER be indexed. If you are seeing pages that you think are part of the login sequence being indexed, then you cannot have properly specified the login pages in some way.

Second, the cookies for a session-authenticated part of the site are created by the site, not by you. What you must do is walk the Web Connector through the login sequence and let the site set the cookies it needs to set. At the end of the login sequence, the Web Connector sets aside the cookies that are in effect. So trying to set them as form parameters will not generally work, unless by "cookie" you really mean some kind of credential.

From your description of the problem, it sounds like you will need the following login pages:

- one to catch the redirection from xyz.com to the login page. The login page type would be "redirection" and the target regexp would be something like "https://xyzLogin\.com/opensso/UI/Login\?realm=/xyz" (with "." and "?" escaped, since both are regex metacharacters).
- one to fill in form values on the first login page
- one to fill in form values on the second login page
- one to describe the redirection back to wherever you started from

When you look in the Simple History, you should see BEGIN LOGIN happen with the first redirection, and it should walk all the way back to your original page, logging END LOGIN just before it fetches that page. If the fetch fails because the login didn't work for some reason, you will often see the cycle repeat. If you never even get into the login sequence, you may need to try things out with browser HTTP logging so you can see what's actually happening.

Karl

On Mon, Sep 21, 2015 at 7:27 PM, Deepa Thakur <[email protected]> wrote:
> Hello,
> I am trying to configure the ManifoldCF 2.1 Web Connector to crawl my intranet
> site, which uses OpenAM for authorization. The sequence of steps involved
> in getting to the intranet home page is:
>
> 1. Client requests http://xyz.com/
> 2. If no valid authorization token is presented, the client is
>    redirected to
>    https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/
> 3. Client submits two login forms (frm1 and frm2), which expect a
>    valid userid and password.
> 4. Client is given 3 cookies and JSESSIONID and then gets redirected
>    back to http://xyz.com/
>
> In the Access Credentials tab I am defining the following login sequence:
>
> URL Regular Expression = xyz.com
>
> *Step 1:*
> Login URL Regular expression =
> https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/
> Page type = form
> Identification regular expression = frm1
>
> In the Override form parameters section:
> Parameter regular expression = IDToken1 Value=solruser
>
> *Step 2:*
> Login URL Regular expression = frm1
> Page type = form
> Identification regular expression = frm2
>
> In the Override form parameters section:
> Parameter regular expression = IDToken2 Value=<password>
>
> *Step 3:*
> Login URL Regular expression = frm2
> Page type = form
> Identification regular expression = post
>
> In the Override form parameters section:
> Parameter regular expression = Cookie1 Value= <some value>
> Parameter regular expression = Cookie2 Value= <some value>
> Parameter regular expression = Cookie3 Value= <some value>
> Parameter regular expression = JSESSIONID Value= <some value>
>
> *Step 4:*
> Login URL Regular expression =
> https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com
> Page type = form
>
> When I run the job with seed URL http://xyz.com, the contents of the login
> page (https://xyzLogin.com/opensso/UI/Login) are indexed into Solr. In
> Simple History, I see Result code = RESPONSECODENOTINDEXABLE when it
> processes identifier http://xyz.com.
>
> Can you please tell me how I can fix the login sequence so that, after the
> cookies are set, the connector knows to redirect to the seed URL? Also, how
> do I prevent the login page from getting indexed into Solr?
>
> Thanks,
> D.T

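One way to reason about the four login pages Karl lists is as rules that each claim certain fetched pages as part of the login sequence. A page that no rule claims is treated as ordinary content and can be indexed. The sketch below is a purely hypothetical model of that idea, not ManifoldCF code or its API. `LoginPage`, `Fetch`, `claims` and `roles` are names invented here, the URLs come from the quoted message, and the form POST target (the bare `/opensso/UI/Login` URL) is an assumption. It walks a simulated version of the OpenAM flow and reports which rule, if any, claims each hop.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoginPage:
    """Hypothetical model of one login page entry (not ManifoldCF's API)."""
    url_regex: str     # matched against the URL being fetched
    page_type: str     # "redirection" or "form" in this sketch
    ident_regex: str   # redirection: matched against the target; form: against the form name


@dataclass
class Fetch:
    """One simulated hop of the crawl."""
    url: str
    redirect_to: Optional[str] = None  # redirect target, if the response is a redirect
    form_name: Optional[str] = None    # name of the form on the page, if any


def claims(page: LoginPage, hop: Fetch) -> bool:
    """True if this entry would treat the hop as part of the login sequence."""
    if re.search(page.url_regex, hop.url) is None:
        return False
    if page.page_type == "redirection":
        subject = hop.redirect_to
    elif page.page_type == "form":
        subject = hop.form_name
    else:
        return False
    return subject is not None and re.search(page.ident_regex, subject) is not None


LOGIN = r"https://xyzLogin\.com/opensso/UI/Login"

# The four login pages from the reply, with "." and "?" escaped.
PAGES = [
    LoginPage(r"^http://xyz\.com/", "redirection", LOGIN + r"\?realm=/xyz"),  # into login
    LoginPage(LOGIN, "form", r"^frm1$"),                                      # first form
    LoginPage(LOGIN, "form", r"^frm2$"),                                      # second form
    LoginPage(LOGIN, "redirection", r"^http://xyz\.com/"),                    # back to site
]

# Simulated flow from the quoted message; the form POST target is an assumption.
TRACE = [
    Fetch("http://xyz.com/",
          redirect_to="https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/"),
    Fetch("https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/",
          form_name="frm1"),
    Fetch("https://xyzLogin.com/opensso/UI/Login", form_name="frm2"),
    Fetch("https://xyzLogin.com/opensso/UI/Login", redirect_to="http://xyz.com/"),
    Fetch("http://xyz.com/"),  # the real home page, fetched once the login is done
]


def roles(pages: List[LoginPage], trace: List[Fetch]) -> List[List[int]]:
    """For each hop, the 1-based numbers of the login pages that claim it."""
    return [[i + 1 for i, p in enumerate(pages) if claims(p, hop)] for hop in trace]


for hop, claimed in zip(TRACE, roles(PAGES, TRACE)):
    label = f"login page {claimed}" if claimed else "content (indexable)"
    print(f"{hop.url} -> {label}")
```

In this model, each login hop is claimed by exactly one entry and only the final fetch of http://xyz.com/ is left as content. The first rule matches the home page URL but claims it only when the response is a redirect into the login server, which is why the same URL can be both the start of the login sequence and the page that finally gets indexed.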