OK, so does that mean I do not need the 3rd stage at all? The second stage (form authentication) already records the cookies and redirects back:
the second stage:

DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post method for '/Special:UserLogin'
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post parameter name 'username' value 'someuser' for '/Special:UserLogin'
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post parameter name 'returntourl' value 'http://wikisite/sitemap.xml' for '/Special:UserLogin'
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post parameter name 'password' value 'XXXXXX' for '/Special:UserLogin'
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Adding 2 cookies for '/Special:UserLogin'
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Cookie '[version: 0][name: PHPSESSID][value: bughgf8fbjkkevk79ot4ef2vj1][domain: wikisite][path: /][expiry: null]' added
DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034352097041592_136c71f2ac1fc2dd1ba72de805fcd1b5][domain: wikisite][path: /][expiry: Wed Jul 13 22:53:29 CEST 2016]' added
DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB: Retrieving cookies...
DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB: Cookie '[version: 0][name: PHPSESSID][value: 589h3f20tjndhkc391nu5u0u51][domain: wikisite][path: /][expiry: null]'
DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034783686256706_585415102d050458acfd91a9d1f223d5][domain: wikisite][path: /][expiry: Thu Jul 14 10:52:48 CEST 2016]'
INFO 2016-07-07 10:52:48,449 (Worker thread '79') - WEB: FETCH LOGIN|http://wikisite/Special:UserLogin|1467881568231+218|302|153|
DEBUG 2016-07-07 10:52:48,449 (Worker thread '79') - WEB: Document 'http://wikisite/Special:UserLogin' did not match expected form, link, redirection, or content for sequence 'wikisite'

so, the last message means, nothing matches in the sequence anymore - logon end.
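
To compare the cookies that were sent with the ones that came back in a log excerpt like the one above, the cookie lines can be pulled out with a small script. This is only a sketch: the regular expression is inferred from the DEBUG lines shown here, the names `split_cookies` and `changed_cookies` are made up for illustration, and it assumes each log entry sits on a single line (not re-wrapped by a mail client).

```python
import re

# Pattern inferred from the DEBUG lines above (an assumption, not an
# official ManifoldCF log format). Cookies attached to a request end
# with "' added"; cookies read back from a response do not.
COOKIE_LINE = re.compile(
    r"Cookie '\[version: \d+\]"
    r"\[name: (?P<name>[^\]]+)\]"
    r"\[value: (?P<value>[^\]]*)\]"
    r"(?:\[[^\]]*\])*'"          # remaining fields: domain, path, expiry
    r"(?P<sent> added)?"
)


def split_cookies(log_text):
    """Return (sent, received) dicts holding the last value seen per cookie name."""
    sent, received = {}, {}
    for match in COOKIE_LINE.finditer(log_text):
        target = sent if match.group("sent") else received
        target[match.group("name")] = match.group("value")
    return sent, received


def changed_cookies(log_text):
    """Return {name: (sent_value, received_value)} for cookies whose value changed."""
    sent, received = split_cookies(log_text)
    return {
        name: (sent[name], received[name])
        for name in sent.keys() & received.keys()
        if sent[name] != received[name]
    }
```

Run over the stage-two excerpt above, this reports both PHPSESSID and authtoken as changed, i.e. the 302 response handed out fresh values for both cookies.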
And the last two cookies are being used for the next fetch of the sitemap, but its content still matches the public pattern. Strange things happen... I just tried using the authtoken cookie from the log directly in the browser, and it gets authenticated without problems: I get the "private" content. But ManifoldCF does not... weird...

DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB: Adding 2 cookies for '/sitemap.xml'
DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB: Cookie '[version: 0][name: PHPSESSID][value: 589h3f20tjndhkc391nu5u0u51][domain: wikisite][path: /][expiry: null]' added
DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034783686256706_585415102d050458acfd91a9d1f223d5][domain: wikisite][path: /][expiry: Thu Jul 14 10:52:48 CEST 2016]' added
INFO 2016-07-07 10:52:58,500 (Worker thread '79') - WEB: FETCH URL|http://wikisite/sitemap.xml|1467881568543+9957|200|684072|

size: 684072 - is public content.

Does it **really** add the cookies to the request? :)

Thanks!
Konstantin

2016-07-07 11:44 GMT+02:00 Karl Wright <[email protected]>:
> "I thought, when the auth sequence is done
> (exit login mode), the redirect to the original page happens
> automatically (which is the case here, but somehow the content is
> still "public")."
>
> That is correct BUT if the final redirection is what sets the cookies THEN
> the cookies will only be recorded by the web connector if the final
> redirection is part of the login sequence.
>
> Thanks,
> Karl
>
>
> On Thu, Jul 7, 2016 at 5:33 AM, jetnet <[email protected]> wrote:
>>
>> hi Karl,
>> thank you for the very prompt feedback!
>>
>> > 1) Have you made sure to include the redirection back to the content?
>> This is the step I don't quite understand - could you please clarify
>> how that could be done? I thought, when the auth sequence is done
>> (exit login mode), the redirect to the original page happens
>> automatically (which is the case here, but somehow the content is
>> still "public").
>>
>> > 2) your check for *entering* the login sequence is too broad and fires
>> > again even though the private sitemap page is being returned.
>> totally agree, that's why the first step is to look into the content
>> of the page, to check, if there is a pattern which appears in the
>> public version ONLY.
>> This is the only solution I can imagine so far, but any ideas - very
>> welcome!
>>
>> The simple history shows basically the same - the process never leaves
>> the login stage.
>>
>> If I remove the 3rd step, then I see, that the login stage is over
>> (logon end), but as the content of the sitemap.xml is still "public",
>> the login process kicks in again.
>>
>> Thanks!
>> Konstantin
>>
>> 2016-07-07 11:07 GMT+02:00 Karl Wright <[email protected]>:
>> > Hi Konstantin,
>> >
>> > There are two possibilities:
>> >
>> > (1) You have missed one stage when specifying the login sequence. The
>> > cookies are getting set, but not during a step that's part of the login
>> > sequence. Have you made sure to include the redirection back to the
>> > content?
>> > (2) You really are logging in but your check for *entering* the login
>> > sequence is too broad and fires again even though the private sitemap
>> > page is being returned.
>> >
>> > You can also look at the simple history as well to get an idea what MCF
>> > is doing for your job for session handling.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> > On Thu, Jul 7, 2016 at 4:35 AM, jetnet <[email protected]> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I've been trying to setup a session-based auth sequence for a forked
>> >> MediaWiki site (Wiki connector does not work with this version), but
>> >> somehow got stuck with the configuration.
>> >> The idea is to index the site using its sitemap.xml with hops=1. The
>> >> "public" version (user not logged in) of the sitemap.xml contains a
>> >> different set of links as the "authenticated" one (user logged in).
>> >> The current auth sequence looks like this (the job's seeding
>> >> URL=http://wikisite/sitemap.xml):
>> >>
>> >> 1) the first call to the seeding URL should be redirected to the login
>> >> page
>> >> Login URL regexp: sitemap.xml
>> >> Page type: content
>> >> Identification regular expression: <some content from the "public"
>> >> version>
>> >> Override target URL: /Special:UserLogin
>> >>
>> >> 2) enter user's credentials on the login page
>> >> Login URL regexp: Special:UserLogin
>> >> Page type: form
>> >> Override form parameters: username=someuser, password=******,
>> >> returntourl=http://wikisite/sitemap.xml
>> >>
>> >> 3) the login page ***should*** redirect back to the seeding URL with
>> >> the authorized content
>> >> Login URL regexp: /Special:UserLogin
>> >> Page type: redirection
>> >> Identification regular expression: /sitemap.xml
>> >>
>> >> From the log-file I can see, that first 2 steps work fine - the public
>> >> content gets recognized, the form data get sent, the session's cookies
>> >> get set. But the 3rd step returns the "public" version of the
>> >> sitemap.xml again, and the login process is getting stuck in a loop.
>> >> Am I on the right way or did I miss something?
>> >>
>> >> here is the log for the 3rd step:
>> >>
>> >> INFO 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: FETCH LOGIN|http://wikisite/Special:UserLogin|1467838347082+203|302|153|
>> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to match raw url 'http://wikisite/sitemap.xml'
>> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to match cooked url 'http://wikisite/sitemap.xml'
>> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Redirection link lookup matched 'http://wikisite/sitemap.xml'
>> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Document 'http://wikisite/Special:UserLogin' matches preferred redirection, so determined to be login page for sequence 'wikisite'
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Waiting for an HttpClient object
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: For http://wikisite/sitemap.xml, setting virtual host to wikisite
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Got an HttpClient object after 0 ms.
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Get method for '/sitemap.xml'
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Adding 2 cookies for '/sitemap.xml'
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie '[version: 0][name: PHPSESSID][value: 1vnhgi0f84dc9pi6eaoj0nau45][domain: wikisite][path: /][expiry: null]' added
>> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain: wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]' added
>> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Retrieving cookies...
>> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie '[version: 0][name: PHPSESSID][value: vqfpr88pqa6d62nl6h4lp03nu1][domain: wikisite][path: /][expiry: null]'
>> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain: wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]'
>> >> INFO 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: FETCH LOGIN|http://wikisite/sitemap.xml|1467838347394+9610|200|683773|
>> >> DEBUG 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: Document 'http://wikisite/sitemap.xml' is text, with encoding 'utf-8'; link extraction starting
>> >> DEBUG 2016-07-06 22:52:37,019 (Worker thread '43') - WEB: Document 'http://wikisite/sitemap.xml' matches content, so determined to be login page for sequence 'wikisite'
>> >>
>> >>
>> >> Thank you!
>> >> regards, Konstantin
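
PS: one way to check outside ManifoldCF whether the two cookies alone are enough to get the private sitemap (as the browser test suggests) is to rebuild the request by hand. A minimal sketch using only Python's standard library; `build_request` is a hypothetical helper, the cookie values are placeholders to be copied from a current log, and the actual fetch is left as a comment because it needs access to the site.

```python
from urllib.request import Request


def build_request(url, cookies):
    """Build a GET request that carries the given cookies in one Cookie header."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": header})


# Placeholder values; copy the real ones from the "Retrieving cookies..." log lines.
request = build_request(
    "http://wikisite/sitemap.xml",
    {"PHPSESSID": "<value from log>", "authtoken": "<value from log>"},
)

# To actually send it (requires network access to the wiki):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.status, len(response.read()))
```

If the body size printed by the fetch differs from the public sizes in the logs (684072 and 683773 bytes), the cookies alone are probably sufficient and the difference lies in how the crawler's request is built; if it is the same, the browser is likely sending something more than the cookies.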
