Hi Karl,

thank you for the very prompt feedback!

> 1) Have you made sure to include the redirection back to the content?

This is the step I don't quite understand. Could you please clarify how that could be done? I thought that when the auth sequence is done (exit login mode), the redirect to the original page happens automatically (which is the case here, but somehow the content is still "public").
> 2) your check for *entering* the login sequence is too broad and fires again
> even though the private sitemap page is being returned.

Totally agree; that's why the first step of the sequence looks into the content of the page, to check whether there is a pattern which appears in the public version ONLY. This is the only solution I can imagine so far, but any ideas are very welcome!

The simple history shows basically the same: the process never leaves the login stage. If I remove the 3rd step, I can see that the login stage is over (logon end), but since the content of the sitemap.xml is still "public", the login process kicks in again.

Thanks!
Konstantin

2016-07-07 11:07 GMT+02:00 Karl Wright <[email protected]>:
> Hi Konstantin,
>
> There are two possibilities:
>
> (1) You have missed one stage when specifying the login sequence. The
> cookies are getting set, but not during a step that's part of the login
> sequence. Have you made sure to include the redirection back to the
> content?
> (2) You really are logging in but your check for *entering* the login
> sequence is too broad and fires again even though the private sitemap page
> is being returned.
>
> You can also look at the simple history to get an idea of what MCF is
> doing for your job in terms of session handling.
>
> Thanks,
> Karl
>
>
> On Thu, Jul 7, 2016 at 4:35 AM, jetnet <[email protected]> wrote:
>>
>> Hi All,
>>
>> I've been trying to set up a session-based auth sequence for a forked
>> MediaWiki site (the Wiki connector does not work with this version), but
>> somehow got stuck with the configuration.
>> The idea is to index the site using its sitemap.xml with hops=1. The
>> "public" version (user not logged in) of the sitemap.xml contains a
>> different set of links than the "authenticated" one (user logged in).
>> The current auth sequence looks like this (the job's seeding
>> URL=http://wikisite/sitemap.xml):
>>
>> 1) the first call to the seeding URL should be redirected to the login
>> page
>>    Login URL regexp: sitemap.xml
>>    Page type: content
>>    Identification regular expression: <some content from the "public"
>>    version>
>>    Override target URL: /Special:UserLogin
>>
>> 2) enter the user's credentials on the login page
>>    Login URL regexp: Special:UserLogin
>>    Page type: form
>>    Override form parameters: username=someuser, password=******,
>>    returntourl=http://wikisite/sitemap.xml
>>
>> 3) the login page ***should*** redirect back to the seeding URL with
>> the authorized content
>>    Login URL regexp: /Special:UserLogin
>>    Page type: redirection
>>    Identification regular expression: /sitemap.xml
>>
>> From the log file I can see that the first 2 steps work fine: the public
>> content gets recognized, the form data get sent, and the session cookies
>> get set. But the 3rd step returns the "public" version of the
>> sitemap.xml again, and the login process gets stuck in a loop.
>> Am I on the right track, or did I miss something?
>>
>> here is the log for the 3rd step:
>>
>> INFO 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: FETCH LOGIN|http://wikisite/Special:UserLogin|1467838347082+203|302|153|
>> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to match raw url 'http://wikisite/sitemap.xml'
>> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to match cooked url 'http://wikisite/sitemap.xml'
>> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Redirection link lookup matched 'http://wikisite/sitemap.xml'
>> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Document 'http://wikisite/Special:UserLogin' matches preferred redirection, so determined to be login page for sequence 'wikisite'
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Waiting for an HttpClient object
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: For http://wikisite/sitemap.xml, setting virtual host to wikisite
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Got an HttpClient object after 0 ms.
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Get method for '/sitemap.xml'
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Adding 2 cookies for '/sitemap.xml'
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie '[version: 0][name: PHPSESSID][value: 1vnhgi0f84dc9pi6eaoj0nau45][domain: wikisite][path: /][expiry: null]' added
>> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain: wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]' added
>> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Retrieving cookies...
>> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie '[version: 0][name: PHPSESSID][value: vqfpr88pqa6d62nl6h4lp03nu1][domain: wikisite][path: /][expiry: null]'
>> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie '[version: 0][name: authtoken][value: 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain: wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]'
>> INFO 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: FETCH LOGIN|http://wikisite/sitemap.xml|1467838347394+9610|200|683773|
>> DEBUG 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: Document 'http://wikisite/sitemap.xml' is text, with encoding 'utf-8'; link extraction starting
>> DEBUG 2016-07-06 22:52:37,019 (Worker thread '43') - WEB: Document 'http://wikisite/sitemap.xml' matches content, so determined to be login page for sequence 'wikisite'
>>
>>
>> Thank you!
>> regards, Konstantin
>
>
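One way to act on Karl's point (2) is to test candidate identification regular expressions offline, against saved copies of both sitemap versions, before putting one into the content stage of the login sequence. Below is a minimal Python sketch of such a check. The sample sitemap bodies, the private-only link `Internal:Project_Notes`, and both patterns are hypothetical placeholders, not content from the real wiki. The sketch assumes the content check behaves like a find-style search (a match anywhere in the page counts), and Python's `re` syntax is close to, but not identical with, the Java regex syntax MCF uses, so a pattern that passes here still needs to be confirmed in the job itself.

```python
import re

# Hypothetical sample bodies: replace them with the real public and
# logged-in versions of http://wikisite/sitemap.xml saved from a browser.
PUBLIC_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://wikisite/Main_Page</loc></url>
</urlset>"""

PRIVATE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://wikisite/Main_Page</loc></url>
  <url><loc>http://wikisite/Internal:Project_Notes</loc></url>
</urlset>"""


def fires_on(pattern: str, body: str) -> bool:
    """True if a find-style search for `pattern` succeeds anywhere in `body`."""
    return re.search(pattern, body) is not None


def is_public_only(pattern: str) -> bool:
    """A usable identification regex must fire on the public page only;
    if it also fires on the private page, the login sequence restarts."""
    return fires_on(pattern, PUBLIC_SITEMAP) and not fires_on(pattern, PRIVATE_SITEMAP)


# Too broad: both versions contain <urlset>, so this would loop.
TOO_BROAD = r"<urlset"

# Public-only by absence: anchored at the start of the page, the negative
# lookahead succeeds only when the (hypothetical) logged-in-only link is missing.
PUBLIC_BY_ABSENCE = r"^(?![\s\S]*Internal:Project_Notes)"

if __name__ == "__main__":
    for name, pat in [("TOO_BROAD", TOO_BROAD), ("PUBLIC_BY_ABSENCE", PUBLIC_BY_ABSENCE)]:
        print(f"{name}: public-only = {is_public_only(pat)}")
```

Running it prints `False` for the broad pattern and `True` for the lookahead pattern. The lookahead expresses "appears in the public version ONLY" as the absence of a link that only logged-in users see, which helps when the public sitemap has no distinctive marker of its own. Java regular expressions also support `(?!...)`, but whether MCF applies the identification pattern with find or full-match semantics should be verified before relying on this.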
