[jira] [Updated] (CONNECTORS-275) Clarify documentation as to how to set up session login for web connector

2011-12-13 Thread Karl Wright (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-275:
---

Fix Version/s: ManifoldCF 0.4
 Assignee: Karl Wright

Documentation was actually updated, and there is agreement that we will open 
tickets for new features, so I'm going to resolve this ticket.


 Clarify documentation as to how to set up session login for web connector
 -

 Key: CONNECTORS-275
 URL: https://issues.apache.org/jira/browse/CONNECTORS-275
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Documentation, Web connector
Affects Versions: ManifoldCF 0.4
Reporter: Karl Wright
Assignee: Karl Wright
 Fix For: ManifoldCF 0.4

 Attachments: CONNECTORS-275.patch


 A book reader has this comment, which basically implies that we need to 
 improve the documentation for the web connector:
 I was excited to get the full version of the online book, but then 
 disappointed when it referred back to the online doc for setting up logins 
 for Web spidering. The online doc is very vague and only gives one example. 
 I've used Ultraseek's and Google's spider, but I still find the Session login 
 sequences non-obvious.
 I've got a subscription request in to the user mailing list, but here are the 
 parts that are not clear.
 I generally understand about using regexes to define sites and sorting out 
 content pages from login pages.
 But it's not clear why there are TWO regexes per entry. There's a Login URL 
 regex, and also a Form name/link target regex.
 It's also not clear what the page type radio button choices mean.
 For redirection, am I saying "look for a redirect event", or am I saying 
 "then DO a redirect to this page"?
 And for form name, what if my login page doesn't have a named form? In the 
 case of the site I'm trying to spider, when your session expires, you 
 manually go back to an https page and supply your username and password as 
 CGI parameters. I know this sounds odd, but it's apparently how a number of 
 the sites we're trying to spider work, some proprietary software.
 Karl, I really think the book or Wiki or doc needs 3 or 4 different examples 
 of login scenarios.
 Here's the scenario I'm trying, if you'd like to use it:
 Try to fetch: http://site.com/product?id=1234
 If you get a redirect to: http://site.com/Main.asp
 Note that there's no login form nor link on this page.
 Then invoke this login URL: 
 https://site.com/validate?username=me&password=that&otherArg=something
 Note that you can't just visit this page and fill in a form; that gives an 
 error. The parameters have to be passed in the URL (I think as a GET).
 Then record the session cookie and try for /product?id=1234 again.
 I realize this is odd, I didn't design it. 
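
 The scenario above can be sketched as follows. This is not ManifoldCF code 
 or configuration, only a minimal Python sketch of the fetch, detect-redirect, 
 log in, retry sequence the reader describes. The names Response, 
 fetch_with_session_login and make_fake_site are hypothetical, the "&" 
 separators in the login URL are assumed, and the in-memory fake site stands 
 in for a real HTTP client that keeps a cookie jar.

```python
import re
from typing import Callable, List, NamedTuple, Optional, Tuple


class Response(NamedTuple):
    status: int
    location: Optional[str]  # value of the Location header, if any
    body: str


# Hypothetical transport: takes a URL and returns a Response. A real one would
# hold a cookie jar so the session cookie set by the login URL is sent again.
Fetch = Callable[[str], Response]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_with_session_login(
    fetch: Fetch, target_url: str, login_redirect_regex: str, login_url: str
) -> Response:
    """Fetch target_url; if bounced to the login page, log in and retry once."""
    first = fetch(target_url)
    bounced = (
        first.status in REDIRECT_CODES
        and first.location is not None
        and re.search(login_redirect_regex, first.location) is not None
    )
    if not bounced:
        return first
    fetch(login_url)          # credentials travel as GET parameters
    return fetch(target_url)  # session is now established, so try again


def make_fake_site() -> Tuple[Fetch, List[str]]:
    """In-memory stand-in for the site in the scenario (an assumption)."""
    calls: List[str] = []
    session = {"logged_in": False}

    def fetch(url: str) -> Response:
        calls.append(url)
        if url.startswith("https://site.com/validate?"):
            session["logged_in"] = True
            return Response(200, None, "ok")
        if not session["logged_in"]:
            return Response(302, "http://site.com/Main.asp", "")
        return Response(200, None, "product 1234")

    return fetch, calls


if __name__ == "__main__":
    fetch, calls = make_fake_site()
    result = fetch_with_session_login(
        fetch,
        "http://site.com/product?id=1234",
        r"/Main\.asp$",
        "https://site.com/validate?username=me&password=that&otherArg=something",
    )
    print(result.status, result.body)
    print(calls)
```

 The sketch keeps the two decisions separate: one regex recognizes that a 
 redirect means "the session has expired", and a fixed URL says what to do 
 about it. Which connector settings would express each of those is exactly 
 what the reader is asking the documentation to spell out.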

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CONNECTORS-275) Clarify documentation as to how to set up session login for web connector

2011-12-11 Thread Mark Bennett (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bennett updated CONNECTORS-275:


Comment: was deleted

(was: Adding table comparing page based and session based authentication.)

 Clarify documentation as to how to set up session login for web connector
 -

 Key: CONNECTORS-275
 URL: https://issues.apache.org/jira/browse/CONNECTORS-275
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Documentation, Web connector
Affects Versions: ManifoldCF 0.4
Reporter: Karl Wright






[jira] [Updated] (CONNECTORS-275) Clarify documentation as to how to set up session login for web connector

2011-12-11 Thread Mark Bennett (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bennett updated CONNECTORS-275:


Attachment: CONNECTORS-275.patch

Update to the doc comparing page-based and session-based authentication.

 Clarify documentation as to how to set up session login for web connector
 -

 Key: CONNECTORS-275
 URL: https://issues.apache.org/jira/browse/CONNECTORS-275
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Documentation, Web connector
Affects Versions: ManifoldCF 0.4
Reporter: Karl Wright
 Attachments: CONNECTORS-275.patch


