Hi, the database may contain Z.com and X.Y.Z.com if they were created automatically through a JSP, but not the intermediate one, Y.Z.com.
If the crawler decides to go to A.Y.Z.com and, looking in the database, Z.com is present, it still doesn't work (it should, since A.Y.Z.com is a sub-domain of Z.com). Only after making that change by hand (replacing the domain with the sub-domain in the database) and restarting ManifoldCF does it begin to work. There might be security constraints involved somehow; I will analyze this further.

Regards.

On Thu, Jul 26, 2018 at 0:06, Karl Wright (<[email protected]>) wrote:

> The web connector, though, does not filter any cookies. It takes them all
> -- whatever cookies HttpClient is storing at that point. So you should see
> all the cookies in the database table, regardless of their site affinity,
> unless HttpClient is refusing to accept a cookie for security reasons.
>
> It's also possible that HttpClient is selective about which cookies to
> transmit on a page fetch.
>
> Can you look in the database and tell me whether your cookie gets stored,
> or not? If not, then HttpClient's cookie acceptance policy is not lenient
> enough. If it is in the database, then it's the transmission policy that
> is too strict.
>
> Thanks,
> Karl
>
> On Wed, Jul 25, 2018 at 4:36 PM Gustavo Beneitez <[email protected]> wrote:
>
>> I agree, but the fact is that if my "login sequence" defines a login
>> credential for domain "Z.com" and the crawler reaches "Y.Z.com" or
>> "X.Y.Z.com", none of the sub-sites receives that cookie. I need to write
>> the same cookie for every sub-domain, which solves the situation (and
>> thankfully it is a language cookie and not a dynamic one).
>>
>> Regards.
>>
>> On Wed, Jul 25, 2018 at 19:17, Karl Wright (<[email protected]>) wrote:
>>
>>> You should not need to fill the database by hand. Your login sequence
>>> should include whatever redirection etc. is used to set the cookies,
>>> though.
>>>
>>> Karl
>>>
>>> On Wed, Jul 25, 2018 at 1:06 PM Gustavo Beneitez <[email protected]> wrote:
>>>
>>>> Hi again,
>>>>
>>>> Thanks Karl, I was able to do that after defining a "login sequence",
>>>> but only after also filling the database (cookiedata table) with
>>>> certain values due to "domain restrictions".
>>>> Before every web call, I suspect Manifold only takes cookies for the
>>>> URL's exact subdomain (i.e. x.y.z.com), so if you define your cookie
>>>> as "z.com" it won't be sent. So I added every subdomain by hand and
>>>> it started to work.
>>>>
>>>> Regards.
>>>>
>>>> On Fri, Jul 20, 2018 at 8:12, Gustavo Beneitez (<[email protected]>) wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> thanks a lot, please let me check the documentation for an example
>>>>> of that.
>>>>>
>>>>> Regards!
>>>>>
>>>>> On Thu, Jul 19, 2018 at 21:54, Karl Wright (<[email protected]>) wrote:
>>>>>
>>>>>> You are correct that cookies are not shared among threads. That is
>>>>>> by design.
>>>>>>
>>>>>> The only way to set cookies for the WebConnector is to have a
>>>>>> "login sequence". The login sequence sets cookies that are then
>>>>>> used by all subsequent fetches.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Jul 19, 2018 at 3:38 PM Gustavo Beneitez <[email protected]> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I have tried to look for an answer before writing this email, with
>>>>>>> no luck. Sorry for the inconvenience if it has already been answered.
>>>>>>>
>>>>>>> I need to set a cookie at the beginning of the web crawl. The
>>>>>>> cookie determines the language in which you get the content, and
>>>>>>> while there are several choices, if no cookie is found there will
>>>>>>> be a "default language".
>>>>>>>
>>>>>>> I made a JSP which sets the cookie and contains several links
>>>>>>> (href), and pointed ManifoldCF to this page as the repository seed.
>>>>>>> I expected the crawling engine to start capturing links in the
>>>>>>> correct language indicated by the cookie, but what I really got is
>>>>>>> a lot of content shown in the default language.
>>>>>>>
>>>>>>> What I think is that cookies are not shared between spider
>>>>>>> threads, so it is not possible for cookies to persist between
>>>>>>> links. The cookie domain is correct, as is the cookie expiration.
>>>>>>>
>>>>>>> I would appreciate it very much if you could help me with this.
>>>>>>>
>>>>>>> Thanks in advance!
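
Note on the domain matching discussed in this thread: the expectation that a cookie set for Z.com should also be sent to A.Y.Z.com corresponds to the "domain-match" rule of RFC 6265 (section 5.1.3). The following is a minimal sketch of that rule only, useful for checking by hand which stored cookie domains ought to apply to a given host. It does not claim to reproduce what ManifoldCF or HttpClient actually do, and the function name `domain_match` is hypothetical.

```python
import ipaddress


def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Return True if request_host domain-matches cookie_domain (RFC 6265, 5.1.3).

    Hypothetical helper for illustration; not part of ManifoldCF or HttpClient.
    """
    # Comparison is case-insensitive; a leading dot on the cookie domain is ignored.
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    if not host or not domain:
        return False

    # Identical strings always match.
    if host == domain:
        return True

    # Suffix matching applies only to host names, never to IP addresses.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass

    # The host must end with the cookie domain, preceded by a dot,
    # so that "notz.com" does not match "z.com".
    return host.endswith("." + domain)


if __name__ == "__main__":
    for host in ("z.com", "y.z.com", "x.y.z.com", "a.y.z.com", "notz.com"):
        print(host, domain_match(host, "z.com"))
```

Under this rule a cookie stored with domain Z.com applies to Y.Z.com, X.Y.Z.com and A.Y.Z.com, while a cookie stored for X.Y.Z.com does not apply to A.Y.Z.com. Also note that, per the same RFC, a cookie received without a Domain attribute is host-only and is returned only to the exact host that set it, which may be worth checking when inspecting the cookiedata table.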
