Hi,

The database may contain Z.com and X.Y.Z.com if the entries are created
automatically through a JSP, but not the intermediate domain Y.Z.com.

If the crawler decides to go to A.Y.Z.com and, looking at the database, Z.com
is present, it still doesn't work (it should, since A.Y.Z.com is a sub-domain
of Z.com).
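
To make that expectation concrete, here is a minimal sketch of suffix-style
domain matching along the lines of RFC 6265, section 5.1.3. It is only an
illustration: the class DomainMatchSketch and the method domainMatches are
hypothetical names, and this is not how ManifoldCF or HttpClient actually
implement it.

```java
import java.util.Locale;

public class DomainMatchSketch {

    // Hypothetical helper, not part of ManifoldCF or HttpClient.
    // Returns true when a cookie set for cookieDomain would apply to host
    // under suffix-style matching. The RFC's extra rule that the host must
    // not be an IP address is left out to keep the sketch short.
    static boolean domainMatches(String host, String cookieDomain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = cookieDomain.toLowerCase(Locale.ROOT);
        // A leading dot in the cookie's domain attribute is ignored.
        if (d.startsWith(".")) {
            d = d.substring(1);
        }
        // Either the names are identical, or the host ends with "." + domain.
        return h.equals(d) || h.endsWith("." + d);
    }

    public static void main(String[] args) {
        System.out.println(domainMatches("A.Y.Z.com", "Z.com"));   // true
        System.out.println(domainMatches("X.Y.Z.com", "Y.Z.com")); // true
        System.out.println(domainMatches("Z.com", "Y.Z.com"));     // false
        System.out.println(domainMatches("notZ.com", "Z.com"));    // false
    }
}
```

Under this rule a cookie stored for Z.com would be sent to A.Y.Z.com, which is
the behaviour I expected and am not seeing.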

Only after making that change by hand (replacing the domain with the
sub-domain in the database) and restarting ManifoldCF does it begin to work.

There might be security constraints involved somehow; I will analyse this
further.

Regards.


On Thu, Jul 26, 2018 at 0:06, Karl Wright (<[email protected]>)
wrote:

> The web connector, though, does not filter any cookies.  It takes them all
> -- whatever cookies HttpClient is storing at that point.  So you should see
> all the cookies in the database table, regardless of their site affinity,
> unless HttpClient is refusing to accept a cookie for security reasons.
>
> It's also possible that HttpClient is selective about which cookies to
> transmit on a page fetch.
>
> Can you look in the database and tell me whether your cookie gets stored,
> or not?  If not, then HttpClient's cookie acceptance policy is not lenient
> enough.  If it is in the database, then it's the transmission policy that
> is too strict.
>
> Thanks,
> Karl
>
>
> On Wed, Jul 25, 2018 at 4:36 PM Gustavo Beneitez <
> [email protected]> wrote:
>
>> I agree, but the fact is that if my "login sequence" defines a login
>> credential for domain "Z.com" and the crawler reaches "Y.Z.com" or
>> "X.Y.Z.com", neither of the sub-sites receives that cookie. I need to
>> write the same cookie for every sub-domain, which solves the situation
>> (and thankfully it is a language cookie and not a dynamic one).
>>
>> Regards.
>>
>> On Wed, Jul 25, 2018 at 19:17, Karl Wright (<[email protected]>)
>> wrote:
>>
>>> You should not need to fill the database by hand.  Your login sequence
>>> should include whatever redirection etc is used to set the cookies though.
>>>
>>> Karl
>>>
>>>
>>> On Wed, Jul 25, 2018 at 1:06 PM Gustavo Beneitez <
>>> [email protected]> wrote:
>>>
>>>> Hi again,
>>>>
>>>> Thanks Karl, I was able to do that after defining a "login
>>>> sequence", but also after filling the database (cookiedata table) with
>>>> certain values because of "domain restrictions".
>>>> I suspect that before every web call ManifoldCF only takes cookies for
>>>> the URL's exact subdomain (i.e. x.y.z.com), so if you define your cookie
>>>> for "z.com" it won't be sent. I therefore added every subdomain by hand
>>>> and it started to work.
>>>>
>>>> Regards.
>>>>
>>>>
>>>> On Fri, Jul 20, 2018 at 8:12, Gustavo Beneitez (<
>>>> [email protected]>) wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks a lot. Let me check the documentation for an example of
>>>>> that.
>>>>>
>>>>> Regards!
>>>>>
>>>>> On Thu, Jul 19, 2018 at 21:54, Karl Wright (<[email protected]>)
>>>>> wrote:
>>>>>
>>>>>> You are correct that cookies are not shared among threads.  That is
>>>>>> by design.
>>>>>>
>>>>>> The only way to set cookies for the WebConnector is to have there be
>>>>>> a "login sequence".  The login sequence sets cookies that are then
>>>>>> used by all subsequent fetches.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 19, 2018 at 3:38 PM Gustavo Beneitez <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I tried to look for an answer before writing this email, but had no
>>>>>>> luck. Sorry for the inconvenience if it has already been answered.
>>>>>>>
>>>>>>> I need to set a cookie at the beginning of the web crawl. The
>>>>>>> cookie determines the language in which content is served, and while
>>>>>>> there are several choices, if no cookie is found a "default language"
>>>>>>> is used.
>>>>>>>
>>>>>>> I made a JSP which sets the cookie and contains several links
>>>>>>> (href), and pointed ManifoldCF to this page as the repository seed. I
>>>>>>> expected the crawling engine to fetch the links in the language
>>>>>>> indicated by the cookie, but what I actually got was a lot of content
>>>>>>> shown in the default language.
>>>>>>>
>>>>>>> My guess is that cookies are not shared between spider threads, so
>>>>>>> cookies do not persist between links. The cookie domain is correct,
>>>>>>> and so is the cookie expiration.
>>>>>>>
>>>>>>> I would appreciate it very much if you could help me with this.
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>>
>>>>>>>
