Re: Web crawler/scraping

2021-02-17 Thread Carlos Cabral via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 13:13:00 UTC, Adam D. Ruppe 
wrote:
On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
wrote:
I'm trying to collect some json data from a website/admin 
panel automatically, which is behind a login form.


Does the website need javascript?

If not, my dom.d may be able to help. It can download some 
HTML, parse it, fill in forms, then my http2.d submits it (I 
never implemented Form.submit in dom.d but it is pretty easy to 
make with other functions that are implemented, heck maybe I'll 
implement it now if it sounds like it might work).


Or if it is all json you might be able to just craft some 
requests with my lib or even phobos' std.net.curl that submits 
the login request, saves a cookie, then fetches some json stuff.


I literally just rolled out of bed but in an hour or two I can 
come back and make some example code for you if this sounds 
plausible.


...and it's working :)
thank you Adam and Ferhat

leaving this here if anyone needs:

```
import std.stdio;
import std.string;
import std.net.curl;
import core.thread;
import core.time;

void main()
{
    int waitTime = 5;
    auto domain = "https://example.com";
    auto cookiesFile = "cookies.txt";
    auto http = HTTP();

    http.handle.set(CurlOption.use_ssl, 1);
    http.handle.set(CurlOption.ssl_verifypeer, 0);
    http.handle.set(CurlOption.cookiefile, cookiesFile);
    http.handle.set(CurlOption.cookiejar, cookiesFile);
    http.setUserAgent("...");
    // minimal receive handler: just print the response body
    http.onReceive = (ubyte[] data) { write(cast(const char[]) data); return data.length; };

    // fetch the login page first so the session cookie gets saved
    http.method = HTTP.Method.get;
    http.url = domain ~ "/login";
    http.perform();

    Thread.sleep(waitTime.seconds);

    // submit the login form
    auto data = "username=user&password=pass";
    http.method = HTTP.Method.post;
    http.url = domain ~ "/login";
    http.setPostData(data, "application/x-www-form-urlencoded");
    http.perform();

    Thread.sleep(waitTime.seconds);

    // now fetch the json endpoint, authenticated by the saved cookie
    http.method = HTTP.Method.get;
    http.url = domain ~ "/fetchjson";
    http.perform();
}
```
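Since the goal is to collect JSON, the body captured in `onReceive` can be parsed with Phobos' std.json once it's all received. A minimal sketch (the `items`/`id`/`name` fields are made-up placeholders for whatever the admin panel actually returns):

```d
import std.json;
import std.stdio;

void main()
{
    // pretend this is the accumulated response body from /fetchjson
    string jsonText = `{"items":[{"id":1,"name":"foo"},{"id":2,"name":"bar"}]}`;

    // parseJSON gives a JSONValue tree you can index like the document
    JSONValue j = parseJSON(jsonText);
    foreach (item; j["items"].array)
        writeln(item["id"].integer, ": ", item["name"].str);
}
```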


Re: Web crawler/scraping

2021-02-17 Thread Carlos Cabral via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 13:13:00 UTC, Adam D. Ruppe 
wrote:
On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
wrote:
I'm trying to collect some json data from a website/admin 
panel automatically, which is behind a login form.


Does the website need javascript?

If not, my dom.d may be able to help. It can download some 
HTML, parse it, fill in forms, then my http2.d submits it (I 
never implemented Form.submit in dom.d but it is pretty easy to 
make with other functions that are implemented, heck maybe I'll 
implement it now if it sounds like it might work).


Or if it is all json you might be able to just craft some 
requests with my lib or even phobos' std.net.curl that submits 
the login request, saves a cookie, then fetches some json stuff.


I literally just rolled out of bed but in an hour or two I can 
come back and make some example code for you if this sounds 
plausible.


No, I don't think it needs JS.
I think I can submit the login form and then just fetch/save the 
json response using the login cookie, as you suggest. A full 
crawler/scraping solution may be overkill...


I'll try with std.net.curl and come back to you in a couple of 
hours


Thank you!!




Re: Web crawler/scraping

2021-02-17 Thread Adam D. Ruppe via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
wrote:
I'm trying to collect some json data from a website/admin panel 
automatically, which is behind a login form.


Does the website need javascript?

If not, my dom.d may be able to help. It can download some HTML, 
parse it, fill in forms, then my http2.d submits it (I never 
implemented Form.submit in dom.d but it is pretty easy to make 
with other functions that are implemented, heck maybe I'll 
implement it now if it sounds like it might work).
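For reference, the dom.d + http2.d route Adam describes might look roughly like this. This is an untested sketch from memory, assuming arsd.dom's `Form.setValue`/`getPostableData` and arsd.http2's `HttpClient` behave as named here; check the arsd docs before relying on it:

```d
import arsd.dom;
import arsd.http2;

void main()
{
    auto client = new HttpClient();

    // download and parse the login page
    auto response = client.navigateTo(Uri("https://example.com/login")).waitForCompletion();
    auto document = new Document();
    document.parseGarbage(response.contentText);

    // fill in the login form fields
    auto form = cast(Form) document.querySelector("form");
    form.setValue("username", "user");
    form.setValue("password", "pass");

    // submit it by hand, since Form.submit isn't implemented in dom.d
    auto login = client.request(Uri("https://example.com/login"),
        HttpVerb.POST, cast(ubyte[]) form.getPostableData()).waitForCompletion();

    // the client retains the session cookie, so this fetch is authenticated
    auto json = client.request(Uri("https://example.com/fetchjson")).waitForCompletion();
}
```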


Or if it is all json you might be able to just craft some 
requests with my lib or even phobos' std.net.curl that submits 
the login request, saves a cookie, then fetches some json stuff.


I literally just rolled out of bed but in an hour or two I can 
come back and make some example code for you if this sounds 
plausible.


Re: Web crawler/scraping

2021-02-17 Thread Carlos Cabral via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 12:27:16 UTC, Ferhat Kurtulmuş 
wrote:
On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
wrote:

Hi,
I'm trying to collect some json data from a website/admin 
panel automatically, which is behind a login form.


Is there a D library that can help me with this?

Thank you


I found this but it looks outdated:

https://github.com/gedaiu/selenium.d


Thanks!
This seems to depend on Selenium; I was looking for something 
standalone, like


crawler.get(...)
crawler.post(...)
crawler.parse(...)

so that I can deploy it on the client's network as a single 
executable (the website I'm crawling is only available 
internally...).


Re: Web crawler/scraping

2021-02-17 Thread Ferhat Kurtulmuş via Digitalmars-d-learn
On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
wrote:

Hi,
I'm trying to collect some json data from a website/admin panel 
automatically, which is behind a login form.


Is there a D library that can help me with this?

Thank you


I found this but it looks outdated:

https://github.com/gedaiu/selenium.d


Web crawler/scraping

2021-02-17 Thread Carlos Cabral via Digitalmars-d-learn

Hi,
I'm trying to collect some json data from a website/admin panel 
automatically, which is behind a login form.


Is there a D library that can help me with this?

Thank you