Does anyone have an HTML parser for J?

I don't want an academic/lab quality HTML parser; I'd like a mature, robust, forgiving, "real world" parser. But I'll take what I can get.

In particular, I'd be interested if anyone has interfaced with IE's DOM, using mshtml.dll or shdocvw.dll or both.

I tried to use Oleg's XML parser, but no go: I'm not dealing with XHTML; I've got HTML 4 (and sometimes ill-formed, at that).

-Dan

PS: I tried to post the following in J Chat, because it's unrelated to J, but it didn't show up. It's a question related to the above. All advice is welcome.

=== Sniffing POST variables sent over SSL ===

Is there any way to see exactly what HTTP request my browser is POSTing to an HTTPS server?

I am automating some data-fetching. The manual method involves visiting a web site, filling a few fields in a horrifically complicated form, and submitting it.

I want to compose the proper POST parameters to emulate this method, so I can instruct wget to fetch the data for me. The problem is, I have no idea what parameters are being POSTed.
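To make the goal concrete, here is a minimal Python sketch of the step that follows once the parameters are known: URL-encode the fields and hand them to wget with its --post-data option. The URL, the field names, and the output file name are all hypothetical placeholders; the real ones are exactly what still has to be discovered.

```python
from urllib.parse import urlencode


def wget_post_command(url, fields, outfile="result.html"):
    """Build a wget argument list that POSTs `fields` to `url`.

    `fields` maps form field names to values. urlencode() produces the
    application/x-www-form-urlencoded body an ordinary HTML form sends.
    """
    body = urlencode(fields)
    return ["wget", "--post-data=" + body, "-O", outfile, url]


# Hypothetical field names and URL, for illustration only.
cmd = wget_post_command(
    "https://example.com/report",
    {"account": "12345", "report type": "daily"},
)
print(cmd[1])  # --post-data=account=12345&report+type=daily
```

The list can be passed to subprocess.run(cmd) as-is, which avoids any shell quoting problems with "&" in the POST body.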

My usual solution to this problem is to visit the page, and fill out the form, but not submit it. Then I run netcat on a local port, set my browser to use localhost:port as a proxy, and submit the form. Then I can see the exact request my browser is sending to the server.
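For readers without netcat, the same trick can be sketched in a few lines of Python. This is only an illustration of the approach, not a real proxy: it accepts one connection, collects whatever the browser sends, and never answers, so the browser will eventually report an error. The function names and the port number in the usage comment are assumptions made for the example.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Bind a listening TCP socket; port=0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def capture_one_request(srv, idle_timeout=2.0):
    """Accept one connection and return every byte the client sent.

    Reading stops when the client closes the connection or has been
    silent for `idle_timeout` seconds (a browser keeps the socket open
    while it waits for a reply that never comes).
    """
    conn, _addr = srv.accept()
    conn.settimeout(idle_timeout)
    chunks = []
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    except socket.timeout:
        pass
    finally:
        conn.close()
    return b"".join(chunks)


# Typical use: point the browser's HTTP proxy at localhost:8888, then run
#   srv = open_listener(port=8888)
#   print(capture_one_request(srv).decode("latin-1"))
```

For a plain-HTTP site the output contains the request line, the headers, and the URL-encoded POST body.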

Unfortunately, this server is using SSL (https), so all I see in my netcat output is a CONNECT request. For the same reason (SSL), I can't use other obvious approaches like tcpdump or Ethereal.
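The difference is visible in the first line of whatever the local listener captures. The following Python sketch (the function name and the example hosts are hypothetical) classifies a captured request: for HTTPS the browser only asks the proxy to open a tunnel to host:port, and everything after that, including the POST body, travels encrypted inside the tunnel.

```python
def summarize_proxy_request(raw: bytes) -> str:
    """Describe what a proxy can learn from the start of a captured request."""
    request_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) != 3:
        return "unrecognized request line: " + repr(request_line)
    method, target, _version = parts
    if method.upper() == "CONNECT":
        # The target of CONNECT is "host:port", never a path or a body.
        host, _, port = target.rpartition(":")
        return f"tunnel to {host} port {port}; payload is encrypted"
    return f"plain {method} for {target}; headers and body are readable"


print(summarize_proxy_request(
    b"CONNECT secure.example.com:443 HTTP/1.1\r\n"
    b"Host: secure.example.com:443\r\n\r\n"
))
# tunnel to secure.example.com port 443; payload is encrypted
```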

I could work through the HTML of the page, but (A) the form is /very/ complicated (and AJAX-y with subforms), and (B) that would require testing/debugging, and I'd rather not burden the server that way.

Suggestions? Do any of the command-line browsers have something like a --record-all-transactions option?

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
