;
}
-
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
interesting.
-- Ken
to upgrade.
[X] upgrade to Java 1.6: one step at a time.
[ ] upgrade to Java 1.7: new features are more important.
as a large-scale crawl.
-- Ken
On Jan 7, 2013, at 2:39am, Oleg Kalnichevski wrote:
On Sun, 2013-01-06 at 15:48 -0800, Ken Krugler wrote:
Hi Oleg,
[snip]
Ken,
You might want to have a look at the latest code in SVN trunk (to be
released as 4.3). Several classes such as the scheme registry
On Jan 5, 2013, at 2:11pm, sebb wrote:
On 5 January 2013 21:33, vigna vi...@di.unimi.it wrote:
But why would you want a web crawler to have 10-20K simultaneously
opened connections in the first place?
(I thought I answered this, but it's not on the archive. Boh.)
Having a few thousands
of the server, which could be a better approach to constraining
the # of keep-alive requests.
-- Ken
On Jan 5, 2013, at 3:31pm, vigna wrote:
On 5 Jan 2013, at 3:10 PM, Ken Krugler kkrugler_li...@transpac.com wrote:
So on a large box (e.g. 24 more powerful cores) I could see using upward
of 10K threads being the
optimal number.
We are working to make 20-30K connections work on 64 cores
is the code:
http://pastebin.com/H1PWqdBc
Question same as always: why doesn't it work?
On Jan 31, 2012, at 7:30am, Oleg Kalnichevski wrote:
On Mon, 2012-01-30 at 17:56 -0800, Ken Krugler wrote:
OK, answering my own question - because ignoreCookies isn't a supported
policy.
I assume either one should be added, or the documentation fixed up.
-- Ken
Hi Ken
the value of a form field of
a website? I guess that's already considered parsing and therefore not part
of the httpclient? I am just a little bit irritated as it's possible to set
form values when using HTTP POST. Thanks for your help :-)
Best regards
Stefan
didn't see how to do that?
2012/1/30 Ken Krugler kkrugler_li...@transpac.com
Assuming you've already read
http://hc.apache.org/httpcomponents-client-ga/primer.html, could you
provide more details of what exactly you're trying to do with HttpClient?
-- Ken
On Jan 30, 2012, at 12:56pm
to use that
policy.
Thanks,
-- Ken
OK, answering my own question - because ignoreCookies isn't a supported
policy.
I assume either one should be added, or the documentation fixed up.
-- Ken
On Jan 30, 2012, at 4:46pm, Ken Krugler wrote:
Is there any reason why CookiePolicy doesn't include
public static final String IGNORE
during web crawling, when I had 300+
threads sharing one connection pool.
Would it work to go for finer-grained locking, by using atomic counters to
track and enforce limits on per-route/total connections?
-- Ken
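A minimal sketch of the atomic-counter idea asked about above. This is not how HttpClient's connection pool is implemented; the class name ConnectionLimiter, its methods, and the use of a plain string as the route key are all hypothetical. It only shows per-route and total limits being enforced with compare-and-set instead of one pool-wide lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of HttpClient.
final class ConnectionLimiter {
    private final int maxTotal;
    private final int maxPerRoute;
    private final AtomicInteger total = new AtomicInteger();
    private final ConcurrentHashMap<String, AtomicInteger> perRoute = new ConcurrentHashMap<>();

    ConnectionLimiter(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    // Lock-free "increment only if still below the limit".
    private static boolean incrementIfBelow(AtomicInteger counter, int limit) {
        while (true) {
            int current = counter.get();
            if (current >= limit) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Returns true if the caller may open a connection for this route.
    boolean tryAcquire(String route) {
        if (!incrementIfBelow(total, maxTotal)) {
            return false;
        }
        AtomicInteger routeCount = perRoute.computeIfAbsent(route, r -> new AtomicInteger());
        if (!incrementIfBelow(routeCount, maxPerRoute)) {
            total.decrementAndGet(); // roll back the total we reserved
            return false;
        }
        return true;
    }

    // Must be called exactly once for every successful tryAcquire.
    void release(String route) {
        AtomicInteger routeCount = perRoute.get(route);
        if (routeCount != null) {
            routeCount.decrementAndGet();
            total.decrementAndGet();
        }
    }
}
```

The trade-off is that a thread briefly reserves a slot in the total count before it knows whether the route has room, so another thread can be refused spuriously during that window. For a crawler that simply retries later, that is probably acceptable.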
profile at:
http://www.linkedin.com/in/rjeffreyvincent
I ♥ DropBox http://db.tt/9O6LfBX !!
in section 4.3.3 of [XML]
that directly address this contingency.
Which means it will look for a byte-order-mark and encoding declaration inside
of the XML content.
-- Ken
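A rough sketch of what "look for a byte-order-mark and encoding declaration" can mean in practice. The class name XmlEncodingSniffer, the 200-byte window, and the UTF-8 fallback are my assumptions. It is deliberately simplified: it does not handle UTF-32 or a declaration that is itself encoded in UTF-16 without a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the charset of an XML document from its first bytes.
final class XmlEncodingSniffer {
    private static final Pattern DECLARATION = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset sniff(byte[] head) {
        // 1. A byte-order mark, when present, decides the encoding.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        // 2. Otherwise read the start as single-byte text and look for encoding="...".
        int length = Math.min(head.length, 200);
        String prefix = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = DECLARATION.matcher(prefix);
        if (matcher.find()) {
            try {
                return Charset.forName(matcher.group(1));
            } catch (IllegalArgumentException unknownOrIllegalName) {
                // fall through to the default below
            }
        }
        // 3. No BOM and no usable declaration: assume UTF-8.
        return StandardCharsets.UTF_8;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```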
-in
replacement for HttpClient 3.0.1?
I haven't seen anything that would lead me to believe otherwise, but I
also haven't found anybody using HC 3.1 with Hadoop in a large-scale
cluster.
Thanks,
-- Ken
Apache Droids also uses HttpClient 4.x
And Nutch uses HttpClient 3.1.
-- Ken
Does httpclient have a connection pool like JDBC?
Yes.
-- Ken
.
-- Ken
hostUrl = new URI(host.toURI()).toURL();
return new URL(hostUrl,
finalRequest.getURI().toString()).toExternalForm();
-- Ken
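For anyone reading the fragment above out of context: the idea is to resolve the (possibly relative) request URI against the target host to get an absolute URL. Here is a standalone sketch of that step using only java.net.URI. Note that it uses URI.resolve rather than the URL constructor from the snippet, and the class and method names are made up for illustration.

```java
import java.net.URI;

// Hypothetical helper: combines a host URI (scheme://host:port) with a request URI.
final class FinalUrlResolver {
    static String resolve(String hostUri, String requestUri) {
        // An absolute requestUri is returned unchanged; a relative one
        // inherits the scheme and authority of hostUri.
        return URI.create(hostUri).resolve(requestUri).toString();
    }
}
```

With a host of "http://example.com:8080" and a request URI of "/a/b?x=1" this should yield an absolute URL on that host, while a request URI that is already absolute comes back as-is.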
the entire
environment, not just HttpClient parameters.
-- Ken
Hi Oleg,
On Jun 24, 2010, at 6:54am, Oleg Kalnichevski wrote:
On Thu, 2010-06-24 at 05:42 -0700, Ken Krugler wrote:
On Jun 23, 2010, at 8:24pm, ctg3 wrote:
I am having an issue with HttpClient 4.0.1. When I try to access
www.google.com or any other website I get a
java.net.SocketException
to offer.
I might wind up building my own version of HttpClient that uses a
custom URI class, which wraps the standard URI class other than
changes needed for non-standard subdomains. If so, I'll post my notes
about how that worked.
-- Ken
Ken Krugler wrote:
Hi Udit,
I believe the problem
.
Thanks
Sachin
idempotent = !(request instanceof
HttpEntityEnclosingRequest);
// Retry if the request is considered idempotent
return idempotent;
}
}
On Mar 29, 2010, at 1:59pm, KARR, DAVID (ATTSI) wrote:
-----Original Message-----
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Monday, March 29, 2010 1:42 PM
To: HttpClient User Discussion
Subject: Re: Simulating connection timeout?
On Mar 29, 2010, at 1:12pm, Sam Crawford
a connection timeout, as no response will be
received. Alternatively, if you have a large regional network, try
connecting to a host in another city/country.
Thanks,
Sam
On 29 March 2010 20:53, Ken Krugler kkrugler_li...@transpac.com
wrote:
On Mar 29, 2010, at 12:31pm, KARR, DAVID (ATTSI) wrote:
I'm
: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Monday, March 22, 2010 2:39 PM
To: HttpClient User Discussion
Subject: Re: URL/URI syntax issue in HC 4.0.1
On Mar 22, 2010, at 2:15pm, natarajan_va...@emc.com
natarajan_va...@emc.com
wrote:
The URL, http://win2k3_64_ora:9300/app, seems
.
of the last request
made that succeeded.
-- Ken
On Tue, Mar 23, 2010 at 4:01 PM, Ken Krugler
kkrugler_li...@transpac.com wrote:
I haven't used HttpClient 3.1 for a while now, so I'm not up on the
typical
connection management problems.
I don't think that two threads will share the same
to pipeline a bunch together).
Thanks in advance,
Brian
for the failure case, so
you'd see exactly what's coming back from the server.
-- Ken
</div>
</div>
Any help will be greatly appreciated.
Thanks,
Robert
,
Robert
On Mon, Feb 15, 2010 at 10:36 AM, Ken Krugler
kkrugler_li...@transpac.com wrote:
In the automated login, I don't see the POST parameters:
txtUserName=myUserName&txtPassword=myPassWord
-- Ken
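In case it helps, here is a small sketch of how a form body like the one above is built: each name/value pair is URL-encoded and the pairs are joined with "&". It uses only the JDK (URLEncoder.encode with a Charset needs Java 10 or later). The class name FormBody is made up, and this is not the HttpClient API for posting forms.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
final class FormBody {
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Encode name and value separately so "&" and "=" inside them are escaped.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

Passing a LinkedHashMap keeps the fields in insertion order, which makes it easier to compare the result with what the browser sends.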
On Feb 15, 2010, at 7:36am, Robert Stone wrote:
*Thanks to Jeff* for getting me on my way
redirect = new HttpGet(uri);
httpClient.execute(redirect, context);
return context;
thanks
On Jan 29, 2010, at 3:35am, sebb wrote:
On 29/01/2010, Ken Krugler kkrugler_li...@transpac.com wrote:
On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
Hi Oleg,
Thank you for the quick reply.
So if there is a possibility that not the whole buffer is filled
how can I
ensure or force
don't.
The browser might be synthesizing something (depending on the browser)
for empty 404 pages.
What happens when you do a GET request to that server via curl (with
the -v option)?
-- Ken
with the HttpGet object, and that in turn calls
request.abort().
-- Ken
Ken Krugler wrote:
On Jan 26, 2010, at 3:54am, Claudio Martella wrote:
As I mentioned in the previous post, I'm using HttpClient for a
web crawler I'm writing. At the moment I'm doing something like this:
while
-client/primer.html
And then provide some input on what's so bad about it.
-- Ken
.
it to the socket. Is this
possible with HttpClient, and if so how would I go about
implementing this?
Many thanks,
Tony
,
Tony
Message of 09/01/10 21:03
From: Ken Krugler
To: HttpClient User Discussion
Cc:
Subject: Re: Efficiently repeating identical requests
Hi Tony,
I'm wondering why you need to do this level of optimization - are you
running into issues with this type of POST request chewing up too
many
Hi Oleg,
On Jan 9, 2010, at 1:35pm, Oleg Kalnichevski wrote:
Ken Krugler wrote:
I wanted to verify some behavior I'm seeing with HttpClient 4.0
I occasionally get a ConnectionPoolTimeoutException, even when I've
got spare connections in my ThreadSafeClientConnManager pool.
Looking
.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:319)
If anybody (well, OK, Oleg) has input on things I could be doing wrong
to trigger this type of behavior, and/or ways to avoid it, I'm all ears.
-- Ken
that would only mask the problem.
Thanks,
-- Ken
Hi Oleg,
On Dec 3, 2009, at 2:40am, Oleg Kalnichevski wrote:
On Wed, 2009-12-02 at 19:15 -0800, Ken Krugler wrote:
Below is an email from August 7th, which I'm reviving due to this
becoming a bigger issue over in Bixo-land.
I've continued to run into this issue with my crawls, but so far I'm
(Unknown Source)
The exception has been caused by the server shutting down the
connection
prematurely, most likely due to an error or configuration issue on the
server side. This has nothing to do with HttpClient.
Oleg
/tutorial/html/fundamentals.html#d4e205
The key bit of documentation was at:
http://hc.apache.org/httpcomponents-client/tutorial/html/httpagent.html#d4e1022
-- Ken
the number of parallel
requests to one host to be two. Not sure if that would be a factor in
your case, given how you're creating a new client for each request.
-- Ken
, to
avoid issues with terminals set to us-ascii.
Often problems like this are caused by server configuration - e.g.
there are countless posts on the Solr mailing list about needing to
configure Tomcat to treat incoming URLs as UTF-8.
-- Ken
() and HttpResponse#getAllHeaders() -
couldn't find anything.
From past posts on the list, I thought httpMethod.getURI() would
return the final URL.
-- Ken
Hi Magnus,
On Sep 2, 2009, at 1:22am, Magnus Olstad Hansen wrote:
Hello,
I'm using HttpClient 4.0 to download a webpage the same way as shown
in one of the examples. This is my method to return a webpage as a
string:
protected static String leechUrl(String url) throws
they will
help.
Thanks,
Melroy
On Aug 25, 2009, at 3:39pm, melroyr wrote:
I have no idea how to set the user agent in HTTPClient
The (really good) on-line documentation is your friend.
http://hc.apache.org/httpcomponents-client/tutorial/html/
-- Ken
melroyr wrote:
I have written a program to download html pages
Hi Melroy,
On Aug 24, 2009, at 12:20pm, melroyr wrote:
I have written a program to download html pages from harristeeter.
However,
when I run my program, I get the following
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
of stack space used per thread, or DNS lookups becoming
slow, etc.
-- Ken
Ken Krugler wrote:
Hi Yan Cheng,
I haven't used HttpClient 3.x for a while - switched to 4.0 and
haven't looked back.
But in general method A is going to work better. You can configure
On Aug 17, 2009, at 11:27am, droidin.net wrote:
Here's what I need to do
1. Read response as a stream
2. Feed it into SAX-based HTML parser on the fly
3. When a certain tag is detected - terminate the stream
In other words - I'm reading large documents from which I only need
top 5%,
can I do
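The three steps above can be sketched roughly as follows. The usual trick for stopping a SAX parse early is to throw a dedicated exception from the handler and catch it around parse(). Assumptions on my part: this uses the JDK's XML SAX parser on well-formed input as a stand-in for a real SAX-based HTML parser, and the class and method names are made up. The caller still has to close or abort the underlying response stream afterwards.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a stream and stops as soon as a given tag starts.
final class EarlyStopParser {
    // Marker exception used only to unwind out of the parser.
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop tag reached");
        }
    }

    // Returns how many elements were seen before stopTag (or before end of input).
    static int countElementsUntil(InputStream in, String stopTag) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                if (qName.equals(stopTag)) {
                    throw new StopParsing(); // nothing after this point is parsed
                }
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (StopParsing expected) {
            // Normal early exit; the rest of the stream is left unread.
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return count[0];
    }
}
```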
them up before following them.
-- Ken
,
-- Ken
with a stale connection too optimistic?
Thanks,
-- Ken
, which is definitely a speed
win.
Note that with the MultiThreadedHttpConnectionManager, you create one
HttpClient, and then re-use it for all of your (multi-threaded)
requests.
-- Ken
with 4.0.
-- Ken
setup. Is there a way to use
HttpParams to change values on a per-request basis?
[snip]
Thanks,
-- Ken
, I really only want to retry the request when I get an
IOException. And that there's no point in retrying more than once.
Thanks,
-- Ken
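To make the policy described above concrete (retry only on IOException, and at most once), here is a library-independent sketch. It is not HttpClient's retry handler interface; RetryOnce and IoCall are hypothetical names, and the call being retried is whatever the caller passes in.

```java
import java.io.IOException;

// Hypothetical helper: runs a call, retrying exactly once if it throws IOException.
final class RetryOnce {
    interface IoCall<T> {
        T run() throws IOException;
    }

    static <T> T execute(IoCall<T> call) throws IOException {
        try {
            return call.run();
        } catch (IOException first) {
            try {
                return call.run(); // the single retry
            } catch (IOException second) {
                second.addSuppressed(first); // keep the original failure for diagnostics
                throw second;
            }
        }
    }
}
```

Any other exception type passes straight through without a retry, which matches the "only on IOException" requirement.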
, with HttpClient 3.1 there was a weird thing where
httpclient.wire was the name of the wire logger, not
org.apache.whatever
-- Ken
On Mon, May 04, 2009 at 04:30:15PM -0700, Ken Krugler wrote:
Hi all,
In HttpClient 3.1, the Nutch code base would configure timeouts using the
following snippet of code:
MultiThreadedHttpConnectionManager connectionManager =
new MultiThreadedHttpConnectionManager