Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread David Noel
Well, it looks like it's the cable modem after all. Under load I'm
unable to connect to its admin panel, even when I'm directly
connected to it. I called Comcast's technical support and had them run
their diagnostics on it while everything was running and it failed
miserably. The tech agreed with the conclusion that the modem was
incapable of handling the load. So it looks like I'm in the market for
a new cable modem. I'm not sure how to find one that will meet my
needs though. Any DOCSIS 3 compatible modem will work on Comcast's
network.

Does anyone know of any models that are designed for heavy load? I'd
probably need something that was built for networks of ~10,000 users.
I'm not sure what sort of load 10,000 users generates, but I suspect
it would peak around the 10-100 requests per second that my crawlers
are putting out.

If not, can anyone recommend a place where I might be able to find an
answer to this question? Mailing list? Web forum? IRC channel, even?
I'd really rather not have to pull specs on every DOCSIS 3 compatible
modem and make a best guess based on microcontrollers/CPUs.

Many thanks,

-David



On 3/18/14, David Noel david.i.n...@gmail.com wrote:
 Well, I bumped Maximum State Table from the default of 23,000 to
 75,000, and now it's throwing fewer UnknownHostExceptions. But
 they're still being thrown. My resource utilization is getting pretty
 high though. I don't think these ALIX boards can handle much more of a
 load, and I still have 2 more servers I need to scale these crawlers
 out to. I do see there's a Firewall Adaptive Timeouts setting in the
 web configurator; this seems like it might be useful. Can anyone
 recommend any settings I should try to free up some system resources?
 I'm not clear on the consequences of purging pf state entries and
 whether that's something I'd want to do though.
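On the adaptive timeouts question: per the pf.conf(5) manual, pf does not purge states abruptly. Once the state count exceeds adaptive.start, every state timeout is scaled by (adaptive.end - states) / (adaptive.end - adaptive.start), and at adaptive.end all timeouts reach zero, which purges states immediately. The Python sketch below only evaluates that formula. The 60% / 120% start/end percentages of the 75,000-state maximum are assumptions for illustration, not confirmed pfSense defaults for every version.

```python
# Evaluates pf's adaptive timeout scaling as documented in pf.conf(5).
# The 60% / 120% start/end percentages are assumptions for illustration.


def adaptive_factor(states: int, start: int, end: int) -> float:
    """Multiplier pf applies to every state timeout at a given state count."""
    if end <= start:
        raise ValueError("adaptive end must be greater than adaptive start")
    if states <= start:
        return 1.0  # below adaptive.start: timeouts are not scaled
    if states >= end:
        return 0.0  # at adaptive.end: timeouts become zero, states are purged
    return (end - states) / (end - start)  # linear scaling in between


def scaled_timeout(timeout_s: float, states: int, start: int, end: int) -> float:
    """Effective timeout in seconds for one state at this occupancy."""
    return timeout_s * adaptive_factor(states, start, end)


if __name__ == "__main__":
    max_states = 75_000
    start = max_states * 60 // 100   # 45000 (assumed 60%)
    end = max_states * 120 // 100    # 90000 (assumed 120%)
    for states in (40_000, 67_500, 90_000):
        print(states, adaptive_factor(states, start, end))
```

With these assumed settings, nothing would be scaled at the 40,000 states reported below. Halfway between start and end (67,500 states), every timeout would be cut in half. That is a gradual squeeze rather than a sudden purge.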

 The state table on my primary router (alix1) is at roughly 50%
 utilization, or 40,000 states. The state table on my secondary router
 (alix2) is at 0%, roughly 250 states. This seems odd. Is this to be
 expected under CARP? Why is the load not distributed evenly?
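Little's law (average states held = new states per second times average state lifetime) ties the 40,000-state figure to the crawlers' connection rate. The sketch below assumes, purely for illustration, 100 new states per second. At that rate, 40,000 states means each state lingers about 400 s. Doubling the rate by adding servers would then need about 80,000 slots, which is more than the 75,000 maximum. Shorter timeouts shrink the lifetime term, and that is how adaptive timeouts free up states.

```python
# Little's law (L = lambda * W) applied to a pf state table.
# The rates below are hypothetical; one crawler request can create more than
# one state (for example a UDP state for DNS plus a TCP state for HTTP).


def expected_states(new_states_per_s: float, avg_lifetime_s: float) -> float:
    """Average number of states held at once in steady state."""
    return new_states_per_s * avg_lifetime_s


def implied_lifetime(states: float, new_states_per_s: float) -> float:
    """Average time a state stays in the table, given an observed count."""
    if new_states_per_s <= 0:
        raise ValueError("state creation rate must be positive")
    return states / new_states_per_s


if __name__ == "__main__":
    lifetime = implied_lifetime(40_000, 100)   # assumed 100 new states/s
    print(lifetime)                            # 400.0 seconds
    print(expected_states(200, lifetime))      # doubled rate: 80000.0 states
```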

 Memory usage on my primary router (alix1) is hovering around 55% (of
 235MB). On my backup (alix2) it's pushing 85-90%. Does this make sense
 to anyone? Top output looks roughly the same... and now alix2 has gone
 down. 95% packet loss. Web Configurator unresponsive. ... It's back up
 but throwing 500 Internal Server Errors periodically. I've ssh'd
 into alix2 and am looking at top output. tcpdump seems to be running
 for pflog purposes, and it's hogging quite a bit of CPU. Is this
 necessary? Can I disable it somehow?

 -David

 On 3/18/14, David Noel david.i.n...@gmail.com wrote:
 I've encountered a strange issue while scaling a Java project that I'm
 not quite sure how to resolve. Any thoughts would be appreciated.

 The code is a crawler that uses HTMLUnit to crawl a bunch of pages
 concurrently. It uses HTMLUnit's getPage method to do the crawling. I'm
 running 100 threads per instance. When I have 1 instance up and
 running on 1 machine everything is fine. When I scale it to a second
 machine, though, I start having trouble. Calls to getPage keep throwing
 UnknownHostExceptions (DNS resolution errors). With 2 servers running,
 roughly 1 out of every 20 calls to getPage throws this exception. For
 some reason it's unable to resolve domain names, and it's not just
 the crawlers: my entire network starts to choke on DNS queries. On
 different systems on the same network I get 'unable to resolve host'
 errors in my web browser periodically when loading URLs. Usually when
 I retry it goes through, but it keeps happening sporadically as long
 as the crawlers are running.
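A retry usually goes through, so the failures look transient. One common client-side mitigation is to cache successful lookups and retry failed ones with exponential backoff. This both lowers the DNS query rate and rides out dropped replies. The sketch below uses Python rather than the crawler's Java. `CachingResolver` and its parameters are hypothetical names, and `socket.gaierror` stands in for Java's `UnknownHostException`. On the JVM, the built-in lookup cache is governed by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str], List[str]]


def system_resolver(host: str) -> List[str]:
    """Resolve via the OS resolver; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Caches successful lookups for ttl_s seconds and retries failures
    with exponential backoff. Not locked: two threads missing the cache at
    the same moment may both query, and the last result wins."""

    def __init__(self, resolve: Resolver = system_resolver, ttl_s: float = 300.0,
                 attempts: int = 3, base_delay_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._resolve = resolve
        self._ttl = ttl_s
        self._attempts = attempts
        self._base_delay = base_delay_s
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        cached = self._cache.get(host)
        if cached is not None and cached[0] > self._clock():
            return cached[1]  # fresh entry: no DNS query is sent at all
        delay = self._base_delay
        for attempt in range(1, self._attempts + 1):
            try:
                addrs = self._resolve(host)
            except socket.gaierror:
                if attempt == self._attempts:
                    raise  # still failing after the last attempt
                self._sleep(delay)  # back off: 0.5 s, 1 s, 2 s, ...
                delay *= 2
                continue
            self._cache[host] = (self._clock() + self._ttl, addrs)
            return addrs
        raise AssertionError("unreachable: attempts >= 1")
```

Each crawler thread would call `lookup(host)` before fetching. This treats the symptom: it lowers the query rate and hides transient drops, but it does not fix whatever device is dropping the replies.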

 So many things could be going wrong here. Suspecting that my provider
 was throttling DNS queries, I tried changing DNS servers, but that
 did nothing. Suspecting a bandwidth issue, I checked systat, but the
 cumulative load is well under what my line can handle.
 What else could be causing this? My network is pretty simple: Provider
 -- modem -- 2 ALIX boards running pfSense -- Servers and
 workstations. The servers are running FreeBSD, and the workstations
 run FreeBSD, Windows, and OSX.

 Has anyone encountered this before? Does anyone have any thoughts on
 what might be causing it?

 My only other thought is that maybe pfSense is doing something strange,
 so if I can't come up with any better ideas, I'll try plugging the
 servers directly into the modem. I'd rather have them behind the
 routers though, so this would be a less-than-ideal solution.

 UPDATE: OK, so it seems to be a pfSense issue. I launched the crawlers
 on 2 servers as before and waited for UnknownHostExceptions to be
 thrown. I then took a spare laptop and connected it directly into my
 modem, bypassing my 2 pfSense routers. All DNS queries have gone
 through without a hitch, so something strange is going on with
 pfSense. Can anyone think of what might be causing 

Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread A Mohan Rao
Please share your load between two pfSense servers; one is too much load
for heavy users. I have 1,200 users, which is why I installed two pfSense
boxes in my network. Since then I have never faced this type of problem.
On Mar 25, 2014 7:15 PM, David Noel david.i.n...@gmail.com wrote:

Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread Ryan Coleman
I’m perfectly content renting a DOCSIS3 from Comcast and have been doing so for 
two years.

Cost be damned - it’s worth it to not have to own it.

What model do you have? SMC? Nortel? Motorola?


On Mar 25, 2014, at 8:45 AM, David Noel david.i.n...@gmail.com wrote:

Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread Adam Thompson

On Mar 25, 2014, at 8:45 AM, David Noel david.i.n...@gmail.com wrote:

 Well, it looks like it's the cable modem after all. Under load I'm
 unable to connect to its admin panel, even when I'm directly
 connected to it. I called Comcast's technical support and had them run
 their diagnostics on it while everything was running and it failed
 miserably. The tech agreed with the conclusion that the modem was
 incapable of handling the load. So it looks like I'm in the market for
 a new cable modem. I'm not sure how to find one that will meet my
 needs though. Any DOCSIS 3 compatible modem will work on Comcast's
 network.

 Does anyone know of any models that are designed for heavy load? I'd
 probably need something that was built for networks of ~10,000 users.
 I'm not sure what sort of load 10,000 users generates, but I suspect
 it would peak around the 10-100 requests per second that my crawlers
 are putting out.

 If not, can anyone recommend a place where I might be able to find an
 answer to this question? Mailing list? Web forum? IRC channel, even?
 I'd really rather not have to pull specs on every DOCSIS 3 compatible
 modem and make a best guess based on microcontrollers/CPUs.


Short answer: no DOCSIS cable modems are designed for that kind of 
throughput!


Juniper sells MX480 routers to 10,000-customer-ISPs for ~$250k! 
(Granted, that *is* overkill, but even 10k-user corporations will have 
fairly high-end routers connected via fiber to handle that much traffic.)


Your best bet, I think, would be to find a DOCSIS 3 cable modem that can 
be put into bridging mode.  At that point, the CPU/RAM limitations of 
the cable modem are no longer relevant.


Some confirmation:
- http://jkoblovsky.wordpress.com/2012/11/21/how-to-use-your-own-router-with-rogers-docsis-3-0-upgrade/
- http://communityforums.rogers.com/t5/forums/forumtopicpage/board-id/Getting_connected/thread-id/12199
  (implies Hitron and Moto/ARRIS modems can also do bridge mode)
- http://digitalhome.ca/forum/showthread.php?t=145997page=6
  (implies SMC modem can do bridge mode)
- http://www.dslreports.com/faq/comcast/2.1_Modems#17174
  (Comcast-specific)

Once your modem is in bridge mode, the bottleneck should be your 
router.  As you've mentioned, your ALIX boxes are pretty much at their 
limit, too, so you're just moving the bottleneck around.


Apologies if I've missed something fundamental - I haven't followed this 
thread from the beginning...


--
-Adam Thompson
 athom...@athompso.net

___
List mailing list
List@lists.pfsense.org
https://lists.pfsense.org/mailman/listinfo/list


Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread David Noel
SMCD3G

On 3/25/14, Ryan Coleman ryanjc...@me.com wrote:
 I'm perfectly content renting a DOCSIS3 from Comcast and have been doing so
 for two years.

 Cost be damned - it's worth it to not have to own it.

 What model do you have? SMC? Nortel? Motorola?

Re: [pfSense] DNS resolution issues under heavy load

2014-03-25 Thread David Noel
 Short answer: no DOCSIS cable modems are designed for that kind of
 throughput!

Ugh... I've been suspecting that.

 Juniper sells MX480 routers to 10,000-customer-ISPs for ~$250k!
 (Granted, that *is* overkill, but even 10k-user corporations will have
 fairly high-end routers connected via fiber to handle that much traffic.)

Yikes. That's way outside of my budget. I suspect co-locating or
leasing a T3 are really my only options.

 Your best bet, I think, would be to find a DOCSIS 3 cable modem that can
 be put into bridging mode.  At that point, the CPU/RAM limitations of
 the cable modem are no longer relevant.

 Some confirmation:
  - http://jkoblovsky.wordpress.com/2012/11/21/how-to-use-your-own-router-with-rogers-docsis-3-0-upgrade/
  - http://communityforums.rogers.com/t5/forums/forumtopicpage/board-id/Getting_connected/thread-id/12199
    (implies Hitron and Moto/ARRIS modems can also do bridge mode)
  - http://digitalhome.ca/forum/showthread.php?t=145997page=6
    (implies SMC modem can do bridge mode)
  - http://www.dslreports.com/faq/comcast/2.1_Modems#17174
    (Comcast-specific)

 Once your modem is in bridge mode, the bottleneck should be your
 router.  As you've mentioned, your ALIX boxes are pretty much at their
 limit, too, so you're just moving the bottleneck around.

I've enabled bridging for the static IPs and it's still giving me
trouble. I think I'm going to wind up having to dig through the specs
of the highest-end cable modems I can find and buy the one with the
most CPU/RAM. Thanks for the links -- I didn't know they made 24x8s.
If any cable modem can handle the load I'm generating, I bet it'd be
one of those.

-David


Re: [pfSense] DNS resolution issues under heavy load

2014-03-20 Thread David Noel
Unsubscribe is here: http://lists.pfsense.org/mailman/listinfo/list

On 3/19/14, Edouard De Keyser edou...@ipfix.be wrote:
 Please stop your mail.
 Thank you

 Sent from my SkyTel

 On 19 March 2014, at 20:29, Chris Buechler c...@pfsense.com wrote:

Re: [pfSense] DNS resolution issues under heavy load

2014-03-19 Thread Chris Buechler
It sounds like you don't have state sync enabled on the secondary; it
won't accept the primary's states without that.

Depending on how much load you're generating with the crawlers, you
could be hitting the ALIX's limit on new connections per second.
I've seen one customer blast out 10K+ emails (and 10K+ SMTP
connections) in less than a second, which put enough load on their
ALIX pair that it failed over CARP because the primary was too busy
to send its advertisements.
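If the crawlers are pushing the ALIX past its new-connections-per-second limit, one client-side option is to cap how fast all crawler threads open connections. A token bucket is a standard way to do that. The Python sketch below is illustrative only, and the `rate` and `capacity` values you would pick are assumptions to tune against what the firewall can sustain. In Java, Guava's `RateLimiter` provides a comparable permits-per-second limiter.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Thread-safe token bucket: on average at most `rate` acquisitions per
    second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self._rate = rate
        self._capacity = capacity
        self._tokens = capacity          # start full, allowing an initial burst
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take one token if one is available; never blocks."""
        with self._lock:
            now = self._clock()
            # refill in proportion to elapsed time, capped at capacity
            self._tokens = min(self._capacity,
                               self._tokens + (now - self._last) * self._rate)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available (uses real sleeping, so it is
        meant to be used with the default clock)."""
        while not self.try_acquire():
            time.sleep(1.0 / self._rate)
```

Each worker thread would call `acquire()` on one shared bucket before opening a connection, so the total rate stays bounded however many threads run. With two servers, each server's rate would need to be lowered, or the servers coordinated.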

Though the modem theory is just as plausible, especially if the modem
is doing any kind of NAT or filtering. If you're not hitting it so
hard you're failing over CARP, that points to it being something other
than the firewall. Packet capture on WAN filtered on port 53 would be
more telling. If you see DNS queries leaving there that get no reply
back, it's not the firewall.
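The WAN check described above can be done by capturing on the WAN interface and pairing each query with its reply. On a FreeBSD box, for example, you could run `tcpdump -ni <wan_if> -w dns.pcap udp port 53`, where `<wan_if>` is a placeholder. The sketch below assumes the packets have already been decoded into a hypothetical `DnsPacket` record; decoding the pcap is not shown. It matches queries to replies by client port, resolver address and the 16-bit DNS transaction ID. The 5 s timeout is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DnsPacket:
    """One DNS packet seen on WAN (a hypothetical, already-decoded record)."""
    time_s: float
    client_port: int   # source port of the query, destination port of the reply
    server: str        # upstream resolver address
    txid: int          # 16-bit DNS transaction ID
    is_response: bool


def unanswered(packets: Iterable[DnsPacket], timeout_s: float = 5.0) -> List[DnsPacket]:
    """Queries that got no reply, or a reply later than timeout_s seconds."""
    pending: Dict[Tuple[int, str, int], DnsPacket] = {}
    lost: List[DnsPacket] = []
    for p in sorted(packets, key=lambda pkt: pkt.time_s):
        key = (p.client_port, p.server, p.txid)
        if not p.is_response:
            pending[key] = p  # a retransmission simply replaces the earlier query
            continue
        query = pending.pop(key, None)
        if query is not None and p.time_s - query.time_s > timeout_s:
            lost.append(query)  # answered, but too late to be useful
    lost.extend(pending.values())  # never answered at all
    return sorted(lost, key=lambda q: q.time_s)
```

A non-empty result for queries that were seen leaving the WAN interface would point past the firewall, at the modem or further upstream.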


On Wed, Mar 19, 2014 at 9:50 AM, David Noel david.i.n...@gmail.com wrote:
 Well, it may not be the ALIX boards after all. I connected the servers
 directly to the modem, ran the crawlers, and I'm still getting
 UnknownHostExceptions. I'm guessing my modem's to blame... I'll have
 to upgrade it and find out.

 On 3/18/14, David Noel david.i.n...@gmail.com wrote:
 Well, I bumped Maximum State Table from the default of 23,000 to
 75,000, and now it's throwing fewer UnknownHostException's. But
 they're still being thrown. My resource utilization is getting pretty
 high though. I don't think these ALIX boards can handle much more of a
 load, and I still have 2 more servers I need to scale these crawlers
 out to. I do see there's a Firewall Adaptive Timeouts setting in the
 web configurator.. this seems like it might be useful. Can anyone
 recommend any settings I should try to free up some system resources?
 I'm not clear on the consequences of purging pf state entries and
 whether that's something I'd want to do though.

 The state table on my primary router (alix1) is at roughly 50%
 utilization, or 40,000 states. The state table on my secondary router
 (alix2) is at 0%, roughly 250 states. This seems odd. Is this to be
 expected under CARP? Why is the load not distributed evenly?
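
For a rough sense of where a state count like 40,000 comes from: in steady state, Little's law says the number of live entries is about the rate at which new states are created times how long each one lives. The rate and lifetime below are purely illustrative, not measurements from this network.

```latex
N_{\text{states}} \approx \lambda \, \bar{T}, \qquad
\text{e.g. } \lambda = 100~\text{states/s},\; \bar{T} = 400~\text{s}
\;\Rightarrow\; N_{\text{states}} \approx 100 \times 400 = 40{,}000
```

This is also why shorter state timeouts shrink the table: at the same creation rate, halving the average lifetime roughly halves the number of states.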

 Memory usage on my primary router (alix1) is hovering around 55% (of
 235 MB). On my backup (alix2) it's pushing 85-90%. Does this make sense
 to anyone? Top output looks roughly the same... and now alix2 has gone
 down. 95% packet loss. Web Configurator unresponsive. ... It's back up
 but throwing 500 Internal Server Errors periodically. I've ssh'd
 into alix2 and am looking at top output; tcpdump seems to be running
 for pflog purposes, and it's hogging quite a bit of CPU. Is this
 necessary? Can I disable it somehow?

 -David

 On 3/18/14, David Noel david.i.n...@gmail.com wrote:
 I've encountered a strange issue while scaling a Java project that I'm
 not quite sure how to resolve. Any thoughts would be appreciated.

 The code is a crawler that uses HTMLUnit to crawl a bunch of pages
 concurrently. It uses HTMLUnit's getPage method to do the crawling. I'm
 running 100 threads per instance. When I have 1 instance up and
 running on 1 machine, everything is fine. When I scale it to a second
 machine, though, I start having trouble: calls to getPage keep throwing
 UnknownHostExceptions (DNS resolution errors). With 2 servers running,
 roughly 1 out of every 20 calls to getPage throws this exception. For
 some reason it's unable to resolve domain names, and it's not just
 the crawlers; my entire network starts failing on DNS queries. On
 different systems on the same network I periodically get 'unable to
 resolve host' errors in my web browser when loading URLs. Usually when
 I retry it goes through, but it keeps happening sporadically as long
 as the crawlers are running.
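
Since a retry usually succeeds, one stopgap on the crawler side is to resolve each host with a bounded retry and exponential backoff before handing the URL to getPage, and to let the JVM cache successful lookups longer through the `networkaddress.cache.ttl` security property. This is a sketch under assumptions: `RetryingResolver` and its `Lookup` interface are hypothetical, the attempt count, delays, and TTL are arbitrary, and it assumes HTMLUnit's HTTP layer resolves names through the JVM's `InetAddress` cache. It only masks the sporadic failures; it does not fix whatever is dropping the queries.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

public class RetryingResolver {

    // Abstraction over the lookup so the retry logic can be exercised without a network.
    @FunctionalInterface
    interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    // Try the lookup up to maxAttempts times, doubling the pause after each failure.
    static InetAddress resolveWithRetry(Lookup lookup, String host, int maxAttempts, long initialBackoffMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long backoff = initialBackoffMs;
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        // Cache successful lookups for 300 s (arbitrary example) so repeat hosts
        // don't generate new DNS queries; set this before the first lookup.
        Security.setProperty("networkaddress.cache.ttl", "300");
        String host = args.length > 0 ? args[0] : "example.com";
        InetAddress addr = resolveWithRetry(InetAddress::getByName, host, 4, 250);
        System.out.println(host + " -> " + addr.getHostAddress());
    }
}
```

Routing the lookup through an interface keeps the retry logic testable with a fake resolver; in the crawler it would wrap `InetAddress::getByName`, as `main` does.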

 So many things could be going wrong here. Thinking maybe my provider
 was throttling DNS queries, I tried changing DNS servers, but that
 did nothing. Thinking it might be a bandwidth issue, I checked
 systat, but the cumulative load is well under what my line can handle.
 What else could be causing this? My network is pretty simple: Provider
 -- modem -- 2 ALIX boards running pfSense -- Servers and
 workstations. The servers run FreeBSD, and the workstations
 run FreeBSD, Windows, and OS X.

 Has anyone encountered this before? Does anyone have any thoughts on
 what might be causing it?

 My only other thought is that maybe pfSense is doing something strange,
 so if I can't come up with any better ideas, I'll try plugging the
 servers directly into the modem. I'd rather have them behind the
 routers, though, so this would be a less-than-ideal solution.

 UPDATE: Ok, so it seems to be a pfSense issue. I launched the crawlers
 on 2 servers as before and waited for UnknownHostExceptions to be
 thrown. I then took a spare laptop and connected it directly to my
 modem, bypassing my 2 pfSense routers. All DNS queries have gone
 through without a hitch, so something strange

Re: [pfSense] DNS resolution issues under heavy load

2014-03-19 Thread David Noel
Thanks.

I do recall seeing some notifications in the Web Configurator about
sync failing when I had everything up and running. I'm pretty sure
there's still an issue with either the modem or the line itself, though.
I've plugged the servers directly into the modem, run the crawlers,
and DNS queries still fail. If it were purely an ALIX or pfSense issue,
bypassing them should have fixed it. It's strange that only DNS
queries fail; once the addresses resolve, the throughput is fine. At
any rate, I contacted my provider and they agreed to send out a newer,
heavier-duty modem to try. Hopefully that fixes it.

-David

On 3/19/14, Chris Buechler c...@pfsense.com wrote: