Re: Consumer networking head scratcher

2017-03-02 Thread Ryan Pugatch


On Thu, Mar 2, 2017, at 12:24 AM, Roland Dobbins wrote:
> On 2 Mar 2017, at 9:55, Oliver O'Boyle wrote:
> 
> > Currently, I have 3 devices connected. :)
> 
> You could have one or more botted machines launching outbound DDoS 
> attacks, potentially filling up the NAT translation table and/or getting 
> squelched by your broadband access provider with layer-4 granularity.  
> And the boxes themselves could be churning away due to being compromised 
> (look at CPU and memory stats over time).  Aggressive horizontal 
> scanning is often a hallmark of botted machines, and it can interrupt 
> normal network access on the botted hosts themselves.
> 
> I don't actually think that's the case, given the symptomology you 
> report, but just wanted to put it out there for the list archive.
> 
> What about DNS issues?  Are you sure that you really have a networking 
> issue, or are you having intermittent DNS resolution problems caused by 
> flaky/overloaded/attacked recursors, EDNS0 problems (i.e., filtering on 
> DNS responses > 512 bytes), or TCP/53 blockage?  Different host 
> OSes/browsers/apps exhibit differing re-query characteristics.  Are the 
> Windows boxes and the other boxes set to use the same recursors?  Can 
> you resolve DNS requests during the outages?
> 
> Are your boxes statically-addressed, or are they using DHCP?  
> Periodically-duplicate IPs can cause intermittent symptoms, too.  If 
> you're using the consumer router as a DHCP server, DHCP-lease nonsense 
> could be a contributing factor.
> 
> Are the Windows boxes running some common application/service which 
> updates and/or churns periodically?  Are they members of a Windows 
> workgroup?  All kinds of strange name-resolution stuff goes on with 
> Windows-specific networking.
> 
> Also, be sure to use -n with traceroute.  tcptraceroute is useful, too.  
> netstat -rn should work on Windows boxes, IIRC.
> 
> ---
> Roland Dobbins 

It isn't a DNS issue, as trying to access resources directly by IP
address also has the issue.

What became clear to me last night is that this actually also impacts my
Mac, and that it has to do with traffic not properly making it back to
my machines.  When the issue occurs, my traffic makes it out to the
destination, the destination responds, but that packet never makes it to
my laptop, for example.  I tested by sending traffic to a server I
control and doing PCAPs on both ends.

Thanks,
Ryan
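
The two-sided capture described above comes down to comparing which
packets were seen at each point.  A minimal sketch, assuming the ICMP
echo sequence numbers have already been extracted from each capture
into plain lists (the function name and the sample numbers are invented
for illustration and are not taken from the actual captures):

```python
def locate_loss(sent, seen_by_server, replied_by_server, received):
    """Bucket each echo sequence number by how far it got.

    Each argument is an iterable of ICMP sequence numbers taken from a
    capture: requests leaving the laptop, requests arriving at the
    server, replies leaving the server, replies arriving at the laptop.
    """
    sent, seen = set(sent), set(seen_by_server)
    replied, received = set(replied_by_server), set(received)
    return {
        "lost_outbound": sorted(sent - seen),        # never reached the server
        "unanswered": sorted(seen - replied),        # server saw it, sent no reply
        "lost_inbound": sorted(replied - received),  # reply sent, never arrived
        "ok": sorted(sent & received),               # full round trip
    }


# Hypothetical example: replies 4-6 leave the server but never arrive.
result = locate_loss(
    sent=range(1, 7),
    seen_by_server=range(1, 7),
    replied_by_server=range(1, 7),
    received=[1, 2, 3],
)
print(result["lost_inbound"])
```

An empty lost_outbound together with a non-empty lost_inbound is the
pattern reported here: requests arrive, replies are sent, and the
replies go missing somewhere on the return path.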



Re: Consumer networking head scratcher

2017-03-02 Thread Ryan Pugatch


On Thu, Mar 2, 2017, at 10:32 AM, Dann Schuler wrote:
> Just a quick sanity check here since I know we can occasionally overlook
> the simple things.  You have updated the firmware to the latest available
> version correct?  Have you checked for any odd services like QoS,
> parental controls or an IDS?  Have you tried wiping it to factory default
> and reconfiguring it?
> 
> What happens if you give the affected machine a new IP?  Could it be some
> service on the device affecting that specific IP?
> 

Yes, I've done all of these.  It was running the latest version of code
and I even tried rolling back.  Disabled SPI firewalls, IPv6, verified
QoS and parental controls are off, etc.

The issue impacts multiple devices, so it doesn't appear specific to one IP.

Thanks


RE: Consumer networking head scratcher

2017-03-02 Thread Dann Schuler
Just a quick sanity check here since I know we can occasionally overlook the 
simple things.  You have updated the firmware to the latest available version 
correct?  Have you checked for any odd services like QoS, parental controls or 
an IDS?  Have you tried wiping it to factory default and reconfiguring it?

What happens if you give the affected machine a new IP?  Could it be some 
service on the device affecting that specific IP?


-----Original Message-----
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of David Bass
Sent: Thursday, March 2, 2017 9:09 AM
To: Aaron Gould <aar...@gvtc.com>
Cc: <nanog@nanog.org> <nanog@nanog.org>
Subject: Re: Consumer networking head scratcher

This all goes away when he reconnects his old router from what I remember...

If that is the case, then I would concentrate my effort on the new router, and 
its functionality (or lack of).  Could be something simple that you are missing 
on it as a setting, or assuming it works a certain way when it does not.  
Sometimes these devices can be counterintuitive.

On Wed, Mar 1, 2017 at 1:23 PM, Aaron Gould <aar...@gvtc.com> wrote:

> That's strange... it's like the TTL on all Windows IP packets are 
> decrementing more and more as time goes on causing you to get less and 
> less hops into the internet
>
> I wonder if it's a bug/virus/malware affecting only your windows computers.
>
> -Aaron
>
>
>


Re: Consumer networking head scratcher

2017-03-02 Thread David Bass
This all goes away when he reconnects his old router from what I remember...

If that is the case, then I would concentrate my effort on the new router,
and its functionality (or lack of).  Could be something simple that you are
missing on it as a setting, or assuming it works a certain way when it does
not.  Sometimes these devices can be counterintuitive.

On Wed, Mar 1, 2017 at 1:23 PM, Aaron Gould  wrote:

> That's strange... it's like the TTL on all Windows IP packets are
> decrementing more and more as time goes on causing you to get less and less
> hops into the internet
>
> I wonder if it's a bug/virus/malware affecting only your windows computers.
>
> -Aaron
>
>
>


Re: Consumer networking head scratcher

2017-03-02 Thread Mark Wiater

On 3/1/2017 11:28 AM, Ryan Pugatch wrote:

> At random times, my Windows machines (Win 7 and Win 10, attached to the
> network via WiFi, 5GHz) lose connectivity to the Internet.  They can
> continue to access internal resources, such as the router's admin
> interface.

To the point of Windows reporting no internet access, MS does two things 
to determine if the machine has internet access, as outlined here: 
https://technet.microsoft.com/en-us/library/cc766017(v=ws.10).aspx (I 
think that's still valid)


From a console, can these two machines do the HTTP request and the DNS 
lookup when they tell you they're offline?  Can the other machines do 
these two things when the Windows machines can't or when the Windows 
machines report offline?
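
The two checks can be reproduced by hand from any machine on the LAN.
A minimal sketch, assuming the probe values from the Vista/Windows 7-era
NCSI documentation (an HTTP fetch of www.msftncsi.com/ncsi.txt expecting
the text "Microsoft NCSI", and a lookup of dns.msftncsi.com expecting
131.107.255.255).  Windows 10 uses different probe hosts, so treat these
constants as assumptions to verify against the linked page; the function
names are made up for this example.

```python
import socket
import urllib.request

# Assumed probe values (Vista/Win7-era NCSI); verify before relying on them.
NCSI_URL = "http://www.msftncsi.com/ncsi.txt"
NCSI_BODY = "Microsoft NCSI"
NCSI_DNS_NAME = "dns.msftncsi.com"
NCSI_DNS_ADDR = "131.107.255.255"


def http_probe(url=NCSI_URL, timeout=5.0):
    """Fetch the probe URL; return the body as text, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("ascii", errors="replace")
    except OSError:  # URLError, timeouts and socket errors all derive from OSError
        return None


def dns_probe(name=NCSI_DNS_NAME):
    """Resolve the probe name; return the IPv4 address, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror derives from OSError
        return None


def classify(body, addr):
    """Summarise the two probe results as a short string."""
    http_ok = body is not None and body.strip() == NCSI_BODY
    dns_ok = addr == NCSI_DNS_ADDR
    if http_ok and dns_ok:
        return "both probes pass"
    if dns_ok:
        return "DNS ok, HTTP probe failed"
    if http_ok:
        return "HTTP ok, DNS probe failed"
    return "both probes failed"
```

Calling classify(http_probe(), dns_probe()) on an affected machine and
on an unaffected one during an outage gives a like-for-like comparison
of the two checks.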





RE: Consumer networking head scratcher

2017-03-02 Thread Aaron Gould
NAT translation limits might not only be related to his first-hop NAT device
in the home; these days, with the exhaustion of IPv4, the second-hop
carrier-grade NAT (CGNAT) device in his upstream provider could be limiting
also.

I run a CGNAT for an ISP and allow 2500 ports per customer private address,
and time out those translations at 120 seconds.  It's possible to hit a
limit there.  I see it sometimes.

-Aaron
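
As a rough worked example of how those two numbers interact: if every
new flow holds a translation until the 120-second timeout expires, the
steady-state table size is the arrival rate times the entry lifetime
(Little's law).  The sketch below only does that arithmetic; the
function names are invented, and real CGNAT timers differ per protocol
and state, so this is an approximation under stated assumptions.

```python
def max_sustained_rate(port_limit, timeout_s):
    """New flows per second that just fill port_limit entries, assuming
    each flow is short and its translation then idles until timeout_s."""
    return port_limit / timeout_s


def steady_state_entries(new_per_second, timeout_s, mean_flow_s=0.0):
    """Little's law: entries = arrival rate * time each entry stays in the table."""
    return new_per_second * (mean_flow_s + timeout_s)


rate = max_sustained_rate(2500, 120)
print(f"{rate:.1f} new flows/s sustained would use the whole allocation")
```

With the figures quoted above, a little under 21 new short-lived flows
per second, sustained, is enough to use all 2500 ports.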




RE: Consumer networking head scratcher

2017-03-02 Thread Aaron Gould
What's the old router make/model ?
What's the new router make/model ?

-Aaron

-----Original Message-----
From: Ryan Pugatch [mailto:r...@lp0.org] 
Sent: Wednesday, March 1, 2017 12:27 PM
To: Aaron Gould <aar...@gvtc.com>; nanog@nanog.org
Subject: Re: Consumer networking head scratcher

The issue doesn't happen with my previous router, and I've tested multiple 
computers (one that isn't mine.)

It doesn't seem like it decrements over time.. it just dies sooner as I trace 
further up the path.  I can consistently die at the 7th hop if I try to go to 
Google, but if I trace to the 6th hop, it'll die at the 5th hop!


On Wed, Mar 1, 2017, at 01:23 PM, Aaron Gould wrote:
> That's strange... it's like the TTL on all Windows IP packets are 
> decrementing more and more as time goes on causing you to get less and 
> less hops into the internet
> 
> I wonder if it's a bug/virus/malware affecting only your windows 
> computers.
> 
> -Aaron
> 
> 



RE: Consumer networking head scratcher

2017-03-02 Thread Aaron Gould
That's strange... it's like the TTL on all Windows IP packets is decrementing 
more and more as time goes on, causing you to get fewer and fewer hops into the 
internet

I wonder if it's a bug/virus/malware affecting only your windows computers.

-Aaron




Re: Consumer networking head scratcher

2017-03-01 Thread Chuck Anderson
On Thu, Mar 02, 2017 at 12:24:38PM +0700, Roland Dobbins wrote:
> On 2 Mar 2017, at 9:55, Oliver O'Boyle wrote:
> 
> >Currently, I have 3 devices connected. :)
> 
> What about DNS issues?  Are you sure that you really have a
> networking issue, or are you having intermittent DNS resolution
> problems caused by flaky/overloaded/attacked recursors, EDNS0

This reminded me of another possibility related to NAT table
exhaustion.  Are you running a full recursive resolver on a system
behind the NAT?  Especially one like unbound possibly w/dnssec?  I had
some strange issues caused during the time when unbound was priming
its cache from a cold start...


Re: Consumer networking head scratcher

2017-03-01 Thread Roland Dobbins

On 2 Mar 2017, at 9:55, Oliver O'Boyle wrote:


> Currently, I have 3 devices connected. :)


You could have one or more botted machines launching outbound DDoS 
attacks, potentially filling up the NAT translation table and/or getting 
squelched by your broadband access provider with layer-4 granularity.  
And the boxes themselves could be churning away due to being compromised 
(look at CPU and memory stats over time).  Aggressive horizontal 
scanning is often a hallmark of botted machines, and it can interrupt 
normal network access on the botted hosts themselves.


I don't actually think that's the case, given the symptomology you 
report, but just wanted to put it out there for the list archive.


What about DNS issues?  Are you sure that you really have a networking 
issue, or are you having intermittent DNS resolution problems caused by 
flaky/overloaded/attacked recursors, EDNS0 problems (i.e., filtering on 
DNS responses > 512 bytes), or TCP/53 blockage?  Different host 
OSes/browsers/apps exhibit differing re-query characteristics.  Are the 
Windows boxes and the other boxes set to use the same recursors?  Can 
you resolve DNS requests during the outages?


Are your boxes statically-addressed, or are they using DHCP?  
Periodically-duplicate IPs can cause intermittent symptoms, too.  If 
you're using the consumer router as a DHCP server, DHCP-lease nonsense 
could be a contributing factor.


Are the Windows boxes running some common application/service which 
updates and/or churns periodically?  Are they members of a Windows 
workgroup?  All kinds of strange name-resolution stuff goes on with 
Windows-specific networking.


Also, be sure to use -n with traceroute.  tcptraceroute is useful, too.  
netstat -rn should work on Windows boxes, IIRC.


---
Roland Dobbins 
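
The EDNS0 and TCP/53 questions above can be checked directly by sending
the same query over UDP and over TCP and comparing the results.  A
minimal sketch using only the standard library: it hand-builds a basic
DNS query (optionally with an EDNS0 OPT record) rather than using a
resolver library.  The function names are invented for this example,
and the server address is whichever recursor the hosts are configured
to use.

```python
import socket
import struct


def build_query(name, qtype=1, edns_size=None, query_id=0x1234):
    """Build a DNS query for name (qtype 1 = A), with RD set.
    If edns_size is given, append an EDNS0 OPT record advertising it."""
    arcount = 1 if edns_size else 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    opt = b""
    if edns_size:
        # root name, TYPE 41 (OPT), CLASS = UDP payload size, TTL 0, RDLEN 0
        opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0)
    return header + question + opt


def is_truncated(response):
    """True if the TC bit is set, i.e. the server wants a retry over TCP."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def query_udp(server, payload, timeout=3.0):
    """Send payload to server:53 over UDP and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(payload, (server, 53))
        data, _ = s.recvfrom(65535)
        return data


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server, payload, timeout=3.0):
    """Send payload to server:53 over TCP (2-byte length prefix) and return the response."""
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(payload)) + payload)
        length = struct.unpack("!H", _recv_exact(s, 2))[0]
        return _recv_exact(s, length)
```

If query_udp() with edns_size=4096 times out while the plain query
works, something in the path may be filtering large responses; if
query_tcp() fails while UDP works, TCP/53 may be blocked.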


Re: Consumer networking head scratcher

2017-03-01 Thread Oliver O'Boyle
Next -->

On March 1, 2017, at 9:31 PM, Ryan Pugatch wrote:

> On Wed, Mar 1, 2017, at 09:29 PM, Oliver O'Boyle wrote:
>
> > Each device associated with the AP consumes memory. Small low-end
> > routers don't typically come with much memory. If you've got a lot of
> > devices associated with the AP you will run out of memory. I'm not
> > sure how many devices you're connecting, though. Three will not cause
> > this problem. 30 might.
> >
> > O.
>
> Currently, I have 3 devices connected. :)




Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch




On Wed, Mar 1, 2017, at 09:29 PM, Oliver O'Boyle wrote:

> Each device associated with the AP consumes memory. Small low-end
> routers don't typically come with much memory. If you've got a lot of
> devices associated with the AP you will run out of memory. I'm not
> sure how many devices you're connecting, though. Three will not cause
> this problem. 30 might.
>
> O.



Currently, I have 3 devices connected. :)




Re: Consumer networking head scratcher

2017-03-01 Thread Oliver O'Boyle
Each device associated with the AP consumes memory. Small low-end routers
don't typically come with much memory. If you've got a lot of devices
associated with the AP you will run out of memory. I'm not sure how many
devices you're connecting, though. Three will not cause this problem. 30
might.

O.

On Wed, Mar 1, 2017 at 9:22 PM, Ryan Pugatch  wrote:

>
>
> On Wed, Mar 1, 2017, at 06:35 PM, Jean-Francois Mezei wrote:
> > On 2017-03-01 11:28, Ryan Pugatch wrote:
> >
> > > At random times, my Windows machines (Win 7 and Win 10, attached to the
> > > network via WiFi, 5GHz) lose connectivity to the Internet.
> >
> > > For what it's worth, the router is a Linksys EA7300 that I just picked
> > > up.
> >
> >
> > > Way back when, I had a Netgear router. It ended up having a limit on
> > > its NAT translation table, and when I had too many connections going
> > > at the same time (or not yet timed out), I would lose connection.
> > > There was an unofficial patch to the firmware (literally a patch in
> > > the code that defined the table size) to increase that table to 1000,
> > > as I recall.
> >
> > Does the Linksys have a means to display the NAT translation table and
> > see if maybe connections are lost when that table is full and lots of
> > connections have not yet timed out ?
> >
>
>
> It doesn't seem to provide visibility into the NAT tables.  However, I'm
> starting to think you might be on to something.
>
> The issue actually happened to my Mac tonight, and sure enough the
> traceroute dies at the same time.  So, it isn't just the Windows
> machines impacted.
>
> I did a packet capture on my end, and on a server somewhere that I
> control and sent pings from my laptop to the server.
>
> The server received my ICMP packets and responded, but those responses
> never made it back to my laptop.
>
> Meanwhile, my Roku is actively streaming from the Internet, so it's not
> like the Internet was down.
>



-- 
:o@>


Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch


On Wed, Mar 1, 2017, at 06:35 PM, Jean-Francois Mezei wrote:
> On 2017-03-01 11:28, Ryan Pugatch wrote:
> 
> > At random times, my Windows machines (Win 7 and Win 10, attached to the
> > network via WiFi, 5GHz) lose connectivity to the Internet. 
> 
> > For what it's worth, the router is a Linksys EA7300 that I just picked
> > up.
> 
> 
> Way back when, I had a Netgear router. It ended up having a limit on its
> NAT translation table, and when I had too many connections going at the
> same time (or not yet timed out), I would lose connection. There was an
> unofficial patch to the firmware (literally a patch in the code that
> defined the table size) to increase that table to 1000, as I recall.
> 
> Does the Linksys have a means to display the NAT translation table and
> see if maybe connections are lost when that table is full and lots of
> connections have not yet timed out ?
> 


It doesn't seem to provide visibility into the NAT tables.  However, I'm
starting to think you might be on to something.

The issue actually happened to my Mac tonight, and sure enough the
traceroute dies at the same time.  So, it isn't just the Windows
machines impacted.

I did a packet capture on my end, and on a server somewhere that I
control and sent pings from my laptop to the server.

The server received my ICMP packets and responded, but those responses
never made it back to my laptop.

Meanwhile, my Roku is actively streaming from the Internet, so it's not
like the Internet was down.


Re: Consumer networking head scratcher

2017-03-01 Thread Jean-Francois Mezei
On 2017-03-01 11:28, Ryan Pugatch wrote:

> At random times, my Windows machines (Win 7 and Win 10, attached to the
> network via WiFi, 5GHz) lose connectivity to the Internet. 

> For what it's worth, the router is a Linksys EA7300 that I just picked
> up.


Way back when, I had a Netgear router. It ended up having a limit on its
NAT translation table, and when I had too many connections going at the
same time (or not yet timed out), I would lose connection. There was an
unofficial patch to the firmware (literally a patch in the code that
defined the table size) to increase that table to 1000, as I recall.

Does the Linksys have a means to display the NAT translation table and
see if maybe connections are lost when that table is full and lots of
connections have not yet timed out ?
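
To illustrate the failure mode being described, here is a toy model of
a fixed-size translation table with an idle timeout.  It is a sketch
under assumptions, not how any particular Netgear or Linksys firmware
works: the class and method names are invented, and real routers differ
in what they do when the table is full (drop new flows, evict old
entries, and so on).  This model refuses new flows.

```python
class NatTable:
    """Toy NAT table: at most `capacity` flows, each expiring after
    `idle_timeout` seconds without traffic."""

    def __init__(self, capacity, idle_timeout):
        self.capacity = capacity
        self.idle_timeout = idle_timeout
        self.entries = {}  # flow key -> time last seen

    def _expire(self, now):
        """Drop entries that have been idle for idle_timeout or longer."""
        cutoff = now - self.idle_timeout
        self.entries = {k: t for k, t in self.entries.items() if t > cutoff}

    def outbound(self, flow, now):
        """Record an outbound packet; return False if there is no free slot."""
        self._expire(now)
        if flow not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[flow] = now
        return True

    def inbound(self, flow, now):
        """Return True if a reply for this flow can still be translated back."""
        self._expire(now)
        return flow in self.entries


table = NatTable(capacity=2, idle_timeout=60)
print(table.outbound("a", now=0))    # slot available
print(table.outbound("b", now=1))    # slot available
print(table.outbound("c", now=2))    # table full, new flow refused
print(table.outbound("c", now=100))  # earlier entries have timed out
```

Existing flows keep working while new ones fail, which is one way a
long-running stream can continue while other devices appear to lose
connectivity.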



Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch


On Wed, Mar 1, 2017, at 03:58 PM, iam...@gmail.com wrote:
> On many non-Windows OSes (Mac OSX, Linux, FreeBSD etc.) you can specify
> ICMP traceroute using -I:
> 
> traceroute -I google.com
> 
> I wonder if this would replicate your experience with Windows tracert


Definitely on my list to test.

Thanks.


Re: Consumer networking head scratcher

2017-03-01 Thread iam...@gmail.com
On many non-Windows OSes (Mac OSX, Linux, FreeBSD etc.) you can specify ICMP
traceroute using -I:

traceroute -I google.com

I wonder if this would replicate your experience with Windows tracert


Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch


On Wed, Mar 1, 2017, at 02:57 PM, William Herrin wrote:
> On Wed, Mar 1, 2017 at 2:31 PM, Ryan Pugatch  wrote:
> > So in that case, I would be back to my original issue where I stop being
> > able to pass traffic to the Internet, and when that happens my
> > traceroute always dies at the same hop.  After disconnecting and
> > reconnecting, the same traceroute will go all the way through.
> 
> Hi Ryan,
> 
> Next step: run Wireshark and see what you see during the traceroutes.
> Are they leaving with a reasonable TTL? Is it certain that nothing
> returns? Are the packets going to the ethernet MAC address you expect
> them to?
> 
> I had a fun problem once when I cloned some VMs but neglected to
> change the source MAC address. They all seemed to work under light
> load but get two downloading at once and suddenly they both
> experienced major packet loss.
> 
> Regards,
> Bill
> 

Definitely the direction I'm going.  Even aside from the traceroutes,
I'm going to capture some regular web traffic to see what is happening. 
Planning to send traffic to a machine I control to see if any packets
are actually making it through at all.

I'm not sure if this new Linksys router has any packet capture ability
that is exposed to the end user, but I'd also love to be able to see
what's actually going through the router itself.

Thanks,
Ryan


Re: Consumer networking head scratcher

2017-03-01 Thread William Herrin
On Wed, Mar 1, 2017 at 2:31 PM, Ryan Pugatch  wrote:
> So in that case, I would be back to my original issue where I stop being
> able to pass traffic to the Internet, and when that happens my
> traceroute always dies at the same hop.  After disconnecting and
> reconnecting, the same traceroute will go all the way through.

Hi Ryan,

Next step: run Wireshark and see what you see during the traceroutes.
Are they leaving with a reasonable TTL? Is it certain that nothing
returns? Are the packets going to the ethernet MAC address you expect
them to?

I had a fun problem once when I cloned some VMs but neglected to
change the source MAC address. They all seemed to work under light
load but get two downloading at once and suddenly they both
experienced major packet loss.

Regards,
Bill



-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Owner, Dirtside Systems . Web: 


Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch


On Wed, Mar 1, 2017, at 02:04 PM, William Herrin wrote:
> > On Wed, Mar 1, 2017, at 01:23 PM, Aaron Gould wrote:
> >> That's strange... it's like the TTL on all Windows IP packets are
> >> decrementing more and more as time goes on causing you to get less and
> >> less hops into the internet
> 
> Hi Ryan,
> 
> Windows tracert uses ICMP echo-request packets to trace the path. It
> expects either an ICMP destination unreachable message or an ICMP echo
> response message to come back. The final hop in the trace will return
> an ICMP echo-response or an unreachable-prohibited. The ones prior to
> the final hop will return an unreachable-time-exceeded if they return
> anything at all.
> 
> If the destination does not respond to ping, if those pings are
> dropped, or if it responds with an unreachable that's dropped you will
> not receive a response and the tracert will not find its end. That's
> why you're seeing the "decrementing" behavior you describe.
> 
> I have no information about whether comcast blocks pings to its routers.
> 
> Regards,
> Bill Herrin
> 

I see what you're saying, and that could explain the decrementing
behavior I'm seeing which ultimately is not a real indicator of the
problem I am having.

So in that case, I would be back to my original issue where I stop being
able to pass traffic to the Internet, and when that happens my
traceroute always dies at the same hop.  After disconnecting and
reconnecting, the same traceroute will go all the way through.

Thanks for the thoughts.


Re: Consumer networking head scratcher

2017-03-01 Thread valdis . kletnieks
On Wed, 01 Mar 2017 14:04:07 -0500, William Herrin said:

> I have no information about whether comcast blocks pings to its routers.

All the Comcast gear in the path from my home router to non-Comcast addresses
will quite cheerfully answer both pings and traceroutes, albeit rate-limited.




Re: Consumer networking head scratcher

2017-03-01 Thread William Herrin
> On Wed, Mar 1, 2017, at 01:23 PM, Aaron Gould wrote:
>> That's strange... it's like the TTL on all Windows IP packets are
>> decrementing more and more as time goes on causing you to get less and
>> less hops into the internet

Hi Ryan,

Windows tracert uses ICMP echo-request packets to trace the path. It
expects either an ICMP destination unreachable message or an ICMP echo
response message to come back. The final hop in the trace will return
an ICMP echo-response or an unreachable-prohibited. The ones prior to
the final hop will return an unreachable-time-exceeded if they return
anything at all.

If the destination does not respond to ping, if those pings are
dropped, or if it responds with an unreachable that's dropped you will
not receive a response and the tracert will not find its end. That's
why you're seeing the "decrementing" behavior you describe.

I have no information about whether comcast blocks pings to its routers.

Regards,
Bill Herrin



-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Owner, Dirtside Systems . Web: 
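
The reply handling described above can be summarised by ICMP type and
code.  One detail worth noting: the TTL-expired message sent by
intermediate hops is its own ICMP type (11, time exceeded) rather than
a kind of destination unreachable (type 3).  A minimal sketch of how a
trace might interpret each reply; the function names are invented, and
this is not the actual tracert implementation.

```python
import struct


def classify_icmp(icmp_bytes):
    """Interpret the first two bytes of an ICMP message (type, code)
    the way a traceroute would."""
    icmp_type, code = struct.unpack("!BB", icmp_bytes[:2])
    if icmp_type == 0:
        return "echo reply: destination reached"
    if icmp_type == 11 and code == 0:
        return "time exceeded in transit: intermediate hop"
    if icmp_type == 3:
        # e.g. code 13 = communication administratively prohibited
        return f"destination unreachable (code {code}): trace ends here"
    return f"other ICMP type {icmp_type}, code {code}"


def hop_result(reply):
    """None means nothing came back before the timeout, shown as '*'."""
    return "*" if reply is None else classify_icmp(reply)


print(hop_result(bytes([11, 0])))
print(hop_result(None))
```

A hop that forwards traffic but never answers pings addressed to itself
shows up as "*" when it is the target of the trace, even though it
reported time-exceeded when it was only a hop along the way.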


Re: Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch
The issue doesn't happen with my previous router, and I've tested
multiple computers (one that isn't mine.)

It doesn't seem like it decrements over time.. it just dies sooner as I
trace further up the path.  I can consistently die at the 7th hop if I
try to go to Google, but if I trace to the 6th hop, it'll die at the 5th
hop!


On Wed, Mar 1, 2017, at 01:23 PM, Aaron Gould wrote:
> That's strange... it's like the TTL on all Windows IP packets are
> decrementing more and more as time goes on causing you to get less and
> less hops into the internet
> 
> I wonder if it's a bug/virus/malware affecting only your windows
> computers.
> 
> -Aaron
> 
> 


Consumer networking head scratcher

2017-03-01 Thread Ryan Pugatch
Hi everyone,

I've got a real head scratcher that I have come across after replacing
the router on my home network.

I thought I'd share because it is a fascinating issue to me.

At random times, my Windows machines (Win 7 and Win 10, attached to the
network via WiFi, 5GHz) lose connectivity to the Internet.  They can
continue to access internal resources, such as the router's admin
interface.  Other devices including Macs, iPhones, Android phones, and
Rokus never have this issue.

I realized that on the Windows machines, when the connection drops, if I
run a traceroute, it dies at a certain hop every time (out in Comcast's
network, who is my ISP) even though a Mac sitting right next to it is
able to go all the way through to the destination.

The even stranger thing I discovered last night is that if I trace to
the hop before the hop that it dies at, it then dies at the hop before
that (and as I trace to closer and closer hops, it dies the hop before
that!)

This is illustrated in the traces I've captured here:
http://pastebin.com/raw/R1UHLi0U

For what it's worth, the router is a Linksys EA7300 that I just picked
up.

I can't even imagine what would cause this issue at this point.  If
anyone has any thoughts, I'd love to hear them!

I'm going to start studying some packet captures to see if I can spot an
issue.

Best,
Ryan