[jira] [Commented] (TS-1087) TSHttpTxnOutgoingAddrSet forward declaration does not match implementation

2012-03-07 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224817#comment-13224817
 ] 

B Wyatt commented on TS-1087:
-

I am not at HEAD at the moment, but at least in my version the precedent set by 
the rest of the API has socklen_t passed in as a parameter.  Is that still the 
case?

I can see the argument either way: the addition of a socklen_t parameter at 
least gives the backend a fighting chance of not reading invalid memory if a 
plugin calls in with a malformed socket address (like a sockaddr_in with 
AF_INET6 for a family).  When the data is correct, the extra parameter is 
useless.

FWIW, the signature of the implementation includes the socklen_t, so either 
nobody is using this function with a recent version of Traffic Server (the 
symbol is unresolved at library load time) -or- they are already passing the 
socklen_t parameter and counting on a rogue forward declaration or voodoo to 
link it.
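
For illustration, here is a minimal sketch of the kind of sanity check the 
extra socklen_t makes possible on the backend side.  This is not the actual 
Traffic Server code; the helper name and structure are made up:
{code}
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical backend-side check: with only a sockaddr* the backend has to
 * trust sa_family blindly; with an addrlen it can at least refuse a
 * mismatched pair such as a sockaddr_in whose family field says AF_INET6. */
static int addr_len_matches_family(const struct sockaddr *addr, socklen_t addrlen)
{
  switch (addr->sa_family) {
  case AF_INET:
    return addrlen >= (socklen_t)sizeof(struct sockaddr_in);
  case AF_INET6:
    return addrlen >= (socklen_t)sizeof(struct sockaddr_in6);
  default:
    return 0; /* unknown family: never read past addrlen bytes */
  }
}
{code}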



 TSHttpTxnOutgoingAddrSet forward declaration does not match implementation
 --

 Key: TS-1087
 URL: https://issues.apache.org/jira/browse/TS-1087
 Project: Traffic Server
  Issue Type: Bug
  Components: TS API
Affects Versions: 3.1.0
Reporter: B Wyatt
Assignee: B Wyatt
Priority: Trivial
 Fix For: 3.1.5

 Attachments: txn-outgoing-addr.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 ts.h.in lists the following declaration:
 {code}
 TSReturnCode TSHttpTxnOutgoingAddrSet(TSHttpTxn txnp, struct sockaddr const* addr);
 {code}
 However, the current implementation has this function signature:
 {code}
 tsapi TSReturnCode TSHttpTxnOutgoingAddrSet(TSHttpTxn txnp, struct sockaddr const* addr, socklen_t addrlen);
 {code}
 Traffic Server is unable to load plugins that use this function due to the 
 unresolved symbol.
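
For reference, a plugin call against the three-argument form that the 
implementation actually exports would look roughly like the sketch below.  This 
is only an illustration: it assumes txnp came from a transaction hook, and 
192.0.2.1 merely stands in for whatever outgoing source address is wanted.
{code}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <ts/ts.h>

static void set_outgoing_addr(TSHttpTxn txnp)
{
  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port   = 0;                      /* let the stack pick a port */
  inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);

  /* Third argument is the socklen_t the implementation expects. */
  if (TSHttpTxnOutgoingAddrSet(txnp, (struct sockaddr *)&addr,
                               sizeof(addr)) != TS_SUCCESS) {
    TSError("TSHttpTxnOutgoingAddrSet failed");
  }
}
{code}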

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-1075) Port range bottleneck in transparent proxy mode

2012-03-07 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224821#comment-13224821
 ] 

B Wyatt commented on TS-1075:
-

I will investigate, as this will no doubt bite me as well (it may be biting me 
already).

 Port range bottleneck in transparent proxy mode
 ---

 Key: TS-1075
 URL: https://issues.apache.org/jira/browse/TS-1075
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.0.1
 Environment: Centos 5.6, kernel 2.6.39.2 compiled with TPROXY support
 ATS compiled as: ./configure --enable-tproxy 
Reporter: Danny Shporer
Assignee: B Wyatt
 Fix For: 3.1.3

 Attachments: ports.patch


 The Linux TPROXY stack only takes the local addresses into account when using 
 dynamic bind (a bind without specifying a port).  This limits the usable 
 ports to the local port range (around 30K by default, extensible to around 
 64K); together with the Linux TIME-WAIT handling of released ports, this 
 causes a bottleneck.
 One symptom of this is that traffic_cop cannot open a connection to the 
 server to monitor it (it gets error 99 - address already in use) and kills 
 it.
 Another issue arises when opening the connection to the server.
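
For context, a rough sketch (not ATS code) of the bind pattern the report is 
describing.  With IP_TRANSPARENT set, the proxy binds to the client's (foreign) 
address; passing port 0 is the "dynamic bind" above, where the kernel picks 
from the local port range without regard to the bound address, which is where 
the bottleneck comes from.  Picking the port explicitly per bound address is 
one way around it; whether ports.patch does exactly that I have not verified.
{code}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustration only.  Requires CAP_NET_ADMIN and a TPROXY-capable kernel. */
int open_transparent_socket(const char *client_ip, uint16_t port)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  int one = 1;
  if (setsockopt(fd, SOL_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0) {
    close(fd);
    return -1;
  }

  struct sockaddr_in local;
  memset(&local, 0, sizeof(local));
  local.sin_family = AF_INET;
  local.sin_port   = htons(port);  /* 0 = dynamic bind (the bottleneck case) */
  inet_pton(AF_INET, client_ip, &local.sin_addr);

  if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
{code}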

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-949) key-volume hash table is not consistent when a disk is marked as bad or removed due to failure

2011-12-14 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169622#comment-13169622
 ] 

B Wyatt commented on TS-949:


Opened a new bug, TS-1050, which refers to this bug and addresses the problem 
of data loss on volume addition.

 key-volume hash table is not consistent when a disk is marked as bad or 
 removed due to failure
 ---

 Key: TS-949
 URL: https://issues.apache.org/jira/browse/TS-949
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.0
 Environment: Multi-volume cache with apparently faulty drives
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 3.1.2

 Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, 
 explicit-pair.patch


 The method for resolving collisions when distributing hash-table space to 
 volumes for the object_key-volume hash table creates inconsistency when a 
 disk is determined to be bad, or when a failed disk is removed from the 
 volume.config.
 Background:
 The hash space is distributed by a round-robin draft in which each volume 
 drafts a random index in the hash table until the hash space is exhausted.  
 The random order in which a given volume drafts hash-table slots is 
 consistent across reboot/crash/disk failure; however, when a volume attempts 
 to draft a slot that is already occupied, it skips to its next random pick 
 and tries that one, repeating until it finds an open slot.  This ensures that 
 the hash is partitioned evenly between volumes.
 The issue:
 Resolving slot contention breaks the consistency because it depends on the 
 order in which the volumes draft.  When rebuilding the hash after a disk 
 failure, or after a reboot with fewer drives, a volume may secure an index 
 that was previously occupied by the dead disk.  In the old hash, the 
 surviving volume would have selected another random index due to contention.  
 If this index is taken, then by the next draft round it will represent an 
 inconsistent key-volume result.  The effect of one inconsistency then 
 cascades, because whichever volume occupies that index after the dead disk 
 is removed is now behind on its draft sequence as well.
 An Example:
 ||Disk||Draft Sequence||
 |A|1,4,7,5|
 |B|4,2,8,1|
 |C|3,7,5,2|
 Pre-failure hash table after 2 rounds of draft:
 |A|B|C|B|C|?|A|?|
 Post-failure of drive B, hash table after 3 rounds of draft:
 |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
 Two slots have become inconsistent and more will probably follow.  These 
 inconsistencies are objects stored in a volume but lost to the top-level 
 cache for open/lookup.
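
To make the cascade concrete, here is a toy model of the draft as described 
above (my own reading, in simplified C++; it is not the real cache code and 
the real table layout differs).  Run with the example sequences, and with -1 
standing in for the "?" slots, it shows several slots flipping owner once 
drive B's sequence is removed:
{code}
#include <cstddef>
#include <cstdio>
#include <vector>

// Round-robin draft: each volume walks its fixed pick sequence, skipping
// picks that are already occupied (the contention rule described above).
static void draft(const std::vector<std::vector<int>> &seq, int slots,
                  std::vector<int> &table)
{
  table.assign(slots, -1);
  std::vector<std::size_t> next(seq.size(), 0);
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (std::size_t vol = 0; vol < seq.size(); ++vol) {
      while (next[vol] < seq[vol].size() && table[seq[vol][next[vol]]] != -1)
        ++next[vol];                          // skip occupied picks
      if (next[vol] < seq[vol].size()) {
        table[seq[vol][next[vol]++]] = (int)vol;
        progressed = true;
      }
    }
  }
}

int main()
{
  // Draft sequences from the example table, converted to 0-based slots.
  std::vector<std::vector<int>> all = {{0, 3, 6, 4}, {3, 1, 7, 0}, {2, 6, 4, 1}};
  std::vector<std::vector<int>> noB = {{0, 3, 6, 4}, {2, 6, 4, 1}};
  std::vector<int> before, after;
  draft(all, 8, before);  // A, B, C present
  draft(noB, 8, after);   // drive B removed: several slots change owner
  for (int i = 0; i < 8; ++i)
    std::printf("slot %d: %d -> %d\n", i, before[i], after[i]);
  return 0;
}
{code}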

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-949) key-volume hash table is not consistent when a disk is marked as bad or removed due to failure

2011-12-12 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167597#comment-13167597
 ] 

B Wyatt commented on TS-949:


Thanks John.  I think the new patch should be more stable.  I apologize for 
misreading the previous patch; all of my volumes are matched in size, so I had 
erroneously tuned out the inclusion of vol-len in the initial value of 
forvol[i].

While I am not an enforcer of code quality, I think the particulars of this 
method should at the very least be documented in the patched code.  I'll let 
someone else decide whether it is worth the effort to pretty it up.

All of this digging has brought up a new related issue (one that I am pretty 
sure we cannot address at this level): object loss when adding volumes.  The 
hash is now consistent; however, when a new volume supersedes an existing 
volume in the hash, any object that maps to that bucket but is currently 
stored on the old volume becomes inaccessible.  I will probably create a new 
issue for that, as this one is solved in my book.

 key-volume hash table is not consistent when a disk is marked as bad or 
 removed due to failure
 ---

 Key: TS-949
 URL: https://issues.apache.org/jira/browse/TS-949
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.0
 Environment: Multi-volume cache with apparently faulty drives
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 3.1.2

 Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch


 The method for resolving collisions when distributing hash-table space to 
 volumes for the object_key-volume hash table creates inconsistency when a 
 disk is determined to be bad, or when a failed disk is removed from the 
 volume.config.
 Background:
 The hash space is distributed by a round-robin draft in which each volume 
 drafts a random index in the hash table until the hash space is exhausted.  
 The random order in which a given volume drafts hash-table slots is 
 consistent across reboot/crash/disk failure; however, when a volume attempts 
 to draft a slot that is already occupied, it skips to its next random pick 
 and tries that one, repeating until it finds an open slot.  This ensures that 
 the hash is partitioned evenly between volumes.
 The issue:
 Resolving slot contention breaks the consistency because it depends on the 
 order in which the volumes draft.  When rebuilding the hash after a disk 
 failure, or after a reboot with fewer drives, a volume may secure an index 
 that was previously occupied by the dead disk.  In the old hash, the 
 surviving volume would have selected another random index due to contention.  
 If this index is taken, then by the next draft round it will represent an 
 inconsistent key-volume result.  The effect of one inconsistency then 
 cascades, because whichever volume occupies that index after the dead disk 
 is removed is now behind on its draft sequence as well.
 An Example:
 ||Disk||Draft Sequence||
 |A|1,4,7,5|
 |B|4,2,8,1|
 |C|3,7,5,2|
 Pre-failure hash table after 2 rounds of draft:
 |A|B|C|B|C|?|A|?|
 Post-failure of drive B, hash table after 3 rounds of draft:
 |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
 Two slots have become inconsistent and more will probably follow.  These 
 inconsistencies are objects stored in a volume but lost to the top-level 
 cache for open/lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-949) key-volume hash table is not consistent when a disk is marked as bad or removed due to failure

2011-12-06 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163641#comment-13163641
 ] 

B Wyatt commented on TS-949:


Thanks John, this scheme certainly handles disks failing or being added to the 
cache in a deterministic way.  I tend to agree with you that the extra effort 
to guarantee an equal distribution of hash buckets is of questionable value.

It does look like there is some cruft in the patch.  The score is multiplied by 
a value that is almost constant across the volumes and divided by an integer 
constant.  The comments indicate that this may have been an attempt to even out 
the distribution, but as it would cause the same type of inconsistency on disk 
loss as the previous scheme, I assume it was disabled on purpose (by never 
decrementing forvol[*]).  Either way, the result of the comparison is currently 
the same as the un-multiplied, un-divided comparison if the integer truncation 
is not significant.

Also, I think ttable[i] = top; should be ttable[i] = mapping[top]; as the 
range of valid volume indices has holes when one or more disks have been 
declared bad.
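
To spell out what I mean (a toy rendering of my own, not the patch; only the 
names ttable and mapping are taken from it, and the surrounding structure is 
made up): top indexes the compacted list of live volumes, while the hash table 
needs the volume's original index, so writing top directly misassigns slots 
whenever a bad disk has left a hole.
{code}
#include <vector>

// Suppose volumes 0..3 exist but volume 2 has been declared bad.  The draft
// works over the compacted live list {0, 1, 3}, so "top" ranges over 0..2 and
// mapping = {0, 1, 3} translates a compacted index back to the original one.
void assign_slot(std::vector<int> &ttable, int slot, int top,
                 const std::vector<int> &mapping)
{
  // ttable[slot] = top;          // wrong: records the compacted index
  ttable[slot] = mapping[top];    // right: record the original volume index
}
{code}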

 key-volume hash table is not consistent when a disk is marked as bad or 
 removed due to failure
 ---

 Key: TS-949
 URL: https://issues.apache.org/jira/browse/TS-949
 Project: Traffic Server
  Issue Type: Bug
  Components: Cache
Affects Versions: 3.1.0
 Environment: Multi-volume cache with apparently faulty drives
Reporter: B Wyatt
Assignee: John Plevyak
 Fix For: 3.1.2

 Attachments: TS-949-jp-1.patch


 The method for resolving collisions when distributing hash-table space to 
 volumes for the object_key-volume hash table creates inconsistency when a 
 disk is determined to be bad, or when a failed disk is removed from the 
 volume.config.
 Background:
 The hash space is distributed by a round-robin draft in which each volume 
 drafts a random index in the hash table until the hash space is exhausted.  
 The random order in which a given volume drafts hash-table slots is 
 consistent across reboot/crash/disk failure; however, when a volume attempts 
 to draft a slot that is already occupied, it skips to its next random pick 
 and tries that one, repeating until it finds an open slot.  This ensures that 
 the hash is partitioned evenly between volumes.
 The issue:
 Resolving slot contention breaks the consistency because it depends on the 
 order in which the volumes draft.  When rebuilding the hash after a disk 
 failure, or after a reboot with fewer drives, a volume may secure an index 
 that was previously occupied by the dead disk.  In the old hash, the 
 surviving volume would have selected another random index due to contention.  
 If this index is taken, then by the next draft round it will represent an 
 inconsistent key-volume result.  The effect of one inconsistency then 
 cascades, because whichever volume occupies that index after the dead disk 
 is removed is now behind on its draft sequence as well.
 An Example:
 ||Disk||Draft Sequence||
 |A|1,4,7,5|
 |B|4,2,8,1|
 |C|3,7,5,2|
 Pre-failure hash table after 2 rounds of draft:
 |A|B|C|B|C|?|A|?|
 Post-failure of drive B, hash table after 3 rounds of draft:
 |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
 Two slots have become inconsistent and more will probably follow.  These 
 inconsistencies are objects stored in a volume but lost to the top-level 
 cache for open/lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-996) HTTPHdr::m_host goes stale if HdrHeap::evacuate_from_str_heaps is called

2011-10-21 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132822#comment-13132822
 ] 

B Wyatt commented on TS-996:


I am auditioning a fast/dirty fix for this that caches a MIMEField pointer 
instead of the string pointer if the host comes from the MIMEHdr.  For hosts in 
the URL, it uses m_cached_url's copy instead of a top-level copy, which should 
be immune to heap changes.

Ideally, I think that code is due for a bit of cleaning, but this will 
hopefully suffice for now.

The patch is attached as m_host.patch.
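
Roughly, the idea looks like the sketch below.  This is not the actual HTTPHdr 
code and the class shapes here are invented; it only illustrates caching the 
owning field rather than the raw string pointer:
{code}
// Toy sketch only.  Instead of caching the char* (which goes stale when
// HdrHeap::evacuate_from_str_heaps moves strings), cache the owning field
// object and re-resolve the string on every access.
struct MIMEFieldLike {
  const char *value_get(int *len) const;   // always returns the current string
};

struct HostCacheLike {
  const MIMEFieldLike *m_host_field = nullptr;  // survives string moves
  // const char *m_host;                        // old scheme: goes stale

  const char *host_get(int *len) const
  {
    // Resolving through the field each time picks up any heap evacuation
    // that moved the underlying string.
    return m_host_field ? m_host_field->value_get(len) : nullptr;
  }
};
{code}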

 HTTPHdr::m_host goes stale if HdrHeap::evacuate_from_str_heaps is called
 

 Key: TS-996
 URL: https://issues.apache.org/jira/browse/TS-996
 Project: Traffic Server
  Issue Type: Bug
  Components: HTTP, MIME
Affects Versions: 3.1.0
Reporter: B Wyatt
 Attachments: m_host.patch


 class HTTPHdr stores, in m_host, a copy of the host-name string pointer from 
 either the URLimpl or the MIMEHdr.  In both cases, these strings can be moved 
 to a new heap underneath the HTTPHdr.  When this happens, the process will at 
 best read stale memory and be fine, and at worst read unmapped memory and 
 segfault.
 Currently, HdrHeap::evacuate_from_str_heaps is called to coalesce multiple 
 heaps into a single heap.  When this happens, it directly accesses the 
 low-level objects via ::move_strings calls.  These objects do not possess the 
 necessary information to inform parent objects about the change, nor does the 
 HdrHeap directly inform interested parties.
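
As a generic illustration of the hazard (nothing ATS-specific): a raw pointer 
cached into someone else's buffer goes stale the moment the owner moves or 
reallocates that buffer, which is exactly the failure mode of m_host after the 
string heaps are coalesced.
{code}
#include <string>

int main()
{
  std::string heap = "www.example.com";
  const char *cached_host = heap.c_str();  // cached raw pointer into the buffer

  heap.append(4096, 'x');                  // owner reallocates its storage

  // cached_host may now point at freed memory: at best stale data, at worst a
  // segfault -- the same failure mode described above.  (Not dereferenced
  // here; shown only to make the hazard concrete.)
  (void)cached_host;
  return 0;
}
{code}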

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TS-934) Proxy Mutex null pointer crash

2011-10-10 Thread B Wyatt (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124201#comment-13124201
 ] 

B Wyatt commented on TS-934:


From the cores/callstacks I've seen, this is the same issue as TS-857.

 Proxy Mutex null pointer crash
 --

 Key: TS-934
 URL: https://issues.apache.org/jira/browse/TS-934
 Project: Traffic Server
  Issue Type: Bug
  Components: Core
Affects Versions: 3.1.0
 Environment: Debian 6.0.2 quadcore, forward transparent proxy.
Reporter: Alan M. Carroll
Assignee: Alan M. Carroll
 Fix For: 3.1.1

 Attachments: ts-934-patch.txt


 [Client report]
 We had the cache crash gracefully twice last night on a segfault.  Both 
 times the callstack produced by trafficserver's signal handler was:
 /usr/bin/traffic_server[0x529596]
 /lib/libpthread.so.0(+0xef60)[0x2ab09a897f60]
 [0x2ab09e7c0a10]
 /usr/bin/traffic_server(HttpServerSession::do_io_close(int)+0xa8)[0x567a3c]
 /usr/bin/traffic_server(HttpVCTable::cleanup_entry(HttpVCTableEntry*)+0x4c)[0x56aff6]
 /usr/bin/traffic_server(HttpVCTable::cleanup_all()+0x64)[0x56b07a]
 /usr/bin/traffic_server(HttpSM::kill_this()+0x120)[0x57c226]
 /usr/bin/traffic_server(HttpSM::main_handler(int, void*)+0x208)[0x571b28]
 /usr/bin/traffic_server(Continuation::handleEvent(int, 
 void*)+0x69)[0x4e4623]
 I went through the disassembly, and the instruction it is on in 
 ::do_io_close is loading the value of diags (not dereferencing it), so it 
 is unlikely that that threw a segfault (unless this is somehow in 
 thread-local storage and that is corrupt).
 The kernel message claimed that the instruction pointer was 0x4e438e, 
 which in this build is in ProxyMutexPtr::operator ->() on the 
 instruction that dereferences the object pointer to get the stored mutex 
 pointer (bingo!), so it would seem that at some point we are 
 dereferencing a null safe pointer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira