Re: debugs from an interesting CBDATA case study
On 14/01/2014 3:31 p.m., Alex Rousskov wrote: On 01/07/2014 08:35 AM, Alex Rousskov wrote: 2) [Add cbdata debugging to] find which object stores the invalid cbdata pointer. Then find how that invalid pointer gets to that object. I think it happens here: #8 0x0834f190 in CommCbMemFunT (aMeth= (void (Comm::TcpReceiver::*)(Comm::TcpReceiver * const, const CommCbMemFunTComm::TcpReceiver, CommCloseCbParams::Params )) 0x834e270 Comm::TcpReceiver::tcpConnectionClosed(CommCloseCbParams const), aJob=..., this=0xb268) at ../../../src/CommCalls.h:180 where we are passing a raw job pointer (job.get()) to CommCloseCbParams object which then uses it as cbdata in CommCommonCbParams constructor: CommCbMemFunT(const CbcPointerC job, Method meth): JobDialerC(job), CommDialerParamsTParams_(job.get()), CommCloseCbParams(void *aData); CommCommonCbParams::CommCommonCbParams(void *aData): data(cbdataReference(aData)), ... Does the attached untested patch help? Perfectly. Thank you!!! I have applied your fix to the branch with your details. Would you care to do the honours for trunk please? Amos
Re: [PATCH] use nullptr more?
On Tue, Jan 21, 2014 at 2:31 AM, Amos Jeffries squ...@treenet.co.nz wrote: On 2014-01-21 12:47, Alex Rousskov wrote: On 01/20/2014 02:28 PM, Amos Jeffries wrote: On 2014-01-21 08:05, Alex Rousskov wrote: On 01/20/2014 08:15 AM, Kinkie wrote: the attached patch is an attempt (build-tested) to rely more on nullptr in place of NULL. It takes from the current implementation, it is just a bit more forceful in using nullptr if available. Hi Kinkie, You forgot to mention *why* do we want to overwrite the external NULL definition? Overwriting NULL set by others will prevent folks with broken compilers working around nullptr compatibility issues. What will it give us in return, the ability to overwrite NULL #defined in some header we happened to #include? The only reason I know of that we might want to is that cstd* headers define NULL explicitly now, whereas before C++11 it was optionally there: The above actually sounds like a reason _not_ to redefine NULL to me: 1/ The macro NULL, defined in any of clocale, cstddef, cstdio, cstdlib, cstring, ctime, or cwchar, is an implementation-defined C++ null pointer constant in this International Standard (18.2). And now we are saying, Take that International Standard! We are going to use what we think is the right NULL, and there is nothing you can do about it! Pretty much. Although not so much as Take that IS! as Take that GCC since it is defined to be implementation specific. All we do with #undef is add risk screwing up the alignment with OS code and third-party code libraries which may now be depending on those compiler-specific defines. Now now.. My aim was (and is) not to induce controversy. The main benefit of nullptr lays in the fact that it is of type nullptr_t, which is implicitly convertible and comparable to any pointer type or pointer-to-member type. It is not implicitly convertible or comparable to integral types, except for bool. (copypasted from stackoverflow). So in theory the change I did should NOT change any visible behavior, but help catch unwanted pointer arithmetic, implicit type cast to int, truncation issues caused by casts to int etc. Total effort needed to verify if this causes unforeseen problem is 3 hours of farm build time. Is it clean? Probably not. Does it work? Maybe. Does it help? Possibly. Any issues this causes are at compile time and easily detectable. I won't push for this to go in. If you aren't convinced, we can certainly keep on going like we have so far, and it will keep on working like it does now. For the record, I am not objecting to this change. It would be great to understand why it is being done, but if others are sure it is needed, you do not need to convince me. I don't think any change here is needed. The existing hack (and it is just a hack) was a good idea at the time, and still is for all the GCC 4.0-4.5 compilers and old systems not defining it at all. But it is becoming increasingly irrelevant and when we someday switch to requiring a C++11 compiler NULL can be erased completely. Hopefully. -- /kinkie
Re: [RFC] FTP gw source layout
On 01/20/2014 08:45 PM, Alex Rousskov wrote: Hello, FTP gateway[1] code[2] work well, and we are polishing it for the official submission. The biggest change we need to make is to rearrange where the new code lives in Squid src directory. In this email, I am proposing how to structure that new code, but I start with a little bit of a background on the classes involved: * The current official code has a server-side FTP client that translates true HTTP requests (with ftp:// URIs) into FTP commands and FTP responses into true HTTP responses. I call this old FTP client below. * The new FTP Gateway code uses a different server-side FTP client that translates fake HTTP requests into FTP commands and FTP responses into fake HTTP responses. There is now also client-side FTP server code that translates true FTP commands into fake HTTP requests and fake HTTP responses into true FTP responses. Before the ftp-gw branch changes, most of the FTP code lived in two source files (.cc and .h): src/ftp.* # server-side FTP code (i.e., FTP client) - # non-existent client-side FTP code (i.e., FTP server) In the current ftp-gw branch, FTP temporary lives in these files: src/FtpServer.* # code shared by old and new FTP clients src/ftp.* # old FTP client src/FtpGatewayServer.* # new FTP gw client src/client_side.cc # new FTP gw server I propose the following arrangement for the official submission: src/out/FtpServer.* # code shared by old and new FTP clients src/out/FtpToHttp.* # old FTP client src/out/FtpGateway.* # new FTP gw client src/in/FtpGateway.* # new FTP gw server src/ftp/* # FTP code shared by client and server sides I am not happy about the ToHttp suffix for the old FTP client class but cannot think of a better one. The primary difference between the old and new FTP clients is that the former converts FTP responses into true HTTP responses, while the latter just wraps FTP responses into fake HTTP response wrappers (and unwraps them into native FTP responses on the client side). One alternative worth considering is accumulating FTP agent (client or server) code in their own directories, better separating them from HTTP agents (and agents for other protocols on the same side), even though we have not done that for HTTP yet: src/out/ftp/Server.* # code shared by old and new FTP clients src/out/ftp/ToHttp.* # old FTP client src/out/ftp/Gateway.* # new FTP gw client src/in/ftp/Gateway.* # new client-side FTP gw code src/ftp/* # FTP code shared by client and server sides I have not strong opinion on these. The src/out/ftp/* looks better but probably it will result to directories with only one or two source files. The protocol related files (http*, ftp*, gopher*) are few files. There has been at least one discussion about related matters, but it did not appear to reach consensus on the key questions relevant here. Pleases review [RFC] SourceLayout for client-side and server-side components[3] to avoid rehashing the same set of arguments here. Just a comment on in and out directories. I prefer the client-side and server-side instead of in/out or client/server, because it describes better what really is this part of code. I remember when I was trying to read squid code, in many cases I was confused with client* or server* naming schemes. The client-side part of squid is a server and the server-side is a client. That old discussion uses CCC and SSS placeholders for client- and server-side files. In my proposal above, CCC is in and SSS is out. I picked these misleading terms among all other misleading terms already discussed simply because they are shorter. If we have to mislead, let's at least not waste space! OK :-) Thank you, Alex. [1] http://wiki.squid-cache.org/Features/FtpGateway [2] https://code.launchpad.net/~measurement-factory/squid/ftp-gw [3] http://www.squid-cache.org/mail-archive/squid-dev/201303/0054.html
Re: [RFC] FTP gw source layout
mån 2014-01-20 klockan 11:45 -0700 skrev Alex Rousskov: FTP gateway[1] code[2] work well, and we are polishing it for the official submission. The biggest change we need to make is to rearrange where the new code lives in Squid src directory. For the record I have no opinion on either of the proposed layouts, except that sticking multiple server implementations in client_side.c is certainly not the right thing. Pick one and stick to it. But please account for adjusting other protocols to the same layout. Regards Henrik
Re: [RFC] FTP gw source layout
[ A specific revised proposal is at the end of this email, after the discussion. ] On 01/20/2014 03:30 PM, Amos Jeffries wrote: On 2014-01-21 07:45, Alex Rousskov wrote: I propose the following arrangement for the official submission: src/out/FtpServer.* # code shared by old and new FTP clients src/out/FtpToHttp.* # old FTP client src/out/FtpGateway.* # new FTP gw client src/in/FtpGateway.* # new FTP gw server src/ftp/* # FTP code shared by client and server sides These are all backwards conceptually. Yes, courtesy of the established Squid 'server side' is for clients that we are going to call servers terminology in the existing Squid code. * Gateway - translating in/ transport to out/ transport. Seems to describe what the old client code does pretty closely both against the dictionary and RFC terminology definitions for gateway. * Relay - passing in/ transport to same out/ transport. Seems to describe what the new code does if we ignore the messy internal representation mapping. I did try to figure out what FTP standards or other authoritative FTP-specific documents say about this, and found pretty much nothing. In a general sense, not specific to the established FTP terminology (whatever it is!), I agree that relay is a more suitable term for what we are doing. Keep in mind that relay (with various filtering capabilities offered by Squid) is essentially a proxy. This project was called Native FTP Proxy. [FWIW, the term FTP Gateway was suggested by Henrik during initial RFC review. Henrik thought that using HTTP semantics internally means we are a gateway. I changed the project name in order to avoid having an argument. Technically, it is not really clear whether we are using HTTP semantics internally or not. We are using HTTP structures, but semantics is a very gray area IMO.] I am OK with Relay if others think it is better in this context, but I may have an even better suggestion below. * out/FtpServer contains FTP clients. Lets not do this. We are already doing that for the existing server-side code, which is all based on the Server[StateDataInfo] class. On one hand, I think we should be consistent: Either rename every class or keep the current naming scheme, even if we know it is flawed. On the other hand, causing and dealing with so much renaming pain just for the sake of consistency feels wrong. My revised proposal below lessens that pain at the expense of some inconsistency with the old code. Yes it does kind of make sense if one has a lot of knowledge about Squid. But for most it will be amazingly confusing. The gadgets terminology we started using fits here, so I propose FtpGadgets.* to help reduce confusion. Gadgets names a collection of handy little things. It does not work at all for a base FTP client class IMO. One alternative worth considering is accumulating FTP agent (client or server) code in their own directories, better separating them from HTTP agents (and agents for other protocols on the same side), even though we have not done that for HTTP yet: src/out/ftp/Server.* # code shared by old and new FTP clients src/out/ftp/ToHttp.* # old FTP client src/out/ftp/Gateway.* # new FTP gw client src/in/ftp/Gateway.* # new client-side FTP gw code For now I think we leave this, it can be done later if we end up with similar models for other protocols. (FYI: I kind of dislike the whole fake-HTTP request mapping, but lacking a better alternative am not arguing against it at this point). Same here. Agreed. I am still finding all the names we can think of feel a bit contrived. At least in/ and out/ are shorter and fit by way of being more general than our somewhat squid-specific terminology. We can always rename later if something better is thought up. OK, so given your, Christos, and Henrik responses (thank you everybody!), I revise my proposal as follows: 1) The new Feature is going to be called Native FTP Proxy. The old FTP Gateway code name is dropped. 2) We are going to use the following layout for the FTP code: src/servers/FtpServer.* # new FTP server, relaying FTP src/clients/FtpClient.* # code shared by old and new FTP clients src/clients/FtpGateway.* # old FTP client, translating back to HTTP src/clients/FtpNative.* # new FTP client, relaying FTP src/ftp/* # FTP stuff shared by clients and servers 3) The remaining non-FTP code will be eventually moved into appropriate files and directories inside src/clients/ and src/servers/ structure, and the corresponding files/classes will be renamed at that time. Doing so is outside the FTP Gateway project scope and is likely to happen one class (or one group of files) at a time. Thank you, Alex. [1] http://wiki.squid-cache.org/Features/FtpGateway [2] https://code.launchpad.net/~measurement-factory/squid/ftp-gw [3] http://www.squid-cache.org/mail-archive/squid-dev/201303/0054.html
Re: [RFC] FTP gw source layout
On 2014-01-22 10:37, Alex Rousskov wrote: [ A specific revised proposal is at the end of this email, after the discussion. ] On 01/20/2014 03:30 PM, Amos Jeffries wrote: On 2014-01-21 07:45, Alex Rousskov wrote: I propose the following arrangement for the official submission: src/out/FtpServer.* # code shared by old and new FTP clients src/out/FtpToHttp.* # old FTP client src/out/FtpGateway.* # new FTP gw client src/in/FtpGateway.* # new FTP gw server src/ftp/* # FTP code shared by client and server sides These are all backwards conceptually. Yes, courtesy of the established Squid 'server side' is for clients that we are going to call servers terminology in the existing Squid code. I'm looking at a higher level of abstraction than those quirks. Even if one swaps the client/server terms backwards these terminology have no reason to change (ie. one gateways from server to client as much as from client to server). [ On a side note shall we make a proper effort to fix that confusion by deprecating our use of the terms and call bits client or server by functionality? it would be as easy to describe these area-based parts as client-facing or server-facing areas of Squid. We already talk of ICAP as a client despite being both client-side and server-side. Your updated proposal below would seem to be a good move in that direction. If we can agree to stop saying/writing those terms and start replacing documentation we can probably make naming decisions a lot less confusing in the near future. ] * Gateway - translating in/ transport to out/ transport. Seems to describe what the old client code does pretty closely both against the dictionary and RFC terminology definitions for gateway. * Relay - passing in/ transport to same out/ transport. Seems to describe what the new code does if we ignore the messy internal representation mapping. I did try to figure out what FTP standards or other authoritative FTP-specific documents say about this, and found pretty much nothing. In a general sense, not specific to the established FTP terminology (whatever it is!), I agree that relay is a more suitable term for what we are doing. FTP has no protocol concept of proxying. So I would not expect anything to exist in those. This is generic/abstract systems terminology so you may find the terms defined in dictionaries, the basic Internet RFC the cover terminology or specific protocols like HTTP/DNS/SMTP/SIP which have the concept of proxying. Keep in mind that relay (with various filtering capabilities offered by Squid) is essentially a proxy. This project was called Native FTP Proxy. Essentially is the word. With one fine distinction: Relay is transparent in the protocol (essentially 'dumb'), Proxy may adjust and make changes to the message semantics as they go through. [FWIW, the term FTP Gateway was suggested by Henrik during initial RFC review. Henrik thought that using HTTP semantics internally means we are a gateway. I changed the project name in order to avoid having an argument. Technically, it is not really clear whether we are using HTTP semantics internally or not. We are using HTTP structures, but semantics is a very gray area IMO.] [ This last point is one reason I really want to make the move from a structures based internal representation to passing around frames. That was we can simply call each of these FTP messages a frame (might even stay in FTP native format) and be done with it. ] I am OK with Relay if others think it is better in this context, but I may have an even better suggestion below. Relay is fine with me. * out/FtpServer contains FTP clients. Lets not do this. We are already doing that for the existing server-side code, which is all based on the Server[StateDataInfo] class. On one hand, I think we should be consistent: Either rename every class or keep the current naming scheme, even if we know it is flawed. On the other hand, causing and dealing with so much renaming pain just for the sake of consistency feels wrong. Unless we stop with the improving of code we are going to go through the pain eventually and spread over an arbitrary span of time. The SourceLayout project alone seeks to do that rename every class, either by actual rename or replacement with a better class. My revised proposal below lessens that pain at the expense of some inconsistency with the old code. Yes it does kind of make sense if one has a lot of knowledge about Squid. But for most it will be amazingly confusing. The gadgets terminology we started using fits here, so I propose FtpGadgets.* to help reduce confusion. Gadgets names a collection of handy little things. It does not work at all for a base FTP client class IMO. Okay. Anything better? or are we stuck with FtpStateData for a while longer? One alternative worth considering is accumulating FTP agent (client or server) code in their
Re: [RFC] FTP gw source layout
On 01/21/2014 03:54 PM, Amos Jeffries wrote: [ On a side note shall we make a proper effort to fix that confusion by deprecating our use of the terms and call bits client or server by functionality? it would be as easy to describe these area-based parts as client-facing or server-facing areas of Squid. We already talk of ICAP as a client despite being both client-side and server-side. Your updated proposal below would seem to be a good move in that direction. If we can agree to stop saying/writing those terms and start replacing documentation we can probably make naming decisions a lot less confusing in the near future. ] If the proposed Native FTP naming scheme is accepted, we will indeed migrate away from the end-agent proximity model towards a protocol role-based model and, personally, I will do my best to avoid client-side and server-side terminology. Keep in mind that relay (with various filtering capabilities offered by Squid) is essentially a proxy. This project was called Native FTP Proxy. Essentially is the word. With one fine distinction: Relay is transparent in the protocol (essentially 'dumb'), Proxy may adjust and make changes to the message semantics as they go through. In our case, Squid adjusts native FTP message semantics (e.g., Squid may convert an active data transfer request into a passive one or strip some FEAT response items) and adaptation services may do so as well (e.g., block or even adjust certain FTP downloads). [FWIW, the term FTP Gateway was suggested by Henrik during initial RFC review. Henrik thought that using HTTP semantics internally means we are a gateway. I changed the project name in order to avoid having an argument. Technically, it is not really clear whether we are using HTTP semantics internally or not. We are using HTTP structures, but semantics is a very gray area IMO.] [ This last point is one reason I really want to make the move from a structures based internal representation to passing around frames. That was we can simply call each of these FTP messages a frame (might even stay in FTP native format) and be done with it. ] I am happy to discuss what you mean by frames that are not structure-based, and what we should use them for, but let's do that elsewhere. I am pretty sure that the primary problems with that migration are not in the primitive ways Native FTP code is using [so called] HttpMsg objects. Gadgets names a collection of handy little things. It does not work at all for a base FTP client class IMO. Okay. Anything better? or are we stuck with FtpStateData for a while longer? If the proposal below is implemented, the base FTP client class will be just that: Ftp::Client. 1) The new Feature is going to be called Native FTP Proxy. The old FTP Gateway code name is dropped. Ok. 2) We are going to use the following layout for the FTP code: src/servers/FtpServer.* # new FTP server, relaying FTP src/clients/FtpClient.* # code shared by old and new FTP clients src/clients/FtpGateway.* # old FTP client, translating back to HTTP src/clients/FtpNative.* # new FTP client, relaying FTP src/ftp/* # FTP stuff shared by clients and servers Ok. 3) The remaining non-FTP code will be eventually moved into appropriate files and directories inside src/clients/ and src/servers/ structure, and the corresponding files/classes will be renamed at that time. Doing so is outside the FTP Gateway project scope and is likely to happen one class (or one group of files) at a time. Sure. +1 on this proposal. Great! I will wait a few days before implementing #1 and #2 to give everybody else a better chance to voice their opinion, including objections. Thank you, Alex.
Re: [PATCH] client-side redesign pt1 - Comm::TcpReceiver
On 01/07/2014 02:52 AM, Amos Jeffries wrote: Updated patch attaced for audit. This one includes all the currently known bits for server-side delay pools so no audit omissions this time around. On 4/01/2014 8:16 a.m., Alex Rousskov wrote: On 12/03/2013 10:05 PM, Amos Jeffries wrote: This patch abstracts the TCP socket operations out of existing client-side code into a class which can be re-used for any TCP socket code. +namespace Comm { + +class TcpReceiver : virtual public AsyncJob +{ +public: Missing class description. You may want to add an XXX: finalize and describe scope comment in lieu of that description for now. I was hoping initially to call it TcpReader, but these send-related details prevent it being a pure read-only semantic. At least receive semantics involve the concept of sending in the form of ACKs. To make matters worse, ICAP code is using read/write terms to talk about data being sent from/to Squid main code and receive/send terms to talk about data being sent from/to the remote ICAP service. See the first comment in src/adaptation/icap/ModXact.cc Unfortunately, I am worried there is a bigger problem here than the verb to use (on the other hand, our inability to name something is often indicative of a design problem; more on that immediately below). Any better ideas at a name? The primary purpose is to control the read(2) operations. The only write(2) related code is the stop*() errors flag state to indicate that sending has stopped or not. Which is needed to manage the close(2) properly. Or should we abstract that out to yet another class managing FD lifetime? Difficult to say without doing the legwork. The correct design depends on what our clients and servers need. I am seriously worried that you are too focused on one server now (client_side_*cc) and once you start adding more and more agents, the current design would fall apart, even if we spend another five iterations perfecting it. All the TCP clients and servers you are willing to include (as future TcpReceiver kids) in the current project scope have at least one thing in common -- they all read and write protocol messages over [persistent] TCP connections. Why do you think it is a good idea to focus on just the reading part so much? If you want a common parent for all those agents, why not handle both actions in that common parent, especially if you already find it necessary to add a little bit of send part into the TcpReceiver? If tomorrow we have both TcpReceiver and TcpSender, and all their kids have both classes as parents, with complex inter-parent dependencies that are already showing in TcpReceiver, we are doing something wrong. This may be one of those cases where restricting class scope too much makes the code more complex and less reusable. After reading your response to my initial comments, this is my primary concern for the overall design. I am not [just] ranting about it. This may be a serious matter. I can think of two very different ways to go forward from here: A) Forget about other agents, sides, etc. and focus on the HTTP server (i.e., client-side*cc) code exclusively. That code does not need a TcpReceiver. It needs a lot of work, but extracting TCP reading code into an abstract class is not one of them! B) Try to implement a common parent for all existing TCP protocol agents/sides. That common parent, let's call it TcpAgent for the purpose of this discussion, will have code to read and write messages. There is lots of common, complex code to extract from the existing agents into this common parent! If we go a step further, many Agents also maintain a connection pool, so you may add a TcpAgentWithPool class of some kind. IMO, both approaches above are valid in isolation, but mixing them may cause trouble. It provides: * data members for referencing a TCP socket and read buffer. * methods for incremental parsing or processing of recieved data. * methods for managing half and fully closing a socket. * separation of socket read() from HTTP/1 parsing and processing logic. It may be re-used for any client-side or server-side TCP socket. +void +Comm::TcpReceiver::stopReading() +{ +/* NP: This is a hack only needed to allow TunnelStateData + * to take control of a socket despite any scheduled read. + */ +if (reading()) { +comm_read_cancel(tcp-fd, reader_); +reader_ = NULL; +} +} Why is this called a hack? It only exists because tunnel.cc takes over the FD. Really nasty (the dropped data race condition). Let's call this method stopReadingXXX() so that it is clear that any caller is doing it wrong and to discourage use. All I have done is shuffle the function into this class. The problem already exists and I do not see any way we can solve it within reasonabel scope of this project. So I opted to mark with a comment and leave it for now. That is fine, of course. We just need to make sure others will not