Re: POP3 returns line data and CRLF separately, drops final CRLF

2012-02-15 Thread Rich Gray

Steve Holme wrote:

Hi Rich,


1.  Libcurl is returning message data line-by-line, with two
callbacks per line - one for the line data and the other for the
CRLF. This seems like strange behavior.  I'd coded as if I were
getting the data off a TCP connection - might get one byte,
might get the whole message in one shot, might get anything
between.  So I'm curious as to what the intent is here.


Yes - that's by design.

You could quite genuinely argue that it is strange behaviour and that it would
be better to pass data on block by block as it comes off the socket.

However, there were problems with the existing CRLF.CRLF checking code that
needed fixing and it was generally simpler for Daniel and me to implement
this checking code this way. With this fix, I did question whether I should
buffer the lines back up before passing them onto whatever application was
using libcurl but I quite quickly came to the assumption that most
applications will probably buffer it up themselves... thus it seemed quite
inefficient for libcurl to buffer up lines of data whilst checking for the
dot and then for an application programmer to do roughly the same once
received from libcurl.

The reason for the two callbacks is that the checking code will send you all
the data up to a CRLF, which it then has to buffer (if my memory serves me
correctly) in case the CRLF.CRLF runs over the end of a packet into the
next. The second callback is when sufficient data has been received by
libcurl and it has realised that the line wasn't a CRLF.CRLF so it then
passes the CRLF onto you.

It should be possible to get libcurl to pass on each block of data as it comes
off the connection, holding back only a partial CRLF.CRLF sequence received at
the end of a packet, but both Daniel and
I produced a few patches in an attempt to fix this, all of which failed the
test harnesses. However, we are open to patches that are able to pass the
data on in fewer calls, if you fancy the challenge ;-)


I pretty much figured this was the case.  I had thought that all the code 
had to do was implement a state machine to track the line ending cruft as it 
passed the data to the caller, but then I realized that it must do the dot 
destuffing (right?) so may have to modify the data in the case of a 
CRLF..CRLF(?)  Still, given that the CRLF at the end of the line IS part of 
the data (#2), it should be possible to deliver it with the data.  It is the 
start of the following line which can't be passed to the user before being 
checked for .CRLF (end of data) or ..CRLF (pass as .CRLF), right?
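
A minimal sketch of the kind of state machine described above, assuming the RFC 1939 rule that a line consisting of a dot plus CRLF ends the response and any other leading dot is stripped. It is not libcurl's internal code; every name in it (`p3_feed`, `struct p3_unstuff`, the `P3_*` states) is hypothetical. Only a leading dot, and a CR directly after it, are held back, so each line's CRLF (including the last one) is delivered together with the line, and chunk boundaries may fall anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch, not libcurl's internal implementation. */
enum p3_state {
    P3_LINE_START,  /* next byte begins a new line */
    P3_IN_LINE,     /* inside a line */
    P3_SEEN_CR,     /* inside a line, previous byte was CR */
    P3_DOT,         /* line began with '.', dot held back */
    P3_DOT_CR,      /* line began with ".\r", both held back */
    P3_DONE         /* terminating ".CRLF" consumed */
};

struct p3_unstuff {
    enum p3_state state;  /* start in P3_LINE_START */
};

/* Decode one chunk of a multi-line response.  `out` needs room for
   len + 1 bytes (one CR may be carried over from the previous chunk).
   Returns the number of decoded bytes written to `out`.  Bytes that
   follow the terminator in the same chunk are ignored. */
size_t p3_feed(struct p3_unstuff *u, const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    assert(u != NULL);
    for (i = 0; i < len && u->state != P3_DONE; i++) {
        char c = in[i];

        switch (u->state) {
        case P3_LINE_START:
            if (c == '.') {            /* terminator or stuffing: wait */
                u->state = P3_DOT;
                break;
            }
            out[n++] = c;
            u->state = (c == '\r') ? P3_SEEN_CR : P3_IN_LINE;
            break;
        case P3_IN_LINE:
            out[n++] = c;
            if (c == '\r')
                u->state = P3_SEEN_CR;
            break;
        case P3_SEEN_CR:
            out[n++] = c;              /* the CRLF goes out with its line */
            if (c == '\n')
                u->state = P3_LINE_START;
            else if (c != '\r')
                u->state = P3_IN_LINE;
            break;
        case P3_DOT:
            if (c == '\r') {
                u->state = P3_DOT_CR;
            } else {                   /* stuffed dot: drop it, keep c */
                out[n++] = c;
                u->state = P3_IN_LINE;
            }
            break;
        case P3_DOT_CR:
            if (c == '\n') {
                u->state = P3_DONE;    /* ".CRLF": end of response */
            } else {                   /* ".\r" + other: dot was stuffing */
                out[n++] = '\r';
                out[n++] = c;
                u->state = (c == '\r') ? P3_SEEN_CR : P3_IN_LINE;
            }
            break;
        case P3_DONE:
            break;
        }
    }
    return n;
}
```

Feeding the wire data `a CRLF ..b CRLF . CRLF` one byte at a time yields `a CRLF .b CRLF`, with the final CRLF intact and the state left at `P3_DONE`.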


I'm certainly not going to quibble about implementation details, given I'm 
just a libcurl user at this point.  As a user, I want to only express 
gratitude to the developers and never even appear to demand anything. 
Sometimes I can't help but comment though... ;)  Currently, I'm off on 
another, non-curl project for a while, but I do look forward to coming back 
to this and taking a look at the code.  (I seem to have +16 years of runtime 
on you.  Contrary to what Daniel said, I seem to have less time.  My wife 
and I must be doing something wrong - I know! It's the kids!!)



2. Libcurl is dropping the final CRLF from the data.


It's probably a combination of 1) me being slightly lazy and 2) me
misinterpreting the spec :(

I should have picked that up, but the existing code, which had problems,
stripped the final CRLF off so I maintained compatibility with that rather
than questioning what the existing code was doing and fixing it. A fix
should be fairly simple to do so I will have a go at that later ;-)


Sorry for lack of snippage, I couldn't figure out what to snip!

Cheers!
Rich
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


RE: POP3 returns line data and CRLF separately, drops final CRLF

2012-02-14 Thread Steve Holme
Hi Rich,

 1.  Libcurl is returning message data line-by-line, with two
 callbacks per line - one for the line data and the other for the
 CRLF. This seems like strange behavior.  I'd coded as if I were
 getting the data off a TCP connection - might get one byte,
 might get the whole message in one shot, might get anything
 between.  So I'm curious as to what the intent is here.

Yes - that's by design.

You could quite genuinely argue that it is strange behaviour and that it would
be better to pass data on block by block as it comes off the socket.

However, there were problems with the existing CRLF.CRLF checking code that
needed fixing and it was generally simpler for Daniel and me to implement
this checking code this way. With this fix, I did question whether I should
buffer the lines back up before passing them onto whatever application was
using libcurl but I quite quickly came to the assumption that most
applications will probably buffer it up themselves... thus it seemed quite
inefficient for libcurl to buffer up lines of data whilst checking for the
dot and then for an application programmer to do roughly the same once
received from libcurl.

The reason for the two callbacks is that the checking code will send you all
the data up to a CRLF, which it then has to buffer (if my memory serves me
correctly) in case the CRLF.CRLF runs over the end of a packet into the
next. The second callback is when sufficient data has been received by
libcurl and it has realised that the line wasn't a CRLF.CRLF so it then
passes the CRLF onto you.

It should be possible to get libcurl to pass on each block of data as it comes
off the connection, holding back only a partial CRLF.CRLF sequence received at
the end of a packet, but both Daniel and
I produced a few patches in an attempt to fix this, all of which failed the
test harnesses. However, we are open to patches that are able to pass the
data on in fewer calls, if you fancy the challenge ;-)

 2. Libcurl is dropping the final CRLF from the data.

It's probably a combination of 1) me being slightly lazy and 2) me
misinterpreting the spec :(

I should have picked that up, but the existing code, which had problems,
stripped the final CRLF off so I maintained compatibility with that rather
than questioning what the existing code was doing and fixing it. A fix
should be fairly simple to do so I will have a go at that later ;-)

Regards

Steve



POP3 returns line data and CRLF separately, drops final CRLF

2012-02-08 Thread Rich Gray
As noted in a previous e-mail (Jan 31, "State of POP3 in curl?"), I'm
working on a prototype POP3 download program utilizing libcurl.  I've got
my part of that prototype pretty much completed, but have noticed a couple
of anomalies:

1.  Libcurl is returning message data line-by-line, with two callbacks per
line - one for the line data and the other for the CRLF.  This seems like
strange behavior.  I'd coded as if I were getting the data off a TCP
connection - might get one byte, might get the whole message in one shot,
might get anything between.  So I'm curious as to what the intent is here.
If it's going to return line at a time, it would be nice to get the line
with the CRLF in one callback.  If, as a function of the dot de-stuffing,
libcurl returns whole chunks of message data on CRLF boundaries, that
would be fine too.  I can deal with full, unaligned chunks of data too.
For the moment, I'm not going to consider any sort of alignment
entitlement.

2. Libcurl is dropping the final CRLF from the data.  Although it can be
coped with, this seems wrong.  E-mail messages are always CRLF terminated
lines.  Not getting the final CRLF leaves a hanging, incomplete, line.  I
think this might be a mis-interpretation of RFC 1939, section 3 - Basic
Operation:

   Responses to certain commands are multi-line.  In these cases, which
   are clearly indicated below, after sending the first line of the
   response and a CRLF, any additional lines are sent, each terminated
   by a CRLF pair.  When all lines of the response have been sent, a
   final line is sent, consisting of a termination octet (decimal code
   046, ".") and a CRLF pair.  If any line of the multi-line response
   begins with the termination octet, the line is byte-stuffed by
   pre-pending the termination octet to that line of the response.
   Hence a multi-line response is terminated with the five octets
   CRLF.CRLF.  When examining a multi-line response, the client checks
   to see if the line begins with the termination octet.  If so and if
   octets other than CRLF follow, the first octet of the line (the
   termination octet) is stripped away.  If so and if CRLF immediately
   follows the termination character, then the response from the POP
   server is ended and the line containing .CRLF is not considered
   part of the multi-line response.

I think the libcurl implementation has keyed off the CRLF.CRLF sentence
in the middle of this paragraph, whereas the final sentence clearly states
that the final .CRLF is not part of the data.  By implication, the
immediately preceding CRLF of the last line is part of the data.  Or, it's
just a bug! ;P


Using this write callback routine for a LIST command,

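 size_t pop_list_data(char *ptr, size_t size, size_t nmemb, void *userdata)
 {
     int bytes = (int)(size * nmemb);
     int *num_msgs = userdata;
     int n;

     /* ptr is not NUL-terminated, so print exactly `bytes` characters */
     printf("list %.*s\n", bytes, ptr);
     /* a line starting with a digit is a "msgnum size" entry */
     if (*ptr >= '0' && *ptr <= '9')
         if ((n = atoi(ptr)) > 0)
             *num_msgs = n;
     return bytes;
 }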

with two messages in the mailbox, I get:

> LIST
< +OK 2 messages (31754 octets)
list 1 16050
list 

list 2 15704
* Connection #0 to host XX left intact

which shows both issues 1 & 2.   (Yes, I did shamelessly take advantage of
the line-by-line data return for the prototype. ;)  This will be redone in
a final version anyway, using STAT or UIDL.)
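
For comparison, a rough sketch of a write callback that does not rely on line-at-a-time delivery. It has the same signature as the callback above, accumulates whatever chunks arrive, and parses the LIST output only after the transfer has finished. The names `struct membuf`, `pop_collect` and `pop_last_msgnum` are made up for this illustration and are not part of libcurl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; initialise to { NULL, 0 }. */
struct membuf {
    char *data;   /* always NUL-terminated once non-NULL */
    size_t len;   /* bytes stored, not counting the NUL */
};

/* Write callback: append the chunk, whatever its size or alignment. */
size_t pop_collect(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct membuf *b = userdata;
    size_t bytes = size * nmemb;
    char *p;

    assert(b != NULL);
    p = realloc(b->data, b->len + bytes + 1);
    if (p == NULL)
        return 0;                 /* a short count makes libcurl abort */
    memcpy(p + b->len, ptr, bytes);
    b->data = p;
    b->len += bytes;
    b->data[b->len] = '\0';
    return bytes;
}

/* Scan the collected LIST output line by line and return the last
   message number seen, or 0 if there was none. */
int pop_last_msgnum(const struct membuf *b)
{
    const char *line = b->data;
    int last = 0;

    while (line != NULL && *line != '\0') {
        if (*line >= '0' && *line <= '9') {
            int n = atoi(line);   /* safe here: buffer is NUL-terminated */
            if (n > 0)
                last = n;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;               /* step past the LF to the next line */
    }
    return last;
}
```

Because parsing happens on the complete buffer, the result is the same whether the data arrives one byte at a time, split mid-line, or in one piece, and a missing final CRLF does no harm.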

Cheers!
Rich


Re: POP3 returns line data and CRLF separately, drops final CRLF

2012-02-08 Thread Daniel Stenberg

On Wed, 8 Feb 2012, Rich Gray wrote:

1.  Libcurl is returning message data line-by-line, with two callbacks per 
line - one for the line data and the other for the CRLF.  This seems like 
strange behavior.  I'd coded as if I were getting the data off a TCP 
connection - might get one byte, might get the whole message in one shot, 
might get anything between.  So I'm curious as to what the intent is here.


libcurl can of course deliver data this way and still adhere to the API/ABI 
just fine. It does this because of how it needs to traverse the entire data 
and do magic on dot-prefixed lines and while doing so it can just as well 
ship data off to the client like this.


If it's going to return line at a time, it would be nice to get the line 
with the CRLF in one callback.  If, as a function of the dot de-stuffing, 
libcurl returns whole chunks of message data on CRLF boundaries, that would 
be fine too.  I can deal with full, unaligned chunks of data too. For the 
moment, I'm not going to consider any sort of alignment entitlement.


If you can make the code any better and provide data to the application in 
fewer invocations or in another more convenient way then please feel free to have a 
go at it! I think that quite simply nobody has cared about that particular 
effect.



2. Libcurl is dropping the final CRLF from the data.


I think the libcurl implementation has keyed off the CRLF.CRLF sentence in 
the middle of this paragraph, whereas the final sentence clearly states that 
the final .CRLF is not part of the data.  By implication, the immediately 
preceding CRLF of the last line is part of the data.  Or, it's just a bug! 
;P


If libcurl gets the wrong data compared with what other tools say then it 
would be an indication that this is a bug or misinterpretation of the spec.


We do have POP3 test cases that seem to run fine though so it would then also 
indicate that those are wrong... (for example test 800, 808, 809, 810, 811 
etc)


--

 / daniel.haxx.se