Re: Parsing a git HTTP protocol response

2018-11-30 Thread Bryan Turner
On Fri, Nov 30, 2018 at 6:58 PM Bryan Turner  wrote:
>
> Here's a (very ugly) patch I threw together on top of your code:
...snip

Gmail butchered my patch, so here it is as an attachment.

Bryan


short-size-reads.patch
Description: Binary data


Re: Parsing a git HTTP protocol response

2018-11-30 Thread Bryan Turner
On Fri, Nov 30, 2018 at 5:05 PM Farhan Khan  wrote:
>
> Hi all,
>
> I am writing an implementation of the git HTTP pack protocol in C. It
> just does a request to clone a repository. It works pretty well for
> small repositories, but seems to fail on larger repositories and I do
> not understand why.
>
> All that my code does is send a hard-coded "want" request. As I
> understand it, responses come with a four-byte size prefix, then the
> data of the size - 4. The first four bytes are the size of the object
> in ASCII and after that size, a new object should begin by specifying
> its size. The final "" string should signify the end.
>
> On small repositories my code works fine. But when use a large git
> repository, I hit an object size that cannot be interpreted by ASCII
> into a number. In particular, right before the crash I am pretty
> consistently getting a size starting with 0x32,0x00 or 0x32,0x30,0x00
> (note, it starts with 0x32 and has 0x00 in there). This is not a
> readable four byte ascii value, likely an erroneous size value, which
> causes the next object size calculation to land incorrectly and
> subsequently the program to malfunction.
>
> My questions:
> 1. Am I misunderstanding the protocol?
> 2. Does 0x32 signify something?
> 3. Also, where can I see in the source code git parses the data it
> receives from a "want" request?
>
> If you would like to see my code, it is located here:
> http://git.farhan.codes/farhan/post
> To compile to Linux run: gcc post.c main.c -lcurl -o post
> To compile on FreeBSD you can use the Makefile or: cc -L
> /usr/local/lib -I /usr/local/include -lcurl main.c post.c -o post
> In both cases you need to have libcurl installed.
>
> To run do: ./post [a git http url] [a commit value]
> ie: ./post http://git.farhan.codes/farhan/openbsd
> 373dd5ba116d42cddf1fdcb58b578a4274f6d589
>
> I have a lot of debugging printf() messages, but it eventually hits a
> point where you see this:
>
> Start=
> Current stats: Current state: 999 Received: 1448
> We have enough bytes, moving forward
> == New Object
> No Error, object is 32 bytes
> Size bytes: 32 30 00 00

I modified your debugging output a little bit, and right before the
failure happens I see this in my output:

Start=
Current stats: Current state: 999 Received: 1298
Read complete; still missing 1296 bytes (6901 read of 8197)
Offset: 1298

Start=
Current stats: Current state: 999 Received: 1298
Read complete; 2 more bytes in buffer
== New Object
Size bytes: 32 30 00 2b

Based on that, it appears what's happening is you're not handling the
case where you end up with fewer than 4 bytes in the buffer. As a
result, memcpy reads past the end of the response buffer and gets
whatever it gets, resulting in an incorrect parsed size.

The while loop in pack_protocol_line is checking +1, but I think it
should be checking +3 to ensure there's at least 4 bytes so it can get
a complete size. The "parseread" struct will need to grow a couple
more fields to allow buffering some additional bytes between
pack_protocol_line calls (because curl requires the write function to
always consume the full buffer) and, at the start of the loop, when
reading/parsing a size, the code will need to consider any buffered
bytes from the previous function call. That requires some knock-on
changes to how the offset is handled, as well as the the initial
"psize" set when starting a new packet.

Once you start accounting for reads that cut off in 1, 2 or 3 bytes
into the 4 required to parse a size, your program should be able to
run to completion.

Here's a (very ugly) patch I threw together on top of your code:

diff --git a/main.c b/main.c
index 956362b..8873fd5 100644
--- a/main.c
+++ b/main.c
@@ -71,7 +71,7 @@ int main(int argc, char *argv[]) {
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, list);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)content_length);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, );
- curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, pack_protocol_line);
+ curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, _protocol_line);

  res = curl_easy_perform(curl);
  if (curl != CURLE_OK)
diff --git a/post.c b/post.c
index cfffd5c..4f8512c 100644
--- a/post.c
+++ b/post.c
@@ -36,89 +36,83 @@ size_t pack_protocol_line(void *buffer, size_t
size, size_t nmemb, void *userp)
 {
  unsigned char *reply = buffer;
  struct parseread *parseread = userp;
- int offset = 0;
+ int offset = 0, pending = 0, remaining = 0;
  char tmp[5];
- int check;
-
- int pool = size * nmemb;

  printf("\n\nStart=\n");
- printf("Current stats: Current state: %d Received: %ld\n",
parseread->state, size*nmemb);
+ printf("Current stats: Current state: %d Received: %ld\n",
parseread->state, nmemb);

- while(offset + 1 < size + nmemb) {
+ while (offset + 3 < nmemb) {
  if (parseread->state == STATE_NEWLINE) {
  printf("== New Object\n");
  bzero(tmp, 5);
- memcpy(tmp, buffer+offset, 4); tmp[4] = '\0';
- check 

Parsing a git HTTP protocol response

2018-11-30 Thread Farhan Khan
Hi all,

I am writing an implementation of the git HTTP pack protocol in C. It
just does a request to clone a repository. It works pretty well for
small repositories, but seems to fail on larger repositories and I do
not understand why.

All that my code does is send a hard-coded "want" request. As I
understand it, responses come with a four-byte size prefix, then the
data of the size - 4. The first four bytes are the size of the object
in ASCII and after that size, a new object should begin by specifying
its size. The final "" string should signify the end.

On small repositories my code works fine. But when use a large git
repository, I hit an object size that cannot be interpreted by ASCII
into a number. In particular, right before the crash I am pretty
consistently getting a size starting with 0x32,0x00 or 0x32,0x30,0x00
(note, it starts with 0x32 and has 0x00 in there). This is not a
readable four byte ascii value, likely an erroneous size value, which
causes the next object size calculation to land incorrectly and
subsequently the program to malfunction.

My questions:
1. Am I misunderstanding the protocol?
2. Does 0x32 signify something?
3. Also, where can I see in the source code git parses the data it
receives from a "want" request?

If you would like to see my code, it is located here:
http://git.farhan.codes/farhan/post
To compile to Linux run: gcc post.c main.c -lcurl -o post
To compile on FreeBSD you can use the Makefile or: cc -L
/usr/local/lib -I /usr/local/include -lcurl main.c post.c -o post
In both cases you need to have libcurl installed.

To run do: ./post [a git http url] [a commit value]
ie: ./post http://git.farhan.codes/farhan/openbsd
373dd5ba116d42cddf1fdcb58b578a4274f6d589

I have a lot of debugging printf() messages, but it eventually hits a
point where you see this:

Start=
Current stats: Current state: 999 Received: 1448
We have enough bytes, moving forward
== New Object
No Error, object is 32 bytes
Size bytes: 32 30 00 00

The program interprets the size of {0x32,0x30,0x00,0x00} to be "20"
which in decimal is 32, causing the next read to fail.
This problem repeats on a few different repositories.

Any assistance is welcome, I am very stuck on how the HTTP git protocol works.
Thanks,
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE