On Mar 29, 2010, at 11:31 AM, Brice Figureau wrote:
Hi,
I started working again on #3373:
http://projects.reductivelabs.com/issues/3373
I'd like to discuss here how to achieve the file client streaming
(ie
never fully buffer the file content).
What makes thing real complex is that the response is valid only
in the
block of the network request:
<pseudo-code>
http.get(...) do |response|
... here response is accessible by chunk
end
... response is not accessible unless the request was buffered in
RAM
... which is not what we want to achieve.
</pseudo-code>
In the current version, the file content indirector REST call is
done
when something tries to access the File source property content
attribute. This is called by the content property at the time of
syncing
the type.
So, we either need to reverse how the content writing works (passing
done a block), or use dirty thread tricks like I did to defer the
http
request to when we'll actually write content to disk, because we
have to
pass a block (which will be called with the response chunk) that
will
have to write the content to disk.
My current implementation solves the issue by firing a new thread
that
waits the file type is ready to write something to the disk and then
handles response chunk to the writer thread. It works, but is quite
dirty (and I'm sure everybody was offensed by this hardcore code
to the
point everybody was speechless :-)). For reference see
DeferredResponse
in [1]
So, I'm seeking advice on how to write this code better, and will
welcome any comments or ideas.
If you need a reference to the actual patch see here:
[1] http://github.com/masterzen/puppet/tree/tickets/master/3373
Given that file management is already far more complicated than all
of
the other Indirector types, I'd prefer to have the 'write' method in
File directly do the http call.
You can make the http instance easily with the
Network::HttpPool#http_instance method, then you just have to parse
the
URL from the source.
We could pretty easily just have a simple branch here. Here's the
current writing line:
File.open(path, File::CREAT|File::WRONLY|File::TRUNC, mode) { |f|
f.print content }
With 'content' passed in. Instead, I'd recommend changing it to
something like:
File.open(path, File::CREAT|File::WRONLY|File::TRUNC, mode) { |f|
write_content(f) }
And then have 'write_content' switch on the content type:
def write_content(fh)
if parameter(:content).actual_content
fh.print content
else
uri = URI.parse(self[:source])
case uri.scheme
when "puppet"; write_content_from_http(fh, uri)
when "file"; write_content_from_file(fh, uri.path)
else
raise "some failure"
end
end
end
Just about this exact case statement is already used somewhere else
and
could likely be removed from there. At worst, you've got the basic
structure and parsing duplicated, but my guess is we can do that in a
way that eliminates duplication. Then we just implement the basic
bits:
def write_content_from_http(fh, uri)
http = Puppet::Network::HttpPool.http_instance(uri.host, uri.port ||
Puppet[:masterport])
http.get { |r| .... }
end
def write_content_from_file(fh, path)
...similar by-chunk copying of file....
end
Given that this doesn't require any arechitectural chagnes, and
probably
actually enables some simplification of it, why not use this?