On 31/03/10 00:11, Luke Kanies wrote:
> On Mar 30, 2010, at 10:46 AM, Brice Figureau wrote:
> 
>> On 29/03/10 20:54, Luke Kanies wrote:
>>> On Mar 29, 2010, at 11:31 AM, Brice Figureau wrote:
>>>
>>>> Hi,
>>>>
>>>> I started working again on #3373:
>>>> http://projects.reductivelabs.com/issues/3373
>>>>
>>>> I'd like to discuss here how to achieve the file client streaming (ie
>>>> never fully buffer the file content).
>>>>
>>>> What makes thing real complex is that the response is valid only in the
>>>> block of the network request:
>>>>
>>>> <pseudo-code>
>>>>
>>>> http.get(...) do |response|
>>>> ... here response is accessible by chunk
>>>> end
>>>>
>>>> ... response is not accessible unless the request was buffered in RAM
>>>> ... which is not what we want to achieve.
>>>> </pseudo-code>
>>>>
>>>> In the current version, the file content indirector REST call is done
>>>> when something tries to access the File source property content
>>>> attribute. This is called by the content property at the time of
>>>> syncing
>>>> the type.
>>>>
>>>> So, we either need to reverse how the content writing works (passing
>>>> done a block), or use dirty thread tricks like I did to defer the http
>>>> request to when we'll actually write content to disk, because we
>>>> have to
>>>> pass a block (which will be called with the response chunk) that will
>>>> have to write the content to disk.
>>>>
>>>> My current implementation solves the issue by firing a new thread that
>>>> waits the file type is ready to write something to the disk and then
>>>> handles response chunk to the writer thread. It works, but is quite
>>>> dirty (and I'm sure everybody was offensed by this hardcore code to the
>>>> point everybody was speechless :-)). For reference see DeferredResponse
>>>> in [1]
>>>>
>>>> So, I'm seeking advice on how to write this code better, and will
>>>> welcome any comments or ideas.
>>>>
>>>> If you need a reference to the actual patch see here:
>>>> [1] http://github.com/masterzen/puppet/tree/tickets/master/3373
>>>
>>> Given that file management is already far more complicated than all of
>>> the other Indirector types, I'd prefer to have the 'write' method in
>>> File directly do the http call.
>>>
>>> You can make the http instance easily with the
>>> Network::HttpPool#http_instance method, then you just have to parse the
>>> URL from the source.
>>>
>>> We could pretty easily just have a simple branch here.  Here's the
>>> current writing line:
>>>
>>> File.open(path, File::CREAT|File::WRONLY|File::TRUNC, mode) { |f|
>>> f.print content }
>>>
>>> With 'content' passed in.  Instead, I'd recommend changing it to
>>> something like:
>>>
>>> File.open(path, File::CREAT|File::WRONLY|File::TRUNC, mode) { |f|
>>> write_content(f) }
>>>
>>> And then have 'write_content' switch on the content type:
>>>
>>> def write_content(fh)
>>>  if parameter(:content).actual_content
>>>    fh.print content
>>>  else
>>>    uri = URI.parse(self[:source])
>>>    case uri.scheme
>>>    when "puppet"; write_content_from_http(fh, uri)
>>>    when "file"; write_content_from_file(fh, uri.path)
>>>    else
>>>      raise "some failure"
>>>    end
>>>  end
>>> end
>>>
>>> Just about this exact case statement is already used somewhere else and
>>> could likely be removed from there.  At worst, you've got the basic
>>> structure and parsing duplicated, but my guess is we can do that in a
>>> way that eliminates duplication.  Then we just implement the basic bits:
>>>
>>> def write_content_from_http(fh, uri)
>>>  http = Puppet::Network::HttpPool.http_instance(uri.host, uri.port ||
>>> Puppet[:masterport])
>>>  http.get { |r| .... }
>>> end
>>>
>>> def write_content_from_file(fh, path)
>>>  ...similar by-chunk copying of file....
>>> end
>>>
>>> Given that this doesn't require any arechitectural chagnes, and probably
>>> actually enables some simplification of it, why not use this?
>>
>> The issue here is that Puppet::Type::File#write is passed the content.
>> This content can come from apparently 3 possible sources:
>> * content#sync, with a given content
>> * content#sync, with a given source content (which triggers the file
>> download)
>> * ensure#sync, with ""
>>
>> So, if we remove the content parameter, that means this logic will now
>> be embedded in File#write, and all the sync property methods will only
>> call write (with their property name if that's something that matters).
>> I don't have myself any objections about this change, but maybe you'd
>> have some?
> 
> The content source actually gets dramatically simplified in the testing
> branch, when I convert the 'checksum' property into a parameter. 

I have a question about 884f46 (from testing).

It looks to me there's something wrong. Namely there seems to be some
mismatch in the parameter passing:

This is in Puppet::Type::File#write:
# And put our new file in place
if use_temporary_file # This is only not true when our file is empty.
   begin
       fail_if_checksum_is_wrong(path, content) if validate
       File.rename(path, self[:path])
   rescue => detail
       fail "Could not rename temporary file %s to %s : %s" % [path,
self[:path], detail]
end

Then later:
# Make sure the file we wrote out is what we think it is.
def fail_if_checksum_is_wrong(path, checksum)
    # Use the appropriate checksum type -- md5, md5lite, etc.
    checksum = parameter(:checksum).sum(content)

    newsum = parameter(:checksum).sum_file(path)
    return if [:absent, nil, checksum].include?(newsum)

    self.fail "File written to disk did not match checksum; discarding
changes (%s vs %s)" % [checksum, newsum]
end

The above method uses 'content' which is nil (I couldn't find a content
method).

Beside this (minor) problem, I'm wondering if we couldn't optimize this
double checksum computation (can't we trust our OS to write the correct
thing to disk?). The other idea is that for sourced content we already
know the checksum (it should be in the metadata).

I'll leave this like this (except the content fix) for the moment.
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.

Reply via email to