Hi Christian,

Yes, the data is also corrupted when using Protocol Buffers. In fact, in the
test I just ran, uploading and downloading 100 x 1 MB files, the error rate
(~20-40 ppm) appeared to be an order of magnitude greater than the ~1 ppm
rate I have seen when using HTTP.
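
For anyone who wants to reproduce the ppm figures, here is a minimal sketch of
a byte-for-byte comparison. It is an illustration only, not the script from the
gist linked further down, and the function name byte_error_rate_ppm is made up
for this example. It assumes the source data and the data read back from Riak
have the same length, which matches what has been observed so far.

```python
def byte_error_rate_ppm(source, returned):
    """Compare two equal-length byte strings.

    Returns (mismatched_byte_count, error_rate_in_ppm).
    """
    if len(source) != len(returned):
        raise ValueError("lengths differ: %d vs %d" % (len(source), len(returned)))
    if not source:
        return 0, 0.0
    # bytearray makes iteration yield integers on both Python 2 and Python 3.
    mismatches = sum(
        1 for a, b in zip(bytearray(source), bytearray(returned)) if a != b
    )
    # ppm = mismatched bytes per million bytes compared.
    return mismatches, mismatches * 1e6 / len(source)


if __name__ == "__main__":
    # Dummy data: 1,000,000 zero bytes with two bytes deliberately altered.
    source = bytes(bytearray(1000000))
    damaged = bytearray(source)
    damaged[123456] ^= 0xFF
    damaged[654321] ^= 0x01
    print(byte_error_rate_ppm(source, bytes(damaged)))
```

To get an aggregate rate over many files, sum the mismatch counts and the file
lengths separately and divide once at the end, rather than averaging the
per-file ppm values.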

Just wanted to remind everyone that it does seem to work just fine when
doing the exact same tests inside an Ubuntu VM *running on the same Windows
machine* on which I'm seeing the corruption errors (I'm using VMware
Player).

Mysterious.
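
The upload-versus-download reasoning in my earlier message (quoted below) can
be written down as a small heuristic. This is only a sketch under stated
assumptions: the function name classify_corruption and the verdict labels are
invented for illustration, and the input is assumed to be one set of failed
file names per download round of the same uploaded files.

```python
def classify_corruption(rounds):
    """Guess where corruption happens from repeated download rounds.

    rounds: iterable of collections, one per download round, each holding
    the names of the files that failed verification in that round.

    Returns (verdict, always_failed, sometimes_failed).
    """
    rounds = [set(r) for r in rounds]
    if not rounds or not any(rounds):
        return "none", set(), set()
    always = set.intersection(*rounds)   # failed in every round
    ever = set.union(*rounds)            # failed in at least one round
    sometimes = ever - always
    if always and not sometimes:
        # The same files fail every time: the stored copy is likely bad.
        verdict = "upload"
    elif sometimes and not always:
        # Failures move around between rounds: the stored copy is likely
        # fine and the damage happens on the way down.
        verdict = "download"
    else:
        verdict = "mixed"
    return verdict, always, sometimes


if __name__ == "__main__":
    print(classify_corruption([{"f1", "f5"}, {"f1", "f5"}]))
    print(classify_corruption([{"f1", "f5", "f19"}, {"f3", "f8", "f19"}]))
```

With only a few rounds, a file can fail every round purely by chance, so a
"mixed" verdict, or an "upload" verdict from two or three rounds, deserves more
download rounds before drawing conclusions.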

Finkle


2013/11/9 Christian Dahlqvist <christ...@basho.com>

> Hi,
>
> I see from your code sample that you are using the HTTP mode. Do you see
> the same issue if you switch to using Protocol Buffers?
>
> Best regards,
>
> Christian
>
> On 9 Nov 2013, at 09:53, finkle mcgraw <finklemcg...@gmail.com> wrote:
>
> Hi John and Engel,
>
> Here's a link to a Dropbox folder with a set of file pairs (the source
> file and the corrupted version that has made a round trip via Riak):
> https://www.dropbox.com/sh/snfbiqm0jys9u2a/AZPF7_RcBT
>
> John, to answer your questions:
>
> *Windows-->Riak-->Ubuntu VM*
> When uploading files from Windows to Riak and then downloading them to the
> Ubuntu VM, inconsistencies also appear, but always in the same subset of
> files (if I repeatedly download the same set of files from Riak and verify
> them against the source files). To me, this indicates that these files were
> corrupted on upload from Windows to Riak.
>
> *Ubuntu VM-->Riak-->Windows*
> When uploading the source files from the Ubuntu VM (after having verified
> that they can be downloaded into the Ubuntu VM again without any problems)
> and then downloading them to Windows, inconsistencies appear. However,
> the affected files vary from one download round to the next, i.e., by
> downloading a file a few times I eventually get a non-corrupted version.
> To me, this indicates that the files were correctly uploaded to Riak from
> the Ubuntu VM, but are corrupted somewhere in the download flow on the
> Windows machine.
>
> Ergo: data appears to be corrupted both when going upstream and when going
> downstream, somewhere inside the stack used by the Riak Python client on
> Windows 7 64 bit.
>
> One more observation: I've done some byte-for-byte comparisons when
> uploading/downloading, and the error rate appears to be on the order of
> 0.4 ppm.
>
> Finkle
>
> 2013/11/9 John Daily <jda...@basho.com>
>
>> (And the inverse would also be interesting to know.)
>>
>> -John
>>
>> On Nov 8, 2013, at 6:41 PM, John Daily <jda...@basho.com> wrote:
>>
>> If you upload the files from Windows, and download them to the Ubuntu VM,
>> do inconsistencies ever appear?
>>
>> -John
>>
>> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:
>>
>> Hello there,
>>
>> This looks puzzling. Just from looking at the code we haven't found
>> anything suspicious. Would you mind posting a pair of those files that
>> failed to match somewhere so we can look at the differences?
>>
>> Thanks for reporting this.
>>
>> Engel@Basho
>>
>>
>> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com> wrote:
>>
>>> Fellow Riak users,
>>>
>>> I've noticed that when I upload binary files larger than ~1 MB to
>>> Riak from my Windows 7 (64 bit) machine and then read the same data back,
>>> it often has a few corrupted bytes while maintaining the correct
>>> total data length.
>>>
>>> Here's the Python script I use to provoke and detect the situation:
>>> https://gist.github.com/anonymous/7376084
>>>
>>> Notice that I included typical output from running the script at the
>>> bottom of the gist. As you can see, for that particular run, half of the
>>> dummy-data files were corrupted. The data returned from Riak has the exact
>>> same length as the source, but not the exact same content. I've only done
>>> a brief analysis of how the corruptions appear within the files that are
>>> detected as corrupted, but it looks like typically between 1 and 5
>>> bytes are altered, evenly distributed within the file.
>>>
>>> I get no exceptions or warnings from the Riak Python client. Everything
>>> appears to be in order.
>>>
>>> So far I've tested this on two different Windows machines against two
>>> different Riak clusters (a five-node Amazon cluster with a load balancer in
>>> front, and a local devcluster running inside an Ubuntu 12.04 virtual
>>> machine). The problems appear in all four possible combinations.
>>>
>>> However, if I run the script from within an Ubuntu VM on one of
>>> those Windows machines, against either of the two Riak clusters, the
>>> problems do NOT appear.
>>>
>>> Another observation: if I generate 50 sample files, upload them, and then
>>> download them repeatedly, the script detects corruptions in different
>>> files on each round of downloading. E.g., on round one it might say that
>>> files 1, 5, and 19 were corrupted, but on round two it might say
>>> 3, 8, and 19.
>>>
>>> Here is the Riak stats view from the Amazon cluster we're running (that
>>> I tested the script against):
>>> https://gist.github.com/anonymous/7376379
>>>
>>> But as I said, the corruptions appear also when working locally between
>>> a Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>>>
>>> Here are my local package versions, running on Python 2.7.5 64 bit on
>>> Windows 7 64 bit:
>>> protobuf==2.4.1
>>> riak==2.0.1
>>> riak-pb==1.4.1.1
>>>
>>> Any ideas? This seems relatively serious, unless it's some kind of
>>> brutal oversight on my part.
>>>
>>> Finkle
>>>
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
