If you upload the files from Windows, and download them to the Ubuntu VM, do inconsistencies ever appear?
-John

On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:

> Hello there,
>
> This looks puzzling. Just from looking at the code we haven't found anything
> suspicious. Would you mind posting a pair of those files that failed to match
> somewhere so we can look at the differences?
>
> Thanks for reporting this.
>
> Engel@Basho
>
>
> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com> wrote:
> Fellow Riak users,
>
> I've noticed that when I upload binary files with sizes of >~1 MB to Riak
> from my Windows 7 (64 bit) machine, then read the same data back again, it
> often has a few corrupted bytes, while maintaining the correct total data
> length.
>
> Here's the Python script I use to provoke and detect the situation:
> https://gist.github.com/anonymous/7376084
>
> Notice that I included the typical output when running the script at the
> bottom of the gist. As you can see, for that particular run, half of the
> dummy-data files were corrupted. The returned data from Riak has the exact
> same length as the source, but not the exact same content. I've only done a
> brief analysis of how the corruptions appear within the files that are
> detected as corrupted, but it looks like typically between 1 and 5 bytes
> are altered, evenly distributed within the file.
>
> I get no exceptions or warnings from the Riak Python client. Everything
> appears to be in order.
>
> So far I've tested this on two different Windows machines against two
> different Riak clusters (a five-node Amazon cluster with a load balancer in
> front, and a local devcluster running inside an Ubuntu 12.04 Virtual
> Machine). The problems appear in all four possible combinations.
>
> However, if I run the script from within an Ubuntu VM, on one of the said
> Windows machines, against either of the two Riak clusters, the problems do
> NOT appear.
>
> Another observation: If I generate 50 sample files, upload them, then
> repeatedly try to download them over and over again, the script will detect
> corruptions in different files on each repetition of downloading. E.g., on
> round one it might say that files 1, 5, and 19 were corrupted, but on round
> two it might say 3, 8, and 19.
>
> Here is the riak stats-view from the Amazon cluster we're running (that I
> tested the script against):
> https://gist.github.com/anonymous/7376379
>
> But as I said, the corruptions also appear when working locally between a
> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>
> Here are my local package versions, running on Python 2.7.5 64 bit on Windows
> 7 64 bit:
> protobuf==2.4.1
> riak==2.0.1
> riak-pb==1.4.1.1
>
> Any ideas? This seems relatively serious, unless it's some kind of brutal
> oversight on my part.
>
> Finkle
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
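
For anyone who wants to inspect a mismatching pair before posting it, here is a minimal sketch of a byte-level comparison. It is not the script from the gist and does not talk to Riak at all: the function names diff_offsets and describe are hypothetical, and the two byte strings are assumed to have been obtained already (the source file, and whatever the client returned for it). It simply lists the offsets at which two equal-length payloads differ, which is the kind of detail the "1 to 5 bytes altered" observation above refers to.

```python
import hashlib


def diff_offsets(original, returned):
    """Return (offset, original_byte, returned_byte) for each differing position."""
    if len(original) != len(returned):
        raise ValueError("lengths differ: %d vs %d" % (len(original), len(returned)))
    # bytearray() yields integers on both Python 2 and Python 3.
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(bytearray(original), bytearray(returned)))
        if a != b
    ]


def describe(original, returned):
    """Summarise a mismatch: checksums plus the offsets of altered bytes."""
    diffs = diff_offsets(original, returned)
    return {
        "sha1_original": hashlib.sha1(original).hexdigest(),
        "sha1_returned": hashlib.sha1(returned).hexdigest(),
        "altered_bytes": len(diffs),
        "offsets": [offset for offset, _, _ in diffs],
    }


if __name__ == "__main__":
    # Simulated data only: a 1 MiB payload with two bytes flipped by hand.
    source = bytes(bytearray(range(256))) * 4096
    damaged = bytearray(source)
    damaged[10] ^= 0xFF
    damaged[500000] ^= 0x01
    report = describe(source, bytes(damaged))
    print(report["altered_bytes"], report["offsets"])
```

Running the same comparison on several download rounds of one file would show whether the altered offsets stay fixed or move between rounds, which may help narrow down where the bytes change.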