If you upload the files from Windows and then download them to the Ubuntu VM, 
do inconsistencies ever appear?
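One way to check that, sketched in Python under stated assumptions: keep the original bytes and the bytes read back from Riak on whichever machine does the download, then compare SHA-256 digests and list the byte offsets that differ. The helper names below are hypothetical. The sketch uses only the standard library and does not touch the Riak client API.

```python
import hashlib
import os

# Hypothetical helpers for comparing a source payload with what came back
# from the cluster; they are not part of the Riak Python client.


def sha256_hex(data):
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def differing_offsets(original, returned):
    """List the byte offsets at which two equal-length buffers differ.

    bytearray indexing yields ints on both Python 2 and Python 3.
    """
    if len(original) != len(returned):
        raise ValueError("lengths differ: %d vs %d" % (len(original), len(returned)))
    a, b = bytearray(original), bytearray(returned)
    return [i for i in range(len(a)) if a[i] != b[i]]


def compare(original, returned):
    """Summarize whether a round-tripped payload matches its source."""
    return {
        "match": sha256_hex(original) == sha256_hex(returned),
        "length": len(original),
        "bad_offsets": differing_offsets(original, returned),
    }


if __name__ == "__main__":
    source = os.urandom(1024 * 1024)   # ~1 MB of random dummy data
    damaged = bytearray(source)
    damaged[1000] ^= 0xFF              # flip one byte to simulate corruption
    print(compare(source, bytes(damaged)))
```

Running the same comparison on Windows and inside the Ubuntu VM against the same stored keys would show whether the corruption follows the machine doing the download.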

-John

On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:

> Hello there,
> 
> This looks puzzling. Just from looking at the code, we haven't found anything 
> suspicious. Would you mind posting a pair of the files that failed to match 
> somewhere, so we can look at the differences?
> 
> Thanks for reporting this.
> 
> Engel@Basho
> 
> 
> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com> wrote:
> Fellow Riak users,
> 
> I've noticed that when I upload binary files of roughly 1 MB or larger to Riak 
> from my Windows 7 (64-bit) machine and then read the same data back, the 
> result often has a few corrupted bytes while keeping the correct total length.
> 
> Here's the Python script I use to provoke and detect the situation:
> https://gist.github.com/anonymous/7376084
> 
> Notice that I included the typical output from running the script at the 
> bottom of the gist. As you can see, for that particular run, half of the 
> dummy-data files were corrupted. The data returned from Riak has exactly the 
> same length as the source, but not the same content. I've only done a brief 
> analysis of how the corruptions appear within the files detected as 
> corrupted, but typically between 1 and 5 bytes seem to be altered, evenly 
> distributed within the file.
> 
> I get no exceptions or warnings from the Riak Python client. Everything 
> appears to be in order.
> 
> So far I've tested this on two different Windows machines against two 
> different Riak clusters (a five-node Amazon cluster with a load balancer in 
> front, and a local devcluster running inside an Ubuntu 12.04 virtual 
> machine). The problem appears in all four possible combinations.
> 
> However, if I run the script from within an Ubuntu VM on one of those 
> Windows machines, against either of the two Riak clusters, the problem does 
> NOT appear.
> 
> Another observation: if I generate 50 sample files, upload them, and then 
> download them repeatedly, the script detects corruption in different files 
> on each round of downloading. E.g., on round one it might report files 1, 5 
> and 19 as corrupted, but on round two files 3, 8 and 19.
> 
> Here is the Riak stats view from the Amazon cluster we're running (the one I 
> tested the script against):
> https://gist.github.com/anonymous/7376379
> 
> But as I said, the corruptions also appear when working locally between a 
> Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
> 
> Here are my local package versions, running Python 2.7.5 (64-bit) on 
> Windows 7 (64-bit):
> protobuf==2.4.1
> riak==2.0.1
> riak-pb==1.4.1.1
> 
> Any ideas? This seems relatively serious, unless it's some kind of brutal 
> oversight on my part.
> 
> Finkle
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 