Hi Christian,

Yes, the data is also corrupted when using Protocol Buffers. In fact, in the test I just ran (uploading and downloading 100 x 1 MB files), the error rate was roughly an order of magnitude higher (~20-40 ppm) than the ~1 ppm I have seen when using HTTP.
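A byte-for-byte comparison like the one behind these ppm figures can be sketched as follows. This is only a minimal illustration, not the script from the gist linked in this thread: the function name byte_error_rate_ppm and the sample data are assumptions made up for the example, and it is written for Python 3 rather than the Python 2.7.5 used in the tests.

```python
def byte_error_rate_ppm(source, returned):
    """Compare two equal-length byte strings.

    Returns (offsets, ppm): the positions of the differing bytes and the
    share of differing bytes expressed in parts per million.
    Hypothetical helper for illustration only.
    """
    if len(source) != len(returned):
        raise ValueError("lengths differ; a byte-for-byte rate is not meaningful")
    if not source:
        return [], 0.0
    # Iterating over bytes in Python 3 yields integers, so a plain
    # element-wise comparison is enough to find the altered positions.
    offsets = [i for i, (a, b) in enumerate(zip(source, returned)) if a != b]
    return offsets, len(offsets) * 1000000 / len(source)


# Made-up sample: a 1,000,000-byte payload with three bytes altered,
# standing in for a source file and its round-tripped copy.
source = bytes(1000000)
corrupted = bytearray(source)
for pos in (10, 500000, 999999):
    corrupted[pos] = 0xFF

offsets, ppm = byte_error_rate_ppm(source, bytes(corrupted))
print(offsets, ppm)
```

Keeping the offsets as well as the rate makes it possible to check whether the altered bytes are spread evenly through a file or clustered in one place.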
Just wanted to remind everyone that it does seem to work just fine when doing the exact same tests inside an Ubuntu VM *running on the same Windows machine* on which I'm seeing the corruption errors (I'm using VMware Player). Mysterious.

Finkle

2013/11/9 Christian Dahlqvist <christ...@basho.com>

> Hi,
>
> I see from your code sample that you are using the HTTP mode. Do you see
> the same issue if you switch to using Protocol Buffers?
>
> Best regards,
>
> Christian
>
>
> On 9 Nov 2013, at 09:53, finkle mcgraw <finklemcg...@gmail.com> wrote:
>
> Hi John and Engel,
>
> Here's a link to a Dropbox folder with a set of file pairs (the source
> file and the corrupted version that has taken a round trip via riak):
> https://www.dropbox.com/sh/snfbiqm0jys9u2a/AZPF7_RcBT
>
> John, to answer your questions:
>
> *Windows-->Riak-->Ubuntu VM*
> When uploading files from windows to riak, then downloading them to the
> Ubuntu VM, inconsistencies appear also, but always for the same subset of
> files (if I repeatedly download the same set of files from riak and verify
> against the source files). This to me indicates that these files were
> corrupted on the upload from windows to riak.
>
> *Ubuntu VM-->Riak-->Windows*
> When uploading the source files from Ubuntu VM (and after having verified
> that they can be downloaded into the Ubuntu VM again without any problems)
> and then downloading them to windows, inconsistencies appear. However,
> these inconsistencies are varying from file to file from each download
> round. I.e., by downloading a file a few times I eventually get a
> non-corrupted version. This to me indicates that the files were correctly
> uploaded to riak from the Ubuntu VM, but are corrupted somewhere in the
> download flow on the windows machine.
>
> Ergo: Data appears to be corrupted both when going upstream and when going
> downstream somewhere inside the stack used by the riak python client on
> windows 7 64 bit.
>
> One more observation: I've done some byte for byte comparisons when
> uploading/downloading, and the error rate appears to be on the order of 0.4
> ppm.
>
> Finkle
>
>
> 2013/11/9 John Daily <jda...@basho.com>
>
>> (And the inverse would also be interesting to know.)
>>
>> -John
>>
>> On Nov 8, 2013, at 6:41 PM, John Daily <jda...@basho.com> wrote:
>>
>> If you upload the files from Windows, and download them to the Ubuntu VM,
>> do inconsistencies ever appear?
>>
>> -John
>>
>> On Nov 8, 2013, at 4:58 PM, Engel Sanchez <en...@basho.com> wrote:
>>
>> Hello there,
>>
>> This looks puzzling. Just from looking at the code we haven't found
>> anything suspicious. Would you mind posting a pair of those files that
>> failed to match somewhere so we can look at the differences?
>>
>> Thanks for reporting this.
>>
>> Engel@Basho
>>
>>
>> On Fri, Nov 8, 2013 at 2:41 PM, finkle mcgraw <finklemcg...@gmail.com> wrote:
>>
>>> Fellow Riak users,
>>>
>>> I've noticed that when I upload binary files with sizes of >~1 MB to
>>> Riak from my Windows 7 (64 bit) machine, then read the same data back
>>> again, often it has a few corrupted bytes, while maintaining the correct
>>> total data length.
>>>
>>> Here's the Python script I use to provoke and detect the situation:
>>> https://gist.github.com/anonymous/7376084
>>>
>>> Notice that I included the typical output when running the script at the
>>> bottom of the gist. As you can see, for that particular run, half of the
>>> dummy-data files were corrupted. The returned data from Riak has the exact
>>> same length as the source, but not the exact same content. I've only done
>>> brief analysis of how the corruptions appear within the files that are
>>> detected as corrupted, but it looks like it's typically between 1 and 5
>>> bytes that are altered, evenly distributed within the file.
>>>
>>> I get no exceptions or warnings from the Riak Python client. Everything
>>> appears to be in order.
>>>
>>> So far I've tested this on two different windows machines against two
>>> different Riak clusters (a five node Amazon cluster with a loadbalancer in
>>> front, and a local devcluster running inside an Ubuntu 12.04 Virtual
>>> Machine). The problems appear in all four possible combinations.
>>>
>>> However, if I run the script from within an Ubuntu VM, on one of the
>>> said Windows machines, against either of the two Riak clusters, the problems
>>> do NOT appear.
>>>
>>> Another observation: If I generate 50 sample files, upload them, then
>>> repeatedly try to download them over and over again, the script will detect
>>> corruptions in different files on each repetition of downloading. E.g., on
>>> round one it might say that files 1, 5 and 19 were corrupted, but on round
>>> two it might say 3, 8 and 19.
>>>
>>> Here is the riak stats-view from the Amazon cluster we're running (that
>>> I tested the script against):
>>> https://gist.github.com/anonymous/7376379
>>>
>>> But as I said, the corruptions appear also when working locally between
>>> a Win7 machine and a cluster running on a virtual Ubuntu 12.04 machine.
>>>
>>> Here are my local package versions, running on Python 2.7.5 64 bit on
>>> Windows 7 64 bit:
>>> protobuf==2.4.1
>>> riak==2.0.1
>>> riak-pb==1.4.1.1
>>>
>>> Any ideas? This seems relatively serious, unless it's some kind of
>>> brutal oversight on my part.
>>>
>>> Finkle
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com