> I can't use pure binary serialization because it's not supported in
Silverlight clients.
> I ran an experiment to convert ~6000 entities
How big are your entities? One thing I can suggest is protobuf
(protobuf-net). The size of the serialized entities will be *tiny* compared
to BinaryFormatter or XmlSerializer. Tiny.
What I would suggest to get going as quickly as possible is to set up a WCF
endpoint that transfers a Message which contains a byte[] payload.
using System.Runtime.Serialization;

[DataContract]
public class Message
{
    [DataMember]
    public byte[] Data { get; set; }
}
Yes, I know this isn't how WCF was intended to be used, but it gives you
(almost) full control over what goes over the wire. If you want to take
this a step further you can fully protobuf everything with custom bindings
(perhaps not supported by Silverlight, though), but the former approach
doesn't add that much more overhead.
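Something like this, as a rough sketch (the Entity type, its members and
the tag numbers are placeholders for your real model):

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Entity // stand-in for your real entity type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class MessagePacker
{
    // Server side: pack the entities into the Message's byte[] payload.
    public static Message Pack(List<Entity> entities)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, entities);
            return new Message { Data = ms.ToArray() };
        }
    }

    // Client side: unpack the byte[] back into entities.
    public static List<Entity> Unpack(Message message)
    {
        using (var ms = new MemoryStream(message.Data))
        {
            return Serializer.Deserialize<List<Entity>>(ms);
        }
    }
}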
<plug>
See my chart [1] for a comparison of speeds and sizes for an example entity.
</plug>
[1]: http://wallaceturner.com/serialization-with-protobuf-net
On Wed, Aug 7, 2013 at 10:15 AM, Paul Evrat <[email protected]> wrote:
>
> In this age of ‘big data’ you’d think there would be a big
> commercialisation opportunity for visualising both small and large data
> sets in that way. Standardise the input data formats so people can prepare
> their own data and interpolate missing points, and it would have to be
> huge for management and presentation software, particularly if not
> already available commercially.
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Greg Harris
> *Sent:* Wednesday, 7 August 2013 11:11 AM
> *To:* ozDotNet
> *Subject:* Re: Lots of data over a service
>
> Hi Paul,
>
> >> Is this something you will use yourself or for a client, or propose to
> >> make available one way or another?
>
> This is work that I did myself as a side project some years ago to cement
> my Silverlight and C# knowledge. I tried to find some commercial interest
> in it, but it just was not there in 2009/2010 when I was looking. I am
> very open to suggestions.
>
> Some of my original notes on the project are:
>
> The web site where I first saw this style of graph was one of the TED
> talks (
> http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)
> by Hans Rosling (http://en.wikipedia.org/wiki/Hans_Rosling) of the
> Karolinska Institute, where he presented the work done at GapMinder (
> http://gapminder.org). I was very impressed and assumed that the graphics
> system behind the graph was some extensive university project that would be
> hard to reproduce.
>
> When I saw the graph again some months later during a presentation by
> Tristan Kurniawan (then at SSW) on good user interface design, it occurred
> to me that this could be done as a Silverlight project. At the time Adam
> Cogan said yeah sure Greg, you do that this weekend… While it was clear
> that it would be a lot more than a weekend job, I started on the project as
> my 'background project', which took up about 18 months of background
> work to complete (say the equivalent of three to four months of full-time
> work).
>
> While this work is strongly influenced by the GapMinder project, all the
> code in this version is my own; I draw every pixel on the screen!
>
> The data sources I used are from GapMinder.org, specifically:
>
> Life expectancy at birth:
> http://spreadsheets.google.com/pub?key=phAwcNAVuyj2tPLxKvvnNPA
>
> GDP per capita:
> http://spreadsheets.google.com/pub?key=phAwcNAVuyj1jiMAkmq1iMg
>
> Population:
> http://spreadsheets.google.com/pub?key=phAwcNAVuyj0XOoBL_n5tAQ
>
> The data needed extensive massaging to get it into a more usable
> format and to interpolate missing data between known values. See the data
> tabs on the left-hand side of the graph for the raw data I ended up with.
>
> Where data is missing for some years for a country, it is estimated
> by drawing a straight line between two known data points; this is then used
> to derive values for the missing years in between.
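>
> (A sketch of that straight-line fill, assuming one value per year with
> double.NaN marking the gaps; illustrative, not the actual Motion Chart
> code.)
>
> public static void FillGaps(double[] values)
> {
>     int prev = -1; // index of the last known value
>     for (int i = 0; i < values.Length; i++)
>     {
>         if (double.IsNaN(values[i])) continue;
>         if (prev >= 0 && i - prev > 1)
>         {
>             // Straight line between the two known points.
>             double step = (values[i] - values[prev]) / (i - prev);
>             for (int j = prev + 1; j < i; j++)
>                 values[j] = values[prev] + step * (j - prev);
>         }
>         prev = i;
>     }
> }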
>
> The data displayed is not complete and may have errors and omissions.
> Where there was a problem with part of the data set, that part was left
> out rather than represent incorrect data. There was a problem merging
> separate data sets where countries showed different names, so a direct
> merge was not possible; in such cases, if a clear merge did not present
> itself, the data was excluded.
>
> Other errors may have been introduced while preparing the data for
> representation in this format (I welcome someone doing a thorough
> data validation).
>
> Once I had all of the data I worked on getting the graph drawn. The graph
> is drawn with many lines, circles and rectangles on a Silverlight
> canvas. With the sheer volume of data and updates needed, it was a bit
> of a trial and error process to find approaches that performed at an
> acceptable level.
>
> Regards
>
> Greg Harris
>
> On Wed, Aug 7, 2013 at 8:46 AM, Paul Evrat <[email protected]> wrote:
>
> Greg,
>
> I saw the TED talk that you note was the inspiration for this. I thought
> at the time it was a brilliant way to present and understand data. Both it
> and the presenter had the audience totally amused, but it really made the
> data talk.
>
> Is this something you will use yourself or for a client, or propose to
> make available one way or another?
>
> Regards,
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Greg Harris
> *Sent:* Wednesday, 7 August 2013 1:30 AM
> *To:* ozDotNet
> *Subject:* Re: Lots of data over a service
>
> Hi Greg,
>
> What I did with my Motion Chart software (
> http://www.eshiftlog.com/Silverlight/MotionGraphTestPage.html) to get
> better download performance was:
>
> • Move away from small WCF data transfers to transferring a single large
> encoded, compressed text file
>
> • Only transfer raw data (no JSON/XML structure, which adds a LOT OF FAT)
>
> • Minor use of CSV format, otherwise fixed format
>
> • Define my own number formats to reduce size (remove unneeded decimal
> places)
>
> • Use a zip file to transfer data (a compression sketch follows below)
>
> This has improved data load time by a factor of ~50-100 (sorry, no
> hard numbers).
>
> My data ended up being 430KB for ~32K rows, just over 13 bytes/row.
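>
> (A sketch of the compress-before-send step, assuming GZip via
> System.IO.Compression on the server; note that Silverlight clients of
> this era needed a third-party inflater such as Ionic.Zlib.)
>
> using System.IO;
> using System.IO.Compression;
> using System.Text;
>
> public static byte[] CompressText(string text)
> {
>     byte[] raw = Encoding.UTF8.GetBytes(text);
>     using (var ms = new MemoryStream())
>     {
>         // Closing the GZipStream flushes the compressed bytes into ms.
>         using (var gzip = new GZipStream(ms, CompressionMode.Compress))
>             gzip.Write(raw, 0, raw.Length);
>         return ms.ToArray(); // ToArray works even after ms is closed
>     }
> }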
>
> Example data:
>
> C,007,Australia,Oceania,1820,2007
> 3413340017010
> 3413310017070
> 3413290017280
> 3413290017530
> 3413320017950
> 3413330018330
>
> As traditional CSV text, this would look like:
>
> CountryID,Year,LifeExpect,Population,GDP,CountryName,RegionCode,RegionName
> 007,1820,34.1,0000334000,000701.0,Australia,4S,Oceania
> 007,1821,34.1,0000331000,000707.0,Australia,4S,Oceania
> 007,1822,34.1,0000329000,000728.0,Australia,4S,Oceania
> 007,1823,34.1,0000329000,000753.0,Australia,4S,Oceania
> 007,1824,34.1,0000332000,000795.0,Australia,4S,Oceania
> 007,1825,34.1,0000333000,000833.0,Australia,4S,Oceania
>
> There are three row types in the file:
>
> Lines beginning with "C" are CSV country header lines, like:
>
> C,007,Australia,Oceania,1820,2007
>
> The values being:
>
> - C: Header
> - 007: Country number
> - Australia: Country name
> - Oceania: Country region
> - 1820: First year there is data
> - 2007: Last year there is data
>
> Lines starting with 0-9 are data for one individual year for the above
> country:
>
> - The year is assumed to increment for every detail line
> - These detail lines are always 13 digits wide, fixed-width fields, no
> field separator (a decode sketch follows below), like:
>
> 341 334001 7010 (spaces added for clarity, not in actual file)
>
> - Life expectancy (x10), example: 341 = 34.1 years
> - Population (last digit is an exponent multiplier): 334001 = 334,000;
> 334002 = 3,340,000. The last digit is effectively the number of zeros to
> add at the right-hand side.
>
> - GDP (per person, last digit is an exponent multiplier): 7010 = $701;
> 7011 = $7,010 (matching the 000701.0 in the CSV above). Again, the last
> digit is effectively the number of zeros to add at the right-hand side.
>
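> (To make the decode concrete, a sketch following the rules above; the
> names and parsing are illustrative, not the actual Motion Chart code.)
>
> // Decode one 13-digit detail line, e.g. "3413340017010".
> public static void DecodeDetailLine(string line)
> {
>     // First 3 digits: life expectancy x10, so "341" -> 34.1 years.
>     double lifeExpectancy = int.Parse(line.Substring(0, 3)) / 10.0;
>
>     // Next 6 digits: 5-digit mantissa plus a count of zeros to append.
>     long population = long.Parse(line.Substring(3, 5));       // 33400
>     for (int z = line[8] - '0'; z > 0; z--) population *= 10; // 334,000
>
>     // Last 4 digits: 3-digit mantissa plus a count of zeros to append.
>     long gdp = long.Parse(line.Substring(9, 3));              // 701
>     for (int z = line[12] - '0'; z > 0; z--) gdp *= 10;       // 701
> }
>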
> You need to be careful with this technique: how much data can you afford
> to “lose” due to data rounding?
>
> You were looking for “getting the data across with the least suffering and
> complexity”; my complexity was continually refining towards simpler and
> simpler data structures, which ended up looking like a data structure from
> a 1960’s COBOL program, when storage was expensive and processing was slow.
>
>
> In hindsight, I feel that I still sent more data down the wire than I
> needed to. I could have taken one digit off the age range, two digits off
> the population and one digit off the GDP, saving another 4 bytes per row.
> Also, I could have used base-64 numbers, which would have saved another
> ~4 bytes per row (a toy sketch follows below). But the performance was
> fine with this structure, so I did no more to cut it back.
>
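> (A toy illustration of the base-64 digits idea; the digit alphabet here
> is an assumption.)
>
> const string Digits =
>     "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/";
>
> public static string ToBase64Digits(long value)
> {
>     if (value == 0) return "0";
>     var sb = new System.Text.StringBuilder();
>     while (value > 0)
>     {
>         sb.Insert(0, Digits[(int)(value % 64)]);
>         value /= 64;
>     }
>     return sb.ToString(); // 334000 -> "1HYm", 4 chars instead of 6
> }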
>
> WARNING: This worked fine with my specific, smallish, well-known data set;
> if I were putting this out into customer land, I would allow for a wider
> range of values. For example, if we needed to express the values in
> Indonesian Rupiahs rather than US Dollars, the amounts would go up by a
> factor of 10,000 and my values would no longer fit. My values only work
> for large positive numbers; there is no room for a negative sign in front
> of the number or the exponent.
>
>
> So you need to design a file format that will work for your specific
> situation and data, and keep an eye on it to make sure it stays working.
>
>
> After having done all of this, I am tempted to see what the performance
> would be like with just simple raw CSV; if I were going to re-code this
> today, that is what I would start with.
>
>
> Regards
>
> Greg #2 Harris
>
> On Tue, Aug 6, 2013 at 6:00 PM, Greg Keogh <[email protected]> wrote:
>
> Folks, I have to send several thousand database entities of different
> types to both a Silverlight 5 and WPF app for display in a grid. I can't
> "page" the data because it's all got to be loaded to allow a snappy
> response to filtering it. I'm fishing for ways of getting the data across
> with the least suffering and complexity ... don't forget that Silverlight
> is involved.
>
>
> Does a WCF service with http binding allow streaming? That would be the
> ideal technique if it comes out of the box and isn't too complex.
>
>
> I ran an experiment to convert ~6000 entities into XML and the size is a
> hefty 6MB (no surprise!); however, Ionic.Zlib deflates it down to a 500KB
> buffer which transmits acceptably fast. I'm unhappy with my code to
> round-trip the entities to XML as it's a bit messy and has special-case
> logic to skip association properties.
>
>
> Then I thought of JSON, which I haven't needed to use before. Would the
> JSON libraries make round-tripping easier? Are the built-in Framework
> classes good enough, or would I need to use something like Newtonsoft? Can
> I control which properties are processed? Any general ideas would be
> welcome.
>
> Greg K