I don't think there is anything already written, but we could use some 
more info about it:
- how much data are we talking about?
- what exactly does "as soon as there is availability of connectivity" mean?

Given that, the rough design for this kind of thing is some form of 
interoperability (i.e. xmlrpc or jsonrpc) on the "city" side, which 
receives records that are uniquely identifiable (e.g. with assigned uuids) 
and then does db.table.update_or_insert(db.table.uuid == record['uuid'], 
**record). At the end of the process (i.e. "jungle" sends 50 records at a 
time, "city" update_or_inserts all of them in a single transaction), 
"city" replies with some message meaning "ok, processed".
At that point "jungle" knows that the batch is "committed" on "city", so 
it marks the records just sent as "already transmitted", selects the next 
batch and prepares the data to be sent.
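A minimal sketch of the "city" side, with a plain dict standing in for the DAL table so the update-or-insert semantics are easy to see (in web2py this logic would live inside a @service.jsonrpc-decorated function and use db.table.update_or_insert plus db.commit/db.rollback; the names here are made up for illustration):

```python
# Stand-in structure: a dict keyed by uuid plays the role of db.table.
table = {}

def update_or_insert(rec):
    # web2py equivalent:
    # db.table.update_or_insert(db.table.uuid == rec['uuid'], **rec)
    table[rec['uuid']] = dict(rec)

def process_batch(records):
    """Apply one batch atomically, then reply with an acknowledgement."""
    try:
        for rec in records:
            update_or_insert(rec)
        # in web2py, db.commit() would end the transaction here
        return {'status': 'ok, processed', 'count': len(records)}
    except Exception:
        # in web2py, db.rollback(): the batch is all-or-nothing
        raise
```

Sending the same uuid twice updates the existing row instead of duplicating it, which is what makes retransmissions safe.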
If you have a solid model (i.e. a unique uuid and a "transmitted" boolean 
on every record you must sync), "jungle" can use web2py's scheduler: every 
n seconds it tries to connect to the "city" endpoint; if "city" is 
available --> the data is transmitted; if "city" is unavailable (probably 
raising a urllib error) --> it waits n seconds and retries.
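The "jungle" side could look roughly like this sketch, assuming each row is a dict with a uuid and a "transmitted" flag; send_batch is a hypothetical stand-in for the actual JSON-RPC call (e.g. an xmlrpc.client.ServerProxy method or a urllib POST), which would raise a urllib error when "city" is unreachable:

```python
BATCH_SIZE = 50  # "jungle" sends 50 records at a time

def next_batch(rows, size=BATCH_SIZE):
    """Select the next records that have not been acknowledged yet."""
    return [r for r in rows if not r['transmitted']][:size]

def sync_once(rows, send_batch):
    """Send one batch; mark rows as transmitted only after the ack."""
    batch = next_batch(rows)
    if not batch:
        return 0
    # strip the bookkeeping flag before sending, to save a few bytes
    payload = [{k: v for k, v in r.items() if k != 'transmitted'}
               for r in batch]
    reply = send_batch(payload)  # raises if "city" is unreachable
    if reply.get('status') == 'ok, processed':
        for r in batch:
            r['transmitted'] = True  # the UPDATE on the "jungle" db
    return len(batch)
```

If the call raises, nothing gets marked, so the same batch is simply retried on the next scheduler run.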

NB: jsonrpc saves some KB; xmlrpc is more verbose and needs more "space" 
to serialize the same record.

NB2: if the data is several megabytes and serializing it to csv and 
zipping or gzipping it yields a serious compression ratio, then you can 
replace jsonrpc with:
- "jungle" dumps the data to csv and transmits it zipped
- "city" receives the zipped data, decompresses it to csv and loads the 
data. You can then use db.table.import_from_csv_file, which handles the 
"update_or_insert" automatically, as stated in the book 
http://web2py.com/books/default/chapter/29/6#CSV-%28one-Table-at-a-time%29
"""
If a table contains a field called "uuid", this field will be used to 
identify duplicates. Also, if an imported record has the same "uuid" as an 
existing record, the previous record will be updated.
"""

Summary: if frequent updates (like 50 updates per minute) are needed, then 
a message queue like rabbitmq is probably the best choice ("high 
availability"). Given that the connectivity is "random", I guess the 
"city" app (and its users) already "know" that "jungle" can be 
disconnected for some time. If the underlying model of "jungle" and 
"city" can easily identify a) what data is already synced and b) a unique 
key for every record, you're probably "overcomplicating" a relatively 
simple issue.
If "city" is online, it will "reply" to "jungle" as soon as possible 
(i.e. as soon as "jungle" can reach "city"). You just have to write a 
small function on "jungle" that identifies which records are to be sent, 
and a small function on "city" that acknowledges the received data. Then 
cron or the scheduler will do the work for you (polling "city" for 
availability at the specified interval and sending data from "jungle" to 
"city").


On Wednesday, July 11, 2012 7:06:23 PM UTC+2, Alfonso wrote:
>
> Hi, 
>
>
> I have a web2py app in 2 places: 
> - City location 
> - Rainforest location 
>
> The app is the same for both cases, however the city app is the "main". 
>
> I need to synchronize the information entered in the location of the 
> jungle to the city but not bi-directionally due to low bandwidth, in 
> addition should be automatic as soon as there is availability of 
> connectivity (ie queue management / messaging). 
>
> Has anyone had experiences like this with web2py specifically? ( I can 
> surely work with rabbitmq, hornetmq, etc. but there is an approach for 
> web2py?) 
>
>
> Saludos, 
>
> -------------------------------- 
> Alfonso de la Guarda 
> Twitter: @alfonsodg 
> Redes sociales: alfonsodg 
>    Telef. 991935157 
> 1024D/B23B24A4 
> 5469 ED92 75A3 BBDB FD6B  58A5 54A1 851D B23B 24A4 
>
