I am trying to build a low-latency machine learning system from scratch using Apache Ignite. Note: I am in the design phase and have not implemented anything yet.
The general data pipeline is:

  JSON data via socket -> Ignite cache -> Ignite ML (updating) -> Ignite cache -> app (via continuous query)

(or perhaps Ignite ML -> app directly via socket, for lower latency). I am trying to minimise latency and also to improve ML speed. Obviously the distributed, in-memory, colocated processing is quite useful for high-performance ML over lots of data. However, I am wondering:

1) What is best practice for performing the various operations so as to improve latency / ML performance?

2) Could there be fundamental changes in the Ignite framework to better support this kind of workload?

One important factor could be serialisation / deserialisation speed. This covers both JSON -> (some object) -> cache and cache -> (some object) -> ML vector. Optimal would be to serialise from JSON directly into the cache representation (a BinaryObject?), and straight from the cache into an ML vector. Is this possible, and is there a best practice for it? Did this ticket make it easier: https://issues.apache.org/jira/browse/IGNITE-13672 (rough sketches of what I mean are in the PS below).

I am also looking at more optimal ways of representing the data to improve ML performance. I have heard that a columnar layout is quite useful: https://arrow.apache.org/overview/ Could something like this be implemented as an alternative cache memory architecture? If that is not possible, is there an alternative to the on-heap Java array [] / Vector that the ML algorithms seem to use?

Finally, is it possible for the ML algorithms to work on the data in place (in the cache), without having to retrieve it first? (Is this what IgniteRDD does for Spark? The last sketch in the PS shows the kind of colocated access I mean.)

The two considerations are improving ML algorithm performance and minimising (de)serialisation overhead.

Thanks!
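PS: to make this concrete, here are rough, untested sketches of what I have in mind; all class, cache, and field names below are placeholders of mine. First, the cache -> app hop via a continuous query that pushes every fresh prediction to the app:

import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class PredictionListener {
    static QueryCursor<?> listen(Ignite ignite) {
        // Hypothetical cache holding model outputs keyed by event id.
        IgniteCache<Long, Double> predictions = ignite.getOrCreateCache("predictions");

        ContinuousQuery<Long, Double> qry = new ContinuousQuery<>();

        // Fires on every insert/update, pushing fresh predictions to the app.
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Long, ? extends Double> e : events)
                System.out.println("event " + e.getKey() + " -> " + e.getValue());
        });

        // The returned cursor must stay open for the listener to keep firing.
        return predictions.query(qry);
    }
}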
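Second, the JSON -> cache hop: building the cache entry directly with BinaryObjectBuilder from already-parsed JSON fields, so no intermediate POJO is ever created (the "Tick" type and its fields are invented, and the JSON parsing itself is elided):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class JsonToBinary {
    // The "ticks" cache is assumed to be obtained in binary form, i.e.
    // IgniteCache<Long, BinaryObject> ticks = ignite.cache("ticks").withKeepBinary();
    static void put(Ignite ignite, IgniteCache<Long, BinaryObject> ticks,
                    long key, double price, long timestamp) {
        // Build the cache entry straight from parsed JSON fields,
        // without materialising an intermediate POJO.
        BinaryObjectBuilder b = ignite.binary().builder("Tick"); // type name is made up
        b.setField("price", price);
        b.setField("timestamp", timestamp);
        ticks.put(key, b.build());
    }
}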
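Third, the cache -> ML-vector hop. I am guessing that, post IGNITE-13672, something like BinaryObjectVectorizer lets a trainer read feature coordinates straight out of binary fields; I have not verified this against the actual API, so please correct me:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.ml.clustering.kmeans.KMeansModel;
import org.apache.ignite.ml.clustering.kmeans.KMeansTrainer;
import org.apache.ignite.ml.dataset.feature.extractor.impl.BinaryObjectVectorizer;

public class TrainOnBinary {
    static KMeansModel train(Ignite ignite, IgniteCache<Long, BinaryObject> ticks) {
        // Read features straight out of BinaryObject fields, skipping
        // POJO deserialisation entirely (field names are made up).
        BinaryObjectVectorizer<Long> vectorizer =
            new BinaryObjectVectorizer<>("price", "volume");

        return new KMeansTrainer().withAmountOfClusters(2).fit(ignite, ticks, vectorizer);
    }
}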
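And lastly, on "working on the data in place": plain compute affinity illustrates the kind of colocated access I am asking about for the ML algorithms (this uses the general compute API, not Ignite ML itself):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ColocatedTouch {
    // Runs the closure on the node that owns the key's partition, so the
    // value is read from local memory instead of being shipped over the wire.
    static void touch(Ignite ignite, String cacheName, long key) {
        ignite.compute().affinityRun(cacheName, key, () -> {
            IgniteCache<Long, Double> cache = Ignition.localIgnite().cache(cacheName);
            System.out.println("local value: " + cache.localPeek(key));
        });
    }
}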