Hi, Denis,

Thank you very much for typing so many words. It is very valuable to me.
The reason for the previous design is that I want to use SQL. Although most of the time we just do simple aggregations,
SQL gives us more capabilities. The requirements are part of our whole solution. I think you are right; I will reconsider it.


On 4/9/2018 18:25, Denis Mekhanikov <dmekhani...@gmail.com> wrote:

These requirements don't sound realistic.
You can do an experiment and try inserting your data into a Java TreeMap, for example, which is many times faster than a distributed cache (but much less functional, obviously).
It takes almost a second just to insert a few million short strings into a map on my laptop, even if I do it in parallel.
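
A minimal sketch of such an experiment, in case it helps to reproduce it. The class name, the key and value format, and the row count of 3,000,000 are assumptions chosen for illustration. It is single-threaded and uses a plain TreeMap only, not Ignite, and the timing will vary by machine:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical experiment class; not part of Ignite.
class TreeMapInsertExperiment {
    // Fills a TreeMap with n short string keys and values (each well under 20 bytes).
    static Map<String, String> fill(int n) {
        Map<String, String> map = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            map.put("key-" + i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        int n = 3_000_000; // assumed stand-in for "a few million"
        long start = System.nanoTime();
        Map<String, String> map = fill(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Inserted " + map.size() + " entries in " + elapsedMs + " ms");
    }
}
```

Running it once after a warm-up pass should give a more representative number, since the first run includes JIT compilation.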

So, this is a pretty big amount of data to process, even if no complex operations are involved.

When we talk about storing data in Ignite, you should take into account the overhead of serialization and of maintaining the complex data structures that let it work in a distributed environment. And if Ignite runs on a separate server, network communication also becomes part of the equation.

Loading the data will probably be the most expensive part. If you eliminate it, then the desired time can be achieved with proper tuning, provided your queries are not too complex.
So, think about keeping the data and modifying only the changed pieces instead of wiping everything out after the processing finishes, if possible.
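
The idea of keeping the data and touching only what changed can be sketched roughly as follows. This is plain Java, not the Ignite API; the class and method names are hypothetical, and a single numeric column with a SUM aggregate is assumed only to keep the example small:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder that keeps rows between runs and maintains a running SUM.
class RunningTotal {
    private final Map<String, Long> rows = new HashMap<>();
    private long sum = 0;

    // Inserts or updates one row and adjusts the aggregate by the difference.
    void upsert(String key, long value) {
        Long old = rows.put(key, value);
        sum += value - (old == null ? 0 : old);
    }

    // Removes a row, if present, and subtracts its contribution.
    void remove(String key) {
        Long old = rows.remove(key);
        if (old != null) {
            sum -= old;
        }
    }

    long sum() {
        return sum;
    }

    int count() {
        return rows.size();
    }
}
```

With this shape, the cost of each run depends on how many rows changed rather than on the total number of rows, which is the point of avoiding a full reload.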


Sun, Apr 8, 2018 at 17:53, shawn.du <shawn...@neulion.com.cn>:
I want to know how costly the case below is:
1 Create a temp table/cache dynamically. The table has only 3 or 4 columns.
2 Insert millions of rows of data by MapReduce.
Each row is small: each column is less than 20 bytes.
3 Then run some simple aggregation queries.
4 Drop the cache.

Can the above operations finish in less than one second? Which step is the most costly?

Suppose it runs on a one-node cluster with 8 cores and enough memory.

