Re: How costly is it to create a temp cache and drop it?

2018-04-09 Thread shawn.du






Hi Denis,

Thank you very much for taking the time to write all this; it is very valuable to me.
The reason for the previous design is that I want to use SQL. Although most of the time we just do simple aggregations, SQL gives us more capabilities. These requirements are part of our whole solution. I think you are right, and I will reconsider it.






Thanks,
Shawn





On 4/9/2018 18:25, Denis Mekhanikov wrote:




















shawn.du
Email: shawn...@neulion.com.cn

















Re: How costly is it to create a temp cache and drop it?

2018-04-09 Thread Denis Mekhanikov
Hi!

These requirements don't sound realistic.
You can run an experiment and try inserting your data into a Java TreeMap,
for example, which is many times faster than a distributed cache (but much
less functional, obviously).
It takes almost a second just to insert a few million short strings into
a map on my laptop, even if I do it in parallel.
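Roughly the kind of experiment I mean (the 5,000,000 row count and the value
format below are just placeholders, not numbers from your case):

import java.util.TreeMap;

// Rough single-threaded micro-benchmark: how long does it take to put a few
// million short strings into a plain in-memory TreeMap?
public class TreeMapInsertBenchmark {
    public static void main(String[] args) {
        final int rows = 5_000_000; // assumed "a few millions"

        TreeMap<Integer, String> map = new TreeMap<>();

        long start = System.nanoTime();
        for (int i = 0; i < rows; i++)
            map.put(i, "value-" + i); // short values, roughly 20 bytes or less
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Inserted " + map.size() + " entries in " + elapsedMs + " ms");
    }
}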

So, this is a pretty big amount of data to process, even if no complex
operations are involved.

When it comes to storing data in Ignite, you should also take into account
the overhead of serialization and of maintaining the complex data structures
that let it work in a distributed environment. And if Ignite runs on a
separate server, then network communication also becomes part of the
equation.

Loading the data will probably be the most expensive part. If you eliminate
it, then the desired time can be achieved with proper tuning, provided your
queries are not too complex.
So, if possible, think about keeping the data and modifying only the changed
pieces instead of wiping everything out after the processing finishes.
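A rough sketch of what I mean by keeping the data; the cache name, key/value
types and the "changed" entries below are made up for illustration:

import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

// Sketch only: reuse one long-lived cache and overwrite just the rows that
// changed, instead of creating a temporary cache, bulk-loading it and
// destroying it on every run.
public class ReuseCacheSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // "metrics" is a placeholder cache name.
            IgniteCache<Long, String> cache = ignite.getOrCreateCache("metrics");

            // The initial load happens once (e.g. with IgniteDataStreamer);
            // afterwards only the entries that actually changed are written back.
            Map<Long, String> changed = new HashMap<>();
            changed.put(42L, "updated-value");
            changed.put(43L, "new-value");
            cache.putAll(changed);

            // Aggregation queries then run against the same, already loaded
            // cache, so the expensive load step is not repeated every time.
        }
    }
}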

Denis

Sun, 8 Apr 2018 at 17:53, shawn.du :

> Hi,
> I want to know how costly the case below is:
> 1. Create a temp table/cache dynamically. The table only has 3 or 4 columns.
> 2. Insert millions of rows of data via MapReduce.
> Each row is small, e.g. each column is less than 20 bytes.
> 3. Then do some simple aggregation queries.
> 4. Drop the cache.
>
> Can the above operations finish in less than one second? Which step is the most costly?
>
> Assume it runs on a one-node cluster with 8 cores and enough memory.
>
> Thanks.
>
> shawn.du
> Email: shawn...@neulion.com.cn
>


How costly is it to create a temp cache and drop it?

2018-04-08 Thread shawn.du


Hi,
I want to know how costly the case below is:
1. Create a temp table/cache dynamically. The table only has 3 or 4 columns.
2. Insert millions of rows of data via MapReduce.
Each row is small, e.g. each column is less than 20 bytes.
3. Then do some simple aggregation queries.
4. Drop the cache.

Can the above operations finish in less than one second? Which step is the most costly?

Assume it runs on a one-node cluster with 8 cores and enough memory.

Thanks.
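For concreteness, roughly what I have in mind, sketched through the Ignite
JDBC thin driver (table name, columns, JDBC URL and row count are only
placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the four steps; real timings depend on batching and cluster setup.
public class TempTableSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

        try (Connection conn =
                 DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
            // 1. Create the temporary table (3 or 4 small columns).
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("CREATE TABLE IF NOT EXISTS temp_metrics (" +
                    "id BIGINT PRIMARY KEY, tag VARCHAR(20), val VARCHAR(20))");
            }

            // 2. Insert the rows (batched; a data streamer would likely be
            //    faster for millions of rows).
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO temp_metrics (id, tag, val) VALUES (?, ?, ?)")) {
                for (long i = 0; i < 100_000; i++) { // placeholder row count
                    ps.setLong(1, i);
                    ps.setString(2, "tag-" + (i % 100));
                    ps.setString(3, "val-" + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }

            try (Statement st = conn.createStatement()) {
                // 3. Simple aggregation query.
                try (ResultSet rs = st.executeQuery(
                         "SELECT tag, COUNT(*) FROM temp_metrics GROUP BY tag")) {
                    while (rs.next())
                        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }

                // 4. Drop the table and the cache behind it.
                st.executeUpdate("DROP TABLE temp_metrics");
            }
        }
    }
}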


















shawn.du
Email: shawn...@neulion.com.cn








