Re: Map-reduce processing
Hi,

If you broadcast the job and want to iterate over the cache inside it, please make sure that you iterate only over local entries (e.g. IgniteCache.localEntries(), ScanQuery.setLocal(true), etc.). Otherwise your jobs will duplicate work and performance will suffer.

Also, please note that the returned result set might be incomplete if one of the nodes fails during job processing. If you care about that, you should either implement some failover or use Ignite's built-in queries (ScanQuery, SqlQuery), which already take care of it.

In any case, I strongly recommend that you focus on SqlQuery first. You can configure indexes on the cache, and they can give you a great boost, because instead of iterating over the whole cache, Ignite will use the indexes for fast data lookup.

Vladimir.

On Wed, Apr 20, 2016 at 12:31 PM, dmreshet <dmres...@gmail.com> wrote:

> Yes, I know.
> I want to compare the performance of SQL, SQL with indexes, and a MapReduce job.
> I have found that I can use broadcast to guarantee that my MapReduce job
> will be executed on each node exactly once.
> So now my job uses this code:
>
>     Collection<List<Person>> result = ignite.compute(ignite.cluster()).broadcast((IgniteCallable<List<Person>>) () -> {...});
>
> And then I will reduce the result.
>
> Is that the best practice for implementing a MapReduce job when I need to
> process data from a cache?
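To illustrate the point about indexes in the reply above: the sketch below is plain Java, not Ignite code, and only models the idea. A TreeMap keyed by salary stands in for a cache index; the class IndexSketch, its nested Person type, and the sample data are all hypothetical. A full scan has to visit every entry, while the sorted map can jump straight to the salaries above the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class IndexSketch {
    // Hypothetical stand-in for the Person class from the thread.
    static final class Person {
        final String name;
        final int salary;
        Person(String name, int salary) { this.name = name; this.salary = salary; }
    }

    // Full scan: every entry is visited and tested against the threshold.
    static List<Person> fullScan(List<Person> all, int minSalary) {
        List<Person> out = new ArrayList<>();
        for (Person p : all) {
            if (p.salary > minSalary) {
                out.add(p);
            }
        }
        return out;
    }

    // Builds a sorted "index" from salary to the persons with that salary.
    static NavigableMap<Integer, List<Person>> buildIndex(List<Person> all) {
        NavigableMap<Integer, List<Person>> index = new TreeMap<>();
        for (Person p : all) {
            index.computeIfAbsent(p.salary, k -> new ArrayList<>()).add(p);
        }
        return index;
    }

    // Index lookup: only the buckets with salary strictly above the threshold are touched.
    static List<Person> indexLookup(NavigableMap<Integer, List<Person>> index, int minSalary) {
        List<Person> out = new ArrayList<>();
        for (List<Person> bucket : index.tailMap(minSalary, false).values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("a", 100), new Person("b", 250),
            new Person("c", 250), new Person("d", 300));
        System.out.println(fullScan(people, 250).size());                 // prints 1
        System.out.println(indexLookup(buildIndex(people), 250).size());  // prints 1
    }
}
```

Both methods return the same persons; the difference is only in how many entries have to be inspected to find them.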
Re: Map-reduce processing
Yes, I know.

I want to compare the performance of SQL, SQL with indexes, and a MapReduce job. I have found that I can use broadcast to guarantee that my MapReduce job will be executed on each node exactly once.

So now my job uses this code:

    Collection<List<Person>> result = ignite.compute(ignite.cluster()).broadcast((IgniteCallable<List<Person>>) () -> {...});

And then I will reduce the result.

Is that the best practice for implementing a MapReduce job when I need to process data from a cache?

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Map-reduce-proceesing-tp4357p4364.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
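The reduce step mentioned above could be as simple as concatenating the per-node lists that broadcast returns. A minimal sketch in plain Java, assuming each node returns a List<Person>; ReduceSketch and its nested Person class are hypothetical stand-ins, and no Ignite API is used:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class ReduceSketch {
    // Hypothetical stand-in for the Person class from the thread.
    static final class Person {
        final String name;
        final int salary;
        Person(String name, int salary) { this.name = name; this.salary = salary; }
    }

    // Reduce step: merge the per-node partial results into a single list,
    // preserving the order in which the partial results were returned.
    static List<Person> reduce(Collection<List<Person>> perNode) {
        List<Person> merged = new ArrayList<>();
        for (List<Person> part : perNode) {
            merged.addAll(part);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three simulated node results; the second node had no matching entries.
        Collection<List<Person>> perNode = Arrays.asList(
            Arrays.asList(new Person("a", 120), new Person("b", 150)),
            new ArrayList<Person>(),
            Arrays.asList(new Person("c", 200)));
        System.out.println(reduce(perNode).size()); // prints 3
    }
}
```

This plain concatenation is only correct if every map job looked at a disjoint set of entries; if two jobs read the same entries, the merged list contains duplicates.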
Re: Map-reduce processing
Hi,

There is no need to implement SQL queries using map-reduce. Ignite already has its own query engine. Please refer to the *org.apache.ignite.cache.query.SqlQuery* class and the *IgniteCache.query()* method.

Alternatively, you can use scan queries in some cases. See *org.apache.ignite.cache.query.ScanQuery*.

Vladimir.

On Wed, Apr 20, 2016 at 10:41 AM, dmreshet <dmres...@gmail.com> wrote:

> Hello!
> I want to implement an SQL query in terms of MapReduce with
> ComputeTaskSplitAdapter.
>
>     select * from Person where salary > ?
>
> And I want to know what the best practice is for doing this.
>
> At the moment I am using cache.localEntries() to get all cache values at
> the Map stage, and it looks like that is not correct, because there is no
> guarantee that each job will be executed on a different node of the
> Ignite Data Grid.
>
> Here is the split method of my ComputeTaskSplitAdapter class:
>
>     @Override
>     protected Collection<? extends ComputeJob> split(int gridSize, Integer salary) throws IgniteException {
>         List<ComputeJob> jobs = new ArrayList<>(gridSize);
>
>         for (int i = 0; i < gridSize; i++) {
>             jobs.add(new ComputeJobAdapter() {
>                 @Override
>                 public Object execute() {
>                     IgniteCache<Long, Person> cache = Ignition.ignite().cache(Executor.PERSON_CACHE);
>                     List<Person> list = new ArrayList<>();
>                     Iterable<Cache.Entry<Long, Person>> entries = cache.localEntries();
>                     entries.forEach(entry -> {
>                         if (entry.getValue().getSalary() > salary) {
>                             list.add(entry.getValue());
>                         }
>                     });
>
>                     return list;
>                 }
>             });
>         }
>
>         return jobs;
>     }
Map-reduce processing
Hello!

I want to implement an SQL query in terms of MapReduce with ComputeTaskSplitAdapter.

    select * from Person where salary > ?

And I want to know what the best practice is for doing this.

At the moment I am using cache.localEntries() to get all cache values at the Map stage, and it looks like that is not correct, because there is no guarantee that each job will be executed on a different node of the Ignite Data Grid.

Here is the split method of my ComputeTaskSplitAdapter class:

    @Override
    protected Collection<? extends ComputeJob> split(int gridSize, Integer salary) throws IgniteException {
        // One job per node in the grid; placement of the jobs is up to Ignite.
        List<ComputeJob> jobs = new ArrayList<>(gridSize);

        for (int i = 0; i < gridSize; i++) {
            jobs.add(new ComputeJobAdapter() {
                @Override
                public Object execute() {
                    IgniteCache<Long, Person> cache = Ignition.ignite().cache(Executor.PERSON_CACHE);
                    List<Person> list = new ArrayList<>();
                    // Only the entries stored on the node that runs this job.
                    Iterable<Cache.Entry<Long, Person>> entries = cache.localEntries();
                    entries.forEach(entry -> {
                        if (entry.getValue().getSalary() > salary) {
                            list.add(entry.getValue());
                        }
                    });

                    return list;
                }
            });
        }

        return jobs;
    }

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Map-reduce-proceesing-tp4357.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
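The placement concern raised above can be modelled without Ignite. In this hypothetical simulation, each inner list holds the salaries stored locally on one node, and the placement array says which node each job happened to run on. PlacementSketch, its data, and the placements are assumptions made up for illustration; the sketch only shows what goes wrong when jobs that read local entries are not spread one per node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PlacementSketch {
    // Runs one "job" per entry in placement. Each job reads only the data of the
    // node it was placed on and keeps the salaries strictly above minSalary.
    static List<Integer> runJobs(List<List<Integer>> nodes, int[] placement, int minSalary) {
        List<Integer> result = new ArrayList<>();
        for (int node : placement) {
            for (int salary : nodes.get(node)) {
                if (salary > minSalary) {
                    result.add(salary);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three simulated nodes with their local salaries.
        List<List<Integer>> nodes = Arrays.asList(
            Arrays.asList(100, 300),
            Arrays.asList(250),
            Arrays.asList(400));

        // One job per node: every entry is read exactly once.
        System.out.println(runJobs(nodes, new int[] {0, 1, 2}, 200)); // prints [300, 250, 400]

        // Two jobs land on node 0 and none on node 1: 300 is duplicated, 250 is missed.
        System.out.println(runJobs(nodes, new int[] {0, 0, 2}, 200)); // prints [300, 300, 400]
    }
}
```

The second run shows both failure modes at once: a duplicated result from the node that ran two jobs and a missing result from the node that ran none.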