Importing the CSV into an H2 database would require a huge amount of memory, since the file is big and contains a lot of redundant data. Several rows have to be aggregated into one object when they share the same key (e.g. accountNumber). Moreover, the rows are not sorted by accountNumber.
Could you propose another solution instead of using the H2 database? I have tried storing the CSV in IGFS and running a map-reduce task over it. I instantiate an object for each row in the job, but the reduce method of the task receives all the instantiated objects at once; they are not grouped by accountNumber. Is there a way to get the objects grouped by accountNumber in the reduce method?

> On 20.01.2016 at 07:21, Alexey Kuznetsov <[email protected]> wrote:
>
> Ferry,
>
> I would like to propose following work around:
> 1) Import your CSV into H2 database, see:
>    http://www.h2database.com/html/tutorial.html#csv
> 2) Use Apache Ignite Schema Import Utility to generate POJO classes and
>    xml/java configuration,
>    see https://apacheignite.readme.io/docs/automatic-persistence
> 3) Use CacheJdbcPojoStoreFactory / CacheJdbcPojoStore to load your data into
>    cache.
>
> Will this work for you?
>
>
> On Tue, Jan 19, 2016 at 10:02 PM, Ferry Syafei Sapei
> <[email protected]> wrote:
> I have a CSV file with the following structure:
>
> accountNumber,accountProperty1,accountProperty2,billNumber,billProperty1,billProperty2
> 100,property11,property12,100700,billProperty11,billProperty12
> 100,property11,property12,100700,billProperty21,billProperty22
>
> I would like to import the file and fill in the cache with the following
> object structure:
> class AccountInformation
>   int accountNumber
>   String accountProperty1
>   String accountProperty2
>   List<Bill> bills
>
> class Bill
>   int billNumber
>   String billProperty1
>   String billProperty2
>
> I have tried using IgniteDataStreamer and StreamVisitor. Line by line will be
> read and added to the data stream. In the data streamer, I could check if the
> account information exists or not. If it exists, I just add the new bill to
> the existing account and replace the cache content for that account.
>
> How can I achieve the same result using CacheStore?
>
>
>
> --
> Alexey Kuznetsov
> GridGain Systems
> www.gridgain.com
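
For reference, the aggregation discussed in this thread (several unsorted CSV rows collapsing into one AccountInformation per accountNumber) can be sketched in plain Java without any database. This is only a minimal sketch under some assumptions: the names AccountGrouping and groupByAccount are hypothetical, the first line is the header, fields contain no quoted commas, and the grouped accounts fit into the heap of the loading node. The resulting map could then probably be handed to the cache in one pass, for example through IgniteDataStreamer.addData, but that step is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the name is not part of Ignite or of the original code.
class AccountGrouping {

    static class Bill {
        final int billNumber;
        final String billProperty1;
        final String billProperty2;

        Bill(int billNumber, String billProperty1, String billProperty2) {
            this.billNumber = billNumber;
            this.billProperty1 = billProperty1;
            this.billProperty2 = billProperty2;
        }
    }

    static class AccountInformation {
        final int accountNumber;
        final String accountProperty1;
        final String accountProperty2;
        final List<Bill> bills = new ArrayList<>();

        AccountInformation(int accountNumber, String accountProperty1, String accountProperty2) {
            this.accountNumber = accountNumber;
            this.accountProperty1 = accountProperty1;
            this.accountProperty2 = accountProperty2;
        }
    }

    /**
     * Groups CSV lines by accountNumber. The first line is treated as the header.
     * The rows do not have to be sorted: each row is attached to the account
     * already seen for its key, or creates that account on first sight.
     */
    static Map<Integer, AccountInformation> groupByAccount(Iterable<String> lines) {
        Map<Integer, AccountInformation> accounts = new LinkedHashMap<>();
        boolean header = true;
        for (String line : lines) {
            if (header) {            // skip the column names
                header = false;
                continue;
            }
            if (line.trim().isEmpty()) {
                continue;            // ignore blank lines
            }
            // Assumption: plain comma-separated fields, no quoting or escaping.
            String[] f = line.split(",", -1);
            int accountNumber = Integer.parseInt(f[0].trim());
            AccountInformation account = accounts.computeIfAbsent(
                accountNumber, k -> new AccountInformation(k, f[1], f[2]));
            account.bills.add(new Bill(Integer.parseInt(f[3].trim()), f[4], f[5]));
        }
        return accounts;
    }

    public static void main(String[] args) {
        // The two rows for account 100 are taken from the example above.
        List<String> lines = Arrays.asList(
            "accountNumber,accountProperty1,accountProperty2,billNumber,billProperty1,billProperty2",
            "100,property11,property12,100700,billProperty11,billProperty12",
            "100,property11,property12,100700,billProperty21,billProperty22");

        for (AccountInformation a : groupByAccount(lines).values()) {
            System.out.println(a.accountNumber + " -> " + a.bills.size() + " bills");
        }
    }
}
```

If the grouped accounts do not fit into memory on one node, this approach does not help, and updating the existing cache entry per row (as in the data streamer attempt quoted above) remains the alternative.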
