Thanks Rahul. The data source is a JdbcDataSource backed by a MySQL database, and the data size is around 100GB. I am not very familiar with Spark, but are you suggesting that we create documents by joining the distinct RDBMS tables using RDDs?
On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> How much data and what is the database source? Spark is probably the
> fastest way.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, wrote:
> > Hi,
> >
> > We are using DIH with SortedMapBackedCache, but as the data size increases we
> > need to give more heap memory to the Solr JVM.
> > Could we use multiple CSV files instead of database queries, and later join the
> > data in those CSV files using zipper? So the bottom line is to create a CSV file
> > for each entity in data-config.xml and join these CSV files using zipper.
> > We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it
> > does not play well with MMapDirectoryFactory and exhausts the physical
> > memory on the machine.
> > Please suggest how we can handle the use case of importing a huge amount of
> > data into Solr.
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M: +91-77091 53669

--
Thanks,
Sujay P Bawaskar
M: +91-77091 53669
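[Editor's note] The "zipper" idea in the quoted message is essentially a sorted merge join: each DIH entity is exported to its own CSV file sorted by the join key, and the files are then streamed side by side, so only the current row group is held in memory instead of a whole child table (as with SortedMapBackedCache). A minimal sketch of that streaming join, with hypothetical file names and columns (this is an illustration of the technique, not Solr's own zipper implementation):

```python
import csv
import io

def zipper_join(parent_rows, child_rows, key="id", child_key="parent_id"):
    """Merge-join two row streams that are each sorted by their join key.

    Memory use is bounded by one parent row plus its matching children,
    unlike a full in-heap cache of the child table.
    """
    child_iter = iter(child_rows)
    child = next(child_iter, None)
    for parent in parent_rows:
        # skip child rows whose key sorts before the current parent key
        while child is not None and child[child_key] < parent[key]:
            child = next(child_iter, None)
        children = []
        # collect child rows while their key matches the parent key
        while child is not None and child[child_key] == parent[key]:
            children.append(child)
            child = next(child_iter, None)
        doc = dict(parent)
        doc["children"] = children
        yield doc

# Hypothetical exports: products.csv sorted by id, prices.csv sorted by parent_id
products = csv.DictReader(io.StringIO("id,name\n1,book\n2,pen\n"))
prices = csv.DictReader(io.StringIO("parent_id,price\n1,10\n1,12\n2,3\n"))

for doc in zipper_join(products, prices):
    print(doc["name"], len(doc["children"]))  # book 2 / pen 1
```

Both streams must be sorted on the join key (e.g. via ORDER BY when exporting the CSVs), which is the same precondition DIH's zipper join imposes.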