Thanks Rahul. The data source is a JdbcDataSource backed by a MySQL database, and the data size is around 100GB. I am not very familiar with Spark, but are you suggesting that we create documents by joining the distinct RDBMS tables using RDDs?
On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> How much data and what is the database source? Spark is probably the
> fastest way.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, wrote:
> > Hi,
> >
> > We are using DIH with SortedMapBackedCache, but as the data size increases we
> > need to give more heap memory to the Solr JVM.
> > Could we use multiple CSV files instead of database queries, and later join the
> > data in those CSV files using zipper? So the bottom line is to create a CSV file
> > for each entity in data-config.xml and join these CSV files using zipper.
> > We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it
> > does not play well with MMapDirectoryFactory and exhausts the physical
> > memory on the machine.
> > Please suggest how we can handle the use case of importing a huge amount of
> > data into Solr.
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M: +91-77091 53669

--
Thanks,
Sujay P Bawaskar
M: +91-77091 53669
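[Editor's note] The "zipper" idea in the quoted message is essentially a sorted merge join: each DIH entity is exported to its own CSV file sorted by the join key, and the files are then streamed side by side, so only the current row group is held in memory instead of a whole child table (as with SortedMapBackedCache). A minimal sketch of that streaming join, with hypothetical file names and columns (this is an illustration of the technique, not Solr's own zipper implementation):

```python
import csv
import io

def zipper_join(parent_rows, child_rows, key="id", child_key="parent_id"):
    """Merge-join two row streams that are each sorted by their join key.

    Memory use is bounded by one parent row plus its matching children,
    unlike a full in-heap cache of the child table.
    """
    child_iter = iter(child_rows)
    child = next(child_iter, None)
    for parent in parent_rows:
        # skip child rows whose key sorts before the current parent key
        while child is not None and child[child_key] < parent[key]:
            child = next(child_iter, None)
        children = []
        # collect child rows while their key matches the parent key
        while child is not None and child[child_key] == parent[key]:
            children.append(child)
            child = next(child_iter, None)
        doc = dict(parent)
        doc["children"] = children
        yield doc

# Hypothetical exports: products.csv sorted by id, prices.csv sorted by parent_id
products = csv.DictReader(io.StringIO("id,name\n1,book\n2,pen\n"))
prices = csv.DictReader(io.StringIO("parent_id,price\n1,10\n1,12\n2,3\n"))

for doc in zipper_join(products, prices):
    print(doc["name"], len(doc["children"]))  # book 2 / pen 1
```

Both streams must be sorted on the join key (e.g. via ORDER BY when exporting the CSVs), which is the same precondition DIH's zipper join imposes.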