Raymond,

Running parallel indexing can be trickier than it looks once the scale is
big. For instance, you can easily partition your data (say, into 5 chunks)
and run 5 processes to index them. However, you will need to watch for
chokepoints in the pipeline along the way (e.g. database I/O, or even
commits at the core). If you think your infrastructure can handle the load,
you can try the approach I just described; a rough sketch follows, and a
sketch of Rahul's queue idea is after the quoted thread below.
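
Here is a minimal sketch of the chunked approach, assuming Python with the
requests library, a Solr core named "files" on localhost, and a toy schema
with "id" and "text_txt" fields (the core name, data path, and field names
are all placeholders, adjust them to your setup):

import os
import requests
from multiprocessing import Pool

# Placeholder endpoint: a core named "files" on a local Solr.
SOLR_UPDATE_URL = "http://localhost:8983/solr/files/update"
NUM_WORKERS = 5

def index_chunk(paths):
    """Index one chunk of file paths inside a single worker process."""
    docs = []
    for path in paths:
        with open(path, "r", errors="ignore") as f:
            # Placeholder schema: adjust the field names to yours.
            docs.append({"id": path, "text_txt": f.read()})
    # Send the whole chunk in one request and defer the commit, so the
    # workers do not serialize on commit overhead at the core.
    requests.post(SOLR_UPDATE_URL, json=docs,
                  params={"commit": "false"}).raise_for_status()

def main():
    paths = [os.path.join(root, name)
             for root, _, names in os.walk("/data/to/index")
             for name in names]
    # Partition the enumerated paths into NUM_WORKERS round-robin chunks.
    chunks = [paths[i::NUM_WORKERS] for i in range(NUM_WORKERS)]
    with Pool(NUM_WORKERS) as pool:
        pool.map(index_chunk, chunks)
    # One commit at the end instead of one per worker.
    requests.post(SOLR_UPDATE_URL, json=[],
                  params={"commit": "true"}).raise_for_status()

if __name__ == "__main__":
    main()

Deferring the commit until every worker has finished is deliberate:
per-chunk commits are exactly the kind of choke at the core I mentioned.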

On Thu, May 24, 2018 at 9:36 AM, Raymond Xie <xie3208...@gmail.com> wrote:

> Thank you, Rahul, although that is very high level.
>
> No offense meant, but do you have a successful implementation, or is this
> just an unproven idea? I have never used RabbitMQ or Kafka before, but I
> would be very interested in more detail on the Kafka idea, as Kafka is
> available in my environment.
>
> Thank you again, and I look forward to hearing more from you or anyone in
> this Solr community.
>
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Wed, May 23, 2018 at 8:15 AM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
>
> > Enumerate the file locations (map), put them in a queue like RabbitMQ or
> > Kafka (persist the map), and have a bunch of threads, workers, containers,
> > whatever, pop items off the queue and process them (reduce).
> >
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208...@gmail.com> wrote:
> >
> > I know how to index a file system path like a single file or folder, but
> > how do I do that in parallel? The data I need to index is huge in volume
> > and can't be put on HDFS.
> >
> > Thank you
> >
> > *------------------------------------------------*
> > *Sincerely yours,*
> >
> >
> > *Raymond*
> >
> >
>
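
P.S. A minimal sketch of the queue approach Rahul describes above, assuming
the kafka-python client, a broker on localhost:9092, a topic named
"files-to-index", and the same placeholder Solr core and schema as in my
sketch at the top:

import requests
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

SOLR_UPDATE_URL = "http://localhost:8983/solr/files/update"  # placeholder
BROKER = "localhost:9092"
TOPIC = "files-to-index"  # placeholder topic

def enqueue_paths(paths):
    """Map step: persist the enumerated file locations in Kafka."""
    producer = KafkaProducer(bootstrap_servers=BROKER)
    for path in paths:
        producer.send(TOPIC, path.encode("utf-8"))
    producer.flush()

def worker():
    """Run one copy per thread, process, or container; each pops paths
    off the queue and indexes them (Rahul's "reduce" step)."""
    consumer = KafkaConsumer(TOPIC,
                             bootstrap_servers=BROKER,
                             group_id="solr-indexers",
                             auto_offset_reset="earliest")
    for msg in consumer:
        path = msg.value.decode("utf-8")
        with open(path, "r", errors="ignore") as f:
            doc = {"id": path, "text_txt": f.read()}  # placeholder schema
        requests.post(SOLR_UPDATE_URL, json=[doc],
                      params={"commit": "false"}).raise_for_status()

Scaling out is then just starting more copies of worker(); Kafka rebalances
the topic's partitions across the consumer group for you.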



-- 

Best regards,
Adhyan Arizki
