Well, that is starting to look less like a Solr/Lucene problem.  Overall
data is moderately large (TBs to tens of TBs) for Lucene, and the
individual user profiles are distinctly large to be storing in Lucene.
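
As a rough back-of-envelope check on that size range, here is a small sketch.
It assumes the 5 MB per user figure quoted below, decimal units, and
illustrative user counts of 1 and 10 million (the thread only says
"millions"). Replication and index overhead are not included, and the
function name is made up for the example.

```python
MB = 10**6   # decimal megabyte, an assumption for this estimate
TB = 10**12  # decimal terabyte


def total_tb(users, bytes_per_user=5 * MB):
    """Raw data volume in TB, before any replication or index overhead."""
    return users * bytes_per_user / TB


# Illustrative user counts only; the thread just says "millions of users".
for users in (1_000_000, 10_000_000):
    print(f"{users:>12,} users -> {total_tb(users):.0f} TB")
```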

If there is a part of the profile that you might want to search, that would
be appropriate for Lucene.  If you can split the user data into several
components that are updated independently, then HBase might be appropriate,
with different components in different column families.
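
To make the column-family idea concrete, here is a toy in-memory sketch. It
is not the HBase API; the class, the family names ("search", "settings",
"activity") and the row key are all hypothetical. It only shows the shape of
the layout: one row per user, with each independently updated component in
its own family, so a frequent update touches one family and leaves the
others alone.

```python
from collections import defaultdict


class ProfileTable:
    """Toy stand-in for a table with column families (NOT the HBase API)."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> family -> qualifier -> value
        self.rows = defaultdict(lambda: {f: {} for f in self.families})

    def put(self, row_key, family, qualifier, value):
        """Write one cell; only the named family of that row is modified."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get_family(self, row_key, family):
        """Return a copy of one family's cells for a row."""
        return dict(self.rows[row_key][family])


# Hypothetical split of a user profile into independently updated parts.
table = ProfileTable(["search", "settings", "activity"])
table.put("user42", "search", "display_name", "A. Example")
table.put("user42", "activity", "last_login", "2011-12-20")
table.put("user42", "activity", "last_login", "2011-12-21")  # frequent update

print(table.get_family("user42", "search"))
print(table.get_family("user42", "activity"))
```

Whether a split like this fits depends on which parts of the profile really
do change independently, which is something a prototype would have to show.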

You aren't going to get a definitive answer on a mailing list, however.
You are going to need somebody with a bit of experience to advise you
directly and/or you are going to need to prototype some test cases.

On Tue, Dec 20, 2011 at 1:07 PM, Alireza Salimi <alireza.sal...@gmail.com> wrote:

> Well, actually we haven't started the actual project yet.
> But probably it will have to handle the data of millions of users,
> and a rough estimate for each user's data would be something around
> 5 MB.
>
> The other problem is that those data will be changed very often.
>
> I hope I answered your question.
>
> Thanks
>
> On Tue, Dec 20, 2011 at 4:00 PM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > You didn't mention how big your data is or how you create it.
> >
> > Hadoop would mostly be used in the preparation of the data or the off-line
> > creation of indexes.
> >
> > On Tue, Dec 20, 2011 at 12:28 PM, Alireza Salimi
> > <alireza.sal...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a basic question. Let's say we're going to have a very large set
> > > of data, large enough that we will certainly need many servers (tens or
> > > hundreds of servers).
> > > We will also need failover.
> > > Now the question is whether we should use Hadoop, or whether Solr
> > > Distributed Search with shards would be enough.
> > >
> > > I've read lots of articles like:
> > > http://www.lucidimagination.com/content/scaling-lucene-and-solr
> > > http://wiki.apache.org/solr/DistributedSearch
> > >
> > > But I'm still confused. Solr's distributed search seems to be able to
> > > handle splitting the queries and merging the results. So what's the
> > > point of using Hadoop?
> > >
> > > I'm pretty sure I'm missing something here. Can anyone suggest
> > > some links regarding this issue?
> > >
> > > Regards
> > >
> > > --
> > > Alireza Salimi
> > > Java EE Developer
> > >
> >
>
>
>
> --
> Alireza Salimi
> Java EE Developer
>
