Re: managing active/passive cores in Solr and Haystack

2017-03-28 Thread serwah sabetghadam
Dear all,

Do you know of any good references or best practices for using Solr with
time-series data, time-based indexes, or retiring old data?
From what I have found so far, it seems we have to set up the configuration
ourselves through distributed search.
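
To make it concrete, the kind of setup I am imagining is roughly the sketch
below (Python, using the requests library). The core names, the localhost
URL, the "title" field and the document ids are placeholders, and I have not
verified this end to end:

import requests

SOLR = "http://127.0.0.1:8983/solr"

# Distributed read across the active core and the newest archive core.
resp = requests.get(SOLR + "/core_Feb/select", params={
    "q": "*:*",
    "wt": "json",
    "shards": "127.0.0.1:8983/solr/core_Feb,127.0.0.1:8983/solr/core_Jan",
})
print(resp.json()["response"]["numFound"])

# Writes go only to the active core ("title" is a placeholder field).
requests.post(SOLR + "/core_Feb/update", params={"commit": "true"},
              json=[{"id": "doc-1", "title": "example"}])

# Retire a core once it drops out of the search window.
requests.get(SOLR + "/admin/cores", params={
    "action": "UNLOAD",
    "core": "core_Dec",
    "deleteIndex": "false",   # keep the index files on disk
})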

Any help is highly appreciated,
Best,
Serwah


On Fri, Mar 17, 2017 at 3:30 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/15/2017 7:55 AM, serwah sabetghadam wrote:
> > Thanks Erick for the fast answer:)
> >
> > I knew about sharding, but as far as I know it works across different
> > servers.
> > I wonder if it is possible to do something like the sharding you
> > mentioned, but on a single standalone Solr instance?
> > Can I use implicit routing on standalone then?
>
> If you're running standalone (not SolrCloud), then everything having to
> do with shards must be 100 percent managed by you.  There is no
> routing.  There is no capability of automatically managing which
> implicit shards belong to which logical index.  There's no automatic
> replication of index data for redundancy.  You're in charge of
> *everything* that SolrCloud would normally handle automatically.
>
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
>
> Multiple shards can live in a single Solr instance, whether you use
> SolrCloud or the old way described above.  If your query rate is very
> low, this probably will perform well.  As the query rate increases, it's
> best to only have one core per Solr instance.  Either way, it's
> *usually* best to only have one Solr instance per machine.
>
> Thanks,
> Shawn
>
>
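
If I understand the standalone approach correctly, the per-month cores on a
single instance could also be created over HTTP. A rough, untested sketch;
the core name and the "events_conf" configset are placeholders, and the
configset would have to exist under the server's configsets directory:

import requests

SOLR = "http://127.0.0.1:8983/solr"

# Create the core for the new month on the same standalone instance.
# "events_conf" must already exist under SOLR_HOME/configsets.
requests.get(SOLR + "/admin/cores", params={
    "action": "CREATE",
    "name": "core_Mar",
    "configSet": "events_conf",
})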


-- 
Serwah Sabetghadam
Vienna University of Technology
Office phone: +43 1 58801 188633


Re: managing active/passive cores in Solr and Haystack

2017-03-15 Thread serwah sabetghadam
Thanks Erick for the fast answer:)

I knew about sharding, but as far as I know it works across different
servers.
I wonder if it is possible to do something like the sharding you mentioned,
but on a single standalone Solr instance?
Can I use implicit routing on standalone then?

I would also appreciate it if anyone with experience using Haystack in
front of Solr could share it here.

Best,
Serwah

On Tue, Mar 14, 2017 at 4:15 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I don't know much about HAYSTACK, but for the Solr URL you probably
> want the "shards" parameter for searching, see:
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
>
> And just use the specific core you care about for update requests.
>
> But I would suggest that you can have Solr do much of this work,
> specifically SolrCloud with "implicit" routing. Combine that with
> "collection aliasing" and I think you have what you need with a single
> Solr URL. "implicit" routing allows you to send docs to a particular
> shard based on the value of a particular field. You can add/remove
> shards at will (only with the "implicit" router, not with the default
> compositeId router). Etc.
>
> I've skimmed over lots of details here; I just didn't want you to be
> unaware that a solution exists (see "time series data" in the
> literature).
>
> Best,
> Erick
>
> On Tue, Mar 14, 2017 at 8:06 AM, serwah sabetghadam
> <sabetgha...@ifs.tuwien.ac.at> wrote:
> > Hi all,
> >
> > I am totally new to this group and of course very happy to join :)
> > My question may have been asked before, but I did not find a way to
> > search the previous questions.
> >
> >
> > Problem in one sentence:
> > read from multiple cores (the archive ones and the active one), but
> > write only to the latest, active core, using Solr and Haystack.
> >
> >
> > I am designing a periodic indexing system with one core per month, where
> > the last two indexes are always searched and only the latest one is
> > active for current indexing.
> >
> >
> > We are using Haystack to manage the communication with Solr.
> > Defining multiple cores in Haystack's settings.py works fine.
> > The problem is that in this case, as I have tested, both cores get
> > updated when new documents are indexed.
> >
> > Then I decided to use Haystack's "--using" parameter to select which
> > backend is used for updating the index, something like:
> >
> > ./manage.py update_index events.Event --age=24 --workers=4 --using=default
> >
> > where the "default" entry in the settings.py file points to the active
> > core:
> > HAYSTACK_CONNECTIONS = {
> >     'default': {
> >         'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
> >         'URL': 'http://127.0.0.1:8983/solr/core_Feb',
> >     },
> >     'slave': {
> >         'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
> >         'URL': 'http://127.0.0.1:8983/solr/core_Jan',
> >     },
> > }
> >
> > Here core_Feb is the active core (or is going to be the active one).
> >
> > But now I am not sure it will read from both cores this way. I can
> > manage the write part, but reading from multiple cores is again a
> > problem. What I tested before for reading from multiple cores was:
> >
> > HAYSTACK_CONNECTIONS = {
> >     'default': {
> >         'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
> >         'URL': 'http://127.0.0.1:8983/solr/core_Feb',
> >         'URL': 'http://127.0.0.1:8983/solr/core_Jan',
> >     },
> > }
> >
> >
> > but in this case it writes to both, while I want to write only to
> > core_Feb.
> >
> > Any help is highly appreciated,
> > Best,
> > Serwah
>
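
For later readers of this thread, my understanding of the SolrCloud route
Erick describes is roughly the sketch below. It assumes a SolrCloud node at
127.0.0.1:8983 and a configset named "events_conf" already uploaded to
ZooKeeper; the collection, shard and field names are placeholders, and this
is untested:

import requests

SOLR = "http://127.0.0.1:8983/solr"

# 1) One collection, one implicit shard per month; documents are routed to
#    the shard named in the "month_key" field.
requests.get(SOLR + "/admin/collections", params={
    "action": "CREATE",
    "name": "events",
    "router.name": "implicit",
    "router.field": "month_key",
    "shards": "2017_01,2017_02",
    "maxShardsPerNode": "12",
    "collection.configName": "events_conf",
})

# 2) Index into the February shard simply by setting the routing field.
requests.post(SOLR + "/events/update", params={"commit": "true"},
              json=[{"id": "evt-1", "month_key": "2017_02"}])

# 3) Search only the two most recent months by naming their shards.
requests.get(SOLR + "/events/select", params={
    "q": "*:*",
    "shards": "2017_01,2017_02",
})

# 4) Add next month's shard up front, retire the oldest one later.
requests.get(SOLR + "/admin/collections", params={
    "action": "CREATESHARD", "collection": "events", "shard": "2017_03"})
requests.get(SOLR + "/admin/collections", params={
    "action": "DELETESHARD", "collection": "events", "shard": "2017_01"})

As far as I understand, a collection alias (CREATEALIAS) can additionally
give the collection a stable name that Haystack's URL setting could point at.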



-- 
Serwah Sabetghadam
Vienna University of Technology
Office phone: +43 1 58801 188633


managing active/passive cores in Solr and Haystack

2017-03-14 Thread serwah sabetghadam
Hi all,

I am totally new to this group and of course very happy to join :)
My question may have been asked before, but I did not find a way to search
the previous questions.


Problem in one sentence:
read from multiple cores (the archive ones and the active one), but write
only to the latest, active core, using Solr and Haystack.


I am designing a periodic indexing system with one core per month, where
the last two indexes are always searched and only the latest one is active
for current indexing.


We are using Haystack to manage the communication with Solr.
Defining multiple cores in Haystack's settings.py works fine.
The problem is that in this case, as I have tested, both cores get updated
when new documents are indexed.

Then I decided to use Haystack's "--using" parameter to select which
backend is used for updating the index, something like:

./manage.py update_index events.Event --age=24 --workers=4 --using=default

where the "default" entry in the settings.py file points to the active
core:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/core_Feb',
    },
    'slave': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/core_Jan',
    },
}

Here core_Feb is the active core (or is going to be the active one).

But now I am not sure it will read from both cores this way. I can manage
the write part, but reading from multiple cores is again a problem. What I
tested before for reading from multiple cores was:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/core_Feb',
        'URL': 'http://127.0.0.1:8983/solr/core_Jan',
    },
}


but in this case it writes to both, while I want to write only to core_Feb.
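
One direction I am considering for the write side is Haystack's router
support. A rough sketch only; the module path and the 'active' connection
alias are placeholders, and as far as I can tell a router picks a single
connection per operation, so it would not merge results from two cores by
itself:

# search_routers.py (module path is a placeholder)
from haystack import routers

class ActiveCoreRouter(routers.BaseRouter):
    def for_write(self, **hints):
        # Send every index write (update_index, signal processors, ...)
        # to the connection alias pointing at the active monthly core.
        return 'active'

    def for_read(self, **hints):
        # Fall through to the next router for reads.
        return None

# settings.py would then list it ahead of the default router:
# HAYSTACK_ROUTERS = ['myproject.search_routers.ActiveCoreRouter',
#                     'haystack.routers.DefaultRouter']

As far as I understand, a SearchQuerySet(using='...') still reads from
exactly one connection, so getting results from two cores in a single query
would still need something on the Solr side (for example the shards
parameter mentioned elsewhere in this thread).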

Any help is highly appreciated,
Best,
Serwah