master master, repeaters

2010-12-19 Thread Tri Nguyen
Hi,

In the master-slave configuration, I'm trying to figure out how to configure
the system for master failover.

Does solr support master-master setup?  From my readings, solr does not.

I've read about repeaters as well, where the slave can act as a master. When the
main master goes down, do the other slaves switch to the repeater?

Barring better solutions, I'm thinking about putting 2 masters behind a load
balancer.

If this is not implemented already, perhaps solr can be updated to support a 
list of masters for fault tolerance.

Tri

shard versus core

2010-12-19 Thread Tri Nguyen
Hi,

Was wondering about the pros and cons of using sharding versus cores.

An index can be split across multiple cores or multiple shards.

So why one over the other?

Thanks,


tri

Re: shard versus core

2010-12-19 Thread Erick Erickson
Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
assumed to be running on different machines as a scaling technique,
whereas multiple cores run on a single Solr instance.

So using multiple cores is very similar to running multiple virtual Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance on a single machine, so you'll split it into parts. That
assumption really implies that it makes no sense to put multiple shards
on the #same# machine.

So really, the answer to your question is that you choose the right technique
for the problem you're trying to solve. They aren't really different solutions
to the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen tringuye...@yahoo.com wrote:

 Hi,

 Was wondering about the pros and cons of using sharding versus cores.

 An index can be split across multiple cores or multiple shards.

 So why one over the other?

 Thanks,


 tri


Re: shard versus core

2010-12-19 Thread Shawn Heisey

On 12/19/2010 2:07 AM, Tri Nguyen wrote:

Was wondering about the pros and cons of using sharding versus cores.

An index can be split across multiple cores or multiple shards.

So why one over the other?


If you split your index into multiple cores, you still have to use the 
shards parameter to tell Solr where to find the parts.  You can use 
multiple servers, multiple cores, or even both.  Which method to use 
depends on why you've decided to split your index into multiple pieces.
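
As a rough illustration (the hostname and core names are placeholders, not
from this thread), a distributed query across two cores on one Solr instance
simply lists both cores in the shards parameter:

http://localhost:8983/solr/core0/select?q=solr&shards=localhost:8983/solr/core0,localhost:8983/solr/core1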


If the primary motivating factor is index size, you'll probably want to 
use separate servers.  Unless the only reason for distributed search is 
making the build process easier (or possible), I personally would not have 
multiple live cores on a single machine.  An example where multiple 
cores per server is entirely appropriate (creating a new core every five 
minutes):


http://www.loggly.com/2010/08/our-solr-system/

I went to this guy's talk at Lucene Revolution.  Amazing stuff.

Shawn



Re: DIH for sharded database?

2010-12-19 Thread Dennis Gearon
The easiest way, and probably the way the database itself needs to use those
shards, is to use a view with a query; I think it joins on the primary key.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Andy angelf...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Sat, December 18, 2010 6:20:54 PM
Subject: DIH for sharded database?

I have a table that is broken up into many virtual shards. So basically I have
N identical tables:

Document1
Document2
.
.
Document36

Currently these tables all live in the same database, but in the future they
may be moved to different servers to scale out if the needs arise.

Is there any way to configure a DIH for these tables so that it will 
automatically loop through the 36 identical tables and pull data out for 
indexing?

Something like (pseudo code):

for (i = 1; i <= 36; i++) {
   ## retrieve data from the table Document{$i} & index the data
}

What's the best way to handle a situation like this?

Thanks


Re: DIH for sharded database?

2010-12-19 Thread Dennis Gearon
Some talk on giant databases in postgres:
  
http://wiki.postgresql.org/images/3/38/PGDay2009-EN-Datawarehousing_with_PostgreSQL.pdf

wikipedia
  http://en.wikipedia.org/wiki/Partition_%28database%29
  (says to use a UNION)
postgres description on how to do it:
  http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Andy angelf...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Sat, December 18, 2010 6:20:54 PM
Subject: DIH for sharded database?

I have a table that is broken up into many virtual shards. So basically I have 
N 
identical tables:

Document1
Document2
.
.
Document36

Currently these tables all live in the same database, but in the future they 
may 
be moved to different servers to scale out if the needs arise.

Is there any way to configure a DIH for these tables so that it will 
automatically loop through the 36 identical tables and pull data out for 
indexing?

Something like (pseudo code):

for (i = 1; i <= 36; i++) {
   ## retrieve data from the table Document{$i} & index the data
}

What's the best way to handle a situation like this?

Thanks


Re: master master, repeaters

2010-12-19 Thread Upayavira
We had a (short) thread on this late last week. 

Solr doesn't support automatic failover of the master, at least in
1.4.1. I've been discussing with my colleague (Tommaso) about ways to
achieve this.

There's ways we could 'fake it', scripting the following:

 * set up a 'backup' master, as a replica of the actual master
 * monitor the master for 'up-ness'
 * if it fails:
   * tell the master to start indexing to the backup instead
   * tell the slave(s) to connect to a different master (the backup)
 * then, when the master is back:
   * wipe its index (backing up dir first?)
   * configure it to be a backup of the new master
   * make it pull a fresh index over

But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
how that might work in that thread.

Upayavira
 

On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com
wrote:
 Hi,
 
 In the master-slave configuration, I'm trying to figure out how to
 configure the 
 system setup for master failover.
 
 Does solr support master-master setup?  From my readings, solr does not.
 
 I've read about repeaters as well where the slave can act as a master. 
 When the 
 main master goes down, do the other slaves switch to the repeater?
 
 Barring better solutions, I'm thinking about putting 2 masters behind  a
 load 
 balancer.
 
 If this is not implemented already, perhaps solr can be updated to
 support a 
 list of masters for fault tolerance.
 
 Tri


Re: master master, repeaters

2010-12-19 Thread Tri Nguyen
How do we tell the slaves to point to the new master without modifying the 
config files?  Can we do this while the slave is up, issuing a command to it?
 
Thanks,
 
Tri

--- On Sun, 12/19/10, Upayavira u...@odoko.co.uk wrote:


From: Upayavira u...@odoko.co.uk
Subject: Re: master master, repeaters
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 10:13 AM


We had a (short) thread on this late last week. 

Solr doesn't support automatic failover of the master, at least in
1.4.1. I've been discussing with my colleague (Tommaso) about ways to
achieve this.

There's ways we could 'fake it', scripting the following:

* set up a 'backup' master, as a replica of the actual master
* monitor the master for 'up-ness'
* if it fails:
   * tell the master to start indexing to the backup instead
   * tell the slave(s) to connect to a different master (the backup)
* then, when the master is back:
   * wipe its index (backing up dir first?)
   * configure it to be a backup of the new master
   * make it pull a fresh index over

But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
how that might work in that thread.

Upayavira


On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com
wrote:
 Hi,
 
 In the master-slave configuration, I'm trying to figure out how to
 configure the 
 system setup for master failover.
 
 Does solr support master-master setup?  From my readings, solr does not.
 
 I've read about repeaters as well where the slave can act as a master. 
 When the 
 main master goes down, do the other slaves switch to the repeater?
 
 Barring better solutions, I'm thinking about putting 2 masters behind  a
 load 
 balancer.
 
 If this is not implemented already, perhaps solr can be updated to
 support a 
 list of masters for fault tolerance.
 
 Tri


Re: Transparent redundancy in Solr

2010-12-19 Thread Upayavira
Jan,

I'd appreciate a little more explanation here. I've explored SolrCloud
somewhat, but there are some bits of this architecture I don't yet get.

You say "next time an indexer slave pings ZK". What is an "indexer
slave"? Is that the external entity that is posting indexing content? If
that is the app that posts to Solr, do you imply it must check with ZK
before it can do an HTTP post to Solr? Also, once you do this leader
election to switch to an alternative master, are you implying that this
new master was once a slave of the original master, and thus has a valid
index?

I find this interesting, but I'm still not quite sure how it works exactly.

Upayavira

On Fri, 17 Dec 2010 10:09 +0100, Jan Høydahl / Cominvent
jan@cominvent.com wrote:
 Hi,
 
 I believe the way to go is through ZooKeeper[1], not property files or
 local hacks. We've already started on this route and it makes sense to
 let ZK do what it is designed for, such as leader election. When a node
 starts up, it asks ZK what role it should have and fetches corresponding
 configuration. Then it polls ZK regularly to know if the world has
 changed. So if a master indexer goes down, ZK will register that as a
 state change condition, and next time one of the indexer slaves pings ZK,
 it may be elected as new master, and config in ZK is changed
 correspondingly, causing all adds to flow to the new master...
 
 Then, when the slaves cannot contact their old master, they ask ZK for an
 update, and retrieve a new value for master URL.
 
 Note also that SolrCloud is implementing load-balancing and sharding as
 part of the architecture, so often we can skip dedicated LBs.
 
 [1] : http://wiki.apache.org/solr/SolrCloud
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 On 15. des. 2010, at 18.50, Tommaso Teofili wrote:
 
  Hi all,
  me, Upayavira and other guys at Sourcesense have collected some Solr
  architectural views inside the presentation at [1].
  For sure one can set up an architecture for failover and resiliency on the
  search side (search slaves with coordinators and distributed search), but
  I'd like to ask how you would achieve transparent redundancy in Solr on the
  indexing side.
  On slide 13 we put 2 slave backup masters, and so if one of the main masters
  goes down you can switch the slaves' replication to the backup master.
  First question is: how could it be made automatic?
  In a previous thread [2] I talked about a possible solution: writing the
  master url of the slaves in a properties file, so when you have to switch you
  change that url to the backup master and reload the slave's core, but that is
  not automatic :-) Any more advanced ideas?
  Second question: when the main master comes back up, how can it automatically
  be considered the backup master (since hopefully the backup master has
  received some indexing requests in the meantime)? Also consider that its
  index should be wiped out and replicated from the new master to ensure index
  integrity.
  Looking forward to your feedback,
  Cheers,
  Tommaso
  
  [1] : http://www.slideshare.net/sourcesense/sharded-solr-setup-with-master
  [2] : http://markmail.org/thread/vjj5jovbg6evpmpp
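
The properties-file approach Tommaso mentions above could look roughly like
this (a sketch only; the property name, URLs and poll interval are
placeholders): the slave's replication handler reads its master URL from a
variable, solrcore.properties supplies the value, and a failover script only
has to rewrite that file and reload the core.

  <!-- solrconfig.xml on the slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">${solr.master.url}</str>
      <str name="pollInterval">00:00:20</str>
    </lst>
  </requestHandler>

  # solrcore.properties, rewritten on failover before the core reload
  solr.master.url=http://backup-master:8983/solr/replication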
 


Re: master master, repeaters

2010-12-19 Thread Upayavira


On Sun, 19 Dec 2010 10:20 -0800, Tri Nguyen tringuye...@yahoo.com
wrote:
 How do we tell the slaves to point to the new master without modifying
 the config files?  Can we do this while the slave is up, issuing a
 command to it?

I believe this can be done (details are in
http://wiki.apache.org/solr/SolrReplication), but I've not actually done
it.
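
(For what it's worth, that wiki page documents a one-off fetch where the
slave is told on the request itself which master to pull from; the hostnames
below are placeholders:

http://slave_host:8983/solr/replication?command=fetchindex&masterUrl=http://backup-master:8983/solr/replication

That forces a single replication pass from the given master without touching
the slave's config files; the configured masterUrl stays whatever it was.)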

Upayavira  

 --- On Sun, 12/19/10, Upayavira u...@odoko.co.uk wrote:
 
 
 From: Upayavira u...@odoko.co.uk
 Subject: Re: master master, repeaters
 To: solr-user@lucene.apache.org
 Date: Sunday, December 19, 2010, 10:13 AM
 
 
 We had a (short) thread on this late last week. 
 
 Solr doesn't support automatic failover of the master, at least in
 1.4.1. I've been discussing with my colleague (Tommaso) about ways to
 achieve this.
 
 There's ways we could 'fake it', scripting the following:
 
 * set up a 'backup' master, as a replica of the actual master
 * monitor the master for 'up-ness'
 * if it fails:
    * tell the master to start indexing to the backup instead
    * tell the slave(s) to connect to a different master (the backup)
 * then, when the master is back:
    * wipe its index (backing up dir first?)
    * configure it to be a backup of the new master
    * make it pull a fresh index over
 
 But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
 how that might work in that thread.
 
 Upayavira
 
 
 On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com
 wrote:
  Hi,
  
  In the master-slave configuration, I'm trying to figure out how to
  configure the 
  system setup for master failover.
  
  Does solr support master-master setup?  From my readings, solr does not.
  
  I've read about repeaters as well where the slave can act as a master. 
  When the 
  main master goes down, do the other slaves switch to the repeater?
  
  Barring better solutions, I'm thinking about putting 2 masters behind  a
  load 
  balancer.
  
  If this is not implemented already, perhaps solr can be updated to
  support a 
  list of masters for fault tolerance.
  
  Tri
 


Re: Custom scoring for searhing geographic objects

2010-12-19 Thread Alexey Serba
Hi Pavel,

I had a similar problem several years ago - I had to find
geographical locations in textual descriptions, geocode these objects
to lat/long during the indexing process, and allow users to filter/sort
search results to specific geographical areas. The important issue was
that there were several types of geographical objects - street < town
< region < country. The idea was to geocode to the narrowest
geographical area possible. The relevance logic in this case could be
specified as "find the narrowest result that is uniquely identified by
your text or search query". So I came up with a custom algorithm that
was quite good in terms of performance and precision/recall. Here's
a simple description:
* You can intersect all text/search-query terms with the locations
dictionary to keep only the geo terms.
* Search your locations Lucene index and filter to street objects only
(the narrowest areas). Due to the tf*idf formula you'll get the most
relevant results. Then you need to post-process the top N (3/5/10) results and
verify that they are indeed matches. I intersected the search terms with
each result's terms and ran another Lucene search to verify whether those
terms uniquely identify the match. If they do, return the matching street.
If there's no match, proceed with the same algorithm using towns,
regions, and countries.
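
As a very rough sketch of that narrowest-first pass (the geotype field and
the example terms are assumptions, not from the original system), each step
is just a filtered query:

http://localhost:8983/solr/select?q=Moscow+Altayskaya&fq=geotype:street&rows=5

If none of the top results survives the verification step, the same query is
rerun with fq=geotype:town, then region, then country.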

HTH,
Alexey

On Wed, Dec 15, 2010 at 6:28 PM, Pavel Minchenkov char...@gmail.com wrote:
 Hi,
 Please give me advice on how to create custom scoring. I need results where
 documents are ordered depending on how popular each term in the document is
 (popular = how many times it appears in the index) and on the length of the
 document (fewer terms - higher in search results).

 For example, index contains following data:

 ID    | SEARCH_FIELD
 --
 1     | Russia
 2     | Russia, Moscow
 3     | Russia, Volgograd
 4     | Russia, Ivanovo
 5     | Russia, Ivanovo, Altayskaya street 45
 6     | Russia, Moscow, Kremlin
 7     | Russia, Moscow, Altayskaya street
 8     | Russia, Moscow, Altayskaya street 15
 9     | Russia, Moscow, Altayskaya street 15/26


 And I should get next results:


 Query                     | Document result set
 --
 Russia                    | 1,2,4,3,6,7,8,9,5
 Moscow                  | 2,6,7,8,9
 Ivanovo                    | 4,5
 Altayskaya              | 7,8,9,5

 In fact --- it is a search for geographic objects (cities, streets, houses).
 At the same time, only part of the address may be given, and the results
 should show the most relevant matches.

 Thanks.
 --
 Pavel Minchenkov



Performance Monitoring Solution

2010-12-19 Thread Cameron Hurst
I am at the point in my setup where I am happy with how things are being
indexed and my interface is all good to go, but what I don't know how to
judge is how often it will be queried and how many resources it needs to
function properly. So what I am looking for is some sort of performance
monitoring solution. I know that if I go to the statistics page I can find
the number of queries and the average response time. What I want is a
bit more detailed result, showing how it varies over time - a plot of RAM
usage, and possibly of disk IO due to Solr, over time as well.

It is because the program is new to me, and because I am unsure about my users
and how much they will use my search interface, that I need detailed results
of its use. One solution I have found is New Relic RPM, which apparently has
support for Solr, but you need to use one of their paid packages, which I
would like to avoid. The other option I found is LiquidGaze; it says it
is an open source solution for monitoring the handlers and can do a lot
of what I need, but has anyone ever used it before and can give it a
rating, good or bad? Is there another solution for this that I have
missed that would be better than the two that I listed?


Re: Dataimport performance

2010-12-19 Thread Alexey Serba
 With subquery and with left join:   320k in 6 Min 30
That's 820 records per second. It's _really_ impressive considering the
fact that DIH performs a separate SQL query for every record in your
case.

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import.
Sub-entities do slow down data import indeed. You can try to avoid a
separate query for every row by using CachedSqlEntityProcessor. There
are a couple of options - 1) you can load all sub-entity data in memory,
or 2) you can reduce the number of SQL queries by caching sub-entity
data per id. There's no silver bullet and each option has its own pros
and cons.
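
As a sketch of option 1) against the track/artist schema quoted further down
(attribute values are illustrative only, not tested here), the artist query
runs once, its rows are cached in memory keyed by track_id, and each track
looks up its artists in that cache instead of issuing a new query:

  <entity name="artists" processor="CachedSqlEntityProcessor"
          query="select ta.track_id as track_id, a.name as artist
                 from artist a join track_artist ta on (ta.artist_id = a.id)"
          where="track_id=track.id">
    <field column="artist" name="artists_t" />
  </entity>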

Also Ephraim proposed a really neat solution with GROUP_CONCAT, but
I'm not sure that all RDBMS-es support that.


2010/12/15 Robert Gründler rob...@dubture.com:
 i've benchmarked the import already with 500k records, one time without the 
 artists subquery, and one time without the join in the main query:


 Without subquery: 500k in 3 min 30 sec

 Without join and without subquery: 500k in 2 min 30.

 With subquery and with left join:   320k in 6 Min 30


 so the joins / subqueries are definitely a bottleneck.

 How exactly did you implement the custom data import?

 In our case, we need to de-normalize the relations of the sql data for the 
 index,
 so i fear i can't really get rid of the join / subquery.


 -robert





 On Dec 15, 2010, at 15:43 , Tim Heckman wrote:

 2010/12/15 Robert Gründler rob...@dubture.com:
 The data-config.xml looks like this (only 1 entity):

      <entity name="track" transformer="TemplateTransformer"
              query="select t.id as id, t.title as title, l.title as label
                     from track t left join label l on (l.id = t.label_id)
                     where t.deleted = 0">
        <field column="title" name="title_t" />
        <field column="label" name="label_t" />
        <field column="id" name="sf_meta_id" />
        <field column="metaclass" template="Track" name="sf_meta_class"/>
        <field column="metaid" template="${track.id}" name="sf_meta_id"/>
        <field column="uniqueid" template="Track_${track.id}" name="sf_unique_id"/>

        <entity name="artists" query="select a.name as artist from artist a
                left join track_artist ta on (ta.artist_id = a.id)
                where ta.track_id=${track.id}">
          <field column="artist" name="artists_t" />
        </entity>

      </entity>

 So there's one track entity with an artist sub-entity. My (admittedly
 rather limited) experience has been that sub-entities, where you have
 to run a separate query for every row in the parent entity, really
 slow down data import. For my own purposes, I wrote a custom data
 import using SolrJ to improve the performance (from 3 hours to 10
 minutes).

 Just as a test, how long does it take if you comment out the artists entity?




Re: Dataimport performance

2010-12-19 Thread Lukas Kahwe Smith

On 19.12.2010, at 23:30, Alexey Serba wrote:

 
 Also Ephraim proposed a really neat solution with GROUP_CONCAT, but
 I'm not sure that all RDBMS-es support that.


That's MySQL-only syntax.
But if you Google you can find similar solutions for other RDBMSes.
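
As a rough sketch of that approach against the track/artist schema from this
thread (GROUP_CONCAT is the MySQL spelling; the separator and splitBy values
are just illustrative), the sub-entity disappears and one grouped query feeds
a multivalued field via RegexTransformer:

  <entity name="track" transformer="RegexTransformer"
          query="select t.id as id, t.title as title,
                 group_concat(a.name separator '|') as artists
                 from track t
                 left join track_artist ta on (ta.track_id = t.id)
                 left join artist a on (a.id = ta.artist_id)
                 where t.deleted = 0
                 group by t.id">
    <field column="artists" name="artists_t" splitBy="\|" />
  </entity>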

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





DIH for taxonomy faceting in Lucid webcast

2010-12-19 Thread Andy
Hi,

I watched the Lucid webcast:
http://www.lucidimagination.com/solutions/webcasts/faceting

It talks about encoding hierarchical categories to facilitate faceting. So a
category path of NonFic > Science would be encoded as the multivalues
0/NonFic & 1/NonFic/Science.

1) My categories are stored in the database as coded numbers instead of fully
spelled-out names. For example, I would have a category of 2/7 and a lookup
dictionary to convert 2/7 into NonFic/Science. How do I do such a lookup in
DIH?

2) Once I have the fully spelled-out category path such as NonFic/Science,
how do I turn that into 0/NonFic & 1/NonFic/Science using the DIH?

3) Some of my categories are multi-word and contain whitespace, such as
Computer Science and Functional Programming, so I'd have facet values such
as 2/NonFic/Computer Science/Functional Programming.  How do I handle
whitespace in this case? Would filtering by fq still work?

Thanks


  


Re: DIH for sharded database?

2010-12-19 Thread Andy
This is helpful. Thank you.

--- On Sun, 12/19/10, Dennis Gearon gear...@sbcglobal.net wrote:

 From: Dennis Gearon gear...@sbcglobal.net
 Subject: Re: DIH for sharded database?
 To: solr-user@lucene.apache.org
 Date: Sunday, December 19, 2010, 11:56 AM
 Some talk on giant databases in
 postgres:
   
 http://wiki.postgresql.org/images/3/38/PGDay2009-EN-Datawarehousing_with_PostgreSQL.pdf
 
 wikipedia
   http://en.wikipedia.org/wiki/Partition_%28database%29
   (says to use a UNION)
 postgres description on how to do it:
   http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html
 
  Dennis Gearon
 
 
 Signature Warning
 
 It is always a good idea to learn from your own mistakes.
 It is usually a better 
 idea to learn from others’ mistakes, so you do not have
 to make them yourself. 
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
 
 
 EARTH has a Right To Life,
 otherwise we all die.
 
 
 
 - Original Message 
 From: Andy angelf...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Sat, December 18, 2010 6:20:54 PM
 Subject: DIH for sharded database?
 
 I have a table that is broken up into many virtual shards.
 So basically I have N 
 identical tables:
 
 Document1
 Document2
 .
 .
 Document36
 
 Currently these tables all live in the same database, but
 in the future they may 
 be moved to different servers to scale out if the needs
 arise.
 
 Is there any way to configure a DIH for these tables so
 that it will 
 automatically loop through the 36 identical tables and pull
 data out for 
 indexing?
 
 Something like (pseudo code):
 
 for (i = 1; i <= 36; i++) {
    ## retrieve data from the table Document{$i} & index the data
 }
 
 What's the best way to handle a situation like this?
 
 Thanks
 





Custom transformer to get file content from file path

2010-12-19 Thread vasu p
Hi,
I have a custom library that takes a file path as input and returns the
file content as a string.
My DB has a file path in one of the tables, and I am using a DIH configuration
in Solr to do the indexing. I couldn't use TikaEntityProcessor to index a
file located in the file system. I thought of using a custom Transformer to
transform the file_path into a file_content field in the row.

I would like to know the following details:
1. Setting the file content as a string in a custom file_content field might
cause memory issues: a very big file of hundreds of megabytes might
consume the RAM. Is it possible to send a stream as input to Solr?
What field type should be configured in schema.xml?
2. Is there any better approach than a custom transformer?
3. Is there any other good approach to indexing based on a file path?
Thanks a lot.


Re: Custom transformer to get file content from file path

2010-12-19 Thread Ahmet Arslan
 I have a custom library, which is used to input a file path
 and it returns
 file content as a string output.
 My DB has a file path in one of the table and using DIH
 configuration in
 Solr to do the indexing. I couldnt use TikaEntityProcessor
 to do indexing of
 a file located in file system. I though of using Custom
 Transformer to
 transform file_path to file_content field in the row.
 
 I would like to know following details:
 1. Setting file content as a string to a custom
 file_content field might
 cause memory issue if a very big file over hundreds of mega
 bites might
 consume the RAM space. Is it possible to send a stream as
 input to Solr?
 What is the filed type should be configured in schema.xml?
 2. Is there any better approach than a custom transformer?
 3. Any other best approach to implement indexing based on a
 file path?

http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor should 
do the trick.
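
As a sketch (the table, column and Solr field names are assumptions), the
outer entity pulls the path from the database and PlainTextEntityProcessor
reads the file contents into the implicit plainText column; note it still
reads the whole file into memory, so very large files remain a concern:

  <dataSource name="db" driver="..." url="jdbc:..." user="..." password="..."/>
  <dataSource name="filetext" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="doc" dataSource="db" query="select id, file_path from docs">
      <field column="id" name="id"/>
      <entity name="content" processor="PlainTextEntityProcessor"
              dataSource="filetext" url="${doc.file_path}">
        <field column="plainText" name="file_content"/>
      </entity>
    </entity>
  </document>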


  


[Reload-Config] not working

2010-12-19 Thread Adam Estrada
Full Import:
http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import

Reload Configuration:
http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config

All,

The links above are meant to let me reload the configuration file after a
change is made and to perform the full import. My problem is
that the reload-config option does not seem to be working. Am I doing
anything wrong? Your expertise is greatly appreciated!

Adam


Re: [Reload-Config] not working

2010-12-19 Thread Ahmet Arslan

--- On Mon, 12/20/10, Adam Estrada estrada.adam.gro...@gmail.com wrote:

 From: Adam Estrada estrada.adam.gro...@gmail.com
 Subject: [Reload-Config] not working
 To: solr-user@lucene.apache.org
 Date: Monday, December 20, 2010, 5:33 AM
 Full Import:
 http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import
 Reload Configuration:
 http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config
 
 All,
 
 The links above are meant for me to reload the
 configuration file after a
 change is made and the other is to perform the full import.
 My problem is
 that The reload-config option does not seem to be working.
 Am I doing
 anything wrong? Your expertise is greatly appreciated!
 
 Adam
 


  


Re: [Reload-Config] not working

2010-12-19 Thread Ahmet Arslan
 Full Import:
 http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import
 Reload Configuration:
 http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config
 
 All,
 
 The links above are meant for me to reload the
 configuration file after a
 change is made and the other is to perform the full import.
 My problem is
 that The reload-config option does not seem to be working.
 Am I doing
 anything wrong? Your expertise is greatly appreciated!

I am sorry, I hit the reply button accidentally. 

Are you receiving/checking the message
<str name="importResponse">Configuration Re-loaded sucessfully</str>
after the reload?

And are you checking that data-config.xml is still valid XML after editing it
programmatically?

And instead of editing the data-config.xml file, can't you use the variable resolver?
http://search-lucene.com/m/qYzPk2n86iIsubj
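
(Assuming the import handler is registered at /dataimport in solrconfig.xml,
it may also be worth hitting the handler directly rather than going through
/select with qt, i.e.:

http://localhost:8983/solr/dataimport?command=reload-config

which is the form shown on the DataImportHandler wiki.)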


  


Re: Performance Monitoring Solution

2010-12-19 Thread Gora Mohanty
On Mon, Dec 20, 2010 at 3:13 AM, Cameron Hurst wakemaste...@z33k.com wrote:
 I am at the point in my set up that I am happy with how things are being
 indexed and my interface is all good to go but what I don't know how to
 judge is how often it will be queried and how much resources it needs to
 function properly. So what I am looking for is some sort of performance
 monitoring solution. I know if I go to the statistics page i can find
 the number of queries and the average response time. What I want is a
 bit more detailed result, showing how it varies over time. A plot of RAM
 usage and possibly disk IO that is due to Solr over time as well.
[...]

Solr exposes statistics through JMX: http://wiki.apache.org/solr/SolrJmx
As described there, you can examine them using jconsole. Nagios also
has several JMX plugins, of which we have successfully used
http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/Syabru-Nagios-JMX-Plugin/details
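
For reference, exposing those MBeans is a one-line addition to solrconfig.xml
(described on the SolrJmx page above); the servlet container's JVM still needs
a JMX agent enabled, e.g. the standard -Dcom.sun.management.jmxremote flags,
before jconsole or a Nagios plugin can connect remotely:

  <!-- solrconfig.xml: register Solr's statistics beans with the JVM's MBean server -->
  <jmx />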

There are also several open-source JMX clients available, but we did
not find any actively-maintained one with the features that we would
have liked. We have prototyped an in-house JMX application, focused
on Solr, and hope to be able to open-source this soon.

 The other option I found is LiquidGaze, it says it
 is an open source solution for monitoring the handlers and can do a lot
 of what I need but has anyone ever used it before and can give it a
 rating, good or bad.
[...]

I presume that you mean LucidGaze. We have been meaning to check
it out, but the person implementing it was running into some issues.

Regards,
Gora


Re: master master, repeaters

2010-12-19 Thread Lance Norskog
If you have a load balancer available, that is a much cleaner solution
than anything else. After the main indexer comes back, you have to get
the current index state to it to start again. But otherwise

On Sun, Dec 19, 2010 at 10:39 AM, Upayavira u...@odoko.co.uk wrote:


 On Sun, 19 Dec 2010 10:20 -0800, Tri Nguyen tringuye...@yahoo.com
 wrote:
 How do we tell the slaves to point to the new master without modifying
 the config files?  Can we do this while the slave is up, issuing a
 command to it?

 I believe this can be done (details are in
 http://wiki.apache.org/solr/SolrReplication), but I've not actually done
 it.

 Upayavira

 --- On Sun, 12/19/10, Upayavira u...@odoko.co.uk wrote:


 From: Upayavira u...@odoko.co.uk
 Subject: Re: master master, repeaters
 To: solr-user@lucene.apache.org
 Date: Sunday, December 19, 2010, 10:13 AM


 We had a (short) thread on this late last week.

 Solr doesn't support automatic failover of the master, at least in
 1.4.1. I've been discussing with my colleague (Tommaso) about ways to
 achieve this.

 There's ways we could 'fake it', scripting the following:

 * set up a 'backup' master, as a replica of the actual master
 * monitor the master for 'up-ness'
 * if it fails:
    * tell the master to start indexing to the backup instead
    * tell the slave(s) to connect to a different master (the backup)
 * then, when the master is back:
    * wipe its index (backing up dir first?)
    * configure it to be a backup of the new master
    * make it pull a fresh index over

 But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
 how that might work in that thread.

 Upayavira


 On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com
 wrote:
  Hi,
 
  In the master-slave configuration, I'm trying to figure out how to
  configure the
  system setup for master failover.
 
  Does solr support master-master setup?  From my readings, solr does not.
 
  I've read about repeaters as well where the slave can act as a master.
  When the
  main master goes down, do the other slaves switch to the repeater?
 
  Barring better solutions, I'm thinking about putting 2 masters behind  a
  load
  balancer.
 
  If this is not implemented already, perhaps solr can be updated to
  support a
  list of masters for fault tolerance.
 
  Tri





-- 
Lance Norskog
goks...@gmail.com


Re: DIH for sharded database?

2010-12-19 Thread Lance Norskog
You said: "Currently these tables all live in the same database, but in
the future they may be moved to different servers to scale out if the
needs arise."

That's why I concentrated on the JDBC url problem.

But you can use a file as a list of tables. Read each line, and a
sub-entity can substitute the line value into the SQL statement.
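
A minimal sketch of that file-driven approach (paths, driver and credentials
are placeholders, and it hasn't been tested here): LineEntityProcessor reads
one table name per line and the inner entity substitutes it into the query
via ${tables.rawLine}:

<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <dataSource name="tableList" type="FileDataSource"/>
  <document>
    <!-- tables.txt lists Document1 ... Document36, one per line -->
    <entity name="tables" processor="LineEntityProcessor"
            url="/path/to/tables.txt" dataSource="tableList" rootEntity="false">
      <entity name="doc" dataSource="db"
              query="select * from ${tables.rawLine}">
        <field column="id" name="id"/>
      </entity>
    </entity>
  </document>
</dataConfig>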

On Sat, Dec 18, 2010 at 6:46 PM, Andy angelf...@yahoo.com wrote:

 --- On Sat, 12/18/10, Lance Norskog goks...@gmail.com wrote:

 You can have a file with 1,2,3 on
 separate lines. There is a
 line-by-line file reader that can pull these as separate
 drivers.
 Inside that entity the JDBC url has to be altered with the
 incoming
 numbers. I don't know if this will work.

 I'm not sure I understand.

 How will altering the JDBC url change the name of the table it is importing 
 data from?

 Wouldn't I need to change the  actual SQL query itself?

 select * from Document1
 select * from Document2
 ...
 select * from Document36







-- 
Lance Norskog
goks...@gmail.com


Re: DIH for sharded database?

2010-12-19 Thread Andy

--- On Mon, 12/20/10, Lance Norskog goks...@gmail.com wrote:

 You said: Currently these tables all
 live in the same database, but in
 the future they may be moved to different servers to scale
 out if the
 needs arise.
 
 That's why I concentrated on the JDBC url problem.
 
 But you can use a file as a list of tables. Read each line,
 and a
 sub-entity can substitute the line value into the SQL
 statement.
 

Can you give me an example of how to do this, or point me to documentation
that illustrates it? I think I sort of understand what you're saying
conceptually, but I need to be sure about the specifics.

Thanks.