Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

Sorry, forgot to mention: CentOS.
Thanks.

I have a script very similar to this CentOS one, but I am missing the status 
portion of the script.


On 11/09/2010 08:47 AM, Eric Martin wrote:

Er, what flavor?

RHEL / CentOS

#!/bin/sh

# Starts, stops, and restarts Apache Solr.
#
# chkconfig: 35 92 08
# description: Starts and stops Apache Solr

SOLR_DIR=/var/solr
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
LOG_FILE=/var/log/solr.log
JAVA=/usr/bin/java

case "$1" in
 start)
 echo "Starting Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
 ;;
 stop)
 echo "Stopping Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS --stop
 ;;
 restart)
 $0 stop
 sleep 1
 $0 start
 ;;
 *)
 echo "Usage: $0 {start|stop|restart}" >&2
 exit 1
 ;;
esac
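
A minimal sketch of the status portion Nikola is missing, assuming Solr was
started from start.jar and writes no pidfile (the pgrep pattern is an
assumption; tighten it if other Jetty instances run on the same box):

 status)
 # Check for a running start.jar process via its command line
 if pgrep -f "start.jar" > /dev/null; then
 echo "Solr is running"
 else
 echo "Solr is stopped"
 exit 3
 fi
 ;;

With this added, the usage line becomes {start|stop|restart|status}.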




Debian

http://xdeb.org/node/1213

__

Ubuntu

STEPS
Type in the following command in TERMINAL to install nano text editor.
sudo apt-get install nano
Type in the following command in TERMINAL to add a new script.
sudo nano /etc/init.d/solr
TERMINAL will display a new page titled GNU nano 2.0.x.
Paste the script below into this TERMINAL window.
#!/bin/sh -e

# Starts, stops, and restarts solr

SOLR_DIR=/apache-solr-1.4.0/example
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
LOG_FILE=/var/log/solr.log
JAVA=/usr/bin/java

case "$1" in
 start)
 echo "Starting Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
 ;;
 stop)
 echo "Stopping Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS --stop
 ;;
 restart)
 $0 stop
 sleep 1
 $0 start
 ;;
 *)
 echo "Usage: $0 {start|stop|restart}" >&2
 exit 1
 ;;
esac
Note: In the above script you might have to replace /apache-solr-1.4.0/example
with the appropriate directory name.
Press CTRL-X.
Type in Y.
When asked for the File Name to Write, press the ENTER key.
You're now back at the TERMINAL command line.

Type in the following command in TERMINAL to create all the links to the
script.
sudo update-rc.d solr defaults
Type in the following command in TERMINAL to make the script executable.
sudo chmod a+rx /etc/init.d/solr
To test, reboot your Ubuntu Server.
Wait until the Ubuntu Server reboot is completed.
Wait 2 minutes for Apache Solr to start up.
Using your internet browser, go to your website and try a Solr search.



-Original Message-
From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
Sent: Monday, November 08, 2010 11:42 PM
To: solr-user@lucene.apache.org
Subject: solr init.d script

Hi,

Does anyone have some kind of init.d script for solr, that can start,
stop and check solr status?




--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr


Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
Not sure about that. I have read that the replication handler actually issues a 
commit() on itself once the index is downloaded.

But probably a better way for Markus' case is to hook the prune job on the 
master, writing to another core (myIndexPruned). Then you replicate from that 
core instead, and you also get the benefit of transferring a smaller index 
across the network.
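
As a sketch, the slave side of that setup would point its ReplicationHandler at
the pruned core (host, port and core name are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Pull the smaller, pruned index instead of the full one -->
    <str name="masterUrl">http://master:8983/solr/myIndexPruned/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>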

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. nov. 2010, at 23.50, Shalin Shekhar Mangar wrote:

 On Fri, Nov 5, 2010 at 2:30 PM, Jan Høydahl / Cominvent
 jan@cominvent.com wrote:
 
 How about hooking in  Andrzej's pruning tool at the postCommit event, 
 literally removing unused fields. I believe a commit is fired on the slave 
 by itself after every successful replication, to put
 the index live. You could execute a script which prunes away the dead meat 
 and then call a new commit?
 
 Well, I don't think it will work because a new commit will cause the
 index version on the slave to be ahead of the master which will cause
 Solr replication to download a full index from the master and it'd go
 in an infinite loop.
 
 --
 Regards,
 Shalin Shekhar Mangar.



solr dynamic core creation

2010-11-09 Thread nizan

Hi,

I’m not sure this is the right place, hopefully you can help. Anyway, I also
sent mail to solr-user@lucene.apache.org

I’m using solr – one master with 17 slaves in the server and using solrj as
the java client

Currently there’s only one core in all of them (master and slaves) – only
the cpaCore.

I thought about using multi-cores solr, but I have some problems with that.

I don’t know in advance which cores I’d need – 

When my Java program runs, I call for documents to be indexed at a certain
URL, which contains the core name, and I might create a URL based on a core
that has not yet been created. For example:

(at the begining, the only core is cpaCore)

Calling to index – http://localhost:8080/cpaCore  - existing core,
everything as usual
Calling to index -  http://localhost:8080/newCore - currently throws an
exception. What I'd like to happen is: the server realizes there’s no core
“newCore”, creates it, and indexes to it. After that – it also creates the new
core in the slaves
Calling to index – http://localhost:8080/newCore  - existing core,
everything as usual

What I’d like the server side to do is realize by itself whether the
core exists or not, and if not, create it

One other restriction – I can’t change anything on the client side – the client
can only make the calls it’s making now, for indexing and searching,
and cannot make calls for core creation via the CoreAdminHandler. All I can
do is something in the server itself

What can I do to get it done? Write some RequestHandler? RequestProcessor?
Any other option?

Thanks, nizan

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-dynamic-core-creation-tp1867705p1867705.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Tomcat special character problem

2010-11-09 Thread Em

The first problem was the wrong URIEncoding in Tomcat itself.
The second problem came from the application's side: the params were wrongly
encoded, so it was not possible to show the desired results.

If you need to convert from different encodings to utf8, I can give you the
following piece of pseudocode:

string = urlencode(encodeForUtf8(myString));

And if you need to decode for several reasons, keep in mind that you must
change the order of decodings:

value = decodeFromUtf8(urldecode(string));
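
In Java terms, a minimal sketch of that ordering (the helper names above are
pseudocode; java.net's URLEncoder/URLDecoder take the charset explicitly):

import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodingOrder {
    public static void main(String[] args) throws Exception {
        String myString = "Käse & Brot";
        // Encode: convert to UTF-8 bytes, then percent-encode for the URL
        String encoded = URLEncoder.encode(myString, "UTF-8");
        // Decode: reverse the order -- percent-decode first, reading the bytes as UTF-8
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println(encoded + " -> " + decoded);
    }
}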

Hope that helps.

Thank you!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1868024.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr init.d script

2010-11-09 Thread Nikola Garafolic
I have two nodes, each running one JBoss server, and using one (single) 
Solr instance; that's how I run it for now.


Do you recommend running Solr under JBoss as a servlet? The two JBoss servers 
run load-balanced for high-availability purposes.


For now it seems to be ok.

On 11/09/2010 03:17 PM, Israel Ekpo wrote:

I think it would be a better idea to load solr via a servlet container like
Tomcat and then create the init.d script for tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6



--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr


Re: Replication and ignored fields

2010-11-09 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 12:33 AM, Jan Høydahl / Cominvent
jan@cominvent.com wrote:
 Not sure about that. I have read that the replication handler actually issues 
 a commit() on itself once the index is downloaded.

That was true with the old replication scripts. The Java based
replication just re-opens the IndexReader after all the files are
downloaded so the index version on the slave remains in sync with the
one on the master.


 But probably a better way for Markus' case is to hook the prune job on the 
 master, writing to another core (myIndexPruned). Then you replicate from that 
 core instead, and you also get the benefit of transferring a smaller index 
 across the network.

I agree, that is a good idea.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to Facet on a price range

2010-11-09 Thread Geert-Jan Brits
Just to add to this: if you want to allow the user more choice in his option
to select ranges, perhaps by using a 2-sided JavaScript slider for the
price range (à la kayak.com), it may be very worthwhile to discretize the
allowed values for the slider (e.g., steps of 5 dollars). Most js-slider
implementations allow for this easily.

This has the advantages of:
- having far fewer possible facet queries and thus a far greater chance of
these facet queries hitting the cache.
- a better user experience, although that's debatable.

just to be clear: for this the Solr-side would still use:
facet=on&facet.query=price:[50
TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
variant suggested above.

Geert-Jan

2010/11/9 jayant jayan...@hotmail.com


 That was very well thought of and a clever solution. Thanks.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1869201.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
Cool, thanks for the clarification, Shalin.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 9. nov. 2010, at 15.12, Shalin Shekhar Mangar wrote:

 On Tue, Nov 9, 2010 at 12:33 AM, Jan Høydahl / Cominvent
 jan@cominvent.com wrote:
 Not sure about that. I have read that the replication handler actually 
 issues a commit() on itself once the index is downloaded.
 
 That was true with the old replication scripts. The Java based
 replication just re-opens the IndexReader after all the files are
 downloaded so the index version on the slave remains in sync with the
 one on the master.
 
 
 But probably a better way for Markus' case is to hook the prune job on the 
 master, writing to another core (myIndexPruned). Then you replicate from 
 that core instead, and you also get the benefit of transferring a smaller 
 index across the network.
 
 I agree, that is a good idea.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: How to Facet on a price range

2010-11-09 Thread jayant

That was very well thought of and a clever solution. Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1869201.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr init.d script

2010-11-09 Thread Israel Ekpo
I think it would be a better idea to load solr via a servlet container like
Tomcat and then create the init.d script for tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6

On Tue, Nov 9, 2010 at 2:47 AM, Eric Martin e...@makethembite.com wrote:

 Er, what flavor?

 RHEL / CentOS

 #!/bin/sh

 # Starts, stops, and restarts Apache Solr.
 #
 # chkconfig: 35 92 08
 # description: Starts and stops Apache Solr

 SOLR_DIR=/var/solr
 JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
 LOG_FILE=/var/log/solr.log
 JAVA=/usr/bin/java

 case "$1" in
    start)
    echo "Starting Solr"
    cd $SOLR_DIR
    $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
    ;;
    stop)
    echo "Stopping Solr"
    cd $SOLR_DIR
    $JAVA $JAVA_OPTIONS --stop
    ;;
    restart)
    $0 stop
    sleep 1
    $0 start
    ;;
    *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
 esac

 


 Debian

 http://xdeb.org/node/1213

 __

 Ubuntu

 STEPS
 Type in the following command in TERMINAL to install nano text editor.
 sudo apt-get install nano
 Type in the following command in TERMINAL to add a new script.
 sudo nano /etc/init.d/solr
 TERMINAL will display a new page title GNU nano 2.0.x.
 Paste the below script in this TERMINAL window.
 #!/bin/sh -e

 # Starts, stops, and restarts solr

 SOLR_DIR=/apache-solr-1.4.0/example
 JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
 LOG_FILE=/var/log/solr.log
 JAVA=/usr/bin/java

 case "$1" in
    start)
    echo "Starting Solr"
    cd $SOLR_DIR
    $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
    ;;
    stop)
    echo "Stopping Solr"
    cd $SOLR_DIR
    $JAVA $JAVA_OPTIONS --stop
    ;;
    restart)
    $0 stop
    sleep 1
    $0 start
    ;;
    *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
 esac
 Note: In the above script you might have to replace /apache-solr-1.4.0/example
 with the appropriate directory name.
 Press CTRL-X.
 Type in Y.
 When asked for the File Name to Write, press the ENTER key.
 You're now back at the TERMINAL command line.

 Type in the following command in TERMINAL to create all the links to the
 script.
 sudo update-rc.d solr defaults
 Type in the following command in TERMINAL to make the script executable.
 sudo chmod a+rx /etc/init.d/solr
 To test, reboot your Ubuntu Server.
 Wait until the Ubuntu Server reboot is completed.
 Wait 2 minutes for Apache Solr to start up.
 Using your internet browser, go to your website and try a Solr search.



 -Original Message-
 From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
 Sent: Monday, November 08, 2010 11:42 PM
 To: solr-user@lucene.apache.org
 Subject: solr init.d script

 Hi,

 Does anyone have some kind of init.d script for solr, that can start,
 stop and check solr status?

 --
 Nikola Garafolic
 SRCE, Sveucilisni racunski centar
 tel: +385 1 6165 804
 email: nikola.garafo...@srce.hr




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


dynamically create unique key

2010-11-09 Thread Christopher Gross
I'm trying to use Solr to store information from a few different sources in
one large index.  I need to create a unique key for the Solr index that will
be unique per document.  If I have 3 systems, and they all have a document
with id=1, then I need to create a uniqueId field in my schema that
contains both the system name and that id, along the lines of: sysa1,
sysb1, and sysc1.  That way, each document will have a unique id.

I added this to my schema.xml:

  <copyField source="source" dest="uniqueId"/>
  <copyField source="id" dest="uniqueId"/>


However, after trying to insert, I got this:
java.lang.Exception: ERROR: multiple values encountered for non multiValued
copy field uniqueId: sysa

So instead of just appending to the uniqueId field, it treated the field as
multiValued.  Does anyone have an idea on how I can make this work?

Thanks!

-- Chris


Re: dynamically create unique key

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 10:39 AM, Christopher Gross cogr...@gmail.com wrote:
 I'm trying to use Solr to store information from a few different sources in
 one large index.  I need to create a unique key for the Solr index that will
 be unique per document.  If I have 3 systems, and they all have a document
 with id=1, then I need to create a uniqueId field in my schema that
 contains both the system name and that id, along the lines of: sysa1,
 sysb1, and sysc1.  That way, each document will have a unique id.

 I added this to my schema.xml:

  <copyField source="source" dest="uniqueId"/>
  <copyField source="id" dest="uniqueId"/>


 However, after trying to insert, I got this:
 java.lang.Exception: ERROR: multiple values encountered for non multiValued
 copy field uniqueId: sysa

 So instead of just appending to the uniqueId field, it tried to do a
 multiValued.  Does anyone have an idea on how I can make this work?

 Thanks!

 -- Chris


Chris,

How you insert your documents into SOLR will determine
how to create your unique field. If you are POST'ing the data via
HTTP, then you would be responsible for building your unique id (i.e.,
your program/language would use string concatenation to add the unique
id to the output before it gets to the update handler in SOLR). If
you're using the DataImportHandler, then you can use the
TemplateTransformer
(http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer) to
dynamically build your unique id at document insertion time.

For example, we here at bizjournals use SOLR and the DataImportHandler
to index our documents. Like you, we run the risk of two or more ids
clashing, and thus overwriting a different type of document. As such,
we take two or three different fields and combine them together using
the TemplateTransformer to generate a more unique id for each document
we index.
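
For illustration, a minimal data-config sketch of that technique (entity,
column and table names are hypothetical):

<entity name="doc" transformer="TemplateTransformer"
        query="SELECT id, source FROM documents">
  <!-- Concatenate the system name and the id, e.g. "sysa" + "1" -> "sysa1" -->
  <field column="uniqueId" template="${doc.source}${doc.id}"/>
</entity>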

With respect to the multiValued option, that is used more for an
array-like structure within a field. For example, if you have a blog
entry with multiple tag keywords, you would probably want a field in
SOLR that can contain the various tag keywords for each blog entry;
this is where multiValued comes in handy.

I hope that this helps to clarify things for you.

- Ken Stanley


Re: dynamically create unique key

2010-11-09 Thread Christopher Gross
Thanks Ken.

I'm using a script with Java/SolrJ to copy documents from their original
locations into the Solr Index.

I wasn't sure if the copyField would help me, but from your answers it seems
that I'll have to handle it on my own.  That's fine -- it is definitely not
hard to pass a new field myself.  I was just thinking that there should be
an easy way to have Solr build the unique field, since it was getting
everything anyway.

I was just confused as to why I was getting a multiValued error, since I was
just trying to append to a field.  I wasn't sure if I was missing something.
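
For reference, the client-side version is just a concatenation before the add;
a minimal SolrJ sketch (server URL and field names are illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithUniqueId {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        String source = "sysa";
        String id = "1";
        doc.addField("source", source);
        doc.addField("id", id);
        // Build the unique key on the client instead of relying on copyField
        doc.addField("uniqueId", source + id);
        server.add(doc);
        server.commit();
    }
}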

Thanks again!

-- Chris


On Tue, Nov 9, 2010 at 10:47 AM, Ken Stanley doh...@gmail.com wrote:

 On Tue, Nov 9, 2010 at 10:39 AM, Christopher Gross cogr...@gmail.com
 wrote:
  I'm trying to use Solr to store information from a few different sources
 in
  one large index.  I need to create a unique key for the Solr index that
 will
  be unique per document.  If I have 3 systems, and they all have a
 document
  with id=1, then I need to create a uniqueId field in my schema that
  contains both the system name and that id, along the lines of: sysa1,
  sysb1, and sysc1.  That way, each document will have a unique id.
 
  I added this to my schema.xml:
 
   <copyField source="source" dest="uniqueId"/>
   <copyField source="id" dest="uniqueId"/>
 
 
  However, after trying to insert, I got this:
  java.lang.Exception: ERROR: multiple values encountered for non
 multiValued
  copy field uniqueId: sysa
 
  So instead of just appending to the uniqueId field, it tried to do a
  multiValued.  Does anyone have an idea on how I can make this work?
 
  Thanks!
 
  -- Chris
 

 Chris,

 Depending on how you insert your documents into SOLR will determine
 how to create your unique field. If you are POST'ing the data via
 HTTP, then you would be responsible for building your unique id (i.e.,
 your program/language would use string concatenation to add the unique
 id to the output before it gets to the update handler in SOLR). If
 you're using the DataImportHandler, then you can use the
 TemplateTransformer
 (http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer) to
 dynamically build your unique id at document insertion time.

 For example, we here at bizjournals use SOLR and the DataImportHandler
 to index our documents. Like you, we run the risk of two or more ids
 clashing, and thus overwriting a different type of document. As such,
 we take two or three different fields and combine them together using
 the TemplateTransformer to generate a more unique id for each document
 we index.

 With respect to the multiValued option, that is used more for an
 array-like structure within a field. For example, if you have a blog
 entry with multiple tag keywords, you would probably want a field in
 SOLR that can contain the various tag keywords for each blog entry;
 this is where multiValued comes in handy.

 I hope that this helps to clarify things for you.

 - Ken Stanley



Re: dynamically create unique key

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 10:53 AM, Christopher Gross cogr...@gmail.com wrote:
 Thanks Ken.

 I'm using a script with Java/SolrJ to copy documents from their original
 locations into the Solr Index.

 I wasn't sure if the copyField would help me, but from your answers it seems
 that I'll have to handle it on my own.  That's fine -- it is definitely not
 hard to pass a new field myself.  I was just thinking that there should be
 an easy way to have Solr build the unique field, since it was getting
 everything anyway.

 I was just confused as to why I was getting a multiValued error, since I was
 just trying to append to a field.  I wasn't sure if I was missing something.

 Thanks again!

 -- Chris


Chris,

I definitely understand your sentiment. The thing to keep in mind with
SOLR is that it really has limited logic mechanisms; in fact, unless
you're willing to use the DataImportHandler (dih) and the
ScriptTransformer, you really have no logic.

The copyField directive in schema.xml is mainly used to help you
easily copy the contents of one field into another so that it may be
indexed in multiple ways; for example, you can index a string so that
it is stored literally (i.e., "Hello World"), parsed using a
whitespace tokenizer (i.e., "Hello", "World"), or parsed by an nGram
tokenizer (i.e., "H", "He", "Hel"...). This is beneficial to you
because you wouldn't have to explicitly define each possible instance
in your data stream. You just define the field once, and SOLR is smart
enough to copy it where it needs to go.
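
As a schema.xml sketch of that idea (field and type names are hypothetical,
and the types must be defined elsewhere in the schema):

<field name="title" type="string" indexed="true" stored="true"/>
<field name="title_ws" type="text_ws" indexed="true" stored="false"/>
<field name="title_ngram" type="text_ngram" indexed="true" stored="false"/>

<!-- One source field feeds several differently analyzed copies -->
<copyField source="title" dest="title_ws"/>
<copyField source="title" dest="title_ngram"/>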

Glad to have helped. :)

- Ken


Re: How to Facet on a price range

2010-11-09 Thread gwk

Hi,

Instead of all the facet queries, you can also make use of range facets 
(http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range), 
which is in trunk afaik; it should also be patchable into older versions 
of Solr, although that should not be necessary.
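
For reference, the range facet parameters described on that wiki page look
roughly like this (field name and bounds are illustrative):

facet=on&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=50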


We make use of it (http://www.mysecondhome.co.uk/search.html) to create 
the nice sliders Geert-Jan describes. We've also used it to add the 
sparklines above the sliders which give a nice indication of how the 
current selection is spread out.


Regards,

gwk

On 11/9/2010 3:33 PM, Geert-Jan Brits wrote:

Just to add to this: if you want to allow the user more choice in his option
to select ranges, perhaps by using a 2-sided JavaScript slider for the
price range (à la kayak.com), it may be very worthwhile to discretize the
allowed values for the slider (e.g., steps of 5 dollars). Most js-slider
implementations allow for this easily.

This has the advantages of:
- having far fewer possible facet queries and thus a far greater chance of
these facet queries hitting the cache.
- a better user experience, although that's debatable.

just to be clear: for this the Solr-side would still use:
facet=on&facet.query=price:[50
TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
variant suggested above.

Geert-Jan

2010/11/9 jayant jayan...@hotmail.com


That was very well thought of and a clever solution. Thanks.
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1869201.html
Sent from the Solr - User mailing list archive at Nabble.com.





spell check vs terms component

2010-11-09 Thread bbarani

Hi,

We are trying to implement auto suggest feature in our application.

I would like to know the difference between terms vs spell check component.

Both handlers seem to display almost the same output. Can anyone let me
know the difference? I would also like to know when to go for the spell check
component and when to go for the terms component.

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/spell-check-vs-terms-component-tp1870214p1870214.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is there a way to embed terms handler in search handler?

2010-11-09 Thread bbarani

Hi,

I am trying to figure out if there is a way to embed the terms handler as part
of the default search handler and 
access it using a URL something like the one below

http://localhost:8990/solr/db/select?q=*:*&terms.prefix=a&terms.fl=name

Couple of other questions,

I would like to know if it's possible to mention * in fl.name to search on
all fields, or whether we should specify the field names only.

Will the autosuggest suggest the whole phrase or just the word it matches?

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-embed-terms-handler-in-search-handler-tp1870505p1870505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr init.d script

2010-11-09 Thread Israel Ekpo
Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
nikola.garafo...@srce.hrwrote:

 I have two nodes, each running one JBoss server, and using one (single) Solr
 instance; that's how I run it for now.

 Do you recommend running Solr under JBoss as a servlet? The two JBoss servers
 run load-balanced for high-availability purposes.

 For now it seems to be ok.


 On 11/09/2010 03:17 PM, Israel Ekpo wrote:

 I think it would be a better idea to load solr via a servlet container
 like
 Tomcat and then create the init.d script for tomcat instead.

 http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6


 --
 Nikola Garafolic
 SRCE, Sveucilisni racunski centar
 tel: +385 1 6165 804
 email: nikola.garafo...@srce.hr




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: spell check vs terms component

2010-11-09 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 8:20 AM, bbarani bbar...@gmail.com wrote:


 Hi,

 We are trying to implement auto suggest feature in our application.

 I would like to know the difference between terms vs spell check component.

 Both the handlers seems to display almost the same output, can anyone let
 me
 know the difference and also I would like to know when to go for spell
 check
 and when to go for terms component.


SpellCheckComponent is designed to operate on whole words and not partial
words so I don't know how well it will work for auto-suggest, if at all.

As far as differences between SpellCheckComponent and Terms Component is
concerned, TermsComponent is a straight prefix match whereas SCC takes edit
distance into account. Also, SCC can deal with phrases composed of multiple
words and also gives back a collated suggestion.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to Facet on a price range

2010-11-09 Thread Geert-Jan Brits
@ http://www.mysecondhome.co.uk/search.html
--
when you drag the sliders, an update of how many results would match is
immediately shown. I really like this. How did you do this? Is this
out-of-the-box available with the suggested Facet_by_Range patch?

Thanks,
Geert-Jan

2010/11/9 gwk g...@eyefi.nl

 Hi,

 Instead of all the facet queries, you can also make use of range facets (
 http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range), which
 is in trunk afaik, it should also be patchable into older versions of Solr,
 although that should not be necessary.

 We make use of it (http://www.mysecondhome.co.uk/search.html) to create
 the nice sliders Geert-Jan describes. We've also used it to add the
 sparklines above the sliders which give a nice indication of how the current
 selection is spread out.

 Regards,

 gwk


 On 11/9/2010 3:33 PM, Geert-Jan Brits wrote:

 Just to add to this, if you want to allow the user more choice in his
 option
 to select ranges, perhaps by using a 2-sided JavaScript slider for the
 pricerange (à la kayak.com) it may be very worthwhile to discretize the
 allowed values for the slider (e.g., steps of 5 dollars). Most js-slider
 implementations allow for this easily.

 This has the advantages of:
 - having far fewer possible facetqueries and thus a far greater chance of
 these facetqueries hitting the cache.
 - a better user-experience, although that's debatable.

 just to be clear: for this the Solr-side would still use:
 facet=on&facet.query=price:[50
 TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
 variant suggested above.

 Geert-Jan

 2010/11/9 jayant jayan...@hotmail.com

  That was very well thought of and a clever solution. Thanks.
 --
 View this message in context:

 http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1869201.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: dynamically create unique key

2010-11-09 Thread Chris Hostetter

: one large index.  I need to create a unique key for the Solr index that will
: be unique per document.  If I have 3 systems, and they all have a document
: with id=1, then I need to create a uniqueId field in my schema that
: contains both the system name and that id, along the lines of: sysa1,
: sysb1, and sysc1.  That way, each document will have a unique id.

take a look at the SignatureUpdateProcessorFactory...

http://wiki.apache.org/solr/Deduplication
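
A minimal solrconfig.xml sketch of that approach, using the field names from
the example above (note the signature is a hash of the chosen fields, not a
readable concatenation like sysa1):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">uniqueId</str>
    <bool name="overwriteDupes">false</bool>
    <!-- Fields combined into the signature -->
    <str name="fields">source,id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>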

:   <copyField source="source" dest="uniqueId"/>
:   <copyField source="id" dest="uniqueId"/>
...
: So instead of just appending to the uniqueId field, it tried to do a
: multiValued.  Does anyone have an idea on how I can make this work?

copyField doesn't append, it copies Field (value) instances from the 
source field to the dest field -- so if you copy from multiple sources you 
get multiple values for the dest field. 


-Hoss


Solr highlighter question

2010-11-09 Thread Moazzam Khan
Hey guys,

I have 3 fields: FirstName, LastName, Biography. They are all string
fields.  In the schema, I copy them to the default search field, which is
text. Is there any way to get Solr to highlight all the fields when
someone searches the default search field, but when someone searches
on FirstName, only highlight that field?

For example: if someone searches: medical +FirstName:dave then medical
should be highlighted in all fields and dave only in FirstName.

Thanks in advance,
Moazzam


Re: spell check vs terms component

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 1:02 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Tue, Nov 9, 2010 at 8:20 AM, bbarani bbar...@gmail.com wrote:


 Hi,

 We are trying to implement auto suggest feature in our application.

 I would like to know the difference between terms vs spell check component.

 Both the handlers seems to display almost the same output, can anyone let
 me
 know the difference and also I would like to know when to go for spell
 check
 and when to go for terms component.


 SpellCheckComponent is designed to operate on whole words and not partial
 words so I don't know how well it will work for auto-suggest, if at all.

 As far as differences between SpellCheckComponent and Terms Component is
 concerned, TermsComponent is a straight prefix match whereas SCC takes edit
 distance into account. Also, SCC can deal with phrases composed of multiple
 words and also gives back a collated suggestion.

 --
 Regards,
 Shalin Shekhar Mangar.


An alternative to using the SpellCheckComponent and/or the
TermsComponent would be the (Edge)NGrams filter. Basically, this
filter breaks words down into auto-suggest-friendly tokens (i.e.,
"Hello" = "H", "He", "Hel", "Hell", "Hello"), which works great for
auto-suggestion querying.

Here is an article from Lucid Imagination on using the ngram filter:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
Here is the SOLR wiki entry for the filter:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
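
As a sketch, a fieldType along these lines (analyzer details are illustrative)
applies the edge n-grams at index time only:

<fieldType name="text_autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "Hello" -> "H", "He", "Hel", "Hell", "Hello" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>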

- Ken Stanley


Re: dynamically create unique key

2010-11-09 Thread Christopher Gross
Thanks Hoss, I'll look into that!

-- Chris


On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : one large index.  I need to create a unique key for the Solr index that
 will
 : be unique per document.  If I have 3 systems, and they all have a
 document
 : with id=1, then I need to create a uniqueId field in my schema that
 : contains both the system name and that id, along the lines of: sysa1,
 : sysb1, and sysc1.  That way, each document will have a unique id.

 take a look at the SignatureUpdateProcessorFactory...

 http://wiki.apache.org/solr/Deduplication

  :   <copyField source="source" dest="uniqueId"/>
  :   <copyField source="id" dest="uniqueId"/>
 ...
 : So instead of just appending to the uniqueId field, it tried to do a
 : multiValued.  Does anyone have an idea on how I can make this work?

  copyField doesn't append, it copies Field (value) instances from the
  source field to the dest field -- so if you copy from multiple sources you
  get multiple values for the dest field.


 -Hoss



Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

On 11/09/2010 07:00 PM, Israel Ekpo wrote:

Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
nikola.garafo...@srce.hrwrote:


But in my case, that would make things more complex as I see it. Two 
JBoss servers with Solr in a servlet container, and then I need the same 
data dir, right? I am now running a single Solr instance as a cluster 
service, with the data dir set to a shared LUN, so it can be started on 
either of the two hosts.

Can you explain the benefits of two Solr instances via a servlet container, 
maybe more performance?


Regards,
Nikola

--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr




RE: returning message to sender

2010-11-09 Thread Teki, Prasad


Hi guys,
I have been exploring Solr for the last few weeks. Our main intention is
to expose the data, as WS, across various data sources by linking them
using some scenario.

I have a couple of questions.
Is there any good document/URL which answers...

How does the indexing happen / how is it built for the queries across
different data sources (DIH)?

Does Lucene store the actual data of each individual query or a
combination? Where, if yes?

Whenever we do a query against the built index, when exactly does it fire the
query to the database?

How does the index get the updates from the DIH? For example, if my
query includes 3 DIH and 
What is the max number of data sources I can include to get better
performance?

How do we measure the scalability?

Can I run these search engines in a grid mode?

Thanks.
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Storage-tp1871155p1871155.html
Sent from the Solr - User mailing list archive at Nabble.com.




Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
All,

I have a web application that requires the user to register and then login
to gain access to the site. Pretty standard stuff...Now I would like to know
what the best approach would be to implement a customized search
experience for each user. Would this mean creating a separate core per user?
I think that this is not possible without restarting Solr after each core is
added to the multi-core xml file, right?

My use case is this...User A would like to index 5 RSS feeds and User B
would like to index 5 completely different RSS feeds and he is not
interested at all in what User A is interested in. This means that they
would have to be separate index cores, right?

What is the best approach for this kind of thing?

Thanks in advance,
Adam


Re: returning message to sender

2010-11-09 Thread Erick Erickson
Hmmm, this is a little murky.
I'm inferring that you believe that DIH somehow
queries the data source at *query* time, and this
is not true.  DIH is an *index time* concept.

DIH is used to add data to an index. Once that index is
created, all searches against are unaware that there
were different data sources.

So, with a single Solr schema, you can use DIH
on as many different data sources as you want,
mapping the various bits of information from each
data source into your Solr schema. Searches go
against fields defined in the schema, so you're
automatically searching against all the databases
(assuming you've mapped your data into your
schema)
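
A minimal data-config sketch of that (drivers, URLs and queries are
illustrative):

<dataConfig>
  <!-- Two named sources; each entity maps its rows into the same schema -->
  <dataSource name="ds1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host1/db1"/>
  <dataSource name="ds2" driver="org.postgresql.Driver" url="jdbc:postgresql://host2/db2"/>
  <document>
    <entity name="a" dataSource="ds1" query="SELECT id, title FROM docs"/>
    <entity name="b" dataSource="ds2" query="SELECT id, title FROM articles"/>
  </document>
</dataConfig>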

If I've misunderstood, perhaps you can add some
details?

Best
Erick

On Tue, Nov 9, 2010 at 1:39 PM, Teki, Prasad 
prasad_t...@standardandpoors.com wrote:



 Hi guys,
 I have been exploring Solr for the last few weeks. Our main intention is
 to
 expose the data, as WS, across various data sources by linking them
 using
 some scenario.

 I have couple of questions.
 Is there any good document/URL, which answers...

 How the indexing happens/built for the queries across different data
 sources
 (DIH)?

 Does the Lucene store the actual data of each individual query or a
 combination?, where, if yes?

 Whenever we do a query against built index, when exactly it fires the
 query
 to database?

 How does the index get the updates from the DIH, For example, if my
 query
 includes 3 DIH and
 What is the max number of data sources I can include to get better
 performance?

 How do we measure the scalability?

 Can I run these search engines in a grid mode?

 Thanks.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Storage-tp1871155p1871155.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Markus Jelsma
Hi,

 All,
 
 I have a web application that requires the user to register and then login
 to gain access to the site. Pretty standard stuff...Now I would like to
 know what the best approach would be to implement a customized search
 experience for each user. Would this mean creating a separate core per
 user? I think that this is not possible without restarting Solr after each
 core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change; the same is true for reloading 
cores. Check the wiki on this one [1].
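
For example, a core can be created at runtime with a CoreAdmin request,
assuming the instance directory and its conf/ already exist on disk (names
and paths are illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=userA&instanceDir=/var/solr/userA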

 
 My use case is this...User A would like to index 5 RSS feeds and User B
 would like to index 5 completely different RSS feeds and he is not
 interested at all in what User A is interested in. This means that they
 would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can 
assign a user ID to those documents, creating a multi-user index with RSS 
documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up 
additional memory and disk space, doesn't share caches, etc.  There is also 
more maintenance, and you need some support scripts to dynamically create new 
cores - Solr currently doesn't create a new core directory structure.

But reindexing a very large index takes up a lot more time and resources, and 
relevancy might be an issue depending on the RSS feeds' contents. 

 
 What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a 
single server with your specifications. Unless the demands are too specific.

 
 Thanks in advance,
 Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
I'm willing to bet a lot that the standard approach is to use a server-side 
language to customize the queries for the user . . . on the same core/set of 
cores.

The only reasons that my limited experience suggests for a 'core per user' are 
privacy/performance. Unless you have a very small set of users, I would think 
managing cores for LOTS of users would be a PIA. Create one (takes time), replicate 
to it (takes MORE time), use it, destroy it after the session expires (requires 
a garbage collection program running pretty often) (LOTS more time/CPU resource 
taken up).

I am happy to be corrected on any of this.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: Adam Estrada estrada.adam.gro...@gmail.com
Sent: Tue, November 9, 2010 3:57:34 PM
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

 All,
 
 I have a web application that requires the user to register and then login
 to gain access to the site. Pretty standard stuff...Now I would like to
 know what the best approach would be to implement a customized search
 experience for each user. Would this mean creating a separate core per
 user? I think that this is not possible without restarting Solr after each
 core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change, the same is true for reloading 
cores. Check the wiki on this one [1].

 
 My use case is this...User A would like to index 5 RSS feeds and User B
 would like to index 5 completely different RSS feeds and he is not
 interested at all in what User A is interested in. This means that they
 would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can 
assign a user ID to those documents, creating a multi-user index with RSS 
documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up 
additional memory and disk space, doesn't share caches, etc.  There is also 
more maintenance, and you need some support scripts to dynamically create new 
cores - Solr currently doesn't create a new core directory structure.

But, reindexing a very large index takes up a lot more time and resources and 
relevancy might be an issue depending on the rss feeds' contents. 

 
 What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a 
single server with your specifications. Unless the demands are too specific.

 
 Thanks in advance,
 Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers



Re: dynamically create unique key

2010-11-09 Thread Dennis Gearon
Seems to me, it would be a good idea to put into the Solr Code, a unique ID per 
instance or installation or both, accessible either with JAVA or a query. Kind 
of like all the browsers do for their SSL connections.

Then, it's automatically easy to implement what is described below.

Maybe it should be written to the config file upon first run when it does not 
exist, and then any updates or reinstalls would reuse the same 
installation/instance ID.



From: Christopher Gross cogr...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 11:37:03 AM
Subject: Re: dynamically create unique key

Thanks Hoss, I'll look into that!

-- Chris


On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : one large index.  I need to create a unique key for the Solr index that
 will
 : be unique per document.  If I have 3 systems, and they all have a
 document
 : with id=1, then I need to create a uniqueId field in my schema that
 : contains both the system name and that id, along the lines of: sysa1,
 : sysb1, and sysc1.  That way, each document will have a unique id.

 take a look at the SignatureUpdateProcessorFactory...

 http://wiki.apache.org/solr/Deduplication

 :   <copyField source="source" dest="uniqueId"/>
 :   <copyField source="id" dest="uniqueId"/>
 ...
 : So instead of just appending to the uniqueId field, it tried to do a
 : multiValued.  Does anyone have an idea on how I can make this work?

 copyField doesn't append, it copies Field (value) instances from the
 source field to the dest field -- so if you copy from multiple sources you
 get multiple values for the dest field.


 -Hoss




RE: Using Multiple Cores for Multiple Users

2010-11-09 Thread Jonathan Rochkind
If storing in a single index (possibly sharded if you need it), you can simply 
include a Solr field that specifies the user ID of the saved thing. On the 
client side, in your application, simply ensure that there is an fq parameter 
limiting to the current user, if you want to limit to the current user's stuff. 
Relevancy ranking should work just as if you had 'separate cores'; there is no 
relevancy issue. 
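
For instance (field name and value are illustrative), the per-user query is
just the normal query plus a filter:

http://localhost:8983/solr/select?q=medical&fq=userId:42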

It IS true that when your index gets very large, commits will start taking 
longer, which can be a problem. I don't mean commits will take longer just 
because there is more stuff to commit -- the larger the index, the longer an 
update to a single document will take to commit. 

In general, i suspect that having dozens or hundreds (or thousands!) of cores 
is not going to scale well, it is not going to make good use of your cpu/ram/hd 
resources.   Not really the intended use case of multiple cores. 

However, you are probably going to run into some issues with the single index 
approach too. In general, how to deal with multi-tenancy in Solr is an 
oft-asked question for which there doesn't seem to be any "just works and does 
everything for you without needing to think about it" solution in Solr. 
Judging from past threads. I am not a Solr developer or expert. 


From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Tuesday, November 09, 2010 6:57 PM
To: solr-user@lucene.apache.org
Cc: Adam Estrada
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

 All,

 I have a web application that requires the user to register and then login
 to gain access to the site. Pretty standard stuff...Now I would like to
 know what the best approach would be to implement a customized search
 experience for each user. Would this mean creating a separate core per
 user? I think that this is not possible without restarting Solr after each
 core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration.
Sometimes you must reindex after a change, the same is true for reloading
cores. Check the wiki on this one [1].


 My use case is this...User A would like to index 5 RSS feeds and User B
 would like to index 5 completely different RSS feeds and he is not
 interested at all in what User A is interested in. This means that they
 would have to be separate index cores, right?

If you view documents within an rss feed as a separate documents, you can
assign a user ID to those documents, creating a multi-user index with rss
documents per user, or group or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up
additional memory and disk space, doesn't share caches etc.  There is also
more maintenance, and you need some support scripts to dynamically create new
cores - Solr currently doesn't create a new core directory structure.

But, reindexing a very large index takes up a lot more time and resources and
relevancy might be an issue depending on the rss feeds' contents.


 What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a
single server with your specifications. Unless the demands are too specific.


 Thanks in advance,
 Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
Thanks a lot for all the tips, guys! I think that we may explore both
options just to see what happens. I'm sure that scalability will be a huge
mess with the core-per-user scenario. I like the idea of creating a user ID
field and agree that it's probably the best approach. We'll see...I will be
sure to let the list know what I find! Please don't stop posting your
comments everyone ;-) My inquiring mind wants to know...

Adam

On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'separate
 cores'; there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, i suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question that there doesn't seem to be any just works and does
 everything for you without needing to think about it solution for in solr.
 Judging from past thread. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then
 login
  to gain access to the site. Pretty standard stuff...Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after
 each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their configuration.
 Sometimes you must reindex after a change, the same is true for reloading
 cores. Check the wiki on this one [1].

 
  My use case is this...User A would like to index 5 RSS feeds and User B
  would like to index 5 completely different RSS feeds and he is not
  interested at all in what User A is interested in. This means that they
  would have to be separate index cores, right?

 If you view documents within an rss feed as a separate documents, you can
  assign a user ID to those documents, creating a multi-user index with rss
 documents per user, or group or whatever.

 Having a core per user isn't a good idea if you have many users.  It takes
 up
 additional memory and disk space, doesn't share caches etc.  There is also
  more maintenance, and you need some support scripts to dynamically create
 new
 cores - Solr currently doesn't create a new core directory structure.

 But, reindexing a very large index takes up a lot more time and resources
 and
 relevancy might be an issue depending on the rss feeds' contents.

 
  What is the best approach for this kind of thing?

 I'd usually store the feeds in a single index and shard if it's too many
 for a
 single server with your specifications. Unless the demands are too
 specific.

 
  Thanks in advance,
  Adam

 [1]: http://wiki.apache.org/solr/CoreAdmin

 Cheers



Highlighter - multiple instances of term being combined

2010-11-09 Thread Sasank Mudunuri
I'm finding that if a keyword appears in a field multiple times very close
together, it will get highlighted as a phrase even though there are other
terms between the two instances. So this search:

http://localhost:8983/solr/select/?
hl=true
&hl.snippets=1
&q=residue
&hl.fragsize=0
&mergeContiguous=false
&indent=on
&hl.usePhraseHighlighter=false
&debugQuery=on
&hl.fragmenter=gap
&hl.highlightMultiTerm=false

Highlights as:
What does low-<em>residue mean? Like low-residue</em> diet?

Trying to get it to highlight as:
What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
I've tried playing with various combinations of mergeContiguous,
highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
output.

For reference, field type uses a StandardTokenizerFactory and
SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
SnowballFilterFactory. I've confirmed that the intermediate words don't
appear in either the synonym or the stop words list. I can post the full
definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank


Re: solr init.d script

2010-11-09 Thread Lance Norskog
As many solrs as you want can open an index for read-only queries. If
you have a shared disk with a global file system, this could work very
well.

A note: Solr sessions are stateless. There is no reason to run Solr on
JBoss in fail-over mode with session replication.

On Tue, Nov 9, 2010 at 12:25 PM, Nikola Garafolic
nikola.garafo...@srce.hr wrote:
 On 11/09/2010 07:00 PM, Israel Ekpo wrote:

 Yes.

 I recommend running Solr via a servlet container.

 It is much easier to manage compared to running it by itself.

 On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
 nikola.garafo...@srce.hrwrote:

 But in my case, that would make things more complex as I see it. Two JBoss
 servers with Solr in a servlet container, and then I need the same data dir,
 right? I am now running a single Solr instance as a cluster service, with the
 data dir set to a shared LUN, so it can be started on either of the two hosts.

 Can you explain the benefits of two Solr instances in a servlet container --
 maybe more performance?

 Regards,
 Nikola

 --
 Nikola Garafolic
 SRCE, Sveucilisni racunski centar
 tel: +385 1 6165 804
 email: nikola.garafo...@srce.hr






-- 
Lance Norskog
goks...@gmail.com


Re: returning message to sender

2010-11-09 Thread Lance Norskog
David Smiley and Eric Pugh wrote a wonderful book on Solr:

http://www.lucidimagination.com/blog/2010/01/11/book-review-solr-packt-book/

Reading through this book and trying the examples will address all of
your questions.

On Tue, Nov 9, 2010 at 3:23 PM, Erick Erickson erickerick...@gmail.com wrote:
 Hmmm, this is a little murky.
 I'm inferring that you believe that DIH somehow
 queries the data source at *query* time, and this
 is not true. DIH is an *index-time* concept.

 DIH is used to add data to an index. Once that index is
 created, all searches against it are unaware that there
 were different data sources.

 So, with a single Solr schema, you can use DIH
 on as many different data sources as you want,
 mapping the various bits of information from each
 data source into your Solr schema. Searches go
 against fields defined in the schema, so you're
 automatically searching against all the databases
 (assuming you've mapped your data into your
 schema)

 If I've misunderstood, perhaps you can add some
 details?

 Best
 Erick
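
 As a hedged sketch of that mapping (all names and JDBC URLs below are
 invented for illustration), one DIH data-config.xml can declare several
 data sources and map each entity into the same schema fields:

cat > data-config.xml <<'EOF'
<dataConfig>
  <!-- two independent JDBC sources feeding one index -->
  <dataSource name="srcA" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://hostA/dbA" user="reader" password="secret"/>
  <dataSource name="srcB" driver="org.postgresql.Driver"
              url="jdbc:postgresql://hostB/dbB" user="reader" password="secret"/>
  <document>
    <!-- ids must be unique across sources, or documents will overwrite
         each other (see the unique-key threads on this list) -->
    <entity name="a" dataSource="srcA" query="SELECT id, title FROM items">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
    </entity>
    <entity name="b" dataSource="srcB"
            query="SELECT id, name AS title FROM products"/>
  </document>
</dataConfig>
EOF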

 On Tue, Nov 9, 2010 at 1:39 PM, Teki, Prasad 
 prasad_t...@standardandpoors.com wrote:


 Hi guys,
 I have been exploring Solr for the last few weeks. Our main intention is
 to expose the data, as web services, across various data sources by
 linking them using some scenario.

 I have couple of questions.
 Is there any good document/URL, which answers...

 How the indexing happens/built for the queries across different data
 sources
 (DIH)?

 Does Lucene store the actual data of each individual query, or a
 combination? If yes, where?

 Whenever we do a query against the built index, when exactly does it fire
 the query to the database?

 How does the index get updates from the DIH? For example, what if my
 query includes 3 DIH sources?
 What is the max number of data sources I can include to get better
 performance?

 How do we measure the scalability?

 Can I run these search engines in a grid mode?

 Thanks.





-- 
Lance Norskog
goks...@gmail.com


Re: dynamically create unique key

2010-11-09 Thread Lance Norskog
Here is an exhausting and exhaustive discursion about picking a unique key:

http://wiki.apache.org/solr/UniqueKey




On Tue, Nov 9, 2010 at 4:20 PM, Dennis Gearon gear...@sbcglobal.net wrote:
 Seems to me it would be a good idea to put into the Solr code a unique ID per
 instance or installation (or both), accessible either from Java or via a
 query. Kind of like all the browsers do for their SSL connections.

 Then, it's automatically easy to implement what is described below.

 Maybe it should be written to the config file upon first run when it does not
 exist, and then any updates or reinstalls would reuse the same
 installation/instance ID.



 From: Christopher Gross cogr...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 9, 2010 11:37:03 AM
 Subject: Re: dynamically create unique key

 Thanks Hoss, I'll look into that!

 -- Chris


 On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter 
 hossman_luc...@fucit.orgwrote:


 : one large index.  I need to create a unique key for the Solr index that
 will
 : be unique per document.  If I have 3 systems, and they all have a
 document
 : with id=1, then I need to create a uniqueId field in my schema that
 : contains both the system name and that id, along the lines of: sysa1,
 : sysb1, and sysc1.  That way, each document will have a unique id.

 take a look at the SignatureUpdateProcessorFactory...

 http://wiki.apache.org/solr/Deduplication
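
 For reference, a minimal sketch of such a chain in solrconfig.xml (the field
 names mirror the example above; note the generated signature is a hash of the
 listed fields, not a literal concatenation like sysa1):

cat > dedupe-chain.xml <<'EOF'
<!-- paste inside the <config> element of solrconfig.xml -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- write the computed signature into the uniqueId field -->
    <str name="signatureField">uniqueId</str>
    <bool name="overwriteDupes">false</bool>
    <!-- the fields combined into the signature -->
    <str name="fields">source,id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
EOF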

 :   <copyField source="source" dest="uniqueId"/>
 :   <copyField source="id" dest="uniqueId"/>
         ...
 : So instead of just appending to the uniqueId field, it tried to do a
 : multiValued.  Does anyone have an idea on how I can make this work?

 copyField doesn't append, it copies Field (value) instances from the
 source field to the dest field -- so you get multiple values for
 the dest field.


 -Hoss






-- 
Lance Norskog
goks...@gmail.com


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but someone else is a
zoophile, 'monkeys' will be a common word in the index. This will skew
the relevance computation for user A.

You could have a separate text field for each user. This might work
better- but you can't use field norms (they take up space for all
documents).

Lance
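
A hedged sketch of that separate-field-per-user idea, using a dynamic field
so the schema doesn't have to list every user (the naming pattern is
invented for illustration):

cat > schema-fragment.xml <<'EOF'
<!-- paste inside the <fields> element of schema.xml: one text field per
     user, e.g. text_user_42, so term statistics stay per-user; omitNorms
     matches the field-norms caveat above -->
<dynamicField name="text_user_*" type="text" indexed="true" stored="false"
              omitNorms="true"/>
EOF

A query for one user's feeds would then target that field, e.g.
q=text_user_42:monkeys.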

On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'seperate
 cores', there is no relevancy issue.
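
 As a sketch (assuming a user_id field in the schema), the per-user
 restriction is just an fq on every request:

 curl 'http://localhost:8983/solr/select?q=monkeys&fq=user_id:42'

 A side benefit: fq results are cached in the filterCache, so the per-user
 filter stays cheap across repeated requests.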

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, I suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question for which there doesn't seem to be any "just works and
 does everything for you without needing to think about it" solution in Solr,
 judging from past threads. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then
 login
  to gain access to the site. Pretty standard stuff...Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after
 each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their configuration.
 Sometimes you must reindex after a change, the same is true for reloading
 cores. Check the wiki on this one [1].

 
  My use case is this...User A would like to index 5 RSS feeds and User B
  would like to index 5 completely different RSS feeds and he is not
  interested at all in what User A is interested in. This means that they
  would have to be separate index cores, right?

 If you view the documents within an RSS feed as separate documents, you can
 assign a user ID to those documents, creating a multi-user index with RSS
 documents per user, or group, or whatever.

 Having a core per user isn't a good idea if you have many users. It takes up
 additional memory and disk space, doesn't share caches, etc. There is also
 more maintenance, and you need some support scripts to dynamically create new
 cores - Solr currently doesn't create a new core directory structure.

 But, reindexing a very large index takes up a lot more time and resources
 and
 relevancy might be an issue depending on the rss feeds' contents.

 
  What is the best approach for this kind of thing?

 I'd usually store the feeds in a single index and shard if it's too many
 for a
 single server with your specifications. Unless the demands are too
 specific.

 
  Thanks in advance,
  Adam

 [1]: http://wiki.apache.org/solr/CoreAdmin

 Cheers





-- 
Lance Norskog
goks...@gmail.com


Re: Highlighter - multiple instances of term being combined

2010-11-09 Thread Lance Norskog
Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link
off the main solr admin page. It will show you how text is broken up
for both the indexing and query processes. You might get some insight
about how these words are torn apart and assigned positions. Trying
the different Analyzers and options might get you there.
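
For example (parameter names as of Solr 1.4 -- double-check them against the
form on the page, and 'body' here is just a stand-in for your field name):

curl 'http://localhost:8983/solr/admin/analysis.jsp?name=body&val=What+does+low-residue+mean&qval=residue&verbose=on&highlight=on'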

But to be frank- highlighting is a tough problem and has always had a
lot of edge cases.

On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri sas...@gmail.com wrote:
 I'm finding that if a keyword appears in a field multiple times very close
 together, it will get highlighted as a phrase even though there are other
 terms between the two instances. So this search:

 http://localhost:8983/solr/select/?

 hl=true
 hl.snippets=1
 q=residue
 hl.fragsize=0
 mergeContiguous=false
 indent=on
 hl.usePhraseHighlighter=false
 debugQuery=on
 hl.fragmenter=gap
 hl.highlightMultiTerm=false

 Highlights as:
 What does low-<em>residue mean? Like low-residue</em> diet?

 Trying to get it to highlight as:
 What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
 I've tried playing with various combinations of mergeContiguous,
 highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
 output.

 For reference, field type uses a StandardTokenizerFactory and
 SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
 SnowballFilterFactory. I've confirmed that the intermediate words don't
 appear in either the synonym or the stop words list. I can post the full
 definition if helpful.

 Any pointers as to how to debug this would be greatly appreciated!
 sasank




-- 
Lance Norskog
goks...@gmail.com


scheduling imports and heartbeats

2010-11-09 Thread Tri Nguyen
Hi,
 
Can I configure solr to schedule imports at a specified time (say once a day, 
once an hour, etc)?
 
Also, does solr have some sort of heartbeat mechanism?
 
Thanks,
 
Tri

Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
hm, relevance is before filtering, probably during indexing?
 Dennis Gearon 


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others' mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die. 



- Original Message 
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 7:07:45 PM
Subject: Re: Using Multiple Cores for Multiple Users

There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but someone else is a
zoophile, 'monkeys' will be a common word in the index. This will skew
the relevance computation for user A.

You could have a separate text field for each user. This might work
better- but you can't use field norms (they take up space for all
documents).

Lance

On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'seperate
 cores', there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, I suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question for which there doesn't seem to be any "just works and
 does everything for you without needing to think about it" solution in Solr,
 judging from past threads. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then
 login
  to gain access to the site. Pretty standard stuff...Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after
 each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their configuration.
 Sometimes you must reindex after a change, the same is true for reloading
 cores. Check the wiki on this one [1].

 
  My use case is this...User A would like to index 5 RSS feeds and User B
  would like to index 5 completely different RSS feeds and he is not
  interested at all in what User A is interested in. This means that they
  would have to be separate index cores, right?

 If you view the documents within an RSS feed as separate documents, you can
 assign a user ID to those documents, creating a multi-user index with RSS
 documents per user, or group, or whatever.

 Having a core per user isn't a good idea if you have many users. It takes up
 additional memory and disk space, doesn't share caches, etc. There is also
 more maintenance, and you need some support scripts to dynamically create new
 cores - Solr currently doesn't create a new core directory structure.

 But, reindexing a very large index takes up a lot more time and resources
 and
 relevancy might be an issue depending on the rss feeds' contents.

 
  What is the best approach for this kind of thing?

 I'd usually store the feeds in a single index and shard if it's too many
 for a
 single server with your specifications. Unless 

Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
Relevance is TF-IDF: TF is the number of times the term appears in the
document, and DF is the number of documents in the index that contain the
term.

There is no quick calculation of term frequencies restricted to just these
documents. Facets do this, and they're very, very slow.
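
For reference, a simplified form of Lucene's default scoring
(DefaultSimilarity, ignoring norms, boosts, and coord):

\[
\mathrm{score}(t,d) \propto \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2},
\qquad \mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},
\qquad \mathrm{idf}(t)=1+\ln\frac{N}{\mathrm{df}(t)+1}
\]

where N is the number of documents in the whole index and df(t) the number
of documents containing t -- which is why one user's common term dilutes
idf for everyone sharing the index.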

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon gear...@sbcglobal.net wrote:
 hm, relevance is before filtering, probably during indexing?
  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a 
 better
 idea to learn from others' mistakes, so you do not have to make them yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 9, 2010 7:07:45 PM
 Subject: Re: Using Multiple Cores for Multiple Users

 There is a standard problem with this: relevance is determined from
 all of the words in a field of all documents, not just the documents
 that match the query. That is, when user A searches for 'monkeys' and
 one of his feeds has a document with this word, but someone else is a
 zoophile, 'monkeys' will be a common word in the index. This will skew
 the relevance computation for user A.

 You could have a separate text field for each user. This might work
 better- but you can't use field norms (they take up space for all
 documents).

 Lance

 On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'seperate
 cores', there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, I suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question for which there doesn't seem to be any "just works and
 does everything for you without needing to think about it" solution in Solr,
 judging from past threads. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then
 login
  to gain access to the site. Pretty standard stuff...Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after
 each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their configuration.
 Sometimes you must reindex after a change, the same is true for reloading
 cores. Check the wiki on this one [1].

 
  My use case is this...User A would like to index 5 RSS feeds and User B
  would like to index 5 completely different RSS feeds and he is not
  interested at all in what User A is interested in. This means that they
  would have to be separate index cores, right?

 If you view the documents within an RSS feed as separate documents, you can
 assign a user ID to those documents, creating a multi-user index with RSS
 documents per user, or group, or whatever.

 Having a core per user isn't a good idea if you have many users. It takes up
 additional memory and disk space, doesn't share caches, etc. There is also
 more maintenance, and you need some support scripts to dynamically create new
 cores - Solr currently doesn't create a new core 

Re: scheduling imports and heartbeats

2010-11-09 Thread Ranveer Kumar
You should use cron for that; a sketch follows below the quoted message.

On 10 Nov 2010 08:47, Tri Nguyen tringuye...@yahoo.com wrote:

Hi,

Can I configure solr to schedule imports at a specified time (say once a
day,
once an hour, etc)?

Also, does solr have some sort of heartbeat mechanism?

Thanks,

Tri
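
A minimal sketch of the cron approach (crontab -e), assuming the
DataImportHandler is mounted at /solr/dataimport and using the stock ping
handler as a heartbeat; the mail address is a placeholder:

# full-import every night at 03:00
0 3 * * * curl -s 'http://localhost:8983/solr/dataimport?command=full-import' > /dev/null
# heartbeat: ping every 5 minutes, alert if the response is not OK
*/5 * * * * curl -s 'http://localhost:8983/solr/admin/ping' | grep -q '"status">OK' || echo 'Solr ping failed' | mail -s 'solr heartbeat' admin@example.com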


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
So, if my other filter/selection criteria select some subset of the whole index 
that goes, say, from 50% relevance to 60% relevance, the set still gets ordered 
by relevance, and each item in the returned set is still ranked by its 
relevance relative to the set, right? That would only be a problem if some 
minimum relevance were desired, right?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others' mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:00:09 PM
Subject: Re: Using Multiple Cores for Multiple Users

Relevance is TF-IDF: TF is the number of times the term appears in the
document, and DF is the number of documents in the index that contain the
term.

There is no quick calculation of term frequencies restricted to just these
documents. Facets do this, and they're very, very slow.

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon gear...@sbcglobal.net wrote:
 hm, relevance is before filtering, probably during indexing?
  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a 
better
 idea to learn from others' mistakes, so you do not have to make them yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 9, 2010 7:07:45 PM
 Subject: Re: Using Multiple Cores for Multiple Users

 There is a standard problem with this: relevance is determined from
 all of the words in a field of all documents, not just the documents
 that match the query. That is, when user A searches for 'monkeys' and
 one of his feeds has a document with this word, but someone else is a
 zoophile, 'monkeys' will be a common word in the index. This will skew
 the relevance computation for user A.

 You could have a separate text field for each user. This might work
 better- but you can't use field norms (they take up space for all
 documents).

 Lance

 On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'seperate
 cores', there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, I suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question for which there doesn't seem to be any "just works and
 does everything for you without needing to think about it" solution in Solr,
 judging from past threads. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then
 login
  to gain access to the site. Pretty standard stuff...Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after
 each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their