URLDecoder error message

2013-02-12 Thread o.mares
Hey, 

yesterday we updated from solr 4.0 to solr 4.1, and since then from time to
time the following error pops up:

{msg=URLDecoder: Invalid character encoding detected after position 160 of
query string / form data (while parsing as UTF-8),code=400}:
{msg=URLDecoder: Invalid character encoding detected after position 160 of
query string / form data (while parsing as UTF-8),code=400}

Is this an issue with incorrect input data, or is something broken with
our solr?
Solr 4.0 did not give us these errors.

Regards
Ota





compare two shards.

2013-02-12 Thread stockii
hello.

i want to compare two shards with each other, because these shards should have
the same index. but this isn't so =(
so i want to find the documents that are missing from one of my two
shards.

my ideas:
- a distributed shard request across my nodes, firing a facet search on my
unique field (see the sketch below). but the result of the facet component isn't reversible =( 

- grouping. but i think it's not working correctly: no groups with the same
uniquekey show up in the result set.
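One way to try the facet idea is to query each shard separately rather than
distributed (hosts and the uniqueKey field name 'id' are illustrative;
distrib=false keeps each query on a single shard):

http://shard1:8983/solr/select?q=*:*&rows=0&distrib=false&facet=true&facet.field=id&facet.limit=-1&facet.mincount=1
http://shard2:8983/solr/select?q=*:*&rows=0&distrib=false&facet=true&facet.field=id&facet.limit=-1&facet.mincount=1

Diffing the two returned facet value lists then shows which IDs exist on only
one shard.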


does anyone have some better ideas?





Indexed And Stored

2013-02-12 Thread anurag.jain
hello, 

in my schema

<field name="city_name" type="text_general" indexed="false" stored="true"/>

and i updated 18 documents. 


now i need indexed="true" for all the old data. 

i need a solution. 

please someone help me out. 

please reply, urgent!!
thanks  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexed-And-Stored-tp4039893.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexed And Stored

2013-02-12 Thread Gora Mohanty
On 12 February 2013 15:49, anurag.jain anurag.k...@gmail.com wrote:
 hello,

 in my schema

 <field name="city_name" type="text_general" indexed="false" stored="true"/>

 and i updated 18 documents.


 now i need indexed="true" for all old data.

 i need a solution
[...]

You have no choice but to change the schema, and
reindex, either from the original source, or by first
pulling from the existing stored values.

Regards,
Gora
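A rough SolrJ 4.x sketch of that second option - reindexing from the stored
values (assumes every field you need is stored; the URL and page size are
illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class ReindexFromStored {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        int page = 1000;
        for (int start = 0; ; start += page) {
            SolrDocumentList results =
                solr.query(new SolrQuery("*:*").setStart(start).setRows(page)).getResults();
            if (results.isEmpty()) break;
            for (SolrDocument doc : results) {
                SolrInputDocument in = new SolrInputDocument();
                for (String f : doc.getFieldNames()) {
                    if ("_version_".equals(f)) continue;  // let Solr assign a fresh version
                    in.addField(f, doc.getFieldValue(f));  // copy each stored value
                }
                solr.add(in);  // re-added docs only become visible after the final
            }                 // commit, so the paging above stays stable
        }
        solr.commit();
    }
}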


Re: replication problems with solr4.1

2013-02-12 Thread Bernd Fehling

Now this is strange: the index generation and index version
are changing with replication.

e.g. master has index generation 118, index version 136059533234
and  slave  has index generation 118, index version 136059533234
- both are the same.

Now add one doc to master with commit.
master has index generation 119 index version 1360595446556

Next replicate master to slave. The result is:
master has index generation 119 index version 1360595446556
slave  has index generation 120 index version 1360595564333

I have not seen this before.
I thought replication just takes over the index from master to slave,
more like a sync?




Am 11.02.2013 09:29, schrieb Bernd Fehling:
 Hi list,
 
 after upgrading from solr4.0 to solr4.1 and running it for two weeks now
 it turns out that replication has problems and unpredictable results.
 My installation is a single index, 41 million docs / 115 GB index size / 1 master / 
 3 slaves.
 - the master builds a new index from scratch once a week
 - a replication is started manually with Solr admin GUI
 
 What I see is one of these cases:
 - after a replication a new searcher is opened on index.xxx directory and
   the old data/index/ directory is never deleted and besides the file
   replication.properties there is also a file index.properties
 OR
 - the replication takes place and everything looks fine, but when opening the 
 admin GUI
   the statistics report:
 Last Modified: a day ago
 Num Docs: 42262349
 Max Doc:  42262349
 Deleted Docs:  0
 Version:  45174
 Segment Count: 1
 
          Version        Gen  Size
 Master:  1360483635404  112  116.5 GB
 Slave:   1360483806741  113  116.5 GB
 
 
 In the first case, why is the replication doing that???
 It is an offline slave, no search activity, just there for backup!
 
 
 In the second case, why are the version and generation different right after
 a full replication?
 
 
 Any thoughts on this?
 
 
 - Bernd
 

-- 
*
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)               LibTec - Library Technology
Universitätsstr. 25              and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Indexed And Stored

2013-02-12 Thread Rafał Kuć
Hello!

The simplest way will be to update your schema.xml file, make the change
that needs to be made, and fully re-index your data. Solr won't be able
to automatically change a non-indexed field into an indexed one.

You could also use the partial document update API of Solr if you
don't have your original data, however there are a few limitations. In
order to fully reconstruct your documents you would have to have all
the fields stored in your index. 
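For reference, a partial (atomic) update request in Solr 4.x JSON syntax looks
roughly like this (the id and value are purely illustrative; "set" replaces the
field's current value):

[
  {"id": "2131", "city_name": {"set": "new value"}}
]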

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 hello, 

 in my schema

 <field name="city_name" type="text_general" indexed="false" stored="true"/>

 and i updated 18 documents. 


 now i need indexed="true" for all old data. 

 i need a solution  

 please someone help me out. 

 please reply urgent!!
 thanks  






Re: Maximum Number of Records In Index

2013-02-12 Thread Macroman
Our document IDs are most definitely distinct, and there are partial updates
to existing records. I have run SQL queries outside of SOLR to validate
records going in, and only about 1% are updates to existing records. There
are no deletes underway; every day new records are added or updated.

Example for today: before the Data Handler ran, there were 13,586,537 records
in SOLR, all with distinct IDs. 45,345 records were extracted from 7 different
sources to go into the SOLR index; of these, 1,912 were updates to existing
records. Thus 43,433 were new records, each with a new ID. I made sure IDs
were always distinct. Yet our index now says 13,589,646, indicating that only
3,109 new records went into the index and 40,324 records are missing.

I use a Date Facet Range and can see that there is an increase for January and
February this year. In conclusion, I have to say that it must be removing
earlier records somehow, despite not knowing where this may be controlled/set,
if at all. If there is a possible configuration to remove or weed out records,
where is this configured? Our SOLR is virtually out of the box, with only
SOLRCONFIG and SCHEMA amended to suit the needs of our business for fields and
field types indexed. We have also configured Velocity macros to display
results. So, none the wiser, and thank you to all who have responded so far.





Tag facet.query excludes are broken when group.facet=true - SOLR 4.1 Bug?

2013-02-12 Thread Mark Beeby
I'm trying to use facets alongside grouping; however, when I ask SOLR to compute 
grouped facet counts (group.facet=true, see 
http://wiki.apache.org/solr/FieldCollapsing) it no longer honours facet.query 
excludes. Without grouping the facets (group.facet=false) the excludes work again 
without any problems. Have I misunderstood the purpose of group.facet, or is 
there, as it appears to me, a bug in SOLR 4.1?

Here is the simplest version of a query that shows the problem I'm having:

http://localhost:8080/wmp/product/select
?q=title_search:history
&rows=0
&fq={!tag=format}formatLegend:Paperback
&group=true
&group.field=titleCode
&group.limit=9
&group.facet=true
&facet=true
&facet.query={!key=Paperback ex=format}formatLegend:Paperback
&facet.query={!key=Hardback ex=format}formatLegend:Hardback

This produces:
<lst name="facet_queries">
  <int name="Paperback">1492</int>
  <int name="Hardback">0</int>
</lst>

However, just switching to group.facet=false, the following is produced, showing 
that the exclude had been ignored previously:

<lst name="facet_queries">
  <int name="Paperback">1492</int>
  <int name="Hardback">1361</int>
</lst>
Anybody else tried using this combination of excludes, facet queries and 
grouping and got this working?

Kind Regards,
Mark

Re: Maximum Number of Records In Index

2013-02-12 Thread Joel Bernstein
A couple of things to check:

1) Have you retained your solr logs? If so, take a look in them for
indexing errors.
2) What is the difference between maxDocs and numDocs? This will give an
indication of whether a large number of records are being deleted or updated.
3) Can you explain your partial updates? Are you sending the entire
document again for the partial update?


Try to debug your next load. Perform the load, and watch the logs for
errors. Write a program that loops through each doc and checks to see if
the doc is present in the index.
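A minimal SolrJ sketch of that last check (the URL, the uniqueKey name 'id',
and the ID list are illustrative; in practice the IDs would come from your 7
source systems):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;

public class CheckIds {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        String[] idsJustLoaded = {"doc-1", "doc-2", "doc-3"};
        for (String id : idsJustLoaded) {
            // escape the ID in case it contains query syntax characters
            SolrQuery q = new SolrQuery("id:" + ClientUtils.escapeQueryChars(id));
            q.setRows(0);  // we only need the hit count
            if (solr.query(q).getResults().getNumFound() == 0) {
                System.out.println("MISSING: " + id);
            }
        }
    }
}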





On Tue, Feb 12, 2013 at 6:19 AM, Macroman peter0...@hotmail.com wrote:

 Our document IDs are most definitely distinct, and there are partial updates
 to existing records. I have run SQL queries outside of SOLR to validate
 records going in, and only about 1% are updates to existing records.
 [...]







-- 
Joel Bernstein
Professional Services LucidWorks


LoadBalancing while adding documents

2013-02-12 Thread J Mohamed Zahoor
Hi

I have multi shard replicated index spread across two machines. 

Once a week, I delete the entire index and create it from scratch.
Today I am using ConcurrentUpdateSolrServer in solrj to add documents to the 
index.

I want to add documents through both of the servers, to utilise the resources.
I read in the wiki (I think) that LBHttpSolrServer should not be used for indexing 
documents.

Is there any other way to send requests to both servers without using an 
external load balancer?
I am using Solr 4.1.

./zahoor
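One possible workaround, sketched in SolrJ (URLs, queue size and thread count
are illustrative; plain round-robin with no failover handling):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoundRobinIndexer {
    private final ConcurrentUpdateSolrServer[] servers;
    private int next = 0;

    public RoundRobinIndexer(String urlA, String urlB) {
        // queueSize=50, threadCount=4 per server - tune for your hardware
        servers = new ConcurrentUpdateSolrServer[] {
                new ConcurrentUpdateSolrServer(urlA, 50, 4),
                new ConcurrentUpdateSolrServer(urlB, 50, 4)};
    }

    public void add(SolrInputDocument doc) throws Exception {
        // alternate between the two servers
        servers[Math.abs(next++) % servers.length].add(doc);
    }

    public void finish() throws Exception {
        for (ConcurrentUpdateSolrServer s : servers) {
            s.blockUntilFinished();  // drain the internal queue
            s.commit();
        }
    }
}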

Re: Indexed And Stored

2013-02-12 Thread anurag.jain
Actually the problem is that i updated the data first, and then i had to add new
fields, so i made another json file:

[
  {"id": "2131", "newfield": {"add": "2121"}},
  {"id": "21", "newfield": {"add": "21"}}
]

now i have two different files. so if i try to update from the previous file
after setting indexed="true", it erases the new field







solrcloud-zookeeper

2013-02-12 Thread adm1n
Hi all,
the first question:
is there a way to reduce the timeout when a solr shard comes up? it looks in the
log file as follows:

Feb 12, 2013 1:19:08 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=178992
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=178489
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=177986


And another one - let's assume I have 2 shards and one of them is down (both
master and slave) for some reason.
What is happening now is that the cluster returns 503 on the search request. Is
there a way to configure it to get responses from the other shard?


thanks.





Re: memory leak - multiple cores

2013-02-12 Thread Michael Della Bitta
Marcos,

You could consider using the CoreAdminHandler instead:

http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler

It works extremely well.
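For example, a new core can be loaded without restarting the container with a
CoreAdmin request along these lines (host, core name and instanceDir are
illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=newcore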

Otherwise, you should periodically restart Tomcat. I'm not sure how
much memory would be leaked, but it's likely not going to have much of
an impact for a few iterations.


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,

 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes in 
 production, but mostly testing. Which means it will be a real pain. Any way 
 to fix this?

 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.

 Regards,
 Marcos

 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:

 Marcos,

 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.

 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,

 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?

 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477

 Regards,
 Marcos



Re: memory leak - multiple cores

2013-02-12 Thread Michael Della Bitta
I should also say that there can easily be memory leaked from permgen
space when reloading webapps in Tomcat regardless of what resources
the app creates because class references from the context classloader
to the parent classloader can't be collected appropriately, so
restarting Tomcat periodically when you reload webapps is a good
practice either way.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Feb 12, 2013 at 9:03 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Marcos,

 You could consider using the CoreAdminHandler instead:

 http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler

 It works extremely well.

 Otherwise, you should periodically restart Tomcat. I'm not sure how
 much memory would be leaked, but it's likely not going to have much of
 an impact for a few iterations.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,

 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes in 
 production, but mostly testing. Which means it will be a real pain. Any way 
 to fix this?

 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.

 Regards,
 Marcos

 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:

 Marcos,

 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.

 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,

 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?

 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477

 Regards,
 Marcos



Re: memory leak - multiple cores

2013-02-12 Thread Marcos Mendez
Many thanks! I will try to use the CoreAdminHandler and see if that solves the 
issue!

On Feb 12, 2013, at 9:05 AM, Michael Della Bitta wrote:

 I should also say that there can easily be memory leaked from permgen
 space when reloading webapps in Tomcat regardless of what resources
 the app creates because class references from the context classloader
 to the parent classloader can't be collected appropriately, so
 restarting Tomcat periodically when you reload webapps is a good
 practice either way.
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Tue, Feb 12, 2013 at 9:03 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
 Marcos,
 
 You could consider using the CoreAdminHandler instead:
 
 http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
 
 It works extremely well.
 
 Otherwise, you should periodically restart Tomcat. I'm not sure how
 much memory would be leaked, but it's likely not going to have much of
 an impact for a few iterations.
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,
 
 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes 
 in production, but mostly testing. Which means it will be a real pain. Any 
 way to fix this?
 
 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.
 
 Regards,
 Marcos
 
 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:
 
 Marcos,
 
 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.
 
 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,
 
 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing 
 the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?
 
 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477
 
 Regards,
 Marcos
 



Re: solrcloud-zookeeper

2013-02-12 Thread Mark Miller
By default, on cluster startup, we wait until we see all the replicas for a 
shard come up. This is for safety. You may have introduced an old shard with 
old data or a new shard with no data, and you don't want something like that 
becoming the leader.

If you don't want this wait, it's configurable. In solr.xml, change the 
cores attribute leaderVoteWait to n milliseconds, or 0. It defaults to 180000 
(3 minutes).
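A sketch of where that attribute lives in a 4.x solr.xml (the other attributes
are illustrative; 0 disables the wait entirely):

<solr persistent="true">
  <cores adminPath="/admin/cores" leaderVoteWait="0">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>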

- Mark

On Feb 12, 2013, at 8:31 AM, adm1n evgeni.evg...@gmail.com wrote:

 Hi all,
 the first question:
 is there a way to reduce the timeout when a solr shard comes up? it looks in the
 log file as follows:
 
 Feb 12, 2013 1:19:08 PM org.apache.solr.cloud.ShardLeaderElectionContext
 waitForReplicasToComeUp
 INFO: Waiting until we see more replicas up: total=2 found=1
 timeoutin=178992
 Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
 waitForReplicasToComeUp
 INFO: Waiting until we see more replicas up: total=2 found=1
 timeoutin=178489
 Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
 waitForReplicasToComeUp
 INFO: Waiting until we see more replicas up: total=2 found=1
 timeoutin=177986
 
 
 And another one - let's assume I have 2 shards and one of them is down (both
 master and slave) for some reason.
 What is happening now is that the cluster returns 503 on the search request. Is
 there a way to configure it to get responses from the other shard?
 
 
 thanks.
 
 
 



Re: Possible issue in edismax?

2013-02-12 Thread Sandeep Mestry
Hi Felipe, just a short note to say thanks for your valuable suggestion. I
have implemented it and can see the expected results. The length norm still
spoils it for a few fields, but I balanced that with the boost factors
accordingly.

Once again, Many Thanks!
Sandeep


On 1 February 2013 22:53, Sandeep Mestry sanmes...@gmail.com wrote:

 Brilliant! Thanks very much for your response.
 On 1 Feb 2013 20:37, Felipe Lahti fla...@thoughtworks.com wrote:

 It's not necessary. It's only query time.


 On Fri, Feb 1, 2013 at 5:00 PM, Sandeep Mestry sanmes...@gmail.com
 wrote:

  Hi..
 
  Could you tell me if changing default similarity to custom
 implementation
  will require me to rebuild the index? Or will it be used only query
 time?
 
  thanks,
  Sandeep
   On 31 Jan 2013 13:55, Felipe Lahti fla...@thoughtworks.com wrote:
 
   So, it depends on your business requirement, right? If a document has
   matches in more searchable fields then, at least for me, that document is
   more important than another document that has fewer matches.
  
    Example:
    Put this in your schema:

    <similarity class="com.your.namespace.NoIDFSimilarity"/>

    And create a class in the classpath of your Solr:

    package com.your.namespace;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class NoIDFSimilarity extends DefaultSimilarity {

        @Override
        public float idf(long docFreq, long numDocs) {
            return 1;
        }
    }

    It will neutralize the idf (which is the rarity of a term).
  
  
  
  
  
  
   On Thu, Jan 31, 2013 at 5:31 AM, Sandeep Mestry sanmes...@gmail.com
   wrote:
  
Thanks Felipe..
Can you point me an example please?
   
Also forgive me but if a document has matches in more searchable
 fields
then should it not rank higher?
   
Thanks,
Sandeep
On 30 Jan 2013 19:30, Felipe Lahti fla...@thoughtworks.com
 wrote:
   
  If you compare the first and last document scores you will see that the
  last one matches more fields than the first one. So you may be thinking:
  why? The first doc only matches the contributions field and the last
  matches a bunch of fields, so if you want it to behave more like
  (<str name="qf">series_title^500 title^100 description^15 contribution</str>)
  you have to override the method of DefaultSimilarity.


 On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
 sanmes...@gmail.com
  
 wrote:

   I have pasted it below; it is a slight variant of the dismax
   configuration I mentioned above, as I was playing with all sorts of
   boost values. However, it looks more like below:
 
<str name="c208c2ca-4270-27b8-e040-a8c00409063a">
2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times others of:
  2675.7844 = (MATCH) weight(contributions:news in 63298) [DefaultSimilarity], result of:
    2675.7844 = score(doc=63298,freq=1.0 = termFreq=1.0), product of:
      0.004495774 = queryWeight, product of:
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        3.093982E-4 = queryNorm
      595177.7 = fieldWeight in 63298, product of:
        1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        40960.0 = fieldNorm(doc=63298)
</str>
<str name="c208c2a9-66bc-27b8-e040-a8c00409063a">
2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others of:
  2317.297 = (MATCH) weight(contributions:news in 9826415) [DefaultSimilarity], result of:
    2317.297 = score(doc=9826415,freq=3.0 = termFreq=3.0), product of:
      0.004495774 = queryWeight, product of:
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        3.093982E-4 = queryNorm
      515439.0 = fieldWeight in 9826415, product of:
        1.7320508 = tf(freq=3.0), with freq of: 3.0 = termFreq=3.0
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        20480.0 = fieldNorm(doc=9826415)
</str>
<str name="c208c2aa-1806-27b8-e040-a8c00409063a">
2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times others of:
  2140.6274 = (MATCH) weight(contributions:news in 9882325) [DefaultSimilarity], result of:
    2140.6274 = score(doc=9882325,freq=1.0 = termFreq=1.0), product of:
      0.004495774 = queryWeight, product of:
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        3.093982E-4 = queryNorm
      476142.16 = fieldWeight in 9882325, product of:
        1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
        14.530705 = idf(docFreq=14, maxDocs=11282414)
        32768.0 = fieldNorm(doc=9882325)
</str>
<str name="c208c2b0-5165-27b8-e040-a8c00409063a">
1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others of:
  1605.4707 = (MATCH) weight(contributions:news in 220007) [DefaultSimilarity], result of:
    1605.4707 = score(doc=220007,freq=1.0 = termFreq=1.0), product of:
      0.004495774 = 

Re: More Like This, finding original record

2013-02-12 Thread Daniel Rijkhof
Well, I have found the following line in
MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(..):


// exclude current document from results
realMLTQuery.add(
    new TermQuery(new Term(uniqueKeyField.getName(),
        uniqueKeyField.getType().storedToIndexed(
            doc.getFieldable(uniqueKeyField.getName())))),
    BooleanClause.Occur.MUST_NOT);

I'll try to remove the line some way, and see if my results work for me.

It is at least clear that this line is not surrounded by any if statement
and will always be executed, so 'NO' is the answer to "is there a way to
get the current document in the search results?".


Have Fun
Daniel



On Tue, Feb 12, 2013 at 3:25 PM, Daniel Rijkhof daniel.rijk...@gmail.comwrote:

 I guess it's not possible, but perhaps someone knows how to do this:

 Do a more like this query (through the mlt handler),
 And find the match record within the response records (top match, should
 be first in list).

 This would then make it possible for me to compare scores...

 Anybody around that did this? (Modify source code perhaps?)

 Have Fun
 Daniel



DisMax Query & Field-Filters (ASCIIFolding)

2013-02-12 Thread Ralf Heyde
Hello,

I have an interesting behaviour.

I have a FieldType Text_PL. This type is configured as:

<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"
            protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="words/stopwords_pl.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"
            protected="words/protwords_pl.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

So, one filter in the chain is the ASCIIFoldingFilterFactory, which normalizes 
special characters (e.g. ó -> o).
If I query field:czolenka it shows the same behaviour as searching for 
field:czółenka - as expected.

Now, if I use the DisMax query, this normalization step does not take place. I 
debugged the code: if I run the normal query, the debugger stops at the 
ASCIIFoldingFilter (as expected); if I run the DisMax query, there is no stop 
at this filter - so the filter is not used.

Does anybody have an idea why? 
Do I have to configure the DisMax RequestHandler for ASCIIFolding - if that is 
possible?

Thanks,

Ralf


Re: SolrCloud and hardcoded 'id' field

2013-02-12 Thread Michael Della Bitta
Apparently this was a side effect of the custom sharding feature.
There is a fix planned, but I don't know more about it than that.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Feb 11, 2013 at 7:15 PM, Shawn Heisey s...@elyograg.org wrote:
 I have heard that SolrCloud may require the presence of a uniqueKey field
 specifically named 'id' for sharding.

 Is this true?  Is it still true as of Solr 4.2-SNAPSHOT?  If not, what svn
 commit fixed it?  If so, should I file a jira?  I am not actually using
 SolrCloud for one index, but my worry is that once a precedent for putting
 specific names in the code is set, it may bleed over into other features.
 Also, I have another set of servers for a different purpose that ARE using
 SolrCloud.  Currently that system uses numShards=1, but one day we might
 want to do a distributed search there.

 Both my systems have a uniqueKey field other than 'id' and it would be quite
 a task to change it.  The 'id' field doesn't exist at all in either system.
 Here's relevant info for one of the systems:

 <field name="tag_id" type="lowercase" indexed="true" stored="true"
        omitTermFreqAndPositions="true"/>

 <!-- lowercases the entire field value -->
 <fieldType name="lowercase" class="solr.TextField"
            sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.ICUFoldingFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
   </analyzer>
 </fieldType>

 <uniqueKey>tag_id</uniqueKey>

 Thanks,
 Shawn


Benefits of Solr over Lucene?

2013-02-12 Thread JohnRodey
I know that Solr web-enables a Lucene index, but I'm trying to figure out
what other things Solr offers over Lucene.  On the Solr features list it
says "Solr uses the Lucene search library and extends it!", but what exactly
are the extensions from the list, and what did Lucene already give you?  Also,
if I have an index built through Solr, is there a non-HTTP way to search that
index?  Because SolrJ essentially just makes HTTP requests, correct?

Some features I'm particularly interested in are:
Geospatial Search
Highlighting
Dynamic Fields
Near Real-Time Indexing
Multiple Search Indices

Thanks!





Re: DisMax Query & Field-Filters (ASCIIFolding)

2013-02-12 Thread Ahmet Arslan

Hi Ralf,

The dismax query parser does not allow fielded queries, e.g. field:something.

Consider using the edismax query parser instead.

Also, debugQuery=on will display informative output showing how the query is 
parsed, analyzed, etc.

ahmet

--- On Tue, 2/12/13, Ralf Heyde ralf.he...@gmx.de wrote:

 From: Ralf Heyde ralf.he...@gmx.de
 Subject: DisMax Query  Field-Filters (ASCIIFolding)
 To: solr-user@lucene.apache.org
 Date: Tuesday, February 12, 2013, 5:25 PM
 Hello,
 
 I have an interesting behaviour.
 
 I have a FieldType Text_PL. This type is configured as:

 [fieldType configuration snipped - quoted in full above]

 So, one filter in the chain is the ASCIIFoldingFilterFactory,
 which normalizes special characters (e.g. ó -> o).
 If I query field:czolenka it shows the same behaviour as
 searching for field:czółenka - as expected.

 Now, if I use the DisMax query, this normalization step does
 not take place. I debugged the code: if I run the normal
 query, the debugger stops at the ASCIIFoldingFilter (as
 expected); if I run the DisMax query, there is no stop at
 this filter - so the filter is not used.

 Does anybody have an idea why? 
 Do I have to configure the DisMax RequestHandler for
 ASCIIFolding - if that is possible?

 Thanks,

 Ralf



Re: Benefits of Solr over Lucene?

2013-02-12 Thread Travis Low
http://lucene.apache.org/solr/

On Tue, Feb 12, 2013 at 10:40 AM, JohnRodey timothydd...@yahoo.com wrote:

 I know that Solr web-enables a Lucene index, but I'm trying to figure out
 what other things Solr offers over Lucene.  On the Solr features list it
 says Solr uses the Lucene search library and extends it!, but what
 exactly
 are the extensions from the list and what did Lucene give you?  Also if I
 have an index built through Solr is there a non-HTTP way to search that
 index?  Because solr4j essentially just makes HTTP requests correct?

 Some features Im particularly interested in are:
 Geospatial Search
 Highlighting
 Dynamic Fields
 Near Real-Time Indexing
 Multiple Search Indices

 Thanks!







-- 

Travis Low, Director of Development
t...@4centurion.com
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com



Solr 3.3.0 - Random CPU problem

2013-02-12 Thread federico.wachs
Hi all,

I'm using Solr 3.3.0 with one master server and two slaves. The problem
I'm having is that both slaves get degraded randomly, but at the same time.
I am completely lost as to what the cause could be, but I see that the
tomcat that runs the Solr webapp executes a PERL script that consumes 100% of
the CPU, and when I go and kill it manually solr starts working perfectly
again. 

Does anybody have any idea of what the problem could be?

This is killing performance on my production environment and I've got no
idea of what's going on :S

Any help will be greatly appreciated.
Regards,
Federico





Re: DisMax Query & Field-Filters (ASCIIFolding)

2013-02-12 Thread Ralf Heyde
Hi, 

thanks for your first answer.

I don't want a fielded query in my DisMax query.

My DisMax query looks like this:

qt=dismax&q=czółenka... -- works
qt=dismax&q=czolenka... -- does not work

The accessed fields contain the ASCIIFoldingFilter for both query & index.

So, what I need is that the DisMax query parser normalizes via ASCIIFolding. 
Is that possible?

Thanks,

Ralf

 Original Message 
 Date: Tue, 12 Feb 2013 07:42:17 -0800 (PST)
 From: Ahmet Arslan iori...@yahoo.com
 To: solr-user@lucene.apache.org
 Subject: Re: DisMax Query & Field-Filters (ASCIIFolding)

 
 Hi Ralf,
 
 The dismax query parser does not allow fielded queries, e.g. field:something.
 
 Consider using edismax query parser instead.
 
 Also, debugQuery=on will display informative output showing how the query is
 parsed, analyzed, etc.
 
 ahmet
 
 --- On Tue, 2/12/13, Ralf Heyde ralf.he...@gmx.de wrote:
 
  From: Ralf Heyde ralf.he...@gmx.de
  Subject: DisMax Query  Field-Filters (ASCIIFolding)
  To: solr-user@lucene.apache.org
  Date: Tuesday, February 12, 2013, 5:25 PM
  Hello,
  
  I have an interesting behaviour.
  
  I have a FieldType Text_PL. This type is configured as:

  [fieldType configuration snipped - quoted in full above]

  So, one filter in the chain is the ASCIIFoldingFilterFactory,
  which normalizes special characters (e.g. ó -> o).
  If I query field:czolenka it shows the same behaviour as
  searching for field:czółenka - as expected.

  Now, if I use the DisMax query, this normalization step does
  not take place. I debugged the code: if I run the normal
  query, the debugger stops at the ASCIIFoldingFilter (as
  expected); if I run the DisMax query, there is no stop at
  this filter - so the filter is not used.

  Does anybody have an idea why? 
  Do I have to configure the DisMax RequestHandler for
  ASCIIFolding - if that is possible?

  Thanks,

  Ralf
  


Re: DisMax Query & Field-Filters (ASCIIFolding)

2013-02-12 Thread Jack Krupansky
1. Show us the full query request and request handler. In particular, the 
qf parameter.

2. Try the Solr Admin Analysis UI to check for sure how the analysis is 
being performed.

3. Add debugQuery=true to your query to see how it is actually parsed.

4. If there is any chance that you have modified your field type since 
originally indexing the data, be sure to completely reindex after ANY change 
to the field types.
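For step 3, the request would look something along these lines (host, port and 
handler path are illustrative):

http://localhost:8080/solr/select?qt=dismax&q=czolenka&debugQuery=true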


-- Jack Krupansky

-Original Message- 
From: Ralf Heyde

Sent: Tuesday, February 12, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject: Re: DisMax Query  Field-Filters (ASCIIFolding)

Hi,

thanks for your first answer.

I don't want a fielded query in my DisMax query.

My DisMax query looks like this:

qt=dismax&q=czółenka... -- works
qt=dismax&q=czolenka... -- does not work

The accessed fields contain the ASCIIFoldingFilter for both query & index.

So, what I need is that the DisMax query parser normalizes via 
ASCIIFolding. Is that possible?


Thanks,

Ralf

 Original Message 

Date: Tue, 12 Feb 2013 07:42:17 -0800 (PST)
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Subject: Re: DisMax Query & Field-Filters (ASCIIFolding)




Hi Ralf,

The dismax query parser does not allow fielded queries, e.g. field:something.

Consider using the edismax query parser instead.

Also, debugQuery=on will display informative output showing how the query is
parsed, analyzed, etc.

ahmet

--- On Tue, 2/12/13, Ralf Heyde ralf.he...@gmx.de wrote:

 From: Ralf Heyde ralf.he...@gmx.de
 Subject: DisMax Query  Field-Filters (ASCIIFolding)
 To: solr-user@lucene.apache.org
 Date: Tuesday, February 12, 2013, 5:25 PM
 Hello,

 I have an interesting behaviour.

 I have a FieldType Text_PL. This type is configured as:

 [fieldType configuration snipped - quoted in full above]

 So, one filter in the chain is the ASCIIFoldingFilterFactory,
 which normalizes special characters (e.g. ó -> o).
 If I query field:czolenka it shows the same behaviour as
 searching for field:czółenka - as expected.

 Now, if I use the DisMax query, this normalization step does
 not take place. I debugged the code: if I run the normal
 query, the debugger stops at the ASCIIFoldingFilter (as
 expected); if I run the DisMax query, there is no stop at
 this filter - so the filter is not used.

 Does anybody have an idea why?
 Do I have to configure the DisMax RequestHandler for
 ASCIIFolding - if that is possible?

 Thanks,

 Ralf
 




Re: DisMax Query & Field-Filters (ASCIIFolding)

2013-02-12 Thread Ralf Heyde
I'll try to reindex - I modified the schema but did NOT re-index the index.

Damn!


 Original Message 
 Date: Tue, 12 Feb 2013 11:14:04 -0500
 From: Jack Krupansky j...@basetechnology.com
 To: solr-user@lucene.apache.org
 Subject: Re: DisMax Query & Field-Filters (ASCIIFolding)

 1. Show us the full query request and request handler. In particular, the 
 qf parameter.
 
 2. Try the Solr Admin Analysis UI to check for sure how the analysis is 
 being performed.
 
 3. Add debugQuery=true to your query to see how it is actually parsed.
 
 4. If there is any chance that you have modified your field type since 
 originally indexing the data, be sure to completely reindex after ANY
 change 
 in the field types.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Ralf Heyde
 Sent: Tuesday, February 12, 2013 10:53 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DisMax Query  Field-Filters (ASCIIFolding)
 
 Hi,
 
 thanks for your first answer.
 
 I don't want a fielded query in my DisMax query.
 
 My DisMax query looks like this:
 
 qt=dismax&q=czółenka... -- works
 qt=dismax&q=czolenka... -- does not work
 
 The accessed fields contain the ASCIIFoldingFilter for both query & index.
 
 So, what I need is that the DisMax query parser normalizes via 
 ASCIIFolding. Is that possible?
 
 Thanks,
 
 Ralf
 
  Original Message 
  Date: Tue, 12 Feb 2013 07:42:17 -0800 (PST)
  From: Ahmet Arslan iori...@yahoo.com
  To: solr-user@lucene.apache.org
  Subject: Re: DisMax Query & Field-Filters (ASCIIFolding)
 
 
  Hi Ralf,
 
  The dismax query parser does not allow fielded queries, e.g. field:something.
 
  Consider using the edismax query parser instead.
 
  Also, debugQuery=on will display informative output showing how the query is
  parsed, analyzed, etc.
 
  ahmet
 
  --- On Tue, 2/12/13, Ralf Heyde ralf.he...@gmx.de wrote:
 
   From: Ralf Heyde ralf.he...@gmx.de
   Subject: DisMax Query  Field-Filters (ASCIIFolding)
   To: solr-user@lucene.apache.org
   Date: Tuesday, February 12, 2013, 5:25 PM
   Hello,
  
   I have an interesting behaviour.
  
   I have a FieldType Text_PL. This type is configured as:

   [fieldType configuration snipped - quoted in full above]

   So, one filter in the chain is the ASCIIFoldingFilterFactory,
   which normalizes special characters (e.g. ó -> o).
   If I query field:czolenka it shows the same behaviour as
   searching for field:czółenka - as expected.

   Now, if I use the DisMax query, this normalization step does
   not take place. I debugged the code: if I run the normal
   query, the debugger stops at the ASCIIFoldingFilter (as
   expected); if I run the DisMax query, there is no stop at
   this filter - so the filter is not used.

   Does anybody have an idea why?
   Do I have to configure the DisMax RequestHandler for
   ASCIIFolding - if that is possible?

   Thanks,

   Ralf
   
 


Re: Benefits of Solr over Lucene?

2013-02-12 Thread Walter Underwood
This is apples and pomegranates. Lucene is a library, Solr is a server. In 
features, they are more alike than different.

wunder

On Feb 12, 2013, at 7:40 AM, JohnRodey wrote:

 I know that Solr web-enables a Lucene index, but I'm trying to figure out
 what other things Solr offers over Lucene.  On the Solr features list it
 says Solr uses the Lucene search library and extends it!, but what exactly
 are the extensions from the list and what did Lucene give you?  Also if I
 have an index built through Solr is there a non-HTTP way to search that
 index?  Because solr4j essentially just makes HTTP requests correct?
 
 Some features Im particularly interested in are:
 Geospatial Search
 Highlighting
 Dynamic Fields
 Near Real-Time Indexing
 Multiple Search Indices 
 
 Thanks!





Re: Benefits of Solr over Lucene?

2013-02-12 Thread Jack Krupansky
Here's yet another short list of benefits of Solr over Lucene (not that any 
of them take away from Lucene, since Solr is based on Lucene):

- Multiple-core index - go beyond the limits of a single Lucene index
- Support for multi-core or named collections
- Richer query parsers (e.g., schema-aware, edismax)
- Schema language, including configurable field types and configurable 
analyzers
- Easier to do per-field/type analysis
- Plugin architecture, easily configured and customized
- Generally, develop a search engine without writing any code, and what code 
you may write is mostly easily configured plugins
- Editable configuration file rather than hard-coded or app-specific 
properties
- Tomcat/Jetty container support enables system administration as corporate 
IT ops teams already know it
- Web-based Admin UI, including debugging features such as field/type 
analysis
- Solr search features are available to any app written in any language, not 
just Java. All you need is HTTP access. (Granted, there is SOME support for 
Lucene in SOME other languages.)


In short, if you want to embed search engine capabilities in your Java app, 
Lucene is the way to go, but if you want a web architecture, with the 
search engine in a separate process from the app in a multi-tier 
architecture, Solr is the way to go. Granted, you could also use 
ElasticSearch or roll your own, but Solr basically runs right out of the 
box with no code development needed to get started and no Java knowledge 
needed.


And to be clear, Solr is not simply an extension of Lucene - Solr is a 
distinct architectural component that is based on Lucene. In OOP terms, 
think of composition rather than derivation.


-- Jack Krupansky

-Original Message- 
From: JohnRodey

Sent: Tuesday, February 12, 2013 10:40 AM
To: solr-user@lucene.apache.org
Subject: Benefits of Solr over Lucene?

I know that Solr web-enables a Lucene index, but I'm trying to figure out
what other things Solr offers over Lucene.  On the Solr features list it
says Solr uses the Lucene search library and extends it!, but what exactly
are the extensions from the list and what did Lucene give you?  Also if I
have an index built through Solr is there a non-HTTP way to search that
index?  Because solr4j essentially just makes HTTP requests correct?

Some features Im particularly interested in are:
Geospatial Search
Highlighting
Dynamic Fields
Near Real-Time Indexing
Multiple Search Indices

Thanks!






Re: Benefits of Solr over Lucene?

2013-02-12 Thread Amit Jha
To add to Jack's reply: Solr can also be embedded into the application and run in the 
same process. Solr is the server-ization of Lucene. The line is very blurred, and 
Solr is not a very thin wrapper around the Lucene library. 

Many Solr features are distinct from Lucene, for example:

- detailed breakdown of the scoring mathematics
- text analysis phases
- Solr adds to Lucene's text analysis library and makes it configurable through 
XML
- introduces the notion of field types
- runtime performance stats, including cache hit/miss rates


Rgds
AJ

On 12-Feb-2013, at 22:17, Jack Krupansky j...@basetechnology.com wrote:

 Here's yet another short list of benefits of Solr over Lucene (not that any 
 of them take away from Lucene since Solr is based on Lucene):
 
 - Multiple core index - go beyond the limits of a single lucene index
 - Support for multi-core or named collections
 - richer query parsers (e.g., schema-aware, edismax)
 - schema language, including configurable field types and configurable 
 analyzers
 - easier to do per-field/type analysis
 - plugin architecture, easily configured and customized
 - Generally, develop a search engine without writing any code, and what code 
 you may write is mostly easily configured plugins
 - Editable configuration file rather than hard-coded or app-specific 
 properties
 - Tomcat/Jetty container support enable system administration as corporate IT 
 ops teams already know it
 - Web-based Admin UI, including debugging features such as field/type analysis
 - Solr search features are available to any app written in any language, not 
 just Java. All you need is HTTP access. (Granted, there is SOME support for 
 Lucene in SOME other languages.)
 
 In short, if you want to embed search engine capabilities in your Java app, 
 Lucene is the way to go, but if you want a web architecture, with the 
 search engine in a separate process from the app in a multi-tier 
 architecture, Solr is the way to go. Granted, you could also use 
 ElasticSearch or roll your own, but Solr basically runs right out of the 
 box with no code development needed to get started and no Java knowledge 
 needed.
 
 And to be clear, Solr is not simply an extension of Lucene - Solr is a 
 distinct architectural component that is based on Lucene. In OOP terms, think 
 of composition rather than derivation.
 
 -- Jack Krupansky
 
 -Original Message- From: JohnRodey
 Sent: Tuesday, February 12, 2013 10:40 AM
 To: solr-user@lucene.apache.org
 Subject: Benefits of Solr over Lucene?
 
 I know that Solr web-enables a Lucene index, but I'm trying to figure out
 what other things Solr offers over Lucene.  On the Solr features list it
 says "Solr uses the Lucene search library and extends it!", but what exactly
 are the extensions from the list and what did Lucene give you?  Also if I
 have an index built through Solr is there a non-HTTP way to search that
 index?  Because SolrJ essentially just makes HTTP requests, correct?
 
 Some features I'm particularly interested in are:
 Geospatial Search
 Highlighting
 Dynamic Fields
 Near Real-Time Indexing
 Multiple Search Indices
 
 Thanks!
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964.html
 Sent from the Solr - User mailing list archive at Nabble.com. 


Re: Any inputs regarding running solr cluster on virtual machines?

2013-02-12 Thread Shawn Heisey

On 2/12/2013 12:25 AM, adfel70 wrote:

I'm currently running a solr cluster on 10 physical machines.
I'm considering moving to virtual machines.
Any insights on this issue?
Have anyone tried this? any best practices?


You'll definitely see some performance degradation.  How much is very 
hard to say.


I started with Solr on virtual machines, first vmware esxi (free 
version) and then Xen, one core/shard per VM.  I later moved to the bare 
metal (same machines) and began running multiple cores/shards per Solr 
instance, one instance per machine.  Performance is better and I don't 
have to maintain as many copies of the OS.


It did work perfectly when virtualized, though.

Thanks,
Shawn



Re: URLDecoder error message

2013-02-12 Thread Shawn Heisey

On 2/12/2013 1:42 AM, o.mares wrote:

yesterday we updated from solr 4.0 to solr 4.1 and since then from time to
time following error pops up:

{msg=URLDecoder: Invalid character encoding detected after position 160 of
query string / form data (while parsing as UTF-8),code=400}:
{msg=URLDecoder: Invalid character encoding detected after position 160 of
query string / form data (while parsing as UTF-8),code=400}

Is this an issue with incorrect input data, or is something broken with
our solr?
Solr 4.0 did not give us these errors.


Is your client code using UTF-8? It sounds like maybe it's not.

Thanks,
Shawn
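
One easy thing to check on the client side is that query strings are
percent-encoded as UTF-8 before they are sent to Solr. A minimal Java sketch
of the idea (the URL and input string are placeholders, not taken from the
original report):

import java.net.URLEncoder;

public class Utf8QueryCheck {
    public static void main(String[] args) throws Exception {
        String userInput = "müller";  // example input with a non-ASCII character

        // Encode explicitly as UTF-8. Relying on the platform default
        // charset (e.g. ISO-8859-1) is a common way to trigger Solr's
        // "Invalid character encoding detected" error.
        String encoded = URLEncoder.encode(userInput, "UTF-8");

        System.out.println("http://localhost:8983/solr/select?q=" + encoded);
    }
}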



Re: Benefits of Solr over Lucene?

2013-02-12 Thread JohnRodey
So I have had a fair amount of experience using Solr.  However on a separate
project we are considering just using Lucene directly, which I have never
done.  I am trying to avoid finding out late that Lucene doesn't offer what
we need and being like "aw snap, it doesn't support geospatial" (or
highlighting, or dynamic fields, or etc...).  I am more curious about core
index and search features, and not as much with sharding, cloud features,
different client languages and so on.

Any thoughts?

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964p4040009.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.3.0 - Random CPU problem

2013-02-12 Thread Chris Hostetter

: I'm using Solr 3.3.0 with one master server and two slaves. And the problem
: I'm having is that both slaves get degraded randomly but at the same time.
: I am completely lost at to what the cause could be, but I see that the
: tomcat that runs Solr webapp executes a PERL script that consumes 100% of
: the CPU and when I go and kill it manually solr starts working perfectly
: again. 
: 
: Does anybody has any idea of what the problem could be?

what is the name of the perl script? what do the contents of that perl 
script look like? how do you know it is being run by tomcat?

Solr doesn't ship with any perl scripts, and to the best of my knowledge 
neither does tomcat ... so it sounds like your problem isn't 
specifically about Solr, but perhaps about something related to your 
Solr/Tomcat configuration?

-Hoss


Re: Solr 3.3.0 - Random CPU problem

2013-02-12 Thread federico.wachs
I don't know what the perl script looks like. I can tell it's being run by
tomcat because when I do "top" the owner of the process says "tomcat" and
the CPU is at 100%.

I haven't done anything weird to my Solr installation, actually it's pretty
simple and it's the one that used to be on the solr website a year ago or
something like that.

Do you have any idea of how to see which PERL script is being executed or
what its contents are?

Thanks for your reply!

Regards,
Federico



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-0-Random-CPU-problem-tp4039969p4040019.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: which analyzer is used for facet.query?

2013-02-12 Thread Chris Hostetter

:  So it seems that facet.query is using the analyzer of type index.
:  Is it a bug or is there another analyzer type for the facet query?

That doesn't really make any sense ... 

i don't know much about setting up UIMA (or what/when it logs things) but 
facet.query uses the regular query parser framework, which uses the query 
analyzer for fields when building up queries.

You can see clear evidence of this by looking at the following query using 
the 4.1 example configs...

http://localhost:8983/solr/select?q=name:pixima&fl=name&debugQuery=true

The text_general field type used by the "name" field is configured to use 
synonyms at query time, but not index time, and you can see that it maps 
pixima to pixma ...

  <str name="querystring">name:pixima</str>
  <str name="parsedquery">name:pixma</str>

If you have the sample documents indexed, you can see a single match for 
the query above, and if you use pixima in a facet.query, you can see the 
expected count... 

http://localhost:8983/solr/select?q=*:*&facet.query=name:pixima&facet=true&rows=0

<result name="response" numFound="32" start="0"/>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="name:pixima">1</int>
  </lst>
  ...


-Hoss


Re: Solr 3.3.0 - Random CPU problem

2013-02-12 Thread Chris Hostetter

: I don't know what the perl script looks like. I can tell it's being run by
: tomcat because when I do "top" the owner of the process says "tomcat" and
: the CPU is at 100%.
...
: Do you have any idea of how to see which PERL script is being executed or
: what its contents are?

look at the PID column in top, assuming it's something like 1234 then in 
another shell run this command...

ps -wwwf -p 1234

...and that should tell you the details of the process including the path 
to the perl script.

(the args you need for ps may be different if you aren't running linux)

-Hoss


Re: Need to create SolrServer objects without checking server availability

2013-02-12 Thread Chris Hostetter

: The problem is at program startup -- when 'new HttpSolrServer(url)' is called,
: it goes and makes sure that the server is up and responsive.  If any of those
: 56 object creation calls fails, then my app won't even start.

What exactly is the exception you are getting?  i don't think anything in 
HttpSolrServer explicitly tries to test the server on creation.

: Someone on the IRC channel brought up the possibility of initializing the
: HttpClient myself instead of letting the Solr object do it.  If the health
: check is actually in HttpClient, this might work, if there's a way to
: initialize the HttpClient without a health check.  I've actually been
: wondering if it makes any sense to re-use a single HttpClient object across
: all 56 server objects.

It probably would make sense for you to create a single HttpClient object 
that you re-use in all of your HttpSolrServer instances -- but i'm not 
sure how that would help your problem, since the HttpClient constructed 
implicitly by HttpSolrServer(String) doesn't know anything about the 
baseUrl.

HttpClient literally can not test the health of the URL when it is 
created, because it doesn't know anything about any URLs until a request 
is executed.


-Hoss
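
As a follow-up to the reuse suggestion: with SolrJ 4.x, sharing one
HttpClient across all 56 instances might look roughly like this (a sketch
only; the URLs and pool sizes are invented):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SharedClientExample {
    public static void main(String[] args) {
        // Pool sizes are assumptions; size them for your 56 cores.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
        HttpClient httpClient = HttpClientUtil.createClient(params);

        // One pooled HttpClient backing many HttpSolrServer instances.
        // Nothing touches the network until a request is executed.
        HttpSolrServer coreA =
            new HttpSolrServer("http://localhost:8983/solr/coreA", httpClient);
        HttpSolrServer coreB =
            new HttpSolrServer("http://localhost:8983/solr/coreB", httpClient);
    }
}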


Re: Benefits of Solr over Lucene?

2013-02-12 Thread Shawn Heisey

On 2/12/2013 11:19 AM, JohnRodey wrote:

So I have had a fair amount of experience using Solr.  However on a separate
project we are considering just using Lucene directly, which I have never
done.  I am trying to avoid finding out late that Lucene doesn't offer what
we need and being like "aw snap, it doesn't support geospatial" (or
highlighting, or dynamic fields, or etc...).  I am more curious about core
index and search features, and not as much with sharding, cloud features,
different client languages and so on.


Because Solr is written using the Lucene API, if you want to use Lucene, 
you can do anything Solr can, plus plenty of things that Solr can't -- 
but for many of those, you'd have to write the code yourself.  That's 
the key difference -- with Solr, a HUGE amount of coding is already done 
for you, you just have to put a few easy-to-debug client API calls in 
your code.


From my perspective as a user with some Java coding ability but not any 
true experience with large-scale development:  If your development team 
is ready and capable of writing Lucene code, then it would be better to 
use Solr instead, and if there's something you need that Solr can't do, 
put your development team to work writing the required plugin.  They 
would likely spend far less time doing that than writing an entire 
search system using Lucene.


Thanks,
Shawn



Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-12 Thread Joel Bernstein
Michael is correct, that was what was said at the bootcamp (by me). I
believe this may not be correct though.

Further code review shows that Solr 4.0 was already distributing documents
using the hash range technique used in 4.1. The big change in 4.1 was that
a composite hash key could be used to distribute docs around the hash
range. But docs that don't use the composite key would be distributed
similarly to 4.0.

So, you may not need to re-index to take advantage of shard splitting. This
will become more clear as shard splitting documentation becomes available.


On Mon, Feb 11, 2013 at 12:45 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Arkadi,

 That's the answer I received at Solr Bootcamp, yes.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Feb 11, 2013 at 2:23 AM, Arkadi Colson ark...@smartbit.be wrote:
  Does it mean that when you redo indexing after the upgrade to 4.1 shard
  splitting will work in 4.2?
 
  With kind regards
 
  Arkadi Colson
 
  Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
  T +32 11 64 08 80 • F +32 11 64 08 81
 
  On 02/10/2013 05:21 PM, Michael Della Bitta wrote:
 
  No. You can just update Solr in place. But...
 
  If you're using Solr Cloud, your documents won't be hashed in a way
  that lets you do shard splitting in 4.2. That seemed to be the
  consensus during Solr Boot Camp.
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Sun, Feb 10, 2013 at 10:46 AM, adfel70 adfe...@gmail.com wrote:
 
  Do I have to recreate the collections/cores?
  Do I have to reindex?
 
  thanks.
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 




-- 
Joel Bernstein
Professional Services LucidWorks


Re: Benefits of Solr over Lucene?

2013-02-12 Thread Glen Newton
Is there a page on the wiki that points out the use cases (or the
features) that are best suited for Lucene adoption, and those best
suited for SOLR adoption?

-Glen

On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
 On 2/12/2013 11:19 AM, JohnRodey wrote:

 So I have had a fair amount of experience using Solr.  However on a
 separate
 project we are considering just using Lucene directly, which I have never
 done.  I am trying to avoid finding out late that Lucene doesn't offer
 what
 we need and being like "aw snap, it doesn't support geospatial" (or
 highlighting, or dynamic fields, or etc...).  I am more curious about core
 index and search features, and not as much with sharding, cloud features,
 different client languages and so on.


 Because Solr is written using the Lucene API, if you want to use Lucene, you
 can do anything Solr can, plus plenty of things that Solr can't -- but for
 many of those, you'd have to write the code yourself.  That's the key
 difference -- with Solr, a HUGE amount of coding is already done for you,
 you just have to put a few easy-to-debug client API calls in your code.

 From my perspective as a user with some Java coding ability but not any true
 experience with large-scale development:  If your development team is ready
 and capable of writing Lucene code, then it would be better to use Solr
 instead, and if there's something you need that Solr can't do, put your
 development team to work writing the required plugin.  They would likely
 spend far less time doing that than writing an entire search system using
 Lucene.

 Thanks,
 Shawn




--
-
http://zzzoot.blogspot.com/
-


Re: Benefits of Solr over Lucene?

2013-02-12 Thread Walter Underwood
It is like deciding between a disk drive and a file server. Solr and Lucene are 
different kinds of things.

wunder

On Feb 12, 2013, at 12:26 PM, Glen Newton wrote:

 Is there a page on the wiki that points out the use cases (or the
 features) that are best suited for Lucene adoption, and those best
 suited for SOLR adoption?
 
 -Glen
 
 On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
 On 2/12/2013 11:19 AM, JohnRodey wrote:
 
 So I have had a fair amount of experience using Solr.  However on a
 separate
 project we are considering just using Lucene directly, which I have never
 done.  I am trying to avoid finding out late that Lucene doesn't offer
 what
 we need and being like "aw snap, it doesn't support geospatial" (or
 highlighting, or dynamic fields, or etc...).  I am more curious about core
 index and search features, and not as much with sharding, cloud features,
 different client languages and so on.
 
 
 Because Solr is written using the Lucene API, if you want to use Lucene, you
 can do anything Solr can, plus plenty of things that Solr can't -- but for
 many of those, you'd have to write the code yourself.  That's the key
 difference -- with Solr, a HUGE amount of coding is already done for you,
 you just have to put a few easy-to-debug client API calls in your code.
 
 From my perspective as a user with some Java coding ability but not any true
 experience with large-scale development:  If your development team is ready
 and capable of writing Lucene code, then it would be better to use Solr
 instead, and if there's something you need that Solr can't do, put your
 development team to work writing the required plugin.  They would likely
 spend far less time doing that than writing an entire search system using
 Lucene.
 
 Thanks,
 Shawn
 
 
 
 
 --
 -
 http://zzzoot.blogspot.com/
 -

--
Walter Underwood
wun...@wunderwood.org





Re: Benefits of Solr over Lucene?

2013-02-12 Thread Glen Newton
And helping people - who don't know much about them - decide
which to use is not useful?

-Glen

On Tue, Feb 12, 2013 at 3:34 PM, Walter Underwood wun...@wunderwood.org wrote:
 It is like deciding between a disk drive and a file server. Solr and Lucene 
 are different kinds of things.

 wunder

 On Feb 12, 2013, at 12:26 PM, Glen Newton wrote:

 Is there a page on the wiki that points out the use cases (or the
 features) that are best suited for Lucene adoption, and those best
 suited for SOLR adoption?

 -Glen

 On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
 On 2/12/2013 11:19 AM, JohnRodey wrote:

 So I have had a fair amount of experience using Solr.  However on a
 separate
 project we are considering just using Lucene directly, which I have never
 done.  I am trying to avoid finding out late that Lucene doesn't offer
 what
  we need and being like "aw snap, it doesn't support geospatial" (or
 highlighting, or dynamic fields, or etc...).  I am more curious about core
 index and search features, and not as much with sharding, cloud features,
 different client languages and so on.


 Because Solr is written using the Lucene API, if you want to use Lucene, you
 can do anything Solr can, plus plenty of things that Solr can't -- but for
 many of those, you'd have to write the code yourself.  That's the key
 difference -- with Solr, a HUGE amount of coding is already done for you,
 you just have to put a few easy-to-debug client API calls in your code.

 From my perspective as a user with some Java coding ability but not any true
 experience with large-scale development:  If your development team is ready
 and capable of writing Lucene code, then it would be better to use Solr
 instead, and if there's something you need that Solr can't do, put your
 development team to work writing the required plugin.  They would likely
 spend far less time doing that than writing an entire search system using
 Lucene.

 Thanks,
 Shawn




 --
 -
 http://zzzoot.blogspot.com/
 -

 --
 Walter Underwood
 wun...@wunderwood.org






--
-
http://zzzoot.blogspot.com/
-


Re: Need to create SolrServer objects without checking server availability

2013-02-12 Thread Shawn Heisey

On 2/12/2013 12:27 PM, Chris Hostetter wrote:


: The problem is at program startup -- when 'new HttpSolrServer(url)' is called,
: it goes and makes sure that the server is up and responsive.  If any of those
: 56 object creation calls fails, then my app won't even start.

What exactly is the exception you are getting?  i don't think anything in
HttpSolrServer explicitly tries to test the server on creation.

: Someone on the IRC channel brought up the possibility of initializing the
: HttpClient myself instead of letting the Solr object do it.  If the health
: check is actually in HttpClient, this might work, if there's a way to
: initialize the HttpClient without a health check.  I've actually been
: wondering if it makes any sense to re-use a single HttpClient object across
: all 56 server objects.

It probably would make sense for you to create a single HttpClient object
that you re-use in all of your HttpSolrServer instances -- but i'm not
sure how that would help your problem, since the HttpClient constructed
implicitly by HttpSolrServer(String) doesn't know anything about the
baseUrl.

HttpClient literally can not test the health of the URL when it is
created, because it doesn't know anything about any URLs until a request
is executed.


I gathered up the exception, looked it over very closely ... and it's 
all my fault!  User error all the way on this one.


In my code, as soon as I create the object, I make a call that gets the 
dataDir and instanceDir.  I initially wrote this code so long ago that I 
had forgotten this fact.


I have now changed it so this query is only made when the information is 
requested, not when the object initializes.  Aside: it turns out that 
nothing in my code actually USES those getter methods!  Now my program 
will start up even if Solr is down.  I'll look into re-using an http 
client on all objects.


Thanks for the prodding, Hoss!

Shawn
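
For anyone hitting the same trap, the fix Shawn describes is just lazy
initialization. A rough sketch (the wrapper class is hypothetical, and it
assumes the 4.x core status response exposes a dataDir entry):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class LazyCoreInfo {
    private final HttpSolrServer server;
    private final String coreName;
    private volatile Object dataDir;  // cached after the first lookup

    public LazyCoreInfo(String baseUrl, String coreName) {
        this.server = new HttpSolrServer(baseUrl);  // no request happens here
        this.coreName = coreName;
    }

    public Object getDataDir() throws SolrServerException, IOException {
        if (dataDir == null) {
            // The status request is only sent when someone actually asks.
            CoreAdminResponse status = CoreAdminRequest.getStatus(coreName, server);
            dataDir = status.getCoreStatus(coreName).get("dataDir");
        }
        return dataDir;
    }
}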



Re: Benefits of Solr over Lucene?

2013-02-12 Thread Upayavira
Do you want to embed an index into your application, e.g. as a desktop
app? Use Lucene. Is search basically the whole of your app? Perhaps use
Lucene. 

Do you want to offer search as a service? Do you want to be able to
arbitrarily scale your index (beyond the number of documents a single
index can handle, or beyond the load a single server can handle), do you
want to offer search services to a number of other servers? Then use
Solr.

Once you get that Lucene is a library you embed into your java app, and
Solr is a server that you connect to from other server(s), you will
hopefully be able to work out which is more appropriate.

If you consider using Lucene in the latter scenario, you will probably
end up rewriting a lot of what Solr does anyway.

Upayavira

On Tue, Feb 12, 2013, at 08:26 PM, Glen Newton wrote:
 Is there a page on the wiki that points out the use cases (or the
 features) that are best suited for Lucene adoption, and those best
 suited for SOLR adoption?
 
 -Glen
 
 On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
  On 2/12/2013 11:19 AM, JohnRodey wrote:
 
  So I have had a fair amount of experience using Solr.  However on a
  separate
  project we are considering just using Lucene directly, which I have never
  done.  I am trying to avoid finding out late that Lucene doesn't offer
  what
   we need and being like "aw snap, it doesn't support geospatial" (or
  highlighting, or dynamic fields, or etc...).  I am more curious about core
  index and search features, and not as much with sharding, cloud features,
  different client languages and so on.
 
 
  Because Solr is written using the Lucene API, if you want to use Lucene, you
  can do anything Solr can, plus plenty of things that Solr can't -- but for
  many of those, you'd have to write the code yourself.  That's the key
  difference -- with Solr, a HUGE amount of coding is already done for you,
  you just have to put a few easy-to-debug client API calls in your code.
 
  From my perspective as a user with some Java coding ability but not any true
  experience with large-scale development:  If your development team is ready
  and capable of writing Lucene code, then it would be better to use Solr
  instead, and if there's something you need that Solr can't do, put your
  development team to work writing the required plugin.  They would likely
  spend far less time doing that than writing an entire search system using
  Lucene.
 
  Thanks,
  Shawn
 
 
 
 
 --
 -
 http://zzzoot.blogspot.com/
 -


Re: Edismax and mm per field

2013-02-12 Thread Chris Hostetter

: Currently, edismax applies mm to the combination of all fields listed in qf.
: 
: I would like to have mm applied individually to those fields instead.

That doesn't really make sense if you think about how the qf is used to 
build the final query structure -- it is essentially producing a cross 
product of the fields in the qf and the "chunks" of input in the query 
string.  the mm param says how many clauses of the final resulting query 
(which are each queries for the same "chunk" across multiple fields) must 
match.  This blog i wrote a while back tries to explain this...

http://searchhub.org/2010/05/23/whats-a-dismax/

: For instance, the query:
: 1)
: defType=edismaxq=leo fostermm=2qf=title^5
: summary^2pf=title^5fq=contentsource:src1
: 
: would return a doc where
: title: leo lee
: summary:Joe foster

which is exactly what it's designed to do -- that way queries like "David 
Smiley Solr Enterprise Search Server" will match a document with "David 
Smiley" in the author field and "Apache Solr 3 Enterprise Search Server" 
in the title field.

: For the original query 1), having an additional parameter like:
: 
: - mm.qf=true (tell solr to do mm on individual fields in qf )
: or
: - mm.pf=true (tell solr to do mm on individual fields in pf)
: 
: or anything along the line would be useful.

...but you have to think about what you would want solr to do with params 
like that -- look at the query structure solr produces with multiple fields 
in the qf, and multiple terms in the query string. look at where the 
minNumberShouldMatch is set on the outer BooleanQuery and think about 
where/how you would like to see a per-field mm applied -- if you can 
explain that in pseudo-code, then it's certainly worth discussing, but i'm 
not understanding how it would make sense.


-Hoss
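
To make the structure concrete, the original query 1) parses to roughly the
following (reconstructed by hand from the qf/boosts above, with the pf clause
omitted, so treat the exact syntax as approximate):

+((title:leo^5.0 | summary:leo^2.0)
  (title:foster^5.0 | summary:foster^2.0))~2

The ~2 is the mm, applied to the outer BooleanQuery. Its clauses are per-term
disjunctions across fields, not per-field clauses, which is why a per-field mm
has no obvious slot in this structure.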


DIH Delete with Full Import

2013-02-12 Thread Kiran J
Hi,

I'm using this configuration:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

The wiki says: "In this case it means obviously that in case you also want
to use deletedPkQuery then running the delta-import command is still
necessary."

In this link: http://wiki.apache.org/solr/DataImportHandler



   - *postImportDeleteQuery*: after full-import this will be used to clean up
     the index! This is honored only on an entity that is an immediate
     sub-child of document. (Solr1.4: http://wiki.apache.org/solr/Solr1.4)

Is it possible for me to use full-import and postImportDeleteQuery? I have a
table that has the UUIDs of all the records that need to be deleted. Can I
define something like postImportDeleteQuery = "Select Id from
delete_log_table"? Can someone provide me an example?

Any help is much appreciated.

Thank you.
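
Note that postImportDeleteQuery takes a Solr query that is run against the
index after the import, not SQL, so it cannot select IDs from delete_log_table
directly; the SQL-based hook for that is deletedPkQuery, which only runs on
delta-import. A rough data-config.xml sketch under those assumptions (all
table, column, and field names are made up):

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"/>
  <document>
    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deletedPkQuery="SELECT id FROM delete_log_table"
            postImportDeleteQuery="is_deleted:true">
      <!-- deletedPkQuery above is SQL and fires on delta-import;
           postImportDeleteQuery is a *Solr* query that removes any
           indexed document matching is_deleted:true after a full-import -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>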


Re: Eastings and northings support in Solr Spatial

2013-02-12 Thread Smiley, David W.
Yeah, solr.PointType.  Or use solr.SpatialRecursivePrefixTree with
geo=false
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4


On 2/8/13 10:38 AM, Kissue Kissue kissue...@gmail.com wrote:

I can see Solr has the field type solr.LatLonType which supports spatial
based on longitudes and latitudes. Does it support spatial based on
Eastings and Northings and is the solr.PointType field type meant to be
used for this type of coordinates?

Thanks.



RE: what do you use for testing relevance?

2013-02-12 Thread Markus Jelsma
Roman,

Logging clicks and their position in the result list is one useful method to 
measure the relevance. Using the position you can calculate the mean reciprocal 
rank, a value near 1.0 is very good so over time you can clearly see whether 
changes actually improve user experience/expectations. Keep in mind that there 
is some noise because users tend to click one or more of the first few results 
anyway. 

You may also be interested in A/B testing.

http://en.wikipedia.org/wiki/Mean_reciprocal_rank
http://en.wikipedia.org/wiki/A/B_testing
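
For the record, MRR is just the average of 1/rank over queries, where rank is
the 1-based position of the first clicked (or first relevant) result. A
minimal Java sketch with invented sample data:

public class MeanReciprocalRank {
    // ranks[i] is the 1-based position of the first clicked result for
    // query i; 0 means the user clicked nothing for that query.
    static double mrr(int[] ranks) {
        double sum = 0.0;
        for (int rank : ranks) {
            if (rank > 0) {
                sum += 1.0 / rank;  // a query with no click contributes 0
            }
        }
        return sum / ranks.length;
    }

    public static void main(String[] args) {
        int[] sampleRanks = {1, 3, 1, 0, 2};    // invented click positions
        System.out.println(mrr(sampleRanks));   // prints 0.5666...
    }
}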

Cheers
Markus
 
 
-Original message-
 From:Roman Chyla roman.ch...@gmail.com
 Sent: Tue 12-Feb-2013 23:04
 To: solr-user@lucene.apache.org
 Subject: what do you use for testing relevance?
 
 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?
 
 It seems like people are writing their own tools to measure relevancy.
 
 Thanks for any pointers,
 
   roman
 


Re: what do you use for testing relevance?

2013-02-12 Thread Sebastian Saip
What do you want to achieve with these tests?

Is it meant as a regression, to make sure that only the queries/boosts you
changed are affected?
Then you will have to implement tests that cover your specific
schema/boosts. I'm not aware of any frameworks that do this - we're using
Java based tests that retrieve documents from solr,  map them to our domain
model (objects representing a document) and do assertions on debug values
(e.g. score)

Or is it more about what's more relevant for the user? Then you will need
some kind of user tracking, as Markus described already.

BR


On 12 February 2013 23:16, Markus Jelsma markus.jel...@openindex.io wrote:

 Roman,

 Logging clicks and their position in the result list is one useful method
 to measure the relevance. Using the position you can calculate the mean
 reciprocal rank, a value near 1.0 is very good so over time you can clearly
 see whether changes actually improve user experience/expectations. Keep in
 mind that there is some noise because users tend to click one or more of
 the first few results anyway.

 You may also be interested in A/B testing.

 http://en.wikipedia.org/wiki/Mean_reciprocal_rank
 http://en.wikipedia.org/wiki/A/B_testing

 Cheers
 Markus


 -Original message-
  From:Roman Chyla roman.ch...@gmail.com
  Sent: Tue 12-Feb-2013 23:04
  To: solr-user@lucene.apache.org
  Subject: what do you use for testing relevance?
 
  Hi,
  I do realize this is a very broad question, but still I need to ask it.
  Suppose you make a change into the scoring formula. How do you
  test/know/see what impact it had? Any framework out there?
 
  It seems like people are writing their own tools to measure relevancy.
 
  Thanks for any pointers,
 
roman
 



Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-02-12 Thread mbennett
The suggested syntax didn't work with embedded ZooKeeper:

Syntax:
-DzkRun -DzkHost=nodeA:9983,nodeB:9983,nodeC:9983,nodeD:9983/solrroot
-DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=MyConfig

Error:
SEVERE: Could not start Solr. Check solr/home property and the logs
Feb 12, 2013 1:36:49 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.NumberFormatException: For input string:
9983/solrroot
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

More details at:
https://issues.apache.org/jira/browse/SOLR-4450



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440p4040087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2013-02-12 Thread Upayavira
This config isn't intended for embedded zookeeper, it is for a separate
zookeeper ensemble that is shared with other services.

Upayavira
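
For an external ensemble, the chroot form would then look something like this
(host names and ports are placeholders, and the /solrroot node has to exist in
ZooKeeper first; treat this as a sketch, not tested syntax):

-DzkHost=zk1:2181,zk2:2181,zk3:2181/solrroot -DnumShards=2
-Dcollection.configName=MyConfig

i.e. without -DzkRun, since zkRun tries to parse the host:port list for the
embedded server and chokes on the chroot suffix, which is the
NumberFormatException above.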

On Tue, Feb 12, 2013, at 10:19 PM, mbennett wrote:
 The suggested syntax didn't work with embedded ZooKeeper:
 
 Syntax:
 -DzkRun -DzkHost=nodeA:9983,nodeB:9983,nodeC:9983,nodeD:9983/solrroot
 -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf
 -Dcollection.configName=MyConfig
 
 Error:
 SEVERE: Could not start Solr. Check solr/home property and the logs
 Feb 12, 2013 1:36:49 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.NumberFormatException: For input string:
 9983/solrroot
 at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 
 More details at:
 https://issues.apache.org/jira/browse/SOLR-4450
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440p4040087.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to limit queries to specific IDs

2013-02-12 Thread Erick Erickson
First, it may not be a problem assuming your other filter queries are more
frequent.

Second, the easiest way to keep these out of the filter cache would be just
to include them as a MUST clause, like
+(original query) +id:(1 2 3 4).

Third possibility, see https://issues.apache.org/jira/browse/SOLR-2429, but
the short form is:
fq={!cache=false}restoffq
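
If you build the request with SolrJ, that third option might look like this
(a sketch; the query and IDs are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class NonCachedFilterExample {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("original query");
        // {!cache=false} keeps this one-off ID filter out of the filterCache
        query.addFilterQuery("{!cache=false}id:(1 2 3 4 5)");
        query.setRows(5);
        System.out.println(query);
    }
}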


On Mon, Feb 11, 2013 at 2:41 PM, Isaac Hebsh isaac.he...@gmail.com wrote:

 Hi everyone.

 I have queries that should be bounded to a set of IDs (the uniqueKey field
 of my schema).
 My client front-end sends two Solr requests:
 In the first one, it wants to get the top X IDs. This result should return
 very fast. No time to waste on highlighting. This is a very standard
 query.
 In the second one, it wants to get the highlighting info (corresponding to
 the queried fields and terms, of course), on those documents (may be some
 sequential requests, on small bulks of the full list).

 These two requests are implemented as almost identical calls, to different
 requestHandlers.

 I thought to append a filter query to the second request, "id:(1 2 3 4 5)".
 Is this idea good for Solr?
 If it does, my problem is that I don't want these filters to flood my
 filterCache... Is there any way (even if it involves some coding...) to add
 a filter query which won't be added to filterCache (at least, not instead
 of standard filters)?


 Notes:
 1. It can't be assured that the first query will remain in
 queryResultsCache...
 2. consider index size of 50M documents...



Re: Reverse range query

2013-02-12 Thread Erick Erickson
Well, what does adding debug=query show you for the parsed query? What
documents show up?

My first guess is that since you're using exclusive rather than inclusive
end points your expectations aren't what you think.

Best
Erick
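
For reference, curly braces are exclusive and square brackets are inclusive,
so these two differ (field name taken from the original post):

ads_f11001:{2000 TO 2005}   (excludes the endpoints 2000 and 2005)
ads_f11001:[2000 TO 2005]   (includes both endpoints)

Note also that a range query only behaves numerically if the field is a trie
numeric type; on a string field the comparison is lexicographic, which is
another common reason for unexpected range results.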


On Mon, Feb 11, 2013 at 10:57 PM, ballusethuraman ballusethura...@gmail.com
 wrote:

 Hi,
 I have created a new attribute (Year) in the attribute dictionary and associated
 it with different catentries with different values, say
 2000,2001,2002,2003,...2012.
 Now I want to search with the Year attribute with a min and max range. When
 2000 to 2005 is given as the search condition it should fetch the catentries
 which are between these two values.
 This is the url I used to hit the solr server.
 ads_f11001 is the logical name of the attribute year that i have created
 in management center. This value will be in the srchattrprop table. 2000 and
 2005 are the min and max range.
 http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000
 TO 2005}

 when i try to hit this url i am getting 0 records found.
 http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000
 TO *}

 and

 http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{* TO
 2005}

 These above two urls fetch me some results but it is not the expected
 result. Please help me to solve this issue.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Reverse-range-query-tp1789135p4039860.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: LoadBalancing while adding documents

2013-02-12 Thread Erick Erickson
Hold on here. LBHttpSolrServer should not be used for indexing in a
Master/Slave setup, but in SolrCloud you may use it. Indeed,
CloudSolrServer uses LBHttpSolrServer under the covers.

Now, why would you want to send requests to both servers? If you're in
master/slave mode (i.e. not running Zookeeper), you _must_ send the update
to the right master. If you're in SolrCloud mode, you don't care. You have
to send each document to Solr only once. In Master/Slave mode, you must
send it to the correct master. In SolrCloud mode you don't care where you
send it, it'll be routed to the right place.

Best
Erick
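
For completeness, the SolrCloud-aware SolrJ client looks roughly like this
(the ZooKeeper addresses and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexingExample {
    public static void main(String[] args) throws Exception {
        // Reads cluster state from ZooKeeper and load balances requests
        // across live nodes via an internal LBHttpSolrServer.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);     // the cluster routes it to the right shard
        server.commit();
        server.shutdown();
    }
}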


On Tue, Feb 12, 2013 at 8:22 AM, J Mohamed Zahoor zah...@indix.com wrote:

 Hi

 I have multi shard replicated index spread across two machines.

 Once a week, i delete the entire index and create it from scratch.
 Today i am using ConcurrentUpdateSolrServer in solrj to add documents to
 the index.

 I want to add documents through both the servers.. to utilise the
 resources...
 i read in wiki (i think)  that LBHttpSolrServer should not be used for
 indexing documents.

 Is there any other way to send request to both the servers without using
 any external load balancers?
 I am using Solr 4.1.

 ./zahoor


Re: SolrCloud and hardcoded 'id' field

2013-02-12 Thread Shawn Heisey

On 2/11/2013 7:47 PM, Mark Miller wrote:

Doesn't sound right to me. I'd guess you heard wrong.


I did a search for "id" with the quotes throughout the branch_4x source 
code.  After excluding test code, test files, and other things that 
looked like they have good reason to be hardcoded, I was left with the 
following:


This class looks like a definite problem.
org.apache.solr.common.cloud.HashBasedRouter
Object idObj = sdoc.getFieldValue("id");  // blech

This class uses "id" in a way that looks bad to me, but it's just for 
logging:

org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer

I can't tell if use of "id" in these classes is problematic.
org.apache.solr.handler.admin.LukeRequestHandler
org.apache.solr.handler.admin.QueryElevationComponent
org.apache.solr.handler.component.RealTimeGetComponent
org.apache.solr.handler.loader.JsonLoader
org.apache.solr.handler.loader.XMLLoader
org.apache.solr.spelling.SpellCheckCollator

These classes use "id" in a way that does not look problematic to me, 
but should be reviewed.

org.apache.solr.cloud.ElectionContext
org.apache.solr.cloud.Overseer
org.apache.solr.cloud.OverseerCollectionProcessor
org.apache.solr.core.JmxMonitoredMap
org.apache.solr.handler.admin.ThreadDumpHandler

Thanks,
Shawn



Re: SolrCloud and hardcoded 'id' field

2013-02-12 Thread Shawn Heisey

On 2/12/2013 7:54 PM, Shawn Heisey wrote:

On 2/11/2013 7:47 PM, Mark Miller wrote:

Doesn't sound right to me. I'd guess you heard wrong.


I did a search for "id" with the quotes throughout the branch_4x source
code.  After excluding test code, test files, and other things that
looked like they have good reason to be hardcoded, I was left with the
following:

This class looks like a definite problem.
org.apache.solr.common.cloud.HashBasedRouter
 Object idObj = sdoc.getFieldValue("id");  // blech

This class uses "id" in a way that looks bad to me, but it's just for
logging:
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer

I can't tell if use of "id" in these classes is problematic.
org.apache.solr.handler.admin.LukeRequestHandler
org.apache.solr.handler.admin.QueryElevationComponent
org.apache.solr.handler.component.RealTimeGetComponent
org.apache.solr.handler.loader.JsonLoader
org.apache.solr.handler.loader.XMLLoader
org.apache.solr.spelling.SpellCheckCollator

These classes use "id" in a way that does not look problematic to me,
but should be reviewed.
org.apache.solr.cloud.ElectionContext
org.apache.solr.cloud.Overseer
org.apache.solr.cloud.OverseerCollectionProcessor
org.apache.solr.core.JmxMonitoredMap
org.apache.solr.handler.admin.ThreadDumpHandler


Somehow I missed this one where I don't know if it's a problem:

org.apache.solr.search.grouping.distributed.shardresultserializer.TopGroupsResultTransformer

Thanks,
Shawn



Re: Benefits of Solr over Lucene?

2013-02-12 Thread Lance Norskog
Lucene and Solr have an aggressive upgrade schedule. From 3 to 4 there was a 
major rewiring, and parts are orders of magnitude faster and smaller.
If you code using Lucene, you will never upgrade to newer versions.
(I supported Solr/Lucene customers for 3 years, and nobody ever did.)

Cheers,
Lance


I know that Solr web-enables a Lucene index, but I'm trying to figure out
what other things Solr offers over Lucene.  On the Solr features list it
says "Solr uses the Lucene search library and extends it!", but what exactly
are the extensions from the list and what did Lucene give you?  Also if I
have an index built through Solr is there a non-HTTP way to search that
index?  Because SolrJ essentially just makes HTTP requests, correct?

Some features I'm particularly interested in are:
Geospatial Search
Highlighting
Dynamic Fields
Near Real-Time Indexing
Multiple Search Indices

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Benefits-of-Solr-over-Lucene-tp4039964.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: More Like This, finding original record

2013-02-12 Thread Otis Gospodnetic
Hello,

Daniel, are you looking for the original doc you used for MLT in the
response? You could always and easily do this on the client side by looking
at IDs of returned docs.

Otis
Solr & ElasticSearch Support
http://sematext.com/



On Feb 12, 2013 9:26 AM, Daniel Rijkhof daniel.rijk...@gmail.com wrote:

 I guess it's not possible, but perhaps someone knows how to do this:

 Do a more like this query (through the mlt handler),
 And find the match record within the response records (top match, should be
 first in list).

 This would then make it possible for me to compare scores...

 Anybody around that did this? (Modify source code perhaps?)

 Have Fun
 Daniel



Re: what do you use for testing relevance?

2013-02-12 Thread Otis Gospodnetic
Hi Roman,

We use our own Search Analytics service. It's free and open to anyone - see
http://sematext.com/search-analytics/index.html

And this post talks exactly about the topic you are asking about:
http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics

It includes a screenshot with MRR (Mean Reciprocal Rank) that Markus
mentioned.

Otis
Solr & ElasticSearch Support
http://sematext.com/


On Feb 12, 2013 5:04 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?

 It seems like people are writing their own tools to measure relevancy.

 Thanks for any pointers,

   roman



Re: How to limit queries to specific IDs

2013-02-12 Thread Isaac Hebsh
Thank you, Erick! Three great answers!


On Wed, Feb 13, 2013 at 4:20 AM, Erick Erickson erickerick...@gmail.comwrote:

 First, it may not be a problem assuming your other filter queries are more
 frequent.

 Second, the easiest way to keep these out of the filter cache would be just
 to include them as a MUST clause, like
 +(original query) +id:(1 2 3 4).

 Third possibility, see https://issues.apache.org/jira/browse/SOLR-2429,
 but
 the short form is:
 fq={!cache=false}restoffq


 On Mon, Feb 11, 2013 at 2:41 PM, Isaac Hebsh isaac.he...@gmail.com
 wrote:

  Hi everyone.
 
  I have queries that should be bounded to a set of IDs (the uniqueKey
 field
  of my schema).
  My client front-end sends two Solr requests:
  In the first one, it wants to get the top X IDs. This result should
 return
  very fast. No time to waste on highlighting. This is a very standard
  query.
  In the second one, it wants to get the highlighting info (corresponding
 to
  the queried fields and terms, of course), on those documents (may be some
  sequential requests, on small bulks of the full list).
 
  These two requests are implemented as almost identical calls, to
 different
  requestHandlers.
 
  I thought to append a filter query to the second request, "id:(1 2 3 4
 5)".
  Is this idea good for Solr?
  If it does, my problem is that I don't want these filters to flood my
  filterCache... Is there any way (even if it involves some coding...) to
 add
  a filter query which won't be added to filterCache (at least, not instead
  of standard filters)?
 
 
  Notes:
  1. It can't be assured that the first query will remain in
  queryResultsCache...
  2. consider index size of 50M documents...
 



Re: LoadBalancing while adding documents

2013-02-12 Thread J Mohamed Zahoor

On 13-Feb-2013, at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hold on here. LBHttpSolrServer should not be used for indexing in a
 Master/Slave setup, but in SolrCloud you may use it. Indeed,
 CloudSolrServer uses LBHttpSolrServer under the covers.

In SolrCloud mode, ConcurrentUpdateSolrServer will already do the load balancing 
while adding and querying documents from Solr. 
Is my understanding right?



 
 Now, why would you want to send requests to both servers?


I just wanted to send some docs to machine1 and some docs to machine2 to load 
balance.
Not the same doc to both the machines.


 If you're in
 master/slave mode (i.e. not running Zookeeper), you _must_ send the update
 to the right master. If you're in SolrCloud mode, you don't care. You have
 to send each document to Solr only once. In Master/Slave mode, you must
 send it to the correct master. In SolrCloud mode you don't care where you
 send it, it'll be routed to the right place.
 

I am in SolrCloud mode. 
I always send it to one of the servers. And if i get you right, they will 
automatically load balance is what i take.


./Zahoor



Re: what do you use for testing relevance?

2013-02-12 Thread Steffen Elberg Godskesen

Hi Roman,

If you're looking for regression testing then 
https://github.com/sul-dlss/rspec-solr might be worth looking at. If you're not 
a ruby shop, doing something similar in another language shouldn't be too hard.
 

The basic idea is that you set up a set of tests like

If the query is X, then the document with id Y should be in the first 10 
results
If the query is S, then a document with title T should be the first result
If the query is P, then a document with author Q should not be in the first 10 
result

and that you run these whenever you tune your scoring formula to ensure that 
you haven't introduced unintended effects. New ideas/requirements for your 
relevance ranking should always result in writing new tests - that will 
probably fail until you tune your scoring formula. This is certainly no magic 
bullet, but it will give you some confidence that you didn't make things worse. 
And - in my humble opinion - it also gives you the benefit of discouraging you 
from tuning your scoring just for fun. To put it bluntly: if you cannot write 
up a requirement in the form of a test, you probably have no need to tune your 
scoring.
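
In Java, one of those tests might look like this with SolrJ and JUnit (a
sketch; the URL, query, and ID are placeholders):

import static org.junit.Assert.assertTrue;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.junit.Test;

public class RelevanceRegressionTest {
    private final HttpSolrServer solr =
        new HttpSolrServer("http://localhost:8983/solr");  // placeholder URL

    // "If the query is X, then the document with id Y should be in the top 10"
    @Test
    public void knownDocumentStaysInTopTen() throws Exception {
        SolrQuery query = new SolrQuery("X").setRows(10);
        boolean found = false;
        for (SolrDocument doc : solr.query(query).getResults()) {
            if ("Y".equals(doc.getFieldValue("id"))) {
                found = true;
            }
        }
        assertTrue("doc Y dropped out of the top 10 for query X", found);
    }
}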


Regards,

-- 
Steffen



On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote:

 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?
 
 It seems like people are writing their own tools to measure relevancy.
 
 Thanks for any pointers,
 
 roman 




Re: LoadBalancing while adding documents

2013-02-12 Thread J Mohamed Zahoor
Ooh.. I didn't know that there is CloudSolrServer.
Thanks for the pointer.
Will explore that.

./zahoor


On 13-Feb-2013, at 11:49 AM, J Mohamed Zahoor zah...@indix.com wrote:

 
 On 13-Feb-2013, at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote:
 
 Hold on here. LBHttpSolrServer should not be used for indexing in a
 Master/Slave setup, but in SolrCloud you may use it. Indeed,
 CloudSolrServer uses LBHttpSolrServer under the covers.
 
 In SolrCloud mode, ConcurrentUpdateSolrServer will already do the 
 load balancing while adding and querying documents from Solr. 
 Is my understanding right?
 
 
 
 
 Now, why would you want to send requests to both servers?
 
 
 I just wanted to send some docs to machine1 and some docs to machine2 to load 
 balance.
 Not the same doc to both the machines.
 
 
 If you're in
 master/slave mode (i.e. not running Zookeeper), you _must_ send the update
 to the right master. If you're in SolrCloud mode, you don't care. You have
 to send each document to Solr only once. In Master/Slave mode, you must
 send it to the correct master. In SolrCloud mode you don't care where you
 send it, it'll be routed to the right place.
 
 
 I am in SolrCloud mode. 
 I always send it to one of the servers. And if i get you right, they will 
 automatically load balance is what i take.
 
 
 ./Zahoor