Re: ConcurrentUpdateSolrServer Missing ContentType error on SOLR 4.2.1
> I apologize for intruding, Shawn, do you know what can cause empty params (i.e. params={})?

I've got no idea what is causing this problem on your system. All of the ideas I've had so far don't seem to apply. Can you run a packet sniffer on your client to see whether the client is sending the right info?

Thanks,
Shawn
Re: update to 4.3
Any tips on what to do with the configuration files? Where do I have to store them and what should they look like? Any examples?

May 07, 2013 6:16:27 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8983]
May 07, 2013 6:16:28 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-bio-8009]
May 07, 2013 6:16:28 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 621 ms
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
May 07, 2013 6:16:28 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 07, 2013 6:16:28 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/local/apache-tomcat-7.0.39/webapps/solr.war
log4j:WARN No appenders could be found for logger (org.apache.solr.servlet.SolrDispatchFilter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/docs
May 07, 2013 6:16:33 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/manager
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/ROOT
May 07, 2013 6:16:34 AM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/examples
May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-bio-8983]
May 07, 2013 6:16:34 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-bio-8009]
May 07, 2013 6:16:34 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 6000 ms

BR,
Arkadi

On 05/06/2013 10:13 PM, Jan Høydahl wrote:
Hi,

The reason is that from Solr 4.3 you need to provide the SLF4J logger jars of choice when deploying Solr to an external servlet container. Simplest is to copy all jars from example/lib/ext into tomcat/lib:

cd solr-4.3.0/example/lib/ext
cp * /usr/local/apache-tomcat-7.0.39/lib/

Please see CHANGES.txt for more info: http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6 May 2013 at 16:50, Arkadi Colson ark...@smartbit.be wrote:

Hi

After update to 4.3 I got this error:

May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8983]
May 06, 2013 2:30:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-bio-8009]
May 06, 2013 2:30:08 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 610 ms
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
May 06, 2013 2:30:08 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.39
May 06, 2013 2:30:08 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /usr/local/apache-tomcat-7.0.39/webapps/solr.war
May 06, 2013 2:30:45 PM org.apache.catalina.util.SessionIdGenerator createSecureRandom
INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [36,697] milliseconds.
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
May 06, 2013 2:30:45 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/solr] startup failed due to previous errors
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/host-manager
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/docs
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /usr/local/apache-tomcat-7.0.39/webapps/manager
May 06, 2013 2:30:45 PM org.apache.catalina.startup.HostConfig
Re: update to 4.3
Found it on http://wiki.apache.org/solr/SolrLogging! Thx

On 05/07/2013 08:40 AM, Arkadi Colson wrote:
Any tips on what to do with the configuration files? Where do I have to store them and what should they look like? Any examples? [...]
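Spelling out what the SolrLogging wiki describes for a Tomcat deployment (the Tomcat path matches this thread; the log file location below is an assumption):

cd solr-4.3.0/example/lib/ext
cp * /usr/local/apache-tomcat-7.0.39/lib/
cp solr-4.3.0/example/resources/log4j.properties /usr/local/apache-tomcat-7.0.39/lib/

Anything on Tomcat's lib classpath is picked up, so the copied log4j.properties silences the "No appenders could be found" warnings. If you prefer to write the file yourself, a minimal sketch looks something like:

# send Solr logging to a rolling file
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n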
Re: When a search query comes to a replica what happens?
Hi Otis; I've read somewhere that if you have one replica and a 1000 queries-per-second search rate, and you switch to 5 replicas, each replica may end up serving around 200 qps. What do you think about that, and how does Solr parallelize searching across replicas? By the way, when you say replica, do you mean both replicas and the leader (because the leader is a replica too), or the nodes of a shard except for the leader?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com:
No.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, Apr 16, 2013 at 6:23 PM, Furkan KAMACI furkankam...@gmail.com wrote:
All in all, will the replica ask its leader where the remaining data is, or will it ask Zookeeper directly?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com:
Hi,

No, I believe redirect from replica to leader would happen only at index time, so a doc first gets indexed to the leader and from there it's replicated to non-leader shards. At query time there is no redirect to leader, I imagine, as that would quickly turn leaders into hotspots.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote:
I want to make it clear in my mind: when a search query comes to a replica, what happens?
- Does it forward the search query to the leader, which collects all the data and prepares the response (this would cause a performance issue because the leader is responsible for indexing at the same time)? Or
- does the replica communicate with the leader to learn where the remaining data is (the leader asks Zookeeper and tells the replica), and then the replica collects all the data and builds the response itself?
RE: Solr Cloud with large synonyms.txt
We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using solr cloud yet).

Roman

On 6 May 2013 23:09, David Parks davidpark...@yahoo.com wrote:
Wouldn't it make more sense to only store a pointer to a synonyms file in zookeeper? Maybe just make the synonyms file accessible via http so other boxes can copy it if needed? Zookeeper was never meant for storing significant amounts of data.

-----Original Message-----
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: Tuesday, May 07, 2013 4:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with large synonyms.txt

See discussion here: http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html

One idea was compression. Perhaps if we add gzip support to SynonymFilter it can read synonyms.txt.gz, which would then fit larger raw dicts?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 6 May 2013 at 18:32, Son Nguyen s...@trancorp.com wrote:

Hello,

I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper (the Zookeeper is on a different machine, version 3.4.5). I've tried to start with a 1.7MB synonyms.txt, but got a ConnectionLossException:

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
        at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
        at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
        at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
        at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
        ... 43 more

I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience dealing with it?

Thanks,
Son
Re: Rearranging Search Results of a Search?
Can I use Transformers for my purpose?

2013/5/3 Furkan KAMACI furkankam...@gmail.com:
I think this looks like what I'm searching for: https://issues.apache.org/jira/browse/SOLR-4465
How about a post filter for Lucene, can it help me for my purpose?

2013/5/3 Otis Gospodnetic otis.gospodne...@gmail.com:
Hi,

You should use search more often :)
http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue

Coincidentally, what you see there happens to be a good example of a Solr component that does something behind the scenes to deliver those search results even though my original query was bad. Kind of similar to what you are after.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote:
I know that I can use boosting at query time for a field or a search term, at solrconfig.xml, and the query elevator, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Does Lucene's post filter stand for that?
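A brief note on the post-filter question: a Solr PostFilter runs after the main query and can only exclude documents from the result set; it does not reorder them, so re-ranking the top N is usually done in a custom SearchComponent instead. For reference, a minimal Solr 4.x post-filter skeleton might look like the sketch below; the class name and the accept() logic are placeholders, not anything from this thread.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class ExamplePostFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
        return false; // post filters must not be cached
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100); // cost >= 100 marks it as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                if (accept(doc)) {      // per-document yes/no decision
                    super.collect(doc); // pass the doc down the collector chain
                }
            }

            private boolean accept(int doc) {
                return true; // placeholder: custom per-document test goes here
            }
        };
    }
}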
Re: Solr Cloud with large synonyms.txt
Hi,

SolrCloud is designed with an assumption that you should be able to upload your whole disk-based conf folder into ZK, and that you should be able to add an empty Solr node to a cluster and it would download all config from ZK. So immediately a splitting strategy automatically handled by ZkSolrResourceLoader for large files could be one way forward, i.e. store synonyms.txt as e.g.

__001_synonyms.txt
__002_synonyms.txt

Feel free to open a JIRA issue for this so we can get a proper resolution.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 09:55, Roman Chyla roman.ch...@gmail.com wrote:
We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using solr cloud yet). [...]
Re: Delete from Solr Cloud 4.0 index..
Hi Erick,

Thanks for the tip. Will docValues help with memory usage? It seemed a bit complicated to set up. The index size saving was nice because it means that potentially I could use smaller provisioned IOPS volumes, which cost less.

Thanks.

On 3 May 2013 18:27, Erick Erickson erickerick...@gmail.com wrote:
Anette:

Be a little careful with the index size savings, they really don't mean much for _searching_. The stored field compression significantly reduces the size on disk, but only for the stored data, which is only accessed when returning the top N docs. In terms of how many docs you can fit on your hardware, it's pretty irrelevant.

The *.fdt and *.fdx files in your index directory contain the stored data, so when looking at the effects of various options (including compression), you can pretty much ignore these files.

FWIW,
Erick

On Fri, May 3, 2013 at 2:03 AM, Annette Newton annette.new...@servicetick.com wrote:
Thanks Shawn. I have played around with soft commits before and didn't seem to have any improvement, but with the current load testing I am doing I will give it another go.

I have researched docValues and came across the fact that it would increase the index size. With the upgrade to 4.2.1 the index size has reduced by approx 33%, which is pleasing, and I don't really want to lose that saving.

We do use the facet.enum method, which works really well, but I will verify that we are using it in every instance; we have numerous developers working on the product and maybe one or two have slipped through.

Right from the first I upped the zkClientTimeout to 30 as I wanted to give extra time for any network blips that we experience on AWS. We only seem to drop communication on a full garbage collection though.

I am coming to the conclusion that we need more shards to cope with the writes, so I will play around with adding more shards and see how I go.

I appreciate you having a look over our setup and the advice. Thanks again.

Netty.

On 2 May 2013 23:17, Shawn Heisey s...@elyograg.org wrote:
On 5/2/2013 4:24 AM, Annette Newton wrote:
Hi Shawn,

Thanks so much for your response. We basically are very write intensive and write throughput is pretty essential to our product. Reads are sporadic and actually function really well. We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so need to work out scaling of solr and how to cope with more volume.

Schema (I have changed the names): http://pastebin.com/x1ry7ieW
Config: http://pastebin.com/pqjTCa7L

This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds. You can use the startup commandline or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}" hostContext="solr">

General thoughts, these changes might not help this particular issue: You've got autoCommit with openSearcher=true. This is a hard commit.
If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit.

This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now.

The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively, never go with the default of fc. You could put that in the defaults or invariants section of your request handler(s).

Another way to reduce memory usage for facets is to use disk-based docValues on version 4.2 or later for the facet fields, but this will increase your index size, and your index is already quite large. Depending on your index contents, the increase may be small or large.

Something to just mention: it looks like your solrconfig.xml has hard-coded absolute paths for dataDir and updateLog. This is fine if you'll only ever have one core/collection on each server, but it'll be a disaster if you have multiples. I could be wrong about how these get interpreted
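For concreteness, the commit setup Shawn describes would look something like the following in solrconfig.xml; the maxTime values are illustrative assumptions, not numbers from this thread:

<!-- hard commit: flush and fsync the index, but don't open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: controls how quickly new documents become visible -->
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

With this shape, the expensive hard commits happen in the background without the cost of warming a new searcher, while visibility is governed by the cheaper soft commits.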
Re: solr adding unique values
Thanks Erick, for the reply! I know about 'set', but that's not my goal; I had to give a better example. I want this: if I add another list_c,

user a [ id:a liists [ list_a, list_b ] ]

should become:

user a [ id:a liists [ list_a, list_b, list_c ] ]

However, if I again add list_a, it should *not* become:

user a [ id:a liists [ list_a, list_b, list_c, list_a ] ]

I am *not* reindexing the documents.

On Mon, May 6, 2013, Erick Erickson wrote:
Depends on your goal here. I'm guessing you're using atomic updates, in which case you need to use set rather than add, as the former replaces the contents. See: http://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example

If you're simply re-indexing the documents, just send the entire fresh document to solr and it'll replace the earlier document completely.

Best
Erick

On Mon, May 6, 2013 at 1:44 PM, Nikhil Kumar nikhil.ku...@hashedin.com wrote:
Hey,

I have recently started using solr. I have a list of users which are subscribed to some lists, e.g.

user a [ id:a liists [ list_a ] ]
user b [ id:b liists [ list_a ] ]

I am using {id: a, lists:{add:list_a}} to add a particular list to a user. But what is happening is that if I use the same command again, it adds the same list again, which I want to avoid:

user a [ id:a liists [ list_a, list_a ] ]

I searched the documentation and tutorials, and found:
- overwrite = true | false — default is true, meaning newer documents will replace previously added documents with the same uniqueKey.
- commitWithin = (milliseconds) — if the commitWithin attribute is present, the document will be added within that time. Solr1.4. See CommitWithin: http://wiki.apache.org/solr/CommitWithin
- (deprecated) allowDups = true | false — default is false
- (deprecated) overwritePending = true | false — default is negation of allowDups
- (deprecated) overwriteCommitted = true | false — default is negation of allowDups

But using overwrite and allowDups didn't solve the problem either, seemingly because there is no unique id, just values. So the question is: how to solve this problem? (A sketch of the set-based workaround follows below.)

--
Thank You and Regards,
Nikhil Kumar
+91-9916343619
Technical Analyst
Hashed In Technologies Pvt. Ltd.
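Since the atomic "add" operation in Solr 4.x cannot deduplicate, the usual workaround along the lines Erick suggests is to read the document's current values, compute the union on the client, and write the whole field back with "set". A sketch, assuming a core at localhost:8983 and the field name from this thread:

# 1. fetch the current values
curl 'http://localhost:8983/solr/select?q=id:a&fl=liists&wt=json'

# 2. union in the new value client-side, then replace the whole field with set
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "a", "liists": {"set": ["list_a", "list_b", "list_c"]}}]'

Because set replaces the field contents entirely, sending the same union twice is harmless and no duplicates can accumulate.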
Lazy load Error on UI analysis area
Hi,

I was exploring the UI interface and in the analysis section I had a lazy load error. The logs say:

INFO - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; [] webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json} status=0 QTime=23
ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: lazy loading error
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.solr.FieldAnalysisRequestHandler'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
        at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
        at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
        ... 20 more
Caused by: java.lang.ClassNotFoundException: solr.solr.FieldAnalysisRequestHandler
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
        ... 24 more

Best regards
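Note what the ClassNotFoundException actually names: solr.solr.FieldAnalysisRequestHandler, i.e. the "solr." shorthand prefix appears twice. That strongly suggests a typo in the handler declaration in solrconfig.xml rather than a missing jar. A corrected declaration would presumably look like the stock one from the example config:

<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />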
Re: Search performance: shards or replications?
Hi,

It depends(TM) on what kind of search performance problems you are seeing. If you simply have so high a query load that the server starts to kneel, it will definitely not help to shard, since ALL the shards will still be hit with ALL the queries, and you add some extra overhead with sharding as well. But if your QPS is moderate and you have tons of documents, you may gain better performance both for indexing latency and search latency by sharding.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 7 May 2013 at 13:09, Stanislav Sandalnikov s.sandalni...@gmail.com wrote:

Hi,

We are moving to a SolrCloud architecture, and I have a question about search performance and its correlation with shards and replicas. What will be more efficient: to split the whole index into several shards, or to create several replicas of the index? Does parallel search work with both shards and replicas?

Please share your experience regarding this matter. Thanks in advance.

Regards,
Stanislav
Re: Search performance: shards or replications?
Hi Yan,

Thanks for the quick reply. Thus, replication seems to be the preferable solution. Does QTime decrease proportionally to the number of replicas, or are there other drawbacks? Just to clarify, what amount of documents counts as "tons of documents" in your opinion? :)

2013/5/7 Jan Høydahl jan@cominvent.com:
Hi,

It depends(TM) on what kind of search performance problems you are seeing. [...]
Re: Search performance: shards or replications?
P.S. Sorry for misspelling your name, Jan

2013/5/7 Stanislav Sandalnikov s.sandalni...@gmail.com:
Hi Yan,

Thanks for the quick reply. [...]
How to get Term Vector Information on Distributed Search
Hi,

I am using a distributed query to fetch records. The Distributed Search document on the wiki says this component supports distributed queries, but I'm getting an error while querying. Not sure if I am doing anything wrong. Below is my query to fetch term vectors with distributed search:

http://localhost:8080/solr/core1/tvrh?q=id:3426545&tv.all=true&f.text.tv.tf_idf=false&f.text.tv.df=false&tv.fl=text&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3&shards.qt=select&debugQuery=on

Below is the error:

java.lang.NullPointerException
        at org.apache.solr.handler.component.TermVectorComponent.finishStage(TermVectorComponent.java:437)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
        at java.lang.Thread.run(Unknown Source)

Please help me on this.

Thanks
Meghana
RE: How to get Term Vector Information on Distributed Search
hi - this is a known issue: https://issues.apache.org/jira/browse/SOLR-4479

-----Original message-----
From: meghana meghana.rav...@amultek.com
Sent: Tue 07-May-2013 14:28
To: solr-user@lucene.apache.org
Subject: How to get Term Vector Information on Distributed Search

[...]
Solr 1.4 - Proximity Search - Where is configuration for storing positions?
I have an index built using Solr 1.4 with one field. I was able to run a proximity search (Ex: word1 within5 word2), but nowhere in the configuration do I see any information about storing/indexing the positions or offsets of the terms. My understanding is that we need to store/index term vector positions/offsets for proximity search to work. Can someone please tell me if positions are indexed by default in Solr 1.4?

FYI, here is the configuration of the field in schema.xml (to keep it simple I am only including the fieldType and field definition):

<fieldtype class="solr.TextField" name="string" omitNorms="true" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stop-words.txt" />
  </analyzer>
</fieldtype>

<field indexed="true" multiValued="false" name="contents" stored="true" type="string" />

Thanks
-kRider
RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?
Hi - they are indexed by default but can be omitted since 3.4: http://wiki.apache.org/solr/SchemaXml#Common_field_options

-----Original message-----
From: KnightRider ksu.wildc...@gmail.com
Sent: Tue 07-May-2013 14:41
To: solr-user@lucene.apache.org
Subject: Solr 1.4 - Proximity Search - Where is configuration for storing positions?

[...]
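To make the "can be omitted since 3.4" pointer concrete: term frequencies and positions are indexed by default, and on Solr 3.4+ you can switch them off per field when you never need phrase or proximity queries against it. A hypothetical declaration reusing the field from this thread:

<field name="contents" type="string" indexed="true" stored="true"
       omitTermFreqAndPositions="true" />

With omitTermFreqAndPositions="true" the index shrinks, but proximity and phrase searches against that field will no longer work, so it is the wrong setting for the use case in this thread.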
Re: Search performance: shards or replications?
Some clarifications:

1) *lots of docs, few queries*: If you have a high number of documents (dozens of millions or more) and a lowish number of queries per second (say less than 10), replicas will not help to reduce the QTime. For this kind of task it is better to shard the index, as each query will effectively be processed in parallel by N shards, thus reducing QTime.

2) *few docs, lots of queries*: with less than 10M docs and 30+ qps, on the contrary, you want more replicas to handle more traffic and avoid overloaded servers (which would increase the QTime).

3) *lots of docs, lots of queries*: do both sharding and replicas.

Actual numbers depend on the hardware, the type of docs and queries, etc. The best is to benchmark your setup varying the load, so that you can trace a hockey stick graph of QTime versus qps. Feel free to ask for details if needed.

André

On 05/07/2013 01:56 PM, Stanislav Sandalnikov wrote:
Hi Yan,

Thanks for the quick reply. [...]

--
André Bois-Crettez
Search technology, Kelkoo
http://www.kelkoo.com/
custom facet.sort
I have a string field containing values such as 1khz, 1ghz, 1mhz, etc. I use this field to show a facet; currently I'm showing results in facet.sort=count order. Now I'm asked to reorder the facet according to the unit of measure (khz/mhz/ghz). I also have 3-4 other custom sort orders to implement. Is it possible to plug in a custom Java class to provide custom facet.sort modes?

Thank you
Giovanni
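As far as the stock code goes, facet.sort in Solr 4.x only supports count and index order, with no plugin point for custom orderings, so the usual workaround is to re-sort the returned facet values on the client. A minimal sketch, assuming labels of the form "<number><unit>" as in this question:

import java.util.*;

public class FacetUnitSort {
    // multipliers in Hz for the units mentioned in the question
    private static final Map<String, Long> UNIT = new HashMap<String, Long>();
    static {
        UNIT.put("khz", 1000L);
        UNIT.put("mhz", 1000000L);
        UNIT.put("ghz", 1000000000L);
    }

    // parse a label like "1mhz" into a comparable value in Hz; unknown labels sort last
    static long toHz(String label) {
        String s = label.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Long> e : UNIT.entrySet()) {
            if (s.endsWith(e.getKey())) {
                String number = s.substring(0, s.length() - e.getKey().length()).trim();
                try {
                    return Long.parseLong(number) * e.getValue();
                } catch (NumberFormatException nfe) {
                    return Long.MAX_VALUE;
                }
            }
        }
        return Long.MAX_VALUE;
    }

    // re-sort the facet value labels returned by Solr (e.g. from FacetField.getValues())
    public static void sortByFrequency(List<String> facetValues) {
        Collections.sort(facetValues, new Comparator<String>() {
            public int compare(String a, String b) {
                long ha = toHz(a), hb = toHz(b);
                return ha < hb ? -1 : (ha > hb ? 1 : 0);
            }
        });
    }
}

Each additional custom sort mode then becomes one more comparator on the client; the counts themselves come back from Solr unchanged.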
RE: Solr 1.4 - Proximity Search - Where is configuration for storing positions?
Thanks Markus.

-
Thanks
-K'Rider
SOLR query performance
Dear All,

I am using Apache SOLR 3.6.2 as the search engine for a job site. I am observing that a solr query takes around 15 seconds to complete. I am sure there is something wrong in my approach or I am doing the indexing wrongly. I need assistance/pointers to resolve this issue. I am providing a detailed background of the work I have done; kindly give me some pointers on how to resolve this.

I am using the Drupal 7.15 framework for the job site, with Apache solr 3.6.2 as my search engine. When a user registers his profile, I create a node (page) and attach the document to that node. Every hour I run the cron task and index new or updated nodes. When an employer searches for keywords, say java, mysql, php etc., I use the APIs provided by Drupal to interact with SOLR and get the documents that contain keywords such as java, mysql, drupal etc.

There is a rows filter. If I specify rows as 100 or 200, the query returns quickly (around half a second). If I specify rows as 3000, it takes around 15 seconds to return.

Now, my question is: is there any mechanism by which I can tell solr that my start row is X and rows is Y, so that it will return search results from the Xth row with Y number of rows? (Please note that this is similar to the LIMIT feature provided by mysql.)

Kindly let me know. This will help us to a great extent.

Best Regards
Kamal
Re: Solr Cloud with large synonyms.txt
On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some researches on internet and found out that because Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark
Re: SOLR query performance
Yes, that's what the 'start' and 'rows' parameters do in the query string. I would check the queries Solr sees when you do that long request. There is usually a delay in retrieving items further down the sorted list, but 15 seconds does feel excessive. http://wiki.apache.org/solr/CommonQueryParameters#start Regards, Alex. On Tue, May 7, 2013 at 10:10 AM, Kamal Palei palei.ka...@gmail.com wrote: Now, my question is, Is there any mechanism, I can tell to solr that, my start row is X, rows is Y, then it will return search result from Xth row with Y number of rows (Please note that this is similar with LIMIT stuff provided by mysql). Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Solr Cloud with large synonyms.txt
On May 7, 2013, at 10:24 AM, Mark Miller markrmil...@gmail.com wrote: On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some researches on internet and found out that because Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark the system property must be set on all servers and clients otherwise problems will arise. Make sure you try passing it both to ZK *and* to Solr. - Mark
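To spell out the "pass it to both" advice, the flag might be set along these lines; the 4MB value is an arbitrary example, not a recommendation from this thread:

# ZooKeeper side - zkServer.sh honors JVMFLAGS (e.g. set it in conf/java.env or zkEnv.sh):
export JVMFLAGS="-Djute.maxbuffer=4194304"
bin/zkServer.sh restart

# Solr under Tomcat - e.g. via CATALINA_OPTS:
export CATALINA_OPTS="$CATALINA_OPTS -Djute.maxbuffer=4194304"

# Solr with the bundled Jetty:
java -Djute.maxbuffer=4194304 -jar start.jar

Every ZK server and every Solr node must agree on the value; otherwise reads and writes of large znodes will fail inconsistently depending on which process handles them.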
Get Suggester to return same phrase as query
Hi,

I'm using the Suggester component in Solr, and if I search for "iPhone 5" the suggestions never give me that same phrase, "iPhone 5", back. Is there any way to alter this behaviour so that "iPhone 5" is returned as well? A backup option could be to always display what the user has entered in the UI, but I want it to be displayed *only* if there are results for it in Solr, which is only possible if Solr returns the term.

Rounak
Re: SOLR query performance
Thanks a lot Alex. I will go and try to make use of the start filter and update.

Meantime, I need to know how many total matching records there are. Example: let's say I am searching for the keyword java. There might be 1000 documents containing the java keyword. I need to show only 100 records at a time. When I query, as the query result I need to know the total number of records, plus the data for only 100 records. At the bottom of the web page, I am showing something like

Prev 1 2 3 4 5 6 7 8 9 10 Next

When the user clicks 4, I will set the start filter as 300 and the rows filter as 100 and do the query. As the query result, I am expecting a row count of 1000, and the data for 100 records (row numbers 301 to 400). Is this possible? Alex, kindly guide me.

Thanks
Kamal

On Tue, May 7, 2013 at 7:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
Yes, that's what the 'start' and 'rows' parameters do in the query string. [...]
Re: SOLR query performance
On 5/7/2013 8:45 AM, Kamal Palei wrote: When user clicks, 4, I will set start filter as 300, rows filter as 100 and do the query. As query result, I am expecting row count as 1000, and 100 records data (row number 301 to 400). This is what using the start and rows parameter with Solr will do. A nitpick: It will be row number 300 to 399 - the first page is accessed with start=0. Requesting 3000 rows (or even a start value of 3000) should not take 15 seconds. You should review this wiki page that I wrote for possible problems with your install: http://wiki.apache.org/solr/SolrPerformanceProblems One thing that is not on the wiki page, I will need to add it: Solr performs best when it is the only thing running on a server. Other applications (like a web server running Drupal) compete for resources. Performance of both Solr and the other applications will suffer. For low-volume installations on really good hardware this may not be a problem, but if your volume is high and/or your server is undersized, then sharing is not a good idea. Thanks, Shawn
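As a concrete illustration of the start/rows paging Alex and Shawn describe (the host and handler path are assumptions):

http://localhost:8983/solr/select?q=java&start=300&rows=100

This returns page 4 at 100 results per page, i.e. rows 300-399 of the result set. The total hit count comes back with every response regardless of rows; in the XML response writer it appears as:

<result name="response" numFound="1000" start="300">

so a single query gives you both the page of documents and the numFound needed to draw the pager.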
Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
I see that Lucene 4.x has FieldInfo.IndexOptions, which can be used to tell lucene whether to index documents/frequencies/positions/offsets. We are in the process of upgrading from Lucene 2.9 to Lucene 4.x, and I was wondering if there was a way to tell lucene whether to index docs/freqs/pos/offsets in the older versions (2.9), or did it always index positions and offsets by default?

Also, I see that Lucene 4.x has FieldType.setStoreTermVectorPositions and FieldType.setStoreTermVectorOffsets. Can someone please tell me a use case for storing positions and offsets in the index? Is it necessary to store term vector positions and offsets when using IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS?

Thanks
-kRider
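For reference, in Lucene 4.x the postings-level options and the term-vector options are set on the FieldType independently, so storing term vector positions/offsets is not required just because the postings index them. A sketch of a field that does both (the field name and helper class are placeholders):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.FieldInfo;

public class FieldTypeExample {
    public static Field makeField(String name, String text) {
        FieldType ft = new FieldType();
        ft.setIndexed(true);
        ft.setTokenized(true);
        ft.setStored(true);
        // postings-level positions/offsets (used e.g. by the postings highlighter)
        ft.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        // term-vector-level positions/offsets (used e.g. by the term vector highlighter)
        ft.setStoreTermVectors(true);
        ft.setStoreTermVectorPositions(true);
        ft.setStoreTermVectorOffsets(true);
        ft.freeze(); // make the FieldType immutable before use
        return new Field(name, text, ft);
    }
}

On the 2.9 question: if memory serves, older Lucene always indexed positions for indexed, tokenized fields unless you called setOmitTermFreqAndPositions(true) on the field, and offsets only existed inside term vectors; offsets in the postings themselves were introduced in 4.0.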
Re: Search performance: shards or replications?
Thank you, everything seems clear.

On 07.05.2013 20:17, Andre Bois-Crettez andre.b...@kelkoo.com wrote:
Some clarifications:

1) *lots of docs, few queries*: [...]
Re: FieldCache insanity with field used as facet and group
: I am using the Lucene FieldCache with SolrCloud and I have insane instances
: with messages like:

FWIW: I'm the one that named the result of these sanity checks FieldCacheInsanity, and I have regretted it ever since -- a better label would have been "inconsistency".

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',class
: org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 'SegmentCoreReader(owner=_11i(4.2.1):C4493997/853637)'='merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field merchantid of type int used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not being consistent in how they are building the field cache, so you are getting two objects in the cache for each segment.

I haven't checked if this happens much with the example configs, but if you could: please file a bug with the details of which Solr version you are using, along with the schema fieldType and field declarations for your merchantid field, along with the mbean stats output showing the field cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in it)

: This insanity can have performance impact?
: How can I fix it?

The impact is just that more RAM is being used than is probably strictly necessary. Unless there is something unusual in your fieldType declaration, I don't think there is an easy fix you can apply -- we need to fix the underlying code.

-Hoss
RE: Solr Cloud with large synonyms.txt
Mark, I tried to set that property on both ZK (I have only one ZK instance) and Solr, but it still didn't work. But I read somewhere that ZK is not really designed for keeping large data files, so this solution - increasing jute.maxbuffer (if I can implement it) - should be just temporary. Son -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, May 07, 2013 9:35 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt On May 7, 2013, at 10:24 AM, Mark Miller markrmil...@gmail.com wrote: On May 6, 2013, at 12:32 PM, Son Nguyen s...@trancorp.com wrote: I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Perhaps hit up the ZK list? They doc it as simply raising jute.maxbuffer, though you have to do it for each ZK instance. - Mark The system property must be set on all servers and clients, otherwise problems will arise. Make sure you try passing it both to ZK *and* to Solr. - Mark
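[Editor's note: jute.maxbuffer is a plain JVM system property, so it has to be passed at startup of every ZooKeeper server *and* every client JVM (i.e. every Solr node). One common way, with an arbitrary 4 MB value:

    # ZooKeeper side (e.g. exported before running zkServer.sh)
    SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

    # Solr side (example Jetty start)
    java -Djute.maxbuffer=4194304 -jar start.jar

If any one participant is missed, servers and clients disagree on the limit and, as Mark notes, problems will arise.]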
RE: Solr Cloud with large synonyms.txt
Jan, Thank you for your answer. I've opened a JIRA issue with your suggestion. https://issues.apache.org/jira/browse/SOLR-4793 Son -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Tuesday, May 07, 2013 4:16 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt Hi, SolrCloud is designed with an assumption that you should be able to upload your whole disk-based conf folder into ZK, and that you should be able to add an empty Solr node to a cluster and have it download all config from ZK. So a splitting strategy automatically handled by ZkSolrResourceLoader for large files could be one way forward, i.e. store synonyms.txt as e.g. __001_synonyms.txt __002_synonyms.txt Feel free to open a JIRA issue for this so we can get a proper resolution. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 7 May 2013 at 09:55, Roman Chyla roman.ch...@gmail.com wrote: We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using Solr Cloud yet). Roman On 6 May 2013 23:09, David Parks davidpark...@yahoo.com wrote: Wouldn't it make more sense to only store a pointer to a synonyms file in Zookeeper? Maybe just make the synonyms file accessible via HTTP so other boxes can copy it if needed? Zookeeper was never meant for storing significant amounts of data. -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Tuesday, May 07, 2013 4:35 AM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud with large synonyms.txt See discussion here: http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html One idea was compression. Perhaps if we add gzip support to SynonymFilter it can read synonyms.txt.gz, which would then fit larger raw dictionaries? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 6 May 2013 at 18:32, Son Nguyen s...@trancorp.com wrote: Hello, I'm building a Solr Cloud (version 4.1.0) with 2 shards and a Zookeeper (the Zookeeper is on a different machine, version 3.4.5). I've tried to start with a 1.7MB synonyms.txt, but got a ConnectionLossException:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/solr1/synonyms.txt
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
  at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:270)
  at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:267)
  at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
  at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:267)
  at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:436)
  at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:315)
  at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1135)
  at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:955)
  at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:285)
  ... 43 more
I did some research on the internet and found out that it's because the Zookeeper znode size limit is 1MB. I tried to increase the system property jute.maxbuffer but it won't work. Does anyone have experience of dealing with it? Thanks, Son
stats cache
Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? ./zahoor
facet.pivot limit
Hi, Is there a limit for facet.pivot like we have in facet.limit? ./zahoor
Re: Solr Cloud with large synonyms.txt
I'm not so worried about the large file in ZK issue myself. The concern is that you start storing and accessing lots of large files in ZK. This is not what it was made for, and everything stays in RAM, so they guard against this type of usage. We are talking about a config file that is loaded on core load, though. It's uploaded and read very rarely. On modern hardware and networks, making that file 5MB rather than 1MB is not going to ruin your day. It just won't. Solr does not use ZooKeeper heavily - in a steady-state cluster, it doesn't read or write from ZooKeeper at all to any degree that registers. I'm going to have to see problems loading these larger config files from ZooKeeper before I'm worried that it's a problem. - Mark On May 7, 2013, at 12:21 PM, Son Nguyen s...@trancorp.com wrote: Mark, I tried to set that property on both ZK (I have only one ZK instance) and Solr, but it still didn't work. But I read somewhere that ZK is not really designed for keeping large data files, so this solution - increasing jute.maxbuffer (if I can implement it) - should be just temporary. Son
Re: stats cache
Hi, Yes, in the query cache. You should see it in your monitoring tool or your Solr Stats Admin page. It doesn't help if queries don't repeat or if cache settings are poor. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html SOLR Performance Monitoring - http://sematext.com/spm/index.html On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? ./zahoor
Use case for storing positions and offsets in index?
Can someone please tell me the use case for storing term positions and offsets in the index? I am trying to understand the difference between storing positions/offsets vs indexing positions/offsets. Thanks, K'Rider
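[Editor's note: for anyone finding this later, both variants can be expressed in a 4.x schema. A sketch, with a hypothetical field named body - the termVectors family stores positions/offsets per document in term vectors (used e.g. by highlighting), while storeOffsetsWithPositions (Solr 4.1+) records offsets in the postings list itself:

    <!-- positions/offsets stored in term vectors -->
    <field name="body" type="text_general" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

    <!-- offsets indexed into the postings themselves -->
    <field name="body" type="text_general" indexed="true" stored="true"
           storeOffsetsWithPositions="true"/>]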
Re: Delete from Solr Cloud 4.0 index..
bq: Will docValues help with memory usage? I'm still a bit fuzzy on all the ramifications of DocValues, but I somewhat doubt they'll result in index size savings; they _really_ help with loading the values for a field, but the end result is still the values in memory. People who know what they're talking about, _please_ correct this if I'm off base. Sure, stored field compression will help with disk space, no question. I was mostly cautioning against extrapolating from disk size to memory requirements without taking this into account. Best Erick On Tue, May 7, 2013 at 6:46 AM, Annette Newton annette.new...@servicetick.com wrote: Hi Erick, Thanks for the tip. Will docValues help with memory usage? It seemed a bit complicated to set up. The index size saving was nice because it means that potentially I could use smaller provisioned IOPS volumes, which cost less. Thanks. On 3 May 2013 18:27, Erick Erickson erickerick...@gmail.com wrote: Annette: Be a little careful with the index size savings; they really don't mean much for _searching_. The stored field compression significantly reduces the size on disk, but only for the stored data, which is only accessed when returning the top N docs. In terms of how many docs you can fit on your hardware, it's pretty irrelevant. The *.fdt and *.fdx files in your index directory contain the stored data, so when looking at the effects of various options (including compression), you can pretty much ignore these files. FWIW, Erick On Fri, May 3, 2013 at 2:03 AM, Annette Newton annette.new...@servicetick.com wrote: Thanks Shawn. I have played around with soft commits before and didn't seem to get any improvement, but with the current load testing I am doing I will give it another go. I have researched docValues and came across the fact that it would increase the index size. With the upgrade to 4.2.1 the index size has reduced by approx 33%, which is pleasing, and I don't really want to lose that saving. We do use the facet.enum method - which works really well - but I will verify that we are using it in every instance; we have numerous developers working on the product and maybe one or two have slipped through. Right from the first I upped the zkClientTimeout to 30 as I wanted to give extra time for any network blips that we experience on AWS. We only seem to drop communication on a full garbage collection though. I am coming to the conclusion that we need more shards to cope with the writes, so I will play around with adding more shards and see how I go. I appreciate you looking over our setup and the advice. Thanks again. Netty. On 2 May 2013 23:17, Shawn Heisey s...@elyograg.org wrote: On 5/2/2013 4:24 AM, Annette Newton wrote: Hi Shawn, Thanks so much for your response. We are basically very write intensive and write throughput is pretty essential to our product. Reads are sporadic and actually functioning really well. We write on average (at the moment) 8-12 batches of 35 documents per minute. But we really will be looking to write more in the future, so we need to work out how to scale Solr to cope with more volume. Schema (I have changed the names): http://pastebin.com/x1ry7ieW Config: http://pastebin.com/pqjTCa7L This is very clean. There's probably more you could remove/comment, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is a major contributor to commit speed problems. The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds.
You can use the startup command line or solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}" hostContext="solr">

General thoughts, these changes might not help this particular issue: You've got autoCommit with openSearcher=true. This is a hard commit. If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit. This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now. The memory problems might be a symptom/cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively, never go with the default of fc. You
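[Editor's note: the commit arrangement Shawn suggests would look roughly like this in the updateHandler section of solrconfig.xml; both intervals are placeholders to tune against your write load:

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>  <!-- hard commit: flush to disk, no new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>  <!-- soft commit: makes new documents visible cheaply -->
    </autoSoftCommit>]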
Re: Rearranging Search Results of a Search?
No, DocTransformers work on a single document at a time, which is pretty clear if you look at the methods you must implement. Really, you'd do yourself a favor by doing a little more research before asking questions; you might review: http://wiki.apache.org/solr/UsingMailingLists and consider that most of us are volunteers with limited time. So a little evidence that you're putting forth some effort before pinging the list would be well received. Best Erick On Tue, May 7, 2013 at 4:04 AM, Furkan KAMACI furkankam...@gmail.com wrote: Can I use Transformers for my purpose? 2013/5/3 Furkan KAMACI furkankam...@gmail.com I think this looks like what I am searching for: https://issues.apache.org/jira/browse/SOLR-4465 How about a post filter for Lucene, can it help me for my purpose? 2013/5/3 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You should use search more often :) http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue Coincidentally, what you see there happens to be a good example of a Solr component that does something behind the scenes to deliver those search results even though my original query was bad. Kind of similar to what you are after. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI furkankam...@gmail.com wrote: I know that I can use boosting at query time for a field, for a search term, at solrconfig.xml, and query elevation, so I can arrange the results of a search. However, after I get the top documents, how can I change the order of the results? Does Lucene's postfilter stand for that?
Re: Lazy load Error on UI analysis area
It looks like you have old jars in the classpath somewhere; class not found just shouldn't be happening. If this can be reproduced on a fresh install (and even better on a machine that's never had Solr installed) it would be something we'd need to pursue... Best Erick On Tue, May 7, 2013 at 6:56 AM, yriveiro yago.rive...@gmail.com wrote: Hi, I was exploring the UI interface and in the analysis section I had a lazy load error. The logs say:
INFO - 2013-05-07 11:52:06.412; org.apache.solr.core.SolrCore; [] webapp=/solr path=/admin/luke params={_=1367923926380&show=schema&wt=json} status=0 QTime=23
ERROR - 2013-05-07 11:52:06.499; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: lazy loading error
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:258)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.solr.FieldAnalysisRequestHandler'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:464)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
  ... 20 more
Caused by: java.lang.ClassNotFoundException: solr.solr.FieldAnalysisRequestHandler
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:266)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
  ... 24 more
- Best regards
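[Editor's note: the doubled prefix in 'solr.solr.FieldAnalysisRequestHandler' suggests the handler declaration in this solrconfig.xml may have the shorthand written twice. For comparison, the stock 4.x declaration is:

    <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />

where the single solr. prefix is expanded by SolrResourceLoader to the org.apache.solr packages; a doubled prefix would produce exactly this ClassNotFoundException at lazy-load time.]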
RE: Unsubscribing from JIRA
For someone like me, who wants to follow dev discussions but not JIRA, having a separate mailing list subscription for each would be ideal. The incoming mail traffic would be cut drastically (for me, I get far more irrelevant emails from JIRA than from dev). -- MJ -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, May 01, 2013 2:01 PM To: solr-user@lucene.apache.org Subject: Re: Unsubscribing from JIRA On May 1, 2013, at 19:07, johnmunir@aol.com wrote: Are you saying that because I'm subscribed to dev, which I am, that is why I'm getting JIRA mails too, and the only way I can stop JIRA mails is to unsubscribe from dev? I don't think so. I'm subscribed to other projects, both dev and user, and yet I do not receive JIRA mails. I'm pretty sure that's the case... I subscribed to dev, and got the JIRA mails. I unsubscribed from dev, and the JIRA mails stopped.
Re: Get Suggester to return same phrase as query
Hmmm, R. Muir did some work here: https://issues.apache.org/jira/browse/SOLR-3143, note that it's 4.0 or later. I haven't implemented this, but this is a common problem so if you do dig into it and get it to work (warning, I haven't a clue) it'd be a great contribution to the Wiki. Best Erick On Tue, May 7, 2013 at 10:41 AM, Rounak Jain rouna...@gmail.com wrote: Hi, I'm using the Suggester component in Solr, and if I search for iPhone 5 the suggestions never give me the same phrase, that is iPhone 5. Is there any way to alter this behaviour to return iPhone 5 as well? A backup option could be to always display what the user has entered in the UI, but I want it to be displayed *only *if there are results for it in Solr, which is only possible if Solr returns the term. Rounak
Re: Unsubscribing from JIRA
Email filters? I mean, you may have a point, but the cost of change at this moment is probably too high. Personal email filters, on the other hand, seem like an easy solution. Regards, Alex. On Tue, May 7, 2013 at 2:01 PM, johnmu...@aol.com wrote: For someone like me, who wants to follow dev discussions but not JIRA, having a separate mailing list subscription for each would be ideal. The incoming mail traffic would be cut drastically (for me, I get far more irrelevant emails from JIRA than from dev). Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Search identifier fields containing blanks
Hello, I am about to index identifier fields containing blanks (shelfmarks), e.g. G 23/60 12. The field type is set to solr.string. To get the exact matching hit (the doc with the shelfmark mentioned above) the user must quote the search term. Is there a way to omit the quotes? Best, Silvio
Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
On 5/7/2013 9:50 AM, KnightRider wrote: I see that Lucene 4.x has FieldInfo.IndexOptions that can be used to tell lucene whether to Index Documents/Frequencies/Positions/Offsets. I really don't like giving unhelpful responses like this, but I don't think there's any other way to go. This is the solr-user mailing list. Most of the end-users here (and a few of the regulars, including myself) have very little experience with Lucene, even though Solr is a Lucene application and the source code is part of Lucene. There are a number of lucene-specific discussion places available: http://lucene.apache.org/core/discussion.html Thanks, Shawn
dataimport handler
In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric
Re: solr.LatLonType type vs solr.SpatialRecursivePrefixTreeFieldType
Hi Barani, This identical question was posed at the same time on StackOverflow, and I answered it there already: http://stackoverflow.com/questions/16407110/solr-4-2-solr-latlontype-type-vs-solr-spatialrecursiveprefixtreefieldtype/16409327#16409327 ~ David On 5/6/13 12:28 PM, bbarani bbar...@gmail.com wrote: Hi, I am currently using SOLR 4.2 to index geospatial data. I have configured my geospatial field as below:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlong" type="location" indexed="true" stored="false" multiValued="true"/>

I just want to make sure that I am using the correct SOLR class for performing geospatial search, since I am not sure which of the 2 classes (LatLonType vs SpatialRecursivePrefixTreeFieldType) will be supported by future versions of SOLR. I assume latlong is an upgraded version of SpatialRecursivePrefixTreeFieldType; can someone please confirm if I am right? Thanks, Barani
Re: ConcurrentUpdateSolrServer Missing ContentType error on SOLR 4.2.1
This is resolved. I switched in the 4.2.1 jars and also corrected a mismatch between the compile and runtime JDKs; for some reason the system was overriding my JAVA_HOME setting (6.1) and running the client with a 5.0 JVM. I did not have to use setParser. I did try running the 'new' 4.2.1 SolrJ client against SOLR 3.6 and got this error in the server log: 2013-05-07 16:14:34,835 WARN [org.apache.solr.handler.XmlUpdateRequestHandler] (http-0.0.0.0-18841-Processor15) Unknown attribute doc/field/@update so I've settled for separate 3.6 and 4.2.1 versions. Your info helped a lot, thanks Shawn. DK
Storing and retrieving Objects using ByteField
Hi, I need to store and retrieve some custom Java objects using Solr, and I have used ByteField and Java serialization for this. Using the embedded Jetty server I can see the byte data, but when I use the SolrJ API to retrieve the data it is not available. Details are below. My schema:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="value" type="byte" indexed="false" stored="true" multiValued="true" omitNorms="true"/>
<fieldtype name="byte" class="solr.ByteField"/>

My query using the Jetty embedded Solr server:

http://localhost:8983/solr/collection1/select?q=id:1843921115&wt=xml&indent=true

And I can see the following results in the browser:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">1843921115</str>
    <arr name="value">
      <str>rO0ABXNyABNqYXZhLnV0aWwuQXJyYX..blahblahblah</str>
    </arr>
    <long name="_version_">1434407268842995712</long>
  </doc>
</result>

So it looks like the data is created properly. However, when I use SolrJ to retrieve this record like this:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "id:1843921115");
QueryResponse response = server.query(params);
SolrDocument doc = response.getResults().get(0);
if (doc.getFieldValues("value") == null) System.out.println("data unavailable");

I can see that doc only has two fields, id and _version_, and the field value is never available. Any suggestions on what I have done wrong? Many thanks!
Index compatibility between Solr releases.
We have fairly large (on the order of 10s of TB) indices built using Solr 3.5. We are considering migrating to Solr 4.3 and were wondering what the policy is on maintaining backward compatibility of the indices. Will 4.3 work with my 3.5 indexes? Because of the large data size, I would ideally like to move new data to 4.3 and gradually re-index all the 3.5 indices. Thanks, - Skand.
Re: Index compatibility between Solr releases.
On 5/7/2013 3:11 PM, Skand Gupta wrote: We have fairly large (on the order of 10s of TB) indices built using Solr 3.5. We are considering migrating to Solr 4.3 and were wondering what the policy is on maintaining backward compatibility of the indices. Will 4.3 work with my 3.5 indexes? Because of the large data size, I would ideally like to move new data to 4.3 and gradually re-index all the 3.5 indices. Solr 4.x will read 3.x indexes with no problem. When Solr 5.x comes out, it will read 4.x indexes, but it will not read 3.x indexes. If the 4.x server does any updates on a 3.x index, it will write new segments in the new format, and if existing segments get merged, they will be in the new format. If you do an optimize in that situation, which would take forever with terabytes of data, Solr would convert the index format. Reindexing is MUCH better, but you've already stated that as a goal, so I won't mention any more about that. Due to advances and bugfixes, you might see some unusual behavior until you reindex. This happens due to changes in the way analyzers and query parsers work as compared to the way things worked on 3.5 when you built the index. The more complicated your analyzer chains are in your schema, the more likely you are to run into this. One thing that might be of immediate concern - in 4.0 and later, the forward slash is a special query character and must be escaped with a backslash. It is safe to send this escaped character to 3.5 as well. The utility method in SolrJ for escaping queries (ClientUtils#escapeQueryChars) has been updated to include the forward slash in newer SolrJ versions. Thanks, Shawn
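[Editor's note: a small SolrJ sketch of the escaping Shawn describes; the field name and value here are invented:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// escapeQueryChars escapes +, -, :, whitespace, and (in newer SolrJ) the
// forward slash, so "docs/2013" becomes "docs\/2013"
String safe = ClientUtils.escapeQueryChars("docs/2013/report.pdf");
SolrQuery q = new SolrQuery("path:" + safe);]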
Re: dataimport handler
Using ${dih.entity_name.last_index_time} should work. Make sure you put it in quotes in your query. On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote: In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric -- Regards, Shalin Shekhar Mangar.
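[Editor's note: in context, a hypothetical delta setup for an entity named item would look roughly like this in data-config.xml; the table and column names are invented:

    <entity name="item" pk="id"
            query="SELECT * FROM item"
            deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dih.item.last_index_time}'"
            deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'">
    </entity>

${dih.item.last_index_time} resolves to that entity's own timestamp from dataimport.properties.]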
Re: stats cache
On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi, I am computing lots of stats as part of a query… looks like the Solr caching is not helping here… Does Solr cache the stats of a query? No. Neither facet counts nor the stats part of a request are cached. The query cache only caches the top N docs (plus scores, if applicable) for a given query + filters. If the whole request is identical, then you can use an HTTP caching mechanism though. -Yonik http://lucidworks.com
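[Editor's note: Solr can emit the HTTP cache validators itself. A sketch of the relevant block, which lives inside <requestDispatcher> in solrconfig.xml (the max-age value is arbitrary):

    <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=30, public</cacheControl>
    </httpCaching>

With this, repeated identical requests can be answered with 304 Not Modified or served by any HTTP cache sitting in front of Solr.]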
Re: Index compatibility between Solr releases.
Thank you Shawn. This was detailed and very helpful. Skand.
Index corrupted detection from http get command.
Hello, I'm looking for a way to detect Solr index corruption using an HTTP GET command. I've looked at the /admin/ping and /admin/luke request handlers, but am not sure whether their status provides guarantees that everything is all right. The idea is to be able to tell a load balancer to take a given Solr instance out of rotation if its index is corrupted. Thanks Michel
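[Editor's note: for the load-balancer half of this, the 4.x ping handler can at least be toggled per node; a sketch of the solrconfig.xml declaration:

    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
      <str name="healthcheckFile">server-enabled.txt</str>
    </requestHandler>

With healthcheckFile set, /admin/ping returns an error unless the file exists (and /admin/ping?action=enable or action=disable creates/removes it), which a load balancer can key off. Detecting actual segment corruption, though, is usually done offline with Lucene's CheckIndex tool rather than over HTTP.]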
Re: Storing positions and offsets vs FieldType IndexOptions DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
Thanks Shawn. I'll reach out to the Lucene discussion group. -K'Rider
Re: Questions about the performance of Solr
Thank you. However, fq is already in use. My guess is that it is slow because the data for 70 million reviews is contained in a single core. Do you have examples where performance starts to degrade beyond a certain number of documents?
Re: Search identifier fields containing blanks
: I am about to index identifier fields containing blanks (shelfmarks), e.g. G : 23/60 12 : The field type is set to solr.string. To get the exact matching hit (the doc : with the shelfmark mentioned above) the user must quote the search term. Is there : a way to omit the quotes? Whitespace has to be quoted when using the lucene QParser because it's a semantically significant character that means "end of boolean query clause". If you want to search for a literal string w/o needing any escaping, use the term QParser... {!term f=yourFieldName}G 23/60 12 Of course, if you are putting this in a URL (i.e. testing in a browser) it still needs to be URL escaped... /select?q={!term+f=yourFieldName}G+23/60+12 -Hoss
Re: Unsubscribing from JIRA
: Email filters? I mean, you may have a point, but the cost of change at : this moment is probably too high. Personal email filters, on the other : hand, seem like an easy solution. The reason for having JIRA notifications go to the dev list is that all of the comments and discussion in JIRA are the bulk of the discussion about developing Solr/Lucene. The goal is to make it easy to subscribe to one list and then be notified about everything related to the development efforts. As mentioned in a previous comment, the appropriate place to suggest policy changes to the dev list would be on the dev list -- but Alex's comment is probably what you are going to hear from most people. -Hoss
Re: Search identifier fields containing blanks
On Wed, May 8, 2013, at 02:07 AM, Chris Hostetter wrote: : I am about to index identifier fields containing blanks (shelfmarks), e.g. G : 23/60 12 : The field type is set to solr.string. To get the exact matching hit (the doc : with the shelfmark mentioned above) the user must quote the search term. Is there : a way to omit the quotes? Whitespace has to be quoted when using the lucene QParser because it's a semantically significant character that means "end of boolean query clause". If you want to search for a literal string w/o needing any escaping, use the term QParser... {!term f=yourFieldName}G 23/60 12 Of course, if you are putting this in a URL (i.e. testing in a browser) it still needs to be URL escaped... /select?q={!term+f=yourFieldName}G+23/60+12 I'm surprised you didn't offer the improvement (a technique I learned from you.. :-) ): /select?q={!term f=yourFieldName v=$productCode}&productCode=G 23/60 12 which allows you to present the code as a separate request parameter. Upayavira
Re: Scores dilemma after providing boosting with bq as same weigtage for 2 condition
ab_1eb83ef9bc0896:
0.17063755 = (MATCH) sum of:
  3.085E-4 = (MATCH) MatchAllDocsQuery, product of:
    3.085E-4 = queryNorm
  0.009742409 = (MATCH) product of:
    0.019484818 = (MATCH) sum of:
      0.016588148 = (MATCH) sum of:
        0.0034696688 = (MATCH) weight(articleTopic:Food^1.2 in 2441) [DefaultSimilarity], result of:
          0.0034696688 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
            0.0012905049 = queryWeight, product of:
              1.2 = boost
              2.6886134 = idf(docFreq=52556, maxDocs=284437)
              3.085E-4 = queryNorm
            2.6886134 = fieldWeight in 2441, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              2.6886134 = idf(docFreq=52556, maxDocs=284437)
              1.0 = fieldNorm(doc=2441)
        0.013118479 = (MATCH) weight(articleTopic:Office^1.2 in 2441) [DefaultSimilarity], result of:
          0.013118479 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
            0.0025093278 = queryWeight, product of:
              1.2 = boost
              5.2278857 = idf(docFreq=4147, maxDocs=284437)
              3.085E-4 = queryNorm
            5.2278857 = fieldWeight in 2441, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              5.2278857 = idf(docFreq=4147, maxDocs=284437)
              1.0 = fieldNorm(doc=2441)
      7.967604E-4 = (MATCH) product of:
        0.0051789423 = (MATCH) sum of:
          0.0017619515 = (MATCH) weight(subTopic:Protein in 2441) [DefaultSimilarity], result of:
            0.0017619515 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
              4.5981447E-4 = queryWeight, product of:
                3.8318748 = idf(docFreq=16753, maxDocs=284437)
                1.1999726E-4 = queryNorm
              3.8318748 = fieldWeight in 2441, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8318748 = idf(docFreq=16753, maxDocs=284437)
                1.0 = fieldNorm(doc=2441)
          0.0034169909 = (MATCH) weight(subTopic:Printers in 2441) [DefaultSimilarity], result of:
            0.0034169909 = score(doc=2441,freq=1.0 = termFreq=1.0), product of:
              5.019797E-4 = queryWeight, product of:
                0.3 = boost
                4.18326 = idf(docFreq=11789, maxDocs=284437)
                3.085E-4 = queryNorm
              4.18326 = fieldWeight in 2441, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.18326 = idf(docFreq=11789, maxDocs=284437)
                1.0 = fieldNorm(doc=2441)
    0.5 = coord(3/6)
  0.16049515 = (MATCH) FunctionQuery(0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)))+0.5)), product of:
    0.16049883 = 0.08/(3.16E-11*float(ms(const(136779840),date(pubDate)=1367847578000))+0.5)
    2500.0 = boost
    3.085E-4 = queryNorm
    ...
This is the explain description for the score coming up, but it is not in an easily understandable format... any pointer would be helpful; meanwhile I am looking into it to understand more.