Re: Use case for storing positions and offsets in index?

2013-05-09 Thread Jack Krupansky

Term positions in the index are used for phrase queries and span queries.

There is a separate concept called term vectors that maintains positions 
as well. It is most useful for highlighting, where you want to know exactly 
where a term started and ended.
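
In schema.xml terms, the distinction looks roughly like this (a sketch; the field
name "content" and the type "text_general" are placeholders, not from the original post):

<!-- positions in the postings (the default for indexed text) power phrase/span queries -->
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- the same field with term vectors plus positions and offsets stored, e.g. for highlighting -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>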


-- Jack Krupansky

-Original Message- 
From: KnightRider

Sent: Tuesday, May 07, 2013 12:58 PM
To: solr-user@lucene.apache.org
Subject: Use case for storing positions and offsets in index?

Can someone please tell me the use case for storing term positions and
offsets in the index?

I am trying to understand the difference between storing positions/offsets
vs indexing positions/offsets.

Thanks
KR



-
Thanks
-K'Rider



Re: Use case for storing positions and offsets in index?

2013-05-09 Thread Jason Hellman
Consider further that term vector data and highlighting become very useful if 
you highlight externally to Solr.  That is to say, you have the data stored 
externally and wish to re-parse positions of terms (especially synonyms) from 
the source material.  This is a (not too uncommon) technique used for extremely 
large articles, where storing the data in the Lucene index might be redundant.

On May 8, 2013, at 11:04 PM, Jack Krupansky j...@basetechnology.com wrote:

 Term positions in the index are used for phrase query and span queries.
 
 There is a separate concept called term vectors that maintains positions as 
 well. It is most useful for highlighting - you want to know exactly where a 
 term started and ended.
 
 -- Jack Krupansky
 
 -Original Message- From: KnightRider
 Sent: Tuesday, May 07, 2013 12:58 PM
 To: solr-user@lucene.apache.org
 Subject: Use case for storing positions and offsets in index?
 
 Can someone please tell me the usecase for storing term positions and offsets
 in the index?
 
 I am trying to understand the difference between storing positions/offsets
 vs indexing positions/offsets.
 
 Thanks
 KR
 
 
 
 -
 Thanks
 -K'Rider



Portability of Solr index

2013-05-09 Thread mukesh katariya
I have built a Solr index on Windows 7 Enterprise, 64-bit, and copy the index
to CentOS release 6.2, 32-bit.

The index is readable and the application is able to load data from the
index on Linux. But there are a few fields on which fq queries don't work on
Linux, while the same fq queries work on Windows.

I am in a situation where I have to prepare the index on Windows and port it
to Linux, so I need the index to be portable.

The only thing that is not working is the fq queries.

Inside the BlockTreeTermsReader seekExact API, I have enabled debugging and
System.out statements:

scanToTermLeaf: block fp=1705107 prefix=0 nextEnt=0 (of 167)
target=1RD0JIHMr9aw4RPPuS0DVzB2tKf38FfjKaEg7HsYDd7EtAOpE9FYvvj5ryB7679r4KNnlIazevPoh7qabtLhXw==
[31 52 44 30 4a 49 48 4d 72 39 61 77 34 52 50 50 75 53 30 44 56
7a 42 32 74 4b 66 33 38 46 66 6a 4b 61 45 67 37 48 73 59 44 64 37 45 74 41
4f 70 45 39 46 59 76 76 6a 35 72 79 42 37 36 37 39 72 34 4b 4e 6e 6c 49 61
7a 65 76 50 6f d a 68 37 71 61 62 74 4c 68 58 77 3d 3d] term= []

This is a term query, and these are the target bytes to match.

As per the algorithm it runs through the terms and tries to match; the 6th
term is an exact match except for a few bytes:

cycle: term 6 (of 167)
suffix=1RD0JIHMr9aw4RPPuS0DVzB2tKf38FfjKaEg7HsYDd7EtAOpE9FYvvj5ryB7679r4KNnlIazevPoh7qabtLhXw==
[31 52 44 30 4a 49 48 4d 72 39 61 77 34 52 50 50 75 53 30 44 56
7a 42 32 74 4b 66 33 38 46 66 6a 4b 61 45 67 37 48 73 59 44 64 37 45 74 41
4f 70 45 39 46 59 76 76 6a 35 72 79 42 37 36 37 39 72 34 4b 4e 6e 6c 49 61
7a 65 76 50 6f a 68 37 71 61 62 74 4c 68 58 77 3d 3d] Prefix:=0 Suffix:=89
target.offset:=0 target.length:=90 targetLimit:=89

The target contains 50 6f d a 68 37 where the indexed term has 50 6f a 68 37:
the target has CR+LF (0d 0a) where the term has only LF (0a). The test
scenario is that the index is built on Linux and I am testing the index
through the Solr API on a Windows machine.





Need solr query help

2013-05-09 Thread Abhishek tiwari
We are doing a spatial search with the following logic:
a) There are shops in a city. Each provides home delivery.
b) Each shop has a different max_delivery_distance.

Now suppose someone is searching from point P1 with radius R.

The user wants results for the shops that can deliver to him (the distance d1
between P1 and shop s1 should be less than that shop's max delivery distance
md1).

How can I implement this with a Solr spatial query?
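
One common way to express this, assuming a LatLonType location field named
store and a numeric per-shop field named max_delivery_distance (in the same
units geodist() returns, kilometers), is a frange filter over the difference
between the shop's distance from P1 and its own delivery radius:

# keep shops whose distance from P1 (pt) is within their own max_delivery_distance
&sfield=store&pt=LAT,LON
&fq={!frange u=0}sub(geodist(),max_delivery_distance)
# optionally also restrict to the user's radius R
&fq={!geofilt sfield=store pt=LAT,LON d=R}

Here pt=LAT,LON stands in for P1, and the field names are placeholders for
whatever the actual schema uses; treat this as a sketch, not a tested config.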


More Like This and Caching

2013-05-09 Thread Giammarco Schisani
Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache,
documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This
handler does not take advantage of any of the Solr caches.

However, if I issue two identical MLT requests to the same Solr instance,
the second request will execute much faster than the first request (for
example, the first request will execute in 200ms and the second request
will execute in 20ms). This makes me believe that at least one of the Solr
caches is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used,
but would you be able to confirm?

As information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani


Re: Re: Re: Re: Shard update error when using DIH

2013-05-09 Thread heaven
Thank you all, guys.
 
Your advice worked great and I don't see any errors in the Solr logs anymore.
 
Best,
Alex

Monday 29 April 2013, you wrote:


On 29 April 2013 14:55, heaven [hidden email] wrote:
 Got these errors after switching the field type to long:
 *crm-test:*
 org.apache.solr.common.SolrException: org.apache.solr.common.SolrException:
 Unknown fieldtype 'long' specified on field _version_

You have probably edited your schema. The default one has
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
towards the top of the file.
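
For reference, the stock 4.x example schema pairs that type with the _version_
field; a minimal sketch of the two entries (copy from the example schema rather
than from here):

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<field name="_version_" type="long" indexed="true" stored="true"/>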

Regards, Gora 










filter result by numFound in Result Grouping

2013-05-09 Thread Shalom Ben-Zvi Kazaz
Hello list
In one of our searches that uses Result Grouping, we need to filter the
results to only groups that have more than one document in the group, or
more specifically, to groups that have exactly two documents.
Is it possible in some way?

Thank you


Re: Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-09 Thread Jan Høydahl
My question was: When you move DIH libs to Solr's classloader (e.g. 
instanceDir/lib and refer from solrconfig.xml), and remove solr.war from 
tomcat/lib, what error msg do you then get?

Also make sure to delete the old tomcat/webapps/solr folder, just to make sure 
you're starting from scratch.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

9. mai 2013 kl. 01:54 skrev William Pierce evalsi...@hotmail.com:

 The reason I placed the solr.war in tomcat/lib was -- I guess -- because 
 that's the way I had always done it since 1.3 days.  Our tomcat instance(s) run 
 nothing other than solr - so that seemed as good a place as any.
 
 The DIH jars that I placed in the tomcat/lib are: 
 solr-dataimporthandler-4.3.0.jar and solr-dataimporthandler-extras-4.3.0.jar. 
  Are there any dependent jars that also need to be added that I am unaware of?
 
 On the specific errors - I get a stack trace noted in the first email that 
 began this thread but repeated here for convenience:
 
 ERROR - 2013-05-08 10:43:48.185; org.apache.solr.core.CoreContainer; Unable 
 to create core: collection1
 org.apache.solr.common.SolrException: 
 org/apache/solr/util/plugin/SolrCoreAware
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
   at 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.NoClassDefFoundError: 
 org/apache/solr/util/plugin/SolrCoreAware
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(Unknown Source)
   at java.security.SecureClassLoader.defineClass(Unknown Source)
   at java.net.URLClassLoader.defineClass(Unknown Source)
   at java.net.URLClassLoader.access$100(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Unknown Source)
   at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1700)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Unknown Source)
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
   at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
   at 
 org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:154)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:758)
   ... 13 more
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.solr.util.plugin.SolrCoreAware
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   ... 40 more
 ERROR - 2013-05-08 10:43:48.189; org.apache.solr.common.SolrException; 
 null:org.apache.solr.common.SolrException: Unable to create core: collection1
   at 
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
   at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at 



ColrCloud: IOException occured when talking to server at

2013-05-09 Thread heaven
Hi, I am observing lots of these errors with SolrCloud.

Here is the instruction I am using to run services:
zookeeper:
  1: cd /opt/zookeeper/
  2: sudo bin/zkServer.sh start zoo1.cfg
  3: sudo bin/zkServer.sh start zoo2.cfg
  4: sudo bin/zkServer.sh start zoo3.cfg

shards:
  1: cd /opt/solr-cluster/shard1/
     sudo su solr -c "java -Xmx4096M -DzkHost=localhost:2181,localhost:2182,localhost:2183 -Dbootstrap_confdir=./solr/conf -Dcollection.configName=Carmen -DnumShards=2 -jar start.jar etc/jetty.xml etc/jetty-logging.xml"
  2: cd ../shard2/
     sudo su solr -c "java -Xmx4096M -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar etc/jetty.xml etc/jetty-logging.xml"

replicas:
  1: cd ../replica1/
     sudo su solr -c "java -Xmx4096M -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar etc/jetty.xml etc/jetty-logging.xml"
  2: cd ../replica2/
     sudo su solr -c "java -Xmx4096M -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar etc/jetty.xml etc/jetty-logging.xml"

zoo1.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/opt/zookeeper/data/1
# the port at which the clients will connect
clientPort=2181

server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

zoo2.cfg and zoo3.cfg are the same except dataDir and client port
respectively.

Also, very often I see org.apache.solr.common.SolrException: No registered
leader was found, and lots of other errors. I just updated jetty.xml, set
org.eclipse.jetty.server.Request.maxFormContentSize to 10MB, and restarted
the cluster; half of the errors are gone, but this one about the IOException
is still here.
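
For reference, that setting is applied in the example jetty.xml roughly like
this (10 MB shown; a sketch of the stock config, not necessarily this
cluster's actual file):

<Call class="java.lang.System" name="setProperty">
  <Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
  <Arg>10485760</Arg>
</Call>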

I am re-indexing a few models (a Rails application); they have from 1,000,000
to 20,000,000 records. For indexing I have a queue (MongoDB) and a few
workers which process it in batches of 200-500 records.

All Solr and ZooKeeper instances are launched on the same server: 2 Intel
Xeon processors, 8 total cores, 32GB of memory and rapid RAID storage.

Please help me figure out what could be the reason for those errors and how
I can fix them. Please tell me if I can provide more information about the
server setup, logs, errors, etc.

Best,
Alex

http://lucene.472066.n3.nabble.com/file/n4061831/Topology.png 

Shard 1:
http://lucene.472066.n3.nabble.com/file/n4061831/Shard1.png 
Replica 1:
http://lucene.472066.n3.nabble.com/file/n4061831/Replica1.png 
Shard 2:
http://lucene.472066.n3.nabble.com/file/n4061831/Shard2.png 
Replica 2:
http://lucene.472066.n3.nabble.com/file/n4061831/Replica2.png 





Re: Solr 4.2 rollback not working

2013-05-09 Thread mark12345

So for all current versions of Solr, rollback will not work for SolrCloud? 
Will this change in the future, or will rollback always be unsupported for
SolrCloud?

This did catch me by surprise.  Should the SolrJ documentation be updated to
reflect this behavior?

http://lucene.apache.org/solr/4_3_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29






Re: ColrCloud: IOException occured when talking to server at

2013-05-09 Thread heaven
Forgot to mention: Solr is 4.2 and ZooKeeper is 3.4.5.

I do not do manual commits and prefer a softCommit every second and an
autoCommit every 3 minutes.
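
In solrconfig.xml terms that policy looks roughly like the following sketch
(times in milliseconds; not necessarily the poster's exact config):

<autoCommit>
  <maxTime>180000</maxTime>           <!-- hard commit every 3 minutes -->
  <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>             <!-- soft commit every second -->
</autoSoftCommit>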

The problem happened again: lots of errors in the logs and no description.
The cluster state changed; on shard 2 the replica became the leader and the
former leader went into recovering mode.
The errors happened when:
1. Shard1 tried to forward an update to Shard2, and this was the initial
error from Shard2:
ClusterState says we are the leader, but locally we don't think so
2. Shard2 forwarded the update to Replica2 and got:
org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

Please see attachments

Topology:
http://lucene.472066.n3.nabble.com/file/n4061839/Topology_new.png 
Shard1:
http://lucene.472066.n3.nabble.com/file/n4061839/Shard1_new.png 
Replica1:
http://lucene.472066.n3.nabble.com/file/n4061839/Replica1_new.png 
Shard2:
http://lucene.472066.n3.nabble.com/file/n4061839/Shard2_new.png 
Replica2:
http://lucene.472066.n3.nabble.com/file/n4061839/Replica2_new.png 

All the errors from the screenshots appear each time the server load gets
higher. As soon as I start a few more queue workers, the load gets higher and
the cluster becomes unstable. So I have doubts about reliability. Could any
docs be lost during these errors, or should I just ignore them?

I understand that 4 Solr instances and 3 ZooKeepers could be too many for a
single machine; there could be not enough resources, etc. But it still should
not cause anything like that. The worst scenario should be a timeout error
when Solr is not responding, and my queue processors could handle that and
resend the request after a while.





Re: ColrCloud: IOException occured when talking to server at

2013-05-09 Thread heaven
Zookeeper log:
2013-05-09 03:03:07,177 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Follower@118] - Got zxid 0x20001 expected 0x1
2013-05-09 03:36:52,918 [myid:3] - ERROR [CommitProcessor:3:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
        at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
        at org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2013-05-09 03:36:52,928 [myid:3] - ERROR [CommitProcessor:3:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92)
        at org.apache.zookeeper.server.DataTree.setData(DataTree.java:620)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
        at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:965)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2013-05-09 04:26:04,790 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x23e88bdaf81, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:679)
2013-05-09 04:27:04,002 [myid:3] - ERROR [CommitProcessor:3:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
        at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:120)
        at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:92)
        at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:591)
        at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:966)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:818)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
        at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:965)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2013-05-09 04:36:00,485 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 

Re: Solr 4.2 rollback not working

2013-05-09 Thread Mark Miller
At the least it should throw an exception if you try rollback with SolrCloud - 
though now there is discussion about removing it entirely.

But yes, it's not supported and there are no real plans to support it.

- Mark

On May 9, 2013, at 7:21 AM, mark12345 marks1900-pos...@yahoo.com.au wrote:

 
 So for all current versions of Solr, rollback will not work for SolrCloud? 
 Will this change in the future, or will rollback always be unsupported for
 SolrCloud?
 
 This did catch me by surprise.  Should the SolrJ documentation be updated to
 reflect this behavior?
 
 http://lucene.apache.org/solr/4_3_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29
 
 
 
 



Re: ColrCloud: IOException occured when talking to server at

2013-05-09 Thread heaven
I can confirm this leads to data loss. I have 1,217,427 records in the
database and only 1,217,216 indexed, which means Solr gave a successful
response and then did not add some documents to the index.

It seems like SolrCloud is not a production-ready solution; it would be good
if there were a warning about that in the Solr wiki.





Re: Solr 4.3 fails in startup when dataimporthandler declaration is included in solrconfig.xml

2013-05-09 Thread William Pierce
I got this to work (thanks, Jan, and all).  It turns out that the DIH jars need 
to be included explicitly, either by specifying them in solrconfig.xml or by 
placing them in a default path under solr.home.  I placed these jars in 
instanceDir/lib and it worked.  Previously I had reported it as not working; 
that was because I had mistakenly left a copy of the jars under tomcat/lib.
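
For anyone hitting the same thing, the solrconfig.xml alternative looks roughly
like the stock example config; the dir path is relative to the core's
instanceDir and will differ per layout, so treat this as a sketch:

<lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />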


Bill

-Original Message- 
From: Jan Høydahl

Sent: Thursday, May 09, 2013 2:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3 fails in startup when dataimporthandler declaration is 
included in solrconfig.xml


My question was: When you move DIH libs to Solr's classloader (e.g. 
instanceDir/lib and refer from solrconfig.xml), and remove solr.war from 
tomcat/lib, what error msg do you then get?


Also make sure to delete the old tomcat/webapps/solr folder, just to make 
sure you're starting from scratch.


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

9. mai 2013 kl. 01:54 skrev William Pierce evalsi...@hotmail.com:

The reason I placed the solr.war in tomcat/lib was -- I guess -- because 
that's the way I had always done it since 1.3 days.  Our tomcat instance(s) 
run nothing other than solr - so that seemed as good a place as any.


The DIH jars that I placed in the tomcat/lib are: 
solr-dataimporthandler-4.3.0.jar and 
solr-dataimporthandler-extras-4.3.0.jar.  Are there any dependent jars 
that also need to be added that I am unaware of?


On the specific errors - I get a stack trace noted in the first email that 
began this thread but repeated here for convenience:


ERROR - 2013-05-08 10:43:48.185; org.apache.solr.core.CoreContainer; 
Unable to create core: collection1
org.apache.solr.common.SolrException: 
org/apache/solr/util/plugin/SolrCoreAware

  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
  at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)

  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/util/plugin/SolrCoreAware

  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(Unknown Source)
  at java.security.SecureClassLoader.defineClass(Unknown Source)
  at java.net.URLClassLoader.defineClass(Unknown Source)
  at java.net.URLClassLoader.access$100(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Unknown Source)
  at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1700)

  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Unknown Source)
  at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
  at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)

  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:592)
  at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:154)

  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:758)
  ... 13 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.util.plugin.SolrCoreAware

  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  ... 40 more
ERROR - 2013-05-08 10:43:48.189; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: Unable to 

Re: Portability of Solr index

2013-05-09 Thread Alexandre Rafalovitch
What is the query/term you are looking for? I wonder if the difference
is due to newline treatment on different platforms.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, May 9, 2013 at 1:49 AM, mukesh katariya
mukesh.katar...@e-zest.in wrote:
 I have built a SOLR Index on Windows 7 Enterprise, 64 Bit. I copy the index
 to Centos release 6.2, 32 Bit OS.

 The index is readable and the application is able to load data from the
 index on Linux. But there are a few fields on which FQ Queries dont work on
 Linux , but same FQ Query work on windows.

 I have a situation where in i have to prepare index on windows and port it
 on Linux. I need the index to be portable.

 The only thing which is not working is the FQ Queries.

 Inside the BlockTreeTermsReader seekExact API, I have enabled debugging and
 System.out statements:

 scanToTermLeaf: block fp=1705107 prefix=0 nextEnt=0 (of 167)
 target=1RD0JIHMr9aw4RPPuS0DVzB2tKf38FfjKaEg7HsYDd7EtAOpE9FYvvj5ryB7679r4KNnlIazevPoh7qabtLhXw==
 [31 52 44 30 4a 49 48 4d 72 39 61 77 34 52 50 50 75 53 30 44 56
 7a 42 32 74 4b 66 33 38 46 66 6a 4b 61 45 67 37 48 73 59 44 64 37 45 74 41
 4f 70 45 39 46 59 76 76 6a 35 72 79 42 37 36 37 39 72 34 4b 4e 6e 6c 49 61
 7a 65 76 50 6f d a 68 37 71 61 62 74 4c 68 58 77 3d 3d] term= []

 This is a term query, and these are the target bytes to match.

 As per the algorithm it runs through the terms and tries to match; the 6th
 term is an exact match except for a few bytes:

 cycle: term 6 (of 167)
 suffix=1RD0JIHMr9aw4RPPuS0DVzB2tKf38FfjKaEg7HsYDd7EtAOpE9FYvvj5ryB7679r4KNnlIazevPoh7qabtLhXw==
 [31 52 44 30 4a 49 48 4d 72 39 61 77 34 52 50 50 75 53 30 44 56
 7a 42 32 74 4b 66 33 38 46 66 6a 4b 61 45 67 37 48 73 59 44 64 37 45 74 41
 4f 70 45 39 46 59 76 76 6a 35 72 79 42 37 36 37 39 72 34 4b 4e 6e 6c 49 61
 7a 65 76 50 6f a 68 37 71 61 62 74 4c 68 58 77 3d 3d] Prefix:=0 Suffix:=89
 target.offset:=0 target.length:=90 targetLimit:=89

 The target contains 50 6f d a 68 37 where the indexed term has 50 6f a 68 37.
 The test scenario is that the index is built on Linux and I am testing the
 index through the Solr API on a Windows machine.





Fuzzy searching documents over multiple fields using Solr

2013-05-09 Thread britske
Not sure if this has ever come up (or perhaps it is even implemented without me
knowing), but I'm interested in doing fuzzy search over multiple fields
using Solr.

What I mean is the ability to return documents based on some 'distance
calculation' without documents having to match 100% of the query.

Use case: a user is searching for a TV with a couple of filters selected. No
TV matches all filters. How do we come up with a bunch of suggestions that
match the selected filters as closely as possible? The hard part is to
determine what 'closely' means in this context, etc.

This relates to (approximate) nearest neighbors, k-d trees, etc. Has anyone
ever tried to do something similar? Any plugins, etc.? Or reasons Solr/Lucene
would/wouldn't be the correct system to build on?

Thanks





Re: Fuzzy searching documents over multiple fields using Solr

2013-05-09 Thread Jack Krupansky
A simple OR boolean query will boost documents that have more matches. You 
can also selectively boost individual OR terms to control importance. And use 
an AND (or a required '+') for the required terms, like tv.
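
A sketch of such a query (the field names are invented for illustration, not
from the original posts):

q=+tv brand:samsung^2.0 size:42^1.5 hdmi_ports:[2 TO *]

With the default OR operator, tv is required (+) while the boosted optional
clauses raise the score of closer matches without excluding anything.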


-- Jack Krupansky
-Original Message- 
From: britske

Sent: Thursday, May 09, 2013 11:21 AM
To: solr-user@lucene.apache.org
Subject: Fuzzy searching documents over multiple fields using Solr

Not sure if this has ever come up (or perhaps even implemented without me
knowing) , but I'm interested in doing Fuzzy search over multiple fields
using Solr.

What I mean is the ability to returns documents based on some 'distance
calculation' without documents having to match 100% to the query.

Usecase: a user is searching for a tv with a couple of filters selected. No
tv matches all filters. How to come up with a bunch of suggestions that
match the selected filters as closely as possible? The hard part is to
determine what 'closely' means in this context, etc.

This relates to (approximate) nearest neighbor, Kd-trees, etc. Has anyone
ever tried to do something similar? any plugins, etc? or reasons Solr/Lucene
would/wouldn't be the correct system to build on?

Thanks






4.3 logging setup

2013-05-09 Thread richardg
On all prior index versions I set up my logging via the logging.properties file
in /usr/local/tomcat/conf; it looked like this:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the License); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an AS IS BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

handlers = 1catalina.org.apache.juli.FileHandler,
2localhost.org.apache.juli.FileHandler,
3manager.org.apache.juli.FileHandler,
4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler,
java.util.logging.ConsoleHandler


# Handler specific properties.
# Describes specific configuration info for Handlers.


1catalina.org.apache.juli.FileHandler.level = WARNING
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.

2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.

3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
3manager.org.apache.juli.FileHandler.prefix = manager.

4host-manager.org.apache.juli.FileHandler.level = FINE
4host-manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
4host-manager.org.apache.juli.FileHandler.prefix = host-manager.

java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter =
java.util.logging.SimpleFormatter



# Facility specific properties.
# Provides extra control for each logger.


org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers =
2localhost.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers
= 3manager.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers
= 4host-manager.org.apache.juli.FileHandler

# For example, set the org.apache.catalina.util.LifecycleBase logger to log
# each component that extends LifecycleBase changing state:
#org.apache.catalina.util.LifecycleBase.level = FINE

# To see debug messages in TldLocationsCache, uncomment the following line:
#org.apache.jasper.compiler.TldLocationsCache.level = FINE

After upgrading to 4.3 today, the files defined above aren't being logged to.
I know things have changed for logging with 4.3, but how can I get it set up
like it was before?





Re: Fuzzy searching documents over multiple fields using Solr

2013-05-09 Thread Geert-Jan Brits
I didn't mention it but I'd like individual fields to contribute to the
overall score on a continuum instead of 1 (match) and 0 (no match), which
will lead to more fine-grained scoring.

A contrived example: all other things being equal, a 40-inch TV should score
higher than a 38-inch TV when searching for a 42-inch TV.
This is based on some distance modeling on the 'size' field (e.g.
score(42,40) = 0.6 and score(42,38) = 0.4).
Other qualitative fields may be modeled in the same way (e.g. restaurants
with a field 'price' with values 'budget', 'mid-range', 'expensive', ...).

Any way to incorporate this?



2013/5/9 Jack Krupansky j...@basetechnology.com

 A simple OR boolean query will boost documents that have more matches.
 You can also selectively boost individual OR terms to control importance.
 And do and AND for the required terms, like tv.

 -- Jack Krupansky
 -Original Message- From: britske
 Sent: Thursday, May 09, 2013 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Fuzzy searching documents over multiple fields using Solr


 Not sure if this has ever come up (or perhaps even implemented without me
 knowing) , but I'm interested in doing Fuzzy search over multiple fields
 using Solr.

 What I mean is the ability to returns documents based on some 'distance
 calculation' without documents having to match 100% to the query.

 Usecase: a user is searching for a tv with a couple of filters selected. No
 tv matches all filters. How to come up with a bunch of suggestions that
 match the selected filters as closely as possible? The hard part is to
 determine what 'closely' means in this context, etc.

 This relates to (approximate) nearest neighbor, Kd-trees, etc. Has anyone
 ever tried to do something similar? any plugins, etc? or reasons
 Solr/Lucene
 would/wouldn't be the correct system to build on?

 Thanks






Re: Use case for storing positions and offsets in index?

2013-05-09 Thread KnightRider
Thanks Jack & Jason



-
Thanks
-K'Rider


Grouping search results by field returning all search results for a given query

2013-05-09 Thread Luis Carlos Guerrero Covo
Hi,

I'm using solr to maintain an index of items that belong to different
companies. I want the search results to be returned in a way that is fair
to all companies, thus I wish to group the results such that each company
has 1 item in each group, and the groups of results should be returned
sorted by score.

example:
--

20 companies

first 100 results

1-20 results - (company1 highest score item, company2 highest score item,
etc..)
20-40 results - (company1 second highest score item, company 2 second
highest score item, etc..)
...

 --

I'm trying to use the field collapsing feature but I have only been able to
create the first group of results by using
group.limit=1,group.field=companyid. If I raise the group.limit value, I
would be violating the 'fairness rule' because more than one result of a
company would be returned in the first group of results.

Can I achieve the desired search result using SOLR, or do I have to look at
other options?

thank you,

Luis Guerrero


Re: Fuzzy searching documents over multiple fields using Solr

2013-05-09 Thread Jack Krupansky
You can use function queries to boost documents as well. Sorry, but it can 
get messy to figure out.


See:
http://wiki.apache.org/solr/FunctionQuery

See also the edismax bf parameter:
http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29
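
As a concrete sketch for the TV-size example from earlier in the thread (the
field name size and the constants are illustrative): recip(x,m,a,b) computes
a/(m*x+b), so the boost decays smoothly as size moves away from 42.

defType=edismax&q=tv&bf=recip(abs(sub(size,42)),1,10,10)

Here |size-42|=0 yields a boost of 1.0, 2 inches off yields about 0.83, and
4 inches off about 0.71.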

-- Jack Krupansky

-Original Message- 
From: Geert-Jan Brits

Sent: Thursday, May 09, 2013 12:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Fuzzy searching documents over multiple fields using Solr

I didn't mention it but I'd like individual fields to contribute to the
overall score on a continuum instead of 1 (match) and 0 (no match), which
will lead to more fine-grained scoring.

A contrived example: all other things equal a tv of 40 inch should score
higher than a 38 inch tv when searching for a 42 inch tv.
This based on some distance modeling on the 'size' -field. (eg:
score(42,40) = 0.6 and score(42,38) = 0,4).
Other qualitative fields may be modeled in the same way: (e.g: restaurants
with field 'price' with values: 'budget','mid-range', 'expensive', ...)

Any way to incorporate this?



2013/5/9 Jack Krupansky j...@basetechnology.com


A simple OR boolean query will boost documents that have more matches.
You can also selectively boost individual OR terms to control importance.
And do and AND for the required terms, like tv.

-- Jack Krupansky
-Original Message- From: britske
Sent: Thursday, May 09, 2013 11:21 AM
To: solr-user@lucene.apache.org
Subject: Fuzzy searching documents over multiple fields using Solr


Not sure if this has ever come up (or perhaps even implemented without me
knowing) , but I'm interested in doing Fuzzy search over multiple fields
using Solr.

What I mean is the ability to returns documents based on some 'distance
calculation' without documents having to match 100% to the query.

Usecase: a user is searching for a tv with a couple of filters selected. 
No

tv matches all filters. How to come up with a bunch of suggestions that
match the selected filters as closely as possible? The hard part is to
determine what 'closely' means in this context, etc.

This relates to (approximate) nearest neighbor, Kd-trees, etc. Has anyone
ever tried to do something similar? any plugins, etc? or reasons
Solr/Lucene
would/wouldn't be the correct system to build on?

Thanks



--
View this message in context: http://lucene.472066.n3.**
nabble.com/Fuzzy-searching-**documents-over-multiple-**
fields-using-Solr-tp4061867.**htmlhttp://lucene.472066.n3.nabble.com/Fuzzy-searching-documents-over-multiple-fields-using-Solr-tp4061867.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
Luis,

I am presuming you do not have an overarching grouping value here, and simply 
wish to show a standard search result with 1 item per company.

You should be able to accomplish your second page of desired items (the second 
item from each of your 20 represented companies) by using the group.offset 
parameter.  This will shift the position in the returned array of documents to 
the value provided.

Thus:

group.limit=1&group.field=companyid&group.offset=1

…would return the second item in each companyid group matching your current 
query.
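
Spelled out as a full request (core name and query are placeholders), page N
of such a 'fair' result list is simply group.offset=N-1:

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=companyid&group.limit=1&group.offset=1&sort=score+desc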

Jason

On May 9, 2013, at 10:30 AM, Luis Carlos Guerrero Covo 
lcguerreroc...@gmail.com wrote:

 Hi,
 
 I'm using solr to maintain an index of items that belong to different
 companies. I want the search results to be returned in a way that is fair
 to all companies, thus I wish to group the results such that each company
 has 1 item in each group, and the groups of results should be returned
 sorted by score.
 
 example:
 --
 
 20 companies
 
 first 100 results
 
 1-20 results - (company1 highest score item, company2 highest score item,
 etc..)
 20-40 results - (company1 second highest score item, company 2 second
 highest score item, etc..)
 ...
 
 --
 
 I'm trying to use the field collapsing feature but I have only been able to
 create the first group of results by using
 group.limit=1,group.field=companyid. If I raise the group.limit value, I
 would be violating the 'fairness rule' because more than one result of a
 company would be returned in the first group of results.
 
 Can I achieve the desired search result using SOLR, or do I have to look at
 other options?
 
 thank you,
 
 Luis Guerrero



RE: More Like This and Caching

2013-05-09 Thread David Parks
I'm not the expert here, but perhaps what you're noticing is actually the
OS's disk cache. The actual solr index isn't cached by solr, but as you read
the blocks off disk the OS disk cache probably did cache those blocks for
you. On the 2nd run the index blocks were read out of memory.

There was a very extensive discussion on this list not long back titled
"Re: SolrCloud loadbalancing, replication, and failover"; look that thread up
and you'll get a lot of in-depth discussion of the topic.

David


-Original Message-
From: Giammarco Schisani [mailto:giamma...@schisani.com] 
Sent: Thursday, May 09, 2013 2:59 PM
To: solr-user@lucene.apache.org
Subject: More Like This and Caching

Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache,
documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This
handler does not take advantage of any of the Solr caches.

However, if I issue two identical MLT requests to the same Solr instance,
the second request will execute much faster than the first request (for
example, the first request will execute in 200ms and the second request will
execute in 20ms). This makes me believe that at least one of the Solr caches
is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used, but
would you be able to confirm?

As information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani



Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
From:

http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0

Slf4j/logging jars are no longer included in the Solr webapp. All logging jars 
are now in example/lib/ext. Changing logging impls is now as easy as updating 
the jars in this folder with those necessary for the logging impl you would 
like. If you are using another webapp container, these jars will need to go in 
the corresponding location for that container. In conjunction, the 
dist-excl-slf4j and dist-war-excl-slf4 build targets have been removed since 
they are redundent. See the Slf4j documentation, SOLR-3706, and SOLR-4651 for 
more details.

It should just require you provide your preferred logging jars within an 
appropriate classpath. 


On May 9, 2013, at 9:24 AM, richardg richa...@dvdempire.com wrote:

 On all prior index version I setup my log via the logging.properties file in
 /usr/local/tomcat/conf, it looked like this:
 
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the License); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an AS IS BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
 handlers = 1catalina.org.apache.juli.FileHandler,
 2localhost.org.apache.juli.FileHandler,
 3manager.org.apache.juli.FileHandler,
 4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
 
 .handlers = 1catalina.org.apache.juli.FileHandler,
 java.util.logging.ConsoleHandler
 
 
 # Handler specific properties.
 # Describes specific configuration info for Handlers.
 
 
 1catalina.org.apache.juli.FileHandler.level = WARNING
 1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
 1catalina.org.apache.juli.FileHandler.prefix = catalina.
 
 2localhost.org.apache.juli.FileHandler.level = FINE
 2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
 2localhost.org.apache.juli.FileHandler.prefix = localhost.
 
 3manager.org.apache.juli.FileHandler.level = FINE
 3manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
 3manager.org.apache.juli.FileHandler.prefix = manager.
 
 4host-manager.org.apache.juli.FileHandler.level = FINE
 4host-manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
 4host-manager.org.apache.juli.FileHandler.prefix = host-manager.
 
 java.util.logging.ConsoleHandler.level = FINE
 java.util.logging.ConsoleHandler.formatter =
 java.util.logging.SimpleFormatter
 
 
 
 # Facility specific properties.
 # Provides extra control for each logger.
 
 
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers =
 2localhost.org.apache.juli.FileHandler
 
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level
 = INFO
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers
 = 3manager.org.apache.juli.FileHandler
 
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level
 = INFO
 org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers
 = 4host-manager.org.apache.juli.FileHandler
 
 # For example, set the org.apache.catalina.util.LifecycleBase logger to log
 # each component that extends LifecycleBase changing state:
 #org.apache.catalina.util.LifecycleBase.level = FINE
 
 # To see debug messages in TldLocationsCache, uncomment the following line:
 #org.apache.jasper.compiler.TldLocationsCache.level = FINE
 
 After upgrading to 4.3 today the files defined aren't being logged to.  I
 know things have changed for logging w/ 4.3 but how can I get it setup like
 it was before?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: More Like This and Caching

2013-05-09 Thread Jason Hellman
Purely from empirical observation, both the DocumentCache and QueryResultCache 
are being populated and reused in reloads of a simple MLT search.  You can see 
in the cache inserts how much extra-curricular activity is happening to 
populate the MLT data by how many inserts and lookups occur on the first load. 

(lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis )

http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

There is no activity in the filterCache, fieldCache, or fieldValueCache - and 
that makes plenty of sense.
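
For anyone wanting to reproduce the observation: the cache insert/lookup
counters are visible on the admin stats page (admin/stats.jsp on 3.x); on 4.x
the same numbers can be fetched as, for example (core name assumed):

http://localhost:8983/solr/collection1/admin/mbeans?stats=true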

On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:

 I'm not the expert here, but perhaps what you're noticing is actually the
 OS's disk cache. The actual solr index isn't cached by solr, but as you read
 the blocks off disk the OS disk cache probably did cache those blocks for
 you. On the 2nd run the index blocks were read out of memory.
 
 There was a very extensive discussion on this list not long back titled:
 Re: SolrCloud loadbalancing, replication, and failover look that thread up
 and you'll get a lot of in-depth on the topic.
 
 David
 
 
 -Original Message-
 From: Giammarco Schisani [mailto:giamma...@schisani.com] 
 Sent: Thursday, May 09, 2013 2:59 PM
 To: solr-user@lucene.apache.org
 Subject: More Like This and Caching
 
 Hi all,
 
 Could anybody explain which Solr cache (e.g. queryResultCache,
 documentCache, fieldCache, etc.) can be used by the More Like This handler?
 
 One of my colleagues had previously suggested that the More Like This
 handler does not take advantage of any of the Solr caches.
 
 However, if I issue two identical MLT requests to the same Solr instance,
 the second request will execute much faster than the first request (for
 example, the first request will execute in 200ms and the second request will
 execute in 20ms). This makes me believe that at least one of the Solr caches
 is being used by the More Like This handler.
 
 I think the documentCache is the cache that is most likely being used, but
 would you be able to confirm?
 
 As information, I am currently using Solr version 3.6.1.
 
 Kind regards,
 Giammarco Schisani
 



Re: 4.3 logging setup

2013-05-09 Thread richardg
Thanks for responding.  My issue is I've never changed anything w/ logging, I
have always used the built-in Juli.  I've never messed w/ any jar files,
just had to edit the logging.properties file.  I don't know where I would get
the jars for juli or where to put them, if that is what is needed.  I had
read what you posted before; I just can't make any sense of it.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875p4061901.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-05-09 Thread Sergiu Bivol
I have a similar problem. With 5 shards, querying 500K rows fails, but 400K is 
fine.
Querying individual shards for 1.5 million rows works.
All solr instances are v4.2.1 and running on separate Ubuntu VMs.
It is not random, and can always be reproduced by adding rows=500000 to a query 
where numFound is > 500K

Is this a configuration issue, where some setting can be increased?



Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
If you nab the jars in example/lib/ext and place them within the appropriate 
folder in Tomcat (and this will somewhat depend on which version of Tomcat you 
are using…let's presume tomcat/lib as a brute-force approach) you should be 
back in business.

On May 9, 2013, at 11:41 AM, richardg richa...@dvdempire.com wrote:

 Thanks for responding.  My issue is I've never changed anything w/ logging, I
 have always used the built-in Juli.  I've never messed w/ any jar files,
 just had to edit the logging.properties file.  I don't know where I would get
 the jars for juli or where to put them, if that is what is needed.  I had
 read what you posted before; I just can't make any sense of it.
 
 Thanks
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875p4061901.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: 4.3 logging setup

2013-05-09 Thread Jan Høydahl
Hi,

First of all, to set up logging using Log4J (which is really better than JULI), 
copy all the jars from Jetty's lib/ext over to Tomcat's lib folder; see 
instructions here: http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above. 
You can place your log4j.properties in tomcat/lib as well so it will be read 
automatically.
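
For reference, a minimal log4j.properties sketch along those lines (the file 
location, sizes, and pattern are just examples):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${catalina.base}/logs/solr.log
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n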

Now when you start your Tomcat, you will find a file tomcat/logs/solr.log in a 
nicer format than before, with one log entry per line instead of two, and 
automatic log file rotation and cleaning.

However, if you'd like to switch to Java Util logging, do the following:

1. Download slf4j version 1.6.6 (since that's what we use). 
http://www.slf4j.org/dist/slf4j-1.6.6.zip
2. Unpack, and pull out the file slf4j-jdk14-1.6.6.jar
3. Remove tomcat/lib/slf4j-log4j12-1.6.6.jar and copy slf4j-jdk14-1.6.6.jar to 
tomcat/lib instead
4. Use your old logging.properties (either place it on the classpath or point 
to it with a startup option)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

9. mai 2013 kl. 20:41 skrev richardg richa...@dvdempire.com:

 Thanks for responding.  My issue is I've never changed anything w/ logging, I
 have always used the built-in Juli.  I've never messed w/ any jar files,
 just had to edit the logging.properties file.  I don't know where I would get
 the jars for juli or where to put them, if that is what is needed.  I had
 read what you posted before; I just can't make any sense of it.
 
 Thanks
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875p4061901.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-05-09 Thread Sergiu Bivol
Adding the original message.

Thank you
Sergiu

-Original Message-
From: Sergiu Bivol [mailto:sbi...@blackberry.com]
Sent: Thursday, May 09, 2013 2:50 PM
To: solr-user@lucene.apache.org
Subject: RE: Invalid version (expected 2, but 60) or the data in not in 
'javabin' format

I have a similar problem. With 5 shards, querying 500K rows fails, but 400K is 
fine.
Querying individual shards for 1.5 million rows works.
All solr instances are v4.2.1 and running on separate Ubuntu VMs.
It is not random, and can always be reproduced by adding rows=500000 to a query 
where numFound is > 500K

Is this a configuration issue, where some setting can be increased?

-
From: Ahmet Arslan iori...@yahoo.com
Subject: Invalid version (expected 2, but 60) or the data in not in 'javabin' 
format
Date: Mon, 21 Jan 2013 22:35:10 GMT

Hi,

I was hitting the following exception when doing distributed search.
I am faceting on an int field named contentID. For some queries it gives 
this error; for other queries it works just fine.

localhost:8080/solr/kanu/select/?shards=localhost:8080/solr/rega,localhost:8080/solr/kanu&indent=true&q=karar&start=0&rows=15&hl=false&wt=xml&facet=true&facet.limit=-1&facet.sort=false&json.nl=arrarr&fq=isXml:false&mm=100%&facet.field=contentID&f.contentID.facet.mincount=2

Same search URL works fine for cores (kanu and rega) individually.

Plus if I use rega core as base search URL it works too. e.g.
localhost:8080/solr/rega/select/?shards=localhost:8080...

I see that the rega core has lots of unique values for the contentID field,
so my conclusion is: if a shard response is too big, this happens.

This is a bad usage of faceting and I will remove faceting on that field since 
it was added accidentally.

I still want to share the stack traces since the error message is somewhat 
misleading.

Jan 21, 2013 10:36:53 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1701)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more


When I add "shards.tolerant=true" the exception becomes:

Jan 21, 2013 10:51:51 

Re: 4.3 logging setup

2013-05-09 Thread richardg
I had already copied those jars over and gotten the app to start (it wouldn't
start without them).  I was able to configure slf4j/log4j logging using the
log4j.properties in the /lib folder, but I don't want to switch.  I have
alerts set on the wording that the juli logging puts out, but everything I've
tried to get it to work has failed.  I have older indexes (4.2 and under)
running on the server that are still able to log correctly; it is just 4.3.
I am obviously missing something.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875p4061907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.3 logging setup

2013-05-09 Thread Shawn Heisey

On 5/9/2013 12:54 PM, Jason Hellman wrote:

If you nab the jars in example/lib/ext and place them within the appropriate 
folder in Tomcat (and this will somewhat depend on which version of Tomcat you 
are using…let's presume tomcat/lib as a brute-force approach) you should be 
back in business.

On May 9, 2013, at 11:41 AM, richardg richa...@dvdempire.com wrote:


Thanks for responding.  My issue is I've never changed anything w/ logging, I
have always used the built-in Juli.  I've never messed w/ any jar files,
just had to edit the logging.properties file.  I don't know where I would get
the jars for juli or where to put them, if that is what is needed.  I had
read what you posted before; I just can't make any sense of it.


I've been looking into this a little bit. Tomcat's juli is an Apache 
reimplementation of java.util.logging.  Solr uses SLF4J, but before 4.3, 
Solr's slf4j was bound to java.util.logging ... which I would bet was 
being intercepted by Tomcat and sent through the juli config.


With 4.3, SLF4J is bound to log4j by default.  If you stick with this 
binding, then you need to configure log4j instead of juli.


Richard, you could go back to java.util.logging (the way earlier 
versions had it) with this procedure, and this will probably restore the 
ability to configure logging with juli.


- Delete the following jars from Solr's example lib/ext:
-- jul-to-slf4j-1.6.6.jar
-- log4j-1.2.16.jar
-- slf4j-log4j12-1.6.6.jar
- Download slf4j version 1.6.6 from their website.
- Copy the following jars from the download into lib/ext:
-- log4j-over-slf4j-1.6.6.jar
-- slf4j-jdk14-1.6.6.jar
- Copy all jars in lib/ext to tomcat's lib directory.

http://www.slf4j.org/dist/
http://www.slf4j.org

Alternatively, you could copy the jars from lib/ext to a directory in 
your classpath, or add Solr's lib/ext to your classpath.


If you want to upgrade to the newest slf4j, you can, you'll just have to 
use the new version for all slf4j jars.


Please let me know whether this worked for you so we can get a proper 
procedure up on the wiki.


Thanks,
Shawn



Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Luis Carlos Guerrero Covo
Thank you for the prompt reply, Jason. The group.offset parameter is working
for me; now I can iterate through all items for each company. The problem
I'm having right now is pagination. Is there a way this can be
implemented out of the box with Solr?

Before, I was using group.main=true for easy pagination of results, but
it seems like I'll have to ditch that and use the standard grouping format
returned by Solr for the group.offset parameter to be useful. Since all
groups don't have the same number of items, I'll have to carefully
calculate the results that should be returned for each page of 20 items and
probably make several Solr calls per page rendered.


On Thu, May 9, 2013 at 1:07 PM, Jason Hellman 
jhell...@innoventsolutions.com wrote:

 Luis,

 I am presuming you do not have an overarching grouping value here…and
 simply wish to show a standard search result that shows 1 item per company.

 You should be able to accomplish your second page of desired items (the
 second item from each of your 20 represented companies) by using the
 group.offset parameter.  This will shift the position in the returned array
 of documents to the value provided.

 Thus:

 group.limit=1&group.field=companyid&group.offset=1

 …would return the second item in each companyid group matching your
 current query.

 Jason

 On May 9, 2013, at 10:30 AM, Luis Carlos Guerrero Covo 
 lcguerreroc...@gmail.com wrote:

  Hi,
 
  I'm using solr to maintain an index of items that belong to different
  companies. I want the search results to be returned in a way that is fair
  to all companies, thus I wish to group the results such that each company
  has 1 item in each group, and the groups of results should be returned
  sorted by score.
 
  example:
  --
 
  20 companies
 
  first 100 results
 
  1-20 results - (company1 highest score item, company2 highest score item,
  etc..)
  20-40 results - (company1 second highest score item, company 2 second
  highest score item, etc..)
  ...
 
  --
 
  I'm trying to use the field collapsing feature but I have only been able
 to
  create the first group of results by using
  group.limit=1,group.field=companyid. If I raise the group.limit value, I
  would be violating the 'fairness rule' because more than one result of a
  company would be returned in the first group of results.
 
  Can I achieve the desired search result using SOLR, or do I have to look
 at
  other options?
 
  thank you,
 
  Luis Guerrero




-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
I would think pagination is resolved by obtaining the numFound value for your 
returned groups.  If you have numFound=6 then each page of 20 items (one item 
per company) would imply a total of 6 pages.

You'll have to arbitrate for the variance here…but it would seem to me you need 
as many pages as the highest value in the numFound field for all groups.  
This shouldn't require requerying but will definitely require a little 
intelligence on the web app to handle the groups that are less than the largest 
size.
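
In SolrJ terms, a sketch of that page-count calculation might look like this 
(the field name and query are assumed from earlier in the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupPageCount {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.set("group", true);
        q.set("group.field", "companyid");
        q.set("group.limit", 1);
        q.set("group.offset", 0);   // page N of the UI maps to group.offset=N-1
        q.setRows(20);              // up to 20 companies (groups) per page
        QueryResponse rsp = server.query(q);
        long pages = 0;             // you need as many pages as the largest group
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
            for (Group g : cmd.getValues()) {
                pages = Math.max(pages, g.getResult().getNumFound());
            }
        }
        System.out.println("total pages: " + pages);
    }
}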

Hope that's useful!

On May 9, 2013, at 12:23 PM, Luis Carlos Guerrero Covo 
lcguerreroc...@gmail.com wrote:

 Thank you for the prompt reply, Jason. The group.offset parameter is working
 for me; now I can iterate through all items for each company. The problem
 I'm having right now is pagination. Is there a way this can be
 implemented out of the box with Solr?
 
 Before, I was using group.main=true for easy pagination of results, but
 it seems like I'll have to ditch that and use the standard grouping format
 returned by Solr for the group.offset parameter to be useful. Since all
 groups don't have the same number of items, I'll have to carefully
 calculate the results that should be returned for each page of 20 items and
 probably make several Solr calls per page rendered.
 
 
 On Thu, May 9, 2013 at 1:07 PM, Jason Hellman 
 jhell...@innoventsolutions.com wrote:
 
 Luis,
 
 I am presuming you do not have an overarching grouping value here…and
 simply wish to show a standard search result that shows 1 item per company.
 
 You should be able to accomplish your second page of desired items (the
 second item from each of your 20 represented companies) by using the
 group.offset parameter.  This will shift the position in the returned array
 of documents to the value provided.
 
 Thus:
 
  group.limit=1&group.field=companyid&group.offset=1
 
 …would return the second item in each companyid group matching your
 current query.
 
 Jason
 
 On May 9, 2013, at 10:30 AM, Luis Carlos Guerrero Covo 
 lcguerreroc...@gmail.com wrote:
 
 Hi,
 
 I'm using solr to maintain an index of items that belong to different
 companies. I want the search results to be returned in a way that is fair
 to all companies, thus I wish to group the results such that each company
 has 1 item in each group, and the groups of results should be returned
 sorted by score.
 
 example:
 --
 
 20 companies
 
 first 100 results
 
 1-20 results - (company1 highest score item, company2 highest score item,
 etc..)
 20-40 results - (company1 second highest score item, company 2 second
 highest score item, etc..)
 ...
 
 --
 
 I'm trying to use the field collapsing feature but I have only been able
 to
 create the first group of results by using
 group.limit=1,group.field=companyid. If I raise the group.limit value, I
 would be violating the 'fairness rule' because more than one result of a
 company would be returned in the first group of results.
 
 Can I achieve the desired search result using SOLR, or do I have to look
 at
 other options?
 
 thank you,
 
 Luis Guerrero
 
 
 
 
 -- 
 Luis Carlos Guerrero Covo
 M.S. Computer Engineering
 (57) 3183542047



Re: 4.3 logging setup

2013-05-09 Thread richardg
These are the files I have in my /lib folder:

slf4j-api-1.6.6
log4j-1.2.16
jul-to-slf4j-1.6.6
jcl-over-slf4j-1.6.6
slf4j-jdk14-1.6.6
log4j-over-slf4j-1.6.6

Currently everything seems to be logging like before.  After I followed the
instructions in Jan's post, replacing slf4j-log4j12-1.6.6.jar with
slf4j-jdk14-1.6.6.jar, it all started working.  Shawn, I then removed
everything as you instructed and put in just log4j-over-slf4j-1.6.6.jar and
slf4j-jdk14-1.6.6.jar, but the index showed an error and wouldn't start.  So
that is why I have those 6 files in there now; I'm not sure whether the
log4j-over-slf4j-1.6.6.jar file is needed or not.  Let me know if you
need me to test anything else.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/4-3-logging-setup-tp4061875p4061922.html
Sent from the Solr - User mailing list archive at Nabble.com.


Does Distributed Search are Cached Only the By Node That Runs Query?

2013-05-09 Thread Furkan KAMACI
I have Solr 4.2.1 and run them as SolrCloud. When I do a search on
SolrCloud as like that:

ip_of_node_1:8983/solr/select?q=*:*&rows=1

and when I check admin page I see that:

I have 5 GB Java Heap. 616.32 MB is dark gray, 3.13 GB is gray.

Before my search it was something like: 150 MB dark gray, 500 MB gray.

I understand that when I do a search like that, fields are cached. However,
when I look at other SolrCloud nodes' admin pages there are no differences.
Why is that query cached only by the node that I ran the query on?


Re: 4.3 logging setup

2013-05-09 Thread Shawn Heisey

On 5/9/2013 1:41 PM, richardg wrote:

These are the files I have in my /lib folder:

slf4j-api-1.6.6
log4j-1.2.16
jul-to-slf4j-1.6.6
jcl-over-slf4j-1.6.6
slf4j-jdk14-1.6.6
log4j-over-slf4j-1.6.6

Currently everything seems to be logging like before.  After I followed the
instructions in Jan's post, replacing slf4j-log4j12-1.6.6.jar with
slf4j-jdk14-1.6.6.jar, it all started working.  Shawn, I then removed
everything as you instructed and put in just log4j-over-slf4j-1.6.6.jar and
slf4j-jdk14-1.6.6.jar, but the index showed an error and wouldn't start.  So
that is why I have those 6 files in there now; I'm not sure whether the
log4j-over-slf4j-1.6.6.jar file is needed or not.  Let me know if you
need me to test anything else.


You're on the right track.  Your list just has two files that shouldn't 
be there - log4j-1.2.16 and jul-to-slf4j-1.6.6.  They are probably not 
causing any real problems, but they might in the future.


Remove those and you will have the exact list I was looking for.  If 
that doesn't work, use a paste website (pastie.org and others) to send a 
log showing the errors you get.


Thanks,
Shawn



Is the CoreAdmin RENAME method atomic?

2013-05-09 Thread Lan
We need to implement a locking mechanism for a full-reindexing SOLR server
pool. We could use a database or Zookeeper as our locking mechanism, but that's
a lot of work. Could solr do it?

I noticed the core admin RENAME function
(http://wiki.apache.org/solr/CoreAdmin#RENAME). Is this a synchronous atomic
operation?

What I'm thinking is we create a solr core named 'lock', and any process that
wants to obtain a solr server from the pool tries to rename the 'lock' core
to, say, 'lock.someuniqueid'. If it fails, it tries another server in the
pool or waits a bit. If it succeeds, it reindexes its data and then
renames 'lock.someuniqueid' back to 'lock' to return the server back to the
pool.
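
For reference, the rename itself would be issued roughly like this (the host 
and unique id are invented):

http://localhost:8983/solr/admin/cores?action=RENAME&core=lock&other=lock.someuniqueid

Whether two such requests racing each other can both succeed is, of course, 
exactly the question.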









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-the-CoreAdmin-RENAME-method-atomic-tp4061944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Frequent OOM - (Unknown source in logs).

2013-05-09 Thread shreejay
We ended up using Solr 4.0 (now 4.2) without the cloud option, and it seems
to be holding up well. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361p4061945.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud Sorting Results By Relevance

2013-05-09 Thread Furkan KAMACI
When I make a search at Solr 4.2.1 that runs as SolrCloud I get:

<result name="response" numFound="18720" start="0" maxScore="1.2672108">

The first one has this boost:

<float name="boost">1.3693064</float>

The second one has this:

<float name="boost">1.7501166</float>

and the third:

<float name="boost">1.0387472</float>

Here is the default schema for Nutch:
http://svn.apache.org/viewvc/nutch/tags/release-2.1/conf/schema-solr4.xml?revision=1388536&view=markup

Am I missing something, or are results already sorted by relevance by Solr?


Apache Whirr for SolrCloud with external Zookeeper

2013-05-09 Thread Furkan KAMACI
Hi Folks;

I have tested Solr 4.2.1 as SolrCloud and I plan to use 4.3.1 in my
pre-production environment when it is ready. I want to know: does anybody
use Apache Whirr for SolrCloud with an external Zookeeper ensemble? What
are folks using for this kind of purpose?


Status of EDisMax

2013-05-09 Thread André Widhani
Hi,

what is the current status of the Extended DisMax Query Parser? The release 
notes for Solr 3.1 say it was experimental at that time (two years back).

The current wiki page for EDisMax does not contain any such statement. We 
recently ran into the issue described in SOLR-2649 (using q.op=AND) which I 
think is a very fundamental defect making it unusable at least in our case.

Thanks,
André



Negative Boosting at Recent Versions of Solr?

2013-05-09 Thread Furkan KAMACI
I know that whilst Lucene allows negative boosts, Solr does not. However,
did that change in newer versions of Solr (I use Solr 4.2.1), or is it still the same?


Re: Apache Whirr for SolrCloud with external Zookeeper

2013-05-09 Thread Otis Gospodnetic
I've never encountered anyone using Whirr to launch Solr even though
that's possible - http://issues.apache.org/jira/browse/WHIRR-465

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, May 9, 2013 at 5:28 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi Folks;

 I have tested Solr 4.2.1 as SolrCloud and I plan to use 4.3.1 in my
 pre-production environment when it is ready. I want to know: does anybody
 use Apache Whirr for SolrCloud with an external Zookeeper ensemble? What
 are folks using for this kind of purpose?


Re: Apache Whirr for SolrCloud with external Zookeeper

2013-05-09 Thread Furkan KAMACI
I saw that ticket and wanted to ask the mailing list. I want to give it a
try and report back to the list. What do folks use for this kind of purpose?


2013/5/10 Otis Gospodnetic otis.gospodne...@gmail.com

 I've never encountered anyone using Whirr to launch Solr even though
 that's possible - http://issues.apache.org/jira/browse/WHIRR-465

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, May 9, 2013 at 5:28 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Folks;
 
  I have tested Solr 4.2.1 as SolrCloud and I plan to use 4.3.1 in my
  pre-production environment when it is ready. I want to know: does anybody
  use Apache Whirr for SolrCloud with an external Zookeeper ensemble? What
  are folks using for this kind of purpose?



Re: Negative Boosting at Recent Versions of Solr?

2013-05-09 Thread Jack Krupansky
Solr does support both additive and multiplicative boosts. Although Solr 
doesn't support negative multiplicative boosts on query terms, it does 
support fractional multiplicative boosts (e.g., 0.25), which allow you to 
de-boost a term.


The boosts for individual query terms and for the edismax qf parameter 
cannot be negative, but can be fractional.


The edismax bf parameter gives a function query that provides an additive 
boost, which could be negative.


The edismax boost parameter gives a function query that provides a 
multiplicative boost - which could be negative, so it's not absolutely true 
that Solr doesn't support negative boosts.
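
To make that concrete, here are a few hypothetical edismax requests (the field 
names are invented):

q=ipod nano^0.25&defType=edismax&qf=name            (fractional term boost)
q=ipod&defType=edismax&bf=sub(0,popularity)         (additive boost that is negative)
q=ipod&defType=edismax&boost=0.5                    (multiplicative boost via function)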


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Thursday, May 09, 2013 6:08 PM
To: solr-user@lucene.apache.org
Subject: Negative Boosting at Recent Versions of Solr?

I know that whilst Lucene allows negative boosts, Solr does not. However,
did that change in newer versions of Solr (I use Solr 4.2.1), or is it still the same?



Re: Index compatibility between Solr releases.

2013-05-09 Thread Erick Erickson
Solr strives to stay backwards-compatible for one major revision, so 4.x
should be able to work with 3.x indexes. One caution though, well
actually two.

1) If you have a master/slave setup, upgrade the _slaves_ first. If
you upgrade the master first and it merges segments, then the slaves
won't be able to read the 4.x format.

2) Make backups first <g>...

BTW, when the segments are written, they should be written in 4.x
format. So I've heard of people doing the migration, then forcing an
optimize just to bring all the segments up to the 4.x format.
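
If you go that route, the optimize can be triggered with a plain update 
request, e.g. (core name assumed):

http://localhost:8983/solr/collection1/update?optimize=true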

Best
Erick

On Tue, May 7, 2013 at 3:28 PM, Skand Gupta skandsgu...@gmail.com wrote:
 We have a fairly large (in the order of 10s of TB) indices built using Solr
 3.5. We are considering migrating to Solr 4.3 and was wondering what the
 policy is on maintaining backward compatibility of the indices? Will 4.3
 work with my 3.5 indexes? Because of the large data size, I would ideally
 like to move new data to 4.3 and gradually re-index all the 3.5 indices.

 Thanks,
 - Skand.


Re: Index corrupted detection from http get command.

2013-05-09 Thread Erick Erickson
There's no way to do this that I know of. There's the checkindex
tool, but it's fairly expensive resource-wise and there's no HTTP
command to do it.

Best
Erick

On Tue, May 7, 2013 at 8:04 PM, Michel Dion diom...@gmail.com wrote:
 Hello,

 I'm looking for a way to detect solr index corruption using an http get
 command. I've looked at the /admin/ping and /admin/luke request handlers but
 am not sure if their status provides guarantees that everything is all
 right. The idea is to be able to tell a load balancer to put a given solr
 instance out of rotation if its index is corrupted.

 Thanks

 Michel


Re: transientCacheSize doesn't seem to have any effect, except on startup

2013-05-09 Thread Erick Erickson
I'm slammed with stuff and have to leave for vacation Saturday morning
so I'll be going silent for a while, sorry

Best
Erick

On Wed, May 8, 2013 at 11:27 AM, didier deshommes dfdes...@gmail.com wrote:
 Any idea on this? I still cannot get the combination of transient cores and
 transientCacheSize to work as I think it should: give me the ability to
 create a large number of cores and automatically load and unload them for me
 based on a limit that I set.
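 
 For reference, a minimal sketch of how I understand this should be wired up
 in a legacy-style solr.xml (the core names are made up):
 
 <solr persistent="true">
   <cores adminPath="/admin/cores" transientCacheSize="2">
     <core name="new_core1" instanceDir="collection1" dataDir="new_core1"
           transient="true" loadOnStartup="false"/>
     <core name="new_core2" instanceDir="collection1" dataDir="new_core2"
           transient="true" loadOnStartup="false"/>
   </cores>
 </solr>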

 If anyone else is using this feature and it is working for you, let me know
 how you got it working!


 On Fri, May 3, 2013 at 2:11 PM, didier deshommes dfdes...@gmail.com wrote:


 On Fri, May 3, 2013 at 11:18 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 The cores aren't loaded (or at least shouldn't be) for getting the status.
 The _names_ of the cores should be returned, but those are (supposed to be)
 retrieved from a list rather than from loaded cores. So are you sure that's
 not what you are seeing? How are you determining whether the cores are
 actually loaded or not?


 I'm looking at the output of :

 $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"

 cores that are loaded have a "startTime" and "upTime" value. Cores that
 are unloaded don't appear in the output at all. For example, I created 3
 transient cores with transientCacheSize=2. When I asked for a list of
 all cores, all 3 cores were returned. I explicitly unloaded 1 core and got
 back 2 cores when I asked for the list again.
 
 It would be nice if cores had an "isTransient" and an "isCurrentlyLoaded"
 value so that one could see exactly which cores are loaded.




 That said, it's perfectly possible that the status command is doing
 something we
 didn't anticipate, but I took a quick look at the code (got to rush to a
 plane)
 and CoreAdminHandler _appears_ to be just returning whatever info it can
 about an unloaded core for status. I _think_ you'll get more info if the
 core has ever been loaded though, even though if it's been removed from
 the transient cache. Ditto for the create action.

 So let's figure out whether you're really seeing loaded cores or not, and
 then
 raise a JIRA if so...

 Thanks for reporting!
 Erick

 On Thu, May 2, 2013 at 1:27 PM, didier deshommes dfdes...@gmail.com
 wrote:
  Hi,
  I've been very interested in the transient core feature of solr to
 manage a
  large number of cores. I'm especially interested in this use case, that
 the
  wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down
  now):
 
 loadOnStartup=false transient=true: This is really the use-case. There
 are
  a large number of cores in your system that are short-duration use. You
  want Solr to load them as necessary, but unload them when the cache gets
  full on an LRU basis.
 
  I'm creating 10 transient core via core admin like so
 
  $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false"
 
  and have "transientCacheSize=2" in my solr.xml file, which I take to mean I
  should have at most 2 transient cores loaded at any time. The problem is
  that these cores are still loaded when I ask solr to list cores:
 
  $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"
 
  From the explanation in the wiki, it looks like solr would manage loading
  and unloading transient cores for me without my having to worry about them,
  but this is not what's happening.
 
  The situation is different when I restart solr; it does the "right thing"
  by loading the maximum number of cores set by transientCacheSize. When I add
  more cores, the old behavior happens again, where all created transient cores
  are loaded in solr.
 
  I'm using the development branch lucene_solr_4_3 to run my example. I
 can
  open a jira if need be.





Re: Apache Whirr for SolrCloud with external Zookeeper

2013-05-09 Thread Otis Gospodnetic
Great, let us know how it works for you. Blog post?

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 9, 2013 6:30 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I saw that ticket and wanted to ask it to mail list. I want to give it a
 try and feedback to mail list. What folks use for such kind of purposes?


 2013/5/10 Otis Gospodnetic otis.gospodne...@gmail.com

  I've never encountered anyone using Whirr to launch Solr even though
  that's possible - http://issues.apache.org/jira/browse/WHIRR-465
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, May 9, 2013 at 5:28 PM, Furkan KAMACI furkankam...@gmail.com
  wrote:
   Hi Folks;
  
   I have tested Solr 4.2.1 as SolrCloud and I think to use 4.3.1 when it
 is
   ready at my pre-production environment. I want to learn that does
 anybody
   uses Apache Whirr for SolrCloud with external Zookeeper ensemble? What
   folks are using for such kind of purposes?
 



Re: SolrCloud: IOException occured when talking to server at

2013-05-09 Thread Shawn Heisey
On 5/9/2013 7:31 AM, heaven wrote:
 Can confirm this led to data loss. I have 1217427 records in the database and
 only 1217216 indexed. This means that Solr gave a successful response
 and then did not add some documents to the index.
 
 Seems like SolrCloud is not a production-ready solution; it would be good if
 there was a warning in the Solr wiki about that.

You've got some kind of underlying problem here.  Here are my guesses
about what that might be:

- The Linux firewall is improperly configured and/or SELinux is enabled.
- The hardware is already overtaxed by other software.
- Your zkClientTimeout value is extremely small.
- Your GC pauses are large.
- You're running into an open file limit.

Here's what you could do to resolve each of these:

- Disable the firewall and selinux, reboot.
- Stop other software.
- The example zkClientTimeout is 15 seconds. Try 30-60.
- See http://wiki.apache.org/solr/SolrPerformanceProblems for some GC ideas.
- Increase the file and process limits.  For most versions of Linux, in
/etc/security/limits.conf:

solr  hard  nproc   6144
solr  soft  nproc   4096
solr  hard  nofile  65536
solr  soft  nofile  49152

These numbers should be sufficient for deployments considerably larger
than yours.
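
For the zkClientTimeout change, the setting lives on the <cores> element of a 
legacy-style solr.xml; a sketch (the value below is just an example):

<cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:30000}">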

SolrCloud is not only production ready, it's being used by many many
people for extremely large indexes.  My own SolrCloud deployment is
fairly small with only 1.5 million docs, but it's extremely stable.  I
also have a somewhat large (77 million docs) non-cloud deployment.

Are you running 4.2.1?  I feel fairly certain based on your screenshots
that you are not running 4.3, but I can't tell which version you are
running.  There are some bugs in the 4.3 release, a 4.3.1 will be
released soon.  If you had planned to upgrade, you should wait for 4.3.1
or 4.4.

NB, and something you might already know: When talking about
production-ready, you can't run everything on the same server.  You need
at least three - two of them can run Solr and zookeeper, and the third
runs zookeeper.  This single-server setup is fine for a proof-of-concept.

Thanks,
Shawn



Re: SolrCloud Sorting Results By Relevance

2013-05-09 Thread Otis Gospodnetic
Hits are sorted by relevance score by default. What you are listing is the
boost field, not the score.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 5:16 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 When I make a search at Solr 4.2.1 that runs as SolrCloud I get:

 <result name="response" numFound="18720" start="0" maxScore="1.2672108">

 The first one has this boost:

 <float name="boost">1.3693064</float>

 The second one has this:

 <float name="boost">1.7501166</float>

 and the third:

 <float name="boost">1.0387472</float>

 Here is the default schema for Nutch:

 http://svn.apache.org/viewvc/nutch/tags/release-2.1/conf/schema-solr4.xml?revision=1388536&view=markup

 Am I missing something, or are results already sorted by relevance by Solr?



Re: Status of EDisMax

2013-05-09 Thread Otis Gospodnetic
Didn't check that issue, but edismax is not experimental any more; most
Solr users use it.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 5:36 PM, André Widhani andre.widh...@digicol.de wrote:

 Hi,

 what is the current status of the Extended DisMax Query Parser? The
 release notes for Solr 3.1 say it was experimental at that time (two years
 back).

 The current wiki page for EDisMax does not contain any such statement. We
 recently ran into the issue described in SOLR-2649 (using q.op=AND) which I
 think is a very fundamental defect making it unusable at least in our case.

 Thanks,
 André




Re: Does Distributed Search are Cached Only the By Node That Runs Query?

2013-05-09 Thread Otis Gospodnetic
You are looking at the JVM heap but attributing it all to caching. Not quite
right... there are other things in that JVM heap.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 3:55 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I have Solr 4.2.1 and run them as SolrCloud. When I do a search on
 SolrCloud as like that:

 ip_of_node_1:8983/solr/select?q=*:*&rows=1

 and when I check admin page I see that:

 I have 5 GB Java Heap. 616.32 MB is dark gray, 3.13 GB is gray.

 Before my search it was something like: 150 MB dark gray, 500 MB gray.

 I understand that when I do a search like that, fields are cached. However,
 when I look at other SolrCloud nodes' admin pages there are no differences.
 Why is that query cached only by the node that I ran the query on?



Re: More Like This and Caching

2013-05-09 Thread Otis Gospodnetic
This is correct: the doc cache serves previously read docs regardless of which
query read them, and the query cache serves repeat queries. Plus the OS cache
for the actual index files.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 2:32 PM, Jason Hellman jhell...@innoventsolutions.com
wrote:

 Purely from empirical observation, both the DocumentCache and
 QueryResultCache are being populated and reused in reloads of a simple MLT
 search.  You can see in the cache inserts how much extra-curricular
 activity is happening to populate the MLT data by how many inserts and
 lookups occur on the first load.

 (lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis)


 http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

 There is no activity in the filterCache, fieldCache, or fieldValueCache -
 and that makes plenty of sense.

 On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:

  I'm not the expert here, but perhaps what you're noticing is actually the
  OS's disk cache. The actual solr index isn't cached by solr, but as you
 read
  the blocks off disk the OS disk cache probably did cache those blocks for
  you. On the 2nd run the index blocks were read out of memory.
 
  There was a very extensive discussion on this list not long back titled:
  "Re: SolrCloud loadbalancing, replication, and failover". Look that thread up
  and you'll get a lot of in-depth on the topic.
 
  David
 
 
  -Original Message-
  From: Giammarco Schisani [mailto:giamma...@schisani.com]
  Sent: Thursday, May 09, 2013 2:59 PM
  To: solr-user@lucene.apache.org
  Subject: More Like This and Caching
 
  Hi all,
 
  Could anybody explain which Solr cache (e.g. queryResultCache,
  documentCache, fieldCache, etc.) can be used by the More Like This
 handler?
 
  One of my colleagues had previously suggested that the More Like This
  handler does not take advantage of any of the Solr caches.
 
  However, if I issue two identical MLT requests to the same Solr instance,
  the second request will execute much faster than the first request (for
  example, the first request will execute in 200ms and the second request
 will
  execute in 20ms). This makes me believe that at least one of the Solr
 caches
  is being used by the More Like This handler.
 
  I think the documentCache is the cache that is most likely being used,
 but
  would you be able to confirm?
 
  As information, I am currently using Solr version 3.6.1.
 
  Kind regards,
  Giammarco Schisani
 




Re: Per Shard Replication Factor

2013-05-09 Thread Otis Gospodnetic
Could these just be different collections? Then sharding and replication are
independent, and you can reduce the replication factor as the index ages.
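
For what it's worth, a sketch of that approach with the Collections API 
(collection names and counts invented): one collection per month, with more 
replicas on the hot one, e.g.

http://localhost:8983/solr/admin/collections?action=CREATE&name=logs_201305&numShards=1&replicationFactor=4
http://localhost:8983/solr/admin/collections?action=CREATE&name=logs_201304&numShards=1&replicationFactor=2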

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 1:43 AM, Steven Bower smb-apa...@alcyon.net wrote:

 Is it currently possible to have a per-shard replication factor?

 A bit of background on the use case...

 If you are hashing content to shards by a known factor (let's say date
 ranges, 12 shards, 1 per month), it might be the case that most of your
 search traffic would be directed to one particular shard (e.g. the current
 month's shard), and having increased query capacity in that shard would be
 useful... this could be extended to many use cases, such as data hashed by
 organization, type, etc.

 Thanks,

 steve



Re: 4.3 logging setup

2013-05-09 Thread Jan Høydahl
I've updated the WIKI: 
http://wiki.apache.org/solr/SolrLogging#Switching_from_Log4J_logging_back_to_Java-util_logging

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

9. mai 2013 kl. 21:57 skrev Shawn Heisey s...@elyograg.org:

 On 5/9/2013 1:41 PM, richardg wrote:
 These are the files I have in my /lib folder:
 
 slf4j-api-1.6.6
 log4j-1.2.16
 jul-to-slf4j-1.6.6
 jcl-over-slf4j-1.6.6
 slf4j-jdk14-1.6.6
 log4j-over-slf4j-1.6.6
 
 Currently everything seems to be logging like before.  After I followed the
 instructions in Jan's post, replacing slf4j-log4j12-1.6.6.jar with
 slf4j-jdk14-1.6.6.jar, it all started working.  Shawn, I then removed
 everything as you instructed and put in just log4j-over-slf4j-1.6.6.jar and
 slf4j-jdk14-1.6.6.jar, but the index showed an error and wouldn't start.  So
 that is why I have those 6 files in there now; I'm not sure whether the
 log4j-over-slf4j-1.6.6.jar file is needed or not.  Let me know if you
 need me to test anything else.
 
 You're on the right track.  Your list just has two files that shouldn't be 
 there - log4j-1.2.16 and jul-to-slf4j-1.6.6.  They are probably not causing 
 any real problems, but they might in the future.
 
 Remove those and you will have the exact list I was looking for.  If that 
 doesn't work, use a paste website (pastie.org and others) to send a log 
 showing the errors you get.
 
 Thanks,
 Shawn
 



Re: SOLR Error: Document is missing mandatory uniqueKey field

2013-05-09 Thread zaheer.java
Here is the stack trace:

DEBUG - 2013-05-09 18:53:06.411; org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE add{,id=(null)} {wt=javabin&version=2}
DEBUG - 2013-05-09 18:53:06.411; org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE FINISH {wt=javabin&version=2}
INFO  - 2013-05-09 18:53:06.412; org.apache.solr.update.processor.LogUpdateProcessor; [orderitemsStage] webapp=/solr path=/update params={wt=javabin&version=2} {add=[488653_0_0_141_388 (1434610076088270848), 488653_0_0_141_388 (1434610076090368000), 488653_0_0_141_388 (1434610076091416576), 488653_0_0_141_388 (1434610076091416577), 488653_0_0_141_388 (1434610076092465152), 488653_0_0_141_388 (1434610076093513728), 488653_0_0_141_388 (1434610076094562304), 488653_0_0_141_388 (1434610076094562305), 488653_0_0_141_388 (1434610076095610880), 488653_0_0_141_388 (1434610076096659456), ... (4031 adds)]} 0 2790
ERROR - 2013-05-09 18:53:06.412; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: orderItemKey
    at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:88)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:517)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Error-Document-is-missing-mandatory-uniqueKey-field-tp4062177p4062178.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR Error: Document is missing mandatory uniqueKey field

2013-05-09 Thread zaheer.java
I repeatedly get this error while adding documents to SOLR using SolrJ:
"Document is missing mandatory uniqueKey field: orderItemKey".  This field
is defined as the uniqueKey in the document schema. I've made sure that I'm
passing this field from Java by logging it upfront.

As suggested somewhere, I've tried upgrading from 4.0 to 4.3, and also made
the field "required=false".

Please help me debug or find a resolution to this problem.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Error-Document-is-missing-mandatory-uniqueKey-field-tp4062177.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimport handler

2013-05-09 Thread William Bell
It does not work anymore in 4.x.

${dih.last_index_time} does work, but the entity version does not.
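
For reference, a minimal sketch of a delta-enabled entity using the global 
variable (the table and column names are invented); note that the variable 
must be quoted in the SQL:

<entity name="item" pk="id"
        query="SELECT id FROM item"
        deltaQuery="SELECT id FROM item WHERE updated &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT id FROM item WHERE id='${dih.delta.id}'"/>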

Bill



On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Using "${dih.entity_name.last_index_time}" should work. Make sure you put
 it in quotes in your query.


 On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote:

  In the "data import handler" I have multiple entities.  Each one
  generates a date in the dataimport.properties, i.e. entityname.last_index_time.
 
  How do I reference the specific entity time in my delta queries?
 
  Thanks
 
  Eric
 



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SOLR Error: Document is missing mandatory uniqueKey field

2013-05-09 Thread Shawn Heisey
On 5/9/2013 7:44 PM, zaheer.java wrote:
 I repeatedly get this error while adding documents to SOLR using SolrJ:
 "Document is missing mandatory uniqueKey field: orderItemKey".  This field
 is defined as the uniqueKey in the document schema. I've made sure that I'm
 passing this field from Java by logging it upfront. 
 
 As suggested somewhere, I've tried upgrading from 4.0 to 4.3, and also made
 the field "required=false". 

If you have a uniqueKey defined in your schema, then every document must
define that field or you'll get the error message you're seeing.  That's
the entire point of a uniqueKey.  It is pretty much the same concept as
a primary key on a database table.

There is one main difference between uniqueKey and a DB primary key -
the database will prevent you from inserting a record with the same ID
as an existing record, but Solr uses it to allow easy reindexing.
Sending a document with the same ID as an existing document will cause
Solr to delete the old one before inserting the new one.
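
For reference, the relevant schema.xml declarations look something like this 
(using the field name from this thread; the field type is an assumption):

<field name="orderItemKey" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>orderItemKey</uniqueKey>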

Certain Solr features, notably distributed search, require a uniqueKey.
 SolrCloud uses distributed search so it also requires it.

If you're not using features that require uniqueKey, and you don't need
Solr to delete duplicate documents, then you can remove that from your
schema.  It's not recommended, but it should work.

Thanks,
Shawn



Re: SOLR guidance required

2013-05-09 Thread Shawn Heisey
On 5/9/2013 9:41 PM, Kamal Palei wrote:
 I hope there must be some mechanism, by which I can associate salary,
 experience, age etc with resume document during indexing. And when
 I search for resumes I can give all filters accordingly and can retrieve
 100 records and strait way I can show 100 records to user without doing any
 mysql query. Please let me know if this is feasible. If so, kindly give me
 some pointer how do I do it.

If you define fields for these values in your schema, then you can send
filter queries to restrict the search.  Solr will filter invalid
documents out and only send the results that match your requirements.
Some examples of the filter queries you can use are below.  You can add
more than one of these; they will be ANDed together.

fq=age:[21 TO 45]
fq=experience:[2 TO *]
fq=salaryReq:[* TO 55000]

If you're using a Solr API (for Java, PHP, etc) rather than constructing
a URL to send directly to Solr, then the API will have a mechanism for
adding filters to your query.
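
With SolrJ, for example, the same filters attach like this (a fragment; it 
assumes an existing HttpSolrServer named server and the hypothetical field 
names above):

SolrQuery query = new SolrQuery("java developer");
query.addFilterQuery("age:[21 TO 45]", "experience:[2 TO *]", "salaryReq:[* TO 55000]");
query.setRows(100);   // fetch all 100 records in one round trip
QueryResponse rsp = server.query(query);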

One caveat: unless you can write code that will automatically extract
this information from a resume and/or application, you will need
someone doing data entry to drive the indexing, or you will need
prospective employees to fill out a computerized form with their application.

Thanks,
Shawn



RE: Is the CoreAdmin RENAME method atomic?

2013-05-09 Thread David Parks
Find the discussion titled "Indexing off the production servers" from just a
week ago in this same forum; there is a significant discussion of this feature
that you will probably want to review.


-Original Message-
From: Lan [mailto:dung@gmail.com] 
Sent: Friday, May 10, 2013 3:42 AM
To: solr-user@lucene.apache.org
Subject: Is the CoreAdmin RENAME method atomic?

We need to implement a locking mechanism for a full-reindexing SOLR server
pool. We could use a database or Zookeeper as our locking mechanism, but that's
a lot of work. Could solr do it?

I noticed the core admin RENAME function
(http://wiki.apache.org/solr/CoreAdmin#RENAME). Is this a synchronous atomic
operation?

What I'm thinking is we create a solr core named 'lock', and any process that
wants to obtain a solr server from the pool tries to rename the 'lock' core
to, say, 'lock.someuniqueid'. If it fails, it tries another server in the
pool or waits a bit. If it succeeds, it reindexes its data and then
renames 'lock.someuniqueid' back to 'lock' to return the server back to the
pool.









--
View this message in context:
http://lucene.472066.n3.nabble.com/Is-the-CoreAdmin-RENAME-method-atomic-tp4061944.html
Sent from the Solr - User mailing list archive at Nabble.com.