Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar
I just downloaded, compiled and opened an optimized Solr 4.0 index in read-only mode without problems. Could browse through the docs, search with different analyzers, ... Looks good. On 19.11.2012 08:49, Toke Eskildsen wrote: On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote: I think there is already a BETA available: http://luke.googlecode.com/svn/trunk/ You might try that one. That doesn't work either for Lucene 4.0.0 indexes; the same goes for the source trunk. I did have some luck with downloading the source and changing the dependencies to Lucene 4.0.0 final (4 or 5 JARs, AFAIR). It threw a non-fatal exception upon index open, something about subReaders not being accessible through the method it used (sorry for being vague, it was on my home machine and some days ago), so I'm guessing that not all functionality works. It was possible to inspect some documents, and that was what I needed at the time.
RE: Reduce QueryComponent prepare time
I'd also like to know which parts of the entire query constitute the prepare time, and whether it would matter significantly if we extended the edismax plugin and hardcoded the parameters we pass into (reusable) objects. Thanks, Markus -Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Fri 16-Nov-2012 15:57 To: solr-user@lucene.apache.org Subject: Reduce QueryComponent prepare time Hi, We're seeing high prepare times for the QueryComponent, obviously due to the vast number of fields and queries. It's common to have a prepare time of 70-80ms, while the process times drop significantly due to warmed searchers, OS cache, etc. The prepare time is a recurring issue, and I'd appreciate it if people here could share some thoughts or hints. We're using a recent checkout on a 10-node test cluster with SSDs (although this is no IO issue) and edismax on about a hundred different fields; this includes phrase searches over most of those fields and SpanFirst queries on about 25 fields. We'd like to see how we can avoid doing the same prepare procedure over and over again ;) Thanks, Markus
configuring data source in apache tomcat
Hi, I have configured Apache Solr with Tomcat; for that I have deployed the .war file in Tomcat. I have created the Solr home directory at C:\solr. After starting Tomcat, solr.war gets extracted and a folder is created in webapps. There, in WEB-INF/web.xml, I added:

   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>C:\solr</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
   </env-entry>

After this the Solr admin is working. Now I want to configure an XML data source. How can I configure an XML data source? Thanks Regards, Leena Jawale
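For anyone searching the archives: the usual way to index an XML source in Solr is the DataImportHandler with XPathEntityProcessor. Below is a minimal, illustrative sketch of a data-config.xml; the file path, entity name, and field names are made up for the example, and the handler still has to be registered in solrconfig.xml as a requestHandler of class org.apache.solr.handler.dataimport.DataImportHandler.

```xml
<!-- data-config.xml: illustrative sketch for importing a local XML file.
     Path, forEach expression and field names are examples only. -->
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="doc"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/records/record"
            url="C:\solr\data\records.xml">
      <field column="id"   xpath="/records/record/id"/>
      <field column="name" xpath="/records/record/name"/>
    </entity>
  </document>
</dataConfig>
```

With that in place, the import is kicked off with a request like /dataimport?command=full-import.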
Re: Reduce QueryComponent prepare time
Markus, It's hard to suggest anything until you provide a profiler snapshot showing where the prepare time is spent. As far as I know, in prepare it parses queries - e.g. we have some really heavy query parsers - but I don't think that's really common. On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: I'd also like to know which parts of the entire query constitute the prepare time, and whether it would matter significantly if we extended the edismax plugin and hardcoded the parameters we pass into (reusable) objects. Thanks, Markus -Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Fri 16-Nov-2012 15:57 To: solr-user@lucene.apache.org Subject: Reduce QueryComponent prepare time Hi, We're seeing high prepare times for the QueryComponent, obviously due to the vast number of fields and queries. It's common to have a prepare time of 70-80ms, while the process times drop significantly due to warmed searchers, OS cache, etc. The prepare time is a recurring issue, and I'd appreciate it if people here could share some thoughts or hints. We're using a recent checkout on a 10-node test cluster with SSDs (although this is no IO issue) and edismax on about a hundred different fields; this includes phrase searches over most of those fields and SpanFirst queries on about 25 fields. We'd like to see how we can avoid doing the same prepare procedure over and over again ;) Thanks, Markus -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
CloudSolrServer or load-balancer for indexing
Hi, As far as I know, CloudSolrServer is the recommended way to index into SolrCloud. I wonder what the advantages of this approach are over an external load-balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a load-balancer and send updates to any existing node. In the former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In the latter case I can still index data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, a load-balancer, or do you know some different methods worth considering? Regards.
Re: SolrCloud Error after leader restarts
You're using a RAM dir? Sent from my iPhone On Nov 19, 2012, at 1:21 AM, deniz denizdurmu...@gmail.com wrote: Hello, for test purposes I am running two ZooKeepers, on ports 2181 and 2182, and I have two Solr instances running on different machines... For the one which is running on my local machine and acts as leader: java -Dbootstrap_conf=true -DzkHost=localhost:2181 -jar start.jar and for the one which acts as follower, on a remote machine: java -Djetty.port=7574 -DzkHost=address-of-mylocal:2182 -jar start.jar Until this point everything is smooth and I can see the configs on both ZooKeeper hosts when I connect with zkCli.sh. Just to see what happens and check the recovery stuff, I killed the Solr which is running on my local machine and tried to index some files by using the follower, which failed... this is normal as writes are routed to the leader... The point that I don't understand is here: when I restart the leader with the same command on the terminal, after the normal logs it starts showing this:

Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:151)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@1e75e89 lockFactory=org.apache.lucene.store.NativeFSLockFactory@128e909: files: []
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:639)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:75)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:191)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:354)
        ... 4 more
Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:154)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

It fails to recover after shutdown... why does this happen? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CloudSolrServer or load-balancer for indexing
Nodes stop accepting updates if they cannot talk to ZooKeeper, so an external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, will eventually do hashing, will auto add/remove nodes from rotation based on the cluster state in ZooKeeper, and is probably more intelligent out of the box about retrying on some responses (for example, responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know, CloudSolrServer is the recommended way to index into SolrCloud. I wonder what the advantages of this approach are over an external load-balancer. Let's say I have a 4-node SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing, or use a load-balancer and send updates to any existing node. In the former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In the latter case I can still index data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, a load-balancer, or do you know some different methods worth considering? Regards.
SolrCloud and external file fields
Hi all, I'm planning to move a quite big Solr index to SolrCloud. However, in this index an external file field is used for popularity ranking. Does SolrCloud support external file fields? How does it cope with sharding and replication? Where should the external file be placed now that the index folder is not local but in the cloud? Otherwise, are there other best practices to deal with the use cases external file fields were used for, like popularity/ranking, in SolrCloud? Custom ValueSources going to something external? Thanks in advance, Simone
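For readers who haven't used the feature: this is roughly how an external file field is declared in a single-node setup, which is what makes the SolrCloud question interesting (where does the external file go when there is no single local data directory?). Field and type names below are illustrative, not from Simone's schema.

```xml
<!-- schema.xml: illustrative ExternalFileField used for popularity ranking -->
<fieldType name="extPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="extPopularity" indexed="false" stored="false"/>

<!-- The values live outside the index, in a file named
     external_popularity in the core's data directory,
     one keyField=value pair per line, e.g.:
       doc1=42.5
       doc2=7.0 -->
```

Because the file sits next to each core's index, a sharded/replicated deployment would need a copy of it in every replica's data directory, which is exactly the operational question raised above.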
Re: Custom Solr indexer/searcher
FWIW I helped someone a few days ago with a similar problem and similarly advised modifying SpatialPrefixTree: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tt4020445.html IMO GeoHashField should be deprecated because it adds no value. ~ David On Nov 16, 2012, at 1:49 PM, Scott Smith wrote: Thanks for the suggestions. I'll take a look at these things. -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, November 15, 2012 11:54 PM To: solr-user@lucene.apache.org Subject: Re: Custom Solr indexer/searcher Scott, It sounds like you need to look into a few examples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the given one in the FST for query expansion. Generic query expansion is done via MultiTermQuery. Index-time term expansion is shown in TrieField and, btw, NumericRangeQuery (it should match your goal a lot). All these are single-dimension examples, but AFAIK a KD-tree is multidimensional; look into GeoHashField, which puts two-dimensional points into single terms with the ability to build ranges on them - see GeoHashField.createSpatialQuery(). Happy hacking! On Fri, Nov 16, 2012 at 10:34 AM, John Whelan whelanl...@gmail.com wrote: Scott, I probably have no idea as to what I'm saying, but if you're looking for finding results in an N-dimensional space, you might look at creating a field of type 'point'. Point-type fields have a dimension attribute; I believe that it can be set to a large integer value. Barring that, there is also a 'dist()' function that can be used to work with multiple numeric fields in order to sort results based on closeness to a desired coordinate. The 'dist' function takes a parameter to specify the means of calculating the distance. (For example, 2 = 'Euclidean distance'. I don't know the other options.) In the worst case, my response is worthless, but it pops your question back up in the e-mails...
Regards, John -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
solr cloud shards and servers issue
Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101.html Sent from the Solr - User mailing list archive at Nabble.com.
How do I best detect when my DIH load is done?
A little while back, I needed a way to tell if my DIH load was done, so I made up a little Ruby program to query /dih?command=status . The program is here: http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/ Is this the best way to do it? Is there some other tool or interface that I should be using instead? Thanks, xoa -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: solr cloud shards and servers issue
On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote: Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? Unload the SolrCores involved. 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml, and make it match the shard you want to add to. - Mark
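Mark's two suggestions map to CoreAdmin requests. The sketch below just builds the two URLs so the parameters are visible; the host, core, collection and shard names are hypothetical, and the exact parameter name for pinning a shard (shard vs. shardId) has varied across Solr versions, so check the docs for your release.

```python
# Build the CoreAdmin URLs for the two steps Mark describes.
# Host, core, collection and shard names are hypothetical examples.
from urllib.parse import urlencode

base = "http://serverY:8983/solr/admin/cores"

# 1. "Forget about that instance": unload the SolrCore the dead node hosted.
unload = base + "?" + urlencode({
    "action": "UNLOAD",
    "core": "collection1_shard3_replica2",
})

# 2. Bring up a replacement core pinned to a specific shard.
create = base + "?" + urlencode({
    "action": "CREATE",
    "name": "collection1_shard3_replica2",
    "collection": "collection1",
    "shard": "shard3",
})

print(unload)
print(create)
```

Issuing these with curl (or SolrJ's CoreAdminRequest) against the surviving node should drop the ghost entry from the cloud graph and slot the new instance into the intended shard.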
Order by hl.snippets count
Hello, I'm using Solr 1.3 with the http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the sort order from the default score to the number of hl.snippets per document. Is this possible from the Solr configuration (without implementing a custom scoring algorithm)? Thanks, -- *Gabriel-Cristian CROITORU* Senior Software Engineer www.zitec.com Tel. +40 (0)31 71 00 114 We are hiring! www.zitec.com/join-zitec
Re: solr cloud shards and servers issue
How can I unload a SolrCore after I killed the running process? Mark Miller-3 wrote: On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote: Hi, I have the following scenario: I have 1 collection across 10 servers. Number of shards: 10. Each server has 2 Solr instances running. Replication is 2. I want to move one of the instances to another server, meaning: kill the Solr process on server X and start a new Solr process on server Y instead. When I kill the Solr process on server X, I can still see that instance in the SolrCloud graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell Solr/ZooKeeper: forget about that instance? Unload the SolrCores involved. 2. When running a new Solr instance - any way to tell Solr/ZooKeeper: add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml, and make it match the shard you want to add to. - Mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101p402.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: inconsistent number of results returned in solr cloud
Answers inline below -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, November 17, 2012 6:40 AM To: solr-user@lucene.apache.org Subject: Re: inconsistent number of results returned in solr cloud Hmmm, first an aside. If by commit after every batch of documents you mean after every call to server.add(doclist), there's no real need to do that unless you're striving for really low latency. The usual recommendation is to use commitWithin when adding and commit only at the very end of the run. This shouldn't actually be germane to your issue, just an FYI. DB Good point. The code for committing docs to Solr is fairly old. I will update it since I don't have a latency requirement. So you're saying that the inconsistency is permanent? By that I mean it keeps coming back inconsistently for minutes/hours/days? DB Yes, it is permanent. I have collections that have been up for weeks and are still returning inconsistent results, and I haven't been adding any additional documents. DB Related to this, I seem to have a discrepancy between the number of documents I think I am sending to Solr and the number of documents it is reporting. I have tried reducing the number of shards for one of my small collections, so I deleted all references to this collection and reloaded it. I think I have 260 documents submitted (counted from a hadoop job). Solr returns a count of ~430 (it varies), and the first returned document is not consistent. I guess if I were trying to test this I'd need to know how you added subsequent collections. In particular what you did re: zookeeper as you added each collection. DB These are my steps DB 1. Create the collection via the HTTP API: http://host:port/solr/admin/collections?action=CREATE&name=collection&numShards=6&collection.configName=collection DB 2.
Relaunch one of my JVM processes, bootstrapping the collection: DB java -Xmx16g -Dcollection.configName=collection -Djetty.port=port -DzkHost=zkhost -Dsolr.solr.home=solr home -DnumShards=6 -Dbootstrap_confdir=conf -jar start.jar DB load data DB Let me know if something is unclear. I can run through the process again and document it more carefully. DB DB Thanks for looking at it, DB Dave Best Erick On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David buttl...@llnl.gov wrote: My typical way of adding documents is through SolrJ, where I commit after every batch of documents (where the batch size is configurable) I have now tried committing several times, from the command line (curl) with and without openSearcher=true. It does not affect anything. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, November 16, 2012 11:04 AM To: solr-user@lucene.apache.org Subject: Re: inconsistent number of results returned in solr cloud How did you do the final commit? Can you try a lone commit (with openSearcher=true) and see if that affects things? Trying to determine if this is a known issue or not. - Mark On Nov 16, 2012, at 1:34 PM, Buttler, David buttl...@llnl.gov wrote: Hi all, I buried an issue in my last post, so let me pop it up. I have a cluster with 10 collections on it. The first collection I loaded works perfectly. But every subsequent collection returns an inconsistent number of results for each query. The queries can be simply *:*, or more complex facet queries. If I go to individual cores and issue the query, with distrib=false, I get a consistent number of results. I am wondering if there is some delay in returning results from my shards, and the queried node just times out and displays the number of results that it has received so far. If there is such a timeout, it must be very small, as my QTime is around 11 ms. Dave
RE: Architecture Question
If you just want to store the data, you can dump it into HDFS sequence files. While HBase is really nice if you want to process and serve data in real time, it adds overhead to use it as pure storage. Dave -Original Message- From: Cool Techi [mailto:cooltec...@outlook.com] Sent: Friday, November 16, 2012 8:26 PM To: solr-user@lucene.apache.org Subject: RE: Architecture Question Hi Otis, Thanks for your reply. I just wanted to check which NoSQL structure would be best suited to store data and use the least amount of memory, since for most of my work Solr would be sufficient, and I want to store data just in case we want to reindex and as a backup. Regards, Ayush Date: Fri, 16 Nov 2012 15:47:40 -0500 Subject: Re: Architecture Question From: otis.gospodne...@gmail.com To: solr-user@lucene.apache.org Hello, I am not sure if this is the right forum for this question, but it would be great if I could be pointed in the right direction. We have been using a combination of MySQL and Solr for all our company full-text and query needs. But as our customers have grown, so has the amount of data, and MySQL is just not proving to be the right option for storing/querying. I have been looking at SolrCloud and it looks really impressive, but I'm not sure if we should give up our storage system. So, I have been exploring DataStax, but a commercial option is out of the question. So we were thinking of using HBase to store the data and at the same time index the data into SolrCloud, but for many reasons this design doesn't seem convincing (also seen the basics of Lilly). 1) Would it be recommended to just use SolrCloud with multiple replicas, or does hbase-solr seem like a good option? If you trust SolrCloud with replication and keep all your fields stored then you could live without an external DB. At this point I personally would still want an external DB. Whether HBase is the right DB for the job I can't tell because I don't know anything about your data, volume, access patterns, etc.
I can tell you that HBase does scale well - we have tables with many billions of rows stored in it, for instance. 2) How much strain would it be to keep both a Solr shard and an HBase node on the same machine? HBase loves memory. So does Solr. They both dislike disk IO (who doesn't!). Solr can use a lot of CPU for indexing/searching, depending on the volume. HBase RegionServers can use a lot of CPU if you run MapReduce on data in HBase. 3) Is there a calculation for what kind of machine configuration I would need to store 500-1000 million records, and how many shards? Most of these will be social data (Twitter/Facebook/blogs etc.). No recipe here, unfortunately. You'd have to experiment and test, do load and performance testing, etc. If you need help with Solr + HBase, we happen to have a lot of experience with both and have even used them together for some of our clients. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html
RE: How do I best detect when my DIH load is done?
Andy, I use an approach similar to yours. There may be something better, however. You might be able to write an onImportEnd listener to tell you when the import ends. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little documentation. See also https://issues.apache.org/jira/browse/SOLR-938 and https://issues.apache.org/jira/browse/SOLR-1081 for the background on this feature. If you do end up using this, let us know how it works and if there is anything you could see to improve it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy Lester [mailto:a...@petdance.com] Sent: Monday, November 19, 2012 10:29 AM To: solr-user@lucene.apache.org Subject: How do I best detect when my DIH load is done? A little while back, I needed a way to tell if my DIH load was done, so I made up a little Ruby program to query /dih?command=status. The program is here: http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/ Is this the best way to do it? Is there some other tool or interface that I should be using instead? Thanks, xoa -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
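To complement the polling approach discussed above, here is a small self-contained sketch of parsing a /dataimport?command=status response to decide whether the import is still running. The sample XML below is abbreviated and illustrative; a real DIH status response carries more fields (timings, document counts, messages).

```python
# Decide busy vs. idle from a DataImportHandler status response.
# The sample below is an abbreviated, illustrative response body;
# real responses contain additional <str>/<lst> elements.
import xml.etree.ElementTree as ET

sample = """<response>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
</response>"""

def dih_status(xml_text):
    """Return the value of the 'status' field, e.g. 'busy' or 'idle'."""
    root = ET.fromstring(xml_text)
    for el in root.iter("str"):
        if el.get("name") == "status":
            return el.text
    return None

print(dih_status(sample))  # busy
```

In a real poller you would fetch the XML from the status URL in a loop and stop once the status flips back to idle, then inspect the outcome fields for success or failure.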
Search using the result returned from the spell checking component
Hi, I've successfully configured the spell check component and it works well. I couldn't find an answer to my question, so any help would be much appreciated: Can I send a single request to Solr and make it so that if any part of the query was misspelled, the search would be performed using the first spell suggestion that returns? I want to make only one request, i.e. submit a query only once, if that is possible. For example: if a user searched for jaca, the search would be performed only once - for java. Thanks in advance for any answer or a link to a relevant resource (I couldn't find any). -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Search using the result returned from the spell checking component
What you want isn't supported. You will always need to issue that second request. This would be a nice feature to add, though. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Roni [mailto:r...@socialarray.com] Sent: Monday, November 19, 2012 12:54 PM To: solr-user@lucene.apache.org Subject: Search using the result returned from the spell checking component Hi, I've successfully configured the spell check component and it works well. I couldn't find an answer to my question, so any help would be much appreciated: Can I send a single request to Solr and make it so that if any part of the query was misspelled, the search would be performed using the first spell suggestion that returns? I want to make only one request, i.e. submit a query only once, if that is possible. For example: if a user searched for jaca, the search would be performed only once - for java. Thanks in advance for any answer or a link to a relevant resource (I couldn't find any). -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Search using the result returned from the spell checking component
Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135p4021140.html Sent from the Solr - User mailing list archive at Nabble.com.
Cacti monitoring of Solr and Tomcat
Is anyone using Cacti to track trends over time in Solr and Tomcat metrics? We have Nagios set up for alerts, but want to track trends over time. I've found a couple of examples online, but none have worked completely for me. I'm looking at this one next: http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15 It looks promising although it doesn't monitor Solr itself. Suggestions? Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Search using the result returned from the spell checking component
You can even request zero rows. That will still return the number of matches. --wunder On Nov 19, 2012, at 11:12 AM, Roni wrote: Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work?
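The cheap first request Walter describes can be sketched as a query string; the handler path and exact parameter set below are illustrative (spellcheck.collate is an optional SpellCheckComponent parameter that asks Solr to return the whole query rewritten with the suggestions, which is convenient for the follow-up request).

```python
# Sketch of the "cheap first request": zero rows, spellcheck on.
# numFound is still reported even when no documents are returned,
# so this gets the match count and suggestions without paying to
# fetch any documents. Handler path and query term are examples.
from urllib.parse import urlencode

params = {
    "q": "jaca",
    "rows": 0,                     # no docs returned; numFound still present
    "spellcheck": "true",
    "spellcheck.collate": "true",  # ask for a rewritten whole query
}
query = "/solr/select?" + urlencode(params)
print(query)
```

If numFound comes back as zero (or very low) and a collation is present, the client issues the second, full request with the collated query.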
Re: Search using the result returned from the spell checking component
And performance-wise: is asking for 0 rows the same as asking for 100 rows? On Mon, Nov 19, 2012 at 9:22 PM, Walter Underwood [via Lucene] ml-node+s472066n4021143...@n3.nabble.com wrote: You can even request zero rows. That will still return the number of matches. --wunder On Nov 19, 2012, at 11:12 AM, Roni wrote: Thank you. I was wondering - what if I make a first request and ask it to return only 1 result: will it still return the spell suggestions while avoiding the overhead of returning all relevant results? Then I could make a second request to get all the results I need. Would that work? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-using-the-result-returned-from-the-spell-checking-component-tp4021135p4021144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I best detect when my DIH load is done?
On 11/19/2012 11:52 AM, Dyer, James wrote: Andy, I use an approach similar to yours. There may be something better, however. You might be able to write an onImportEnd listener to tell you when it ends. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little documentation. See also https://issues.apache.org/jira/browse/SOLR-938 and https://issues.apache.org/jira/browse/SOLR-1081 for the background on this feature. If you do end up using this, let us know how it works and if there is anything you could see to improve it.

I think it would be a good idea to provide a SolrJ API out of the box (similar to CoreAdminRequest) for querying the status URL on Solr and obtaining the following information:

1) Determining import status:
   a) never started (idle)
   b) finished successfully (idle)
   c) finished with error, canceled, etc. (idle)
   d) in progress (busy)
2) Determining how many documents have been added.
3) Determining how long the import took or has taken so far.
4) Any other commonly gathered information.

There may be some reluctance to do this simply because DIH is a contrib module. Perhaps there could be a contrib module for SolrJ? Thanks, Shawn
Can Solr v1.4 and v4.0 co-exist in Tomcat?
I have an existing v1.4 implementation of Solr that supports 2 lines of business. For a third line of business the need to do Geo searching requires using Solr 4.0. I'd like to minimize the impact to the existing lines of business (let them upgrade at their own pace), however I want to share hardware if possible. Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If so, are there any potential side-effects to the existing Solr implementation I should be aware of? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Can-Solr-v1-4-and-v4-0-co-exist-in-Tomcat-tp4021146.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I best detect when my DIH load is done?
Hello Andy, i had a similar question on this some time ago. http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html#a3987123 http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-td3801327.html#a3803658 i ended up writing my own shell based polling application that runs from our *nx batch server that handles all of our Control-M work. +1 on the idea of making this a more formal part of the API. let me know if you want concrete example code. -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021148.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cacti monitoring of Solr and Tomcat
Hi Andy, My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! Otis -- Solr Performance Monitoring - http://sematext.com/spm On Nov 19, 2012 2:18 PM, Andy Lester a...@petdance.com wrote: Is anyone using Cacti to track trends over time in Solr and Tomcat metrics? We have Nagios set up for alerts, but want to track trends over time. I've found a couple of examples online, but none have worked completely for me. I'm looking at this one next: http://forums.cacti.net/viewtopic.php?f=12t=19744start=15 It looks promising although it doesn't monitor Solr itself. Suggestions? Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
RE: How do I best detect when my DIH load is done?
James, was it you (cannot remember) that replied to one of my queries on this subject and mentioned that there was consideration being given to cleaning up the response codes to remove ambiguity? -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021150.html Sent from the Solr - User mailing list archive at Nabble.com.
Inserting many documents and update relations
Hi there, I have a principal question. We have around 5 million Lucene documents. At the beginning we have around 4000 XML files which we transform to SolrInputDocuments by using SolrJ and adding them to the index. A document is also related to other documents, so while adding a document we have to do some queries (at least one) to identify whether there are related documents already in the index, in order to create the association to the related document. The related document also has a backlink, so we have to update the related document as well (meaning load, update, delete and re-add). We are using Solr 3.6.1. The performance is quite slow because of these queries and modifications of already existing documents in the index. Are there any configuration options we can use, or anything else we can do? Thanks a lot in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Inserting-many-documents-and-update-relations-tp4021151.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: How do I best detect when my DIH load is done?
I'm not sure. But there are at least a few jira issues open with differing ideas on how to improve this. For instance, SOLR-1554 SOLR-2728 SOLR-2729 James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Monday, November 19, 2012 1:52 PM To: solr-user@lucene.apache.org Subject: RE: How do I best detect when my DIH load is done? James, was it you (cannot remember) that replied to one of my queries on this subject and mentioned that there was consideration being given to cleaning up the response codes to remove ambiguity? -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-best-detect-when-my-DIH-load-is-done-tp4021121p4021150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can Solr v1.4 and v4.0 co-exist in Tomcat?
Hi Ken- We've been running 1.3 and 4.0 as separate web apps within the same Tomcat instance for the last 3 weeks with no issues. The only challenge for us was refactoring our app client code to use SolrJ 4.0 to access both the 1.3 and 4.0 backends. The calls to the 1.3 backend use the XML response format while the 4.0 backend uses the Java binary format. -James On Nov 19, 2012, at 11:40 AM, kfdroid kfdr...@gmail.com wrote: I have an existing v1.4 implementation of Solr that supports 2 lines of business. For a third line of business the need to do Geo searching requires using Solr 4.0. I'd like to minimize the impact to the existing lines of business (let them upgrade at their own pace), however I want to share hardware if possible. Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If so, are there any potential side-effects to the existing Solr implementation I should be aware of? Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Can-Solr-v1-4-and-v4-0-co-exist-in-Tomcat-tp4021146.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Per user document exclusions
Hi Christian, Since customization is not a problem in your case, how about writing out the userId and excluded document ids to the database when it is excluded, and then for each query from the user (possibly identified by a userid parameter), lookup the database by userid, construct a NOT filter out of the excluded docIds, then send to Solr as the fq? We are using a variant of this approach to allow database style wildcard search on document titles. -sujit On Nov 18, 2012, at 9:05 PM, Christian Jensen wrote: Hi, We have a need to allow each user to 'exclude' individual documents in the results. We can easily do this now within the RDBMS using a FTS index and a query with 'OUTER LEFT JOIN WHERE NULL' type of thing. Can Solr do this somehow? Heavy customization is not a problem - I would bet this has already been done. I would like to avoid multiple trips back and forth from either the DB or SOLR if possible. Thanks! Christian -- *Christian Jensen* 724 Ioco Rd Port Moody, BC V3H 2W8 +1 (778) 996-4283 christ...@jensenbox.com
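Sujit's approach can be sketched as a tiny fq builder: look up the user's exclusions in the database, then send a negative filter query. This assumes id is the uniqueKey field and omits value escaping, so adjust for your schema:

```python
def exclusion_fq(excluded_ids):
    # Pure negative filter query: keep everything except these documents.
    if not excluded_ids:
        return None  # no fq needed for users with no exclusions
    return "-id:(" + " OR ".join(str(i) for i in excluded_ids) + ")"

print(exclusion_fq([101, 102, 107]))  # -id:(101 OR 102 OR 107)
```

The fq is per-user, so filter-cache hit rates depend on how often the same user queries with an unchanged exclusion set.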
Re: Solr4.0 / SolrCloud queries
Hi all, I have managed to successfully index around 6 million documents, but while indexing (and even now after the indexing has stopped), I am running into a bunch of errors. The most common error I see is / null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://ABC:8983/solr/xyzabc/ I have made sure that the servers are able to communicate with each other using the same names. Another error I keep getting is that the leader stops recovering and goes red / recovery failed. /Error while trying to recover. core=ABC123:org.apache.solr.common.SolrException: We are not the leader/ The servers intermittently go offline taking down one of the shards and in turn stopping all search queries. The configuration I have: Shard1: Server1 - Memory 22 GB, JVM 8 GB; Server2 - Memory 22 GB, JVM 10 GB (this one is on recovery failed status, but still acting as a leader). Shard2: Server1 - Memory 22 GB, JVM 8 GB (this one is on recovery failed status, but still acting as a leader); Server2 - Memory 22 GB, JVM 8 GB. Shard3: Server1 - Memory 22 GB, JVM 10 GB; Server2 - Memory 22 GB, JVM 8 GB. While typing this post I did a Reload from the Core Admin page, and both servers (Shard1-Server2 and Shard2-Server1) came back up again. Has anyone else encountered these issues? Any steps to prevent these? Thanks. --Shreejay -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-0-SolrCloud-queries-tp4016825p4021154.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Per user document exclusions
Hi Christian, Since you didn't explicitly mention it, I'm not sure if you are aware of it - ManifoldCF has ACL support built in. This may be what you are after. Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 12:05 AM, Christian Jensen christ...@jensenbox.comwrote: Hi, We have a need to allow each user to 'exclude' individual documents in the results. We can easily do this now within the RDBMS using a FTS index and a query with 'OUTER LEFT JOIN WHERE NULL' type of thing. Can Solr do this somehow? Heavy customization is not a problem - I would bet this has already been done. I would like to avoid multiple trips back and forth from either the DB or SOLR if possible. Thanks! Christian -- *Christian Jensen* 724 Ioco Rd Port Moody, BC V3H 2W8 +1 (778) 996-4283 christ...@jensenbox.com
Re: Best way to retrieve 20 specific documents
If you are in Solr 4 you could use realtime get and list the ids that you need. For example: http://host:port/solr/mycore/get?ids=my_id_1,my_id_2... See http://lucidworks.lucidimagination.com/display/solr/RealTime+Get Tomás On Mon, Nov 19, 2012 at 5:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 2:40 PM, Dotan Cohen dotanco...@gmail.com wrote: Suppose that an application needs to retrieve about 20-30 solr documents by id. The application could simply run 20 queries to retrieve them, but is there a better way? The id field is stored and indexed, of course. It is of type solr.StrField, and is configured as the uniqueKey. Thank you for any insight. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
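A sketch of building the realtime get URL Tomás describes (core name is a placeholder; urlencode percent-encodes the comma, which Solr decodes normally):

```python
from urllib.parse import urlencode

def realtime_get_url(core_url, ids):
    # /get with a comma-separated ids parameter (Solr 4 realtime get).
    return core_url + "/get?" + urlencode({"ids": ",".join(ids)})

url = realtime_get_url("http://localhost:8983/solr/mycore", ["my_id_1", "my_id_2"])
print(url)  # http://localhost:8983/solr/mycore/get?ids=my_id_1%2Cmy_id_2
```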
Re: Cacti monitoring of Solr and Tomcat
On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! SPM looks mighty tasty, but we must have it in-house on our own servers, for monitoring internal dev systems, and we'd like it to be open source. We already have Cacti up and running, but it's possible we could use something else. -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: Solr Delta Import Handler not working
| dataSource=null I think this should not be here. The datasource should default to the dataSource listing. And 'rootEntity=true' should be in the XPathEntityProcessor block, because you are adding each file as one document. - Original Message - | From: Spadez james_will...@hotmail.com | To: solr-user@lucene.apache.org | Sent: Sunday, November 18, 2012 7:34:34 AM | Subject: Re: Solr Delta Import Handler not working | | Update! Thank you to Lance for the help. Based on your suggestion I have fixed up a few things. | | *My dataconfig now has the filename pattern fixed and rootEntity=true:* <dataConfig> <dataSource type="FileDataSource"/> <document> <entity name="document" processor="FileListEntityProcessor" baseDir="/var/lib/employ" fileName="^.*\.xml$" recursive="false" rootEntity="true" dataSource="null"> <entity processor="XPathEntityProcessor" url="${document.fileAbsolutePath}" useSolrAddSchema="true" stream="true"/> </entity> </document> </dataConfig> | | *My data.xml has a corrected date format with T:* <add> <doc> <field name="id">123</field> <field name="title">Delta Import 2</field> <field name="description">This is my long description</field> <field name="truncated_description">This is</field> <field name="company">Google</field> <field name="location_name">England</field> <field name="date">2007-12-31T22:29:59</field> <field name="source">Google</field> <field name="url">www.google.com</field> <field name="latlng">45.17614,45.17614</field> </doc> </add> | | -- | View this message in context: | http://lucene.472066.n3.nabble.com/Solr-Delta-Import-Handler-not-working-tp4020897p4020925.html | Sent from the Solr - User mailing list archive at Nabble.com. |
Re: Cacti monitoring of Solr and Tomcat
We (Chegg) are using New Relic, even for the dev systems. It is pretty good, but only reports averages, when we need median and 90th percentile. Our next step is putting something together with the Metrics server from Coda Hale (http://metrics.codahale.com/) and Graphite (http://graphite.wikidot.com/). This looks far more capable than New Relic, but more work. wunder On Nov 19, 2012, at 12:36 PM, Andy Lester wrote: On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: My favourite topic ;) See my sig below for SPM for Solr. At my last company we used Cacti but it felt very 1990s almost. Some ppl use zabbix, some graphite, some newrelic, some SPM, some nothing! SPM looks mighty tasty, but we must have it in-house on our own servers, for monitoring internal dev systems, and we'd like it to be open source. We already have Cacti up and running, but it's possible we could use something else. -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Odd behaviour for case insensitive searches
Hello Everyone, I've been having issues with odd SOLR behavior when searching for case insensitive data. Let's take a vanilla SOLR config (from the example). Then I uploaded the default solr.xml document with a slight modification to the field with name 'name'. I added Thomas NOSQL. <add> <doc> <field name="id">SOLR1000</field> <field name="name">Solr, the Enterprise Search Server Thomas NOSQL</field> </doc> </add> Then when I search for nosql~ I got the record returned in the search. However, when I search for NOSQL~ no records are returned. You can see my solr admin interface here: http://skatingboutique.com [PORT 8080] /solr/#/tracks Why is this? -- View this message in context: http://lucene.472066.n3.nabble.com/Odd-behaviour-for-case-insensitive-searches-tp4021171.html Sent from the Solr - User mailing list archive at Nabble.com.
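A likely explanation, hedged since the schema isn't shown: fuzzy queries such as NOSQL~ are multiterm queries, and in many Solr versions these bypass the field's analysis chain, so the term is not lowercased at query time even though the index side lowercases it. nosql~ then matches the lowercased indexed term and NOSQL~ does not. One client-side workaround is to lowercase fuzzy terms before sending the query; a sketch:

```python
import re

def normalize_fuzzy_terms(q):
    # Lowercase any term carrying a fuzzy suffix (~ with optional similarity)
    # so it matches the lowercased form stored in the index.
    return re.sub(r"(\w+)(~[\d.]*)",
                  lambda m: m.group(1).lower() + m.group(2), q)

print(normalize_fuzzy_terms("NOSQL~"))          # nosql~
print(normalize_fuzzy_terms("name:NOSQL~0.7"))  # name:nosql~0.7
```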
Re: solr cloud shards and servers issue
Joe, Can you remove it from the config and have it gone when you restart Solr? Or restart Solr and unload as described on http://wiki.apache.org/solr/CoreAdmin ? Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 11:57 AM, joe.cohe...@gmail.com joe.cohe...@gmail.com wrote: How can I unload a solrCore after I killed the running process? Mark Miller-3 wrote On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote: Hi I have the following scenario: I have 1 collection across 10 servers. Num of shards: 10. Each server has 2 solr instances running. replication is 2. I want to move one of the instances to another server. meaning, kill the solr process in server X and start a new solr process in server Y instead. When I kill the solr process in server X, I can still see that instance in the solr-cloud-graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell solr/zookeeper - Forget about that instance? Unload the SolrCores involved. 2. when running a new solr instance - any way to tell solr/zookeeper - add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml and make it match the shard you want to add to. - Mark -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101p402.html Sent from the Solr - User mailing list archive at Nabble.com.
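The unload Otis mentions is a plain CoreAdmin HTTP call; a sketch of building it (host and core name are placeholders):

```python
from urllib.parse import urlencode

def unload_core_url(solr_base, core_name):
    # CoreAdmin UNLOAD request, per the CoreAdmin wiki page linked above.
    return solr_base + "/admin/cores?" + urlencode(
        {"action": "UNLOAD", "core": core_name})

url = unload_core_url("http://localhost:8983/solr", "collection1_shard3")
print(url)  # http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard3
```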
Re: CloudSolrServer or load-balancer for indexing
OK, got it. Thanks. On 19 November 2012 15:00, Mark Miller markrmil...@gmail.com wrote: Nodes stop accepting updates if they cannot talk to Zookeeper, so the external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, eventually will do hashing, will auto add/remove nodes from rotation based on the cluster state in Zookeeper, and is probably out of the box more intelligent about retrying on some responses (for example responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know CloudSolrServer is recommended to be used for indexing to SolrCloud. I wonder what are advantages of this approach over external load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use load-balancer and send updates to any existing node. In former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In latter case I can still indexing data even if some nodes are down (no data outage). What is better for reliable indexing - CloudSolrServer, load-balancer or you know some different methods worth to consider ? Regards.
Re: CloudSolrServer or load-balancer for indexing
A single zookeeper node could be a single point of failure. It is recommended that you have at least three ZooKeeper nodes running as an ensemble. Zookeeper has a simple rule - over half of your nodes must be available to achieve quorum and thus be functioning. This is to avoid 'split-brain'. Thus, with three servers, you could handle the loss of one zookeeper node. Five would allow the loss of two nodes. More to the point, you're pushing the static configuration from being a list of solr nodes, to being a list of Zookeeper nodes. The expectation is clearly that you'll need to scale your Zookeeper nodes far less often than you'd need to do it with Solr. Upayavira On Mon, Nov 19, 2012, at 09:39 PM, Marcin Rzewucki wrote: OK, got it. Thanks. On 19 November 2012 15:00, Mark Miller markrmil...@gmail.com wrote: Nodes stop accepting updates if they cannot talk to Zookeeper, so the external load balancer is no advantage there. CloudSolrServer will be smart about knowing who the leaders are, eventually will do hashing, will auto add/remove nodes from rotation based on the cluster state in Zookeeper, and is probably out of the box more intelligent about retrying on some responses (for example responses that are returned on shutdown or startup). - Mark On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, As far as I know CloudSolrServer is recommended to be used for indexing to SolrCloud. I wonder what are advantages of this approach over external load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) + 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use load-balancer and send updates to any existing node. In former case it seems that ZooKeeper is a single point of failure - indexing is not possible if it is down. In latter case I can still indexing data even if some nodes are down (no data outage). 
What is better for reliable indexing - CloudSolrServer, load-balancer or you know some different methods worth to consider ? Regards.
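Upayavira's majority rule reduces to a one-liner: an ensemble of n ZooKeeper nodes tolerates floor((n - 1) / 2) failures, which is why three nodes survive one loss and five survive two:

```python
def zk_fault_tolerance(n):
    # Quorum needs a strict majority, so floor((n - 1) / 2) nodes may fail.
    return (n - 1) // 2

for n in (1, 3, 5):
    print(n, "node ensemble tolerates", zk_fault_tolerance(n), "failure(s)")
```

Note that even ensemble sizes buy nothing: 4 nodes tolerate the same single failure as 3, which is why odd sizes are the usual recommendation.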
Re: Cacti monitoring of Solr and Tomcat
: Is anyone using Cacti to track trends over time in Solr and Tomcat : metrics? We have Nagios set up for alerts, but want to track trends : over time. A key thing to remember is that all of the stats you can get from solr via HTTP are also available via JMX... http://wiki.apache.org/solr/SolrJmx ...so anytime you have a favorite monitoring tool WizWat and you're wondering if anyone has tips on using WizWat to monitor Solr, start by checking if WizWat has any docs on monitoring apps using JMX. -Hoss
Re: solr cloud shards and servers issue
Maybe it would be better if Solr checked the live nodes and not all the existing nodes in zk. If a server dies and you need to start a new one, it would go straight to the correct shard without one needing to specify it manually. Of course, the problem could be if a server goes down for a minute and then comes back up, maybe a new node was added to the shard in the interim, but I still think it would be better this way. Tomás On Mon, Nov 19, 2012 at 1:51 PM, Mark Miller markrmil...@gmail.com wrote: On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote: Hi I have the following scenario: I have 1 collection across 10 servers. Num of shards: 10. Each server has 2 solr instances running. replication is 2. I want to move one of the instances to another server. meaning, kill the solr process in server X and start a new solr process in server Y instead. When I kill the solr process in server X, I can still see that instance in the solr-cloud-graph (marked differently). When I run the instance on server Y, it gets attached to another shard, instead of getting into the shard that is now actually missing an instance. 1. Any way to tell solr/zookeeper - Forget about that instance? Unload the SolrCores involved. 2. when running a new solr instance - any way to tell solr/zookeeper - add this instance to shard X? Specify a shardId when creating the core or configuring it in solr.xml and make it match the shard you want to add to. - Mark
Re: Order by hl.snippets count
(12/11/20 1:50), Gabriel Croitoru wrote: Hello, I'm using Solr 1.3 with http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the order from the default score to the number of hl.snippets per document. Is this possible from Solr configuration? (without implementing a custom scoring algorithm)? I don't think it is possible. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Best way to retrieve 20 specific documents
On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thanks, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I then tried id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
Re: Execute an independent query from the main query
Hi Otis, Yes, that seems like one solution, however I have multiple opening and closing hours, within the same day. Therefore it might become somewhat complicated to manage the index. For now I shifted the business logic to the client and a second query is made to get the additional data. Thanks for the suggestion. Indika On 20 November 2012 02:50, Otis Gospodnetic otis.gospodne...@gmail.comwrote: Hi Indika, So my suggestion was to maybe consider changing the index structure and pull open/close times into 1 or more fields in the main record, so you don't have this problem all together. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Sun, Nov 18, 2012 at 10:39 PM, Indika Tantrigoda indik...@gmail.com wrote: Hi Otis, Actually I maintain a separate document for each open/close time along with the date (i.e. Sunday =1, Monday =2). I was thinking if it would be possible to query Solr asking, give the next day's (can be current_day +1) minimum opening time as a response field. Thanks, Indika On 19 November 2012 04:50, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Maybe your index needs to have a separate field for each day open/close time. No join or extra query needed then. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 18, 2012 5:35 PM, Indika Tantrigoda indik...@gmail.com wrote: Thanks for the response. Erick, My use case is related to restaurant opening hours, In the same request to Solr I'd like to get the time when the restaurant opens the next day, preferably part of the fields returned, and this needs to be independent of the main queries search params. Yes, the Join wouldn't be suitable in this use case. Luis, I had thought of having the logic in the client side, but before that I wanted to see if I could get the result from Solr itself. I am currently using SolrJ along with Spring. 
Thanks, Indika On 18 November 2012 21:49, Luis Cappa Banda luisca...@gmail.com wrote: Hello! When queries become more and more complex and you need to apply one second query with the resultant docs from the first one, or re-sort results, or maybe add some promotional or special docs to the response, I recommend to develop a Web App module that implements that complex business logic and dispatches queries from your Client App to your Solr back-end. That module, let's call Search Engine, lets you play with all those special use cases. If you are familiar with Java I suggest you to have a look at the combination between SolrJ and Spring framework or Jersey. Regards, - Luis Cappa. El 18/11/2012 15:15, Indika Tantrigoda indik...@gmail.com escribió: Hi All, I would like to get results of an query that is different from the main query as a new field. This query needs to be independent from any filter queries applied to the main query. I was trying to achieve this by fl=_external_query_result:query($myQuery), however that result seems to be governed by any filter queries applied to the main query ? Is it possible to have a completely separate query in the fl list and return its result along with the results (per results), or would I need to create a separate query on the client side to get the results of the independent query (based on the results from the first query) ? Thanks in advance, Indika
Re: Preventing accepting queries while custom QueryComponent starts up?
: I have several custom QueryComponents that have high one-time startup costs : (hashing things in the index, caching things from a RDBMS, etc...) you need to provide more details about how your custom components work -- in particular: where in the lifecycle of your components is this high-startup cost happening? : Is there a way to prevent solr from accepting connections before all : QueryComponents are ready? Define ready ? ... things that happen in the init() and inform(SolrCore) methods will completely prevent the SolrCore from being available for queries. Likewise: if you are using firstSearcher warming queries, then the useColdSearcher option in solrconfig.xml can be used to control whether external requests will block until the searcher is available -- however this doesn't prevent the servlet container from accepting the HTTP connection. but as mentioned, this is where things like the PingRequestHandler and the enable/disable commands can be used to take servers in and out of rotation with your load balancer -- assuming that your load balancer can be configured to monitor the ping URL. Alternatively you can just use native features of your load balancer to control this independent of solr (but the ping handler is a nice way of letting one set of dev/ops folks own the solr servers and control their availability even if they don't have the ability to control the load balancer itself) -Hoss
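For the load-balancer integration Hoss describes, the monitor only needs to fetch the ping URL and check the status field of the response. A sketch over a parsed JSON ping response; the response shape here is an assumption to verify against your Solr version:

```python
def is_in_rotation(ping_response):
    # The ping handler reports "OK" in the status field when the core is
    # enabled and able to serve queries.
    return ping_response.get("status") == "OK"

print(is_in_rotation({"responseHeader": {"status": 0}, "status": "OK"}))  # True
print(is_in_rotation({}))                                                 # False
```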
Re: Best way to retrieve 20 specific documents
In fact, you shouldn't need OR: id:(123 456 789) will default to OR. Upayavira On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote: On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thank, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I tried then id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
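Putting the thread's suggestions together, the query can be produced by a one-line builder (assuming id is the uniqueKey field):

```python
def ids_query(field, ids):
    # Terms inside parentheses are OR'd by default, so no explicit OR needed.
    return field + ":(" + " ".join(str(i) for i in ids) + ")"

print(ids_query("id", [123, 456, 789]))  # id:(123 456 789)
```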
All-wildcard query performance
Hi, Our application sometimes generates queries with one of the constraints: field:[* TO *] I expected this query performance to be the same as if we omitted the field constraint completely. However, I see the performance of the two queries to differ drastically (3ms without all-wildcard constraint, 200ms with it). Could someone explain the source of the difference, please? I am fixing the application not to generate such queries, obviously, but still would like to understand the logic here. We use Solr 3.6.1. Thanks. -- Aleksey
Re: SolrCloud Error after leader restarts
It's generally not a good choice to use ram directory. 4x solrcloud does not work with it no - 5x does, but in any case, ram dir is not persistent. So when you restart Solr you will lose the data. MMap is generally the right dir to use. - Mark On Nov 19, 2012, at 6:52 PM, deniz denizdurmu...@gmail.com wrote: yea, i am using ram. solrcloud is not working with ram directory? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021194.html Sent from the Solr - User mailing list archive at Nabble.com.
solr4 MULTIPOLYGON search syntax
Does anybody have any info on how to properly construct a multipolygon search? I'm very interested in Polygon (search all documents within a shape), Multipolygon (search all documents within 2+ shapes), and Multipolygon (search all documents within 2+ shapes but not within an area inside a shape - imagine a donut where you don't search within the hole in the center). I'm trying to search 2 shapes but get errors at the moment. Polygon searches work just fine so I have everything installed correctly, but 2 shapes in the one search as per below is not working. I can't find anything on the net to try and debug multipolygons. My multipolygon query looks like this. fq=geo:Intersects(MULTIPOLYGON ((149.4023 -34.6072, 149.4023 -34.8690, 149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072)), ((151.506958 -33.458943, 150.551147 -33.60547, 151.00708 -34.257216, 151.627808 -33.861293, 151.506958 -33.458943))) And I get this error. ERROR 500 error reading WKT But a polygon search works fine. fq=geo:Intersects(POLYGON((149.4023 -34.6072, 149.4023 -34.8690, 149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072))) -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-MULTIPOLYGON-search-syntax-tp4021199.html Sent from the Solr - User mailing list archive at Nabble.com.
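One hedged guess at the WKT error: in well-formed WKT, each polygon inside a MULTIPOLYGON is wrapped in its own double parentheses and the whole list sits inside one outer pair, i.e. MULTIPOLYGON (((...)), ((...))); the failing query above appears to be missing a nesting level. A small builder that emits the expected shape:

```python
def multipolygon_wkt(polygons):
    # Each polygon here is a single outer ring: a list of (x, y) points
    # whose first and last points coincide. Note the double parentheses
    # around every polygon and the single outer pair around the list.
    def ring(points):
        return "(" + ", ".join(f"{x} {y}" for x, y in points) + ")"
    return "MULTIPOLYGON (" + ", ".join("(" + ring(p) + ")" for p in polygons) + ")"

box = [(149.4023, -34.6072), (149.4023, -34.8690), (149.9022, -34.8690),
       (149.9022, -34.6072), (149.4023, -34.6072)]
tri = [(151.506958, -33.458943), (150.551147, -33.60547),
       (151.00708, -34.257216), (151.627808, -33.861293),
       (151.506958, -33.458943)]
print(multipolygon_wkt([box, tri]))
```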
Re: All-wildcard query performance
Hi, Our application sometimes generates queries with one of the constraints: field:[* TO *] I expected this query performance to be the same as if we omitted the field constraint completely. However, I see the performance of the two queries to differ drastically (3ms without all-wildcard constraint, 200ms with it). Could someone explain the source of the difference, please? I am fixing the application not to generate such queries, obviously, but still would like to understand the logic here. We use Solr 3.6.1. Thanks. That query does not mean all docs. It means something slightly different - all documents for which field is present. If this field happens to exist in every document, then it amounts to the same thing, but Solr still must check every document. Thanks, Shawn
Re: More Like this without a document?
: If I want to use MoreLikeThis algorithm I need to add this documents in the : index? The MoreLikeThis will work with soft commits? Is there a solution to : do a MoreLikeThis without adding the document in the index? you can feed the MoreLikeThisHandler a ContentStream (ie: POST data, or file upload, or stream.body request param) of text instead of sending it a query and it will use that raw text to find more like this http://wiki.apache.org/solr/MoreLikeThisHandler -Hoss
Re: SolrCloud Error after leader restarts
i know facts about ramdirectory actually.. just running some perf tests on our dev env right now.. so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021203.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is it possible to save the search query?
Hi, Thanks for your guidance. I am unable to figure out what a doc ID is and how I can collect all the doc IDs. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/09/2012 12:33 AM Subject:Re: is it possible to save the search query? Hi, Aha, I think I understand. Yes, you could collect all doc IDs from each query and find the differences. There is nothing in Solr that can find those differences or that would store doc IDs of returned hits in the first place, so you would have to implement this yourself. Sematext's Search Analytics service may be of help here in the sense that all the data you need (queries, doc IDs, etc.) is collected, so it would be a matter of providing an API to get the data for off-line analysis. But this data collection+diffing is also something you could implement yourself. One thing to think about - what do you do when a query returns a large number of hits. Do you really want/need to get IDs for all of them, or only a page at a time. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi, The following is the example; 1st query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id Next query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id In the 1st query the field 'data' is boosted by 2. However, maybe the user was not satisfied with the response, so in the next query he boosted the field 'id' by 2. I want to record both the queries and compare the two, that is, what changes were made in the 2nd query that are not present in the previous one. 
Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject:Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in solr and then compare it with the previous search query? Thanks and regards, Romita Saha
Re: SolrCloud Error after leader restarts
On Nov 19, 2012, at 9:11 PM, deniz denizdurmu...@gmail.com wrote: so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? Yes, in 5x RAM directory should be able to recover. - Mark
Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi there, I have a field (an externalFileField, called rankingField) whose value (type=float) is calculated by a client app. In Solr's original scoring model, changing the boost value results in a different ranking, so I think product(score,rankingField) may be equivalent to the Solr scoring model. What I'm curious about is which will be better in practice, and the different meanings of these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: SolrCloud Error after leader restarts
Mark Miller-3 wrote On Nov 19, 2012, at 9:11 PM, deniz <denizdurmus87@...> wrote: so in case i use ramdir with 5x cloud, it will still not do the recovery? i mean it will not get the data from the leader and fill its ramdir again? Yes, in 5x RAM directory should be able to recover. - Mark thank you so much for your patience with me :) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021209.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Custom ranking solutions?
Hi Floyd, Use debugQuery=true and let's see it. :) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFileField was introduced, I changed document boost values to achieve custom ranking. My client app updates each document's boost value daily and that seemed to work fine; the actual ranking could be predicted from the boost value (calculated from clicks, recency, and rating). I'm now trying to use ExternalFileField to do some ranking, but after some tests I did not get what I expected. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. The external file is as follows doc1=3 doc2=5 doc3=9 The original scores from the Solr result are as follows doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What's wrong in my test? Please kindly help on this. Floyd
Re: is it possible to save the search query?
Hi, Document ID would be a field in your document. A unique field that you specify when indexing. You can collect it by telling Solr to return it in the search results by including it in the fl= parameter. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:31 PM, Romita Saha romita.s...@sg.panasonic.comwrote: Hi, Thanks for your guidance. I am unable to figure out what is a doc ID and how can i collect all the doc IDs. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/09/2012 12:33 AM Subject:Re: is it possible to save the search query? Hi, Aha, I think I understand. Yes, you could collect all doc IDs from each query and find the differences. There is nothing in Solr that can find those differences or that would store doc IDs of returned hits in the first place, so you would have to implement this yourself. Sematext's Search Analytics service my be of help here in the sense that all data you need (queries, doc IDs, etc.) are collected, so it would be a matter of providing an API to get the data for off-line analysis. But this data collection+diffing is also something you could implement yourself. One thing to think about - what do you do when a query returns a lrge number of hits. Do you really want/need to get IDs for all of them, or only a page at a time. 
Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha romita.s...@sg.panasonic.comwrote: Hi, The following is the example; 1st query: http://localhost:8983/solr/db/select/?defType=dismaxdebugQuery=onq=cashier2qf=data ^2 idstart=0rows=11fl=data,id Next query: http://localhost:8983/solr/db/select/?defType=dismaxdebugQuery=onq=cashier2qf=data id^2start=0rows=11fl=data,id In the 1st query the the field 'data' is boosted by 2. However may be the user was not satisfied with the response. Thus in the next query he boosted the field 'id' by 2. I want to record both the queries and compare between the two, meaning, what are the changes implemented on the 2nd query which are not present in the previous one. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject:Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in solr and then compare it with the previous search query? Thanks and regards, Romita Saha
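The collect-and-diff approach Otis describes above can be sketched in a few lines: collect the unique-key field (via fl=id) from each query's results, then compare the two sets. The document IDs here are made up for illustration:

```python
# Sketch of the collect-and-diff idea: two ordered lists of doc IDs
# taken from the fl=id field of two query responses (IDs are made up).
hits_query1 = ["doc7", "doc3", "doc9", "doc1"]   # e.g. qf=data^2 id
hits_query2 = ["doc3", "doc7", "doc2", "doc9"]   # e.g. qf=data id^2

# Docs that appear in one result set but not the other.
only_in_1 = set(hits_query1) - set(hits_query2)
only_in_2 = set(hits_query2) - set(hits_query1)

# For docs in both, how their rank position moved between the queries.
rank_moves = {d: (hits_query1.index(d), hits_query2.index(d))
              for d in set(hits_query1) & set(hits_query2)}
```

As Otis notes, for queries with many hits you would have to decide whether to page through all IDs or diff only the first page.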
Re: Best way to retrieve 20 specific documents
I wanted to be explicit for the OP. But wouldn't that depend on mm if you are using (e)dismax? Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 6:37 PM, Upayavira u...@odoko.co.uk wrote: In fact, you shouldn't need OR: id:(123 456 789) will default to OR. Upayavira On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote: On 11/19/2012 1:49 PM, Dotan Cohen wrote: On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, How about id1 OR id2 OR id3? :) Thanks, Otis. This was my first inclination (id:123 OR 456), but it didn't work when I tried. At your instigation I then tried id:123 OR id:456. This does work. Thanks. You can also use this query format: id:(123 OR 456 OR 789) This does get expanded internally by the query parser to the format that has the field name on every clause, but it is sometimes easier to write code that produces the above form. Thanks, Shawn
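The query construction Shawn describes is easy to generate in code; a sketch with illustrative IDs (explicit OR keeps the query independent of the default-operator/mm settings Otis raises):

```python
# Sketch: turn a list of unique-key values into the boolean query form
# discussed above. Explicit OR avoids depending on q.op / mm defaults.
ids = ["123", "456", "789"]
query = "id:(" + " OR ".join(ids) + ")"
```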
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Thanks Otis, But sort=product(score, rankingField) is not working in my test. What could be wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi, Do you see any errors? Which version of Solr? What does debugQuery=true say? Are you sure your file with ranks is being used? (remove it, put some junk in it, see if that gives an error) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu floyd...@gmail.com wrote: Thanks Otis, But the sort=product(score, rankingField) is not working in my test. What probably wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
Re: Custom ranking solutions?
HI Otis, The debug information is as follows; it seems there is no product() step.

<lst name="debug">
  <str name="rawquerystring">_l_all:測試</str>
  <str name="querystring">_l_all:測試</str>
  <str name="parsedquery">PhraseQuery(_l_all:"測 試")</str>
  <str name="parsedquery_toString">_l_all:"測 試"</str>
  <lst name="explain">
    <str name="222">41.11747 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 41.11747 = fieldWeight in 0, product of: 4.1231055 = tf(freq=17.0), with freq of: 17.0 = phraseFreq=17.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 7.0 = fieldNorm(doc=0)</str>
    <str name="223">14.246359 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 14.246359 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = phraseFreq=1.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 10.0 = fieldNorm(doc=0)</str>
    <str name="211">10.073696 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of: 10.073696 = fieldWeight in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = phraseFreq=2.0 1.4246359 = idf(), sum of: 0.71231794 = idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 5.0 = fieldNorm(doc=0)</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">6.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">6.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">3.0</double></lst>
    </lst>
  </lst>
</lst>

2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi Floyd, Use debugQuery=true and let's see it.:) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFielField introduced, change document boost value to achieve custom ranking. My client app will update each boost value for documents daily and seem to worked fine. Actual ranking could be predicted based on boost value. (value is calculated based on click, recency, and rating ). I'm now try to use ExternalFileField to do some ranking, after some test, I did not get my expectation. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. The external file as following doc1=3 doc2=5 doc3=9 The original score get from Solr result as fllowing doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What wrong in my test, please kindly help on this. Floyd
Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?
Hi Otis, There is no error in the console nor in the log file. I'm using Solr 4.0. The external file is named external_rankingField.txt and exists in the directory C:\solr-4.0.0\example\solr\collection1\data\external_rankingField.txt The external file itself should be working, because when I issue a query with sort=sqrt(rankingField)+desc or sort=sqrt(rankingField)+asc things change accordingly. By the way, I first tried the external field according to the document here http://lucidworks.lucidimagination.com/display/solr/Working+with+External+Files+and+Processes "Format of the External File: The file itself is located in Solr's index directory, which by default is $SOLR_HOME/data/index. The name of the file should be external_fieldname or external_fieldname.*. For the example above, then, the file could be named external_entryRankFile or external_entryRankFile.txt." But actually the external file should be put in $SOLR_HOME/data/ Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, Do you see any errors? Which version of Solr? What does debugQuery=true say? Are you sure your file with ranks is being used? (remove it, put some junk in it, see if that gives an error) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu floyd...@gmail.com wrote: Thanks Otis, But the sort=product(score, rankingField) is not working in my test. What probably wrong? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi, 3. yes, you can sort by function - http://search-lucene.com/?q=solr+sort+by+function 2. this will sort by score only when there is a tie in ranking (two docs have the same rank value) 1. the reverse of 2. 
Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, I have a field(which is externalFileField, called rankingField) and that value(type=float) is calculated by client app. For the solr original scoring model, affect boost value will result different ranking. So I think product(score,rankingField) may equivalent to solr scoring model. What I curious is which will be better in practice and the different meanings on these three solutions? 1. sort=score+desc,ranking+desc 2. sort=ranking+desc,score+desc 3. sort=product(score,ranking) --is this possible? I'd like to hear your thoughts. Many thanks Floyd
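For reference, the external file Floyd describes is plain key=value lines keyed by the uniqueKey. A small parser mirroring that shape (a sketch for illustration, not Solr's actual reader):

```python
# Sketch: parse the external_rankingField.txt format shown in the
# thread - one uniqueKey=value line per document, values read as floats.
def parse_external_file(lines):
    ranks = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        ranks[key] = float(value)
    return ranks

ranks = parse_external_file(["doc1=3", "doc2=5", "doc3=9"])
```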
Weird Behaviour on Solr 5x (SolrCloud)
Hi all, after Mark Miller made it clear to me that 5x supports SolrCloud with a RAM directory, I have started playing with it and it seemed to work smoothly, except for one weird behaviour.. here is the story of it: Basically, I pulled the code and built solr 5x, and replaced the war file in the webapps dir of my current installation... then i started my zookeeper servers.. after that i started the solr instances with the params below: java -Djetty.port=7574 -DzkHost=zkserver2:2182 -jar start.jar (running on a remote machine) java -Dbootstrap_conf=true -DzkHost=zkserver1:2181 -jar start.jar (running on local) after both of them were up, i indexed some docs, and both of the solr instances were updated successfully. after this point, i killed one of the solr instances (running on remote, not leader) and then restarted it again. there were no errors in the log and everything seemed normal... however, when i checked the web interface for the one i had restarted it showed 0 docs.. after that I ran q=*:* a few times... and that's the point which surprises me... randomly it returned 0 results and then it returned correct numbers.. each time i make the same query, i get an empty result set randomly... 
I have no idea why this is happening. Here are the logs for the one running on remote (which was restarted):

Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382331589&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382342238&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382347438&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=0
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=14
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382348255&start=0&q=*:*&isShard=true&fsv=true} hits=0 status=0 QTime=1
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0 status=0 QTime=7
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=32 status=0 QTime=14

and for the same query, here is the log from my local (leader, not restarted):

Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&start=0&q=*:*&isShard=true&fsv=true} hits=32 status=0 QTime=0
Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=1
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&start=0&q=*:*&isShard=true&fsv=true} hits=32 status=0 QTime=0
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=1
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1]
Re: solr autocomplete requirement
Anyone with suggestions on this? On Mon, Nov 19, 2012 at 10:13 PM, Sujatha Arun suja.a...@gmail.com wrote: Hi, Our requirement for auto complete is slightly complicated. We need two types of auto complete: 1. Metadata auto complete 2. Full-text content auto complete In addition, the metadata fields are multi-valued, and we need to filter the results for certain fields, for both types of auto-complete. We have tried different approaches: 1) Suggester - we cannot filter results 2) Terms Component - we cannot filter 3) Facets on full-text content with tokenized fields - expensive 4) Same core with n-gram indexing, storing the results and using the highlight component to fetch the snippet for autosuggest. The last approach, which we are leaning towards, has 2 drawbacks: One - it returns duplicate data, as some metadata is the same across documents. Two - words get truncated mid-word when results are returned with highlighting. Mitigation for the above 2 issues could be: remove duplicates after obtaining results in the application (the issue being the additional time this takes); use the fast vector highlighter, which can produce full-word snippets (but could be heavy on the index size). Has anybody got any suggestions / had similar requirements with a successful implementation? Other question: what would be the impact of serving the suggestions out of the same core as the one we are searching, while using the highlight component for fetching snippets? For our full-text search requirements we do the highlighting outside solr, in our application, and we would be storing and using the highlight only for suggestions. Thanks Sujatha
Re: Custom ranking solutions?
Hi Otis, I'm doing some tests like this, http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(score)) and I get the following response, <lst name="error"><str name="msg">can not use FieldCache on unindexed field: score</str><int name="code">400</int></lst> If I change score to rankingField like this http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(rankingField)) I get <result name="response" numFound="3" start="0" maxScore="2500.0"><doc><str name="_l_unique_key">211</str><float name="score">2500.0</float></doc><doc><str name="_l_unique_key">223</str><float name="score">4.0</float></doc><doc><str name="_l_unique_key">222</str><float name="score">0.01001</float></doc></result> It seems the score cannot be used in a function query? Floyd 2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com Hi Floyd, Use debugQuery=true and let's see it.:) Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Before ExternalFielField introduced, change document boost value to achieve custom ranking. My client app will update each boost value for documents daily and seem to worked fine. Actual ranking could be predicted based on boost value. (value is calculated based on click, recency, and rating ). I'm now try to use ExternalFileField to do some ranking, after some test, I did not get my expectation. I'm doing a sort like this sort=product(score,abs(rankingField))+desc But the query result ranking won't change anyway. 
The external file as following doc1=3 doc2=5 doc3=9 The original score get from Solr result as fllowing doc1=41.042 doc2=10.1256 doc3=8.2135 Expected ranking doc1 doc3 doc2 What wrong in my test, please kindly help on this. Floyd
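A possible direction, offered as an assumption rather than a verified fix for Floyd's setup: score is not an indexed field, so it cannot be referenced inside a function query or a sort function (hence the FieldCache error above). The usual way to multiply the relevance score by a field value is the boost query parser. The sketch below only builds the request URL; the host, field and query values are placeholders:

```python
from urllib.parse import urlencode

# Sketch (assumption, not run against Floyd's index): the boost query
# parser multiplies each document's relevance score by the given
# function, here the external rankingField value. Values are placeholders.
params = {
    "q": "{!boost b=rankingField v=$qq}",  # final score = score * rankingField
    "qq": "_l_all:test",                   # the actual user query
    "fl": "score,_l_unique_key",
    "sort": "score desc",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```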
configuring solr xml as a datasource
Hi, I am new to Solr. I am trying to use a Solr XML data source for the Solr search engine. I have created a test.xml file as:

<add>
  <doc>
    <field name="fname">leena1</field>
    <field name="number">101</field>
  </doc>
</add>

I have created a data-config.xml file:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page" processor="XPathEntityProcessor" stream="true" forEach="/rootelement" url="C:\solr\conf\test.xml" transformer="RegexTransformer,DateFormatTransformer">
      <field column="name" xpath="/rootelement/name" />
      <field column="number" xpath="/rootelement/number" />
    </entity>
  </document>
</dataConfig>

And added the code below in solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>

But when I go to this link http://localhost:8080/solr/dataimport?command=full-import it shows Total Rows Fetched=0, Total Documents Processed=0. How can I solve this problem? Please provide me the solution. Thanks & Regards, Leena Jawale Software Engineer Trainee BFS BU Phone No. - 9762658130 Email - leena.jaw...@lntinfotech.com The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail
Timeout when calling Luke request handler after migrating from Solr 3.5 to 3.6.1
Hi all, As part of our business logic we query the Luke request handler to extract the fields in the index from our code, using the following URL: http://server:8080/solr/admin/luke?wt=json&numTerms=0 This worked fine with Solr 3.5, but now with 3.6.1 this call never returns - it hangs, and there is no error message in the server logs. Has anyone seen this, or has an idea of what may be causing it? The Luke request handler is configured by default; we didn't change the configuration for this. If I go to solr/admin/stats.jsp, this is shown: name: /admin/luke class: org.apache.solr.handler.admin.LukeRequestHandler version: $Revision: 1242152 $ description: Lucene Index Browser. Inspired and modeled after Luke: http://www.getopt.org/luke/ stats: handlerStart : 1353373022984 requests : 0 errors : 0 timeouts : 0 totalTime : 0 avgTimePerRequest : NaN avgRequestsPerSecond : 0.0 We are running Apache Tomcat 6.0.35 with JDK 1.7.0_03, in case that rings a bell. The index has about Alternatively, our requirement is to get the list of fields in the index, including dynamic fields – is there any other way to obtain this at runtime? It is an application that runs in a separate process from Solr, and may even run on a separate box, thus the Luke call. Thank you for any help you can provide. Jose.