Re: Solr query syntax.

2013-12-02 Thread elmerfudd
I'm using the default qparser that comes with Solr 4.4. Is there anything
better?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-syntax-tp4103784p4104344.html
Sent from the Solr - User mailing list archive at Nabble.com.


luke 4.5.0 released

2013-12-02 Thread Dmitry Kan
Hello!

I have just released luke 4.5.0 along with the binary. Its version reflects
the underlying Lucene version.

Feel free to test this and give feedback / submit bug fixes / patches.

https://github.com/DmitryKey/luke/releases/tag/4.5.0

Thanks.

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


SolrCloud FunctionQuery inconsistency

2013-12-02 Thread sling
Hi,
I have a solrcloud with 4 shards. They are running normally.
How is it possible that the same function query returns different results?
And it happens even on the same shard?

However, when sorting by ptime desc, the result is consistent.
The dateDeboost function generates a time weight from ptime, which is
multiplied by the score.

The result is as follows:
{
  "responseHeader":{
    "status":0,
    "QTime":7,
    "params":{
      "fl":"id",
      "shards":"shard3",
      "cache":"false",
      "indent":"true",
      "start":"0",
      "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" || dkeys:\"abc\")",
      "wt":"json",
      "rows":"5"}},
  "response":{"numFound":121,"start":0,"maxScore":0.5319116,"docs":[
      {
        "id":"9EORHN5I00824IHR"},
      {
        "id":"9EOPQGOI00824IMP"},
      {
        "id":"9EMATM6900824IHR"},
      {
        "id":"9EJLBOEN00824IHR"},
      {
        "id":"9E6V45IM00824IHR"}]
  }}



{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "fl":"id",
      "shards":"shard3",
      "cache":"false",
      "indent":"true",
      "start":"0",
      "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" || dkeys:\"abc\")",
      "wt":"json",
      "rows":"5"}},
  "response":{"numFound":121,"start":0,"maxScore":0.5319117,"docs":[
      {
        "id":"9EOPQGOI00824IMP"},
      {
        "id":"9EORHN5I00824IHR"},
      {
        "id":"9EMATM6900824IHR"},
      {
        "id":"9EJLBOEN00824IHR"},
      {
        "id":"9E1LP3S300824IHR"}]
  }}





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346.html
Sent from the Solr - User mailing list archive at Nabble.com.


some cores go down during indexing

2013-12-02 Thread Grzegorz Sobczyk
Hi,
I have a strange situation. During indexing, some cores go down:

ZkController.publish(1017) | publishing core=shops5 state=down
ZkController.register(785) | Register replica - core:shops5 address:
http://host77:8280/solr collection:shops5 shard:shard1
ZkController.register(810) | We are http://host77:8280/solr/shops5/ and
leader is http://host136:8280/solr/shops5/

After that, the core doesn't register as working in the cloud, even though it
should (it can process requests).
For now I can only restart Solr to fix the situation. Reloading the core
doesn't help.

Has anyone faced a similar problem?

Some info:
Multiple Solr 4.5.1 in SolrCloud
3x Zk

http://host77:8280/solr/#/shops5/replication:
Index Version Gen Size
Master (Searching) 1385955301185 67 127.62 KB
Master (Replicable) 1385955301185 67 -

http://host136:8280/solr/#/shops5/replication:
Index Version Gen Size
Master (Searching) 1385955301218 68 127.65 KB
Master (Replicable) 1385955301218 68 -

http://host141:8280/solr/#/shops5/replication:
Index Version Gen Size
Master (Searching) 1385955301265 68 127.37 KB
Master (Replicable) 1385955301265 68 -

Logs from other core:
ZkController.publish(1017) | publishing core=shops3 state=down
ZkController.register(785) | Register replica - core:shops3 address:
http://host77:8280/solr collection:shops3 shard:shard1
ZkController.register(810) | We are http://host77:8280/solr/shops3/ and
leader is http://host136:8280/solr/shops3/
ZkController.register(841) | No LogReplay needed for core=shops3 baseURL=
http://host77:8280/solr
ZkController.checkRecovery(993) | Core needs to recover:shops3
RecoveryStrategy.run(216) | Starting recovery process. core=shops3
recoveringAfterStartup=false
ZkController.publish(1017) | publishing core=shops3 state=recovering
RecoveryStrategy.doRecovery(356) | Attempting to PeerSync from
http://host136:8280/solr/shops3/ core=shops3 - recoveringAfterStartup=false
RecoveryStrategy.doRecovery(368) | PeerSync Recovery was successful -
registering as Active. core=shops3
ZkController.publish(1017) | publishing core=shops3 state=active
SolrCore.registerSearcher(1812) | [shops3] Registered new searcher
Searcher@45df7f8c main{StandardDirectoryReader(segments_ik:1977:nrt
_n1(4.5.1):C97)}
PeerSync.sync(186) | PeerSync: core=shops3
url=http://host77:8280/solrSTART replicas=[
http://host136:8280/solr/shops3/] nUpdates=100
PeerSync.handleVersions(346) | PeerSync: core=shops3 url=
http://host77:8280/solr Received 97 versions from host136:8280/solr/shops3/
PeerSync.handleVersions(399) | PeerSync: core=shops3 url=
http://host77:8280/solr Our versions are newer.
ourLowThreshold=1453188869165940736 otherHigh=1453279151809101824
PeerSync.sync(272) | PeerSync: core=shops3
url=http://host77:8280/solrDONE. sync succeeded

The lines above are missing for core shops5.

-- 
Grzegorz Sobczyk


Solr non-supported languages

2013-12-02 Thread Prasi S
Hi,
I have a requirement to index and search a few languages that are not
supported by Solr (e.g., the languages of countries like Slovenia, Moldova,
Belarus, etc.).

If I need to do only exact matching against these languages, what sort of
analyzers and tokenizers would suit?

thanks.


Thanks,
Prasi


Re: Solr non-supported languages

2013-12-02 Thread Ahmet Arslan
Hi Prasi,

The text_general field type that ships with the example schema.xml would suit.



On Monday, December 2, 2013 12:35 PM, Prasi S prasi1...@gmail.com wrote:
 
Hi,
I have a requirement to index and search a few languages that are not
supported by Solr (e.g., the languages of countries like Slovenia, Moldova,
Belarus, etc.).

If I need to do only exact matching against these languages, what sort of
analyzers and tokenizers would suit?

thanks.


Thanks,
Prasi

Re: Solr query syntax.

2013-12-02 Thread Ahmet Arslan
Hi,

The choice of query parser depends on your needs. I am just surprised that you
used prefix notation in your example.
The default query parser syntax for and(blabla, name:george) is q=blabla AND
name:george.
The term blabla (which has no explicit field) is parsed against the default
search field. The default field is set via the df parameter.
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser  
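For instance, assuming the default field were set to text (an illustrative
value, not taken from this thread), the full request would be:

q=blabla AND name:george&df=text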




On Monday, December 2, 2013 10:17 AM, elmerfudd na...@012.net.il wrote:
 
I'm using the default qparser that comes with Solr 4.4. Is there anything
better?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-syntax-tp4103784p4104344.html

Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr query syntax.

2013-12-02 Thread Jack Krupansky
The edismax (ExtendedDisMax) query parser is the best, overall. There are 
other specialized query parsers with features that edismax does not have 
(e.g., surround for span queries, and complex phrase for wildcards in 
phrases.)
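For illustration, a minimal edismax request could look like this (the field
names, boosts, and mm value below are placeholders, not from this thread):

http://localhost:8983/solr/collection1/select?defType=edismax&q=blabla&qf=title^2+body&mm=2

Here qf spreads the query across several fields with per-field boosts, and mm
sets the minimum number of optional clauses that must match.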


-- Jack Krupansky

-Original Message- 
From: elmerfudd

Sent: Monday, December 02, 2013 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr query syntax.

I'm using the default qparser that comes with Solr 4.4. Is there anything
better?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-syntax-tp4103784p4104344.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Constantly increasing time of full data import

2013-12-02 Thread michallos
Update: I can see that times increase when the search load is higher. During
nights and weekends, full load times don't increase. So it is not caused by
the number of documents being loaded (during weekends we have the same
number of new documents) but by the number of queries per minute.

Has anyone observed such strange behaviour? It is critical for us.

Best,
Michal



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Constantly-increasing-time-of-full-data-import-tp4103873p4104370.html
Sent from the Solr - User mailing list archive at Nabble.com.


How does WhatsApp apply search techniques to conversations?

2013-12-02 Thread Anurag
I was just wondering how WhatsApp implements search over conversation
history.

Is it the same approach used by all kinds of Android apps that support search
over chats/conversations?

Has anyone implemented something along similar lines?


Thanks
Kumar Anurag



-
Kumar Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-Whatsapp-applies-search-techniques-for-conversation-tp4104366.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto optimized of Solr indexing results

2013-12-02 Thread Erick Erickson
TieredMergePolicy is the default; even though it's
commented out in solrconfig.xml, it's still being used.
So there's nothing to do.

Given the size of your index, you can actually do
whatever you please. Optimizing it will shrink its size,
but frankly your index is so small I doubt you'll see any
noticeable difference. The deleted docs will purge themselves
as you re-crawl, eventually.

In all, I think you can mostly ignore the issue.
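(If you ever do decide to force it, an explicit optimize is just an update
request; the core name here is a placeholder:

curl 'http://localhost:8983/solr/collection1/update?optimize=true'

but per the above it's rarely worth it.)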

Best,
Erick


On Sun, Dec 1, 2013 at 8:00 PM, Bayu Widyasanyata
bwidyasany...@gmail.comwrote:

 Hi Erick,

 After waiting for some days, about a week (I did daily crawling & indexing),
 here are the docs summary:

 Num Docs:   9738
 Max Doc:   15311
 Deleted Docs: 5573
 Version: 781
 Segment Count: 5

 The percentage of deleted docs relative to numDocs is near 57%.

 On the other hand, the TieredMergePolicy section in solrconfig.xml is still
 commented out.

 <!--
 <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
   <int name="maxMergeAtOnce">10</int>
   <int name="segmentsPerTier">10</int>
 </mergePolicy>
   -->

 Should we enable it and wait for the effect?

 Thanks!



 On Wed, Nov 20, 2013 at 9:55 PM, Bayu Widyasanyata
 bwidyasany...@gmail.comwrote:

  Thanks Erick.
  I will check that on next round.
 
  ---
  wassalam,
  [bayu]
 
  /sent from Android phone/
  On Nov 20, 2013 7:45 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  You probably shouldn't optimize at all. The default TieredMergePolicy
  will eventually purge the deleted files' data, which is really what
  optimize
  does. So despite its name, most of the time it's not really worth the
  effort.
 
  Take a look at your Solr admin page, the overview link for a core.
  If the number of deleted docs is a significant percentage of your
  numDocs (I typically use 20% or so, but YMMV) then optimize
  might be worthwhile. Otherwise, it's a distraction unless and until
  you have some evidence that it actually makes a difference.
 
  Best,
  Erick
 
 
  On Wed, Nov 20, 2013 at 7:33 AM, Bayu Widyasanyata
  bwidyasany...@gmail.comwrote:
 
   Hi,
  
    After successfully configuring the re-crawling script, I sometimes checked
    and found on the Solr Admin that the Optimized status of my collection is
    not optimized (slash icon).
   
    Hence I did the optimize steps manually.
   
    How do I make my crawl optimize automatically?
   
    Should we restart Solr (I use Tomcat) as shown here [1]?
  
   [1] http://wiki.apache.org/nutch/Crawl
  
   Thanks!
  
   --
   wassalam,
   [bayu]
  
 
 


 --
 wassalam,
 [bayu]



Re: luke 4.5.0 released

2013-12-02 Thread Erick Erickson
Excellent! Thanks!


On Mon, Dec 2, 2013 at 3:27 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello!

 I have just released luke 4.5.0 along with the binary. Its version reflects
 the underlying Lucene version.

 Feel free to test this and give feedback / submit bug fixes / patches.

 https://github.com/DmitryKey/luke/releases/tag/4.5.0

 Thanks.

 --
 Dmitry
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan



Re: SolrCloud FunctionQuery inconsistency

2013-12-02 Thread Erick Erickson
I'm not quite sure what you're seeing as
inconsistent; you didn't say. Is it the
maxScore? Did you index any docs
in the meantime? Even though both
responses show 121 docs, if you updated some
docs it might affect the score, because
the terms from the old docs still affect
tf/idf calcs and thus the boosted score.

Or if an optimize or merge happened,
that might also affect things.

Best,
Erick


On Mon, Dec 2, 2013 at 3:33 AM, sling sling...@gmail.com wrote:

 Hi,
 I have a solrcloud with 4 shards. They are running normally.
 How is it possible that the same function query returns different results?
 And it happens even on the same shard?

 However, when sorting by ptime desc, the result is consistent.
 The dateDeboost function generates a time weight from ptime, which is
 multiplied by the score.

 The result is as follows:
 {
   "responseHeader":{
     "status":0,
     "QTime":7,
     "params":{
       "fl":"id",
       "shards":"shard3",
       "cache":"false",
       "indent":"true",
       "start":"0",
       "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" || dkeys:\"abc\")",
       "wt":"json",
       "rows":"5"}},
   "response":{"numFound":121,"start":0,"maxScore":0.5319116,"docs":[
       {
         "id":"9EORHN5I00824IHR"},
       {
         "id":"9EOPQGOI00824IMP"},
       {
         "id":"9EMATM6900824IHR"},
       {
         "id":"9EJLBOEN00824IHR"},
       {
         "id":"9E6V45IM00824IHR"}]
   }}



 {
   "responseHeader":{
     "status":0,
     "QTime":6,
     "params":{
       "fl":"id",
       "shards":"shard3",
       "cache":"false",
       "indent":"true",
       "start":"0",
       "q":"{!boost b=dateDeboost(ptime)}channelid:0082 && (title:\"abc\" || dkeys:\"abc\")",
       "wt":"json",
       "rows":"5"}},
   "response":{"numFound":121,"start":0,"maxScore":0.5319117,"docs":[
       {
         "id":"9EOPQGOI00824IMP"},
       {
         "id":"9EORHN5I00824IHR"},
       {
         "id":"9EMATM6900824IHR"},
       {
         "id":"9EJLBOEN00824IHR"},
       {
         "id":"9E1LP3S300824IHR"}]
   }}





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346.html
 Sent from the Solr - User mailing list archive at Nabble.com.



ShardSplit errors..

2013-12-02 Thread Annette Newton
Hi,

I have been trying to split a shard with little success.  I'm probably
missing something obvious but would appreciate a little help.

Solr version: 4.6.0
Number of documents in the Shard: 2,933,059
Index size: 6.52

I know I have some setting somewhere that I need to change but I believe I
have changed everything available.

At first I had a write.lock timeout, so I upped that setting and got past
it.

Command:

curl '
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=sessionfilterset&shard=shard1
'

Message returned:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int
name="QTime">300023</int></lst><lst name="error"><str name="msg">splitshard
the collection time out:300s</str><str
name="trace">org.apache.solr.common.SolrException: splitshard the
collection time out:300s
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204)
at
org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:422)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:158)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
</str><int name="code">500</int></lst>
</response>

During the run I'm not convinced it's doing anything: no split
directory is created, CPU remains very low, and memory doesn't seem to spike.

Any help would be greatly appreciated.

-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

www.sessioncam.com

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


ANNOUNCE: Apache Solr Reference Guide 4.6

2013-12-02 Thread Chris Hostetter


The Lucene PMC is pleased to announce the release of the Apache Solr 
Reference Guide for Solr 4.6.


This 347-page PDF serves as the definitive user's manual for Solr 4.6.

The Solr Reference Guide is available for download from the Apache mirror 
network:


  https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

(If you have followup questions, please send them only to 
solr-user@lucene.apache.org)


-Hoss


Re: Function query matching

2013-12-02 Thread Peter Keegan
I'm pursuing this possible PostFilter solution. I can see how to collect
all the hits and recompute the scores in a PostFilter after all the hits
have been collected (for scaling). But I can't see how to get the custom
doc/score values back into the main query's HitQueue. Any advice?
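For reference, the bare-bones shape of such a PostFilter looks roughly like
this (a sketch with illustrative names, not the actual code):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class RescorePostFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() { return false; }  // post-filters must not be cached

    @Override
    public int getCost() { return Math.max(super.getCost(), 100); }  // cost >= 100 runs as a post-filter

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Buffer the doc here; once all hits have been seen, finish()
                // can compute the scaled scores and pass the survivors down
                // the chain via super.collect(doc).
                super.collect(doc);
            }
        };
    }
}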

Thanks,
Peter


On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan peterlkee...@gmail.comwrote:

 Instead of using a function query, could I use the edismax query (plus
 some low cost filters not shown in the example) and implement the
 scale/sum/product computation in a PostFilter? Is the query's maxScore
 available there?

 Thanks,
 Peter


 On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan peterlkee...@gmail.comwrote:

 Although the 'scale' is a big part of it, here's a closer breakdown. Here
 are 4 queries with increasing numbers of functions, and their response times
 (caching turned off in solrconfig):

 100 msec:
 select?q={!edismax v='news' qf='title^2 body'}

 135 msec:
 select?qq={!edismax v='news' qf='title^2
 body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}

 200 msec:
 select?qq={!edismax v='news' qf='title^2
 body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query
 v=$qq}

 320 msec:
 select?qq={!edismax v='news' qf='title^2
 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
 v=$qq}

 Btw, that no-op product is necessary, else you get this exception:

 org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to 
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo

 thanks,

 peter



 On Wed, Nov 27, 2013 at 1:30 PM, Chris Hostetter 
 hossman_luc...@fucit.org wrote:


 : So, this query does just what I want, but it's typically 3 times slower
 : than the edismax query  without the functions:

 that's because the scale() function is inherently slow (it has to
 compute the min & max value for every document in order to know how to
 scale them)

 what you are seeing is the price you have to pay to get that query with a
 normalized 0-1 value.

 (you might be able to save a little bit of time by eliminating that
 no-op multiply by 1: product(query($qq),1) ... but I doubt you'll even
 notice much of a change given that scale function.)

 : Is there any way to speed this up? Would writing a custom function
 query
 : that compiled all the function queries together be any faster?

 If you can find a faster implementation for scale() then by all means let
 us know, and we can fold it back into Solr.


 -Hoss






Re: SolrCloud FunctionQuery inconsistency

2013-12-02 Thread Chris Hostetter

: However, when sorting by ptime desc, the result is consistent.
: The dateDeboost function generates a time weight from ptime, which is
: multiplied by the score.

As Erick mentioned, you haven't given us enough details to make any 
educated guesses as to what problem you are seeing.

My wild, uneducated, shot in the dark guess: are you populating ptime 
using a default of NOW?  

If so, can you rule out the function as an issue by asking for fl=id,ptime 
and confirming that the ptime for these documents sometimes varies slightly?

NOTE: Although it is possible to configure a TrieDateField instance with 
a default value of NOW to compute a timestamp of when the document was 
indexed, this is not advisable when using SolrCloud, since each replica of 
the document may compute a slightly different value. 
TimestampUpdateProcessorFactory is recommended instead.

https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/schema/TrieDateField.html
https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
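For example, a minimal update chain using it could look like this in
solrconfig.xml (the chain name and field name are illustrative):

<updateRequestProcessorChain name="add-ptime">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">ptime</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>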


-Hoss


Re: solr as a service for multiple projects in the same environment

2013-12-02 Thread Ing. Jorge Luis Betancourt Gonzalez
I think that some experience in this area could be provided by Trey Grainger, 
author of Solr in Action. I believe that some of his work at CareerBuilder 
involved the creation of something (somewhat) similar to what you're trying to 
accomplish. I must say that I'm also interested in this topic, but haven't had 
the time to really do anything about it.

- Original Message -
From: adfel70 adfe...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sunday, December 1, 2013 2:41:00
Subject: Re: solr as a service for multiple projects in the same environment

The risk is that if you mess up a cluster by mistake while doing maintenance
on one of the systems, you can affect the other system.
It's a pretty amorphous risk.
Aside from having multiple systems share the same hardware resources, I
don't see any other real risk.

Do your collections share the same topology in terms of shards and
replicas?
Do you manually configure the nodes on which each collection is created, so
that you'll still have some level of separation between the systems?




michael.boom wrote
 Hi,
 
 There's nothing unusual in what you are trying to do, this scenario is
 very common.
 
 To answer your questions:
  "1. as I understand I can separate the configs of each collection in
  zookeeper. is it correct?"
  Yes, that's correct. You'll have to upload your configs to ZK and use the
  Collections API to create your collections.
  
 "2. are there any solr operations that can be performed on collection A and
 somehow affect collection B?"
  No, I can't think of any cross-collection operation. Here you can find a
  list of collection-related operations:
  https://cwiki.apache.org/confluence/display/solr/Collections+API
  
 "3. is the solr cache separated for each collection?"
  Yes, separate and configurable in solrconfig.xml for each collection.
  
 "4. I assume that I'll encounter a problem with the OS cache, when the
 different indices will compete for the same memory, right? how severe is
 this issue?"
  Hardware can be a bottleneck. If all your collections will face the same
  load, you should try to give Solr a RAM amount equal to the index size (all
  indexes).
  
 "5. any other advice on building such an architecture? does the maintenance
 overhead of maintaining multiple clusters in production really outweigh the
 problems and risks of using the same cluster for multiple systems?"
  I was in the same situation as you, and putting everything in multiple
  collections in just one cluster made sense for me: it's easier to manage
  and has no obvious downside. As for the risks of using the same cluster for
  multiple systems, they are pretty much the same in both scenarios, only
  with multiple clusters you'll have many more machines to manage.
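(For reference, the two steps from answer 1 above, uploading a config set and
creating a collection against it, look roughly like this; host names, counts,
and config names are placeholders:

zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir ./conf -confname confA
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collectionA&numShards=2&replicationFactor=2&collection.configName=confA'
)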





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-as-a-service-for-multiple-projects-in-the-same-environment-tp4103523p4104206.html
Sent from the Solr - User mailing list archive at Nabble.com.

III International Winter School at UCI, February 17 to 28, 2014. See
www.uci.cu


Proxy.php tutorials for AJAX Solr

2013-12-02 Thread Reyes, Mark
Are there any good tutorials that cover how to integrate the suggested PHP 
proxy with the JavaScript framework AJAX Solr?

Here is the proxy, https://gist.github.com/evolvingweb/298580

Also on Stackoverflow, 
http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr

IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages sent from Bridgepoint Education may contain information that is 
confidential and may be legally privileged. Please do not read, copy, forward 
or store this message unless you are an intended recipient of it. If you 
received this transmission in error, please notify the sender by reply e-mail 
and delete the message and any attachments.

Re: Error integrating opennlp in solr

2013-12-02 Thread Furkan KAMACI
Did you check here: http://wiki.apache.org/solr/OpenNLP

On Saturday, November 30, 2013, Arti a...@j9ventures.com wrote:


 Hi Team ,

 I am getting the stack of errors given below while integrating solr with
OpenNLP. Please help.





 Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
 at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
 ... 15 more
 Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.OpenNLPTokenizerFactory'



 With Regards,



 Arti Lamba





 -

 Dainik Jagran - Largest Read Daily of India with 56.5 Million Readers.
(Source: Indian Readership Survey 2012 Q4)

 www.jagran.com www.jplcorp.in www.adrates.jagran.com



Re: Error integrating opennlp in solr

2013-12-02 Thread Furkan KAMACI
Especially here: "Also, you may have to add the OpenNLP lib directory to
your solr/lib or solr/cores/collection/lib directory. The text types assume
that cores/collection/conf/opennlp contains the OpenNLP model files."
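For example, in solrconfig.xml that could be a lib directive along these
lines (the path is illustrative):

<lib dir="../../lib/opennlp" regex=".*\.jar" />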

On Tuesday, December 3, 2013, Furkan KAMACI furkankam...@gmail.com wrote:
 Did you check here: http://wiki.apache.org/solr/OpenNLP

 On Saturday, November 30, 2013, Arti a...@j9ventures.com wrote:


 Hi Team ,

 I am getting the stack of errors given below while integrating solr with
OpenNLP. Please help.





 Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml]
analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
 at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
 ... 15 more
 Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.OpenNLPTokenizerFactory'



 With Regards,



 Arti Lamba





 -

 Dainik Jagran - Largest Read Daily of India with 56.5 Million Readers.
(Source: Indian Readership Survey 2012 Q4)

 www.jagran.com www.jplcorp.in www.adrates.jagran.com



Re: SolrCloud FunctionQuery inconsistency

2013-12-02 Thread sling
Thanks, Erick

I mean that the first id of the results is not consistent, and neither is
the maxScore.

While querying, I do index docs at the same time, but they are not relevant
to this query.

The updated docs cannot affect tf calcs, and as for idf, they should affect
all docs, so the results should be consistent.

But for the same query, it shows different sorts (either sort A or sort B)
over and over.

Thanks,
sling



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104549.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud FunctionQuery inconsistency

2013-12-02 Thread sling
Thanks for your reply, Chris.

Yes, I am populating ptime using a default of NOW.

I only store the id, so I can't get ptime values. But from the perspective
of business logic, ptime should not change.

Strangely, the sort result is consistent now... :(
I should run more test cases...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Function query matching

2013-12-02 Thread Trey Grainger
We're working on the same problem with the scale(query(...)) combination, so
I'd like to share a bit more information that may be useful.

*On the scale function:*
Even though the scale query has to calculate the scores for all documents,
it is actually doing this work twice for each ValueSource (once to
calculate the min and max values, and then again when actually scoring the
documents), which is inefficient.

To solve the problem, we're in the process of putting a cache inside the
scale function to remember the values for each document when they are
initially computed (to find the min and max) so that the second pass can
just use the previously computed values for each document.  Our theory is
that most of the extra time due to the scale function is really just the
result of doing duplicate work.

No promises this won't be overly costly in terms of memory utilization, but
we'll see what we get in terms of speed improvements and will share the
code if it works out well.  Alternate implementation suggestions (or
criticism of a cache like this) are also welcomed.
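A rough self-contained sketch of the idea (all names are hypothetical; this
is not the actual patch):

import java.util.HashMap;
import java.util.Map;

// Remember each document's raw function value from the min/max pass so the
// scoring pass can reuse it instead of recomputing it.
class ScaleCache {
    private final Map<Integer, Float> rawValues = new HashMap<>();
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    // Pass 1: record the value per docId while tracking min and max.
    void record(int docId, float value) {
        rawValues.put(docId, value);
        if (value < min) min = value;
        if (value > max) max = value;
    }

    // Pass 2: scale the cached value into [0,1] without recomputing it.
    float scaled(int docId) {
        float v = rawValues.get(docId);
        return (max == min) ? 0f : (v - min) / (max - min);
    }
}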


*On the NoOp product function: scale(prod(1, query(...))):*
We do the same thing, which ultimately is just an unnecessary waste of a
loop through all documents to do an extra multiplication step.  I just
debugged the code and uncovered the problem.  There is a Map (called
context) that is passed through to each value source to store intermediate
state, and both the query and scale functions are passing the ValueSource
for the query function in as the KEY to this Map (as opposed to using some
composite key that makes sense in the current context).  Essentially, these
lines are overwriting each other:

Inside ScaleFloatFunction: context.put(this.source, scaleInfo);
 //this.source refers to the QueryValueSource, and the scaleInfo refers to
a ScaleInfo object
Inside QueryValueSource: context.put(this, w); //this refers to the same
QueryValueSource from above, and the w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from
the context Map, it unexpectedly pulls the Weight object out instead, and
thus the invalid cast exception occurs.  The NoOp multiplication works
because it puts a different ValueSource between the query and the
ScaleFloatFunction, such that this.source (in ScaleFloatFunction) != this
(in QueryValueSource).

This should be an easy fix.  I'll create a JIRA ticket to use better key
names in these functions and push up a patch.  This will eliminate the need
for the extra NoOp function.
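To make the collision concrete, here is a tiny self-contained illustration
(plain Java, hypothetical stand-in values) of the overwrite and of how a
composite key avoids it:

import java.util.HashMap;
import java.util.Map;

public class ContextKeyDemo {
    public static void main(String[] args) {
        Map<Object, Object> context = new HashMap<>();
        Object querySource = new Object(); // stands in for the QueryValueSource

        // Both functions key their state on the same object, so the later put wins:
        context.put(querySource, "ScaleInfo"); // ScaleFloatFunction stores its ScaleInfo
        context.put(querySource, "Weight");    // QueryValueSource overwrites it
        System.out.println(context.get(querySource)); // "Weight" -- the ScaleInfo is gone

        // Composite keys keep the two entries distinct:
        context.put(querySource + ":scaleInfo", "ScaleInfo");
        context.put(querySource + ":weight", "Weight");
        System.out.println(context.get(querySource + ":scaleInfo")); // "ScaleInfo"
    }
}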

-Trey


On Mon, Dec 2, 2013 at 12:41 PM, Peter Keegan peterlkee...@gmail.comwrote:

  I'm pursuing this possible PostFilter solution. I can see how to collect
  all the hits and recompute the scores in a PostFilter after all the hits
  have been collected (for scaling). But I can't see how to get the custom
  doc/score values back into the main query's HitQueue. Any advice?

 Thanks,
 Peter


 On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan peterlkee...@gmail.com
 wrote:

  Instead of using a function query, could I use the edismax query (plus
  some low cost filters not shown in the example) and implement the
  scale/sum/product computation in a PostFilter? Is the query's maxScore
  available there?
 
  Thanks,
  Peter
 
 
  On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan peterlkee...@gmail.com
 wrote:
 
  Although the 'scale' is a big part of it, here's a closer breakdown. Here
  are 4 queries with increasing numbers of functions, and their response
  times (caching turned off in solrconfig):
 
  100 msec:
  select?q={!edismax v='news' qf='title^2 body'}
 
  135 msec:
  select?qq={!edismax v='news' qf='title^2
  body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}

  200 msec:
  select?qq={!edismax v='news' qf='title^2
  body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query
  v=$qq}

  320 msec:
  select?qq={!edismax v='news' qf='title^2
  body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query
  v=$qq}
 
  Btw, that no-op product is necessary, else you get this exception:
 
  org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to
 org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
 
  thanks,
 
  peter
 
 
 
  On Wed, Nov 27, 2013 at 1:30 PM, Chris Hostetter 
  hossman_luc...@fucit.org wrote:
 
 
  : So, this query does just what I want, but it's typically 3 times
 slower
  : than the edismax query  without the functions:
 
  that's because the scale() function is inherently slow (it has to
  compute the min & max value for every document in order to know how to
  scale them)
 
  what you are seeing is the price you have to pay to get that query
 with a
  normalized 0-1 value.
 
  (you might be able to save a little bit of time by eliminating that
  no-op multiply by 1: product(query($qq),1) ... but I doubt you'll even
  notice much of a change 

Re: Auto optimized of Solr indexing results

2013-12-02 Thread Bayu Widyasanyata
Thanks Erick for your advance and share.

Regards,


On Mon, Dec 2, 2013 at 11:06 PM, Erick Erickson erickerick...@gmail.comwrote:

 TieredMergePolicy is the default; even though it's
 commented out in solrconfig.xml, it's still being used.
 So there's nothing to do.

 Given the size of your index, you can actually do
 whatever you please. Optimizing it will shrink its size,
 but frankly your index is so small I doubt you'll see any
 noticeable difference. The deleted docs will purge themselves
 as you re-crawl, eventually.

 In all, I think you can mostly ignore the issue.

 Best,
 Erick


 On Sun, Dec 1, 2013 at 8:00 PM, Bayu Widyasanyata
 bwidyasany...@gmail.comwrote:

  Hi Erick,
 
  After waiting for some days, about a week (I did daily crawling &
 indexing),
  here are the docs summary:
 
  Num Docs:   9738
  Max Doc:   15311
  Deleted Docs: 5573
  Version: 781
  Segment Count: 5
 
  The percentage of deleted docs relative to numDocs is near 57%.
 
  On the other hand, the TieredMergePolicy section in solrconfig.xml is still
  commented out.
 
  <!--
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
    -->
 
  Should we enable it and wait for the effect?
 
  Thanks!
 
 
 
  On Wed, Nov 20, 2013 at 9:55 PM, Bayu Widyasanyata
  bwidyasany...@gmail.comwrote:
 
   Thanks Erick.
   I will check that on next round.
  
   ---
   wassalam,
   [bayu]
  
   /sent from Android phone/
   On Nov 20, 2013 7:45 PM, Erick Erickson erickerick...@gmail.com
  wrote:
  
   You probably shouldn't optimize at all. The default TieredMergePolicy
   will eventually purge the deleted files' data, which is really what
   optimize
   does. So despite its name, most of the time it's not really worth the
   effort.
  
   Take a look at your Solr admin page, the overview link for a core.
   If the number of deleted docs is a significant percentage of your
   numDocs (I typically use 20% or so, but YMMV) then optimize
   might be worthwhile. Otherwise, it's a distraction unless and until
   you have some evidence that it actually makes a difference.
  
   Best,
   Erick
  
  
   On Wed, Nov 20, 2013 at 7:33 AM, Bayu Widyasanyata
   bwidyasany...@gmail.comwrote:
  
Hi,
   
 After successfully configuring the re-crawling script, I sometimes checked
 and found on the Solr Admin that the Optimized status of my collection is
 not optimized (slash icon).
    
 Hence I did the optimize steps manually.
    
 How do I make my crawl optimize automatically?
    
 Should we restart Solr (I use Tomcat) as shown here [1]?
   
[1] http://wiki.apache.org/nutch/Crawl
   
Thanks!
   
--
wassalam,
[bayu]
   
  
  
 
 
  --
  wassalam,
  [bayu]
 




-- 
wassalam,
[bayu]


Re: Constantly increasing time of full data import

2013-12-02 Thread Ryan Cutter
Michal,

I don't have much experience with DIH so I'll leave that to someone else
but I would suggest you profile Solr during imports.  That might show you
where the bottleneck is.

Generally, it's reasonable to think Solr updates will get slower the larger
the indexes get and the more load you put on the system.  It's possible
you're seeing something outside the norm; I just don't know what you were
expecting or the capabilities of your resources.

You might want to post more info (autoCommit settings, etc) as well.
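(For example, the autoCommit block from solrconfig.xml is the kind of thing
worth sharing; the values here are only placeholders:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
)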

Thanks, Ryan


On Mon, Dec 2, 2013 at 4:22 AM, michallos michal.ware...@gmail.com wrote:

 Update: I can see that times increase when the search load is higher.
 During nights and weekends, full load times don't increase. So it is not
 caused by the number of documents being loaded (during weekends we have the
 same number of new documents) but by the number of queries per minute.

 Has anyone observed such strange behaviour? It is critical for us.

 Best,
 Michal



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Constantly-increasing-time-of-full-data-import-tp4103873p4104370.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Indexing Multiple Languages with solr (Arabic English)

2013-12-02 Thread aniljayanti
Hi,

I am indexing and searching with Solr using text_general for the ENGLISH
language. Search is working fine. Now I have Arabic text, which needs
indexing and searching. Below is my basic config for English. *The same
field contains ENGLISH and ARABIC text in the database.* Please guide me on
this.

<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I saw the config below in the schema.xml file for the Arabic language.

 
<fieldType name="text_ar" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_ar.txt" enablePositionIncrements="true"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest how to configure Arabic indexing and searching.
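One pattern I have seen suggested for mixed-language content (the field names
here are just examples, not from my schema) is to copy the same source text
into one field per language and search across both:

<field name="title_en" type="text_general" indexed="true" stored="true"/>
<field name="title_ar" type="text_ar" indexed="true" stored="true"/>
<copyField source="title" dest="title_en"/>
<copyField source="title" dest="title_ar"/>

Would that be the right direction here?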

Thanks in Advance,

AnilJayanti




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-English-tp4104580.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using the flexible query parser in Solr instead of classic

2013-12-02 Thread Karsten R.
Hi folks,
 
last year we built a 3.X Solr QueryParser based on
org.apache.lucene.queryparser.flexible.standard.StandardQueryParser, because
we had some additions for SpanQueries and PhraseQueries. We are thinking
about adapting this for 4.X.
 
At present the SolrQueryParser is based on
org.apache.lucene.queryparser.classic.QueryParser.jj.
 
Is there a plan for 4.X to switch the LuceneQParser from classic to
flexible (
org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj
)?
Is there a SOLR task to use the flexible QP?
Does anyone else need this?
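For context, a minimal sketch of using the flexible parser directly in 4.X;
the query string and field names are just examples:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class FlexibleQueryParserDemo {
    public static void main(String[] args) throws QueryNodeException {
        // The flexible module's StandardQueryParser accepts the familiar
        // syntax but is built from pluggable pipeline stages.
        StandardQueryParser parser =
            new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_46));
        Query q = parser.parse("title:solr AND body:flexible", "text");
        System.out.println(q);
    }
}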
 
Best regards
  Karsten


P.S. I only found one (unanswered) thread and no task about Solr and the
flexible QP (thread:
http://lucene.472066.n3.nabble.com/Using-the-contrib-flexible-query-parser-in-Solr-td819.html
)
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-the-flexible-query-parser-in-Solr-instead-of-classic-tp4104584.html
Sent from the Solr - User mailing list archive at Nabble.com.


post filtering for boolean filter queries

2013-12-02 Thread Dmitry Kan
Hello!

We have been experimenting with post filtering lately. Our setup has a
filter with a long boolean query; drawing the example from the Dublin
Stump the Chump session:

fq=UserId:(user1 OR user2 OR...OR user1000)

The underlying issue impacting performance is that the combination of user
ids in the query above is unique per user in the system, and on top of that
the combination changes every day.

Our idea was to stop caching the filter query with {!cache=false}. Since
there is no way, to our knowledge, to introspect the contents of the filter
cache (JMX?), we can't be sure these are not cached: the initial query for
each combination takes substantially more time (as if it were *not* cached)
than the second and subsequent queries with the same fq (as if it *were*
cached).

The question is: does post filtering support boolean queries in fq params?

Another thing we have been trying is assigning the fq a cost higher than
that of the other filter queries. Does this feature support boolean queries
in fq params as well?
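For reference, the exact syntax we are experimenting with is the following
(the cost value is illustrative):

fq={!cache=false cost=200}UserId:(user1 OR user2 OR ... OR user1000)

Our understanding is that Solr treats a filter as a true post-filter only
when its cost is 100 or more and the underlying query implements the
PostFilter interface.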

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Best approach to multiple languages

2013-12-02 Thread aniljayanti
Hi,

Thanks for your post. I am looking for this type of multiple-language
indexing and searching in Solr. Below is my post on the Lucene forum. Can
you please help me out with this?

http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-amp-English-td4104580.html

thanks in advance,

aniljayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-approach-to-multiple-languages-tp498198p4104593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ANNOUNCE: Apache Solr Reference Guide 4.6

2013-12-02 Thread Bernd Fehling
But it still has the error about TrimFilterFactory in it, which I reported a 
couple of days back.
http://www.mail-archive.com/solr-user@lucene.apache.org/msg92064.html

So to correct the Reference Guide, a note like the one under StopFilter
needs to be placed under TrimFilter:
"As of Solr 4.4, the updateOffsets argument is no longer supported."
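That is, as of 4.4 the filter is simply declared without that argument:

<filter class="solr.TrimFilterFactory" />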


By the way, I found the solution to my question by looking into the sources.
Thanks anyway.

Bernd


Am 02.12.2013 18:28, schrieb Chris Hostetter:
 
 The Lucene PMC is pleased to announce the release of the Apache Solr 
 Reference Guide for Solr 4.6.
 
  This 347-page PDF serves as the definitive user's manual for Solr 4.6.
 
 The Solr Reference Guide is available for download from the Apache mirror 
 network:
 
   https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
 
 (If you have followup questions, please send them only to 
 solr-user@lucene.apache.org)
 
 -Hoss