sort on facet.index?

2015-04-02 Thread Derek Poh

Is sorting on facet index supported?

I would like to sort on the below facet index

<lst name="P_SupplierRanking">
  <int name="0">14</int>
  <int name="1">8</int>
  <int name="2">12</int>
  <int name="3">349</int>
  <int name="4">81</int>
  <int name="5">8</int>
  <int name="6">12</int>
</lst>

to

<lst name="P_SupplierRanking">
  <int name="6">12</int>
  <int name="5">8</int>
  <int name="4">81</int>
  <int name="3">349</int>
  ...
  ...
  ...
</lst>

-Derek


Re: Solr -indexing from csv file having 28 cols taking lot of time ..plz help i m new to solr

2015-04-02 Thread avinash09
Alex,
finally it worked for me. Found the Ctrl-A separator: (separator=%01&escape=\)

Thanks for your help





Alphanumeric Wild card search

2015-04-02 Thread Palagiri, Jayasankar
Hello Team,

Below is my field type

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
        ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="lang/stopwords_en.txt"
        />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="lang/stopwords_en.txt"
        />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


And my field is

<field name="Name" type="text_en_splitting" indexed="true" stored="true" />

I have a few documents in my index,

like 1234-305, 1234-308, 1234-318.

When I search Name:1234-* I get desired results, but when I search like
Name:123-3* I get 0 results.

Can someone help me find what is wrong with my indexing?

Thanks and Regards,
Jaya



Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Toke Eskildsen
Ryan Steele ryan.ste...@pgi.com wrote:
 Does a SolrCloud 5.0 cluster need enough RAM across the cluster to load
 all the collections into RAM at all times?

Although Shawn is right about us not being able to answer properly, sometimes 
we can give qualified suggestions and guesses, at least as to the direction you 
should be looking in. The quality of the guesses goes up with the amount of 
information provided, and 1TB is really not much information.

- Are you indexing while searching? How much?
- How many documents in the index?
- What is a typical query? What about faceting?
- How many concurrent queries?
- Expected median response time?

 I'm building a SolrCloud cluster that may have approximately 1 TB of
 data spread across the collections.

We're running a 22TB SolrCloud off a single 16-core server with 256GB RAM. We've 
also had performance problems serving a 100GB index from a same-size machine.

The one hardware advice I will give is to start with SSDs and scale from there. 
With present day price/performance, using spinning drives for anything 
IO-intensive makes little sense.

- Toke Eskildsen


Question regarding enablePositionIncrements

2015-04-02 Thread Aman Tandon
Hi,

I was using enablePositionIncrements in my Solr 4.8.1 schema, but when I
try to use it in Solr 5.0.0 it gives an error when creating the collection.

If I am correct, it was useful for phrase queries. So is there any particular
reason for not supporting this option in Solr 5? If so, please
explain it to me. Thanks in advance.

With Regards
Aman Tandon


Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

2015-04-02 Thread forest_soup
Thanks Ramkumar!

Understood. We will try 100, 10. 

But given the original steps with which we found the exception, can we say that
the patch has an issue?
1. Put the patch on all 5 running Solr servers (Tomcat) by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7.0.jar with the patched
solr-core-4.7-SNAPSHOT.jar I built, keeping them all running.
2. Uploaded the solrconfig.xml to ZooKeeper with the changes below:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
3. Restarted Solr server 1 (Tomcat); after it restarted, it hit the
exception from my first post.
4. Restarted Solr server 1 again; it still had the same issue.
5. Reverted the patch by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7-SNAPSHOT.jar with the original
4.7.0 one.
6. Restarted Solr server 1 again; the issue is gone.

So we are wondering: if this lands in version 5.1, then after we upgrade
Solr and do a rolling restart, will the issue emerge and force a full
restart, causing a service outage?

Thanks! 





ShardHandler semantics

2015-04-02 Thread Gregg Donovan
We're starting work on adding backup requests
http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
to the ShardHandler. Roughly something like:

1. Send requests to 100 shards.
2. Wait for results from 75 to come back.
3. Wait for either a) the other 25 to come back or b) 20% more time to
elapse
4. If any shards have still not returned results, send a second request to
a different server for each of the missing shards.

I want to be sure I understand the ShardHandler contract correctly before
getting started. My understanding is :

--ShardHandler#take methods
https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/ShardHandler.java#L25:L26
can be called with different ShardRequests having been submitted
https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/ShardHandler.java#L24
.
--ShardHandler#takeXXX is then called in a loop, returning a ShardResponse
from the last shard returning for a given ShardRequest.
--When ShardHandler#takeXXX returns null, the SearchHandler
https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L277:L367
proceeds
https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L333
.

For example, the flow could look like:

shardHandler.submit(slowGroupingRequest, shard1, groupingParams);
shardHandler.submit(slowGroupingRequest, shard2, groupingParams);
shardHandler.submit(fastFacetRefinementRequest, shard1, facetParams);
shardHandler.submit(fastFacetRefinementRequest, shard2, facetParams);
shardHandler.takeCompletedOrError(); // returns fastFacetRefinementRequest
with responses
shardHandler.takeCompletedOrError(); // returns slowGroupingRequest with
responses
shardHandler.takeCompletedOrError(); // return null, SearchHandler exits
take loop

Does that seem like a correct understanding of the
SearchHandler-ShardHandler interaction?

If so, it seems that to make backup requests work we'd need to fanout
individual ShardRequests independently, each with its own completion
service and pending queue. Does that sound right?

Thanks!

--Gregg


Re: newbie questions regarding solr cloud

2015-04-02 Thread Upayavira
A couple of additions:

I had a system that indexed log files. I created a new core each day
(some 20m log events/day). I created collection aliases called today,
week and month that aggregated the relevant collections. That way,
accessing the “today” collection would always get you to the right
place. And I could unload, or delete, collections over a certain age.
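
A hedged sketch of the alias call (collection names are illustrative):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=today&collections=logs-20150402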

Second thing - some months ago, I created a pull request against pysolr
that added Zookeeper support. Please use it, try it, and comment on the
PR, as it hasn’t been merged yet. I’m keen to get feedback on whether it
works for you. When testing it, I had it happily notice a node going
down and redirect traffic to another host within 200ms, and did so
transparently. I will likely be starting to use it in a project in the
next few weeks myself.

Upayavira

On Thu, Apr 2, 2015, at 09:00 PM, Erick Erickson wrote:
 See inline:
 
 On Thu, Apr 2, 2015 at 12:36 PM, Ben Hsu ben@criticalmedia.com
 wrote:
  Hello
 
  I am playing with solr5 right now, to see if its cloud features can replace
  what we have with solr 3.6, and I have some questions, some newbie, and
  some not so newbie
 
  Background: the documents we are putting in solr have a date field. the
  majority of our searches are restricted to documents created within the
  last week, but searches do go back 60 days. documents older than 60 days
  are removed from the repo. we also want high availability in case a machine
  becomes unavailable
 
  our current method, using solr 3.6, is to split the data into 1 day chunks,
  within each day the data is split into several shards, and each shard has 2
  replicas. Our code generates the list of cores to be queried on based on
  the time range in the query. Cores that fall off the 60 day range are
  deleted through solr's RESTful API.
 
  This all sounds a lot like what Solr Cloud provides, so I started looking
  at Solr Cloud's features.
 
  My newbie questions:
 
   - it looks like the way to write a document is to pick a node (possibly
  using a LB), send it to that node, and let solr figure out which nodes that
  document is supposed to go. is this the recommended way?
 
 [EOE] That's totally fine. If you're using SolrJ a better way is to
 use CloudSolrClient
 which sends the docs to the proper leader, thus saving one hop.
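 
 For the pick-a-node approach, any node routes the update for you; a minimal
 sketch against the demo collection (host/port from your example, document
 fields illustrative):
 
 curl 'http://localhost:7575/solr/gettingstarted/update?commit=true' \
   -H 'Content-Type: application/json' -d '[{"id":"doc-1"}]'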
 
   - similarly, can I just randomly pick a core (using the demo example:
  http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
  it, and let it scatter out the queries to the appropriate cores, and send
  me the results back? will it give me back results from all the shards?
 
 [EOE] Yes. Actually, you don't even have to pick a core, just a
 collection.
 The # is totally unneeded, it's just part of navigating around the UI. So
 this should work:
 http://localhost:7575/solr/gettingstarted/query?q=*:*
 
   - is there a recommended Python library?
 [EOE] Unsure. If you do find one, check that it has the
 CloudSolrClient support as
 I expect that would take the most effort
 
 
  My hopefully less newbie questions:
   - does solr auto detect when node become unavailable, and stop sending
  queries to them?
 
 [EOE] Yes, that's what Zookeeper is all about. As each Solr node comes up
 it
 registers itself as a listener for collection state changes. ZK
 detects a node dying and
 notifies all the remaining nodes that nodeX is out of commission and
 they adjust accordingly.
 
   - when the master node dies and the cluster elects a new master, what
  happens to writes?
 [EOE] Stop thinking master/slave! It's leaders and replicas
 (although I'm trying
 to use leaders and followers). The critical bit is that on an
 update, the raw document
 is forwarded from the leader to all followers so they can come and go.
 You simply cannot
 rely on a particular node that is a leader remaining the leader. For
 instance, if you bring up
 your nodes in a different order tomorrow, the leaders and followers
 won't be the same.
 
 
   - what happens when a node is unavailable
 [EOE] SolrCloud does the right thing and keeps on chugging. See the
 comments about
 auto-detect. The exception is that if _all_ the nodes hosting a shard
 go down, you cannot
 add to the index and queries will fail unless you set
 shards.tolerant=true.
 
   - what is the procedure when a shard becomes too big for one machine, and
  needs to be split?
 There is the Collections API SPLITSHARD command you can use. This means
 that
 you increase by powers of two though, there's no such thing as adding,
 say, one new
 shard to a 4 shard cluster.
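 
 A hedged example of the call (collection/shard names illustrative):
 
 http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1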
 
 You can also reindex from scratch.
 
 You can also overshard when you initially create your collection and
 host multiple
 shards and/or replicas on a single machine, then physically move them
 when the
 aggregate size exceeds your boundaries.
 
   - what is the procedure when we lose a machine and the node needs replacing
 Use the Collections API to DELETEREPLICA on the replicas on the dead
 node.
 Use the 

Re: sort param could not be parsed as a query, and is not a field that exists in the index: geodist()

2015-04-02 Thread Erick Erickson
What comes out in the Solr logs? Nothing's jumping out at me here.

What version of Solr are you using? What is your GEOLOCATION field type?

Best,
Erick

On Thu, Apr 2, 2015 at 2:20 PM, Niraj niroj.off...@gmail.com wrote:
 *Objective: To find out all locations that are present within 1 KM of the
 specified reference point, sorted by the distance from the reference*

 curl -i --globoff --negotiate -u XXX:XXX -H "Accept: application/json" \
 -X GET
 "http://xx:8983/solr/loc_data/select?q=*:*&wt=json&indent=true&start=0&rows=1000&fq=%7B!geofilt%7D&sfield=GEO_LOCATION&pt=25.8227920532,-80.1314697266&d=1&sort=geodist()+asc"


 --
 {
   "responseHeader":{
     "status":400,
     "QTime":1,
     "params":{
       "d":"1",
       "sort":"geodist() asc",
       "indent":"true",
       "start":"0",
       "q":"*:*",
       "sfield":"GEO_LOCATION",
       "pt":"25.8227920532,-80.1314697266",
       "doAs":"*",
       "wt":"json",
       "fq":"{!geofilt}",
       "rows":"1000"}},
   "error":{
     "msg":"sort param could not be parsed as a query, and is not a field that exists in the index: geodist()",
     "code":400}}

 Please note that the query works properly without the geodist() function.
 I am a newbie to Solr. Please help.

 Regards,
 Niraj








newbie questions regarding solr cloud

2015-04-02 Thread Ben Hsu
Hello

I am playing with solr5 right now, to see if its cloud features can replace
what we have with solr 3.6, and I have some questions, some newbie, and
some not so newbie

Background: the documents we are putting in solr have a date field. the
majority of our searches are restricted to documents created within the
last week, but searches do go back 60 days. documents older than 60 days
are removed from the repo. we also want high availability in case a machine
becomes unavailable

our current method, using solr 3.6, is to split the data into 1 day chunks,
within each day the data is split into several shards, and each shard has 2
replicas. Our code generates the list of cores to be queried on based on
the time range in the query. Cores that fall off the 60 day range are
deleted through solr's RESTful API.

This all sounds a lot like what Solr Cloud provides, so I started looking
at Solr Cloud's features.

My newbie questions:

 - it looks like the way to write a document is to pick a node (possibly
using a LB), send it to that node, and let solr figure out which nodes that
document is supposed to go. is this the recommended way?
 - similarly, can I just randomly pick a core (using the demo example:
http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
it, and let it scatter out the queries to the appropriate cores, and send
me the results back? will it give me back results from all the shards?
 - is there a recommended Python library?

My hopefully less newbie questions:
 - does solr auto detect when node become unavailable, and stop sending
queries to them?
 - when the master node dies and the cluster elects a new master, what
happens to writes?
 - what happens when a node is unavailable
 - what is the procedure when a shard becomes too big for one machine, and
needs to be split?
 - what is the procedure when we lose a machine and the node needs replacing
 - how would we quickly bulk delete data within a date range?


Re: Taking Solr 5.0 to Production on Windows

2015-04-02 Thread Upayavira


On Thu, Apr 2, 2015, at 04:23 PM, Shawn Heisey wrote:
 On 4/2/2015 8:20 AM, Steven White wrote:
  I'm reading Taking Solr 5.0 to Production
  https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
  but I cannot find anything about Windows, is there some other link I'm
  missing?
 
  This section in the doc is an important part for a successful Solr
  deployment, but it is missing Windows instructions.  Without one, there
  will either be scattered deployment or Windows folks (like me) will miss
  out on some key aspects that Solr experts know.
 
 We are aware that the documentation is missing step-by-step information
 for Windows.  We are all volunteers, and there's a limited amount of
 free time available.  The hole in the documentation will eventually get
 filled, but it's not going to happen immediately.  The available
 solutions must be studied so the best option can be determined, and it
 probably will require some development work to automate the install.
 
 You might get the sense that Windows is treated as a second class
 citizen around here ... and I think you'd probably be right to feel that
 way.  There are no technical advantages in a Windows server over the
 free operating systems like Linux.  The biggest disadvantage is in
 Microsoft's licensing model.  A Windows Server OS has a hefty price tag,
 and client operating systems like Windows 7 and 8 are intentionally
 crippled by Microsoft so they run heavy-duty server programs poorly, in
 order to sell more Server licenses.  If a Solr install is not very busy,
 Windows 7 or 8 would probably run it just fine, but a very busy install
 will run into problems if it's on a client OS.  Unfortunately I cannot
 find any concrete information about the precise limitations in client
 operating systems.

I think the point is more that the majority of developers use a Unix
based system, and the majority of testing is done on Unix based systems.

Also, there are ways in which the Windows memory model differs from a
Unix one, meaning certain memory optimisations have not been possible
under Windows. A Lucene index is accessed via a Directory object, and
Solr/Lucene will, by default, choose one according to your architecture:
Windows/Unix, 32/64 bit, etc. 64 bit Unix gives you the best options.

My unconfirmed understanding is that this is to do with the
MemoryMappedDirectory implementation which will only work on Unix. This
implementation uses the OS disk cache directly, rather than reading
files from the disk cache into the heap, and is therefore much more
efficient. I’m sure there are some folks here who can clarify if I got
my implementation names or other details wrong.

So, Solr *will* run on Windows, whether desktop (for development) or
server. However, it is much less tested, and you will find some things,
such as new init scripts, and so on, that maybe have not yet been ported
over to Windows.

Upayavira


solr query latency spike when replicating index

2015-04-02 Thread wei
I noticed a Solr query latency spike on the slave node when it replicates the
index from the master. Especially when the master has just finished
optimization, the slave node will copy the whole index, and the latency is
really bad.

Is there some way to fix it?

Thanks,
Wei


Re: Taking Solr 5.0 to Production on Windows

2015-04-02 Thread Shawn Heisey
On 4/2/2015 2:23 PM, Upayavira wrote:
 I think the point is more that the majority of developers use a Unix
 based system, and the majority of testing is done on Unix based systems.

 Also, there are ways in which the Windows memory model differs from a
 Unix one, meaning certain memory optimisations have not been possible
 under Windows. A Lucene index is accessed via a Directory object, and
 Solr/Lucene will, by default, choose one according to your architecture:
 Windows/Unix, 32/64 bit, etc. 64 bit Unix gives you the best options.

 My unconfirmed understanding is that this is to do with the
 MemoryMappedDirectory implementation which will only work on Unix. This
 implementation uses the OS disk cache directly, rather than reading
 files from the disk cache into the heap, and is therefore much more
 efficient. I’m sure there are some folks here who can clarify if I got
 my implementation names or other details wrong.

 So, Solr *will* run on Windows, whether desktop (for development) or
 server. However, it is much less tested, and you will find some things,
 such as new init scripts, and so on, that maybe have not yet been ported
 over to Windows.

MMap seems to work perfectly fine on Windows.

Uwe Schindler indicates that MMap is used by default on 64-bit Windows
JVMs since Lucene/Solr 3.1:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

For various reasons, MMap being only one of them, Solr should always be
run on 64-bit operating systems with a 64-bit Java.

There are no major *disadvantages* to running Solr on Windows, as long
as it's a 64-bit Server OS.  NTFS cannot compare to the best filesystems
on a recent Linux kernel, but it's not horrible.  If you've sized your
RAM appropriately, Solr will hardly ever hit the disk, so the filesystem
may not make much difference.

Thanks,
Shawn



Unable to update config file using zkcli or RELOAD

2015-04-02 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird issue. I've a solr cloud cluster with 2 shards having
a replica each. I started the cluster
using -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf. After the cluster is up and running, I
added a new request handler (newhandler) and wanted to push it without
restarting the server. First, I tried the RELOAD option. I ran

http://54.151.xx.xxx:8983/solr/admin/cores?action=RELOAD&core=collection1

The command was successful, but when I logged in to the admin screen, the
solrconfig didn't show the request handler. Next I tried the zkcli script
on shard 1.

sh zkcli.sh -cmd upconfig -zkhost  zoohost1:2181 -confname myconf -solrhome
/mnt/opt/solrhome/ -confdir /mnt/opt/solrhome/solr/collection1/conf/

The script ran successfully and I could see the updated solrconfig file in
Solr admin. But then, when I tried

http://54.151.xx.xxx:8983/solr/collection1/newhandler

I got a 404. Not sure what I'm doing wrong. Do I need to run the zkcli
script on each node? I'm using Solr 5.0.
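
For reference, config changes in ZooKeeper only take effect after a reload, so
the upconfig likely needs to be followed by a collection-level RELOAD (a sketch,
using my hostnames):

http://54.151.xx.xxx:8983/solr/admin/collections?action=RELOAD&name=collection1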

Regards,
Shamik


DocValues

2015-04-02 Thread William Bell
If I set indexed=true and docValues=true, when I
facet=true&facet.field=manu_exact
will it use docValues or the indexed version?

Also, does it help with *Too many values for UnInvertedField faceting?*


*Do I need to set facet.method when using docValues?*

<field name="manu_exact" type="string" indexed="true" stored="true"
    docValues="true" />

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Ryan Steele
Thank you Shawn and Toke for the information and links!

No, I was not the one on #solr IRC channel. :/

Here are the details I have right now:

I'm building/running the operations side of this new SolrCloud cluster. 
It will be in Amazon, the initial cluster I'm planning to start with is 
5 r3.xlarge instances each using a general purpose SSD EBS volume for 
the SolrCloud related data (this will be separate from the EBS volume 
used by the OS). Each instance has 30.5 GiB RAM--152.5 GiB cluster 
wide--and each instance has 4 vCPU's. I'm using Oracle Java 1.8.0_31 and 
the G1 GC.

The data will be indexed on a separate machine and added to the 
SolrCloud cluster while searching is taking place. Unfortunately I don't 
have numbers at this time on how much data will be indexed. I do know 
that we will have over 2000 collections--some will be small (a few 
hundred documents and only a few megabytes at most), and a few will be 
very large (somewhere in the gigabytes). Our old Solr Master/Slave 
systems isn't broken up this way, so we aren't certain about how exactly 
things will map out in SolrCloud.

I'll continue researching, but I expect I'll just have to monitor the 
cluster as data gets imported into it and make adjustments as needed.

Ryan

On 4/2/15 12:06 AM, Toke Eskildsen wrote:
 Ryan Steeleryan.ste...@pgi.com  wrote:
 Does a SolrCloud 5.0 cluster need enough RAM across the cluster to load
 all the collections into RAM at all times?
 Although Shawn is right about us not being able to answer properly, sometimes 
 we can give qualified suggestions and guesses, at least as to the direction 
 you should be looking in. The quality of the guesses goes up with the amount 
 of information provided, and 1TB is really not much information.

 - Are you indexing while searching? How much?
 - How many documents in the index?
 - What is a typical query? What about faceting?
 - How many concurrent queries?
 - Expected median response time?

 I'm building a SolrCloud cluster that may have approximately 1 TB of
 data spread across the collections.
 We're running a 22TB SolrCloud off a single 16-core server with 256GB RAM. 
 We've also had performance problems serving a 100GB index from a same-size 
 machine.

 The one hardware advice I will give is to start with SSDs and scale from 
 there. With present day price/performance, using spinning drives for anything 
 IO-intensive makes little sense.

 - Toke Eskildsen





Problems with solr-cloud 4.8.0 and zookeeper 3.4.6

2015-04-02 Thread Vincenzo D'Amore
Hi,

In my development environment I have 3 servers.
On every server there are two running instances: one ZooKeeper and one
SolrCloud.
There aren't connections or any other clients running, but the ZooKeeper logs
are flooded with this annoying exception, coming only from servers 1 and 3.
All SolrCloud and ZooKeeper instances seem to be healthy.
I'm really unable to understand why this is happening.

Any help is really appreciated.


2015-04-03 01:27:18,899 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.0.13:51675
2015-04-03 01:27:18,900 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:744)
2015-04-03 01:27:18,900 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /192.168.0.13:51675 (no session established for client)


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Facet sorting algorithm for index

2015-04-02 Thread yriveiro
Hi,

I have an external application that use the output of a facet to join other
dataset using the keys of the facet result.

The facet query use index sort but in some point, my application crash
because the order of the keys is not correct. If I do an unix sort over
the keys of the result with LC_ALL=C doesn't output the same result.

I identified a case like this:

760d1f833b764591161\84b20f28242a0
760d1f833b76459116184b20f2

Why the line whit the '\' is before? This chain of chars is the character 
or is raw and are 2 chars?

In ASCII the  has lower ord than character 8, if \ is  then this sort
makes sense ...

My question here is how index sort works and how I can replicate it in C++





-
Best regards


RE: Alphanumeric Wild card search

2015-04-02 Thread Palagiri, Jayasankar
Hello Team,

Below is my field type

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
        ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="lang/stopwords_en.txt"
        />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="lang/stopwords_en.txt"
        />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


And my field is

<field name="Name" type="text_en_splitting" indexed="true" stored="true" />

I have a few documents in my index,

like 1234-305, 1234-308, 1234-318.

When I search Name:1234-* I get desired results, but when I search like
Name:123-3* I get 0 results.

Can someone help me find what is wrong with my indexing?

Thanks and Regards,
Jaya



Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
Position increments were considered problematic, especially for
highlighting. Did you get this error for the stop filter? There was a Jira for
this - check CHANGES.txt and the Jira for details.

For some discussion, see:
https://issues.apache.org/jira/browse/SOLR-6468
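
For reference, the option that no longer parses looked like this in a 4.x
schema (a sketch; the exact placement in your schema may differ):

<filter class="solr.StopFilterFactory" ignoreCase="true"
    words="stopwords.txt" enablePositionIncrements="false"/>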


-- Jack Krupansky

On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon amantandon...@gmail.com wrote:

 Hi,

 I was using enablePositionIncrements in my Solr 4.8.1 schema, but when I
 try to use it in Solr 5.0.0 it gives an error when creating the collection.

 If I am correct, it was useful for phrase queries. So is there any particular
 reason for not supporting this option in Solr 5? If so, please
 explain it to me. Thanks in advance.

 With Regards
 Aman Tandon



Re: Database vs Solr : ID based filtering

2015-04-02 Thread Aman Tandon
Thanks Mikhail for the explanation.

With Regards
Aman Tandon

On Fri, Mar 27, 2015 at 3:40 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 For a single 'where' clause, an RDBMS with an index performs comparably to an
 inverted index. The inverted index wins on multiple 'where' clauses, where it
 doesn't need composite indices; a multivalued field is also an intrinsic
 advantage. More details at
 http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal


 On Fri, Mar 27, 2015 at 9:56 AM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  Will ID-based filtering in Solr perform worse than in a DB?
 
  <field name="id" type="string" indexed="true" stored="true" />
 
 - http://localhost:8983/solr/select?q=*&fq=id:153
 
 *OR*
 
 - select * from TABLE where id=153
 
 
  With Regards
  Aman Tandon
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



edismax operators

2015-04-02 Thread Mahmoud Almokadem
Hello,

I've seen strange behaviour using edismax with multiple words. When
passing q=+(word1 word2) I get

"rawquerystring": "+(word1 word2)",
"querystring": "+(word1 word2)",
"parsedquery": "(+(+(DisjunctionMaxQuery((title:word1)) DisjunctionMaxQuery((title:word2)))))/no_coord",
"parsedquery_toString": "+(+((title:word1) (title:word2)))",

I expected both words to be must clauses, since I added + before the
parentheses, so it should apply to all terms inside them.

How can I apply the default operator AND to all words?

Thanks,
Mahmoud


Re: edismax operators

2015-04-02 Thread Jack Krupansky
The parentheses signal a nested query. Your plus operator applies to the
overall nested query - that the nested query must match something. Use the
plus operator on each of the discrete terms if each of them is mandatory.
The plus and minus operators apply to the overall nested query - they do
not distribute to each term within the nested query. They don't magically
distribute to all nested queries.

Let's see your full set of query parameters, both on the request and in
solrconfig.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 7:12 AM, Mahmoud Almokadem prog.mahm...@gmail.com
wrote:

 Hello,

 I've seen strange behaviour using edismax with multiple words. When
 passing q=+(word1 word2) I get

 "rawquerystring": "+(word1 word2)",
 "querystring": "+(word1 word2)",
 "parsedquery": "(+(+(DisjunctionMaxQuery((title:word1)) DisjunctionMaxQuery((title:word2)))))/no_coord",
 "parsedquery_toString": "+(+((title:word1) (title:word2)))",

 I expected both words to be must clauses, since I added + before the
 parentheses, so it should apply to all terms inside them.

 How can I apply the default operator AND to all words?

 Thanks,
 Mahmoud



Re: Alphanumeric Wild card search

2015-04-02 Thread Simon Martinelli
Hi,

Have a look at the generated terms to see how they look.

Simon

On Thu, Apr 2, 2015 at 9:43 AM, Palagiri, Jayasankar 
jayashankar.palag...@honeywell.com wrote:

 Hello Team,

 Below is my field type

  <fieldType name="text_en_splitting" class="solr.TextField"
      positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
          ignoreCase="true" expand="false"/>
      -->
      <!-- Case insensitive stop word removal. -->
      <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>


  And my field is

  <field name="Name" type="text_en_splitting" indexed="true" stored="true" />

  I have a few documents in my index,

  like 1234-305, 1234-308, 1234-318.

  When I search Name:1234-* I get desired results, but when I search like
  Name:123-3* I get 0 results.

  Can someone help me find what is wrong with my indexing?

 Thanks and Regards,
 Jaya




Re: Facet sorting algorithm for index

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 6:36 AM, yriveiro yago.rive...@gmail.com wrote:
 Hi,

 I have an external application that uses the output of a facet to join
 another dataset using the keys of the facet result.

 The facet query uses index sort, but at some point my application crashes
 because the order of the keys is not correct: a Unix sort over the keys of
 the result with LC_ALL=C doesn't produce the same order.

 I identified a case like this:

 760d1f833b764591161\"84b20f28242a0
 760d1f833b76459116184b20f2

 Why does the line with the '\' come first? Is this character sequence the '"'
 character escaped, or is it raw, i.e. 2 chars?

 In ASCII '"' has a lower ord than the character '8', so if \" is '"' then
 this sort makes sense ...

How are you viewing the results?  If it's JSON, then yes the backslash
double quote would mean that there is just a literal double quote in
the string.

-Yonik


Re: Alphanumeric Wild card search

2015-04-02 Thread Jack Krupansky
This is caused by the word delimiter filter - it breaks multi-part terms
(the hyphens trigger it) into multiple terms. Wildcards simply don't work
consistently well in such a situation. The basic problem is that the
presence of the wildcard causes all but the simplest token filtering stages
to be bypassed, particularly the word delimiter filter (because it would
have stripped out the wildcard asterisk), so your wildcard term is analyzed
differently than it was indexed, so it fails to match. In other cases it
may match, but that would happen only if the abbreviated token filtering
actually happened to match the full indexing filtering.

This is a limitation of Solr. You just have to learn to live with it. Or...
don't use the word delimiter filter when you need to be able to do
wildcards of multi-part terms.
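
For example (illustrative; the Admin UI analysis page will show the exact
terms for your schema): at index time the word delimiter filter splits
1234-305 into 1234 and 305, plus the catenated 1234305, while the wildcard
query 123-3* largely bypasses that filter, so no indexed term starts with
123-3 and nothing matches.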

-- Jack Krupansky

On Thu, Apr 2, 2015 at 3:43 AM, Palagiri, Jayasankar 
jayashankar.palag...@honeywell.com wrote:

 Hello Team,

 Below is my field type

  <fieldType name="text_en_splitting" class="solr.TextField"
      positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
          ignoreCase="true" expand="false"/>
      -->
      <!-- Case insensitive stop word removal. -->
      <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>


  And my field is

  <field name="Name" type="text_en_splitting" indexed="true" stored="true" />

  I have a few documents in my index,

  like 1234-305, 1234-308, 1234-318.

  When I search Name:1234-* I get desired results, but when I search like
  Name:123-3* I get 0 results.

  Can someone help me find what is wrong with my indexing?

 Thanks and Regards,
 Jaya




Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Shawn Heisey
On 4/2/2015 4:46 PM, Ryan Steele wrote:
 Thank you Shawn and Toke for the information and links! No, I was not
 the one on #solr IRC channel. :/ Here are the details I have right
 now: I'm building/running the operations side of this new SolrCloud
 cluster. It will be in Amazon, the initial cluster I'm planning to
 start with is 5 r3.xlarge instances each using a general purpose SSD
 EBS volume for the SolrCloud related data (this will be separate from
 the EBS volume used by the OS). Each instance has 30.5 GiB RAM--152.5
 GiB cluster wide--and each instance has 4 vCPU's. I'm using Oracle
 Java 1.8.0_31 and the G1 GC. 

Java 8u40 is supposed to have some significant improvements to G1
garbage collection, so I would recommend an upgrade from 8u31.  I heard
this directly from Oracle engineers on a mailing list for GC issues.

 The data will be indexed on a separate machine and added to the
 SolrCloud cluster while searching is taking place. Unfortunately I
 don't have numbers at this time on how much data will be indexed. I do
 know that we will have over 2000 collections--some will be small (a
 few hundred documents and only a few megabytes at most), and a few
 will be very large (somewhere in the gigabytes). Our old Solr
 Master/Slave systems isn't broken up this way, so we aren't certain
 about how exactly things will map out in SolrCloud. 

If it is a viable option to combine collections that use the same or
similar schemas and do filtering on the query side to reduce the total
number of collections to only a few hundred, your SolrCloud experience
will probably be better.  See this issue:

https://issues.apache.org/jira/browse/SOLR-7191

General SolrCloud stability is not very good with thousands of
collections, but I would imagine that SSD storage will improve that,
especially if the zookeeper database is also on SSD.

In a perfect world, for the best performance, you would have enough
memory across the cluster so that you can cache all of the index data
present on the cluster, including all replicas ... but for terabyte
scale indexes, that's either a huge amount of RAM on a modest number of
servers or a huge amount of servers, each with a big chunk of RAM. 
Either way it's very expensive, especially on Amazon.  Usually you can
achieve very good performance without a perfect one-to-one relationship
between index size and RAM.

The fact that you will have a lot of smaller indexes will hopefully mean
only some of them are needed at any given time.  If that's the case,
your overall memory requirements will be lower than if you had a single
1TB index, and I think the SSD storage will help the performance of
those smaller indexes a lot more than it would for very large indexes.

Thanks,
Shawn



Re: SolrCloud 5.0 cluster RAM requirements

2015-04-02 Thread Shawn Heisey
On 4/2/2015 11:18 PM, Shawn Heisey wrote:
 On 4/2/2015 4:46 PM, Ryan Steele wrote:
 cluster. It will be in Amazon, the initial cluster I'm planning to
 start with is 5 r3.xlarge instances each using a general purpose SSD
 EBS volume for the SolrCloud related data (this will be separate from
 the EBS volume used by the OS). Each instance has 30.5 GiB RAM--152.5
 GiB cluster wide--and each instance has 4 vCPU's. I'm using Oracle
 Java 1.8.0_31 and the G1 GC. 

Followup on the RAM:  Depending on your query characteristics, 1TB of
index data might require a significant amount of heap memory.  I would
imagine that you'll need to allocate at least half of your 30GB RAM to
the Java heap on each server, and possibly more, which will reduce the
amount available for disk caching.
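
A hedged example of setting that heap with the 5.0 start script (the 15g is
illustrative; measure before committing to a size):

bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 -m 15g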

There's a very good chance that you'll either need more EC2 instances,
and/or that you will need instances with more memory.  Before committing
more resources, you will need to find out whether performance is
acceptable or not with what you have already planned.

Thanks,
Shawn



multi core faceting

2015-04-02 Thread Aman Tandon
Hi,

I have two cores: one contains the data for jeans and the other contains the
data for shirts available to the user. I want to show counts of shirts and
jeans on my website from one Solr request.

Is there any functionality available in Solr by which I can get the combined
facets from both the cores (jeans & shirts)?

With Regards
Aman Tandon


Re: multi core faceting

2015-04-02 Thread Shawn Heisey
On 4/2/2015 11:30 PM, Aman Tandon wrote:
 I have two cores: one contains the data for jeans and the other contains the
 data for shirts available to the user. I want to show counts of shirts and
 jeans on my website from one Solr request.
 
 Is there any functionality available in Solr by which I can get the combined
 facets from both the cores (jeans & shirts)?

Are the schemas of the two cores the same, or at least very similar?  At
a bare minimum, they would need to use the same field name for the
uniqueKey, but substantial similarity, at least on the fields that you
will be querying and faceting, is usually required.

If the answer to that question is yes, you may be able to do a
distributed search.

Your message history on this list mentions SolrCloud quite frequently,
but your message specifically says cores ... which would tend to mean
that it's NOT SolrCloud.

If it is cloud, you could create a collection alias that points at both
collections, then use the alias in your queries to query them both.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4

If it's not SolrCloud, then you can use the older method for distributed
searching:

https://wiki.apache.org/solr/DistributedSearch
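
Hedged sketches of both approaches (host, collection names and the facet
field are illustrative):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=apparel&collections=jeans,shirts
http://localhost:8983/solr/apparel/select?q=*:*&rows=0&facet=true&facet.field=category

http://localhost:8983/solr/jeans/select?q=*:*&rows=0&shards=localhost:8983/solr/jeans,localhost:8983/solr/shirts&facet=true&facet.field=category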

Thanks,
Shawn



Re: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Shalin Shekhar Mangar
The URL you are trying to access is wrong. You are using
/solr/etr_base_core/trendswt=json but you should be using
/solr/etr_base_core/trends?wt=json

On Thu, Apr 2, 2015 at 9:51 AM, Christian Reuschling 
christian.reuschl...@gmail.com wrote:

 Hi,

 I managed to create a small custom requestHandler, and filled the
 response parameter with some
 static values in the structure I want to have later.

 I can invoke the requestHandler from the browser and get nice XML with
 the data and structure I
 had specified - so far so good. Here is the XML response:


 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">17</int>
 </lst>
 <lst name="response">
 <double name="doubleAtt1">13.0</double>
 <double name="doubleAtt2">14.0</double>
 <int name="intAtt1">15</int>
 <double name="doubleAtt3">16.0</double>
 <double name="doubleAtt4">17.0</double>
 <arr name="slices">
 <lst>
 <double name="doubleAtt1">13.0</double>
 <double name="doubleAtt2">14.0</double>
 <int name="intAtt1">15</int>
 <double name="doubleAtt5">16.0</double>
 <double name="doubleAtt6">17.0</double>
 <arr name="ids">
 <str>id1</str>
 <str>id2</str>
 <str>id3</str>
 <str>id4</str>
 </arr>
 </lst>
 <lst>
 <double name="doubleAtt1">13.0</double>
 <double name="doubleAtt2">14.0</double>
 <int name="intAtt1">15</int>
 <double name="doubleAtt5">16.0</double>
 <double name="doubleAtt6">17.0</double>
 <arr name="ids">
 <str>id1</str>
 <str>id2</str>
 <str>id3</str>
 <str>id4</str>
 </arr>
 </lst>
 </arr>
 </lst>
 </response>


 Now I simply add wt=json to the invocation. Sadly I get a

 HTTP ERROR 404

 Problem accessing /solr/etr_base_core/trendswt=json. Reason:

 Not Found


 I had the feeling that the response format is transparent for me when I
 write a custom
 requestHandler. But it seems I've overlooked something.

 Does anybody have an idea?


 Regards

 Christian




-- 
Regards,
Shalin Shekhar Mangar.


Re: Facet sorting algorithm for index

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 9:44 AM, Yago Riveiro yago.rive...@gmail.com wrote:
 Where can I find the source code used for index sorting? I need to ensure 
 that the external data has the same sorting as the facet result.

If you step over the indexed terms of a field you get them in sorted
order (hence for a single node, the sorting is done at indexing time).
Lucene index order for text will essentially be unicode code point order.
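
(So to replicate it in C++, comparing the raw UTF-8 bytes, e.g. std::string's
default operator< or memcmp, should give the same order, since byte-wise UTF-8
comparison preserves code point order.)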

-Yonik


Re: sort on facet.index?

2015-04-02 Thread Toke Eskildsen
Ryan Josal rjo...@gmail.com wrote:
 So maybe you are asking if you can sort by index, but reversed?
 I don't think this is possible, and it's a good question.

It is not currently possible and the JIRA for the issue 
  https://issues.apache.org/jira/browse/SOLR-1672
is 5 years old. On the plus side, there seems to be renewed interest for it.

- Toke Eskildsen


Re: How to recover a Shard

2015-04-02 Thread Erick Erickson
Matt:

This seems dangerous, but you might be able to use the Collections API to
1> DELETEREPLICA on all but one.
2> RELOAD the collection
3> ADDREPLICA back.
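
Hedged examples of the calls (collection/shard from your logs; the replica
name is illustrative, check CLUSTERSTATUS for the real one):

http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=kla_collection&shard=shard6&replica=core_node3
http://localhost:8983/solr/admin/collections?action=RELOAD&name=kla_collection
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard6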

I don't _like_ this much mind you as when you added the replicas back
it'd replicate the index from the leader, but at least you might not
have to take Solr down.

I'm not completely sure that this'll work, mind you but

Erick

On Wed, Apr 1, 2015 at 8:04 PM, Matt Kuiper matt.kui...@issinc.com wrote:
 Maybe I have been working too many long hours as I missed the obvious 
 solution of bringing down/up one of the Solr nodes backing one of the 
 replicas, and then the same for the second node.  This did the trick.

 Since I brought this topic up, I will narrow the question a bit:  Would there 
 be a way to recover without restarting the Solr node?  Basically to delete 
 one replica and then somehow declare the other replica the leader and break 
 it out of its recovery process?

 Thanks,
 Matt


 From: Matt Kuiper
 Sent: Wednesday, April 01, 2015 8:43 PM
 To: solr-user@lucene.apache.org
 Subject: How to recover a Shard

 Hello,

 I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in 
 a Recovery Failed state per the Solr Admin Cloud page.  The logs contains 
 the following type of entries for the two Solr nodes involved, including 
 statements that it will retry.

 Is there a way to recover from this state?

 Maybe bring down one replica, and then somehow declare that the remaining 
 replica is to be the leader?  Understand this would not be ideal as the new 
 leader may be missing documents that were sent its way to be indexed while it 
 was down, but would be better than having to rebuild the whole cloud.

 Any tips or suggestions would be appreciated.

 Thanks,
 Matt

 Solr node .65
 Error while trying to recover. 
 core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
 registered leader was found after waiting for 4000ms , collection: 
 kla_collection slice: shard6
  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
  at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
  at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
 Solr node .64

 Error while trying to recover. 
 core=kla_collection_shard6_replica2:org.apache.solr.common.SolrException: No 
 registered leader was found after waiting for 4000ms , collection: 
 kla_collection slice: shard6

  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)

  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)

  at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)

  at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



Taking Solr 5.0 to Production on Windows

2015-04-02 Thread Steven White
Hi folks,

I'm reading Taking Solr 5.0 to Production
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
but I cannot find anything about Windows, is there some other link I'm
missing?

This section in the doc is an important part for a successful Solr
deployment, but it is missing Windows instructions.  Without one, there
will either be scattered deployment or Windows folks (like me) will miss
out on some key aspects that Solr experts know.

Any feedback on this?

Thanks

Steve


Re: sort on facet.index?

2015-04-02 Thread Ryan Josal
Sorting the result set or the facets?  For the facets there is
facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
are asking if you can sort by index, but reversed?  I don't think this is
possible, and it's a good question.  I wanted to chime in on this one
because I wanted my own facet.sort=rank, but there is no nice pluggable way
to implement a new sort.  I'd love to be able to add a Comparator for a new
sort.  I ended up subclassing FacetComponent to sort of hack on the rank
sort implementation but it isn't very pretty and I'm sure not as efficient
as it could be if FacetComponent was designed for more sorts.
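
For reference, the two built-in sorts in URL form (core name illustrative):

http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=P_SupplierRanking&facet.sort=index
http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=P_SupplierRanking&facet.sort=count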

Ryan

On Thursday, April 2, 2015, Derek Poh d...@globalsources.com wrote:

 Is sorting on facet index supported?

 I would like to sort on the below facet index

  <lst name="P_SupplierRanking">
  <int name="0">14</int>
  <int name="1">8</int>
  <int name="2">12</int>
  <int name="3">349</int>
  <int name="4">81</int>
  <int name="5">8</int>
  <int name="6">12</int>
  </lst>

 to

  <lst name="P_SupplierRanking">
  <int name="6">12</int>
  <int name="5">8</int>
  <int name="4">81</int>
  <int name="3">349</int>
  ...
  ...
  ...
  </lst>

 -Derek



Re: Taking Solr 5.0 to Production on Windows

2015-04-02 Thread Shawn Heisey
On 4/2/2015 8:20 AM, Steven White wrote:
 I'm reading Taking Solr 5.0 to Production
 https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
 but I cannot find anything about Windows, is there some other link I'm
 missing?

 This section in the doc is an important part for a successful Solr
 deployment, but it is missing Windows instructions.  Without one, there
 will either be scattered deployment or Windows folks (like me) will miss
 out on some key aspects that Solr experts know.

We are aware that the documentation is missing step-by-step information
for Windows.  We are all volunteers, and there's a limited amount of
free time available.  The hole in the documentation will eventually get
filled, but it's not going to happen immediately.  The available
solutions must be studied so the best option can be determined, and it
probably will require some development work to automate the install.

You might get the sense that Windows is treated as a second class
citizen around here ... and I think you'd probably be right to feel that
way.  There are no technical advantages in a Windows server over the
free operating systems like Linux.  The biggest disadvantage is in
Microsoft's licensing model.  A Windows Server OS has a hefty price tag,
and client operating systems like Windows 7 and 8 are intentionally
crippled by Microsoft so they run heavy-duty server programs poorly, in
order to sell more Server licenses.  If a Solr install is not very busy,
Windows 7 or 8 would probably run it just fine, but a very busy install
will run into problems if it's on a client OS.  Unfortunately I cannot
find any concrete information about the precise limitations in client
operating systems.

Thanks,
Shawn



RE: How to recover a Shard

2015-04-02 Thread Matt Kuiper
Thanks Erick!  Understand your warning.  Next time it occurs, I will plan to 
give it a try.  I am currently in a dev environment, so it is a safe place to 
experiment.

Matt

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 02, 2015 9:40 AM
To: solr-user@lucene.apache.org
Subject: Re: How to recover a Shard

Matt:

This seems dangerous, but you might be able to use the Collections API to
1> DELETEREPLICA on all but one.
2> RELOAD the collection
3> ADDREPLICA back.

I don't _like_ this much mind you as when you added the replicas back it'd 
replicate the index from the leader, but at least you might not have to take 
Solr down.

I'm not completely sure that this'll work, mind you but

Erick

On Wed, Apr 1, 2015 at 8:04 PM, Matt Kuiper matt.kui...@issinc.com wrote:
 Maybe I have been working too many long hours as I missed the obvious 
 solution of bringing down/up one of the Solr nodes backing one of the 
 replicas, and then the same for the second node.  This did the trick.

 Since I brought this topic up, I will narrow the question a bit:  Would there 
 be a way to recover without restarting the Solr node?  Basically to delete 
 one replica and then somehow declare the other replica the leader and break 
 it out of its recovery process?

 Thanks,
 Matt


 From: Matt Kuiper
 Sent: Wednesday, April 01, 2015 8:43 PM
 To: solr-user@lucene.apache.org
 Subject: How to recover a Shard

 Hello,

 I have a SolrCloud (4.10.1) where for one of the shards, both replicas are in 
 a Recovery Failed state per the Solr Admin Cloud page.  The logs contains 
 the following type of entries for the two Solr nodes involved, including 
 statements that it will retry.

 Is there a way to recover from this state?

 Maybe bring down one replica, and then somehow declare that the remaining 
 replica is to be the leader?  Understand this would not be ideal as the new 
 leader may be missing documents that were sent its way to be indexed while it 
 was down, but would be better than having to rebuild the whole cloud.

 Any tips or suggestions would be appreciated.

 Thanks,
 Matt

 Solr node .65
 Error while trying to recover. 
 core=kla_collection_shard6_replica5:org.apache.solr.common.SolrException: No 
 registered leader was found after waiting for 4000ms , collection: 
 kla_collection slice: shard6
  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
  at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:332)
  at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
 Solr node .64

 Error while trying to recover. 
 core=kla_collection_shard6_replica2:org.apache.solr.common.SolrExcepti
 on: No registered leader was found after waiting for 4000ms , 
 collection: kla_collection slice: shard6

  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReade
 r.java:568)

  at 
 org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReade
 r.java:551)

  at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.jav
 a:332)

  at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)



Re: edismax operators

2015-04-02 Thread Shawn Heisey
On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
 Thank you Jack for your clarifications. I used regular defType and set
 q.op=AND so all terms without operators are mandatory. How can I use this with
 edismax?

The edismax parser is capable of much more granularity than simply
AND/OR on the default operator, through the mm parameter.  If you set
q.op to AND, the mm parameter will be set to 100%.  The mm parameter is
EXTREMELY flexible.

https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
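
For example (a sketch - pick whatever thresholds fit your data):

  mm=3            at least 3 of the optional clauses must match
  mm=75%          at least 75% of the clauses must match
  mm=2<-25% 9<-3  1-2 clauses: all required; 3-9 clauses: up to 25% may
                  be missing; more than 9: up to 3 may be missing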

Thanks,
Shawn



RE: edismax operators

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Can the mm parameter be set per clause?  I guess I've ignored it in the past
aside from setting it once to what seemed like a reasonable value.
That is probably replicated across every collection, which cannot be ideal for 
relevance.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, April 02, 2015 11:13 AM
To: solr-user@lucene.apache.org
Subject: Re: edismax operators

On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
 Thank you Jack for your clarifications. I used regular defType and set 
 q.op=AND so all terms without operators are mandatory. How can I use this
 with edismax?

The edismax parser is capable of much more granularity than simply AND/OR on 
the default operator, through the mm parameter.  If you set q.op to AND, the mm 
parameter will be set to 100%.  The mm parameter is EXTREMELY flexible.

https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

Thanks,
Shawn



Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
That's my understanding - but use the Solr Admin UI analysis page to
confirm exactly what happens, for both index and query analysis.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 10:04 AM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi Jack,

 I read that jira, and I understand the concern raised there.

 So does it mean that no hole will be left when we use the stop
 filter?

 With Regards
 Aman Tandon

 On Thu, Apr 2, 2015 at 6:01 PM, Jack Krupansky jack.krupan...@gmail.com
 wrote:

  Position increments were considered problematic, especially for
  highlighting. Did you get this error for the stop filter? There was a Jira for
  this - check CHANGES.TXT and the Jira for details.
 
  For some discussion, see:
  https://issues.apache.org/jira/browse/SOLR-6468
 
 
  -- Jack Krupansky
 
  On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon amantandon...@gmail.com
  wrote:
 
   Hi,
  
   I was using enablePositionIncrements in the solr 4.8.1 schema. But when I
   try to use it in solr-5.0.0, it gives an error when creating the collection.

   If I am correct, it was useful in phrase queries. So is there any particular
   reason for not supporting this option in solr 5? If so, then please
   explain it to me. Thanks in advance.
  
   With Regards
   Aman Tandon
  
 



Re: Question regarding enablePositionIncrements

2015-04-02 Thread Aman Tandon
Hi Jack,

I read that jira, and I understand the concern raised there.

So does it mean that no hole will be left when we use the stop filter?

With Regards
Aman Tandon

On Thu, Apr 2, 2015 at 6:01 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Position increments were considered problematic, especially for
 highlighting. Did you get this error for the stop filter? There was a Jira for
 this - check CHANGES.TXT and the Jira for details.

 For some discussion, see:
 https://issues.apache.org/jira/browse/SOLR-6468


 -- Jack Krupansky

 On Thu, Apr 2, 2015 at 3:01 AM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  I was using enablePositionIncrements in the solr 4.8.1 schema. But when I
  try to use it in solr-5.0.0, it gives an error when creating the collection.

  If I am correct, it was useful in phrase queries. So is there any particular
  reason for not supporting this option in solr 5? If so, then please
  explain it to me. Thanks in advance.
 
  With Regards
  Aman Tandon
 



Re: Facet sorting algorithm for index

2015-04-02 Thread Yago Riveiro
The result comes from a custom responseWriter; I found a bug in my code that
appended the \ to the ".


The JSON response shows the data without the \.




Where can I find the source code used for index sorting? I need to ensure that
the external data has the same sort order as the facet result.


—
/Yago Riveiro

On Thu, Apr 2, 2015 at 12:26 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, Apr 2, 2015 at 6:36 AM, yriveiro yago.rive...@gmail.com wrote:
 Hi,

 I have an external application that uses the output of a facet to join another
 dataset using the keys of the facet result.

 The facet query uses index sort, but at some point my application crashes
 because the order of the keys is not correct. Doing a unix sort over
 the keys of the result with LC_ALL=C doesn't output the same order.

 I identified a case like this:

 760d1f833b764591161\"84b20f28242a0
 760d1f833b76459116184b20f2

 Why is the line with the '\' first? Is this chain of chars the single
 character " (JSON-escaped) or is it raw, i.e. 2 chars?

 In ASCII the " has a lower ord than the character 8, so if \" is " then this
 sort makes sense ...
 How are you viewing the results?  If it's JSON, then yes the backslash
 double quote would mean that there is just a literal double quote in
 the string.
 -Yonik
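
(To make the byte math concrete, assuming the value really contains a
literal quote: Solr's index order compares the raw term bytes, and '"'
is 0x22 while '8' is 0x38, so 760d1f833b764591161"84b... sorts before
760d1f833b76459116184b... A LC_ALL=C sort run over the JSON-escaped
text instead compares the backslash, 0x5C, which is greater than '8',
so the two lines swap - which would explain the mismatch.)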

Re: edismax operators

2015-04-02 Thread Mahmoud Almokadem
Thank you Jack for your clarifications. I used regular defType and set
q.op=AND so all terms without operators are mandatory. How can I use this with
edismax?

Thanks,
Mahmoud

On Thu, Apr 2, 2015 at 2:14 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 The parentheses signal a nested query. Your plus operator applies to the
 overall nested query - that the nested query must match something. Use the
 plus operator on each of the discrete terms if each of them is mandatory.
 The plus and minus operators apply to the overall nested query - they do
 not distribute to each term within the nested query. They don't magically
 distribute to all nested queries.
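
 A quick sketch of the difference:

   q=+word1 +word2     both terms are individually mandatory
   q=+(word1 word2)    the group as a whole is mandatory; either term
                       can satisfy it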

 Let's see your full set of query parameters, both on the request and in
 solrconfig.

 -- Jack Krupansky

 On Thu, Apr 2, 2015 at 7:12 AM, Mahmoud Almokadem prog.mahm...@gmail.com
 wrote:

  Hello,
 
  I've a strange behaviour when using edismax with multiple words. When
  passing q=+(word1 word2) I got
 
  rawquerystring: +(word1 word2), querystring: +(word1 word2), 
  parsedquery: (+(+(DisjunctionMaxQuery((title:word1))
  DisjunctionMaxQuery((title:word2)/no_coord,
  parsedquery_toString: +(+((title:word1)
  (title:word2))),
 
  I expected both words to be mandatory, as I added + before the parentheses,
  so it should be applied to all terms in the parentheses.
 
  How can I apply the default operator AND to all words?
 
  Thanks,
  Mahmoud
 



RE: edismax operators

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Thanks Shawn,

This is what I thought, but Solr often has features I don't anticipate.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Thursday, April 02, 2015 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax operators

On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
 Can the mm parameter be set per clause?  I guess I've ignored it in the
 past aside from setting it once to what seemed like a reasonable value.
 That is probably replicated across every collection, which cannot be ideal 
 for relevance.

It applies to the whole query.  You can have a different value on every query 
you send.  Just like with other parameters, defaults can be configured in the 
solrconfig.xml request handler definition.

Thanks,
Shawn



RE: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
Use XSLT to generate JSON?  But you probably actually do want both, and
ruby/python, etc.

-Original Message-
From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] 
Sent: Thursday, April 02, 2015 12:51 PM
To: solr-user@lucene.apache.org
Subject: Generating json response in custom requestHandler (xml is working)

Hi,

I managed to create a small custom requestHandler and filled the response
parameter with some static values in the structure I want to have later.

I can invoke the requestHandler from the browser and get nice xml with the
data and structure I had specified - so far so good. Here is the xml response:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">17</int>
</lst>
<lst name="response">
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt3">16.0</double>
<double name="doubleAtt4">17.0</double>
<arr name="slices">
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
</arr>
</lst>
</response>


Now I simply add wt=json to the invocation. Sadly I get a

HTTP ERROR 404

Problem accessing /solr/etr_base_core/trendswt=json. Reason:

Not Found


I had the feeling that the response format is transparent for me when I write a
custom requestHandler. But it seems I've overlooked something.

Does anybody have an idea?


Regards

Christian


Re: edismax operators

2015-04-02 Thread Mahmoud Almokadem
Thanks all for your responses,

But the parsed_query and number of results stay the same when changing the mm parameter

the following results for mm=100% and mm=0%

http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true

rawquerystring: +(word1 word2), querystring: +(word1 word2),
parsedquery: (+(+(DisjunctionMaxQuery((title:word1))
DisjunctionMaxQuery((title:word2)/no_coord,
parsedquery_toString: +(+((title:word1) (title:word2))),



http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=0%25&stopwords=true&lowercaseOperators=true
http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true

rawquerystring: +(word1 word2), querystring: +(word1 word2),
parsedquery: (+(+(DisjunctionMaxQuery((title:word1))
DisjunctionMaxQuery((title:word2)/no_coord,
parsedquery_toString: +(+((title:word1) (title:word2))),

There aren't any changes between the two queries

solr version 4.8.1

Thanks,
Mahmoud

On Thu, Apr 2, 2015 at 6:56 PM, Davis, Daniel (NIH/NLM) [C] 
daniel.da...@nih.gov wrote:

 Thanks Shawn,

 This is what I thought, but Solr often has features I don't anticipate.

 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: Thursday, April 02, 2015 12:54 PM
 To: solr-user@lucene.apache.org
 Subject: Re: edismax operators

 On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
  Can the mm parameter be set per clause?  I guess I've ignored it in
 the past aside from setting it once to what seemed like a reasonable value.
  That is probably replicated across every collection, which cannot be
 ideal for relevance.

 It applies to the whole query.  You can have a different value on every
 query you send.  Just like with other parameters, defaults can be
 configured in the solrconfig.xml request handler definition.

 Thanks,
 Shawn




Re: sort on facet.index?

2015-04-02 Thread Yonik Seeley
On Thu, Apr 2, 2015 at 10:25 AM, Ryan Josal rjo...@gmail.com wrote:
 Sorting the result set or the facets?  For the facets there is
 facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
 are asking if you can sort by index, but reversed?  I don't think this is
 possible, and it's a good question.

The new facet module that will be in Solr 5.1 supports sorting both
directions on both count and index order (as well as by statistics /
bucket aggregations).
http://yonik.com/json-facet-api/

-Yonik


RE: Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Davis, Daniel (NIH/NLM) [C]
I mean that you could use XSLTResponseWriter to generate exactly the format you 
want.   However, I anticipate that if you already have a custom response, 
getting it to automatically generate XML/JSON/Python/Ruby was an expectation, 
and may be a requirement.

Maybe you should look at the code - it could be that the standard response 
writer looks explicitly at the wt parameter and does something using these 
other response writers that you should copy.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Thursday, April 02, 2015 1:00 PM
To: solr-user@lucene.apache.org
Subject: RE: Generating json response in custom requestHandler (xml is working)

Use XSLT to generate JSON?  But you probably actually do want both, and
ruby/python, etc.

-Original Message-
From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] 
Sent: Thursday, April 02, 2015 12:51 PM
To: solr-user@lucene.apache.org
Subject: Generating json response in custom requestHandler (xml is working)

Hi,

I managed to create a small custom requestHandler and filled the response
parameter with some static values in the structure I want to have later.

I can invoke the requestHandler from the browser and get nice xml with the
data and structure I had specified - so far so good. Here is the xml response:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">17</int>
</lst>
<lst name="response">
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt3">16.0</double>
<double name="doubleAtt4">17.0</double>
<arr name="slices">
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
</arr>
</lst>
</response>


Now I simply add wt=json to the invocation. Sadly I get a

HTTP ERROR 404

Problem accessing /solr/etr_base_core/trendswt=json. Reason:

Not Found


I had the feeling that the response format is transparent for me when I write a
custom requestHandler. But it seems I've overlooked something.

Does anybody have an idea?


Regards

Christian


Re: edismax operators

2015-04-02 Thread Erick Erickson
The MM parameter is specific to the handler you set up/use, so it's
really on a per collection basis. Different collections can specify
this however they want.

Or maybe I misunderstand what you're asking...

Best,
Erick

On Thu, Apr 2, 2015 at 8:59 AM, Davis, Daniel (NIH/NLM) [C]
daniel.da...@nih.gov wrote:
 Can the mm parameter be set per clause?  I guess I've ignored it in the
 past aside from setting it once to what seemed like a reasonable value.
 That is probably replicated across every collection, which cannot be ideal 
 for relevance.

 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: Thursday, April 02, 2015 11:13 AM
 To: solr-user@lucene.apache.org
 Subject: Re: edismax operators

 On 4/2/2015 8:35 AM, Mahmoud Almokadem wrote:
 Thank you Jack for your clarifications. I used regular defType and set
 q.op=AND so all terms without operators are must. How can I use this
 with edismax?

 The edismax parser is capable of much more granularity than simply AND/OR on 
 the default operator, through the mm parameter.  If you set q.op to AND, the 
 mm parameter will be set to 100%.  The mm parameter is EXTREMELY flexible.

 https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

 Thanks,
 Shawn



Re: edismax operators

2015-04-02 Thread Shawn Heisey
On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
 Can the mm parameter be set per clause?  I guess I've ignored it in the
 past aside from setting it once to what seemed like a reasonable value.
 That is probably replicated across every collection, which cannot be ideal 
 for relevance.

It applies to the whole query.  You can have a different value on every
query you send.  Just like with other parameters, defaults can be
configured in the solrconfig.xml request handler definition.
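
For example, a sketch of a handler default (the handler name and mm
value here are just illustrations):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="mm">2&lt;-25% 9&lt;-3</str>
  </lst>
</requestHandler>

Note the &lt; escaping needed for the < characters inside the XML.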

Thanks,
Shawn




Generating json response in custom requestHandler (xml is working)

2015-04-02 Thread Christian Reuschling
Hi,

I managed to create a small custom requestHandler and filled the response
parameter with some static values in the structure I want to have later.

I can invoke the requestHandler from the browser and get nice xml with the
data and structure I had specified - so far so good. Here is the xml response:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">17</int>
</lst>
<lst name="response">
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt3">16.0</double>
<double name="doubleAtt4">17.0</double>
<arr name="slices">
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
<lst>
<double name="doubleAtt1">13.0</double>
<double name="doubleAtt2">14.0</double>
<int name="intAtt1">15</int>
<double name="doubleAtt5">16.0</double>
<double name="doubleAtt6">17.0</double>
<arr name="ids">
<str>id1</str>
<str>id2</str>
<str>id3</str>
<str>id4</str>
</arr>
</lst>
</arr>
</lst>
</response>


Now I simply add wt=json to the invocation. Sadly I get a

HTTP ERROR 404

Problem accessing /solr/etr_base_core/trendswt=json. Reason:

Not Found


I had the feeling that the response format is transparent for me when I write a
custom requestHandler. But it seems I've overlooked something.

Does anybody have an idea?


Regards

Christian


Re: sort on facet.index?

2015-04-02 Thread Ryan Josal
Awesome, I didn't know this feature was going to add so much power!
Looking forward to using it.

On Thursday, April 2, 2015, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, Apr 2, 2015 at 10:25 AM, Ryan Josal rjo...@gmail.com
 javascript:; wrote:
  Sorting the result set or the facets?  For the facets there is
  facet.sort=index (lexicographically) and facet.sort=count.  So maybe you
  are asking if you can sort by index, but reversed?  I don't think this is
  possible, and it's a good question.

 The new facet module that will be in Solr 5.1 supports sorting both
 directions on both count and index order (as well as by statistics /
 bucket aggregations).
 http://yonik.com/json-facet-api/

 -Yonik



Re: edismax operators

2015-04-02 Thread Jack Krupansky
Personally, I am not sure how the q.op and mm parameters are really
handled within nested queries. There have been bugs in edismax and some
oddities in how it works. I have personally given up on figuring out
how the code works. At one stage, back in the days when I did feel that I
had a handle on the code, the q.op/mm logic seemed to apply only to the
outer, top level of the query, not to the nested terms of the query, but my
recollection could be faulty on that specific point, and it may have
changed as some bugs have been fixed.

So, I would suggest that you file a Jira and let the committers sort out
whether it is really a bug or simply needs better doc for its expected
behavior on this specific issue.

-- Jack Krupansky

On Thu, Apr 2, 2015 at 1:02 PM, Mahmoud Almokadem prog.mahm...@gmail.com
wrote:

 Thanks all for your responses,

 But the parsed_query and number of results stay the same when changing the mm parameter

 the following results for mm=100% and mm=0%


 http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
 
 http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
 

 rawquerystring: +(word1 word2), querystring: +(word1 word2),
 parsedquery: (+(+(DisjunctionMaxQuery((title:word1))
 DisjunctionMaxQuery((title:word2)/no_coord,
 parsedquery_toString: +(+((title:word1) (title:word2))),




 http://solrserver/solr/collection1/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=0%25&stopwords=true&lowercaseOperators=true
 
 http://10.1.1.118:8090/solr/PAEB/select?q=%2B(word1+word2)&rows=0&fl=Title&wt=json&indent=true&debugQuery=true&defType=edismax&qf=title&mm=100%25&stopwords=true&lowercaseOperators=true
 

 rawquerystring: +(word1 word2), querystring: +(word1 word2),
 parsedquery: (+(+(DisjunctionMaxQuery((title:word1))
 DisjunctionMaxQuery((title:word2)/no_coord,
 parsedquery_toString: +(+((title:word1) (title:word2))),

 There aren't any changes between the two queries

 solr version 4.8.1

 Thanks,
 Mahmoud

 On Thu, Apr 2, 2015 at 6:56 PM, Davis, Daniel (NIH/NLM) [C] 
 daniel.da...@nih.gov wrote:

  Thanks Shawn,
 
  This is what I thought, but Solr often has features I don't anticipate.
 
  -Original Message-
  From: Shawn Heisey [mailto:apa...@elyograg.org]
  Sent: Thursday, April 02, 2015 12:54 PM
  To: solr-user@lucene.apache.org
  Subject: Re: edismax operators
 
  On 4/2/2015 9:59 AM, Davis, Daniel (NIH/NLM) [C] wrote:
   Can the mm parameter be set per clause?  I guess I've ignored it in
  the past aside from setting it once to what seemed like a reasonable
 value.
   That is probably replicated across every collection, which cannot be
  ideal for relevance.
 
  It applies to the whole query.  You can have a different value on every
  query you send.  Just like with other parameters, defaults can be
  configured in the solrconfig.xml request handler definition.
 
  Thanks,
  Shawn
 
 



Re: newbie questions regarding solr cloud

2015-04-02 Thread Erick Erickson
See inline:

On Thu, Apr 2, 2015 at 12:36 PM, Ben Hsu ben@criticalmedia.com wrote:
 Hello

 I am playing with solr5 right now, to see if its cloud features can replace
 what we have with solr 3.6, and I have some questions, some newbie, and
 some not so newbie

 Background: the documents we are putting in solr have a date field. the
 majority of our searches are restricted to documents created within the
 last week, but searches do go back 60 days. documents older than 60 days
 are removed from the repo. we also want high availability in case a machine
 becomes unavailable

 our current method, using solr 3.6, is to split the data into 1 day chunks,
 within each day the data is split into several shards, and each shard has 2
 replicas. Our code generates the list of cores to be queried on based on
 the time ranged in the query. Cores that fall off the 60 day range are
 deleted through solr's RESTful API.

 This all sounds a lot like what Solr Cloud provides, so I started looking
 at Solr Cloud's features.

 My newbie questions:

  - it looks like the way to write a document is to pick a node (possibly
 using a LB), send it to that node, and let solr figure out which nodes that
 document is supposed to go to. Is this the recommended way?

[EOE] That's totally fine. If you're using SolrJ a better way is to
use CloudSolrClient
which sends the docs to the proper leader, thus saving one hop.
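
For what it's worth, a minimal SolrJ sketch (assuming SolrJ 5.x; the
zkHost, collection, and field names are just examples):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient watches ZooKeeper, so it always knows the
        // current shard leaders and routes each document directly.
        CloudSolrClient client = new CloudSolrClient("localhost:9983");
        client.setDefaultCollection("gettingstarted");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        doc.addField("date", "2015-04-02T00:00:00Z");

        client.add(doc);   // goes straight to the correct leader
        client.commit();
        client.close();
    }
}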

  - similarly, can I just randomly pick a core (using the demo example:
 http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
 it, and let it scatter out the queries to the appropriate cores, and send
 me the results back? will it give me back results from all the shards?

[EOE] Yes. Actually, you don't even have to pick a core, just a collection.
The # is totally unneeded, it's just part of navigating around the UI. So this
should work:
http://localhost:7575/solr/gettingstarted/query?q=*:*

  - is there a recommended Python library?
[EOE] Unsure. If you do find one, check that it has the
CloudSolrClient support as
I expect that would take the most effort


 My hopefully less newbie questions:
  - does solr auto detect when nodes become unavailable, and stop sending
 queries to them?

[EOE] Yes, that's what Zookeeper is all about. As each Solr node comes up it
registers itself as a listener for collection state changes. ZK
detects a node dying and
notifies all the remaining nodes that nodeX is out of commission and
they adjust accordingly.

  - when the master node dies and the cluster elects a new master, what
 happens to writes?
[EOE] Stop thinking master/slave! It's leaders and replicas
(although I'm trying
to use leaders and followers). The critical bit is that on an
update, the raw document
is forwarded from the leader to all followers so they can come and go.
You simply cannot
rely on a particular node that is a leader remaining the leader. For
instance, if you bring up
your nodes in a different order tomorrow, the leaders and followers
won't be the same.


  - what happens when a node is unavailable
[EOE] SolrCloud does the right thing and keeps on chugging. See the
comments about
auto-detect. The exception is that if _all_ the nodes hosting a shard
go down, you cannot
add to the index and queries will fail unless you set shards.tolerant=true.

  - what is the procedure when a shard becomes too big for one machine, and
 needs to be split?
There is the Collections API SPLITSHARD command you can use. This means that
you increase by powers of two though, there's no such thing as adding,
say, one new
shard to a 4 shard cluster.
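
A sketch of the call (collection and shard names are just examples):

curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=gettingstarted&shard=shard1"

Each sub-shard gets half of the parent's hash range; once they're
active, the parent shard stops serving requests and can be removed.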

You can also reindex from scratch.

You can also overshard when you initially create your collection and
host multiple
shards and/or replicas on a single machine, then physically move them when the
aggregate size exceeds your boundaries.

  - what is the procedure when we lose a machine and the node needs replacing
Use the Collections API to DELETEREPLICA on the replicas on the dead node.
Use the Collections API to ADDREPLICA on new machines.

  - how would we quickly bulk delete data within a date range?
[EOE]
...solr/update?commit=true&stream.body=<delete><query>date_field:[DATE1
TO DATE2]</query></delete>
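
(If you send that as a GET, remember to URL-encode the XML in
stream.body; or POST the same delete body to /update with
Content-Type: text/xml.)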

You can take explicit control of where your docs go by various routing
schemes. The default is to route based on a hash of the id field, but
if you choose, you can route all docs based on the value of a field
(_route_) or based on the first part of the unique key with the bang
(!) operator.
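
For example (hypothetical ids): with the default compositeId router,
documents with ids 20150402!doc1 and 20150402!doc2 hash on the
20150402 prefix and land on the same shard, which is one way to keep
each day's documents together.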

Do note, though, that one of the consequences of putting all of a
day's data on a single shard (or subset of shards) is that you
concentrate all your searching on those machines, and the other ones
can be idle. At times you can get better throughput by just letting
the docs be distributed randomly. That's what I'd start with
anyway.

Best,
Erick


sort param could not be parsed as a query, and is not a field that exists in the index: geodist()

2015-04-02 Thread Niraj
*Objective: To find all locations that are within 1 KM of the
specified reference point, sorted by the distance from the reference*

curl -i --globoff --negotiate -u XXX:XXX -XGET  -H "Accept:
application/json" \
-X GET
"http://xx:8983/solr/loc_data/select?q=*:*&wt=json&indent=true&start=0&rows=1000&fq=%7B!geofilt%7D&sfield=GEO_LOCATION&pt=25.8227920532,-80.1314697266&d=1&sort=geodist()+asc"


--
{
  "responseHeader":{
    "status":400,
    "QTime":1,
    "params":{
      "d":"1",
      "sort":"geodist() asc",
      "indent":"true",
      "start":"0",
      "q":"*:*",
      "sfield":"GEO_LOCATION",
      "pt":"25.8227920532,-80.1314697266",
      "doAs":"*",
      "wt":"json",
      "fq":"{!geofilt}",
      "rows":"1000"}},
  "error":{
    "msg":"sort param could not be parsed as a query, and is not a field
that exists in the index: geodist()",
    "code":400}}

Please note that the query works properly without the geodist() function.
I am a newbie to Solr. Please help.

Regards,
Niraj






--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-param-could-not-be-parsed-as-a-query-and-is-not-a-field-that-exists-in-the-index-geodist-tp4197350.html
Sent from the Solr - User mailing list archive at Nabble.com.