Re:

2013-12-01 Thread Henrik Ossipoff Hansen
To expand a bit on the other replies: yes, your order data should definitely be 
denormalized into one single order schema. We store orders this way in Solr, 
since near real-time search among live orders is a requirement for several of 
our systems.

Something non-Solr though - consider denormalizing your order data in your 
relational database as well. Sooner or later, you will get into trouble with 
keeping orders and associated products separated via normalization - unless you 
keep a history of all previous versions of a product, or you never change 
products. Say a product changes its name one month after an order is placed - 
if you keep the data normalized, all previous orders will get the new name of 
the product, not the name it had when the order was placed. In my experience, 
this behaviour is usually not what you want.

This would, of course, also make a direct mapping to Solr more straightforward.
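
To make this concrete, a denormalized order document could look something like 
this (Solr XML update format; the field names are purely illustrative):

  <add>
    <doc>
      <field name="id">O1</field>
      <field name="order_date">2012-12-01T00:00:00Z</field>
      <field name="product_id">P1</field>
      <!-- copied from the product at order time, so a later rename
           does not rewrite order history -->
      <field name="product_name">ipad</field>
    </doc>
  </add>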
--
Henrik Ossipoff Hansen

On 1. dec. 2013 at 02.06.54, subacini Arunkumar 
(subac...@gmail.com) wrote:

Thanks Walter for the reply. Here is my complete requirement.


Please let me know the possible solutions to address my requirement.

* Two tables might have millions of records with 50 columns in each table

* Expected output is same as what we get in SQL inner join

Say, for example, I have two tables: Product and Order.

*Product Table *

id Name

P1 ipad

P2 iphone 4

P3 iphone 5

*Order Table*

id order_date product_id

O1 1-Dec-2012 P1

O2 1-Dec-2012 P2

O3 2-Dec-2012 P2


*Expected Output:* I want to show the details in the UI as below [SQL inner join]

O1 1-Dec-2012 ipad

O2 1-Dec-2012 iphone 4

O3 2-Dec-2012 iphone 4
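
For reference, the join I mean is essentially this (column names assumed from 
the tables above; "order" is usually a reserved word in SQL, hence order_table):

  SELECT o.id, o.order_date, p.name
  FROM order_table o
  INNER JOIN product p ON p.id = o.product_id;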


I tried setting up two Solr cores, a Product core & an Order core.

*Option 1: Using Solr Join*

I got the expected result, but I was able to get columns only from one core 
(i.e. 3 records in total, but only Product table columns)


http://…./product/select?q=*:*&fq={!join from=product_id to=id 
fromIndex=order}


*Option 2: Using shards*

Created a third core, but the number of records is the sum of (Product core + 
Order core), as the documents are of different types and all unique - i.e. 6 
records.

So how could I generate a single document with all fields from two
different document types in different cores?


On Sat, Nov 30, 2013 at 8:04 AM, Walter Underwood wun...@wunderwood.org wrote:

 1. Flatten the data into a single table.

 2. Solr does not seem like a good solution for order data, especially live
 orders that need to be transactional. That is a great match to a standard
 relational DB.

 wunder

 On Nov 30, 2013, at 12:15 AM, subacini Arunkumar subac...@gmail.com
 wrote:

  Hi
 
  We are using solr 4.4 . Please let me know the possible solutions to
  address my requirement.
 
  We have to fetch data from two tables Product , Order table.
 
  Product Table
 
  id Name
  P1 ipad
  P2 iphone 4
  P3 iphone 5
 
 
  Order Table
 
  id order date product_id
  O1 1-Dec-2012 P1
  O2 1-Dec-2012 P2
  O3 2-Dec-2012 P2
 
  I want to show the details in UI as below
 
  O1 01-Dec-2012
 
 
  On Sat, Nov 30, 2013 at 12:13 AM, subacini Arunkumar subac...@gmail.com
 wrote:
 
  Hi
 
  We are using solr 4.4 . Please let me know the possible solutions to
  address my requirement.
 
  We have to fetch data from two tables Product , Order table.
 
  Product Table
 
  id Name
  P1 ipad
  P2 iphone 4
  P3 iphone 5
 
 
  Order Table
 
  id order date product_id
  O1
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: SolrCloud unstable

2013-11-12 Thread Henrik Ossipoff Hansen
Hello,

I’m experiencing sort of the same issue, but with much smaller indexes - 
although with much higher latency on disks during backup sessions on our NFS. I 
have a feeling the solution could be the same, so I’ll just leave my story here 
just in case; no solution found yet. 
http://lucene.472066.n3.nabble.com/SolrCloud-never-fully-recovers-after-slow-disks-td4099350.html

--
Henrik Ossipoff Hansen
Developer, Entertainment Trading


On 12. nov. 2013 at 09.47.01, Martin de Vries 
(mar...@downnotifier.com) wrote:

Hi,

We have:

Solr 4.5.1 - 5 servers
36 cores, 2 shards each, 2 servers per shard (every core is on 4
servers)
about 4.5 GB total data on disk per server
4GB JVM-Memory per server, 3GB average in use
Zookeeper 3.3.5 - 3 servers (one shared with Solr)
haproxy load balancing

Our SolrCloud is very unstable. About once a week some cores go into a
recovery or down state. Many timeouts occur and we have to restart
servers to get them back to work. The failover doesn't work in many
cases, because one server has the core in the down state, the other in
the recovering state. Other cores work fine. When the cloud is stable I
sometimes see log messages like:
- shard update error StdNode:
http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://033.downnotifier.com:8983/solr/dntest_shard2_replica1
- forwarding update to
http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed -
retrying ...
- null:ClientAbortException: java.io.IOException: Broken pipe

Before the cloud problems start there are many large QTimes in the
log (sometimes over 50 seconds), but there are no other errors until the
recovery problems start.


Any clue about what can be wrong?


Kind regards,

Martin


RE: Why do people want to deploy to Tomcat?

2013-11-12 Thread Henrik Ossipoff Hansen
I agree with the previous statements about the ‘example’ name putting people 
off. Not only that: I believe some of the official wiki pages still directly 
state that the shipped Jetty is not appropriate for production use, which is 
what made us use Tomcat for a long while (that, and one developer had previous 
experience with Tomcat configuration).
--
Henrik Ossipoff Hansen
Developer, Entertainment Trading


On 12. nov. 2013 at 15.45.42, Hoggarth, Gil 
(gil.hogga...@bl.uk) wrote:

For me, a side-effect of 'example' is that it's just that - not appropriate for 
production. But there's also the organisation factor beyond Solr, which is about 
staff expertise - we don't have any systems that utilise jetty, so we're 
unfamiliar with its configuration, issues, or oddities. Tomcat is our de facto 
container, so it makes sense for us to implement Solr within Tomcat.

If we ruled out these reasons, I'd still be looking for a container that:
- was a standalone installation (i.e., outside of the Solr tarball) so that it 
would be managed via yum (we run on RHEL). This separates any issues of Solr 
from issues of jetty, which given our current lack of jetty knowledge would be a 
helpful thing.
- the container service could be managed via standard SysV startup processes. 
To be fair, I've implemented our own for Tomcat and could do this for jetty, 
but I'd prefer jetty included this (which would suggest it is more prepared for 
enterprise use).
- Likewise, I assume all of jetty's configuration can be reset to use the normal 
RHEL /etc/ and /var/ directories, but I'd prefer that jetty did this for me (to 
demonstrate again its enterprise-ready status).

Yes, I could do all the necessary bespoke configuration so that jetty meets the 
above requirements, but because I'd have to, I question whether it's ready for 
our enterprise setup (which mainly means that our Operations team will fight 
against unusual configurations).

Having added all of this, I have to admit that I like the idea of using jetty, 
because you guys tell me that Solr is effectively pre-configured for jetty. But 
then I'd want to know what in particular these jetty configurations are!

BTW Very pleased that this is being discussed - the views can help me argue our 
case to use jetty if it is indeed more beneficial to do so.

Gil

-Original Message-
From: Sebastián Ramírez [mailto:sebastian.rami...@senseta.com]
Sent: 12 November 2013 13:38
To: solr-user@lucene.apache.org
Subject: Re: Why do people want to deploy to Tomcat?

I agree with Doug - when I started, I had to spend some time figuring out what 
was just an example and what I would have to change in a production 
environment... until I found that the whole example was ready for production.

Of course, you commonly have to change the settings, parameters, fields, etc. 
of your Solr system, but the example doesn't have anything that is not for 
production.


Sebastián Ramírez
[image: SENSETA – Capture & Analyze] http://www.senseta.com/


On Tue, Nov 12, 2013 at 8:18 AM, Amit Aggarwal amit.aggarwa...@gmail.com wrote:

 Agreed with Doug
 On 12-Nov-2013 6:46 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com
 wrote:

  As an aside, I think one reason people feel compelled to deviate
  from the distributed jetty distribution is because the folder is named
  "example". I've had to explain to a few clients that this is a bit of a
  misnomer. The IT dept especially sees "example" and feels uncomfortable
  using that as a starting point for a jetty install. I wish it was called
  "default" or "bin" or something where it's more obviously the default
  jetty distribution of Solr.
 
 
  On Tue, Nov 12, 2013 at 7:06 AM, Roland Everaert
  reveatw...@gmail.com
  wrote:
 
   In my case, the first time I had to deploy and configure solr on
   tomcat (and jboss), it was a requirement to reuse as much as
   possible the application/web server already in place. The next
   deployment I also used tomcat, because I was used to deploying on
   tomcat and I don't know jetty at all.
   
   I could ask the same question with regard to jetty. Why
   use/bundle (if not recommend) jetty with solr over other
   webserver solutions?
  
   Regards,
  
  
   Roland Everaert.
  
  
  
   On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo
   topor...@gmail.com
   wrote:
  
 In my case, the selection of the servlet container has never been a
 hard requirement. I mean, some customers provide us a virtual machine
 configured with java/tomcat, others have a tomcat installed and want to
 share it with solr, others prefer jetty because their sysadmins are used
 to configuring it... At least in the projects I've been working on, the
 selection of the servlet engine has not been a key factor in the
 project's success.
   
Regards.
   
   
On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez
andre.b...@kelkoo.comwrote:
   
 We are using Solr running

RE: SolrCloud never fully recovers after slow disks

2013-11-11 Thread Henrik Ossipoff Hansen
The joy was short-lived.

Tonight our environment was “down/slow” a bit longer than usual. It looks like 
two of our nodes never recovered, even though clusterstate says everything is 
active. All nodes are throwing this in the log (the nodes they have trouble 
reaching are the ones that are affected) - the error occurs for several cores:

ERROR - 2013-11-11 09:16:42.735; org.apache.solr.common.SolrException; Error 
while trying to recover. 
core=products_se_shard1_replica2:org.apache.solr.client.solrj.SolrServerException:
 Timeout occured while waiting response from server at: 
http://solr04.cd-et.com:8080/solr
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
... 4 more

ERROR - 2013-11-11 09:16:42.736; org.apache.solr.cloud.RecoveryStrategy; 
Recovery failed - trying again... (30) core=products_se_shard1_replica2
--
Henrik Ossipoff Hansen
Developer, Entertainment Trading


On 10. nov. 2013 at 21.07.32, Henrik Ossipoff Hansen 
(h...@entertainment-trading.com) wrote:

Solr version is 4.5.0.

I have done some tweaking. Doubling my Zookeeper timeout values in zoo.cfg and 
the Zookeeper timeout in solr.xml seemed to somewhat minimize the problem, but 
it still did occur. I next stopped all larger batch indexing in the period 
where the issues happened, which also seemed to help somewhat. Now the next 
thing weirds me out a bit - I switched from using Tomcat7 to using the Jetty 
that ships with Solr, and that actually seems to have fixed the last issues 
(together with stopping a few smaller updates - very few).

During the slow period in the night, I get something like this:

03:11:49 ERROR ZkController There was a problem finding the leader in 
zk:org.apache.solr.common.SolrException: Could not get leader props
03:06:47 ERROR Overseer Could not create Overseer node
03:06:47 WARN LeaderElector
03:06:47 WARN ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to 
ZK
03:07:41 WARN RecoveryStrategy Stopping recovery for 
zkNodeName=solr04.cd-et.com:8080_solr_auto_suggest_shard1_replica2 core=auto_suggest_shard1_replica2

After this, the cluster state seems to be fine, and I'm not being spammed with 
errors in the log files.

Bottom line is that the issues seem to be fixed for now, but I still find it 
weird that Solr was not able to fully recover.

// Henrik Ossipoff

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 10. november 2013 19:27
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud never fully recovers after

Re: SolrCloud never fully recovers after slow disks

2013-11-11 Thread Henrik Ossipoff Hansen
I will file a JIRA later today.

What I don’t get though (I haven’t looked much into any actual Solr code) is 
that at this point our systems are running fine, so timeouts shouldn’t be an 
issue. Those two nodes, though, are somehow left in a state where their response 
time is up to around 120k ms - which is fairly high - while everything else is 
running normally at this point.
--
Henrik Ossipoff Hansen
Developer, Entertainment Trading


On 11. nov. 2013 at 16.01.58, Mark Miller 
(markrmil...@gmail.com) wrote:

The socket read timeouts are actually fairly short for recovery - we should 
probably bump them up. Can you file a JIRA issue? It may be a symptom rather 
than a cause, but given a slow env, bumping them up makes sense.

- Mark

 On Nov 11, 2013, at 8:27 AM, Henrik Ossipoff Hansen 
 h...@entertainment-trading.com wrote:

 The joy was short-lived.

 Tonight our environment was “down/slow” a bit longer than usual. It looks 
 like two of our nodes never recovered, even though clusterstate says 
 everything is active. All nodes are throwing this in the log (the nodes they 
 have trouble reaching are the ones that are affected) - the error occurs for 
 several cores:

 ERROR - 2013-11-11 09:16:42.735; org.apache.solr.common.SolrException; Error 
 while trying to recover. 
 core=products_se_shard1_replica2:org.apache.solr.client.solrj.SolrServerException:
  Timeout occured while waiting response from server at: 
 http://solr04.cd-et.com:8080/solr
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
 at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)
 Caused by: java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:150)
 at java.net.SocketInputStream.read(SocketInputStream.java:121)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
 at 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
 at 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
 at 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
 at 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
 at 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
 at 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
 at 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 at 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
 ... 4 more

 ERROR - 2013-11-11 09:16:42.736; org.apache.solr.cloud.RecoveryStrategy; 
 Recovery failed - trying again... (30) core=products_se_shard1_replica2
 --
 Henrik Ossipoff Hansen
 Developer, Entertainment Trading


 On 10. nov. 2013 at 21.07.32, Henrik Ossipoff Hansen 
 (h...@entertainment-trading.com) wrote:

 Solr version is 4.5.0.

 I have done some tweaking. Doubling my Zookeeper timeout values in zoo.cfg 
 and the Zookeeper timeout in solr.xml seemed to somewhat minimize the 
 problem, but it still did occur. I next stopped all larger batch indexing in 
 the period where the issues happened, which also seemed to help somewhat. Now 
 the next thing weirds me out a bit - I switched from using Tomcat7 to using 
 the Jetty that ships with Solr, and that actually seems to have fixed the 
 last issues (together with stopping a few smaller updates - very few).

 During the slow period in the night, I get something like this:

 03:11:49 ERROR ZkController There was a problem finding the leader in 
 zk:org.apache.solr.common.SolrException: Could not get leader props
 03:06:47 ERROR Overseer

RE: SolrCloud never fully recovers after slow disks

2013-11-10 Thread Henrik Ossipoff Hansen
Solr version is 4.5.0.

I have done some tweaking. Doubling my Zookeeper timeout values in zoo.cfg and 
the Zookeeper timeout in solr.xml seemed to somewhat minimize the problem, but 
it still did occur. I next stopped all larger batch indexing in the period 
where the issues happened, which also seemed to help somewhat. Now the next 
thing weirds me out a bit - I switched from using Tomcat7 to using the Jetty 
that ships with Solr, and that actually seems to have fixed the last issues 
(together with stopping a few smaller updates - very few).

During the slow period in the night, I get something like this:

03:11:49 ERROR ZkController There was a problem finding the leader in 
zk:org.apache.solr.common.SolrException: Could not get leader props
03:06:47 ERROR Overseer Could not create Overseer node
03:06:47 WARN LeaderElector
03:06:47 WARN ZkStateReader ZooKeeper watch triggered, but Solr cannot talk to 
ZK
03:07:41 WARN RecoveryStrategy Stopping recovery for 
zkNodeName=solr04.cd-et.com:8080_solr_auto_suggest_shard1_replica2 core=auto_suggest_shard1_replica2

After this, the cluster state seems to be fine, and I'm not being spammed with 
errors in the log files.

Bottom line is that the issues seem to be fixed for now, but I still find it 
weird that Solr was not able to fully recover.

// Henrik Ossipoff

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 10. november 2013 19:27
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud never fully recovers after slow disks

Which version of Solr are you using? Regardless of your env, this is a 
fail-safe that you should not hit. 

- Mark

 On Nov 5, 2013, at 8:33 AM, Henrik Ossipoff Hansen 
 h...@entertainment-trading.com wrote:
 
 I previously made a post on this, but have since narrowed down the issue and 
 am now giving this another try, with another spin to it.
 
  We are running a 4 node setup (over Tomcat7) with a 3-ensemble external 
  ZooKeeper. This is running on a total of 7 (4+3) different VMs, and each VM 
  is using our Storage system (NFS share in VMWare).
 
 Now I do realize and have heard, that NFS is not the greatest way to run Solr 
 on, but we have never had this issue on non-SolrCloud setups.
 
  Basically, each night when we run our backup jobs, our storage becomes a bit 
  slow in response - this is obviously something we’re trying to solve, but 
  the bottom line is that all our other systems somehow stay alive or recover 
  gracefully when bandwidth exists again.
  SolrCloud - not so much. Typically after a session like this, 3-5 nodes will 
  either go into a Down state or a Recovering state - and stay that way. 
  Sometimes such a node will even be marked as leader. Such a node will have 
  something like this in the log:
 
  ERROR - 2013-11-05 08:57:45.764; 
  org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState 
  says we are the leader, but locally we don't think so
  ERROR - 2013-11-05 08:57:45.768; org.apache.solr.common.SolrException; 
  org.apache.solr.common.SolrException: ClusterState says we are the leader 
  (http://solr04.cd-et.com:8080/solr/products_fi_shard1_replica2), but locally 
  we don't think so. Request came from 
  http://solr01.cd-et.com:8080/solr/products_fi_shard2_replica1/
at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at 
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168

Re: SolrCloud never fully recovers after slow disks

2013-11-07 Thread Henrik Ossipoff Hansen
Hey Erick,

I have tried upping the timeouts quite a bit now, and have tried upping the 
zkTimeout setting in Solr itself (I found a few old posts on the mailing list 
suggesting this).

I realise this is a sort of weird situation, where we are actually trying to 
work around some horrible hardware setup.

Thank you for your post - I will make another post in a day or two after I see 
how it performs.
--
Henrik Ossipoff Hansen
Developer, Entertainment Trading


On 7. nov. 2013 at 13.23.59, Erick Erickson 
(erickerick...@gmail.com) wrote:

Right - can you up your ZK timeouts significantly? It sounds like
your ZK timeout is short enough that when your system slows
down, the timeout is exceeded and it's throwing Solr
into a tailspin.

See zoo.cfg.
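
For illustration only - the exact values depend on your environment - the 
relevant knobs look roughly like this:

  # zoo.cfg (server side): ZooKeeper grants sessions up to
  # maxSessionTimeout, which defaults to 20 * tickTime
  tickTime=2000
  maxSessionTimeout=60000

  <!-- solr.xml (client side): the session timeout Solr requests -->
  <int name="zkClientTimeout">${zkClientTimeout:60000}</int>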

Best,
Erick


On Tue, Nov 5, 2013 at 3:33 AM, Henrik Ossipoff Hansen 
h...@entertainment-trading.com wrote:

 I previously made a post on this, but have since narrowed down the issue
 and am now giving this another try, with another spin to it.

 We are running a 4 node setup (over Tomcat7) with a 3-ensemble external
 ZooKeeper. This is running on a total of 7 (4+3) different VMs, and each VM
 is using our Storage system (NFS share in VMWare).

 Now I do realize and have heard, that NFS is not the greatest way to run
 Solr on, but we have never had this issue on non-SolrCloud setups.

 Basically, each night when we run our backup jobs, our storage becomes a
 bit slow in response - this is obviously something we’re trying to solve,
 but the bottom line is that all our other systems somehow stay alive or
 recover gracefully when bandwidth exists again.
 SolrCloud - not so much. Typically after a session like this, 3-5 nodes
 will either go into a Down state or a Recovering state - and stay that way.
 Sometimes such a node will even be marked as leader. Such a node will have
 something like this in the log:

 ERROR - 2013-11-05 08:57:45.764;
 org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState
 says we are the leader, but locally we don't think so
 ERROR - 2013-11-05 08:57:45.768; org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: ClusterState says we are the leader (
 http://solr04.cd-et.com:8080/solr/products_fi_shard1_replica2), but
 locally we don't think so. Request came from
 http://solr01.cd-et.com:8080/solr/products_fi_shard2_replica1/
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
 at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
 at
 org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
 at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
 at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)

 On the other nodes, an error similar to this will be in the log:

 09:27:34 - ERROR - SolrCmdDistributor shard update error RetryNode:
 http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2

SolrCloud never fully recovers after slow disks

2013-11-05 Thread Henrik Ossipoff Hansen
I previously made a post on this, but have since narrowed down the issue and am 
now giving this another try, with another spin to it.

We are running a 4 node setup (over Tomcat7) with a 3-ensemble external 
ZooKeeper. This is running on a total of 7 (4+3) different VMs, and each VM is 
using our Storage system (NFS share in VMWare).

Now I do realize and have heard, that NFS is not the greatest way to run Solr 
on, but we have never had this issue on non-SolrCloud setups.

Basically, each night when we run our backup jobs, our storage becomes a bit 
slow in response - this is obviously something we’re trying to solve, but the 
bottom line is that all our other systems somehow stay alive or recover 
gracefully when bandwidth exists again.
SolrCloud - not so much. Typically after a session like this, 3-5 nodes will 
either go into a Down state or a Recovering state - and stay that way. 
Sometimes such a node will even be marked as leader. Such a node will have 
something like this in the log:

ERROR - 2013-11-05 08:57:45.764; 
org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState says 
we are the leader, but locally we don't think so
ERROR - 2013-11-05 08:57:45.768; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: ClusterState says we are the leader 
(http://solr04.cd-et.com:8080/solr/products_fi_shard1_replica2), but locally we 
don't think so. Request came from 
http://solr01.cd-et.com:8080/solr/products_fi_shard2_replica1/
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

On the other nodes, an error similar to this will be in the log:

09:27:34 - ERROR - SolrCmdDistributor shard update error RetryNode: 
http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2 
returned non ok status:503, message:Service Unavailable
09:27:34 - ERROR - SolrCmdDistributor forwarding update to 
http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2/ failed - 
retrying ...

Does anyone have any ideas or leads towards a solution - one that doesn’t 
involve getting a new storage system (a solution we *are* actively working on, 
but that’s not a quick fix in our case)? Shouldn’t a setup like this be 
possible? And even more so - shouldn’t SolrCloud be able to gracefully recover 
after issues like this?

--
Henrik Ossipoff Hansen
Developer, Entertainment

Pivot faceting not working after upgrading to 4.5

2013-10-21 Thread Henrik Ossipoff Hansen
Hello,

We have a rather weird behavior I don't really understand. As written in a few 
other threads, we're migrating from a master/slave setup running 4.3 to a 
SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances 
have been re-indexed under 4.5 obviously).

The following query works fine under our 4.3 setup:

?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0

However, in our 4.5 setup, the facet_pivot entry in facet_counts is straight 
up missing from the response. I've been digging around the logs for a bit, but 
I'm unable to find anything relating to this. If I remove one of the 
facet.pivot elements (i.e. only having facet.pivot=facet_category) I get an 
error as expected, so that part of the component is at least working.

Does anyone have an idea of something obvious I might have missed? I've been 
unable to find any changelog entries suggesting changes to this part of the 
facet component.

Thanks.

Regards,
Henrik

Re: Pivot faceting not working after upgrading to 4.5

2013-10-21 Thread Henrik Ossipoff Hansen
After some digging through the internet, I realise now that distributed pivot 
faceting is not implemented yet in SolrCloud.

Apologies :)

On 21/10/2013 at 18.20, Henrik Ossipoff Hansen 
h...@entertainment-trading.com wrote:

 Hello,
 
 We have a rather weird behavior I don't really understand. As written in a 
 few other threads, we're migrating from a master/slave setup running 4.3 to a 
 SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances 
 have been re-indexed under 4.5 obviously).
 
 The following query works fine under our 4.3 setup:
 
  ?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0
 
  However, in our 4.5 setup, the facet_pivot entry in facet_counts is 
  straight up missing from the response. I've been digging around the logs for 
  a bit, but I'm unable to find anything relating to this. If I remove one of 
  the facet.pivot elements (i.e. only having facet.pivot=facet_category) I get 
  an error as expected, so that part of the component is at least working.
 
 Does anyone have an idea to something obvious I might have missed? I've been 
 unable to find any change logs suggesting changes to this part of the facet 
 component.
 
 Thanks.
 
 Regards,
 Henrik



Re: SolrCloud Query Balancing

2013-10-16 Thread Henrik Ossipoff Hansen
What you could do (and what we do) is have a simple proxy in front of your 
Solr instances. We, for example, run with Nginx in front of all of our Tomcats, 
and use Nginx's upstream capabilities as a simple load balancer for our 
SolrCloud cluster.

http://wiki.nginx.org/HttpUpstreamModule

I'm sure other web servers have similar modules.
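
As a sketch (hostnames and port are placeholders, not our real setup), the 
relevant Nginx bits look something like this:

  # round-robin pool of SolrCloud nodes
  upstream solrcloud {
      server solr01.example.com:8080;
      server solr02.example.com:8080;
      server solr03.example.com:8080;
      server solr04.example.com:8080;
  }

  server {
      listen 80;
      location /solr {
          proxy_pass http://solrcloud;
      }
  }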

On 16/10/2013 at 12.08, michael.boom 
(my_sky...@yahoo.com) wrote:

Thanks!

I've read a lil' bit about that, but my app is php-based so I'm afraid I
can't use that.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Query-Balancing-tp4095854p4095857.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud Query Balancing

2013-10-16 Thread Henrik Ossipoff Hansen
I did not actually realize this - I apologize for my previous reply!

HAProxy would definitely be the right choice then for the poster's setup, for 
redundancy.

On 16/10/2013 at 15.53, Shawn Heisey (s...@elyograg.org) wrote:

 On 10/16/2013 3:52 AM, michael.boom wrote:
 I have setup a SolrCloud system with: 3 shards, replicationFactor=3 on 3
 machines along with 3 Zookeeper instances.
 
 My web application makes queries to Solr specifying the hostname of one of
 the machines. So that machine will always get the request and the other ones
 will just serve as an aid.
 So I would like to setup a load balancer that would fix that, balancing the
 queries to all machines. 
 Maybe doing the same while indexing.
 
 SolrCloud actually handles load balancing for you.  You'll find that
 when you send requests to one server, they are actually being
 re-directed across the entire cloud, unless you include a
 distrib=false parameter on the request, but that would also limit the
 search to one shard, which is probably not what you want.
 
 The only thing that you don't get with a non-Java client is redundancy.
 If you can't build in failover capability yourself, which is a very
 advanced programming technique, then you need a load balancer.
 
 For my large non-Cloud Solr install, I use haproxy as a load balancer.
 Most of the time, it doesn't actually balance the load, just makes sure
 that Solr is always reachable even if part of it goes down.  The haproxy
 program is simple and easy to use, but performs extremely well.  I've
 got a pacemaker cluster making sure that the shared IP address, haproxy,
 and other homegrown utility applications related to Solr are only
 running on one machine.
 
 Thanks,
 Shawn
 



Hardware dimension for new SolrCloud cluster

2013-10-08 Thread Henrik Ossipoff Hansen
We're in the process of moving onto SolrCloud, and have gotten to the point 
where we are considering how to do our hardware setup.

We're limited to VMs running on our server cluster and storage system, so 
buying new physical servers is out of the question - the question is how we 
should dimension the new VMs.

Our document area is somewhat small, with about 1.2 million orders (rising, of 
course), 75k products (divided into 5 countries, each of which will be its own 
collection/core) and a few million customers.

In our current master/slave setup, we only index the products, with each 
country taking up about 35 MB of disk space. The indexing frequency is more or 
less 8 updates per hour (mostly this is not full data though, but atomic 
updates with new stock data, new prices etc.).

Our upcoming order and customer indexes, however, will more or less receive 
updates on the fly as they happen (soft commit), and we expect the same to be 
the case for products in the near future.

- For hardware, it's down to 1 or 2 cores - current master runs with 2 cores
- RAM - currently our master runs with 6 GB only
- How much heap space should we allocate for max heap?

We currently plan on this setup:
- 1 machine for a simple loadbalancer
- 4 VMs totally for the Solr machines themselves (for both leaders and 
replicas, just one replica per shard is enough for our use case)
- A quorum of 3 ZKs

Question is - is this machine setup enough? And how exactly do we dimension the 
Solr machines?

Any help, pointers or resources will be much appreciated :)

Thank you!

SolrCloud loses connection to Zookeeper but stays down?

2013-10-02 Thread Henrik Ossipoff Hansen
We are slowly starting to move from a master/slave setup into SolrCloud, and 
with the addition of some new functionality on our site, we decided to give it 
a go in production (with a very minimal setup so far).

We are experiencing that our nodes lose connection to ZK during the night, 
according to the log:

02:17:33 WARN OverseerCollectionProcessor Overseer cannot talk to ZK
02:17:33 WARN Overseer Solr cannot talk to ZK, exiting Overseer main queue loop

The node is listed as down in the cloud window in the Solr admin. However, as 
I'm speaking, it seems to be able to talk to ZK just fine; I can update 
configurations from ZK to the SolrCloud nodes without problems - but the nodes 
are still listed as down. Everything seems to work.

Is this a known bug, that they are still listed as down even though they're up 
and active? We're running 4.4.0.

RE: Facet sorting seems weird

2013-07-16 Thread Henrik Ossipoff Hansen
This is indeed an interesting idea, but I think it's a bit too manual, so to 
speak, for our use case. I do see that it would solve the problem though, so 
thank you for sharing it with the community! :)
 
-Original Message-
From: James Thomas [mailto:jtho...@camstar.com] 
Sent: 15. juli 2013 17:08
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

We did something related to this that I'll share. I'm rather new to Solr, so 
take this idea cautiously :-) Our requirement was to show exact values but have 
case-insensitive sorting and facet filtering (prefix filtering).

We created an index field (type=string) for creating facets, so that the 
values are indexed as-is.
The values we indexed were given the format "lowercase value|exact value". So 
for example, given the value "bObles", we would index the string 
"bobles|bObles".
When displaying the facet we split the facet value from Solr in half and 
display the second half to the user.
Of course the caveat is that you could have 2 facets that differ only in case, 
but to me that's a data cleansing issue.

James

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: Monday, July 15, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hello, thank you for the quick reply!

But given that facet.sort=index just sorts by the faceted index (and I don't 
want the facet itself to be in lower-case), would that really work?

Regards,
Henrik Ossipoff


-Original Message-
From: David Quarterman [mailto:da...@corexe.com]
Sent: 15. juli 2013 16:46
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

Try setting up a copyField in your schema and set the copied field to use 
something like 'text_ws', which implements LowerCaseFilterFactory. Then sort on 
the copyField.
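
Something along these lines (an untested sketch - I've used a keyword tokenizer 
plus a lowercase filter here, and the names are just examples):

  <fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <!-- keep the whole value as one token, then lowercase it -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="brand_sort" type="string_lc" indexed="true" stored="false"/>
  <copyField source="brand" dest="brand_sort"/>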

Regards,

DQ

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search cores from Sphinx to Solr with great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today however we've run into a bit of an issue regarding faceted sort.

For example we have a field called brand in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that "LEGO" is sorted as 
coming before "bObles". I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff


RE: Facet sorting seems weird

2013-07-16 Thread Henrik Ossipoff Hansen
Hi Alex,

Yes, this makes sense. My Java is a bit dusty, but depending on how much we 
come to need this feature, it's definitely something we will look into 
creating, and if successful, we will definitely submit a patch. Thank you for 
your time and detailed answer!

Best regards,
Henrik Ossipoff

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: 15. juli 2013 17:16
To: solr-user@lucene.apache.org
Subject: Re: Facet sorting seems weird

Hi Henrik,

If I understand the question correctly (case-insensitive sorting of the facet 
values), then this is the limitation of the current Facet component.

You can see the full implementation at:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L818

If you are comfortable with Java code, the easiest thing might be to copy/fix 
the component and use your own for faceting. The components are defined in 
solrconfig.xml and FacetComponent is in the default chain.
See:
https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/solrconfig.xml#L1194
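
If you go that route, the copied component can be swapped in by re-registering 
the name in solrconfig.xml - something like this (the class name is 
hypothetical):

  <!-- overrides the built-in facet component for the default chain -->
  <searchComponent name="facet"
                   class="com.example.solr.CaseInsensitiveFacetComponent"/>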

If you do manage to do this (I would recommend doing it as an extra option), it 
would be nice to have it contributed back to Solr. I think you are not the only 
one with this requirement.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Mon, Jul 15, 2013 at 10:08 AM, Henrik Ossipoff Hansen  
h...@entertainment-trading.com wrote:

 Hello, first time writing to the list. I am a developer for a company 
 where we recently switched all of our search cores from Sphinx to Solr 
 with great results. In general we've been very happy with the 
 switch, and everything seems to work just as we want it to.

 Today however we've run into a bit of an issue regarding faceted sort.

 For example we have a field called brand in our core, defined as the 
 text_en datatype from the example Solr core. This field is copied into 
 facet_brand with the datatype string (since we don't really need to do 
 much with it except show it for faceted navigation).

 Now, given these two entries in the field on different documents, "LEGO"
 and "bObles", and given facet.sort=index, it appears that "LEGO" is 
 sorted as coming before "bObles". I assume this is because of casing differences.

 My question then is, how do we define a decent datatype in our schema, 
 where the casing is exact, but we are able to sort it without casing 
 mattering?

 Thank you :)

 Best regards,
 Henrik Ossipoff



Facet sorting seems weird

2013-07-15 Thread Henrik Ossipoff Hansen
Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search cores from Sphinx to Solr with great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today however we've run into a bit of an issue regarding faceted sort.

For example we have a field called brand in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that "LEGO" is sorted as 
coming before "bObles". I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff


RE: Facet sorting seems weird

2013-07-15 Thread Henrik Ossipoff Hansen
Hello, thank you for the quick reply!

But given that facet.sort=index just sorts by the faceted index (and I don't 
want the facet itself to be in lower-case), would that really work?

Regards,
Henrik Ossipoff


-Original Message-
From: David Quarterman [mailto:da...@corexe.com] 
Sent: 15. juli 2013 16:46
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

Try setting up a copyField in your schema and set the copied field to use 
something like 'text_ws', which implements LowerCaseFilterFactory. Then sort on 
the copyField.

Regards,

DQ

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com] 
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search cores from Sphinx to Solr with great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today however we've run into a bit of an issue regarding faceted sort.

For example we have a field called brand in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that "LEGO" is sorted as 
coming before "bObles". I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff