solrcloud and csv import hangs

2012-09-24 Thread dan sutton
Hi,

This appears to happen in trunk too.

It appears that the add command request parameters get sent to the
nodes. If I comment these out like so for add and commit:

core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

-  params = new ModifiableSolrParams(req.getParams());
+  //params = new ModifiableSolrParams(req.getParams());
+  params = new ModifiableSolrParams();

Then things work as expected.

Otherwise, params like stream.url get sent to the replicant nodes,
which causes a failure if the file is missing, or worse, repeatedly
imports the same file if it exists on a replicant.

This might not be the right thing to do? ... what should be sent here
for a streaming CSV import?
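
One way to avoid forwarding the stream parameters is to copy everything except stream.* into the distributed request. A minimal sketch of that filtering, using plain maps to stand in for Solr's SolrParams (class and method names here are illustrative, not Solr's API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamFilter {
    // Copy request parameters, dropping any stream.* entries so they are
    // never forwarded to replica nodes.
    public static Map<String, String> stripStreamParams(Map<String, String> in) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            if (!e.getKey().startsWith("stream.")) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> req = new LinkedHashMap<>();
        req.put("overwrite", "false");
        req.put("commit", "true");
        req.put("stream.url", "file:///dir/file.csv");
        req.put("stream.contentType", "text/csv;charset=utf-8");
        System.out.println(stripStreamParams(req).keySet()); // [overwrite, commit]
    }
}
```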

Dan


On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

 curl http://localhost:8080/solr/core/update -d overwrite=false -d
 commit=true -d stream.contentType='text/csv;charset=utf-8' -d
 stream.url=file:///dir/file.csv

 I have 2 tomcat servers running on different machines and a separate
 zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1
 shard core, replicated to the other machine.

 It seems that for a 255K line file I have 170 docs on the server that
 issued the command, but on the other, the index seems to grow
 unbounded?

 Has anyone seen this, or been successful in using the CSV import
 with solrcloud?

 Cheers,
 Dan


solrcloud and csv import hangs

2012-09-20 Thread dan sutton
Hi,

I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

curl http://localhost:8080/solr/core/update -d overwrite=false -d
commit=true -d stream.contentType='text/csv;charset=utf-8' -d
stream.url=file:///dir/file.csv

I have 2 tomcat servers running on different machines and a separate
zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1
shard core, replicated to the other machine.

It seems that for a 255K line file I have 170 docs on the server that
issued the command, but on the other, the index seems to grow
unbounded?

Has anyone seen this, or been successful in using the CSV import
with solrcloud?

Cheers,
Dan


Re: SOLR 4.0 / Jetty Security Set Up

2012-09-07 Thread dan sutton
Hi,

If, like most people, you have application server(s) in front of Solr,
the simplest and most secure option is to bind Solr to a local address
(192.168.* or 10.0.0.*). The app server talks to Solr via this private
(a.k.a. blackhole) IP address that no one from outside can ever access,
as it's not routable.

Plus you then don't need to employ authentication, which can slow down
responses, as you're ONLY employing access control. This is what we do
for access to 5 Solr servers.

Cheers,
Dan

On Wed, Sep 5, 2012 at 10:51 AM, Paul Codman snoozes...@gmail.com wrote:
 First time Solr user and I am loving it! I have a standard Solr 4 set up
 running under Jetty. The instructions in the Wiki do not seem to apply to
 Solr 4 (eg mortbay references / section to uncomment not present in xml
 file / etc). Could someone please advise on the steps required to secure Solr
 4, and can someone confirm that security operates in relation to the new Admin
 interface? Thanks in advance.


Solr Cloud partitioning

2012-09-05 Thread dan sutton
Hi,

At the moment, partitioning with solrcloud is hash based on uniqueid.
What I'd like to do is have custom partitioning, e.g. based on date
(shard_MMYY).

I'm aware of https://issues.apache.org/jira/browse/SOLR-2592, but
after a cursory look it seems that with the latest patch one might
end up with multiple partitions in the same shard, perhaps all of them
(e.g. if 2 or more partition hash values end up in the same range),
which I'd not want.

Has anyone else implemented custom shard partitioning for solrcloud ?

I think the answer is to have the partition class itself pluggable
(defaulting to a hash of the unique key, as now), but I'm not sure how
to pass the solrConfig pluggable partition class through to ClusterState
(which is in solrj, not core). Any advice?

Cheers,
Dan
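
For illustration, the core of a date-based router could be as small as mapping a document's date to a shard name; everything else (plugging it into ClusterState) is the open question above. A self-contained sketch, with hypothetical names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateShardRouter {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("MMyy");

    // Route a document to a shard named after its month, e.g. shard_0912
    // for September 2012, instead of hashing the unique id.
    public static String shardFor(LocalDate docDate) {
        return "shard_" + docDate.format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(shardFor(LocalDate.of(2012, 9, 5))); // shard_0912
    }
}
```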


flashcache and solr/lucene

2012-03-01 Thread dan sutton
Hi,

Just wondering if anyone has any experience with Solr and flashcache
[https://wiki.archlinux.org/index.php/Flashcache]; my guess is it might
be particularly useful for indices not changing that often, and for
large indices where an SSD of that size is prohibitive.

Cheers,
Dan


Solr Warm-up performance issues

2012-01-27 Thread dan sutton
Hi List,

We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
Every day we produce a new dataset of 40 GB and have to switch one for
the other.

Once the index switch-over has taken place, it takes roughly 30 min for Solr
to reach maximum performance. Are there any hardware or software solutions
to reduce the warm-up time? We tried warm-up queries but it didn't change
much.

Our hardware specs are:
   * Dell Poweredge 1950
   * 2 x Quad-Core Xeon E5405 (2.00GHz)
   * 48 GB RAM
   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror

One thing that does seem to take a long time is un-inverting a set of
multivalued fields; are there any optimizations we might be able to
use here?

Thanks for your help.
Dan


Re: How to return exact set of multivalue field

2011-10-20 Thread dan sutton
-field_name:[ * TO 384] +field_name:[385 TO 386]  -field_name:[387 TO *]

On Thu, Oct 20, 2011 at 10:51 AM, Ellery Leung elleryle...@be-o.com wrote:
 Hi all



 I am using Solr 3.4 on Windows 7.



 Here is the example of a multivalue field:



 <doc>
   <arr name="field_name">
     <str>387</str>
     <str>386</str>
   </arr>
 </doc>

 <doc>
   <arr name="field_name">
     <str>387</str>
     <str>386</str>
   </arr>
 </doc>

 <doc>
   <arr name="field_name">
     <str>387</str>
     <str>386</str>
     <str>385</str>
     <str>382</str>
     <str>312</str>
     <str>311</str>
   </arr>
 </doc>



 I am doing a search on field_name and JUST want to return the records that
 contain ONLY 387 and 386 (the first and second records).



 Here is the query:



 field_name: (387 AND 386)



 But this query returns all 3 records, which is wrong.



 I have tried using filter: field_name: (387 AND 386) but it still doesn't
 work.



 Therefore I would like to ask, is there any way to change this query so
 that it will ONLY return the first and second records?



 Thank you in advance for any help.




Distributed Search question/feedback

2011-09-22 Thread dan sutton
Hi,

Does SolrCloud use Distributed Search as described at
http://wiki.apache.org/solr/DistributedSearch or is it different
entirely?

Does SolrCloud suffer from the same limitations as Distributed Search
(inefficient to use a high start parameter, and presumably high CPU
highlighting all those docs, etc., among other issues)?

Our search mainly comprises searches within a country, and
occasionally across a continent or worldwide, so I'm thinking it's
probably simpler to have a pan-index for worldwide and continent
searches, and separate country indices (with these placed closer to
each country, for example).

Any pointers for those who've been down the distributed path appreciated!

Cheers,
Dan


logging client ip address

2011-09-07 Thread dan sutton
Hi,

We're using log4j with solr which is working fine and I'm wondering
how I might be able to log the client ip address?

Has anyone else been able to do this?

Cheers,
Dan


Re: logging client ip address

2011-09-07 Thread dan sutton
Does anyone know how I would be able to include the client ip address
for tomcat 6 with log4j?

Cheers,
Dan

On Wed, Sep 7, 2011 at 11:03 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Wed, Sep 7, 2011 at 2:56 PM, dan sutton danbsut...@gmail.com wrote:

 Hi,

 We're using log4j with solr which is working fine and I'm wondering
 how I might be able to log the client ip address?

 Has anyone else been able to do this?


 Your application container should have an access log facility. That is the
 best way to record client IPs. Solr does not have that capability.

 --
 Regards,
 Shalin Shekhar Mangar.



replication/search on separate LANs

2011-06-10 Thread dan sutton
Hi All,

I'm wondering if anyone has experience replicating and searching
over separate LANs? Currently we do both over the same one.

So each slave would have 2 Ethernet cards (one per LAN), and the master just one.

We're currently building and replicating a daily index; this is quite
large (about 15M docs), and during the replication we see a high CPU
load and searching becomes slow, so we're trying to mitigate this.

Has anyone set this up? Did it help?

Cheers,
Dan


custom highlighting

2011-05-24 Thread dan sutton
Hi,

I'd like to make the highlighting work as follows:

length(all snippets) approx. 200 chars
hl.snippets = 2 (2 snippets)

e.g. if there is only 1 snippet available, its length = 200 chars
e.g. if there is > 1 snippet, the length of each snippet == 100 chars, so I
take the first 2 and get 200 chars

Is this possible with the regex fragmenter?

Or does anyone know of any contrib fragmenter that might do this?

Many thanks
Dan


Suggester and query/index analysis

2011-05-17 Thread dan sutton
Hi All,

I understand that I can use a custom queryConverter for the input to
the suggester http://wiki.apache.org/solr/Suggester component, however
there doesn't seem to be anything on the indexing side; TST appears to
take the input verbatim, and Jaspell seems to lowercase everything.

The problem with this is that a suggest query like q=l would not show
up 'London, UK' due to case differences. Has anyone using the
suggester component come up with a workaround? My initial thoughts are
to override the TSTLookup to alter the key to pass through an
analyzer, and do the same with my custom queryConverter.

Any other options?

e.g. I'd like the following to all return 'London, UK' as the display
for the autocomplete

london, uk
london   uk
London UK
London uk

etc.
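
A rough sketch of that workaround: normalize keys the same way at index and lookup time, while keeping the original string for display. A plain TreeMap stands in for the TST lookup here; all names are illustrative, not the Suggester's API:

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SuggestIndex {
    // Keys are normalized (lowercased, non-alphanumerics stripped) so that
    // "london, uk", "London UK", etc. all hit the same entry; the value
    // keeps the original display form.
    private final TreeMap<String, String> entries = new TreeMap<>();

    public static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    public void add(String display) {
        entries.put(normalize(display), display);
    }

    public String suggest(String query) {
        String key = normalize(query);
        // TreeMap gives us a cheap prefix lookup for autocomplete.
        Map.Entry<String, String> e = entries.ceilingEntry(key);
        return (e != null && e.getKey().startsWith(key)) ? e.getValue() : null;
    }

    public static void main(String[] args) {
        SuggestIndex idx = new SuggestIndex();
        idx.add("London, UK");
        System.out.println(idx.suggest("l"));           // London, UK
        System.out.println(idx.suggest("London   uk")); // London, UK
    }
}
```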

Cheers,
Dan


Enable/disable mainIndex component

2011-05-11 Thread dan sutton
Hi,

Does anyone know if I can do the following:

  <mainIndex enable="${enable.master:false}">
    <mergeFactor>10</mergeFactor>
    ...
  </mainIndex>

  <mainIndex enable="${enable.slave:true}">
    <mergeFactor>2</mergeFactor>
    ...
  </mainIndex>


Cheers,
Dan


Highlighting and custom fragmenting

2011-04-07 Thread dan sutton
Hi All,

I'd like to make the highlighting work as follows:

length(all snippets) approx. 200 chars
hl.snippets = 2 (2 snippets)

Is this possible with the regex fragmenter? Or does anyone know of any
contrib fragmenter that might do this?

Many thanks
Dan


Re: Math-generated fields during query

2011-03-10 Thread dan sutton
As a workaround, can you not have a search component run after the
QueryComponent, have qty_ordered and unit_price as stored fields
returned with the fl parameter, and have your custom component do
the calc, unless you need to sort by this value too?

Dan

On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote:
 Hi,

 I was wondering if it is possible during a query to create a returned
 field 'on the fly' (like function query, but for concrete values, not
 score).

 For example, if I input this query:
   q=_val_:product(15,3)&fl=*,score

 For every returned document, I get score = 45.

 If I change it slightly to add *:* like this:
   q=*:* _val_:product(15,3)&fl=*,score

 I get score = 32.526913.

 If I try my use case of _val_:product(qty_ordered,unit_price), I get
 varying scores depending on...well depending on something.

 I understand this is doing relevance scoring, but it doesn't seem to
 tally with the FunctionQuery Wiki
 [example at the bottom of the page]:

   q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score
 ...where score will contain the resultant volume.

 Is there a trick to getting not a score, but the actual value of
 quantity*price (e.g. product(5,2.21) == 11.05)?

 Many thanks



Split analysis

2011-03-02 Thread dan sutton
Hi All,

I have a requirement to analyze a field with a series of filters,
calculate a 'signature' then concatenate with the original input

e.g.

input = 'this is the input'

tokenized and filtered, the input becomes, say, 'this input' =>
12ef5e (signature)

so the final output indexed is:

12ef5ethis is the input

I can calculate the signature easily, but how can I get access to the
original (now tokenized and filtered) input?
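
Since a TokenFilter only ever sees the token stream, one option is to build the value before analysis, e.g. in an update processor or on the client. A self-contained sketch with a stand-in filter chain (the real chain is whatever the field type defines; the 6-hex-char signature mirrors the '12ef5e' example):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.stream.Collectors;

public class SignedField {
    // Stand-in "analysis": drop a couple of stopwords, as a real filter
    // chain might. Purely illustrative.
    static String filter(String input) {
        return Arrays.stream(input.toLowerCase().split("\\s+"))
                .filter(t -> !t.equals("is") && !t.equals("the"))
                .collect(Collectors.joining(" "));
    }

    static String signature(String filtered) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(filtered.getBytes(StandardCharsets.UTF_8));
            // Short hex prefix as the signature.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 3; i++) sb.append(String.format("%02x", d[i] & 0xff));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Index value = signature of the filtered text + the untouched original.
    public static String indexValue(String original) {
        return signature(filter(original)) + original;
    }

    public static void main(String[] args) {
        System.out.println(indexValue("this is the input"));
    }
}
```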

Many thanks in advance,
Dan


Re: Replication and newSearcher registerd poll interval

2011-02-17 Thread dan sutton
Hi,

Keeping the thread alive, any thought on only doing replication if
there is no warming currently going on?

Cheers,
Dan

On Thu, Feb 10, 2011 at 11:09 AM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 If the replication window is too small to allow a new searcher to warm
 and close the current searcher before the new one needs to be in
 place, then the slaves continuously have a high load, and potentially
 an OOM error. We've noticed this in our environment where we have
 several facets on large multivalued fields.

 I was wondering what the list thought about modifying the replication
 process to skip polls (though logging a warning) when there is a
 searcher in the process of warming? Else, as in our case, it brings the
 slave to its knees; the workaround was to extend the poll interval,
 though not ideal.

 Cheers,
 Dan



Replication and newSearcher registerd poll interval

2011-02-10 Thread dan sutton
Hi,

If the replication window is too small to allow a new searcher to warm
and close the current searcher before the new one needs to be in
place, then the slaves continuously have a high load, and potentially
an OOM error. We've noticed this in our environment where we have
several facets on large multivalued fields.

I was wondering what the list thought about modifying the replication
process to skip polls (though logging a warning) when there is a
searcher in the process of warming? Else, as in our case, it brings the
slave to its knees; the workaround was to extend the poll interval,
though not ideal.
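
The guard itself is tiny; the open question is where to hook it into the replication handler. A sketch of the idea (the hook method names are hypothetical, not Solr's API):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicationPoller {
    // Incremented when a new searcher starts warming, decremented when it
    // is registered; in Solr the hooks would live in a newSearcher listener.
    private final AtomicInteger warming = new AtomicInteger();

    public void onWarmingStarted()     { warming.incrementAndGet(); }
    public void onSearcherRegistered() { warming.decrementAndGet(); }

    // Called on each poll tick: skip (with a warning) while warming.
    public boolean shouldPoll() {
        if (warming.get() > 0) {
            System.err.println("WARN: skipping replication poll, searcher still warming");
            return false;
        }
        return true;
    }
}
```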

Cheers,
Dan


Re: facet.mincount

2011-02-03 Thread dan sutton
I don't think facet.mincount works with date faceting, see here:

http://wiki.apache.org/solr/SimpleFacetParameters

Dan

On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote:
 Any query followed by

 facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1

 On 3 February 2011 15:14, Savvas-Andreas Moysidis 
 savvas.andreas.moysi...@googlemail.com wrote:

 could you post the query you are submitting to Solr?

 On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote:

  Hi all,
  Even after making facet.mincount=1, it is showing the results with
  count = 0.
  Does anyone know why this is happening?
 
  --
  Thanks & Regards,
  Isan Fulia.
 




 --
 Thanks & Regards,
 Isan Fulia.



Re: facet.mincount

2011-02-03 Thread dan sutton
facet.mincount is grouped only under the field faceting parameters, not
the date faceting parameters.

On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis
savvas.andreas.moysi...@googlemail.com wrote:
 Hi Dan,

 I'm probably just not able to spot this, but where does the wiki page
 mention that the facet.mincount is not applicable on date fields?

 On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote:

 I am using solr1.4.1 release version
 I got the following error while using facet.mincount
 java.lang.IllegalStateException: STREAM
        at org.mortbay.jetty.Response.getWriter(Response.java:571)
        at
 org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158)
        at
 org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151)
        at
 org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208)
        at

 org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144)
        at

 org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95)
        at

 org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397)
        at
 org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at

 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
        at
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
        at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
        at
 org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
        at

 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
        at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
        at

 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at

 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at

 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at

 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at

 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at

 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


 On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote:

  I don't think facet.mincount works with date faceting, see here:
 
  http://wiki.apache.org/solr/SimpleFacetParameters
 
  Dan
 
  On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com
  wrote:
   Any query followed by
  
  
 
  facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
  
   On 3 February 2011 15:14, Savvas-Andreas Moysidis 
   savvas.andreas.moysi...@googlemail.com wrote:
  
   could you post the query you are submitting to Solr?
  
   On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com
 wrote:
  
Hi all

EmbeddedSolrServer and junit

2011-01-31 Thread dan sutton
Hi,

I have 2 cores, CoreA and CoreB. When updating content on CoreB, I use
SolrJ and EmbeddedSolrServer to query CoreA for information; however,
when I do this with my JUnit tests (which also use EmbeddedSolrServer
to query) I get this error:

SEVERE: Previous SolrRequestInfo was not closed!

junit.framework.AssertionFailedError
[junit] at 
org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45)

How should I write the JUnit tests to test a multi-core setup, with
EmbeddedSolrServer used in a component during querying?

Cheers,
Dan


Re: EmbeddedSolrServer and junit

2011-01-31 Thread dan sutton
Hi,

I think I've found the cause:

src/java/org/apache/solr/util/TestHarness.java: query(String handler,
SolrQueryRequest req) calls SolrRequestInfo.setRequestInfo(new
SolrRequestInfo(req, rsp)), which my component also calls in the same
thread, hence the error.

The fix was to override assertQ to call
queryAndResponse(String handler, SolrQueryRequest req) instead, which
does not set/clear SolrRequestInfo.

Regards,
Dan

On Mon, Jan 31, 2011 at 2:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I have 2 cores CoreA and CoreB, when updating content on CoreB, I use
 solrj and EmbeddedSolrServer to query CoreA for information, however
 when I do this with my junit tests (which also use EmbeddedSolrServer
 to query) I get this error

 SEVERE: Previous SolrRequestInfo was not closed!

 junit.framework.AssertionFailedError
 [junit]     at 
 org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45)

 How should I write the junit tests to test a multi-core, with
 EmbeddedSolrServer used in a component during querying?

 Cheers,
 Dan



solr equiv of: SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria

2010-12-22 Thread dan sutton
Hi,

Is there a way with faceting or field collapsing to do the SQL equivalent of

SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND
other_criteria

i.e. I'm only interested in the total count not the individual records
and counts.
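
Lacking a direct Solr equivalent, one approximation is to facet on the field (facet.limit=-1, facet.mincount=1) with the other criteria as the query, and count the returned facet entries client-side, skipping the empty bucket (the length(field) > 0 condition). A sketch of that client-side step:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctCount {
    // Count distinct non-empty values in a facet (or field-value) listing.
    public static long countDistinct(List<String> facetValues) {
        return facetValues.stream()
                .filter(v -> v != null && !v.isEmpty())
                .collect(Collectors.toSet())
                .size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(List.of("a", "b", "a", "", "c"))); // 3
    }
}
```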

Cheers,
Dan


JMX Cache values are wrong

2010-11-18 Thread dan sutton
Hi,

I've used three different JMX clients to query

solr/core:id=org.apache.solr.search.FastLRUCache,type=queryResultCache
and
solr/core:id=org.apache.solr.search.FastLRUCache,type=documentCache

beans and they appear to return old cache information.

As new searchers come online, the newer caches don't appear to be
registered, perhaps?
I can see this when I query JMX for the 'description' attribute and
the regenerator JMX output shows a different
org.apache.solr.search.SolrIndexSearcher to that which appears in the
stats.jsp page.

Any ideas as to what's gone wrong ... has anyone else experienced this?

From registry.jsp:

Solr Specification Version: 1.4.0.2010.09.10.17.10.36
Solr Implementation Version: 1.4.1-dev exported
Lucene Specification Version: 2.9.1
Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

Cheers,
Dan


Re: spatial sorting

2010-09-30 Thread dan sutton
Hi All,

This is more of an FYI for those wanting to filter and sort by distance, and
have the values returned in the result set after determining a way to do
this with existing code.

Using solr 4.0 an example query would contain the following parameters:

/select?

q=stevenage^0.0
+_val_:ghhsin(6371,geohash(52.0274,-0.4952),location)^1.0

Make the boost on all parts of the query other than the ghhsin distance
value function 0, and 1 on the function; this is so that the score is then
equal to the distance. (52.0274,-0.4952) here is the query point and
'location' is the geohash field to search against.



sort=score asc

basically sort by distance asc (closest first)



fq={!sfilt%20fl=location}&pt=52.0274,-0.4952&d=30

This is the spatial filter to limit the necessary distance calculations.



fl=*,score

Return all fields (if required) but include the score (which contains the
distance calculation)


Does anyone know if it's possible to return the distance and score
separately?  I know there has been a patch to sort by value function, but
how can one return the values from this?
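
If the stored lat/lon fields are returned via fl, the distance can also just be recomputed client-side instead of being smuggled through the score. A sketch of the haversine calculation (assuming the standard formula; 6371 km Earth radius as in the query above):

```java
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points, in km.
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // Distance from the query point to itself is zero.
        System.out.println(distanceKm(52.0274, -0.4952, 52.0274, -0.4952)); // 0.0
    }
}
```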

Cheers,
Dan


On Fri, Sep 17, 2010 at 2:45 PM, dan sutton danbsut...@gmail.com wrote:

 Hi,

 I'm trying to filter and sort by distance with this URL:


 http://localhost:8080/solr/select/?q=*:*&fq={!sfilt%20fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,loc_lat_lon_1_d,3963.205)%20asc

 Filtering is fine but it's failing in parsing the sort with :

 The request sent by the client was syntactically incorrect (can not sort
 on undefined field or function: {!func}(52.02694,-0.49567,loc_lat_lon_0_d,
  loc_lat_lon_1_d, 3963.205)).

  I'm using the solr/lucene trunk to try this out ... does anyone know what
 is wrong with the syntax?

 Additionally, am I able to return the distance sort values, e.g. with the
 fl param? Else am I going to have to either write my own component (which
 would also look up the filter-cached values rather than re-calculating the
 distance) or use an alternative like localsolr?

 Dan



multiple spatial values

2010-09-21 Thread dan sutton
Hi,

I was looking at the LatLonType and how it might represent multiple lon/lat
values ... it looks to me like the lat would go in {latlongfield}_0_LatLon
and the long in {latlongfield}_1_LatLon ... how then, if we have multiple
lat/long points for a doc, do we choose the correct points when filtering,
for example?

e.g. if thinking in Cartesian coords and we have

P1(3,4), P2(6,7) ... x is stored with 3,6 and y with 4,7 ...

then how does it ensure we're not erroneously picking (3,7) or (6,4) whilst
filtering with the spatial query?

Don't we have to store both values together? What am I missing here?
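
The concern can be shown concretely: filtering each axis independently admits cross-paired false positives, which is why the two values of a point must be kept together. A small sketch:

```java
public class CrossPairing {
    // Naive per-axis check: matches if SOME x and SOME y are equal, not
    // necessarily from the same point - this is the false-positive case.
    static boolean matchesPerAxis(double[][] points, double x, double y) {
        boolean xHit = false, yHit = false;
        for (double[] p : points) {
            if (p[0] == x) xHit = true;
            if (p[1] == y) yHit = true;
        }
        return xHit && yHit;
    }

    // Correct check: the pair must come from one stored point.
    static boolean matchesPaired(double[][] points, double x, double y) {
        for (double[] p : points) {
            if (p[0] == x && p[1] == y) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] doc = {{3, 4}, {6, 7}}; // P1(3,4), P2(6,7)
        System.out.println(matchesPerAxis(doc, 3, 7)); // true  (wrongly matches)
        System.out.println(matchesPaired(doc, 3, 7));  // false
    }
}
```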

Cheers,
Dan


Re: how to normalize a query

2010-09-09 Thread dan sutton
What I wanted was a way to determine that the query q=one two is
equivalent to q=two one; by normalizing, I might have
q=one two for both, for example, and then q.hashCode() would be the
same.

Simply using q.hashCode() returns different values for each query above, so
this is not suitable.
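
A minimal normalization that makes term order irrelevant is to sort the terms before hashing; note this only makes sense for plain bag-of-words queries (phrases, ranges, and operators would break it). A sketch:

```java
import java.util.Arrays;

public class QueryNormalizer {
    // Normalize by lowercasing, splitting on whitespace and sorting the
    // terms, so term order no longer affects equality or hashCode.
    public static String normalize(String q) {
        String[] terms = q.trim().toLowerCase().split("\\s+");
        Arrays.sort(terms);
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        String a = normalize("one two");
        String b = normalize("two one");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```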

Cheers
Dan

On Thu, Sep 9, 2010 at 3:36 PM, Markus Jelsma markus.jel...@buyways.nlwrote:

 LuceneQParser

 http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Proximity%20Searches

 DismaxQParser
 http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29


 On Thursday 09 September 2010 15:08:41 dan sutton wrote:
  Hi,
 
  Does anyone know how I might normalized a query so that e.g. q=one two
  equals q=two one
 
  Cheers,
  Dan
 

 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350




Re: Auto Suggest

2010-09-03 Thread dan sutton
I set this up a few years ago with something like the following:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory"
            maxGramSize="20" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])"
replacement="" replace="all"/> is the bit missing, I think.

This way the search is agnostic to case and any non-alphanum chars; this was
to facilitate a location autocomplete for searching.

So it was a basic search, returning the top N results along with additional
info to show in the autocomplete to our mod_perl servers. Results were
cached in the mod_perl servers.

Regards,
Dan

On Thu, Sep 2, 2010 at 1:53 PM, Jason Rutherglen jason.rutherg...@gmail.com
 wrote:

 I'm having a different issue with the EdgeNGram technique described
 here:
 http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

 That is, one-word queries (q=app) on the query_text field work fine;
 however, q=app mou does not.  Why would this be, or is there a
 configuration that could be missing?

 On Wed, Sep 1, 2010 at 3:53 PM, Eric Grobler impalah...@googlemail.com
 wrote:
  Thanks for your feedback Robert,
 
  I will try that and see how Solr performs on my data - I think I will
 create
  a field that contains only important key/product terms from the text.
 
  Regards
  Johan
 
  On Wed, Sep 1, 2010 at 9:12 PM, Robert Petersen rober...@buy.com
 wrote:
 
  We don't have that many, just a hundred thousand, and solr response
  times (since the index's docs are small and not complex) are logged as
  typically 1 ms if not 0 ms.  It's funny but sometimes it is so fast no
  milliseconds have elapsed.  Incredible if you ask me...  :)
 
  Once you get SOLR to consider the whole phrase as just one big term, the
  wildcard is very fast.
 
  -Original Message-
  From: Eric Grobler [mailto:impalah...@googlemail.com]
  Sent: Wednesday, September 01, 2010 12:35 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Auto Suggest
 
  Hi Robert,
 
  Interesting approach, how many documents do you have in Solr?
  I have about 2 million and I just wonder if it might be a bit slow.
 
  Regards
  Johan
 
  On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen rober...@buy.com
  wrote:
 
   I do this by replacing the spaces with a '%' in a separate search
  field
   which is not parsed nor tokenized and then you can wildcard across the
   whole phrase like you want and the spaces don't mess you up.  Just
  store
   the original phrase with spaces in a separate field for returning to
  the
   front end for display.
  
   -Original Message-
   From: Jazz Globe [mailto:jazzgl...@hotmail.com]
   Sent: Wednesday, September 01, 2010 7:33 AM
   To: solr-user@lucene.apache.org
   Subject: Auto Suggest
  
  
   Hallo
  
   How would one implement a multiple term auto-suggest feature in Solr
   that is filter sensitive?
   For example, a user enters :
   mp3
and solr might suggest:
-   mp3 player
-   mp3 nano
-   mp3 sony
   and then the user starts the second word :
   mp3 n
   and that narrows it down to:
- mp3 nano
  
   I had a quick look at the Terms Component.
   I suppose it just returns term totals for the entire index and cannot
  be
   used with a filter or query?
  
   Thanks
   Johan
  
  
  
 
 



Re: Spellchecking and frequency

2010-07-28 Thread dan sutton
Hi Mark,

Thanks for that info, it looks very interesting; it would be great to see your
code. Out of interest, did you use the dictionary and the phonetic file? Did
you see better results with both?

In regards to the secondary part that checks the corpus for matching
suggestions, would another way to do this be to have an event listener
listen for commits, and build the dictionary from matching corpus words that
way? Then you avoid the performance hit at query time.

Cheers,
Dan

On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland mark.holl...@zoopla.co.ukwrote:

 Hi,

 I found the suggestions returned from the standard solr spellcheck not to
 be
 that relevant. By contrast, aspell, given the same dictionary and mispelled
 words, gives much more accurate suggestions.

 I therefore wrote an implementation of SolrSpellChecker that wraps jazzy,
 the java aspell library. I also extended the SpellCheckComponent to take
 the
 matrix of suggested words and query the corpus to find the first
 combination
 of suggestions which returned a match. This works well for my use case,
 where term frequency is irrelevant to spelling or scoring.

 I'd like to publish the code in case someone finds it useful (although it's
 a bit crude at the moment and will need a decent tidy up). Would it be
 appropriate to open up a Jira issue for this?

 Cheers,
 ~mark

 On 27 July 2010 09:33, dan sutton danbsut...@gmail.com wrote:

  Hi,
 
  I've recently been looking into Spellchecking in solr, and was struck by
  how
  limited the usefulness of the tool was.
 
  Like most corpora, ours contains lots of different spelling mistakes for
  the same word, so the 'spellcheck.onlyMorePopular' is not really that
  useful
  unless you click on it numerous times.
 
  I was thinking that since most of the time people spell words correctly
 why
  was there no other frequency parameter that could enter into the score?
  i.e.
  something like:
 
  spell_score ~ edit_dist * freq
 
  I'm sure others have come across this issue and was wondering what
  steps/algorithms they have used to overcome these limitations?
 
  Cheers,
  Dan
 



Spellchecking and frequency

2010-07-27 Thread dan sutton
Hi,

I've recently been looking into Spellchecking in solr, and was struck by how
limited the usefulness of the tool was.

Like most corpora, ours contains lots of different spelling mistakes for
the same word, so the 'spellcheck.onlyMorePopular' is not really that useful
unless you click on it numerous times.

I was thinking that since most of the time people spell words correctly why
was there no other frequency parameter that could enter into the score? i.e.
something like:

spell_score ~ edit_dist * freq

I'm sure others have come across this issue and was wondering what
steps/algorithms they have used to overcome these limitations?

Cheers,
Dan
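
A sketch of the idea above: rank spelling suggestions by combining edit distance with corpus term frequency, so that the common (usually correct) spelling wins over rarer misspellings. The particular combination used here, freq / (1 + dist), is just one illustrative choice, and the term/frequency data is invented.

```java
import java.util.Comparator;
import java.util.Map;

public class FreqAwareSpell {
    // Standard dynamic-programming Levenshtein edit distance.
    public static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Best suggestion for 'query' from a term -> frequency map,
    // scoring each candidate as freq / (1 + editDistance).
    public static String suggest(String query, Map<String, Integer> termFreqs) {
        return termFreqs.entrySet().stream()
                .max(Comparator.comparingDouble(
                        e -> e.getValue() / (1.0 + editDistance(query, e.getKey()))))
                .map(Map.Entry::getKey).orElse(query);
    }

    public static void main(String[] args) {
        // "recieve" is a frequent misspelling in this toy corpus, but
        // "receive" still wins because its frequency dominates.
        Map<String, Integer> freqs = Map.of("receive", 900, "recieve", 40, "relieve", 120);
        System.out.println(suggest("receeve", freqs)); // receive
    }
}
```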


Re: why spellcheck and elevate search components can't work together?

2010-07-19 Thread dan sutton
It needs to be:

   <arr name="last-components">
     <str>spellcheck</str>
     <str>elevateListings</str>
   </arr>

or

   <arr name="last-components">
     <str>elevateListings</str>
     <str>spellcheck</str>
   </arr>

Dan


On Mon, Jul 19, 2010 at 11:14 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote:

 In my solrconfig.xml I set it up this way, but it doesn't work at all. Can
 anyone help? Each component works on its own without the other.

  <searchComponent name="elevateListings"
      class="org.apache.solr.handler.component.QueryElevationComponent">
    <str name="queryFieldType">string_ci</str>
    <str name="config-file">elevateListings.xml</str>
    <str name="forceElevation">false</str>
  </searchComponent>

  <requestHandler name="mb_listings" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">dismax</str>
      <str name="qf">name^2 full_text^1</str>
      <str name="fl">uuid</str>
      <str name="version">2.2</str>
      <str name="indent">on</str>
      <str name="tie">0.1</str>
    </lst>
    <lst name="appends">
      <str name="fq">type:Listing</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
    <arr name="last-components">
      <str>elevateListings</str>
    </arr>
  </requestHandler>

 If I remove spellcheck component, the elevate component works (the result
 also loads from elevateListings.xml).
 If I remove elevate component,

 http://localhost:8081/solr/select/?q=redd&qt=mb_listings&spellcheck=true&spellcheck.collate=true
 does work.

 Any ideas?

 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



Re: Custom comparator

2010-07-16 Thread dan sutton
Apologies, I didn't make the requirement clear.

I need to keep the best N documents, set A (chosen by some criteria; call
them sponsored docs), in front of the naturally scoring docs, set B, so that
I return (A, B). The set A docs all need to score above 1% of the maxScore
in B, else they join the B set, though I don't really know maxScore until
I've looked at all the docs.

I am looking at the QueryElevationComponent for some hints, but any other
suggestions are appreciated.

Many thanks,
Dan

On Fri, Jul 16, 2010 at 12:03 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, why do you need a custom collector? You can use
 the form of the search that returns a TopDocs, from which you
 can get the max score and the array of ScoreDoc each of which
 has its score. So you can just let the underlying code get the
 top N documents, and throw out any that don't score above
 1%.

 HTH
 Erick

 On Thu, Jul 15, 2010 at 10:02 AM, dan sutton danbsut...@gmail.com wrote:

  Hi,
 
   I have a requirement for a custom comparator that keeps the top N
   documents (chosen by some criteria), but only if their score is more
   than e.g. 1% of the maxScore.
 
   Looking at SolrIndexSearcher.java, I was hoping to have a custom
   TopFieldCollector.java to return these via TopFieldCollector.topDocs,
   but I can't see how to override that class to provide my own. I think I
   need to do this in TopFieldCollector.topDocs, as I won't know what the
   maxScore is until all the docs have been collected and compared.
 
  Does anyone have any suggestions? I'd like to avoid having to do two
  searches.
 
  Many Thanks,
  Dan
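
Erick's suggestion above can be sketched in plain Java without the Lucene types: collect the top-N (doc, score) pairs as a normal search would, find the maximum score, and drop any hit below 1% of it. The nested ScoreDoc record here is only a stand-in for org.apache.lucene.search.ScoreDoc.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdFilter {
    // Stand-in for Lucene's ScoreDoc: a document id plus its score.
    public record ScoreDoc(int doc, float score) {}

    // Keep only hits scoring at least 'fraction' of the maximum score.
    public static List<ScoreDoc> aboveThreshold(List<ScoreDoc> hits, double fraction) {
        double maxScore = hits.stream().mapToDouble(sd -> sd.score()).max().orElse(0);
        List<ScoreDoc> kept = new ArrayList<>();
        for (ScoreDoc sd : hits)
            if (sd.score() >= fraction * maxScore) kept.add(sd);
        return kept;
    }

    public static void main(String[] args) {
        List<ScoreDoc> hits = List.of(new ScoreDoc(1, 10.0f),
                                      new ScoreDoc(2, 0.5f),
                                      new ScoreDoc(3, 0.05f)); // below 1% of 10.0
        System.out.println(aboveThreshold(hits, 0.01).size()); // 2
    }
}
```

This avoids a custom collector entirely: the threshold is applied after collection, once maxScore is known.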
 



Custom comparator

2010-07-15 Thread dan sutton
Hi,

I have a requirement for a custom comparator that keeps the top N
documents (chosen by some criteria), but only if their score is more than
e.g. 1% of the maxScore.

Looking at SolrIndexSearcher.java, I was hoping to have a custom
TopFieldCollector.java to return these via TopFieldCollector.topDocs, but I
can't see how to override that class to provide my own. I think I need to do
this in TopFieldCollector.topDocs, as I won't know what the maxScore is
until all the docs have been collected and compared.

Does anyone have any suggestions? I'd like to avoid having to do two
searches.

Many Thanks,
Dan


Re: Help with highlighting

2010-06-23 Thread dan sutton
It looks to me like a tokenisation issue: the all_text content and the query
text will match, but the string-fieldtype fields might not, and therefore
will not be highlighted.

On Wed, Jun 23, 2010 at 4:40 PM, n...@frameweld.com wrote:

 Here's my request:
 q=ASA+AND+minisite_id%3A36&version=1.3&json.nl=map&rows=10&start=0&wt=json&hl=true&hl.fl=%2A&hl.simple.pre=%3Cspan+class%3D%22hl%22%3E&hl.simple.post=%3C%2Fspan%3E&hl.fragsize=0&hl.mergeContiguous=false

 And here's what happened:
 It didn't return highlighted results, even when I used an asterisk for
 hl.fl to highlight all fields. I tried other fields and that didn't work
 either; however, all_text is the only one that works. Any other ideas why
 the other fields won't highlight? Thanks.

 -Original Message-
 From: Erik Hatcher erik.hatc...@gmail.com
 Sent: Tuesday, June 22, 2010 9:49pm
 To: solr-user@lucene.apache.org
 Subject: Re: Help with highlighting

 You need to share with us the Solr request you made, and any custom
 request handler settings it might map to.  Chances are you just need
 to twiddle with the highlighter parameters (see the wiki for docs) to get
 it to do what you want.

Erik

 On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote:

  Hi, I need help with highlighting fields that match a query.
  So far, my results only highlight if the field is all_text, and
  I would like it to use other fields as well; simply turning
  highlighting on isn't enough. Any ideas why it only applies to
  all_text? Here is my schema:
 
  <?xml version="1.0" ?>

  <schema name="Search" version="1.1">
    <types>
      <!-- Basic Solr Bundled Data Types -->

      <!-- Rudimentary types -->
      <fieldType name="string" class="solr.StrField"
          sortMissingLast="true" omitNorms="true" />
      <fieldType name="boolean" class="solr.BoolField"
          sortMissingLast="true" omitNorms="true" />

      <!-- Non-sortable numeric types -->
      <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
      <fieldType name="long" class="solr.LongField" omitNorms="true"/>
      <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
      <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

      <!-- Sortable numeric types -->
      <fieldType name="sint" class="solr.SortableIntField"
          sortMissingLast="true" omitNorms="true"/>
      <fieldType name="slong" class="solr.SortableLongField"
          sortMissingLast="true" omitNorms="true"/>
      <fieldType name="sfloat" class="solr.SortableFloatField"
          sortMissingLast="true" omitNorms="true"/>
      <fieldType name="sdouble" class="solr.SortableDoubleField"
          sortMissingLast="true" omitNorms="true"/>

      <!-- Date/Time types -->
      <fieldType name="date" class="solr.DateField"
          sortMissingLast="true" omitNorms="true"/>

      <!-- Pseudo types -->
      <fieldType name="random" class="solr.RandomSortField" indexed="true" />

      <!-- Analyzing types -->
      <fieldType name="text_ws" class="solr.TextField"
          positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        </analyzer>
      </fieldType>

      <fieldType name="text" class="solr.TextField"
          positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <!-- <filter class="solr.SynonymFilterFactory"
              synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
          <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"
              protected="protwords.txt"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>

        <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory"
              synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPorterFilterFactory"
              protected="protwords.txt"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>

      <fieldType name="textTight" class="solr.TextField"
          positionIncrementGap="100"

fl and nulls

2010-05-26 Thread dan sutton
Hi,

In Solr 1.3 it looks like null fields were returned if requested with the fl
param, whereas with Solr 1.4 nulls are omitted entirely.

Is there a way to have the nulls returned with Solr 1.4 e.g.
...
doc
 field1/
 field2/
/doc

Cheers,
Dan


Dynamic analyzers

2010-05-24 Thread dan sutton
Hi,

I have a requirement to dynamically choose a fieldType to analyze text in
multiple languages. I will know the language (in a separate field) at index
and query time.

I've tried implementing this with a custom UpdateRequestProcessorFactory and
a custom DocumentBuilder.toDocument to change the FieldType, but this doesn't
work.

I realize I can have e.g. text_en, text_de, ... and dynamically populate
these with a custom UpdateRequestProcessorFactory, but with all the
languages (let's say 50+) we are worried that effectively doing an OR across
50 fields will be a performance issue. Is this true?

Many thanks in advance,
Dan
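
A minimal sketch of the per-language-field approach mentioned above: an update step that, given the document's language field, copies the text into a text_<lang> field so each language gets its own analyzer via the schema. The field names "language" and "text" and the text_<lang> convention are illustrative assumptions, not from the original post. Note that since the language is also known at query time, the query can target just the one text_<lang> field rather than OR-ing all 50.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LanguageFieldRouter {
    // Route the document's text into a language-specific field,
    // e.g. text -> text_en or text_de, based on the language field.
    public static Map<String, Object> route(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object lang = doc.get("language");
        Object text = doc.get("text");
        if (lang != null && text != null) {
            out.put("text_" + lang, text); // per-language field, analyzed per schema
            out.remove("text");            // drop the generic field
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("id", "1", "language", "de", "text", "Hallo Welt");
        System.out.println(route(doc).containsKey("text_de")); // true
    }
}
```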


Custom sorting

2010-05-19 Thread dan sutton
Hi,

I have a requirement to do the following:

For up to the first 10 results (i.e. only on the first page), show
sponsored category ads in order of bid, but no more than 2 per category,
and only if all sponsored category ads score more than min% of the highest
score. E.g. if I had the following:

min% = 1


doc  score  bid  cat_id  sponsored
  1    100    x       x          0
  2     55    x       x          0

  3     50    2       2          1
  4     20    2       2          1
  5      5    2       2          1

  6     80    1       1          1
  7     70    1       1          1
  8     60    1       1          1

x = don't care

sorted order would be:

3
4

6
7

1
8
2
5

I'm not sure if this can be implemented with a custom comparator, as I
need access to the final score to enforce min%. I'm thinking I'll
probably have to implement a subclass of QParserPlugin with a
custom sort, but was wondering if there are alternatives?

Many thanks in advance.
Dan
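
The ordering described above can be sketched in plain Java, outside of Lucene's collector machinery: promote sponsored docs ordered by bid then score, capped at two per category and ten in total, provided each scores at least min% of the overall maximum score; everything else follows in natural score order. This is one reading of the (somewhat ambiguous) min% rule, applied per document rather than all-or-nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SponsoredSort {
    public record Doc(int id, double score, int bid, int catId, boolean sponsored) {}

    public static List<Integer> order(List<Doc> docs, double minFraction) {
        double maxScore = docs.stream().mapToDouble(d -> d.score()).max().orElse(0);
        Map<Integer, Integer> perCat = new HashMap<>();
        List<Doc> promoted = new ArrayList<>();
        docs.stream()
            .filter(d -> d.sponsored() && d.score() >= minFraction * maxScore)
            .sorted(Comparator.comparingInt(Doc::bid).reversed()
                    .thenComparing(Comparator.comparingDouble(Doc::score).reversed()))
            .forEach(d -> {
                int used = perCat.getOrDefault(d.catId(), 0);
                if (promoted.size() < 10 && used < 2) { // first page only, 2 per category
                    promoted.add(d);
                    perCat.put(d.catId(), used + 1);
                }
            });
        // Remaining docs (unsponsored, over-cap, or below threshold) in score order.
        List<Doc> rest = docs.stream()
            .filter(d -> !promoted.contains(d))
            .sorted(Comparator.comparingDouble(Doc::score).reversed())
            .collect(Collectors.toList());
        return Stream.concat(promoted.stream(), rest.stream())
                     .map(Doc::id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The example table from the post above (x replaced by 0).
        List<Doc> docs = List.of(
            new Doc(1, 100, 0, 0, false), new Doc(2, 55, 0, 0, false),
            new Doc(3, 50, 2, 2, true),  new Doc(4, 20, 2, 2, true),
            new Doc(5,  5, 2, 2, true),  new Doc(6, 80, 1, 1, true),
            new Doc(7, 70, 1, 1, true),  new Doc(8, 60, 1, 1, true));
        System.out.println(order(docs, 0.01)); // [3, 4, 6, 7, 1, 8, 2, 5]
    }
}
```

Run on the example table, this reproduces the sorted order given in the post: 3, 4 (cat 2, bid 2), then 6, 7 (cat 1, bid 1), then 1, 8, 2, 5 by natural score.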