Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
[I posted this yesterday to the lucene-user mailing list and was advised to
post it here instead. Apologies for the cross-post.]

Hi,

I'm currently involved in a project of migrating from Lucene 2.9.1 to Solr
1.4.0.
During stress testing, I encountered this performance problem:
While actual search times in our shards (which are now running Solr) have
not changed, the total time it takes for a query has increased dramatically.
During this performance test, we of course do not modify the indexes.
Our application is sending Solr select queries concurrently to the 8 shards,
using CommonsHttpSolrServer.
I added some timing debug messages, and found that
CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
total search time:
int statusCode = _httpClient.executeMethod(method);

Just to clarify: looking at the access logs of the Solr shards, TTLB for a query
might be around 5 ms (on all shards), but httpClient.executeMethod() for
this query can be much higher - say, 50 ms.
Under light load, queries take around 12 ms on average; under heavy
load they take around 22 ms.

Another route we tried to pursue was adding the shards=shard1,shard2,…
parameter to the query instead of doing this ourselves, but that doesn't
seem to work, due to an NPE caused by QueryComponent.returnFields(), line
553:
if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're
currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way
around this.
Note: we're using a custom query component which extends QueryComponent, but
debugging this, I saw nothing wrong with the results at this point in the
code.

Our previous code used HTTP in a different manner:
For each request, we created a new
sun.net.www.protocol.http.HttpURLConnection, and called its getInputStream()
method.
Under the same load as the new application, the old application does not
encounter the delays mentioned above.

Our current code is initializing CommonsHttpSolrServer for each shard this
way:
MultiThreadedHttpConnectionManager httpConnectionManager = new
MultiThreadedHttpConnectionManager();
httpConnectionManager.getParams().setTcpNoDelay(true);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
httpConnectionManager.getParams().setStaleCheckingEnabled(false);
HttpClient httpClient = new HttpClient();
HttpClientParams params = new HttpClientParams();
params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
params.setAuthenticationPreemptive(false);
params.setContentCharset(StringConstants.UTF8);
httpClient.setParams(params);
httpClient.setHttpConnectionManager(httpConnectionManager);

and passing the new HttpClient to the Solr Server:
solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two different setups - one with a single
MultiThreadedHttpConnectionManager and HttpClient shared by all the SolrServers,
and the other with a new MultiThreadedHttpConnectionManager and HttpClient
for each SolrServer.
Both yielded similar performance results.
We also tried giving setMaxTotalConnections() a much higher value
(1,000,000) - it had no effect.
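One thing worth checking (an assumption on my part, not something stated in this thread): in Commons HttpClient 3.x, MultiThreadedHttpConnectionManager also enforces a per-host connection limit that defaults to 2 and is NOT raised by setMaxTotalConnections(). With 8 shards under concurrent load, that cap can serialize requests to each shard and would produce exactly this kind of queueing delay. A sketch of raising it, reusing the httpConnectionManager variable from the snippet above:

```java
// Hypothetical tweak: raise the per-host cap as well as the total cap.
// In HttpClient 3.x, maxConnectionsPerHost defaults to 2.
httpConnectionManager.getParams().setDefaultMaxConnectionsPerHost(1024);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
```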

One last thing - to answer Lance's question (in the lucene-user thread) about
this being an apples-to-apples comparison: yes, our main goal in this
project is to keep things as close to the previous version as possible.
This way we can monitor that behavior (both quality and performance) remains
similar, release this version, and then move forward to improve things.
Of course, there are some changes, but I believe we are indeed measuring the
complete flow on both apps, and that both apps are returning the same fields
via HTTP.

Would love to hear what you think about this. TIA,
Ophir


Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

I would like to set up Apache Solr in Eclipse using Tomcat. It is easy to
set up with Jetty, but with Tomcat it doesn't run Solr at runtime. Has anyone
done this before?

Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1021673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir,

this sounds a bit strange:

> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
> total search time

Is this only for heavy load?

Some other things:

 * with lucene you accessed the indices with MultiSearcher in a LAN, right?
 * did you look into the logs of the servers, is there something
wrong/delayed?
 * did you enable gzip compression for your servers or even the binary
writer/parser for your solr clients?

CommonsHttpSolrServer server = ...
server.setRequestWriter(new BinaryRequestWriter());
server.setParser(new BinaryResponseParser());

Regards,
Peter.



-- 
http://karussell.wordpress.com/



RE: wildcard and proximity searches

2010-08-04 Thread Frederico Azeiteiro
Thanks for your idea.

At this point I'm logging each query's time. My idea is to divide my
queries into normal queries and heavy queries. I have some heavy
queries that take 1 or 2 minutes to return results. But they contain, for
instance, (*word1* AND *word2* AND word3*). I guess these will
always be slower (they could be a little faster with
ReversedWildcardFilterFactory), but they will never be ready in a few
seconds. For now, I just increased the timeout for those :) (using
SolrNet).

My priority at the moment is phrase queries like word1* word2*
word3. After that is working, I'll try to optimize the heavy queries.
Frederico


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: quarta-feira, 4 de Agosto de 2010 01:41
To: solr-user@lucene.apache.org
Subject: Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>> But it is unusual to use both leading and trailing * operator. Why are
>> you doing this?
>
> Yes I know, but I have a few queries that need this. I'll try the
> ReversedWildcardFilterFactory.

ReversedWildcardFilterFactory will help with a leading wildcard, but will not
help a query with BOTH leading and trailing wildcards; it'll still be slow.
Solr/Lucene isn't good at that; in fact, I didn't even know Solr would do it
at all.

If you really needed to do that, the way to play to Solr/Lucene's strengths
would be to have a field where you actually index each _character_ as a
separate token. Then a leading-and-trailing wildcard search is basically
reduced to a phrase search, but where the words are actually characters.
But then you're going to get an index where pretty much every token belongs
to every document, which Solr isn't that great at either; you can apply
common-grams stuff on top to help that out a lot. Not quite sure what the
end result would be - I've never tried it. I'd only use that weird
character-as-token field for queries that actually require leading and
trailing wildcards.

Figuring out how to set up your analyzers, and what (if anything) you're
going to have to do client-app-side to transform the user's query into
something that'll end up searching like a phrase search where each
'word' is a character, is left as an exercise for the reader. :)
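A rough sketch of the character-as-token idea in plain Java (hypothetical helper names; a real setup would do this inside a Solr analyzer on the index side and a query transformer on the client side):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CharTokens {
    // Index side: emit each character of a term as its own token.
    static List<String> charTokens(String term) {
        return term.chars()
                   .mapToObj(c -> String.valueOf((char) c))
                   .collect(Collectors.toList());
    }

    // Query side: a double-wildcard query like *hoe* reduces to a
    // phrase query over the character tokens.
    static String asPhrase(String infix) {
        return "\"" + String.join(" ", charTokens(infix)) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(charTokens("shoes")); // [s, h, o, e, s]
        System.out.println(asPhrase("hoe"));     // "h o e"
    }
}
```

Common-grams would then collapse frequent adjacent character pairs into single tokens to keep the postings lists manageable.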

Jonathan


AW: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Bastian Spitzer
I'm not sure I understand your problem, but basically it isn't Solr vs. Lucene
but HttpURLConnection vs. SolrJ's CommonsHttpSolrServer, since the server
query times haven't changed at all from what you say?

Why aren't you querying the server the same way you did before, if you want
to compare Solr to Lucene only?

-----Original Message-----
From: Ophir Adiv [mailto:firt...@gmail.com]
Sent: Wednesday, August 4, 2010 09:11
To: solr-user@lucene.apache.org
Subject: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under
heavy load



Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
On Wed, Aug 4, 2010 at 10:50 AM, Peter Karich peat...@yahoo.de wrote:

> Ophir,
>
> this sounds a bit strange:
>
>> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
>> total search time
>
> Is this only for heavy load?

I think this makes sense, since the hard work is done by Solr - once the
application gets the search results from the shards, it does a bit of
manipulation on them (combining, filtering, ...), but these are easy tasks.

> Some other things:
>
>  * with lucene you accessed the indices with MultiSearcher in a LAN, right?

No, each shard ran under a different Tomcat instance, and each shard was
accessed via HTTP calls (the same way we're now trying to work with Solr).


>  * did you look into the logs of the servers, is there something
> wrong/delayed?

Everything seems peachy... logs are clean of errors/warnings and the like.


>  * did you enable gzip compression for your servers or even the binary
> writer/parser for your solr clients?

We're running our application (and Solr) under Tomcat. We do not enable
compression (the configuration remained similar to our old application's
configuration).
We tried XMLResponseParser instead of BinaryResponseParser - it hardly
affected run times.

Thanks for the ideas,
Ophir


Is there a better way to do server-side load balancing for Solr?

2010-08-04 Thread Chengyang
The default Solr setup does client-side load balancing.
Is there a solution that provides server-side load balancing?



Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Stanislaw
Hi all!
I can't load my custom queries from an external file, as described here:
https://issues.apache.org/jira/browse/SOLR-784

This option seems not to be implemented in the current version 1.4.1 of Solr.
Was it removed, or does it only arrive in a newer version?

regards,
Stanislaw


Re: Date faceting

2010-08-04 Thread Koji Sekiguchi

(10/08/04 19:42), Eric Grobler wrote:
> Hi Solr community,
>
> How do I facet on timestamp for example?
>
> I tried something like this - but I get no result.
>
> facet=true
> facet.date=timestamp
> f.facet.timestamp.date.start=2010-01-01T00:00:00Z
> f.facet.timestamp.date.end=2010-12-31T00:00:00Z
> f.facet.timestamp.date.gap=+1HOUR
> f.facet.timestamp.date.hardend=true
>
> Thanks
> ericz

Your parameters are not correct. Try:

facet=true
facet.date=timestamp
facet.date.start=2010-01-01T00:00:00Z
facet.date.end=2010-12-31T00:00:00Z
facet.date.gap=+1HOUR
facet.date.hardend=true

If you want to use per-field override feature, you can set them:

f.timestamp.facet.date.start=2010-01-01T00:00:00Z
f.timestamp.facet.date.end=2010-12-31T00:00:00Z
f.timestamp.facet.date.gap=+1HOUR
f.timestamp.facet.date.hardend=true
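To make the shape of these parameters concrete, here is a small sketch (plain Java, stdlib only; the field name and date range are the ones from this thread) that assembles the per-field override form above into a query string:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DateFacetParams {
    // Build the per-field-override parameter map: f.<fieldName>.facet.date.*
    static Map<String, String> params(String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("facet", "true");
        p.put("facet.date", field);
        p.put("f." + field + ".facet.date.start", "2010-01-01T00:00:00Z");
        p.put("f." + field + ".facet.date.end", "2010-12-31T00:00:00Z");
        p.put("f." + field + ".facet.date.gap", "+1HOUR");
        p.put("f." + field + ".facet.date.hardend", "true");
        return p;
    }

    public static void main(String[] args) {
        // Join into a raw query string (URL-encoding omitted for clarity;
        // note "+1HOUR" must be encoded as %2B1HOUR in a real request).
        String qs = params("timestamp").entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        System.out.println(qs);
    }
}
```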

Koji

--
http://www.rondhuit.com/en/



Re: Best solution to avoiding multiple query requests

2010-08-04 Thread kenf_nc

Not sure the processing would be any faster than just querying again, but in
your original result set, the first doc whose field value matches a top-10
facet will be the number 1 item if you fq on that facet value. So you don't
need to query it again. You would only need to query the facet values that
aren't represented in your result set.
ie:
   q=dog&facet=on&facet.field=foo
results 10 docs
   id=1, foo=A
   id=2, foo=A
   id=3, foo=B
   id=4, foo=C
   id=5, foo=B
   id=6, foo=A
   id=7, foo=Z
   id=8, foo=T
   id=9, foo=B
   id=10, foo=J

If your facet results' top 10 were (A, B, T, J, D, X, Q, O, P, I),
you already have the number 1 hit for A (id 1), B (id 3), T (id 8) and J (id
10) from your very first query. You only need to query D, X, Q, O, P, I.

If your first query returned 100 instead of 10 you may even have more of the
top 10 represented. Again, the processing steps you would need to do may not
be any faster than re-querying, it depends on the speed of your index and
network etc.

I would think that if your second query was
q=dog&fq=(foo=A OR foo=B OR foo=T ...etc) then you have an even greater
chance of having the number 1 result for each of the top 10 in just your
second query.
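The bookkeeping described above can be sketched in plain Java (a hypothetical helper, stdlib only; ranks are 0-based, so ids 1, 3, 8, 10 from the example appear as ranks 0, 2, 7, 9):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetTopHits {
    // Given the initial result docs' facet values (in rank order) and the
    // top facet values, return the facets whose #1 doc is already in hand,
    // mapped to the rank of that doc.
    static Map<String, Integer> firstHits(List<String> docFacetValues,
                                          List<String> topFacets) {
        Map<String, Integer> first = new LinkedHashMap<>();
        for (int rank = 0; rank < docFacetValues.size(); rank++) {
            first.putIfAbsent(docFacetValues.get(rank), rank); // earliest rank wins
        }
        Map<String, Integer> known = new LinkedHashMap<>();
        for (String f : topFacets) {
            if (first.containsKey(f)) known.put(f, first.get(f));
        }
        return known;
    }

    public static void main(String[] args) {
        // The example from the post: 10 docs with their foo values.
        List<String> docs = List.of("A","A","B","C","B","A","Z","T","B","J");
        List<String> top10 = List.of("A","B","T","J","D","X","Q","O","P","I");
        Map<String, Integer> known = firstHits(docs, top10);
        System.out.println(known); // facets already covered by the first query
        List<String> remaining = new ArrayList<>(top10);
        remaining.removeAll(known.keySet());
        System.out.println(remaining); // still need follow-up fq queries
    }
}
```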

  


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
Field Collapsing (currently available as a patch) is exactly what you're
looking for, imo.

http://wiki.apache.org/solr/FieldCollapsing

Geert-Jan


2010/8/4 Ken Krugler kkrugler_li...@transpac.com

> Hi all,
>
> I've got a situation where the key result from an initial search request
> (let's say for dog) is the list of values from a faceted field, sorted by
> hit count.
>
> For the top 10 of these faceted field values, I need to get the top hit for
> the target request (dog) restricted to that value for the faceted field.
>
> Currently this is 11 total requests, of which the 10 requests following the
> initial query can be made in parallel. But that's still a lot of requests.
>
> So my questions are:
>
> 1. Is there any magic query to handle this with Solr as-is?
>
> 2. If not, is the best solution to create my own request handler?
>
> 3. And in that case, any input/tips on developing this type of custom
> request handler?
>
> Thanks,
>
> -- Ken
>
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g






Re: Date faceting

2010-08-04 Thread Eric Grobler
Thanks Koji,

It works :-)

Have a nice day.

regards
ericz




Re: Multi word synomyms

2010-08-04 Thread Qwerky

It would be nice if you could configure some kind of filter to be processed
before the query string is passed to the parser. The QueryComponent class
seems a nice place for this; a filter could be run against the raw query and
ResponseBuilder's queryString value could be modified before the QParser is
created.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1022461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
I got Solr working in Eclipse and deployed on Tomcat through the Eclipse
plugin.
The crude approach was to:

   1. Import the Solr WAR into Eclipse; it will be imported as a web
   project and can be deployed on Tomcat.
   2. Add multiple source folders to the project, linked to the checked-out
   Solr source code, e.g. this entry in the .project file:
   <linkedResources>
     <link>
       <name>common</name>
       <type>2</type>
       <location>D:/Solr/solr/src/common</location>
     </link>
     ...
   </linkedResources>
   3. Remove the Solr jars from WEB-INF/lib, so that changes to the
   project sources can be deployed and debugged.

Let me know if you find a better approach.




analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp ABC12 does NOT match title ABC12 (however, ABC or 12 does)
2. changed schema.xml WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title ABC12
5. added the document with title ABC12
6. query ABC12 does NOT result in the document with title ABC12
7. analysis.jsp ABC12 DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections - using
what's actually in the index.


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is ABC12. There is a document whose title field is
ABC12. However, I can only get it to match if I search for ABC or
12. This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenate-all in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, ABC12 matches ABC12. However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved. Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on. It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


No group by? looking for an alternative.

2010-08-04 Thread Mickael Magniez

Hello,

I've been dealing with a problem for a few days: I want to index and search
shoes; each shoe can have several sizes and colors, at different prices.

So what I want is: when I search for Converse, I want to retrieve one
shoe per model, i.e. one color and one size, but with colors and sizes in
facets.

My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE
text CONTAINS 'converse' GROUP BY model.
But there is no GROUP BY in Solr :(. I tried FieldCollapsing, but hit many
bugs (NullPointerException).

Then I tried with multivalued facets:
<field name="size" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="color" type="string" indexed="true" stored="true"
multiValued="true"/>

It's nearly working, but I have a problem: when I filter on red shoes, the
size facet also shows sizes that are not available in red. I can't find any
solution to filter one multivalued facet by the value of another
multivalued facet.

So if anyone has an idea for solving this problem...
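One workaround sometimes used for this kind of cross-field facet mismatch - an assumption on my part, not something suggested in this thread - is to index a combined color_size token for each actually-available variant, so that a facet restricted to red can only ever show sizes that exist in red. A sketch of building such tokens (hypothetical field design, plain Java):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariantTokens {
    // Build one "color_size" token per available variant of a shoe, so the
    // color/size pairing is preserved inside a single multivalued field.
    static List<String> variantTokens(Map<String, List<String>> sizesByColor) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : sizesByColor.entrySet())
            for (String size : e.getValue())
                tokens.add(e.getKey() + "_" + size);
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, List<String>> shoe = new LinkedHashMap<>();
        shoe.put("red", List.of("40", "42"));  // red exists in 40 and 42 only
        shoe.put("blue", List.of("41"));       // blue exists in 41 only
        System.out.println(variantTokens(shoe)); // [red_40, red_42, blue_41]
    }
}
```

Faceting on such a combined field (and splitting the tokens client-side) avoids pairing a color with a size that no variant actually has.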



Mickael.



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
I think I agree with Justin here; I think the way the analysis tool
highlights 'matches' is extremely misleading, especially considering it
completely ignores query parsing.

It would be better if it put your text in a MemoryIndex, actually parsed
the query with the query parser, ran it, and used the highlighter to try to
show any matches.

On Wed, Aug 4, 2010 at 10:14 AM, Justin Lolofie jta...@gmail.com wrote:

 Erik: Yes, I did re-index if that means adding the document again.
 Here are the exact steps I took:

 1. analysis.jsp: "ABC12" does NOT match title "ABC12" (however, "ABC" or "12" does)
 2. changed schema.xml: WordDelimiterFilterFactory catenateAll
 3. restarted tomcat
 4. deleted the document with title "ABC12"
 5. added the document with title "ABC12"
 6. query "ABC12" does NOT result in the document with title "ABC12"
 7. analysis.jsp: "ABC12" DOES match that document now

 Is there any way to see, given an ID, how something is indexed internally?

 Lance: I understand the index/query sections of analysis.jsp. However,
 it operates on text that you enter into the form, not on actual index
 data. Since all my documents have a unique ID, I'd like to supply an
 ID and a query, and get back the same index/query sections - using
 what's actually in the index.
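One built-in way to see how a given document was actually indexed is the Luke request handler (/admin/luke), which can report the indexed terms of a document looked up by its unique key. The sketch below just builds such a request URL; the host/port and the exact parameter names (`id`, `numTerms`) are my recollection of the 1.4-era handler and should be verified against your install:

```python
from urllib.parse import urlencode

def luke_url(base="http://localhost:8983/solr", unique_key_value="ABC12", num_terms=50):
    """Build a Luke request that shows how the document with the given
    unique key was actually indexed (per-field terms as stored in the index)."""
    params = {"id": unique_key_value, "numTerms": num_terms, "wt": "json"}
    return base + "/admin/luke?" + urlencode(params)

# e.g. fetch with urllib.request.urlopen(luke_url()) and inspect the
# per-field "index" entries in the response.
print(luke_url())
```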


 -- Forwarded message --
 From: Erik Hatcher erik.hatc...@gmail.com
 To: solr-user@lucene.apache.org
 Date: Tue, 3 Aug 2010 22:43:17 -0400
 Subject: Re: analysis tool vs. reality
 Did you reindex after changing the schema?


 On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose title field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenateAll in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, "ABC12" matches "ABC12". However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on.  It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
   Justin




-- 
Robert Muir
rcm...@gmail.com


analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Wow, I got to work this morning and my query results now include the
'ABC12' document. I'm not sure what that means. Either I made a
mistake in the process I described in the last email (I don't think
this is the case) or there is some kind of caching of query results
going on that doesn't get flushed on a restart of tomcat.




Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp: "ABC12" does NOT match title "ABC12" (however, "ABC" or "12" does)
2. changed schema.xml: WordDelimiterFilterFactory catenateAll
3. restarted tomcat
4. deleted the document with title "ABC12"
5. added the document with title "ABC12"
6. query "ABC12" does NOT result in the document with title "ABC12"
7. analysis.jsp: "ABC12" DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections - using
what's actually in the index.


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose title field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenateAll in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, "ABC12" matches "ABC12". However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on.  It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


Re: Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw solrgeschic...@googlemail.com wrote:

 Hi all!
 I can't load my custom queries from the external file, as written here:
 https://issues.apache.org/jira/browse/SOLR-784

 This option seems not to be implemented in the current version 1.4.1 of
 Solr. Was it removed, or does it only come with a newer version?


That patch was never committed so it is not available in any release.

-- 
Regards,
Shalin Shekhar Mangar.


Re: analysis tool vs. reality

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir rcm...@gmail.com wrote:

 I think I agree with Justin here, I think the way analysis tool highlights
 'matches' is extremely misleading, especially considering it completely
 ignores queryparsing.

 it would be better if it put your text in a memoryindex and actually parsed
 the query w/ queryparser, ran it, and used the highlighter to try to show
 any matches.


+1

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is there a better way for Solr server-side load balancing?

2010-08-04 Thread Shalin Shekhar Mangar
2010/8/4 Chengyang atreey...@163.com

 The default Solr setup is client-side load balancing.
 Is there a solution that provides server-side load balancing?


No. Most of us stick an HTTP load balancer in front of multiple Solr servers.

-- 
Regards,
Shalin Shekhar Mangar.


DIH and Cassandra

2010-08-04 Thread Mark
Is it possible to use DIH with Cassandra either out of the box or with 
something more custom? Thanks


Re: enhancing auto complete

2010-08-04 Thread Avlesh Singh
I preferred to answer this question privately earlier. But I have received
innumerable requests to unveil the architecture. For the benefit of all, I
am posting it here (after hiding as much info as I should, in my company's
interest).

The context: Auto-suggest feature on http://askme.in

*Solr setup*: Below are some of the salient features -

   1. TermsComponent is NOT used.
   2. The index is made up of 4 fields of the following types -
   autocomplete_full, autocomplete_token, string and text.
   3. autocomplete_full uses KeywordTokenizerFactory and
   EdgeNGramFilterFactory. autocomplete_token uses WhitespaceTokenizerFactory
   and EdgeNGramFilterFactory. Both of these are Solr text fields with standard
   filters like LowerCaseFilterFactory etc applied during querying and
   indexing.
   4. Standard DataImportHandler and a bunch of sql procedures are used to
   derive all suggestable phrases from the system and index them in the above
   mentioned fields.
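The effect of pairing the two tokenizers with EdgeNGramFilterFactory can be sketched without Solr: KeywordTokenizer + edge n-grams yields prefixes of the whole phrase (so "lorem ip" matches "Lorem ipsum dolor sit amet"), while WhitespaceTokenizer + edge n-grams yields prefixes of every word (so "ips" matches mid-phrase). A hypothetical, simplified re-implementation, with lowercasing standing in for LowerCaseFilterFactory:

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Front-edge n-grams of a single token, lowercased."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

def analyze_full(phrase):
    # KeywordTokenizer: the whole phrase is one token, so grams are phrase prefixes
    return set(edge_ngrams(phrase))

def analyze_tokens(phrase):
    # WhitespaceTokenizer: one token per word, so grams are word prefixes
    grams = set()
    for word in phrase.split():
        grams.update(edge_ngrams(word))
    return grams

doc = "Lorem ipsum dolor sit amet"
assert "lorem ip" in analyze_full(doc)       # phrase-prefix match
assert "lorem ip" not in analyze_tokens(doc)
assert "ips" in analyze_tokens(doc)          # word-prefix match mid-phrase
```

Querying both fields with different boosts (as the controller below does) is what lets phrase-prefix matches rank above mere word-prefix matches.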

*Controller setup*: The controller (to handle suggest queries) is a typical
Java servlet using Solr as its backend (connecting via solrj). Based on the
incoming query string, a Lucene query is created. It is a BooleanQuery
comprising a TermQuery for each of the above-mentioned fields. The boost
factor on each of these term queries determines (to an extent) what kind of
matches show up first. JSON is used as the data exchange format.

*Frontend setup*: It is a home-grown JS component addressing some specific
use cases of the project in question. One simple exercise with Firebug will
spill all the beans. However, I strongly recommend using jQuery to build (and
extend) the UI component.

Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar 
bhavnik.gaj...@gatewaynintec.com wrote:

  Whoops!

 The table still doesn't look OK :(

 Trying to send it once again:

 lorem      -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ip   -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ipsl -> test xyz lorem ipslili

 On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote:

 Avlesh,

 Thanks for responding

 The table mentioned below looks like:

 lorem      -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ip   -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ipsl -> test xyz lorem ipslili


 Yes, [http://askme.in] looks good!

 I would like to know its design/Solr configuration etc. Can you
 please provide a detailed view of it?

 In [http://askme.in], there is one thing to be noted. Search text like
 [business c] populates [Business Centre], which looks OK, but [Consultant
 Business] looks a bit odd. But, in general, the pointer you suggested is
 a great place to start.

 On 8/2/2010 8:39 PM, Avlesh Singh wrote:


  From whatever I could read in your broken table of sample use cases, I think
 you are looking for something similar to what has been done here -
 http://askme.in; if this is what you are looking for, do let me know.

 Cheers
 Avlesh
 @avlesh http://twitter.com/avlesh | http://webklipper.com

 On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar
 bhavnik.gaj...@gatewaynintec.com wrote:




  Hi,

 I'm looking for a solution related to auto complete feature for one
 application.

 Below is a list of texts from which auto complete results would be
 populated.

 Lorem ipsum dolor sit amet
 tincidunt ut laoreet
 dolore eu feugiat nulla facilisis at vero eros et
 te feugait nulla facilisi
 Claritas est etiam processus
 anteposuerit litterarum formas humanitatis
 fiant sollemnes in futurum
 Hieyed ddi lorem ipsum dolor
 test lorem ipsume
 test xyz lorem ipslili

 Consider the table below. The first column shows the user-entered value and
 the second column the expected result (the list of auto-complete terms
 that should be populated from Solr):

 lorem
 *Lorem* ipsum dolor sit amet
 Hieyed ddi *lorem* ipsum dolor
 test *lorem *ipsume
 test xyz *lorem *ipslili
 lorem ip
 *Lorem ip*sum dolor sit amet
 Hieyed ddi *lorem ip*sum dolor
 test *lorem ip*sume
 test xyz *lorem ip*slili
 lorem ipsl
 test xyz *lorem ipsl*ili



 Can anyone share ideas on how this can be achieved?

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks, man. I haven't tried this, but where do I put that XML configuration?
Does it go in Solr's web.xml?

Cheers,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
The solr home is configured in the web.xml of the application; it points
to the folder containing the conf files and the data directory:

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>D:/multicore</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Regards,
Jayendra

On Wed, Aug 4, 2010 at 12:21 PM, Hando420 hando...@gmail.com wrote:


 Thanks, man. I haven't tried this, but where do I put that XML configuration?
 Does it go in Solr's web.xml?

 Cheers,
 Hando
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
 Sent from the Solr - User mailing list archive at Nabble.com.



can't use strdist as functionquery?

2010-08-04 Thread solr-user

I want to sort my results by how closely a given resultset field matches a
given string.

For example, say I am searching for a given product, and the product can be
found in many cities including Seattle. I want to sort the results so
that results from the city of "seattle" are at the top, and all other results
below that.

I thought that I could do this by using strdist as a functionquery (I am using
Solr 1.4, so I can't directly sort on strdist), but am having problems with
the syntax of the query, because function queries require double quotes and
so does strdist.

My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:foo) _val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product,city,score

I have tried various types of URL encoding (i.e. using %22 instead of double
quotes in the strdist function), but without success.

Any ideas? Is there a better way to accomplish this sorting?
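For reference, strdist(x, y, edit) returns a similarity in [0, 1] derived from Levenshtein edit distance (1 minus the distance divided by the longer length), not the raw distance, so sorting *descending* by it puts exact matches like "seattle" first. A standalone sketch of that computation (my reading of the Lucene implementation; verify against your Solr version):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def strdist_edit(a: str, b: str) -> float:
    """Similarity as strdist(...,edit) reports it: 1 - dist/maxlen."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

assert strdist_edit("seattle", "seattle") == 1.0          # exact match sorts first
assert abs(strdist_edit("seattle", "seatle") - (1 - 1/7)) < 1e-9
```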

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks now its clear and works fine.

Regards,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sharing index files between multiple JVMs and replication

2010-08-04 Thread Kelly Taylor
Is anybody else encountering these same issues with a similar setup? And is
there a way to configure certain Solr web-apps as read-only (basically dummy
instances) so that index changes are not allowed?



- Original Message 
From: Kelly Taylor wired...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 5:48:11 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Yes, they are on a common file server, and I've been sharing the same index
directory between the Solr JVMs. But I seem to be hitting a wall when
attempting to use just one instance for changing the index.

With Solr replication disabled, I stream updates to the one instance, and this
process hangs whenever there are additional Solr JVMs started up with the same
configuration in solrconfig.xml. So I then tried, to no avail, using a
different configuration, solrconfig-readonly.xml, where the updateHandler was
commented out, all /update* requestHandlers removed, mainIndex lockType of
none, etc.

And with Solr replication enabled, the slave seems to hang, or at least
reports unusually long time estimates for the currently running replication
process to complete.


-Kelly



- Original Message 
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 4:56:58 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Are these files on a common file server? If you want to share them
that way, it actually does work just to give them all the same index
directory, as long as only one of them changes it.

On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor wired...@yahoo.com wrote:
 Is there a way to share index files amongst my multiple Solr web-apps, by
 configuring only one of the JVMs as an indexer, and the remaining, as 
read-only
 searchers?

 I'd like to configure in such a way that on startup of the read-only 
searchers,
 missing cores/indexes are not created, and updates are not handled.

 If I can get around the files being locked by the read-only instances, I 
should
 be able to scale wider in a given environment, as well as have less replicated
 copies of my master index (Solr 1.4 Java Replication).

 Then once the commit is issued to the slave, I can fire off a RELOAD script 
for
 each of my read-only cores.

 -Kelly








-- 
Lance Norskog
goks...@gmail.com






Re: analysis tool vs. reality

2010-08-04 Thread Chris Hostetter

: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

it really only attempts to identify when there is overlap between 
analysis at query time and at indexing time, so you can easily spot when 
one analyzer or the other breaks things so that they no longer line up 
(or when it fixes things so they start to line up).

Even if we eliminated that highlighting as misleading, people would still 
do it in their minds, it would just be harder -- it doesn't change the 
underlying fact that analysis is only part of the picture.
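Concretely, the overlap highlighting amounts to a set intersection between the index-time and query-time token streams for a single field. A toy sketch (whitespace + lowercase standing in for a real analysis chain) that also shows why it says nothing about query parsing or term positions:

```python
def analyze(text):
    # stand-in for a field's analysis chain: whitespace tokenize + lowercase
    return [t.lower() for t in text.split()]

def analysis_matches(index_text, query_text):
    """Which query-time tokens also appear among the index-time tokens --
    roughly what the admin analysis page highlights. No query parser,
    no phrase/position logic, hence the confusion in this thread."""
    index_tokens = set(analyze(index_text))
    return [t for t in analyze(query_text) if t in index_tokens]

assert analysis_matches("Lorem ipsum dolor", "DOLOR sit") == ["dolor"]
```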

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of query explanation really only works if the user gives us a 
full document (all fields, not just one), a full query string, and all 
of the possible query params -- because the query parser (either implicit 
because of config, or explicitly specified by the user) might change its 
behavior based on those other params.

I agree with you: debugging functionality along the lines of what you are 
describing would be *VASTLY* more useful than what we've got right now, 
and is something I briefly looked into doing before as an extension of the 
existing DebugComponent...

   https://issues.apache.org/jira/browse/SOLR-1749

...the problems I encountered trying to do it as a debug component on 
a real Solr request seem like they would also be problems for a 
MemoryIndex based admin tool approach like what you suggest -- but if 
you've got ideas on working around them I am 100% interested.

Independent of how we might create a better QueryParser + Analysis 
explanation tool / debug component is the question of what we can do to 
make it more clear what exactly the analysis.jsp page is doing and what 
people can infer from that page.  As I said, I don't think removing the 
match highlighting will actually reduce confusion, but perhaps there is 
verbiage/disclaimers that could be added to make it more clear?



-Hoss



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
Furthermore, I would like to add that it's not just the highlight-matches
functionality that is horribly broken here; the output of the analysis
itself is misleading.

Let's say I take 'textTight' from the example, and add the following synonym:

this is broken => broke

The query-time analysis is wrong, as it clearly shows SynonymFilter
collapsing "this is broken" to "broke", but in reality, with the query parser
for that field, you are going to get 3 separate token streams and this will
never actually happen (because the query parser will divide it up on
whitespace first).

So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir rcm...@gmail.com wrote:



 On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:


 it really only attempts to identify when there is overlap between
 analysis at query time and at indexing time, so you can easily spot when
 one analyzer or the other breaks things so that they no longer line up
 (or when it fixes things so they start to line up)


 It attempts badly, because it only works in the most trivial of cases
 (e.g. it doesn't reflect the interaction of the query parser with multiword
 synonyms or WordDelimiterFilter).

 Since Solr includes these non-trivial analysis components *in the example*,
 it means that this 'highlight matches' doesn't actually even really work at
 all.

 Someone is going to use this thing when they don't understand why analysis
 isn't doing what they want, i.e. the cases like I outlined above.

 For the trivial cases where it does work, the 'highlight matches' isn't
 useful anyway, so in its current state it's completely unnecessary.


 Even if we eliminated that highlighting as misleading, people would still
 do it in their minds, it would just be harder -- it doesn't change the
 underlying fact that analysis is only part of the picture.


 I'm not suggesting that. I'm suggesting fixing the highlighting so it's not
 misleading. There are really only two choices:
 1. remove the current highlighting
 2. fix it.

 In its current state it's completely useless and misleading, except for very
 trivial cases, in which you don't need it anyway.



 : it would be better if it put your text in a memoryindex and actually parsed
 : the query w/ queryparser, ran it, and used the highlighter to try to show
 : any matches.

 That level of query explanation really only works if the user gives us a
 full document (all fields, not just one), a full query string, and all
 of the possible query params -- because the query parser (either implicit
 because of config, or explicitly specified by the user) might change its
 behavior based on those other params.


 That's true, but I don't see why the user couldn't be allowed to provide
 just this.
 I'd bet money a lot of people are using this thing with a specific
 query/document in mind anyway!


 people can infer from that page.  As I said, I don't think removing the
 match highlighting will actually reduce confusion, but perhaps there is
 verbiage/disclaimers that could be added to make it more clear?


  As I said before, I think I disagree with you. I think for stuff like this
 the technicals are less important; what's important is that this is a
 misleading checkbox that really confuses users.

 I suggest disabling it entirely; you are only going to remove confusion.


 --
 Robert Muir
 rcm...@gmail.com




-- 
Robert Muir
rcm...@gmail.com


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

 Field Collapsing (currently as patch) is exactly what you're looking for
 imo.

 http://wiki.apache.org/solr/FieldCollapsing

Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could
get (using just top two, versus top 10 for simplicity) results that
looked like:

dog training (faceted field value A)
super dog (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog" AND facet field:C,
followed by "dog" AND facet field:D.

Using field collapsing would improve the probability that if I asked
for the top 100 hits, I'd find entries for each of my top N faceted
field values.
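Pending a patched field collapsing, the desired behaviour can be approximated client-side from a single over-fetched result page: count facet values across the hits, keep the top N values, and keep the best-scored hit for each. A sketch under that assumption (hits already sorted by descending score; field names invented):

```python
from collections import Counter

def top_hit_per_facet(hits, facet_field, n=2):
    """hits: list of dicts sorted by descending score.
    Returns [(facet_value, hit_count, best_hit), ...] for the n most
    frequent facet values -- a one-request approximation of the
    11-request scheme described above."""
    counts = Counter(h[facet_field] for h in hits)
    top_values = [v for v, _ in counts.most_common(n)]
    best = {}
    for h in hits:                      # first hit seen per value is its best
        v = h[facet_field]
        if v in top_values and v not in best:
            best[v] = h
    return [(v, counts[v], best[v]) for v in top_values]

hits = ([{"title": "c%d" % i, "site": "C"} for i in range(10)] +
        [{"title": "d%d" % i, "site": "D"} for i in range(8)] +
        [{"title": "dog training", "site": "A"},
         {"title": "super dog", "site": "B"}])
result = top_hit_per_facet(hits, "site", n=2)
assert [(v, c) for v, c, _ in result] == [("C", 10), ("D", 8)]
```

The obvious caveat: if a top facet value has no hit inside the over-fetched page, it is missed, which is exactly the probability argument above about asking for the top 100 hits.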


Thanks again,

-- Ken

I've got a situation where the key result from an initial search request
(let's say for "dog") is the list of values from a faceted field, sorted by
hit count.

For the top 10 of these faceted field values, I need to get the top hit for
the target request ("dog") restricted to that value for the faceted field.

Currently this is 11 total requests, of which the 10 requests following the
initial query can be made in parallel. But that's still a lot of requests.


So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. if not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of custom
request handler?

Thanks,

-- Ken



Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu
DIH only works with relational databases and XML files [1]; you need
to write custom code in order to index data from Cassandra.

It should be pretty easy to map documents from Cassandra to Solr.
There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions
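The custom glue can be small: read rows with whichever Cassandra client you pick, map each row to a Solr document, and POST the result to /update. A client-agnostic sketch of the mapping step (field names invented; the XML is the 1.4-era update format):

```python
from xml.sax.saxutils import escape

def to_solr_add(docs):
    """Render dicts as a Solr <add> XML update message.
    Multivalued fields are plain Python lists."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                parts.append('<field name="%s">%s</field>'
                             % (escape(name), escape(str(v))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

# e.g. one Cassandra row flattened to a dict, then POSTed to
# http://host:8983/solr/update followed by a commit.
xml = to_solr_add([{"id": "row-1", "keyspace": "Users", "tags": ["a", "b"]}])
print(xml)
```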

On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com wrote:
 Is it possible to use DIH with Cassandra either out of the box or with
 something more custom? Thanks




-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr



Re: Is there a better way for Solr server-side load balancing?

2010-08-04 Thread Andrei Savu
Check this article [1] that explains how to set up haproxy to do load
balancing. The steps are the same even if you are not using Drupal. By
using this approach you can easily add more replicas without changing
the application configuration files.

You should also check SolrCloud [2] which does automatic load
balancing and fail-over for queries. This branch is still under
development.

[1] 
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
[2] http://wiki.apache.org/solr/SolrCloud

2010/8/4 Chengyang atreey...@163.com:
 The default Solr setup is client-side load balancing.
 Is there a solution that provides server-side load balancing?



-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr


Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
I'm running a slight variation of the example code referenced below and 
it takes a really long time to finally execute.  In fact it hangs for a 
long time at solr.request(up) before finally executing.  Is there 
anything I can look at or tweak to improve performance?


I am indexing a local PDF file; there are no firewall issues, Solr 
is running on the same machine, and I tried the actual host name in 
addition to localhost, but nothing helps.



Thanks - Tod

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results' / hits.

It seems this can't be done out of the box using this patch (I'm not
entirely sure; at least it doesn't follow from the wiki page. Perhaps it's
best to check the jira issues to make sure this isn't already available now,
but just not yet updated on the wiki).

I also found a blog post (from the patch creator, afaik) with, in the
comments, someone with the same issue + some pointers:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

hope that helps,
Geert-jan

2010/8/4 Ken Krugler kkrugler_li...@transpac.com

 Hi Geert-Jan,


 On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

  Field Collapsing (currently as patch) is exactly what you're looking for
 imo.

 http://wiki.apache.org/solr/FieldCollapsing


 Thanks for the ref, good stuff.

 I think it's close, but if I understand this correctly, then I could get
 (using just top two, versus top 10 for simplicity) results that looked like

 dog training (faceted field value A)
 super dog (faceted field value B)

 but if the actual faceted field value/hit counts were:

 C (10)
 D (8)
 A (2)
 B (1)

 Then what I'd want is the top hit for "dog" AND facet field:C, followed by
 "dog" AND facet field:D.

 Using field collapsing would improve the probability that if I asked for the
 top 100 hits, I'd find entries for each of my top N faceted field values.

 Thanks again,

 -- Ken


  I've got a situation where the key result from an initial search request
 (let's say for dog) is the list of values from a faceted field, sorted
 by
 hit count.

 For the top 10 of these faceted field values, I need to get the top hit
 for
 the target request (dog) restricted to that value for the faceted
 field.

 Currently this is 11 total requests, of which the 10 requests following
 the
 initial query can be made in parallel. But that's still a lot of
 requests.

 So my questions are:

 1. Is there any magic query to handle this with Solr as-is?

 2. if not, is the best solution to create my own request handler?

 3. And in that case, any input/tips on developing this type of custom
 request handler?

 Thanks,

 -- Ken


 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-jan,

On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

 If I understand correctly: you want to sort your collapsed results by 'nr of
 collapsed results' / hits.

 It seems this can't be done out-of-the-box using this patch (I'm not
 entirely sure, at least it doesn't follow from the wiki-page. Perhaps best
 is to check the jira-issues to make sure this isn't already available now,
 but just not updated on the wiki)

 Also I found a blogpost (from the patch creator afaik) with in the comments
 someone with the same issue + some pointers.
 http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

Yup, that's the one -
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249

So with some modifications to that patch, it could work... thanks for
the info!

-- Ken


2010/8/4 Ken Krugler kkrugler_li...@transpac.com


Hi Geert-Jan,


On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

Field Collapsing (currently as patch) is exactly what you're  
looking for

imo.

http://wiki.apache.org/solr/FieldCollapsing



Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could get
(using just the top two, versus top 10, for simplicity) results that looked
like:

dog training (faceted field value A)
super dog (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

then what I'd want is the top hit for "dog" AND facet field:C, followed by
"dog" AND facet field:D.

Using field collapsing would improve the probability that if I asked for the
top 100 hits, I'd find entries for each of my top N faceted field values.


Thanks again,

-- Ken


I've got a situation where the key result from an initial search request
(let's say for "dog") is the list of values from a faceted field, sorted
by hit count.

For the top 10 of these faceted field values, I need to get the top hit
for the target request ("dog") restricted to that value for the faceted
field.

Currently this is 11 total requests, of which the 10 requests following
the initial query can be made in parallel. But that's still a lot of
requests.

So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. If not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of  
custom

request handler?

Thanks,

-- Ken





Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g








Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Indexing boolean value

2010-08-04 Thread PeterKerk

I'm trying to index a boolean value, but for some reason it does not show
up in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
    <field name="id" column="ID" />
    <field name="title" column="TITLE" />
    <field name="city" column="CITY" />
    <field name="official" column="OFFICIALLOCATION" />

OFFICIALLOCATION is an MSSQL database field of type 'bit'.


schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line,
but still without luck.)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Your schema.xml setting for the field is probably tokenizing on punctuation.
Change the field type to one that doesn't tokenize on punctuation, e.g. use
"text_ws" rather than "text".

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces


I'm having issues indexing field values containing spaces and dashes.
For example, I'm trying to index the province names of the Netherlands. Some
province names contain a '-':
Zuid-Holland
Noord-Holland

my data-config has this:

<entity name="location_province" query="select provinceid from
        locations where id=${location.id}">
    <entity name="provinces" query="select title from provinces
            where id = ${location_province.provinceid}">
        <field name="province" column="title" />
    </entity>
</entity>


When I check what has been indexed, I have this:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="city">Nijmegen</str>
    <arr name="features"><str>Tuin</str><str>Cafe</str></arr>
    <str name="id">1</str>
    <str name="province">Gelderland</str>
    <arr name="services"><str>Fotoreportage</str></arr>
    <arr name="theme"><str>Gemeentehuis</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.796Z</date>
    <str name="title">Gemeentehuis Nijmegen</str>
  </doc>
  <doc>
    <str name="city">Utrecht</str>
    <arr name="features"><str>Tuin</str><str>Cafe</str><str>Danszaal</str></arr>
    <str name="id">2</str>
    <str name="province">Utrecht</str>
    <arr name="services"><str>Fotoreportage</str><str>Exclusieve huur</str></arr>
    <arr name="theme"><str>Gemeentehuis</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.796Z</date>
    <str name="title">Gemeentehuis Utrecht</str>
  </doc>
  <doc>
    <str name="city">Bloemendaal</str>
    <arr name="features"><str>Strand</str><str>Cafe</str><str>Danszaal</str></arr>
    <str name="id">3</str>
    <str name="province">Zuid-Holland</str>
    <arr name="services"><str>Exclusieve huur</str><str>Live muziek</str></arr>
    <arr name="theme"><str>Strand &amp; Zee</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.812Z</date>
    <str name="title">Beachclub Vroeger</str>
  </doc>
</result>
</response>



So we see that the full field has been indexed:
<str name="province">Zuid-Holland</str>


BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services

I get this (snippet):

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,              <=== a
      "Strand",1,
      "Zee",1],
    "features":[
      "cafe",3,
      "danszaal",2,
      "tuin",2,
      "strand",1],
    "province":[
      "gelderland",1,
      "holland",1,
      "utrecht",1,
      "zuid",1,           <=== b
      "zuidholland",1],
    "services":[
      "exclusiev",2,
      "fotoreportag",2,   <=== c
      "huur",2,
      "live",1,           <=== d
      "muziek",1]},


Several weird things happen here, which I have indicated with <===:

a. the full field value is "Strand & Zee", but now one facet is "&"
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet
c. the full field value is "fotoreportage", but somehow the last character
has been truncated
d. the full field value is "live muziek", but now "live" and "muziek" have
become separate facets

What can I do about this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023699.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
I could be wrong, but I thought 'bit' was an integer type. Try changing the
field type to integer.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:42 PM
To: solr-user@lucene.apache.org
Subject: Indexing boolean value


I'm trying to index a boolean value, but for some reason it does not show up
in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
    <field name="id" column="ID" />
    <field name="title" column="TITLE" />
    <field name="city" column="CITY" />
    <field name="official" column="OFFICIALLOCATION" />

OFFICIALLOCATION is an MSSQL database field of type 'bit'.


schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line,
but still without luck.)
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

   <field name="city" type="text_ws" indexed="true" stored="true"/>
   <field name="theme" type="text_ws" indexed="true" stored="true"
          multiValued="true" omitNorms="true" termVectors="true" />
   <field name="features" type="text_ws" indexed="true" stored="true"
          multiValued="true"/>
   <field name="services" type="text_ws" indexed="true" stored="true"
          multiValued="true"/>
   <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
You shouldn't fetch faceting results from analyzed fields; it will mess up
your results. Search on analyzed fields, but don't retrieve facet values from
them.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

  <field name="city" type="text_ws" indexed="true" stored="true"/>
  <field name="theme" type="text_ws" indexed="true" stored="true"
         multiValued="true" omitNorms="true" termVectors="true" />
  <field name="features" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="services" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing boolean value

2010-08-04 Thread PeterKerk

Hi,

I tried that already, so that would make this:

<field name="official" type="integer" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(still not sure what copyField does though)

But even that won't work. I also don't see the OFFICIALLOCATION column indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Sorry, but I'm a newbie to Solr...how would I change my schema.xml to match
your suggestion?

And what do you mean by "it will mess up your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Echoing Markus - use the tokenized field to return search results, but keep a
duplicate field of fieldtype="string" to show the untokenized values, and
facet on that field.
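A minimal schema.xml sketch of this setup (the field and type names here are
illustrative, not taken from the thread):

```xml
<!-- analyzed field: used for searching -->
<field name="province" type="text_ws" indexed="true" stored="true"/>
<!-- untokenized copy: used for faceting/display; "province_raw" is a made-up name -->
<field name="province_raw" type="string" indexed="true" stored="true"/>
<!-- copyField duplicates the raw source value before any analysis -->
<copyField source="province" dest="province_raw"/>
```

Queries would then search on province but facet with facet.field=province_raw,
so "Zuid-Holland" survives as a single facet value.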

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Wednesday, August 04, 2010 4:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

You shouldn't fetch faceting results from analyzed fields; it will mess up
your results. Search on analyzed fields, but don't retrieve facet values from
them.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

  <field name="city" type="text_ws" indexed="true" stored="true"/>
  <field name="theme" type="text_ws" indexed="true" stored="true"
         multiValued="true" omitNorms="true" termVectors="true" />
  <field name="features" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="services" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
copyField copies a field's content so you can have multiple versions of it.
It's useful, for example, to dump all fields into one "super" field you can
search on, for performance reasons.

If the column isn't being indexed, I'd suggest the problem is in DIH. No
suggestions as to why, I'm afraid.
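One thing worth trying (my own assumption, not something verified in this
thread): since the JDBC driver may hand the MSSQL 'bit' column back as a type
DIH doesn't map cleanly, casting it in the SELECT sidesteps the type mapping
entirely. A hypothetical data-config.xml tweak:

```xml
<!-- hypothetical: cast the bit column to a string in SQL so DIH sees plain text -->
<entity name="location"
        query="select ID, TITLE, CITY,
                      CAST(OFFICIALLOCATION AS varchar(5)) AS OFFICIALLOCATION
               from locations">
```

If the cast version shows up in the index, the original problem was the
driver's Boolean/bit handling rather than the Solr schema.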

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 4:22 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing boolean value


Hi,

I tried that already, so that would make this:

<field name="official" type="integer" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(still not sure what copyField does though)

But even that won't work. I also don't see the OFFICIALLOCATION column indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
Hmm, you should first read a bit more on schema design on the wiki and learn
about indexing and querying Solr.

 

The copyField directive is what is commonly used in a faceted navigation
system: search on analyzed fields, show faceting results using the primitive
string field type. With copyField you can, well, copy a field from one to
another without it being analyzed by the first - so no chaining is possible,
which is good.

 

Let's say you have a city field you want to navigate with, but also search
in; then you would have an analyzed field for search and a string field for
displaying the navigation.

 

But, check the wiki on this subject.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


Sorry, but I'm a newbie to Solr...how would I change my schema.xml to match
your suggestion?

And what do you mean by "it will mess up your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and Cassandra

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 9:11 PM, Mark static.void@gmail.com wrote:

 Is it possible to use DIH with Cassandra either out of the box or with
 something more custom? Thanks


It will take some modifications, but DIH is built to create denormalized
documents, so it is possible.

Also see https://issues.apache.org/jira/browse/SOLR-853

-- 
Regards,
Shalin Shekhar Mangar.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Well, the example you provided is 100% relevant to me :)

I've read the wiki now (SchemaXml, SolrFacetingOverview, Query Syntax,
SimpleFacetParameters), but still do not have an exact idea of what you
mean.

My situation:
a city field is something that I want users to search on via text input, so
let's say "New Yo" would give the results for "New York".
But also a facet "Cities" is available, in which "New York" is just one of
the cities that is clickable.

The other facet is "theme", which in my example holds values like
"Gemeentehuis" and "Strand & Zee"; that would not be something that can be
searched via manual input but IS clickable.

If you look at my schema.xml, do you see stuff I'm doing that is absolutely
wrong for the purpose described above? Because as far as I can see the
documents are indexed correctly (BESIDES the spaces in the field values).

Any help is greatly appreciated! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023992.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and Cassandra

2010-08-04 Thread Dennis Gearon
If data is stored in the index, isn't the index of Solr pretty much already a
'Big/Cassandra Table', except with tokenized columns to make searching easier?

How are Cassandra/Big/Couch DBs doing text/weighted searching?

Seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how
many 'tables'/indexes one can make using Solr; I'm still a newbie.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 8/4/10, Andrei Savu andrei.s...@indekspot.com wrote:

 From: Andrei Savu andrei.s...@indekspot.com
 Subject: Re: DIH and Cassandra
 To: solr-user@lucene.apache.org
 Date: Wednesday, August 4, 2010, 12:00 PM
 DIH only works with relational
 databases and XML files [1], you need
 to write custom code in order to index data from
 Cassandra.
 
 It should be pretty easy to map documents from Cassandra to
 Solr.
 There are a lot of client libraries available [2] for
 Cassandra.
 
 [1] http://wiki.apache.org/solr/DataImportHandler
 [2] http://wiki.apache.org/cassandra/ClientOptions
 
 On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com
 wrote:
  Is it possible to use DIH with Cassandra either out of
 the box or with
  something more custom? Thanks
 
 
 
 
 -- 
 Indekspot -- http://www.indekspot.com -- Managed
 Hosting for Apache Solr
 


Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich

 The default solr solution is client side loadbalance.
 Is there a solution provide the server side loadbalance?


 
 No. Most of us stick a HTTP load balancer in front of multiple Solr servers.
   

E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load
balancer, and it also offers failover functionality:

It is as simple as:

worker.loadbalancer.balance_workers=worker1,worker2,worker3,...

and the failover:

worker.worker1.redirect=worker2



Re: Some basic DataImportHandler questions

2010-08-04 Thread harrysmith

Thanks, I think part of my issue may be that I am misunderstanding how to use
the entity and field tags to import data in a particular format, and I am
looking for a few more examples.

Let's say I have a database table with 2 columns that contain metadata fields
and values, and I would like to import this into Solr and keep the pairs
together. An example database table follows, consisting of two String
columns, one containing metadata names and the other metadata values (column
names: metadata_name, metadata_value in this example). There may be multiple
records for a name. The set of potential metadata_names is unknown; it could
be anything.

metadata_name    metadata_value
===============================
title            blah blah
subject          some subject
subject          another subject
name             some name


What is the proper way to import these and keep the name/value pairs intact.
I am seeing the following after import:

<arr name="metadata_name_s">
  <str>title</str>
  <str>subject</str>
  <str>name</str>
</arr>

<arr name="metadata_value_s">
  <str>blah blah</str>
  <str>some subject</str>
  <str>another subject</str>
  <str>some name</str>
</arr>

Ideally, the end goal would be something like below:

<arr name="title_s">
  <str>some subject</str>
</arr>

<arr name="name_s">
  <str>some name</str>
</arr>

etc

It feels like I am missing something obvious and this would be a common
structure for imports.
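One way to get this pivot (a sketch, not tested against this exact setup) is
DIH's ScriptTransformer, which lets a JavaScript function rename each row's
value column after the name column. The table and function names below are
made up for illustration:

```xml
<dataConfig>
  <!-- ScriptTransformer needs the JVM's built-in JavaScript engine (Java 6+) -->
  <script><![CDATA[
    function pivot(row) {
      // turn each (metadata_name, metadata_value) row into a dynamic field,
      // e.g. title_s = "blah blah"
      row.put(row.get('metadata_name') + '_s', row.get('metadata_value'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="meta" transformer="script:pivot"
            query="select metadata_name, metadata_value from metadata_table"/>
  </document>
</dataConfig>
```

Note that as a root entity this would make one Solr document per row; to get
all pairs on one document you'd nest it under the parent record's entity.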





 Just starting with DataImportHandler and had a few simple questions.

 Is there a location for more in-depth documentation other than
 http://wiki.apache.org/solr/DataImportHandler?


Umm, no, but let us know what is not covered well and it can be added.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Some-basic-DataImportHandler-questions-tp1010291p1024205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: No group by? looking for an alternative.

2010-08-04 Thread Lance Norskog
Hello-

A way to do this is to create one faceting field that includes both the
size and the color. I assume you have a different shoe product
document for each model. Each model would include the color & size
fields 'red' and '14a', but you would also add a field with 'red-14a'.
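A tiny sketch of the combined-token idea. The helper name and normalization
are my own illustration (the thread only specifies the 'red-14a' shape); the
point is that faceting on this single field only ever surfaces size/color
combinations that actually exist together on a document:

```java
public class CombinedFacet {
    // Hypothetical helper: build the single color-size facet token,
    // so filtering fq=color_size:red-* style queries (or faceting on it)
    // can never pair a color with a size it doesn't ship in.
    static String colorSize(String color, String size) {
        return color.toLowerCase() + "-" + size.toLowerCase();
    }

    public static void main(String[] args) {
        // each indexed document would carry this value in its combined field
        System.out.println(colorSize("Red", "14A")); // prints red-14a
    }
}
```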

On Wed, Aug 4, 2010 at 7:17 AM, Mickael Magniez
mickaelmagn...@gmail.com wrote:

 Hello,

 I'm dealing with a problem since few days  : I want to index and search
 shoes, each shoe can have several size and colors, at different prices.

 So, what i want is : when I search for Converse, i want to retrieve one
 shoe per model, i-e one color and one size, but having colors and sizes in
 facets.

 My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE
 text CONTAINS 'converse' GROUP BY model.
 But no group by in Solr :(. I try with FieldCollapsing, but have many bugs
 (NullPointerException).

 Then I try with multivalued facets  :
 field name=size type=string indexed=true stored=true
 multiValued=true/
 field name=color type=string indexed=true stored=true
 multiValued=true/

 It's nearly working, but i have a problem : when i filtered on red shoes, in
 the size facet, I also have sizes which are not available in red. I don't
 find any solutions to filter multivalued facet with value of another
 multivalued facet.

 So if anyone have an idea for solving this problem...



 Mickael.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1022738.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: analysis tool vs. reality

2010-08-04 Thread Lance Norskog
"there is some kind of caching of query results
going on that doesn't get flushed on a restart of tomcat."

Yes. Solr by default has HTTP caching on if there is no configuration,
and the example solrconfig.xml has it configured on. You should edit
solrconfig.xml to use the alternative described in the comments.
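For reference, this is the solrconfig.xml knob being described (a minimal
sketch; the example config ships with the caching variant enabled and this
alternative in the comments):

```xml
<!-- disable HTTP 304/ETag caching so stale results can't survive a restart -->
<httpCaching never304="true" />
```

With never304="true", Solr always serves a full response instead of telling
the client its cached copy is still valid.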

On Wed, Aug 4, 2010 at 7:55 AM, Justin Lolofie jta...@gmail.com wrote:
 Wow, I got to work this morning and my query results now include the
 'ABC12' document. I'm not sure what that means. Either I made a
 mistake in the process I described in the last email (I don't think
 this is the case) or there is some kind of caching of query results
 going on that doesn't get flushed on a restart of tomcat.




 Erik: Yes, I did re-index, if that means adding the document again.
 Here are the exact steps I took:

 1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
 2. changed schema.xml WordDelimiterFilterFactory to catenate-all
 3. restarted tomcat
 4. deleted the document with title ABC12
 5. added the document with title ABC12
 6. query ABC12 does NOT result in the document with title ABC12
 7. analysis.jsp ABC12 DOES match that document now

 Is there any way to see, given an ID, how something is indexed internally?

 Lance: I understand the index/query sections of analysis.jsp. However,
 it operates on text that you enter into the form, not on actual index
 data. Since all my documents have a unique ID, I'd like to supply an
 ID and a query, and get back the same index/query sections - using
 what's actually in the index.


 -- Forwarded message --
 From: Erik Hatcher erik.hatc...@gmail.com
 To: solr-user@lucene.apache.org
 Date: Tue, 3 Aug 2010 22:43:17 -0400
 Subject: Re: analysis tool vs. reality
 Did you reindex after changing the schema?


 On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

    Hi Erik, thank you for replying. So, turning on debugQuery shows
    information about how the query is processed- is there a way to see
    how things are stored internally in the index?

    My query is ABC12. There is a document whose title field is
    ABC12. However, I can only get it to match if I search for ABC or
    12. This was also true in the analysis tool up until recently.
    However, I changed schema.xml and turned on catenate-all in
    WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
    tool, ABC12 matches ABC12. However, when doing an actual query, it
    does not match.

    Thank you for any help,
    Justin


    -- Forwarded message --
    From: Erik Hatcher erik.hatc...@gmail.com
    To: solr-user@lucene.apache.org
    Date: Tue, 3 Aug 2010 16:50:06 -0400
    Subject: Re: analysis tool vs. reality
    The analysis tool is merely that, but during querying there is also a
    query parser involved.  Adding debugQuery=true to your request will
    give you the parsed query in the response, offering insight into what
    might be going on.  It could be lots of things, from not querying the
    fields you think you are, to a misunderstanding about some text not
    being analyzed (like wildcard clauses).

         Erik

    On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

      Hello,

      I have found the analysis tool in the admin page to be very useful in
      understanding my schema. I've made changes to my schema so that a
      particular case I'm looking at matches properly. I restarted solr,
      deleted the document from the index, and added it again. But still,
      when I do a query, the document does not get returned in the results.

      Does anyone have any tips for debugging this sort of issue? What is
      different between what I see in analysis tool and new documents added
      to the index?

      Thanks,
      Justin




-- 
Lance Norskog
goks...@gmail.com


Re: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Erick Erickson
I suspect you're running afoul of tokenizers and filters. The parts of your
schema that you published aren't the ones that really count.

What you probably need to look at is the fieldType definitions, i.e. what
analysis is done for, say, text_ws (see <fieldType ...> in your schema).
There you might find things like WordDelimiterFilter with several options,
LowerCaseFilter, etc. Each of these changes what's placed in your index.
Here's a good place to start, although it's not exhaustive:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

The general idea here is that Tokenizers break up the incoming stream
according to various rules, and Filters then (potentially) modify each token
in various ways.
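To make the tokenizer/filter chain concrete, a fieldType definition in
schema.xml looks roughly like this (a sketch along the lines of Solr's
example schema; the exact filters and options are illustrative, not a
prescription for this thread's problem):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- first, split the incoming stream on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- then split/recombine on case changes, digits, and punctuation -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <!-- finally, lowercase every token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Every filter in the chain changes what lands in the index, which is exactly
why a facet over such a field shows fragments rather than the stored value.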

Until you have a firm handle on this process, facets are probably a
distraction. You're better off looking at your index with the admin pages
and/or Luke and/or the LukeRequestHandler.

And do be aware that fields you get back from a request (i.e. a search) are
the stored fields, NOT what's indexed. This may trip you up too...

HTH
Erick

On Wed, Aug 4, 2010 at 5:22 PM, PeterKerk vettepa...@hotmail.com wrote:


 Well the example you provided is 100% relevant to me :)

 I've read the wiki now (SchemaXml,SolrFacetingOverview,Query Syntax,
 SimpleFacetParameters), but still do not have an exact idea of what you
 mean.

 My situation:
 a city field is something that I want users to search on via text input, so
 lets say New Yo would give the results for New York.
 But also a facet Cities is available in which New York is just one of
 the cities that is clickable.

 The other facet is theme, which in my example holds values like
 Gemeentehuis and Strand  Zee, that would not be a thing on which can
 be
 searched via manual input but IS clickable.

 If you look at my schema.xml, do you see stuff im doing that is absolutely
 wrong for the purpose described above? Because as far as I can see the
 documents are indexed correctly (BESIDES the spaces in the fieldvalues).

 Any help is greatly appreciated! :)
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023992.html
 Sent from the Solr - User mailing list archive at Nabble.com.



XML Format

2010-08-04 Thread twojah

<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CLIENT_ID">27017</int>
<str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

above is my recent XML list. I can't search for, for example, the word
"bracket" - it returns an empty list. After searching on the internet, I
found out that there is a mistake in my XML schema; I should change the
schema so it will return the list below (see the bolded lines):

<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CLIENT_ID">27017</int>
<arr name="AUC_DESCR_SHORT"><str>Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str></arr>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<arr name="AUC_TITLE"><str>[UPDATE] BRACKET Projector dan LCD/PLASMA
TV</str></arr>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

my question is: how do I change my schema so it will return the list like the
one above with the bolded lines?
thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/XML-Format-tp1024608p1024608.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread jayendra patil
ContentStreamUpdateRequest seems to read the file contents and transfer them
over HTTP, which slows down the indexing.

Try using StreamingUpdateSolrServer with the stream.file param @
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post

e.g.

SolrServer server = new StreamingUpdateSolrServer("<Solr Server URL>", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{"<local file path>"});
params.set("literal.id", "<value>");
req.setParams(params);
server.request(req);
server.commit();

Regards,
Jayendra

On Wed, Aug 4, 2010 at 3:01 PM, Tod listac...@gmail.com wrote:

 I'm running a slight variation of the example code referenced below and it
 takes a real long time to finally execute.  In fact it hangs for a long time
 at solr.request(up) before finally executing.  Is there anything I can look
 at or tweak to improve performance?

 I am also indexing a local pdf file, there are no firewall issues, solr is
 running on the same machine, and I tried the actual host name in addition to
 localhost but nothing helps.


 Thanks - Tod

 http://wiki.apache.org/solr/ContentStreamUpdateRequestExample



how to take a value from the query result

2010-08-04 Thread twojah

this is my query in the browser navigation toolbar:
http://172.16.17.126:8983/search/select/?q=AUC_ID:607136

and this is the result in the browser page:
...
<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CAT">576</int>
<int name="AUC_CLIENT_ID">27017</int>
<str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

I want to get the AUC_CAT value (576) and use it in my PHP code. How can I
get that value?
please help
thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-take-a-value-from-the-query-result-tp1025119p1025119.html
Sent from the Solr - User mailing list archive at Nabble.com.
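Any XML parser can pull the AUC_CAT value out of that response (the asker
mentions PHP; in PHP the usual route is requesting wt=json and using
json_decode). A minimal sketch in Java using only the JDK's DOM and XPath
APIs - the inline XML string stands in for the body fetched from the select
URL, and the class name is mine:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ExtractFacetValue {
    public static void main(String[] args) throws Exception {
        // In a real setup this XML would be fetched from the Solr select URL;
        // here we parse an inline sample with the same shape.
        String xml = "<response><result name=\"response\"><doc>"
                   + "<int name=\"AUC_CAT\">576</int>"
                   + "<int name=\"AUC_ID\">607136</int>"
                   + "</doc></result></response>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        // Pull the text content of the <int name="AUC_CAT"> element.
        XPath xp = XPathFactory.newInstance().newXPath();
        String aucCat = xp.evaluate("//doc/int[@name='AUC_CAT']", doc);
        System.out.println(aucCat); // prints 576
    }
}
```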