Re: mergeFactor / indexing speed

2009-08-04 Thread Chantal Ackermann

Hi Avlesh,
hi Otis,
hi Grant,
hi all,


(enumerating to keep track of all the input)

a) mergeFactor 1000 too high
I'll change that back to 10. I thought it would make Lucene use more RAM 
before starting IO.


b) ramBufferSize:
OK, or maybe more. I'll keep that in mind.

c) solrconfig.xml - default and main index:
I've always changed both sections, the default and the main index one.

d) JDBC batch size:
I haven't set it. I'll do that.

e) DB server performance:
I agree, ping is definitely not much information. I also ran queries 
against it from my own computer (while the indexer was running), and 
they came back as fast as usual.
Currently, I don't have an SSH login for that machine, but I'm going 
to try to get one.


f) Network:
I'll definitely need to have a look at that once I have access to the db 
machine.



g) the data

g.1) nested entity in DIH conf
there is only the root entity and one nested entity. However, that nested 
entity returns multiple rows (about 10) per query. (The fetched-row count 
is about 10 times the number of processed documents.)


g.2) my custom EntityProcessor
( The code is pasted at the very end of this e-mail. )
- iterates over those multiple rows,
- uses one column to create a key in a map,
- uses two other columns to create the corresponding value (String 
concatenation),
- if a key already exists, it fetches the existing value; if that value is 
a list, it appends the new value; if it's not a list, it creates one 
and adds both the old and the new value to it.
I refrained from adding any business logic to that processor. It treats 
all rows alike, no matter whether they hold values that may appear 
multiple times or values that must appear only once.
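In isolation, that collapsing logic amounts to something like the following standalone sketch (column names "name", "value", and "extra" are made up for illustration; the real processor extends SqlEntityProcessor and works on DIH rows):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Collapses multiple child rows into one map: the first occurrence of a
// key stores a single value, later occurrences promote it to a List.
public class RowCollapser {
    @SuppressWarnings("unchecked")
    public static Map<String, Object> collapse(List<Map<String, String>> rows) {
        Map<String, Object> doc = new HashMap<String, Object>();
        for (Map<String, String> row : rows) {
            String key = row.get("name");                             // column used as the key
            String value = row.get("value") + "|" + row.get("extra"); // concatenated columns
            Object existing = doc.get(key);
            if (existing == null) {
                doc.put(key, value);                         // first value: store as-is
            } else if (existing instanceof List) {
                ((List<Object>) existing).add(value);        // already a list: append
            } else {
                List<Object> list = new ArrayList<Object>(); // promote single value to list
                list.add(existing);
                list.add(value);
                doc.put(key, list);
            }
        }
        return doc;
    }
}
```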


g.3) the two transformers
- to split one value into two (regex)

<field column="person" />
<field column="participant" sourceColName="person" regex="([^\|]+)\|.*" />
<field column="role" sourceColName="person" 
regex="[^\|]+\|\d+,\d+,\d+,(.*)" />


- to extract a number from an existing number (a bit calculation 
using the script transformer). As that one works on a field that is 
potentially multiValued, it needs to take care of creating and 
populating a list as well.

<field column="cat" name="cat" />
<script><![CDATA[
function getMainCategory(row) {
    var cat = row.get('cat');
    var mainCat;
    if (cat != null) {
        // check whether cat is an array
        if (cat instanceof java.util.List) {
            var arr = new java.util.ArrayList();
            for (var i = 0; i < cat.size(); i++) {
                mainCat = new java.lang.Integer(cat.get(i) >> 8);
                if (!arr.contains(mainCat)) {
                    arr.add(mainCat);
                }
            }
            row.put('maincat', arr);
        } else { // it is a single value
            var mainCat = new java.lang.Integer(cat >> 8);
            row.put('maincat', mainCat);
        }
    }
    return row;
}
]]></script>
(The EpgValueEntityProcessor decides on creating lists on a case by case 
basis: only if a value is specified multiple times for a certain data 
set does it create a list. This is because I didn't want to put any 
complex configuration or business logic into it.)


g.4) fields
the DIH extracts 5 fields from the root entity and 11 fields from the 
nested entity, and the transformers may create 3 additional (multiValued) 
ones. schema.xml defines 21 fields (two additional fields: the timestamp 
field (default=NOW) and a field collecting three other text fields for the 
default search (using copyField)):

- 2 long
- 3 integer
- 3 sint
- 3 date
- 6 text_cs (class="solr.TextField" positionIncrementGap="100"):
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
          generateWordParts="0" generateNumberParts="0" catenateWords="0"
          catenateNumbers="0" catenateAll="0" />
</analyzer>
- 4 text_de (one is the field populated by copying from the 3 others):
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory" />
  <filter class="solr.LengthFilterFactory" min="2" max="5000" />
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords_de.txt" />
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.SnowballPorterFilterFactory" language="German" />
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>


Thank you for taking your time!
Cheers,
Chantal





** EpgValueEntityProcessor.java ***

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.SqlEntityProcessor;

public class 

Functions in search result

2009-08-04 Thread Markus Jelsma - Buyways B.V.
Solr people,


Can I retrieve results from a function query? For instance, I have a
schema in which all documents have a size-in-bytes field. For each
query, I also need the sum of the bytes field over the returned documents.
I know I can use sum() as part of a function query, but I cannot figure
out whether it even works for my case.

I would prefer doing it with Solr and having the sum in the response
header or somewhere similar, instead of iterating over the entire
result set myself. Also, iterating over the result set would not really
work for me either, since I also use paging through start= and rows= to
limit the shown documents while still keeping the sum of bytes the same.
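(One way to get such a sum from Solr itself, assuming the StatsComponent that ships with Solr 1.4 and a numeric field here hypothetically named "bytes", is to request field statistics alongside the normal results; the statistics are computed over the whole result set, independent of start/rows paging. A sketch of such a request:

```
/solr/select?q=myquery&start=0&rows=10&stats=true&stats.field=bytes
```

The response then carries a stats section with sum, min, max, and count for that field.)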


Regards,

-  
Markus Jelsma  Buyways B.V. Tel. 050-3118123
Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
http://www.buyways.nl  9743 AD GroningenKvK  01074105



Re: How to configure Solr in Glassfish ?

2009-08-04 Thread Ilan Rabinovitch

On 7/20/09 11:08 PM, huenzhao wrote:


Yes, I don't know how to set solr.home in Glassfish on CentOS.
I tried to configure solr.home, but the error log says: looking for
solr.xml: /var/deploy/solr/solr.xml



Is that the appropriate path for your solr.home?  What did you intend to 
set it to?





--
Ilan Rabinovitch
i...@fonz.net

---
SCALE 8x: 2010 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org



Re: Rotating the primary shard in /solr/select

2009-08-04 Thread Shalin Shekhar Mangar
On Wed, Jul 29, 2009 at 2:57 AM, Phillip Farber pfar...@umich.edu wrote:


 Is there any value in a round-robin scheme to cycle through the Solr
 instances supporting a multi-shard index over several machines when sending
 queries, or is it better to just pick one instance and stick with it? I'm
 assuming all machines in the cluster have the same hardware specs.

 So scenario A (round-robin):

 query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
 query 2: /solr-shard-2/select?q=dog... shards=shard-1,shard2
 query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2
 etc.

 or scenario B (fixed):

 query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
 query 2: /solr-shard-1/select?q=dog... shards=shard-1,shard2
 query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2
 etc.

 Is there evidence that distributing the overhead of result merging over
 more machines (A) gives a performance boost?


We issue distributed search queries through a load balancer. So in effect,
the merging server (or aggregator) keeps changing. I don't know if that
leads to a performance boost or not but I guess spreading the load is a good
idea.
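That client-side spreading can be sketched as a simple round-robin over the shard base URLs (host names below are hypothetical), so the merging/aggregating instance changes from query to query; each request would still carry the shards parameter as in scenario A:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rotates through a fixed set of Solr base URLs so that the
// aggregating shard changes on every distributed query.
public class ShardRoundRobin {
    private final String[] baseUrls;
    private final AtomicInteger next = new AtomicInteger(0);

    public ShardRoundRobin(String[] baseUrls) {
        this.baseUrls = baseUrls;
    }

    /** Returns the base URL to send the next distributed query to. */
    public String next() {
        // floorMod keeps the index valid even after counter overflow
        int i = Math.floorMod(next.getAndIncrement(), baseUrls.length);
        return baseUrls[i];
    }
}
```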

-- 
Regards,
Shalin Shekhar Mangar.


Re: Rotating the primary shard in /solr/select

2009-08-04 Thread Shalin Shekhar Mangar
On Tue, Aug 4, 2009 at 11:26 AM, Rahul R rahul.s...@gmail.com wrote:

 Philip,
 I cannot answer your question, but I do have a question for you. Does
 aggregation happen at the primary shard ? For eg : if I have three JVMs
 JVM 1 : My application powered by Solr
 JVM 2 : Shard 1
 JVM 3 : Shard 2

 I initialize my SolrServer like this
 SolrServer _solrServer = *new* CommonsHttpSolrServer(shard1);

 Does aggregation now happen at JVM 2 ?


Yes.


 Is there any other reason for
 initializing the SolrServer with one of the shard URLs ?


The SolrServer is initialized to the server to which you want to send the
request. It has nothing to do with distributed search by itself.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Rotating the primary shard in /solr/select

2009-08-04 Thread Rahul R
*The SolrServer is initialized to the server to which you want to send the
request. It has nothing to do with distributed search by itself.*

But isn't the request sent to all the shards? We set all the shard URLs in
the 'shards' parameter of our HTTP request. Or is it that the request is
first sent to the server (with which SolrServer is initialized) and from
there sent to all the other shards?

Regards
Rahul



eternal optimize interrupted

2009-08-04 Thread Thomas Koch
Hi, 

last evening we started an optimize of our Solr index of 45GB. This morning 
the optimize was still running, discs spinning like crazy, and the index 
directory had grown to 83GB.
We stopped and restarted Tomcat since Solr was unresponsive and we needed to 
query the index.
Now I don't know what to do. How can I find out what ratio of the index is 
optimized, and how many nights will it take to finish?

Best regards,

Thomas Koch, http://www.koch.ro


Re: Rotating the primary shard in /solr/select

2009-08-04 Thread Shalin Shekhar Mangar
On Tue, Aug 4, 2009 at 2:37 PM, Rahul R rahul.s...@gmail.com wrote:

 *The SolrServer is initialized to the server to which you want to send the
 request. It has nothing to do with distributed search by itself.*

 But isn't the request sent to all the shards ? We set all the shard urls in
 the 'shards' parameter of our HttpRequest.Or is it something like the
 request is first sent to the server (with which SolrServer is initialized)
 and from there it is sent to all the other shards ?


The request is sent to the server with which SolrServer is initialized. That
server makes use of the shards parameter, queries other servers, merges the
responses and sends it back to the client.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Picking Facet Fields by Frequency-in-Results

2009-08-04 Thread Erik Hatcher
And further on this, if you want a field automatically added to each  
document with the list of its field names, check out http://issues.apache.org/jira/browse/SOLR-1280


Erik



On Aug 4, 2009, at 1:01 AM, Avlesh Singh wrote:

I understand the general need here. And just extending what you  
suggested
(indexing the fields themselves inside a multiValued field), you can  
perform

a query like this -
/search?q=myquery&facet=true&facet.field=indexedfields&facet.field=field1&facet.field=field2...&facet.sort=true


You'll get facets for all the fields (passed as multiple facet.field
params), including the one that gives you field frequency. You can do all
sorts of post-processing on this data to achieve the desired result.

Hope this helps.

Cheers
Avlesh

On Tue, Aug 4, 2009 at 2:20 AM, Chris Harris rygu...@gmail.com  
wrote:



One task when designing a facet-based UI is deciding which fields to
facet on and display facets for. One possibility that I hope to
explore is to determine which fields to facet on dynamically, based  
on

the search results. In particular, I hypothesize that, for a somewhat
heterogeneous index (heterogeneous in terms of which fields a given
record might contain), the following rule might be helpful: facet
on a given field to the extent that it is frequently set in the
documents matching the user's search.

For example, let's say my results look like this:

Doc A:
f1: foo
f2: bar
f3: N/A
f4: N/A

Doc B:
f1: foo2
f2: N/A
f3: N/A
f4: N/A

Doc C:
f1: foo3
f2: quiz
f3: N/A
f4: buzz

Doc D:
f1: foo4
f2: question
f3: bam
f4: bing

The field usage information for these documents could be summarized  
like

this:

field f1: Set in 4 docs
field f2: Set in 3 docs
field f3: Set in 1 doc
field f4: Set in 2 docs

If I were choosing facet fields based on the above rule, I would
definitely want to display facets for field f1, since it occurs in all
documents. If I had room for another facet in the UI, I would facet
f2. If I wanted another one, I'd go with f4, since it's more popular
than f3. I probably would ignore f3 in any case, because it's set for
only one document.

Has anyone implemented such a scheme with Solr? Any success? (The
closest thing I can find is
http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
to pick which facets to display based not on frequency but based more
on a ruleset.)

As far as implementation goes, the most straightforward approach (which
wouldn't involve modifying Solr) would apparently be to add a new
multiValued indexedfields field to each document, which would note
which fields actually have a value in that document. So when I pass
data to Solr at indexing time, it will look something like this
(except of course it will be in valid Solr XML, rather than this
schematic):

Doc A:
f1: foo
f2: bar
indexedfields: f1, f2

Doc B:
f1: foo2
indexedfields: f1

Doc C:
f1: foo3
f2: quiz
f4: buzz
indexedfields: f1, f2, f4

Doc D:
f1: foo4
f2: question
f3: bam
f4: bing
indexedfields: f1, f2, f3, f4

Then to choose which facets to display, I call

http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true

and use the frequency information from this query to determine which
fields to display in the faceting UI. (To get the actual facet
information for those fields, I would query Solr a second time.)
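The bookkeeping for that extra field can be done at indexing time with a few lines; a sketch (field names and the map-of-values representation are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Derives the multiValued "indexedfields" payload for one document:
// the names of all fields that actually carry a value.
public class IndexedFieldsBuilder {
    public static List<String> indexedFields(Map<String, Object> doc) {
        List<String> names = new ArrayList<String>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() != null) { // keep only fields that are actually set
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```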

Are there any alternatives that would be easier or more efficient?

Thanks,
Chris





Re: Rotating the primary shard in /solr/select

2009-08-04 Thread Rahul R
Shalin, thank you for the clarification.

Philip, I just realized that I have diverted the original topic of the
thread. My apologies.

Regards
Rahul




Synonym aware string field type

2009-08-04 Thread Jérôme Etévé
Hi all,

I'd like to have a string type which is synonym aware at query time.
Is it ok to have something like that:

<fieldType name="sastring" class="solr.StrField">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            synonyms="my_synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>


My questions are:

- Will the index-time analyzer stay the default for the type solr.StrField?
- Is the KeywordTokenizerFactory the right one to use for the query-time
analyzer?

Cheers!

Jerome.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: ClassCastException from custom request handler

2009-08-04 Thread James Brady
Solr version: 1.3.0 694707

solrconfig.xml:
<requestHandler name="livecores" class="LiveCoresHandler" />

public class LiveCoresHandler extends RequestHandlerBase {
    public void init(NamedList args) { }
    public String getDescription() { return ""; }
    public String getSource() { return ""; }
    public String getSourceId() { return ""; }
    public NamedList getStatistics() { return new NamedList(); }
    public String getVersion() { return ""; }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
        Collection<String> names =
            req.getCore().getCoreDescriptor().getCoreContainer().getCoreNames();
        rsp.add("cores", names);
        // if the cores are dynamic, you prob don't want to cache
        rsp.setHttpCaching(false);
    }
}

2009/8/4 Avlesh Singh avl...@gmail.com

 
  I'm sure I have the class name right - changing it to something patently
  incorrect results in the expected org.apache.solr.common.SolrException:
   Error loading class ..., rather than the ClassCastException.
 
 You are right about that, James.

 Which Solr version are you using?
 Can you please paste the relevant pieces in your solrconfig.xml and the
 request handler class you have created?

 Cheers
 Avlesh

 On Mon, Aug 3, 2009 at 10:51 PM, James Brady james.colin.br...@gmail.com
 wrote:

  Hi,
  Thanks for your suggestions!
 
  I'm sure I have the class name right - changing it to something patently
  incorrect results in the expected
  org.apache.solr.common.SolrException: Error loading class ..., rather
  than
  the ClassCastException.
 
  I did have some problems getting my class on the app server's classpath.
  I'm
  running with solr.home set to multicore, but creating a multicore/lib
  directory and putting my request handler class in there resulted in
 Error
  loading class errors.
 
  I found that setting jetty.class.path to include multicore/lib (and also
  explicitly point at Solr's core and common JARs) fixed the Error loading
  class errors, leaving these ClassCastExceptions...
 
  2009/8/3 Avlesh Singh avl...@gmail.com
 
   Can you cross check the class attribute for your handler in
  solrconfig.xml?
   My guess is that it is specified as solr.LiveCoresHandler. It should
 be
   fully qualified class name - com.foo.path.to.LiveCoresHandler instead.
  
   Moreover, I am damn sure that you did not forget to drop your jar into
   solr.home/lib. Checking once again might not be a bad idea :)
  
   Cheers
   Avlesh
  
   On Mon, Aug 3, 2009 at 9:11 PM, James Brady 
 james.colin.br...@gmail.com
   wrote:
  
Hi,
I'm creating a custom request handler to return a list of live cores
 in
Solr.
   
On startup, I get this exception for each core:
   
Jul 31, 2009 5:20:39 PM org.apache.solr.common. SolrException log
SEVERE: java.lang.ClassCastException: LiveCoresHandler
   at
   
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152)
   at
   
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161)
   at
   
   
  
 
 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
   at
   
   
  
 
 org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:444)
   
I've tried a few variations on the class definition, including
  extending
RequestHandlerBase (as suggested here:
   
   
  
 
 http://wiki.apache.org/solr/SolrRequestHandler#head-1de7365d7ecf2eac079c5f8b92ee9af712ed75c2
)
and implementing SolrRequestHandler directly.
   
I'm sure that the Solr libraries I built against and those I'm
 running
  on
are the same version too, as I unzipped the Solr war file and copies
  the
relevant jars out of there to build against.
   
Any ideas on what could be causing the ClassCastException? I've
  attached
   a
debugger to the running Solr process but it didn't shed any light on
  the
issue...
   
Thanks!
James
   
  
 
 
 
  --
  http://twitter.com/goodgravy
  512 300 4210
  http://webmynd.com/
  Sent from Bury, United Kingdom
 




-- 
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom


Re: ClassCastException from custom request handler

2009-08-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
what is the package of LiveCoresHandler ?
I guess the requestHandler name should be name=/livecores




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: ClassCastException from custom request handler

2009-08-04 Thread James Brady
Hi, the LiveCoresHandler is in the default package - the behaviour's the
same if I have it in a properly namespaced package too...

The requestHandler name can either be a path (starting with '/') or a
qt name:
http://wiki.apache.org/solr/SolrRequestHandler

2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 what is the package of LiveCoresHandler ?
 I guess the requestHandler name should be name=/livecores


Solr 1.4 schedule?

2009-08-04 Thread Robert Young
Hi,
When is Solr 1.4 scheduled for release? Is there any ballpark date yet?

Thanks
Rob


Delete solr data from disk space

2009-08-04 Thread Ashish Kumar Srivastava

I am facing a problem in deleting Solr data from disk space.
I had 80GB of Solr data. I deleted 30% of it by query via the
solr-php client and committed.
Now the deleted data is not visible in the Solr UI, but the used disk space
is still 80GB.
Please reply if you have any solution for freeing the disk space after
deleting some Solr data.

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808676.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 schedule?

2009-08-04 Thread Eric Pugh
"Very soon" I think is the answer, as well as "when it's ready". Solr
1.4 is waiting for the next release of Lucene, which is very soon.
Once Lucene comes out, Solr will follow in a week or two, barring
release issues.

Also, if you look at JIRA:
http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351
you can see that there are 34 open issues still assigned to 1.4

Eric


On Tue, Aug 4, 2009 at 8:08 AM, Robert Youngr...@roryoung.co.uk wrote:
 Hi,
 When is Solr 1.4 scheduled for release? Is there any ballpark date yet?

 Thanks
 Rob



Re: Delete solr data from disk space

2009-08-04 Thread Markus Jelsma - Buyways B.V.
Hello,


A rigorous but quite effective method is to manually delete the files in
your SOLR_HOME/data directory and reindex the documents you want. This
will surely free some disk space.


Cheers,

-  
Markus Jelsma  Buyways B.V. Tel. 050-3118123
Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
http://www.buyways.nl  9743 AD GroningenKvK  01074105


On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:

 I am facing a problem in deleting solr data form disk space.
 I had 80Gb of of solr data. I deleted 30% of these data by using query in
 solr-php client and committed.
 Now deleted data is not visible from the solr UI but used disk space is
 still 80Gb for solr data.
 Please reply if you have any solution to free the disk space after deleting
 some solr data.
 
 Thanks in advance.


Re: Delete solr data from disk space

2009-08-04 Thread Ashish Kumar Srivastava

Sorry!! But this solution will not work, because I deleted the data by a
certain query, so how can I know which files should be deleted? I can't
delete the whole data directory.
-- 
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808868.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete solr data from disk space

2009-08-04 Thread Ashish Kumar Srivastava

Hi ,


Sorry, but this solution will not work because I deleted data by a certain
query. How can I know which files should be deleted? I can't delete all
the data.



Markus Jelsma - Buyways B.V. wrote:
 
 Hello,
 
 
 A rigorous but quite effective method is manually deleting the files in
 your SOLR_HOME/data directory and reindex the documents you want. This
 will surely free some diskspace.
 
 
 Cheers,
 
 -  
 Markus Jelsma  Buyways B.V. Tel. 050-3118123
 Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
 http://www.buyways.nl  9743 AD GroningenKvK  01074105
 
 
 On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:
 
 I am facing a problem in deleting solr data form disk space.
 I had 80Gb of of solr data. I deleted 30% of these data by using query in
 solr-php client and committed.
 Now deleted data is not visible from the solr UI but used disk space is
 still 80Gb for solr data.
 Please reply if you have any solution to free the disk space after
 deleting
 some solr data.
 
 Thanks in advance.
 
 

-- 
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808883.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete solr data from disk space

2009-08-04 Thread Otis Gospodnetic
You simply can't delete individual index files.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Ashish Kumar Srivastava ashu.impe...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 9:41:09 AM
 Subject: Re: Delete solr data from disk space
 
 
 Hi ,
 
 
 Sorry!! But this solution will not work because I deleted data by certain
 query.
 Then how can i know which files should be deleted. I cant delete whole data.
 
 
 
 Markus Jelsma - Buyways B.V. wrote:
  
  Hello,
  
  
  A rigorous but quite effective method is manually deleting the files in
  your SOLR_HOME/data directory and reindex the documents you want. This
  will surely free some diskspace.
  
  
  Cheers,
  
  -  
  Markus Jelsma  Buyways B.V. Tel. 050-3118123
  Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
  http://www.buyways.nl  9743 AD GroningenKvK  01074105
  
  
  On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:
  
  I am facing a problem in deleting solr data form disk space.
  I had 80Gb of of solr data. I deleted 30% of these data by using query in
  solr-php client and committed.
  Now deleted data is not visible from the solr UI but used disk space is
  still 80Gb for solr data.
  Please reply if you have any solution to free the disk space after
  deleting
  some solr data.
  
  Thanks in advance.
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808883.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Error with UpdateRequestProcessorFactory

2009-08-04 Thread Daniel Cassiano
Hi folks,

I'm having some problem with a custom handler on my Solr.
All the application works fine, but when I do a new checkout from svn
and generate a jar file with my handler, I got:

SEVERE: java.lang.NoSuchMethodError:
org.apache.solr.core.SolrCore.getUpdateProcessorFactory(Ljava/lang/String;)Lorg/apache/solr/update/processor/UpdateRequestProcessorFactory;

I checked the versions of my libs and they're OK.
I'm using Solr 1.3 and the environment is the same one that worked previously.

Does anyone have an idea of what it could be?

Thanks!

Cheers,
-- 
Daniel Cassiano
_

http://www.apontador.com.br/
http://www.maplink.com.br/


Re: Delete solr data from disk space

2009-08-04 Thread Toby Cole

Hi Ashish,
Have you optimized your index?
When you delete documents in Lucene they are simply marked as 'deleted';
they aren't physically removed from the disk.
To get the disk space back you must run an optimize, which rewrites the
index out to disk without the deleted documents, then deletes the
original.


Toby

On 4 Aug 2009, at 14:41, Ashish Kumar Srivastava wrote:



Hi,

Sorry!! But this solution will not work because I deleted data by certain
query.
Then how can i know which files should be deleted. I cant delete whole data.


Markus Jelsma - Buyways B.V. wrote:

Hello,

A rigorous but quite effective method is manually deleting the files in
your SOLR_HOME/data directory and reindex the documents you want. This
will surely free some diskspace.

Cheers,

-
Markus Jelsma  Buyways B.V. Tel. 050-3118123
Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
http://www.buyways.nl  9743 AD GroningenKvK  01074105

On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:

I am facing a problem in deleting solr data form disk space.
I had 80Gb of of solr data. I deleted 30% of these data by using query in
solr-php client and committed.
Now deleted data is not visible from the solr UI but used disk space is
still 80Gb for solr data.
Please reply if you have any solution to free the disk space after
deleting some solr data.

Thanks in advance.





--
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808883.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Synonym aware string field typ

2009-08-04 Thread Otis Gospodnetic
Hi,

KeywordTokenizer will not tokenize your string.  I have a feeling that won't 
work with synonyms, unless your field value entirely matches a synonym.  Maybe 
an example will help:

If you have:
  foo canine bar
then KeywordTokenizer won't break this into 3 tokens,
and then the canine/dog synonym won't work.
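For reference, the synonyms file that SynonymFilterFactory reads is line-oriented; a hypothetical my_synonyms.txt (the entries here are illustrative, not from this thread) might contain:

```text
# comma-separated terms are treated as equivalent (with expand="true")
dog, canine
# an explicit mapping rewrites the left-hand terms to the right-hand side
i-pod, ipod => ipod
```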

 Yes, if you define the analyzer like that, it will be used both at index and 
query time.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Jérôme Etévé jerome.et...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 7:33:28 AM
 Subject: Synonym aware string field typ
 
 Hi all,
 
 I'd like to have a string type which is synonym aware at query time.
 Is it ok to have something like that:
 
 
   
   
   
 <fieldType name="..." class="solr.StrField">
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
       tokenizerFactory="solr.KeywordTokenizerFactory"
       synonyms="my_synonyms.txt" ignoreCase="true"/>
   </analyzer>
 </fieldType>
 
 My questions are:
 
 - Will the index time analyzer stay the default for the type solr.StrField?
 - Is the KeywordTokenizerFactory the right one to use for the query
 time analyzer?
 
 Cheers!
 
 Jerome.
 
 -- 
 Jerome Eteve.
 
 Chat with me live at http://www.eteve.net
 
 jer...@eteve.net



Re: Functions in search result

2009-08-04 Thread Grant Ingersoll


On Aug 4, 2009, at 4:37 AM, Markus Jelsma - Buyways B.V. wrote:


Solr people,


Can i retrieve results from a function query? For instance, i have a
schema in which all documents have a size in bytes field. For each
query, i also need the sum of the bytes field for the returned
documents. I know i can use SUM as part of a function query but i
cannot figure out if it even works for me.


In short, no.  However, see https://issues.apache.org/jira/browse/SOLR-1298 
 as you are not alone in wanting this.




I prefer doing it with Solr and have the sum in the response
header or somewhere similar instead of iterating over the entire
resultset myself. Also, iterating over the resultset would not really
work for me either, since i also need paging through start= and rows=
to limit the shown documents while still keeping the sum of bytes the
same.


Regards,

-
Markus Jelsma  Buyways B.V. Tel.  
050-3118123
Technisch ArchitectFriesestraatweg 215c Fax.  
050-3118124

http://www.buyways.nl  9743 AD GroningenKvK  01074105



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: JVM Heap utilization Memory leaks with Solr

2009-08-04 Thread Otis Gospodnetic
Hi Rahul,

A) There are no known (to me) memory leaks.
I think there are too many variables for a person to tell you what exactly is 
happening, plus you are dealing with the JVM here. :)

Try 'jmap -histo:live <PID> | less' and see what's using your memory.
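Alongside jmap, the same heap numbers can be read in-process through the JDK's management beans, which is handy for logging how much heap a forced GC actually reclaims. A stdlib-only sketch (note that System.gc() is only a request, not a guarantee):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.gc(); // request a full collection, like the "force GC" above
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // used: live (plus not-yet-collected) bytes; committed: memory the
        // OS has actually handed the JVM; max: the -Xmx ceiling
        System.out.printf("heap used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }
}
```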

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Rahul R rahul.s...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 1:09:06 AM
 Subject: JVM Heap utilization  Memory leaks with Solr
 
 I am trying to track memory utilization with my Application that uses Solr.
 Details of the setup :
 -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0
 - Hardware : 12 CPU, 24 GB RAM
 
 For testing during PSR I am using a smaller subset of the actual data that I
 want to work with. Details of this smaller sub-set :
 - 5 million records, 4.5 GB index size
 
 Observations during PSR:
 A) I have allocated 3.2 GB for the JVM(s) that I used. After all users
 logout and doing a force GC, only 60 % of the heap is reclaimed. As part of
 the logout process I am invalidating the HttpSession and doing a close() on
 CoreContainer. From my application's side, I don't believe I am holding on
 to any resource. I wanted to know if there are known issues surrounding
 memory leaks with Solr ?
 B) To further test this, I tried deploying with shards. 3.2 GB was allocated
 to each JVM. All JVMs had 96 % free heap space after start up. I got varying
 results with this.
 Case 1 : Used 6 weblogic domains. My application was deployed on 1 domain.
 I split the 5 million index into 5 parts of 1 million each and used them as
 shards. After multiple users used the system and doing a force GC, around 94
 - 96 % of heap was reclaimed in all the JVMs.
 Case 2: Used 2 weblogic domains. My application was deployed on 1 domain. On
 the other, I deployed the entire 5 million part index as one shard. After
 multiple users used the system and doing a force GC, around 76 % of the heap
 was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where my
 application was running. This result further convinces me that my
 application can be absolved of holding on to memory resources.
 
 I am not sure how to interpret these results. For searching, I am using
 Without Shards : EmbeddedSolrServer
 With Shards :CommonsHttpSolrServer
 In terms of Solr objects this is what differs in my code between normal
 search and shards search (distributed search)
 
 After looking at Case 1, I thought that the CommonsHttpSolrServer was more
 memory efficient but Case 2 proved me wrong. Or could there still be memory
 leaks in my application ? Any thoughts, suggestions would be welcome.
 
 Regards
 Rahul



Re: ClassCastException from custom request handler

2009-08-04 Thread James Brady
There is *something* strange going on with classloaders; when I put my
.class files in the right place in WEB-INF/lib in a repackaged solr.war
file, it's not found by the plugin loader (Error loading class).

So the plugin classloader isn't seeing stuff inside WEB-INF/lib.

That explains why the plugin loader sees my class files when I point
jetty.class.path at the right directory, but in that situation I also need
to point jetty.class.path at the Solr JARs explicitly.

Still, how would ClassCastExceptions be caused by class loader paths not
being set correctly? I don't follow you... To get a ClassCastException, the
class to cast to must have been found. The cast-to class must not be in the
object's inheritance hierarchy, or be built against a different version, no?

2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 I guess this is a classloader issue. it is worth trying to put it in
 the WEB-INF/lib of the solr.war


 On Tue, Aug 4, 2009 at 5:35 PM, James Bradyjames.colin.br...@gmail.com
 wrote:
  Hi, the LiveCoresHandler is in the default package - the behaviour's the
  same if I have it in a properly namespaced package too...
 
   The requestHandler name can either be a path (starting with '/') or a
   qt name:
  http://wiki.apache.org/solr/SolrRequestHandler
 starting w/ '/' helps in accessing it directly
 
  2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  what is the package of LiveCoresHandler ?
  I guess the requestHandler name should be name=/livecores
 
  On Tue, Aug 4, 2009 at 5:04 PM, James Bradyjames.colin.br...@gmail.com
 
  wrote:
   Solr version: 1.3.0 694707
  
   solrconfig.xml:
   <requestHandler name="livecores" class="LiveCoresHandler" />
  
    public class LiveCoresHandler extends RequestHandlerBase {
        public void init(NamedList args) { }
        public String getDescription() { return ""; }
        public String getSource() { return ""; }
        public String getSourceId() { return ""; }
        public NamedList getStatistics() { return new NamedList(); }
        public String getVersion() { return ""; }

        public void handleRequestBody(SolrQueryRequest req,
            SolrQueryResponse rsp) {
            Collection<String> names =
                req.getCore().getCoreDescriptor().getCoreContainer().getCoreNames();
            rsp.add("cores", names);
            // if the cores are dynamic, you prob don't want to cache
            rsp.setHttpCaching(false);
        }
    }
  
   2009/8/4 Avlesh Singh avl...@gmail.com
  
   
I'm sure I have the class name right - changing it to something
patently
incorrect results in the expected
org.apache.solr.common.SolrException:
Error loading class ..., rather thanthe ClassCastException.
   
   You are right about that, James.
  
   Which Solr version are you using?
   Can you please paste the relevant pieces in your solrconfig.xml and
 the
   request handler class you have created?
  
   Cheers
   Avlesh
  
   On Mon, Aug 3, 2009 at 10:51 PM, James Brady
   james.colin.br...@gmail.com
   wrote:
  
Hi,
Thanks for your suggestions!
   
I'm sure I have the class name right - changing it to something
patently
incorrect results in the expected
org.apache.solr.common.SolrException: Error loading class ...,
rather
than
the ClassCastException.
   
I did have some problems getting my class on the app server's
classpath.
I'm
running with solr.home set to multicore, but creating a
multicore/lib
directory and putting my request handler class in there resulted in
   Error
loading class errors.
   
I found that setting jetty.class.path to include multicore/lib (and
also
explicitly point at Solr's core and common JARs) fixed the Error
loading
class errors, leaving these ClassCastExceptions...
   
2009/8/3 Avlesh Singh avl...@gmail.com
   
 Can you cross check the class attribute for your handler in
solrconfig.xml?
 My guess is that it is specified as solr.LiveCoresHandler. It
 should
   be
 fully qualified class name - com.foo.path.to.LiveCoresHandler
 instead.

 Moreover, I am damn sure that you did not forget to drop your jar
 into
 solr.home/lib. Checking once again might not be a bad idea :)

 Cheers
 Avlesh

 On Mon, Aug 3, 2009 at 9:11 PM, James Brady 
   james.colin.br...@gmail.com
 wrote:

  Hi,
  I'm creating a custom request handler to return a list of live
  cores
   in
  Solr.
 
  On startup, I get this exception for each core:
 
  Jul 31, 2009 5:20:39 PM org.apache.solr.common. SolrException
 log
  SEVERE: java.lang.ClassCastException: LiveCoresHandler
 at
 
  
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152)
 at
 
  
 org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161)
 at
 
 

   
  
  
 

Re: Functions in search result

2009-08-04 Thread Otis Gospodnetic
Markus,

As far as I know, functions are executed on a per-document/field basis.  That 
is, I don't think any of them aggregate numeric field values from a result set.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Markus Jelsma - Buyways B.V. mar...@buyways.nl
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 4:37:09 AM
 Subject: Functions in search result
 
 Solr people,
 
 
 Can i retrieve results from a function query? For instance, i have a
 schema in which all documents have a size in bytes field. For each
 query, i also need to sum of the bytes field for the returned documents.
 I know i can use SUM as part of a function query but i cannot figure it
 out if it even works for me.
 
 I prefer doing it with Solr and have the sum in the in the response
 header or somewhere similar instead of iterating over the entire
 resultset myself. Also, iterating over the resultset would not really
 work for me either since i also need paging through start= and rows= to
 limit the show documents but still keeping the sum of bytes the same.
 
 
 Regards,
 
 -  
 Markus Jelsma  Buyways B.V. Tel. 050-3118123
 Technisch ArchitectFriesestraatweg 215c Fax. 050-3118124
 http://www.buyways.nl  9743 AD GroningenKvK  01074105



Re: 99.9% uptime requirement

2009-08-04 Thread Norberto Meijome
On Mon, 3 Aug 2009 13:15:44 -0700
Robert Petersen rober...@buy.com wrote:

 Thanks all, I figured there would be more talk about daemontools if there
 were really a need.  I appreciate the input and for starters we'll put two
 slaves behind a load balancer and grow it from there.
 

Robert,
not taking away from daemon tools, but daemon tools won't help you if your
whole server goes down.

 don't put all your eggs in one basket - several
servers, load balancer (hardware load balancers x 2, haproxy, etc)

and sure, use daemon tools to keep your services running within each server...

B
_
{Beto|Norberto|Numard} Meijome

Why do you sit there looking like an envelope without any address on it?
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: ClassCastException from custom request handler

2009-08-04 Thread Chantal Ackermann

Hi James!

James Brady schrieb:

There is *something* strange going on with classloaders; when I put my
.class files in the right place in WEB-INF/lib in a repackaged solr.war
file, it's not found by the plugin loader (Error loading class).

So the plugin classloader isn't seeing stuff inside WEB-INF/lib.

That explains why the plugin loader sees my class files when I point
jetty.class.path at the right directory, but in that situation I also need
to point jetty.class.path at the Solr JARs explicitly.


you cannot be sure that it sees *your* files. It only sees a class that 
qualifies with the name that is requested in your code. It's obviously 
not the class the code expects, though - as it results in a 
ClassCastException at some point. It might help to have a look at where 
and why that casting went wrong.
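One concrete way to look at where the casting went wrong is to print which loader delivered each class: a cast fails whenever the two sides come from different loaders, even if the bytes are identical. A stdlib-only sketch of the diagnostic calls (in Solr you would compare LiveCoresHandler.class.getClassLoader() against SolrRequestHandler.class.getClassLoader() the same way):

```java
public class LoaderCheck {
    public static void main(String[] args) {
        // Classes from the bootstrap loader report null; application
        // classes report the loader that actually defined them.
        System.out.println("String loader:      "
                + String.class.getClassLoader());      // prints "null"
        System.out.println("LoaderCheck loader: "
                + LoaderCheck.class.getClassLoader()); // app classloader
    }
}
```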


I wrote a custom EntityProcessor and deployed it first under 
WEB-INF/classes, and now in the plugin directory, and that worked 
without a problem. My first guess is that something with your packaging 
is wrong - what do you mean by default package? What is the full name 
of your class and how does its path in the file system look like?


Can you paste the stack trace of the exception?

Chantal



Still, how would ClassCastExceptions be caused by class loader paths not
being set correctly? I don't follow you... To get a ClassCastException, the
class to cast to must have been found. The cast-to class must not be in the
object's inheritance hierarchy, or be built against a different version, no?

2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com


I guess this is a classloader issue. it is worth trying to put it in
the WEB-INF/lib of the solr.war


On Tue, Aug 4, 2009 at 5:35 PM, James Bradyjames.colin.br...@gmail.com
wrote:

Hi, the LiveCoresHandler is in the default package - the behaviour's the
same if I have it in a properly namespaced package too...

The requestHandler name can start either be a path (starting with '/') or

a

qt name:
http://wiki.apache.org/solr/SolrRequestHandler

starting w/ '/' helps in accessing it directly

2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

what is the package of LiveCoresHandler ?
I guess the requestHandler name should be name=/livecores

On Tue, Aug 4, 2009 at 5:04 PM, James Bradyjames.colin.br...@gmail.com
wrote:

Solr version: 1.3.0 694707

solrconfig.xml:
   <requestHandler name="livecores" class="LiveCoresHandler" />

public class LiveCoresHandler extends RequestHandlerBase {
   public void init(NamedList args) { }
   public String getDescription() { return ""; }
   public String getSource() { return ""; }
   public String getSourceId() { return ""; }
   public NamedList getStatistics() { return new NamedList(); }
   public String getVersion() { return ""; }

   public void handleRequestBody(SolrQueryRequest req,
       SolrQueryResponse rsp) {
       Collection<String> names =
           req.getCore().getCoreDescriptor().getCoreContainer().getCoreNames();
       rsp.add("cores", names);
       // if the cores are dynamic, you prob don't want to cache
       rsp.setHttpCaching(false);
   }
}

2009/8/4 Avlesh Singh avl...@gmail.com


I'm sure I have the class name right - changing it to something
patently
incorrect results in the expected
org.apache.solr.common.SolrException:
Error loading class ..., rather thanthe ClassCastException.


You are right about that, James.

Which Solr version are you using?
Can you please paste the relevant pieces in your solrconfig.xml and

the

request handler class you have created?

Cheers
Avlesh

On Mon, Aug 3, 2009 at 10:51 PM, James Brady
james.colin.br...@gmail.com

wrote:
Hi,
Thanks for your suggestions!

I'm sure I have the class name right - changing it to something
patently
incorrect results in the expected
org.apache.solr.common.SolrException: Error loading class ...,
rather
than
the ClassCastException.

I did have some problems getting my class on the app server's
classpath.
I'm
running with solr.home set to multicore, but creating a
multicore/lib
directory and putting my request handler class in there resulted in

Error

loading class errors.

I found that setting jetty.class.path to include multicore/lib (and
also
explicitly point at Solr's core and common JARs) fixed the Error
loading
class errors, leaving these ClassCastExceptions...

2009/8/3 Avlesh Singh avl...@gmail.com


Can you cross check the class attribute for your handler in

solrconfig.xml?

My guess is that it is specified as solr.LiveCoresHandler. It
should

be

fully qualified class name - com.foo.path.to.LiveCoresHandler
instead.

Moreover, I am damn sure that you did not forget to drop your jar
into
solr.home/lib. Checking once again might not be a bad idea :)

Cheers
Avlesh

On Mon, Aug 3, 2009 at 9:11 PM, James Brady 

james.colin.br...@gmail.com

wrote:
Hi,
I'm creating a custom request handler to return a list of live
cores

in

Solr.

On startup, I get this exception for each core:

Jul 31, 2009 5:20:39 PM org.apache.solr.common. SolrException

log


Wild card search does not return any result

2009-08-04 Thread Mohamed Parvez
Hello All,

   I have two fields.

<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true"/>

I have a document (which has been indexed) that has a value of ICS for the
BUS field and SSE for the ROLE field.

When I search for q=BUS:ics I get the result, but if I search for
q=BUS:ics* I don't get any match (or result).

When I search for q=ROLE:sse or q=ROLE:sse*, both times I get the result.

Why does BUS:ics* not return any result?


I have the default configuration for the text field type, see below.

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords.txt"
        enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



Thanks/Regards,
Parvez

Note: This is a re-post; it looks like something went wrong the first time
around.


Re: Error with UpdateRequestProcessorFactory

2009-08-04 Thread Shalin Shekhar Mangar
On Tue, Aug 4, 2009 at 7:28 PM, Daniel Cassiano danielcassi...@gmail.comwrote:

 Hi folks,

 I'm having some problem with a custom handler on my Solr.
 All the application works fine, but when I do a new checkout from svn
 and generate a jar file with my handler, I got:

 SEVERE: java.lang.NoSuchMethodError:

 org.apache.solr.core.SolrCore.getUpdateProcessorFactory(Ljava/lang/String;)Lorg/apache/solr/update/processor/UpdateRequestProcessorFactory;

 I checked versions of my libs and they're ok.
 I'm using Solr 1.3 and the environment is the same that works previously.


Are you using the released Solr 1.3 or some intermediate nightly build? The
1.3 release has SolrCore.getUpdateProcessorChain(String) method.

-- 
Regards,
Shalin Shekhar Mangar.


Re: ClassCastException from custom request handler

2009-08-04 Thread James Brady
Hi Chantal!
I've included a stack trace below.

I've attached a debugger to the server starting up, and it is finding my
class file as expected... I agree it looks like something wrong with how
I've deployed the compiled code, but perhaps different Solr versions at
compile time and run time? However, I've checked and rechecked that and
can't see a problem!

The actual ClassCastException is being thrown in an anonymous
AbstractPluginLoader instance's create method:
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.3.0/src/java/org/apache/solr/util/plugin/AbstractPluginLoader.java?revision=695557

It's the cast to SolrRequestHandler which fails.

Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /update/csv:
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper
Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
Aug 4, 2009 4:24:25 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException: com.jmsbrdy.LiveCoresHandler
at
org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152)
at
org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169)
at org.apache.solr.core.SolrCore.init(SolrCore.java:444)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

At the moment, my deployment is:

   1. compile my single Java file from an Ant script (pointing at the Solr
   JARs from an exploded solr.war)
   2. copy that class file's directory tree
   (com/jmsbrdy/LiveCoresHandler.class) to a lib in the root of my jetty
   install
   3. add lib to Jetty's class path
   4. add the Solr JARs from the exploded war to Jetty's class path
   5. start the server

Can you see any problems there?

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de

 Hi James!

 James Brady schrieb:

 There is *something* strange going on with classloaders; when I put my
 .class files in the right place in WEB-INF/lib in a repackaged solr.war
 file, it's not found by the plugin loader (Error loading class).

 So the plugin classloader isn't seeing stuff inside WEB-INF/lib.

 That explains why the plugin loader sees my class files when I point
 jetty.class.path at the right directory, but in that situation I also need
 to point jetty.class.path at the Solr JARs explicitly.


 you cannot be sure that it sees *your* files. It only sees a class that
 qualifies with the name that is requested in your code. It's obviously not
 the class the code expects, though - as it results in a ClassCastException
 at some point. It might help to have a look at where and why that casting
 went wrong.

 I wrote a custom EntityProcessor and deployed it first under
 WEB-INF/classes, and now in the plugin directory, and that worked without a
 problem. My first guess is that something with your packaging is wrong -
 what do you mean by default package? What is the full name of your class
 and how does its path in the file system look like?

 Can you paste the stack trace of the exception?

 Chantal



 Still, how would ClassCastExceptions be caused by class loader paths not
 being set correctly? I don't follow you... To get a ClassCastException,
 the
 class to cast to must have been found. The cast-to class must not be in
 the
 object's inheritance hierarchy, or be built against a different version,
 no?

 2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  I guess this is a classloader issue. it is worth trying to put it in
 the WEB-INF/lib of the solr.war


 On Tue, Aug 4, 2009 at 5:35 PM, James Bradyjames.colin.br...@gmail.com
 wrote:

 Hi, the LiveCoresHandler is in the default package - the behaviour's the
 same if I have it in a properly namespaced package too...

 The requestHandler name can start either be a path (starting with '/')
 or

 a

 qt name:
 http://wiki.apache.org/solr/SolrRequestHandler

 starting w/ '/' helps in accessing it directly

 2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 what is the package of LiveCoresHandler ?
 I guess the requestHandler name should be name=/livecores

 On Tue, Aug 4, 2009 at 5:04 PM, James Brady
 james.colin.br...@gmail.com
 wrote:

 Solr version: 1.3.0 694707

 solrconfig.xml:
   <requestHandler name="livecores" class="LiveCoresHandler" />

 public class LiveCoresHandler extends RequestHandlerBase {
   public void init(NamedList args) { }
   public String getDescription() { 

Re: 99.9% uptime requirement

2009-08-04 Thread Walter Underwood
Right. You don't get to 99.9% by assuming that an 8 hour outage is OK.  
Design for continuous uptime, with plans for how long it takes to  
patch around a single point of failure. For example, if your load  
balancer is a single point of failure, make sure that you can redirect  
the front end servers to a single Solr server in much less than 8 hours.


Also, think about your SLA. Can the search index be more than 8 hours  
stale? How quickly do you need to be able to replace a failed indexing  
server? You might be able to run indexing locally on each search  
server if they are lightly loaded.


wunder

On Aug 4, 2009, at 7:11 AM, Norberto Meijome wrote:


On Mon, 3 Aug 2009 13:15:44 -0700
Robert Petersen rober...@buy.com wrote:

Thanks all, I figured there would be more talk about daemontools if  
there
were really a need.  I appreciate the input and for starters we'll  
put two

slaves behind a load balancer and grow it from there.



Robert,
not taking away from daemon tools, but daemon tools won't help you  
if your

whole server goes down.

don't put all your eggs in one basket - several
servers, load balancer (hardware load balancers x 2, haproxy, etc)

and sure, use daemon tools to keep your services running within each  
server...


B
_
{Beto|Norberto|Numard} Meijome

Why do you sit there looking like an envelope without any address  
on it?

  - Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery  
when wet.
Reading disclaimers makes you go blind. Writing them is worse. You  
have been

Warned.





Re: Synonym aware string field typ

2009-08-04 Thread Jérôme Etévé
Hi Otis,

 Thanks. Yep, this synonym behaviour is the one I want.

 So if I don't want the synonyms to be applied at index time, I need
to specify an index-time analyzer, right?

Jerome.


2009/8/4 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Hi,

 KeywordTokenizer will not tokenize your string.  I have a feeling that won't 
 work with synonyms unless your field value entirely matches a synonym.  Maybe 
 an example would help:

 If you have:
  foo canine bar
 Then KeywordTokenizer won't break this into 3 tokens.
 And then canine/dog synonym won't work.

  Yes, if you define the analyzer like that, it will be used both at index and 
 query time.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Jérôme Etévé jerome.et...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 7:33:28 AM
 Subject: Synonym aware string field typ

 Hi all,

 I'd like to have a string type which is synonym aware at query time.
 Is it ok to have something like that:





 tokenizerFactory=solr.KeywordTokenizerFactory
 synonyms=my_synonyms.txt ignoreCase=true/





 My questions are:

 - Will the index time analyzer stay the default for the type solr.StrField .
 - Is the KeywordTokenizerFactory the right one to use for the query
 time analyzer ?

 Cheers!

 Jerome.

 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net





-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net
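The analyzer definition quoted in this thread lost its markup in the archive. As a hedged sketch only: Solr applies custom analyzers to solr.TextField (not solr.StrField), so string-like behaviour with query-time-only synonyms is usually built from a TextField with a KeywordTokenizer. The tokenizer and synonym settings below come from the surviving fragments; the field-type name and other attributes are assumptions:

```xml
<fieldType name="string_syn" class="solr.TextField" sortMissingLast="true">
  <!-- no synonyms at index time: index the raw value as a single token -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <!-- synonyms applied only when querying -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            synonyms="my_synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Defining both analyzers explicitly also answers the two questions above: the index-time behaviour no longer depends on any default, and KeywordTokenizerFactory keeps the whole value as one token on the query side.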


Re: ClassCastException from custom request handler

2009-08-04 Thread Chantal Ackermann

Hi there,

could it be that something in the Generics code in the plugin loader 
classes does not work as expected? Citing, for example,

http://stackoverflow.com/questions/372250/java-generics-arrays-and-the-classcastexception
this is because

Generics only provide type-safety at compile-time.


Lines 80-84:
@SuppressWarnings("unchecked")
protected T create( ResourceLoader loader, String name, String 
className, Node node ) throws Exception

{
  return (T) loader.newInstance( className, getDefaultPackages() );
}

I am not sure what T is at runtime in this case. The subclass (anonymous 
in RequestHandlers line 139) replaces T with SolrRequestHandler. But 
what happens in the superclass? Is it using Object? Sorry, I'm not that 
deep into Generics.
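Chantal's suspicion can be reproduced with plain Java: after type erasure, the unchecked cast inside a generic method is a no-op at runtime, and the ClassCastException only appears where the caller assigns the result to a concrete type. A minimal sketch (class and method names are illustrative, not Solr's):

```java
public class ErasureDemo {
    // Shaped like AbstractPluginLoader.create(): the unchecked cast to T
    // compiles, but after erasure nothing is actually checked here.
    @SuppressWarnings("unchecked")
    static <T> T create(Object instance) {
        return (T) instance; // no runtime check happens at this line
    }

    static String demo() {
        // Assigning to Object needs no checkcast, so this "succeeds"
        // even though the object is a String, not a Runnable:
        Object ok = ErasureDemo.<Runnable>create("not a Runnable");
        try {
            // Assigning to Runnable makes the compiler insert a real
            // checkcast at the call site -- this is where it blows up:
            Runnable r = ErasureDemo.<Runnable>create("not a Runnable");
            r.run();
            return "no exception";
        } catch (ClassCastException e) {
            return "ClassCastException at the call site";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This matches the stack trace in the thread: the exception is reported from the caller (RequestHandlers / AbstractPluginLoader), not from inside `newInstance`.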


Chantal



James Brady schrieb:

Hi Chantal!
I've included a stack trace below.

I've attached a debugger to the server starting up, and it is finding my
class file as expected... I agree it looks like something wrong with how
I've deployed the compiled code, but perhaps different Solr versions at
compile time and run time? However, I've checked and rechecked that and
can't see a problem!

The actual ClassCastException is being thrown in an anonymous
AbstractPluginLoader instance's create method:
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.3.0/src/java/org/apache/solr/util/plugin/AbstractPluginLoader.java?revision=695557

It's the cast to SolrRequestHandler which fails.

Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /update/csv:
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper
Aug 4, 2009 4:24:25 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
Aug 4, 2009 4:24:25 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException: com.jmsbrdy.LiveCoresHandler
at
org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:152)
at
org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:161)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:169)
at org.apache.solr.core.SolrCore.init(SolrCore.java:444)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:104)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

At the moment, my deployment is:

   1. compile my single Java file from an Ant script (pointing at the Solr
   JARs from an exploded solr.war)
   2. copy that class file's directory tree
   (com/jmsbrdy/LiveCoresHandler.class) to a lib in the root of my jetty
   install
   3. add lib to Jetty's class path
   4. add the Solr JARs from the exploded war to Jetty's class path
   5. start the server

Can you see any problems there?

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de


Hi James!

James Brady schrieb:


There is *something* strange going on with classloaders; when I put my
.class files in the right place in WEB-INF/lib in a repackaged solr.war
file, it's not found by the plugin loader (Error loading class).

So the plugin classloader isn't seeing stuff inside WEB-INF/lib.

That explains why the plugin loader sees my class files when I point
jetty.class.path at the right directory, but in that situation I also need
to point jetty.class.path at the Solr JARs explicitly.


you cannot be sure that it sees *your* files. It only sees a class whose
name matches the one requested in your code. It's obviously not
the class the code expects, though, as it results in a ClassCastException
at some point. It might help to have a look at where and why that casting
went wrong.
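One quick way to check the "same name, different loader" hypothesis is to compare class loaders directly: two classes with identical names loaded by different loaders are distinct runtime types and cannot be cast to one another. A small stdlib-only sketch of the diagnostic (not Solr-specific):

```java
public class LoaderCheck {
    // When a cast fails unexpectedly, compare the loader of the object's
    // class with the loader of the cast-to type; if they differ, the two
    // types are unrelated even when their names match.
    static boolean sameLoader(Class<?> a, Class<?> b) {
        return a.getClassLoader() == b.getClassLoader();
    }

    public static void main(String[] args) {
        // Bootstrap-loaded classes report a null loader:
        System.out.println(String.class.getClassLoader());
        // Application classes report the system (or webapp) loader:
        System.out.println(LoaderCheck.class.getClassLoader());
        System.out.println(sameLoader(String.class, LoaderCheck.class));
    }
}
```

In the Jetty setup described here, logging `LiveCoresHandler.class.getClassLoader()` next to `SolrRequestHandler.class.getClassLoader()` inside the handler's static initializer would show whether the plugin and the Solr JARs are being loaded by different loaders.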

I wrote a custom EntityProcessor and deployed it first under
WEB-INF/classes, and now in the plugin directory, and that worked without a
problem. My first guess is that something with your packaging is wrong -
what do you mean by default package? What is the full name of your class
and what does its path in the file system look like?

Can you paste the stack trace of the exception?

Chantal




Still, how would ClassCastExceptions be caused by class loader paths not
being set correctly? I don't follow you... To get a ClassCastException,
the
class to cast to must have been found. The cast-to class must either not be
in the object's inheritance hierarchy, or have been built against a
different version, no?

2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 I guess this is a classloader issue. it is worth trying to put it in

the WEB-INF/lib of the solr.war


On Tue, Aug 4, 2009 at 5:35 PM, James Bradyjames.colin.br...@gmail.com
wrote:


Hi, the LiveCoresHandler is in the default package - the 

Re: Wild card search does not return any result

2009-08-04 Thread Otis Gospodnetic
Could it be the same reason as described here:

http://markmail.org/message/ts65a6jok3ii6nva

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
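The linked thread concerns analysis mismatches: wildcard and prefix queries are not run through the field's analyzer, so they match indexed terms literally. A plain-Java sketch of how a transformed index term can stop matching a literal prefix — the shortened term below is purely illustrative, not what Solr actually stores for "ics":

```java
public class WildcardMismatchDemo {
    // Prefix (wildcard) queries match indexed terms by literal prefix,
    // without analyzing the query text first.
    static boolean prefixMatches(String indexedTerm, String wildcardPrefix) {
        return indexedTerm.startsWith(wildcardPrefix);
    }

    public static void main(String[] args) {
        // If the analyzer indexed "ics" unchanged, the prefix lines up:
        System.out.println(prefixMatches("ics", "ics"));
        // But if an index-time filter (stemmer, etc.) transformed the
        // term (hypothetical value), the literal prefix no longer matches:
        System.out.println(prefixMatches("ic", "ics"));
    }
}
```

Comparing the analysis output for BUS and ROLE on the analysis admin page would show whether the indexed form of "ics" still starts with the queried prefix.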



- Original Message 
 From: Mohamed Parvez par...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 11:26:45 AM
 Subject: Wild card search does not return any result
 
 Hello All,
 
I have two fields.
 
 
 
 
 I have a document (which has been indexed) that has a value of ICS for the
 BUS field and SSE for the ROLE field.
 
 When I search for q=BUS:ics I get the result, but if I search for q=BUS:ics*
 I don't get any match (or result).
 
 When I search for q=ROLE:sse or q=ROLE:sse*, I get the result both times.
 
 Why does BUS:ics* return no result?
 
 
 I have the default configuration for the text field; see below.
 
 
 positionIncrementGap=100
   
 
 
 
 
 ignoreCase=true
 words=stopwords.txt
 enablePositionIncrements=true
 /
 
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
 
 
 protected=protwords.txt/
 
   
   
 
 
 ignoreCase=true expand=true/
 
 words=stopwords.txt/
 
 generateWordParts=1 generateNumberParts=1 catenateWords=0
 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
 
 
 protected=protwords.txt/
 
   
 
 
 
 
 Thanks/Regards,
 Parvez
 
 Note : This is a re-post. looks like something went wrong the first time
 around.



Re: Synonym aware string field typ

2009-08-04 Thread Otis Gospodnetic
Yes, you need to specify one or the other then, index-time or query-time, 
depending on where you want your synonyms to kick in.

Eh, hitting reply to this email used your personal email instead of 
solr-user@lucene.apache.org .  Eh eh. Making it hard for people replying to 
keep the discussion on the list without doing extra work

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Jérôme Etévé jerome.et...@gmail.com
 To: Otis Gospodnetic otis_gospodne...@yahoo.com
 Cc: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 12:39:33 PM
 Subject: Re: Synonym aware string field typ
 
 Hi Otis,
 
 Thanks. Yep, this synonym behaviour is the one I want.
 
 So if I don't want the synonyms to be applied at index time, I need
 to specify an index time analyzer right ?
 
 Jerome.
 
 
 2009/8/4 Otis Gospodnetic :
  Hi,
 
  KeywordTokenizer will not tokenize your string.  I have a feeling that 
  won't 
 work with synonyms, unless your field value entirely match a synonym.  Maybe 
 an 
 example would help:
 
  If you have:
   foo canine bar
  Then KeywordTokenizer won't break this into 3 tokens.
  And then canine/dog synonym won't work.
 
   Yes, if you define the analyzer like that, it will be used both at index 
  and 
 query time.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: Jérôme Etévé 
  To: solr-user@lucene.apache.org
  Sent: Tuesday, August 4, 2009 7:33:28 AM
  Subject: Synonym aware string field typ
 
  Hi all,
 
  I'd like to have a string type which is synonym aware at query time.
  Is it ok to have something like that:
 
 
 
 
 
  tokenizerFactory=solr.KeywordTokenizerFactory
  synonyms=my_synonyms.txt ignoreCase=true/
 
 
 
 
 
  My questions are:
 
  - Will the index time analyzer stay the default for the type solr.StrField 
  .
  - Is the KeywordTokenizerFactory the right one to use for the query
  time analyzer ?
 
  Cheers!
 
  Jerome.
 
  --
  Jerome Eteve.
 
  Chat with me live at http://www.eteve.net
 
  jer...@eteve.net
 
 
 
 
 
 -- 
 Jerome Eteve.
 
 Chat with me live at http://www.eteve.net
 
 jer...@eteve.net



Re: ClassCastException from custom request handler

2009-08-04 Thread Chantal Ackermann
Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the 
regular stable release, no svn checkout).



80-84
@SuppressWarnings("unchecked")
protected T create( ResourceLoader loader, String name, String
className, Node node ) throws Exception
{
   return (T) loader.newInstance( className, getDefaultPackages() );
}


Re: Synonym aware string field typ

2009-08-04 Thread Jérôme Etévé
2009/8/4 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Yes, you need to specify one or the other then, index-time or query-time, 
 depending on where you want your synonyms to kick in.

Ok great. Thx !

 Eh, hitting reply to this email used your personal email instead of 
 solr-user@lucene.apache.org .  Eh eh. Making it hard for people replying to 
 keep the discussion on the list without doing extra work


It did the same for me with your message. I had to click 'reply all'.

Maybe it's a gmail problem.

J.


 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Jérôme Etévé jerome.et...@gmail.com
 To: Otis Gospodnetic otis_gospodne...@yahoo.com
 Cc: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 12:39:33 PM
 Subject: Re: Synonym aware string field typ

 Hi Otis,

 Thanks. Yep, this synonym behaviour is the one I want.

 So if I don't want the synonyms to be applied at index time, I need
 to specify an index time analyzer right ?

 Jerome.


 2009/8/4 Otis Gospodnetic :
  Hi,
 
  KeywordTokenizer will not tokenize your string.  I have a feeling that 
  won't
 work with synonyms, unless your field value entirely match a synonym.  Maybe 
 an
 example would help:
 
  If you have:
   foo canine bar
  Then KeywordTokenizer won't break this into 3 tokens.
  And then canine/dog synonym won't work.
 
   Yes, if you define the analyzer like that, it will be used both at index 
  and
 query time.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: Jérôme Etévé
  To: solr-user@lucene.apache.org
  Sent: Tuesday, August 4, 2009 7:33:28 AM
  Subject: Synonym aware string field typ
 
  Hi all,
 
  I'd like to have a string type which is synonym aware at query time.
  Is it ok to have something like that:
 
 
 
 
 
  tokenizerFactory=solr.KeywordTokenizerFactory
  synonyms=my_synonyms.txt ignoreCase=true/
 
 
 
 
 
  My questions are:
 
  - Will the index time analyzer stay the default for the type 
  solr.StrField .
  - Is the KeywordTokenizerFactory the right one to use for the query
  time analyzer ?
 
  Cheers!
 
  Jerome.
 
  --
  Jerome Eteve.
 
  Chat with me live at http://www.eteve.net
 
  jer...@eteve.net
 
 



 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net





-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: ClassCastException from custom request handler

2009-08-04 Thread James Brady
Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger
can't tell me...

Lots of other handlers are created with no problem before my plugin falls
over, so I don't think it's a problem with T not being what we expected.

Do you know of any working examples of plugins I can download and build in
my environment to see what happens?

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de

 Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the
 regular stable release, no svn checkout).


  80-84
  @SuppressWarnings("unchecked")
 protected T create( ResourceLoader loader, String name, String
 className, Node node ) throws Exception
 {
   return (T) loader.newInstance( className, getDefaultPackages() );
 }




-- 
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom


DisMax - fetching dynamic fields

2009-08-04 Thread Alexey Serba
Hi everybody,

I have a couple of dynamic fields in my schema, e.g. rating_* popularity_*

The problem I have is that if I specify the existing fields rating_1 and
popularity_1 in the fl parameter, the DisMax handler just ignores them,
whereas StandardRequestHandler works fine.

Any clues what's wrong?

Thanks in advance,
Alex


Re: DisMax - fetching dynamic fields

2009-08-04 Thread Alexey Serba
Solr 1.4 built from trunk revision 790594 ( 02 Jul 2009 )

On Tue, Aug 4, 2009 at 9:19 PM, Alexey Serbaase...@gmail.com wrote:
 Hi everybody,

 I have a couple of dynamic fields in my schema, e.g. rating_* popularity_*

 The problem I have is that if I try to specify existing fields
 rating_1 popularity_1 in fl parameter - DisMax handler just
 ignores them whereas StandardRequestHandler works fine.

 Any clues what's wrong?

 Thanks in advance,
 Alex



Re: ClassCastException from custom request handler

2009-08-04 Thread Chantal Ackermann



James Brady schrieb:

Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger
can't tell me...


You could try disassembling. Or Eclipse opens classes in a very 
rudimentary format when there is no source code attached. Maybe it shows 
the actual return value there, instead of T.




Lots of other handlers are created with no problem before my plugin falls
over, so I don't think it's a problem with T not being what we expected.

Do you know of any working examples of plugins I can download and build in
my environment to see what happens?


No, sorry. I've only overridden the EntityProcessor from 
DataImportHandler, and that is not configured in solrconfig.xml.





2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de


Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the
regular stable release, no svn checkout).


 80-84

@SuppressWarnings("unchecked")
protected T create( ResourceLoader loader, String name, String
className, Node node ) throws Exception
{
  return (T) loader.newInstance( className, getDefaultPackages() );
}




--
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom


Re: DIH: Any way to make update on db table?

2009-08-04 Thread Jay Hill
Excellent, thanks Avlesh and Noble.

-Jay

On Mon, Aug 3, 2009 at 9:28 PM, Avlesh Singh avl...@gmail.com wrote:

 
  datasource.getData("update mytable ..."); // though the name is getData()
  it can execute update commands also
 
 Even when the dataSource is readOnly, Noble?

 Cheers
 Avlesh

 2009/8/4 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  If your are writing a Transformer (or any other component) you can get
  hold of a dataSource instance .
 
   datasource =Context#getDataSource(name).
  //then you can invoke
  datasource.getData("update mytable ...");
  //though the name is getData() it can execute update commands also
 
  ensure that you do a
  datasource.close();
  after you are done
 
  On Tue, Aug 4, 2009 at 9:40 AM, Avlesh Singhavl...@gmail.com wrote:
   Couple of things -
  
 1. Your dataSource is probably in readOnly mode. It is possible to
 fire
 updates, by specifying readOnly=false in your dataSource.
  2. What you are trying to achieve is typically done using a select for
 update. For MySql, here's the documentation -
 http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
 3. You don't need to create a separate entity for firing updates.
 Writing a database procedure might be a good idea. In that case your
  query
  will simply be <entity name="mainEntity" query="call MyProcedure();"
   .../>.
 All the heavy lifting can be done by this query.
  
    Moreover, update queries only return the number of rows affected, not a
    resultSet. DIH expects one, hence the exception.
  
   Cheers
   Avlesh
  
   On Tue, Aug 4, 2009 at 1:49 AM, Jay Hill jayallenh...@gmail.com
 wrote:
  
   Is it possible for the DataImportHandler to update records in the
 table
  it
   is querying? For example, say I have a query like this in my entity:
  
   query=select field1, field2, from someTable where
 hasBeenIndexed=false
  
   Is there a way I can mark each record processed by updating the
   hasBeenIndexed field? Here's a config I tried:
  
    <?xml version="1.0"?>
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost:3306/solrhacks"
                  user="user"
                  password="pass"/>
      <document name="testingDIHupdate">
        <entity name="mainEntity"
                pk="id"
                query="select id, name from tableToIndex where hasBeenIndexed=0">
          <field column="id" template="dihTestUpdate-${main.id}"/>
          <field column="name" name="name"/>
          <entity name="updateEntity"
                  pk="id"
                  query="update tableToIndex set hasBeenIndexed=1 where id=${mainEntity.id}">
          </entity>
        </entity>
      </document>
    </dataConfig>
  
   It does update the first record, but then an Exception is thrown:
   Aug 3, 2009 1:15:24 PM org.apache.solr.handler.dataimport.DocBuilder
   buildDocument
   SEVERE: Exception while processing: mainEntity document :
   SolrInputDocument[{id=id(1.0)={1}, name=name(1.0)={John Jones}}]
   org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
 to
   execute query: update tableToIndex set hasBeenIndexed=1 where id=1
   Processing Document # 1
  at
  
  
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:250)
  at
  
  
 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:207)
  at
  
  
 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:40)
  at
  
  
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
  at
  
  
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
  at
  
  
 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
  at
  
  
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
  at
  
  
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:370)
  at
  
  
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
  at
  
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
  at
  
  
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
  at
  
  
 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
  at
  
  
 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
   Caused by: java.lang.NullPointerException
  at
  
  
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:248)
  ... 12 more
  
  
   -Jay
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
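Avlesh's third suggestion (a database procedure that selects the unindexed rows and flags them in one server-side unit of work) might be sketched as the following data-config fragment. The connection details come from the config in the thread; the procedure body and its columns are assumptions:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/solrhacks"
              user="user" password="pass"
              readOnly="false"/>
  <document>
    <!-- MyProcedure would select rows where hasBeenIndexed=0 and set
         hasBeenIndexed=1 itself, so DIH only ever sees a result set. -->
    <entity name="mainEntity" query="call MyProcedure()">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

This sidesteps the NullPointerException above, which comes from DIH expecting a result set where the nested update entity returns only an affected-row count.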
 



Re: Wild card search does not return any result

2009-08-04 Thread Mohamed Parvez
Thanks Otis. The thread suggests that this is a bug:

http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k

Both SSE and ICS are three-letter words, and neither is part of the English
language. SSE* works fine and ICS* does not, so this surely looks like a bug.

Any idea when this bug will be fixed, or whether there is a workaround?


Thanks/Regards,
Parvez
GV : 786-693-2228


On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Could it be the same reason as described here:

 http://markmail.org/message/ts65a6jok3ii6nva

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
  From: Mohamed Parvez par...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tuesday, August 4, 2009 11:26:45 AM
  Subject: Wild card search does not return any result
 
  Hello All,
 
 I have two fields.
 
 
 
 
  I have a document (which has been indexed) that has a value of ICS for the
  BUS field and SSE for the ROLE field.
 
  When I search for q=BUS:ics i get the result, but if i search for
 q=BUS:ics*
  i don't get any match (or result)
 
  when I search for q=ROLE:sse or q=ROLE:sse*, both the times I get the
  result.
 
  why BUS:ics* does not return any result ?
 
 
  I have the default configuration for text filed, see below.
 
 
  positionIncrementGap=100
 
 
 
 
 
  ignoreCase=true
  words=stopwords.txt
  enablePositionIncrements=true
  /
 
  generateWordParts=1 generateNumberParts=1 catenateWords=1
  catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
 
 
  protected=protwords.txt/
 
 
 
 
 
  ignoreCase=true expand=true/
 
  words=stopwords.txt/
 
  generateWordParts=1 generateNumberParts=1 catenateWords=0
  catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
 
 
  protected=protwords.txt/
 
 
 
 
 
  
  Thanks/Regards,
  Parvez
 
  Note : This is a re-post. looks like something went wrong the first time
  around.




Dynamic Configuration

2009-08-04 Thread pgiesin

I have a client who is interested in using Solr/Lucene as their search
engine. So far I think it meets 85% of their requirements. I have decided to
integrate with JAMon to provide statistical/performance analysis at
run-time. The piece I am still missing is dynamic configuration of the
indexing engine. Is it possible to programmatically control such things as
what fields are indexed based on content type, weights, etc? The key
requirement is that these should be modifiable without restarting the
server. I thought I may be able to provide this through JMX but these
attributes seem to be read-only. 

Pete
-- 
View this message in context: 
http://www.nabble.com/Dynamic-Configuration-tp24814729p24814729.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Error with UpdateRequestProcessorFactory

2009-08-04 Thread Daniel Cassiano
Hi Shalin,

On Tue, Aug 4, 2009 at 12:43 PM, Shalin Shekhar
Mangarshalinman...@gmail.com wrote:
 I'm having some problem with a custom handler on my Solr.
 All the application works fine, but when I do a new checkout from svn
 and generate a jar file with my handler, I got:

 SEVERE: java.lang.NoSuchMethodError:

 org.apache.solr.core.SolrCore.getUpdateProcessorFactory(Ljava/lang/String;)Lorg/apache/solr/update/processor/UpdateRequestProcessorFactory;

 I checked versions of my libs and they're ok.
 I'm using Solr 1.3 and the environment is the same that works previously.


 Are you using the released Solr 1.3 or some intermediate nightly build? The
 1.3 release has SolrCore.getUpdateProcessorChain(String) method.

You are right. I was using some nightly build. I changed to the
released 1.3 and it works.

Thanks!
-- 
Daniel Cassiano
_

Page: http://danielcassiano.net/
http://www.umitproject.org/


RE: facet sorting by index on sint fields

2009-08-04 Thread Simon Stanlake
To solve this issue I created a subclass of SortableIntField that overrides the 
getSortField() method as follows...

@Override
public SortField getSortField(SchemaField field, boolean reverse) {
    return new SortField(field.getName(), SortField.INT, reverse);
}

I'm not really sure of the impact of this change but it seems to now do what I 
want. I'm curious as to why the SortableIntField supplied with SOLR uses 
SortField.STRING here. I found some references to it in solr-dev but no 
conclusions.

If anyone has any thoughts about the impact of this change, or why it is not 
like this by default I'd be very interested to hear.

Thanks,
Simon

-Original Message-
From: Simon Stanlake [mailto:sim...@tradebytes.com] 
Sent: Thursday, July 30, 2009 7:28 PM
To: 'solr-user@lucene.apache.org'
Subject: facet sorting by index on sint fields

Hi,
I have a field in my schema specified using

<field name="wordCount" type="sint"/>

Where sint is specified as follows (the default from schema.xml)

<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" 
omitNorms="true"/>

When I do a facet on this field using sort=index I always get the values back 
in lexicographic order. Eg: adding this to a query string...

facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

gives me
<lst name="wordCount">
  <int name="1">5</int>
  <int name="10">2</int>
  <int name="2">6</int>
...

Is this a current limitation of solr faceting or am I missing a configuration 
step somewhere? I couldn't find any notes in the docs about this.

Cheers,
Simon
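The ordering described above is plain string comparison. A quick stdlib-only Java check of lexicographic vs. numeric ordering (the values are illustrative) shows the difference between a STRING SortField and an INT one:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class FacetSortDemo {
    // Lexicographic (string) order -- what SortField.STRING produces:
    static List<String> lexOrder(List<String> values) {
        return values.stream().sorted().collect(Collectors.toList());
    }

    // Numeric order -- what SortField.INT would produce:
    static List<String> numericOrder(List<String> values) {
        return values.stream()
                     .sorted(Comparator.comparingInt(Integer::parseInt))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("2", "10", "1");
        System.out.println(lexOrder(values));     // strings: "1" < "10" < "2"
        System.out.println(numericOrder(values)); // ints: 1 < 2 < 10
    }
}
```

SortableIntField avoids the problem by encoding ints into strings whose lexicographic order equals their numeric order, which is why the override above and the stock class should both end up sorting numerically once the right schema is loaded.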



Re: facet sorting by index on sint fields

2009-08-04 Thread Yonik Seeley
On Thu, Jul 30, 2009 at 10:28 PM, Simon Stanlakesim...@tradebytes.com wrote:
 Hi,
 I have a field in my schema specified using

 <field name="wordCount" type="sint"/>

 Where sint is specified as follows (the default from schema.xml)

 <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" 
 omitNorms="true"/>

 When I do a facet on this field using sort=index I always get the values back 
 in lexicographic order. Eg: adding this to a query string...

 facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

 gives me
 <lst name="wordCount">
        <int name="1">5</int>
        <int name="10">2</int>
        <int name="2">6</int>
 ...

 Is this a current limitation of solr faceting or am I missing a configuration 
 step somewhere? I couldn't find any notes in the docs about this.

This is not the intention - seems like a bug somewhere.  Is it still
broken in trunk?  are you using distributed search?

-Yonik
http://www.lucidimagination.com


Re: facet sorting by index on sint fields

2009-08-04 Thread Yonik Seeley
On Tue, Aug 4, 2009 at 5:27 PM, Yonik Seeleyyo...@lucidimagination.com wrote:
 Is this a current limitation of solr faceting or am I missing a 
 configuration step somewhere? I couldn't find any notes in the docs about 
 this.

 This is not the intention - seems like a bug somewhere.  Is it still
 broken in trunk?  are you using distributed search?

OK, I just tried trunk with the example docs, with the popularity
field indexed as both int (now trie based) and sint - both seem to
work correctly.

http://localhost:8983/solr/select?q=*:*facet=truefacet.field=popularityfacet.sort=lex

-Yonik
http://www.lucidimagination.com


RE: facet sorting by index on sint fields

2009-08-04 Thread Simon Stanlake
Oh boy - I had a problem with my deploy scripts that was keeping an old version 
of the schema.xml file around. SortableIntField is working fine for me now. 
Sorry to waste everyone's time and thanks for the responses.

Simon
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, August 04, 2009 2:28 PM
To: solr-user@lucene.apache.org
Subject: Re: facet sorting by index on sint fields

On Thu, Jul 30, 2009 at 10:28 PM, Simon Stanlakesim...@tradebytes.com wrote:
 Hi,
 I have a field in my schema specified using

 <field name="wordCount" type="sint"/>

 Where sint is specified as follows (the default from schema.xml)

 <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" 
 omitNorms="true"/>

 When I do a facet on this field using sort=index I always get the values back 
 in lexicographic order. Eg: adding this to a query string...

 facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

 gives me
 <lst name="wordCount">
        <int name="1">5</int>
        <int name="10">2</int>
        <int name="2">6</int>
 ...

 Is this a current limitation of solr faceting or am I missing a configuration 
 step somewhere? I couldn't find any notes in the docs about this.

This is not the intention - seems like a bug somewhere.  Is it still
broken in trunk?  are you using distributed search?

-Yonik
http://www.lucidimagination.com


Re: Dynamic Configuration

2009-08-04 Thread Koji Sekiguchi

pgiesin wrote:

I have a client who is interested in using Solr/Lucene as their search
engine. So far I think it meets 85% of their requirements. I have decided to
integrate with JAMon to provide statistical/performance analysis at
run-time. The piece I am still missing is dynamic configuration of the
indexing engine. Is it possible to programmatically control such things as
what fields are indexed based on content type, weights, etc? The key
requirement is that these should be modifiable without restarting the
server. I thought I may be able to provide this through JMX but these
attributes seem to be read-only. 


Pete
  


Solr multicore might be an option. It has reload/swap/... commands
to reload/switch SolrCore:

http://wiki.apache.org/solr/CoreAdmin
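The CoreAdmin commands Koji mentions are plain HTTP calls; a minimal sketch of building them (host, port, and core names are assumptions, matching the common defaults):

```python
from urllib.parse import urlencode

# Assumed default Solr host/port and CoreAdmin path.
BASE = "http://localhost:8983/solr/admin/cores"

def core_admin_url(action: str, **params: str) -> str:
    """Build a CoreAdmin request URL; send it with any HTTP client."""
    return BASE + "?" + urlencode({"action": action, **params})

# Reload a core after editing its config on disk, or swap in a
# freshly configured core, all without restarting the server:
print(core_admin_url("RELOAD", core="core0"))
print(core_admin_url("SWAP", core="core0", other="core1"))
```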

Koji



Re: Wild card search does not return any result

2009-08-04 Thread Otis Gospodnetic
Hi,

I doubt it's a bug.  It's probably working correctly based on the config, etc.; 
I just don't have enough details about the configuration, your request handler, 
query rewriting, the data in your index, etc. to tell you exactly what is 
happening.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Mohamed Parvez par...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 4, 2009 3:22:53 PM
 Subject: Re: Wild card search does not return any result
 
 Thanks Otis. The thread suggests that this is a bug:
 
 http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k
 
 Both SSE and ICS are 3-letter words, and neither is an English word.
 SSE* works fine and ICS* does not, so this surely is a bug.
 
 Any idea when this bug will be fixed, or if there is any workaround?
 
 
 Thanks/Regards,
 Parvez
 GV : 786-693-2228
 
 
 On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
  Could it be the same reason as described here:
 
  http://markmail.org/message/ts65a6jok3ii6nva
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
   From: Mohamed Parvez 
   To: solr-user@lucene.apache.org
   Sent: Tuesday, August 4, 2009 11:26:45 AM
   Subject: Wild card search does not return any result
  
   Hello All,
  
  I have two fields.
  
  
  
  
   I have a document (which has been indexed) that has a value of ICS for the BUS
   field and SSE for the ROLE field.
  
   When I search for q=BUS:ics I get the result, but if I search for
   q=BUS:ics* I don't get any match (or result).
  
   When I search for q=ROLE:sse or q=ROLE:sse*, I get the result both
   times.
  
   Why does BUS:ics* not return any result?
  
  
   I have the default configuration for the text field, see below.
  
   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory"
               ignoreCase="true"
               words="stopwords.txt"
               enablePositionIncrements="true"
               />
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>
  
  
  
   
   Thanks/Regards,
   Parvez
  
    Note: this is a re-post; it looks like something went wrong the first time
    around.
 
 



Re: eternal optimize interrupted

2009-08-04 Thread Yonik Seeley
On Tue, Aug 4, 2009 at 6:04 AM, Thomas Koch tho...@koch.ro wrote:
 Last evening we started an optimize of our 45GB Solr index. This morning
 the optimize was still running, discs spinning like crazy, and the index
 directory had grown to 83GB.

Hmmm, it was probably close to done, given that 45*2=90.
But with that size of an index, and given that solr/tomcat wasn't
responsive, and that there was a lot of disk IO, perhaps the system
was swapping?

-Yonik
http://www.lucidimagination.com


A Presentation on Building a Hadoop + Lucene System Architecture

2009-08-04 Thread Bradford Stephens
Hey all,

I just wanted to send a link to a presentation I made on how my
company is building its entire core BI infrastructure around Hadoop,
HBase, Lucene, and more. It features a decent amount of practical
advice: from rules for approaching scalability problems, to why we
chose certain aspects of the Hadoop Ecosystem. Perhaps you can use it
as justification for your own decisions, or as a jumping-off point to
using it in the real world.

I hope you find it helpful! You can catch it at my blog:
http://www.roadtofailure.com . There are also a few inflammatory
articles, such as "Social Media Kills the RDBMS."

Ask me if you have any questions :)

-- 
http://www.hadoopconsulting.com -- Making Hadoop and your web apps
that use it scale
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: Wild card search does not return any result

2009-08-04 Thread Avlesh Singh
You read it incorrectly, Parvez.
The bug that Bill seems to have found is in the analysis tool and NOT in
the search handler itself. The results in your case are as expected: wildcard
queries are not analyzed, hence the inconsistency.
A workaround is suggested, on the same thread, here -
http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results
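Since wildcard terms bypass query-time analysis, one common client-side mitigation (a sketch of the idea only, not necessarily the workaround in the linked thread) is to apply the index-time normalization yourself, at minimum lowercasing, before appending the wildcard. The helper below is hypothetical:

```python
def wildcard_query(field: str, term: str) -> str:
    """Build a wildcard query whose prefix matches lowercased index tokens.

    Wildcard terms are not run through the analyzer, so any normalization
    the index-time chain applies (here: lowercasing) must be done by hand.
    Note: stemming can still alter indexed tokens; this does not cover that.
    """
    return f"{field}:{term.lower()}*"

print(wildcard_query("BUS", "ICS"))   # BUS:ics*
print(wildcard_query("ROLE", "SSE"))  # ROLE:sse*
```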

Cheers
Avlesh

On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez par...@gmail.com wrote:

 Thanks Otis. The thread suggests that this is a bug:
 
 
  http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k
 
 Both SSE and ICS are 3-letter words, and neither is an English word.
 SSE* works fine and ICS* does not, so this surely is a bug.
 
 Any idea when this bug will be fixed, or if there is any workaround?

 
 Thanks/Regards,
 Parvez
 GV : 786-693-2228


 On Tue, Aug 4, 2009 at 11:48 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

  Could it be the same reason as described here:
 
  http://markmail.org/message/ts65a6jok3ii6nva
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
   From: Mohamed Parvez par...@gmail.com
   To: solr-user@lucene.apache.org
   Sent: Tuesday, August 4, 2009 11:26:45 AM
   Subject: Wild card search does not return any result
  
   Hello All,
  
  I have two fields.
  
  
  
  
    I have a document (which has been indexed) that has a value of ICS for the
    BUS field and SSE for the ROLE field.
  
    When I search for q=BUS:ics I get the result, but if I search for
    q=BUS:ics* I don't get any match (or result).
   
    When I search for q=ROLE:sse or q=ROLE:sse*, I get the result both
    times.
   
    Why does BUS:ics* not return any result?
  
  
    I have the default configuration for the text field, see below.
   
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
  
  
  
  
   
   Thanks/Regards,
   Parvez
  
   Note : This is a re-post. looks like something went wrong the first
 time
   around.
 
 



Re: JVM Heap utilization Memory leaks with Solr

2009-08-04 Thread Rahul R
Otis,
Thank you for your response. I know there are a few variables here but the
difference in memory utilization with and without shards somehow leads me to
believe that the leak could be within Solr.

I tried using a profiling tool, YourKit. The trial version was free for 15
days, but I couldn't find anything of significance.

Regards
Rahul


On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Hi Rahul,

 A) There are no known (to me) memory leaks.
 I think there are too many variables for a person to tell you what exactly
 is happening, plus you are dealing with the JVM here. :)

 Try jmap -histo:live PID-HERE | less and see what's using your memory.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
  From: Rahul R rahul.s...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tuesday, August 4, 2009 1:09:06 AM
  Subject: JVM Heap utilization  Memory leaks with Solr
 
  I am trying to track memory utilization with my Application that uses
 Solr.
  Details of the setup :
  -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0
  - Hardware : 12 CPU, 24 GB RAM
 
  For testing during PSR I am using a smaller subset of the actual data
 that I
  want to work with. Details of this smaller sub-set :
  - 5 million records, 4.5 GB index size
 
  Observations during PSR:
  A) I have allocated 3.2 GB for the JVM(s) that I used. After all users
  logout and doing a force GC, only 60 % of the heap is reclaimed. As part
 of
  the logout process I am invalidating the HttpSession and doing a close()
 on
  CoreContainer. From my application's side, I don't believe I am holding
 on
  to any resource. I wanted to know if there are known issues surrounding
  memory leaks with Solr ?
  B) To further test this, I tried deploying with shards. 3.2 GB was
 allocated
  to each JVM. All JVMs had 96 % free heap space after start up. I got
 varying
  results with this.
  Case 1: Used 6 weblogic domains. My application was deployed on 1
 domain.
  I split the 5 million index into 5 parts of 1 million each and used them
 as
  shards. After multiple users used the system and doing a force GC, around
 94
  - 96 % of heap was reclaimed in all the JVMs.
  Case 2: Used 2 weblogic domains. My application was deployed on 1 domain.
 On
  the other, I deployed the entire 5 million part index as one shard. After
  multiple users used the system and doing a force GC, around 76 % of the
 heap
  was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM where
 my
  application was running. This result further convinces me that my
  application can be absolved of holding on to memory resources.
 
  I am not sure how to interpret these results. For searching, I am using
  Without Shards : EmbeddedSolrServer
  With Shards :CommonsHttpSolrServer
  In terms of Solr objects this is what differs in my code between normal
  search and shards search (distributed search)
 
  After looking at Case 1, I thought that the CommonsHttpSolrServer was
 more
  memory efficient but Case 2 proved me wrong. Or could there still be
 memory
  leaks in my application ? Any thoughts, suggestions would be welcome.
 
  Regards
  Rahul




Re: Dynamic Configuration

2009-08-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Aug 5, 2009 at 12:59 AM, pgiesin pgie...@hubcitymedia.com wrote:

 I have a client who is interested in using Solr/Lucene as their search
 engine. So far I think it meets 85% of their requirements. I have decided to
 integrate with JAMon to provide statistical/performance analysis at
 run-time. The piece I am still missing is dynamic configuration of the
 indexing engine. Is it possible to programmatically control such things as
 what fields are indexed based on content type, weights, etc.? The key
 requirement is that these should be modifiable without restarting the
 server. I thought I may be able to provide this through JMX, but these
 attributes seem to be read-only.

I don't think it is possible to change the behavior of the same field
during runtime (it is not even advisable). But you can always write
the data to a different field with the required attributes using an
UpdateProcessor, or you can write a new UpdateRequestHandler.

 Pete
 --
 View this message in context: 
 http://www.nabble.com/Dynamic-Configuration-tp24814729p24814729.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com