Re: solr with hadoop

2008-01-07 Thread Otis Gospodnetic
Stu, Interesting! Can you provide more details about your setup? By "load balance the indexing stage" you mean "distribute the indexing process", right? Do you simply take your content to be indexed, split it into N chunks where N matches the number of TaskNodes in your Hadoop cluster and pr

Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-07 Thread s d
Isn't there a better way to take the information into account but still normalize? Taking the score of only one of the fields doesn't sound like the best thing to do (it's basically ignoring part of the information). On Jan 7, 2008 9:20 PM, Mike Klaas <[EMAIL PROTECTED]> wrote: > > On 7-Jan-08, a

Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-07 Thread Mike Klaas
On 7-Jan-08, at 9:02 PM, s d wrote: e.g. if the index is field1 and field2 and documents of type (A) always have information for field1 AND information for field2 while documents of type (B) always have information for field1 but NEVER information for field2. The problem is that the formula

How do i normalize diff information (different type of documents) in the index ?

2008-01-07 Thread s d
e.g. if the index is field1 and field2 and documents of type (A) always have information for field1 AND information for field2 while documents of type (B) always have information for field1 but NEVER information for field2. The problem is that the formula will sum field1 and field2 hence skewing in
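To make the skew concrete, here is a minimal Python sketch with made-up per-field scores. It assumes a document's score is simply the sum of its per-field scores (a simplification, not Lucene's actual scoring formula) and contrasts that with one possible normalization that nobody in the thread proposed: averaging over only the fields a document actually has. The function names and numbers are hypothetical.

```python
from typing import Dict, Iterable

def summed_score(field_scores: Dict[str, float]) -> float:
    """Plain sum over fields: documents that fill more fields score higher."""
    return sum(field_scores.values())

def averaged_score(field_scores: Dict[str, float], fields: Iterable[str]) -> float:
    """Hypothetical normalization: average only over the fields the document has."""
    present = [field_scores[f] for f in fields if f in field_scores]
    return sum(present) / len(present) if present else 0.0

FIELDS = ("field1", "field2")
doc_a = {"field1": 0.5, "field2": 0.25}  # type (A): both fields populated
doc_b = {"field1": 0.625}                # type (B): never has field2

print(summed_score(doc_a), summed_score(doc_b))                      # 0.75 0.625
print(averaged_score(doc_a, FIELDS), averaged_score(doc_b, FIELDS))  # 0.375 0.625
```

Here doc_b matches field1 better than doc_a but loses under the plain sum; averaging reverses that, at the cost of ignoring how many fields matched.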

Newbie question: facets and filter query?

2008-01-07 Thread solruser2
I have two categories, CDs and DVDs, doing something like this: explicit disc_name^2 disc_year 1 true category explicit disc_name^2 disc_year disc_artist 1

Re: solr with hadoop

2008-01-07 Thread Stu Hood
As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addIndexes method to merge the data to be hosted on Solr instances. Thanks, Stu -Original Message- From: Mike Kla
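A toy, in-memory Python analogue of the split / index / merge flow discussed in this thread: documents are dealt round-robin into N chunks (standing in for Hadoop tasks), each chunk gets its own small inverted index, and the partial indexes are then merged. Nothing here uses Hadoop or Lucene, and all names are hypothetical. Global document ids are assumed so that merging is a plain union of postings; a real Lucene merge assigns new internal document numbers instead.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Index = Dict[str, Set[int]]  # term -> ids of documents containing it

def split(docs: List[str], n: int) -> List[List[Tuple[int, str]]]:
    """Deal documents round-robin into n chunks, one per (hypothetical) task."""
    chunks: List[List[Tuple[int, str]]] = [[] for _ in range(n)]
    for doc_id, text in enumerate(docs):
        chunks[doc_id % n].append((doc_id, text))
    return chunks

def build_index(chunk: List[Tuple[int, str]]) -> Index:
    """Index one chunk independently (the part that would run in parallel)."""
    index: Index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge(indexes: List[Index]) -> Index:
    """Combine partial indexes; ids are global, so postings are simply unioned."""
    merged: Index = defaultdict(set)
    for index in indexes:
        for term, postings in index.items():
            merged[term] |= postings
    return merged

docs = ["Solr uses Lucene", "Hadoop splits the work", "Lucene merges indexes"]
merged = merge([build_index(chunk) for chunk in split(docs, 2)])
print(sorted(merged["lucene"]))  # [0, 2]
```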

Re: Problem with camelCase but not casing in general

2008-01-07 Thread Mike Klaas
On 7-Jan-08, at 3:21 PM, Benjamin Higgins wrote: Well, he might want to split on punctuation. I do, so I just turned off splitOnCaseChange instead of removing WordDelimiterFilterFactory completely. It's looking good now! The OP's problem might have to do with index/query-time analyzer misma

RE: Problem with camelCase but not casing in general

2008-01-07 Thread Benjamin Higgins
> Well, he might want to split on punctuation. I do, so I just turned off splitOnCaseChange instead of removing WordDelimiterFilterFactory completely. It's looking good now! > The OP's problem might have to do with index/query-time analyzer > mismatch. We'd know more if he posted the schema d

Re: Problem with camelCase but not casing in general

2008-01-07 Thread Mike Klaas
On 7-Jan-08, at 2:35 PM, Yonik Seeley wrote: Anyway, if splitting on capitalization changes is not desired, getting rid of the WordDelimiterFilter in both the index and query analyzers is the right thing to do. Well, he might want to split on punctuation. self.object.frobulation.method() proba

Re: Problem with camelCase but not casing in general

2008-01-07 Thread Yonik Seeley
On Jan 7, 2008 5:26 PM, Brendan Grainger <[EMAIL PROTECTED]> wrote: > I think your problem is happening because splitOnCaseChange is 1 in > your WordDelimiterFilterFactory: > > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="

Re: Problem with camelCase but not casing in general

2008-01-07 Thread Brendan Grainger
I think your problem is happening because splitOnCaseChange is 1 in your WordDelimiterFilterFactory: So "getElementById" is tokenized to: (get,0,3) (Element,3,10) (By,10,12) (Id,12,14) (getElementById,0,14,posIncr=0). However, "getelementbyid" is tokenized to: (getelementbyid,0,14) which woul
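The offsets Brendan lists can be reproduced with a small Python approximation of three WordDelimiterFilterFactory options (generateWordParts, splitOnCaseChange, catenateWords). This is a simplified sketch for illustration, not Solr's implementation: it handles a single whitespace-free token, ignores the number-related options and any later filters such as lowercasing, and `word_delimit` is a hypothetical name.

```python
import re
from typing import List, NamedTuple

class Token(NamedTuple):
    text: str
    start: int         # start offset in the input
    end: int           # end offset (exclusive)
    pos_incr: int = 1  # 0 = same position as the previous token

def _case_cuts(run: str) -> List[int]:
    """Offsets where a lowercase letter is directly followed by an uppercase one."""
    return [i for i in range(1, len(run)) if run[i - 1].islower() and run[i].isupper()]

def word_delimit(token: str, split_on_case_change: bool = True,
                 catenate_words: bool = True) -> List[Token]:
    parts: List[Token] = []
    # Always break on non-alphanumeric characters such as '.', '(' and ')'.
    for m in re.finditer(r"[A-Za-z0-9]+", token):
        run = m.group()
        cuts = _case_cuts(run) if split_on_case_change else []
        bounds = [0] + cuts + [len(run)]
        for begin, finish in zip(bounds, bounds[1:]):
            s, e = m.start() + begin, m.start() + finish
            parts.append(Token(token[s:e], s, e))
    # catenateWords-style token: all parts glued together, stacked at the same position.
    if catenate_words and len(parts) > 1:
        parts.append(Token("".join(p.text for p in parts), parts[0].start, parts[-1].end, 0))
    return parts

for t in word_delimit("getElementById"):
    print(t)
```

With split_on_case_change=False, "getElementById" stays a single token while "self.object.frobulation.method()" is still broken at the dots and parentheses, which corresponds to turning off splitOnCaseChange rather than removing the filter entirely.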

Re: Problem with camelCase but not casing in general

2008-01-07 Thread Yonik Seeley
On Jan 7, 2008 5:15 PM, Benjamin Higgins <[EMAIL PROTECTED]> wrote: > Hi all, I am using a mostly out-of-the-box install of Solr that I'm > using to search through our code repositories. I've run into a funny > problem where searches for text that is camelCased aren't returning > results unless th

Problem with camelCase but not casing in general

2008-01-07 Thread Benjamin Higgins
Hi all, I am using a mostly out-of-the-box install of Solr that I'm using to search through our code repositories. I've run into a funny problem where searches for text that is camelCased aren't returning results unless the casing is exactly the same. For example, a query for "getElementById" r

Query - multiple

2008-01-07 Thread Jae Joo
If the number of results is greater than 2500, sort by company_name; otherwise, sort by revenue. Do I have to query twice, once to get the number of results and once more for the sort? The second query should only be sent if necessary. Is there an efficient way? Thanks, Jae
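One straightforward answer is indeed two requests: a first request with rows=0, which returns the hit count (numFound) but no documents, then the real request with the sort chosen from that count. The sketch assumes a hypothetical `search(params)` callable that sends Solr request parameters and returns a dict containing `numFound`; the descending direction for revenue is also an assumption.

```python
from typing import Any, Callable, Dict, List

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]
THRESHOLD = 2500  # from the question: more hits than this -> sort by company_name

def search_with_conditional_sort(search: SearchFn, q: str, rows: int = 10) -> Dict[str, Any]:
    # Request 1: rows=0 returns the hit count (numFound) but no documents.
    num_found = search({"q": q, "rows": 0})["numFound"]
    # Pick the sort from the count; "desc" for revenue is an assumption.
    sort = "company_name asc" if num_found > THRESHOLD else "revenue desc"
    # Request 2: fetch the page actually shown, with the chosen sort.
    return search({"q": q, "rows": rows, "sort": sort})

def make_fake_search(num_found: int, log: List[Dict[str, Any]]) -> SearchFn:
    """Stand-in for a real Solr client: records each request, returns a canned count."""
    def search(params: Dict[str, Any]) -> Dict[str, Any]:
        log.append(params)
        return {"numFound": num_found, "docs": []}
    return search

sent: List[Dict[str, Any]] = []
search_with_conditional_sort(make_fake_search(3000, sent), "acme")
print(sent[1]["sort"])  # company_name asc
```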

Re: How to override a QueryComponent

2008-01-07 Thread Ryan McKinley
You are doing things correctly; thanks for pointing this out. I just changed the initialization process to only add components that are not specified: http://svn.apache.org/viewvc?view=rev&revision=609717 thanks! ryan Brendan Grainger wrote: Hi, I'm using a solr nightly build and I have cr
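The idea behind Ryan's change can be pictured with the following Python sketch (an illustration only, not Solr's actual initialization code): start from the components declared in the configuration, then add each built-in default only when no component with that name was specified, so that a custom subclass registered under a default name is presumably no longer overwritten. The component names and the `com.example.MyQueryComponent` class are hypothetical.

```python
from typing import Dict

def register_components(configured: Dict[str, str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Keep every component the configuration declares; add a default only
    when no component with that name was specified."""
    registry = dict(configured)
    for name, cls in defaults.items():
        registry.setdefault(name, cls)
    return registry

# Hypothetical names: a custom subclass registered under the default "query" name.
DEFAULTS = {"query": "QueryComponent", "facet": "FacetComponent"}
CONFIGURED = {"query": "com.example.MyQueryComponent"}

registry = register_components(CONFIGURED, DEFAULTS)
print(registry["query"], registry["facet"])  # com.example.MyQueryComponent FacetComponent
```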

Tomcat and Solr - out of memory

2008-01-07 Thread Jae Joo
Hi, What happens if the Solr application hits the maximum heap memory assigned to it? Will it die or just slow down? Jae

How to override a QueryComponent

2008-01-07 Thread Brendan Grainger
Hi, I'm using a solr nightly build and I have created my own QueryComponent which is just a subclass of the default QueryComponent. FYI, in most cases I just delegate to the superclass, but I also allow a parameter to be used which will cause some custom filtering (which is why I'm doing