Re: summing facets on a specific field

2012-02-09 Thread Paul Kapla
Hi Johannes,
Yeah, I could make it work, although it would be nice if there was a way to
plug in an aggregation function, especially with the nice pivoting in 4.0.
Do you know if there's an easy way with the stats component to tell it to
bring back only one of those values (i.e. sum), kind of like the fl param?

Thanks for the tip.

pt


On Mon, Feb 6, 2012 at 10:04 PM, Johannes Goll wrote:

> I meant
>
> stats=true&stats.field=price&stats.facet=category
>
> 2012/2/6 Johannes Goll :
> > you can use the StatsComponent
> >
> > http://wiki.apache.org/solr/StatsComponent
> >
> > with stats=true&stats.price=category&stats.facet=category
> >
> > and pull the sum fields from the resulting stats facets.
> >
> > Johannes
> >
> > 2012/2/5 Paul Kapla :
> >> Hi everyone,
> >> I'm pretty new to Solr and I'm not sure if this can even be done. Is
> there
> >> a way to sum a specific field for each item in a facet? For example, you
> >> have an ecommerce site that has the following documents:
> >>
> >> id,category,name,price
> >> 1,books,'solr book', $10.00
> >> 2,books,'lucene in action', $12.00
> >> 3,video,'cool video', $20.00
> >>
> >> so instead of getting (when faceting on category)
> >> books(2)
> >> video(1)
> >>
> >> I'd like to get:
> >> books ($22)
> >> video ($20)
> >>
> >> Is this something that can be even done? Any feedback would be much
> >> appreciated.
> >
> >
> >
> > --
> > Dipl.-Ing.(FH)
> > Johannes Goll
> > 211 Curry Ford Lane
> > Gaithersburg, Maryland 20878
> > USA
>
>
>
> --
> Dipl.-Ing.(FH)
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878
> USA
>
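For readers following along: the sum-per-facet Paul describes (books -> $22, video -> $20 from the example documents) is, on the client side, just a group-and-sum over the returned docs. Below is a minimal plain-Java sketch of that aggregation using the thread's example data, with prices in cents; the class and method names are illustrative only and not part of any Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CategorySums {
    /** Groups docs by category and sums a numeric field, mirroring facet counts. */
    public static Map<String, Integer> sumByCategory(String[][] docs) {
        // Each row: {category, name, priceInCents}
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String[] doc : docs) {
            sums.merge(doc[0], Integer.parseInt(doc[2]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"books", "solr book", "1000"},
            {"books", "lucene in action", "1200"},
            {"video", "cool video", "2000"},
        };
        // Yields books -> 2200 ($22) and video -> 2000 ($20), as in the question.
        System.out.println(sumByCategory(docs));
    }
}
```

When StatsComponent is available, stats=true&stats.field=price&stats.facet=category (as Johannes notes) pushes this aggregation into Solr instead of doing it client-side.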


Re: Re: solr search speed is so slow.

2012-02-09 Thread Rong Kang
Thanks for your reply.

I didn't use any other params except q (for example
http://localhost:8080/solr/search?q=drugs): no facet, no sort.
I don't think configuring newSearcher or firstSearcher can help, because I want
every query to be very fast. Do you have another solution?
I think 460ms is too slow even when a word is searched for the first time.


My computer's setup:
CPU: AMD 5000, 2.2GHz, 1 CPU with 2 cores
main memory: 2GB, 800MHz
disk drive: 7200 rpm
 
This is my  full search configuration:


  
   
  xslt
  dismaxdoc.xsl 
  -1
  all
  off
   filename
  10
  dismax
   filename^5.0 text^1.5
  *:*
  on
  filename text
 true


100
100   
  filename  
  100  
  3   

   
  


and my schema.xml 


 
   
   

 
 <defaultSearchField>text</defaultSearchField>
 <uniqueKey>id</uniqueKey>
 


and 


At 2012-02-10 11:49:39,"Chris Hostetter"  wrote:
>
>: When I first search one word in solr . its response time is 460ms. When 
>: I search the same word the second time. its response time is under 70ms. 
>: I can't tolerate 460ms . Does anyone know how to improve performance?
>
>tell us more about the query itself -- what params did you use?  did you 
>sort? did you facet?
>
>(the only info you've given us so far is what defaults you configured in 
>your handler, but not what params you used at query time)
>
>
>: and my search configuration
>:  dismax
>:filename^5.0 text^1.5
>: 
>: 
>:   *:*
>:   on
>:   filename text
>:  true
>:  
>: 
>: 100
>:   filename
>:   3 
>
>-Hoss

Re: Weighting categories

2012-02-09 Thread Chris Hostetter

if your goal is to say documents from certain categories should score 
higher than documents from other categories, regardless of what the query 
is, then you should review...

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29

...and pick a number for each category that you use at index time (either 
as the docBoost or put in a numeric field that you use in functions)

if your goal is to say I want $current_category to be a factor in 
scoring, so that documents in $current_category score higher, but 
$current_category will change depending on the user's action, then just 
incorporate "category:$current_category" into your query -- either as an 
optional clause in a Lucene formatted query, or using the bq or boost 
params on a dismax/edismax query.

: i've a table with products and their proper categories. Is it possible to
: weight categories, so that a user that searches for "apple ipad" don't get a
: magazine about apple ipad at the first result but the "hardware" apple ipad?
: I'm using DIH for indexing data, but don't know if there is any post-process
: to weight the categories I have.


-Hoss


Re: solr search speed is so slow.

2012-02-09 Thread Chris Hostetter

: field, then you could use a warming query like the following:
: 
: <lst>
:   <str name="q">common query</str>
:   <str name="qf">title description author</str>
:   <str name="fq">category:books</str>
:   <str name="sort">title_sort asc</str>
:   <str name="defType">dismax</str>
: </lst>
: 

...it's important to remember that if params like your defType and qf (and 
sort and facet.field and fq etc...) are already specified as defaults on 
your query handler, you don't need to list them here -- just specify the 
handler you are using as the "qt" param (you didn't mention what it was 
named in your original email)...

<arr name="queries">
  <lst>
    <str name="q">common query</str>
    <str name="qt">standard_or_whatever_you_named_it</str>
  </lst>
</arr>

In general, most people should really only need a handful of newSearcher 
warming queries -- you certainly don't need to list *every* query that you 
want to be "fast" in newSearcher -- just enough to ensure:
 * the basic index files are loaded by the OS
 * any fields you might sort on are used (in a sort)
 * any fields you might facet on or do functions over are used (in 
faceting/functions)


-Hoss


Re: solr search speed is so slow.

2012-02-09 Thread Chris Hostetter

: When I first search one word in solr . its response time is 460ms. When 
: I search the same word the second time. its response time is under 70ms. 
: I can't tolerate 460ms . Does anyone know how to improve performance?

tell us more about the query itself -- what params did you use?  did you 
sort? did you facet?

(the only info you've given us so far is what defaults you configured in 
your handler, but not what params you used at query time)


: and my search configuration
:  dismax
:filename^5.0 text^1.5
: 
: 
:   *:*
:   on
:   filename text
:  true
:  
: 
: 100
:   filename
:   3 

-Hoss


Re: custom TokenFilter

2012-02-09 Thread Jamie Johnson
Think I figured it out, the tokens just needed the same position attribute.

On Thu, Feb 9, 2012 at 10:38 PM, Jamie Johnson  wrote:
> Thanks Robert, worked perfect for the index side of the house.  Now on
> the query side I have a similar Tokenizer, but it's not operating
> quite the way I want it to.  The query tokenizer generates the tokens
> properly except I'm ending up with a phrase query, i.e. field:"1 2 3
> 4" when I really want field:1 OR field:2 OR field:3 OR field:4.  Is
> there something in the tokenizer that needs to be set for this to
> generate this type of query or is it something in the query parser?
>
> On Thu, Feb 9, 2012 at 9:02 PM, Robert Muir  wrote:
>> On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson  wrote:
>>> Again thanks.  I'll take a stab at that are you aware of any
>>> resources/examples of how to do this?  I figured I'd start with
>>> WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
>>> start.
>>>
>>
>> Well, easiest is if you can build what you need out of existing resources...
>>
>> But if you need to write your own, and If your input is not massive
>> documents/you have no problem processing the whole field in RAM at
>> once, you could try looking at PatternTokenizer for an example:
>>
>> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
>>
>> --
>> lucidimagination.com
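The fix Jamie reports below ("the tokens just needed the same position attribute") hinges on how token positions work: a token's position is the running sum of the position increments emitted so far, so giving the first generated token an increment of 1 and the rest 0 (as the CustomFilter's first ? 1 : 0 line does) stacks them all at one position. A minimal model of that arithmetic, outside the Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPositions {
    /** A token's position is the cumulative sum of position increments. */
    public static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        for (int inc : increments) {
            pos += inc;
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // first ? 1 : 0 stacks every generated token at one position...
        System.out.println(positions(new int[]{1, 0, 0, 0})); // [1, 1, 1, 1]
        // ...whereas increment 1 everywhere spreads them out like a phrase.
        System.out.println(positions(new int[]{1, 1, 1, 1})); // [1, 2, 3, 4]
    }
}
```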


Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Chris Hostetter

: ok i was looking at RandomSortField and got confused by "As long as the
: index version remains unchanged, and the same field name is reused, the
: ordering of the docs will be consistent. " So does that mean it's not really
: random if I'm hitting an index which doesn't have an update for a while?

it means that you get a consistent random ordering for each field against 
each version of the index.

if you want a completely new random ordering every time, use a new field 
name every time (configured with a dynamic field)

"sort=random_1234 desc" will be a different random ordering than 
"sort=random_9876 desc" even if the index hasn't changed.


-Hoss
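Hoss's dynamic-field trick is easy to automate on the client: generate a fresh random_* field name per request so each query sorts by a field name Solr has not used before. The sketch below only builds the parameter string; it assumes a dynamic field such as random_* backed by solr.RandomSortField is declared in schema.xml.

```java
import java.util.concurrent.ThreadLocalRandom;

public class RandomSortParam {
    /** Builds a sort param naming a new random_* dynamic field each call. */
    public static String nextSortParam() {
        int seed = ThreadLocalRandom.current().nextInt(1_000_000);
        return "random_" + seed + " desc";
    }

    public static void main(String[] args) {
        // Two requests get independent orderings because the field names differ.
        System.out.println("sort=" + nextSortParam());
        System.out.println("sort=" + nextSortParam());
    }
}
```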


Re: custom TokenFilter

2012-02-09 Thread Jamie Johnson
Thanks Robert, that worked perfectly for the index side of the house.  Now on
the query side I have a similar Tokenizer, but it's not operating
quite the way I want it to.  The query tokenizer generates the tokens
properly except I'm ending up with a phrase query, i.e. field:"1 2 3
4" when I really want field:1 OR field:2 OR field:3 OR field:4.  Is
there something in the tokenizer that needs to be set for this to
generate this type of query or is it something in the query parser?

On Thu, Feb 9, 2012 at 9:02 PM, Robert Muir  wrote:
> On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson  wrote:
>> Again thanks.  I'll take a stab at that are you aware of any
>> resources/examples of how to do this?  I figured I'd start with
>> WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
>> start.
>>
>
> Well, easiest is if you can build what you need out of existing resources...
>
> But if you need to write your own, and If your input is not massive
> documents/you have no problem processing the whole field in RAM at
> once, you could try looking at PatternTokenizer for an example:
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
>
> --
> lucidimagination.com


Re: custom TokenFilter

2012-02-09 Thread Robert Muir
On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson  wrote:
> Again thanks.  I'll take a stab at that are you aware of any
> resources/examples of how to do this?  I figured I'd start with
> WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
> start.
>

Well, easiest is if you can build what you need out of existing resources...

But if you need to write your own, and If your input is not massive
documents/you have no problem processing the whole field in RAM at
once, you could try looking at PatternTokenizer for an example:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java

-- 
lucidimagination.com


Re: custom TokenFilter

2012-02-09 Thread Jamie Johnson
Again thanks.  I'll take a stab at that.  Are you aware of any
resources/examples of how to do this?  I figured I'd start with
WhiteSpaceTokenizer but wasn't sure if there was a simpler place to
start.

On Thu, Feb 9, 2012 at 8:44 PM, Robert Muir  wrote:
> On Thu, Feb 9, 2012 at 8:28 PM, Jamie Johnson  wrote:
>> Thanks Robert, I'll take a look there.  Does it sound like I'm on the
>> right the right track with what I'm implementing, in other words is a
>> TokenFilter appropriate or is there something else that would be a
>> better fit for what I've described?
>
> I can't say for sure to be honest... because its a bit too
> abstract...I don't know the reasoning behind trying to convert
> "abcdefghijk" to 1 2 3 4, and I'm not sure I really understand what
> that means either.
>
> But in general: if you are taking the whole content of a field and
> making it into tokens, then its best implemented as a tokenizer.
>
> --
> lucidimagination.com


Re: custom TokenFilter

2012-02-09 Thread Robert Muir
On Thu, Feb 9, 2012 at 8:28 PM, Jamie Johnson  wrote:
> Thanks Robert, I'll take a look there.  Does it sound like I'm on the
> right the right track with what I'm implementing, in other words is a
> TokenFilter appropriate or is there something else that would be a
> better fit for what I've described?

I can't say for sure to be honest... because it's a bit too
abstract... I don't know the reasoning behind trying to convert
"abcdefghijk" to 1 2 3 4, and I'm not sure I really understand what
that means either.

But in general: if you are taking the whole content of a field and
making it into tokens, then it's best implemented as a tokenizer.

-- 
lucidimagination.com
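To make the tokenizer-vs-filter distinction above concrete without the Lucene API: a tokenizer consumes the whole field value at once and emits a stream of tokens from it. Below is a plain-Java stand-in for the kind of whole-value-to-tokens mapping Jamie describes; the chunking rule is invented purely for illustration, since the thread never specifies how "abcdefghijk" maps to 1 2 3 4.

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFieldTokenizerSketch {
    /** Hypothetical stand-in: maps the whole field value to a token list. */
    public static List<String> generateTokens(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        // Illustrative rule only: one numbered token per 3-char chunk.
        for (int i = 0, n = 1; i < fieldValue.length(); i += 3, n++) {
            tokens.add(Integer.toString(n));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "abcdefghijk" (11 chars) -> four chunks -> tokens 1 2 3 4
        System.out.println(generateTokens("abcdefghijk")); // [1, 2, 3, 4]
    }
}
```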


Re: custom TokenFilter

2012-02-09 Thread Jamie Johnson
Thanks Robert, I'll take a look there.  Does it sound like I'm on the
right track with what I'm implementing? In other words, is a
TokenFilter appropriate, or is there something else that would be a
better fit for what I've described?

On Thu, Feb 9, 2012 at 6:44 PM, Robert Muir  wrote:
> If you are writing a custom tokenstream, I recommend using some of the
> resources in Lucene's test-framework.jar to test it.
> These find lots of bugs! (including thread-safety bugs)
>
> For a filter: I recommend to use the assertions in
> BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesTo,
> and especially checkRandomData
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
>
> When testing your filter, for even more checks, don't use Whitespace
> or Keyword Tokenizer, use MockTokenizer, it has more checks:
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/MockTokenizer.java
>
> For some examples, you can look at the tests in modules/analysis.
>
> And of course enable assertions (-ea) when testing!
>
> On Thu, Feb 9, 2012 at 6:30 PM, Jamie Johnson  wrote:
>> I have the need to take user input and index it in a unique fashion,
>> essentially the value is some string (say "abcdefghijk") and needs to
>> be converted into a set of tokens (say 1 2 3 4).  I currently have
>> implemented a custom TokenFilter to do this; is this appropriate?  In
>> cases where I am indexing things slowly (i.e. 1 at a time) this works
>> fine, but when I send 10,000 things to solr (all in one thread) I am
>> noticing exceptions where it seems that the generated instance
>> variable is being used by several threads.  Is my implementation
>> appropriate or is there another more appropriate way to do this?  Are
>> TokenFilters reused?  Would it be more appropriate to convert the
>> stream to 1 token space separated then run that through a
>> WhiteSpaceTokenizer?  Any guidance on this would be greatly
>> appreciated.
>>
>>        class CustomFilter extends TokenFilter {
>>                private final CharTermAttribute termAtt =
>> addAttribute(CharTermAttribute.class);
>>                private final PositionIncrementAttribute posAtt =
>> addAttribute(PositionIncrementAttribute.class);
>>                protected CustomFilter(TokenStream input) {
>>                        super(input);
>>                }
>>
>>                private List<AttributeSource> generated;
>>                @Override
>>                public boolean incrementToken() throws IOException {
>>
>>
>>                        if(generated == null){
>>                                //setup generated
>>                                if(!input.incrementToken()){
>>                                        return false;
>>                                }
>>
>>                                //clearAttributes();
>>                                List<String> cells = 
>> StaticClass.generateTokens(termAtt.toString());
>>                                generated = new 
>> ArrayList<AttributeSource>(cells.size());
>>                                boolean first = true;
>>                                for(String cell : cells) {
>>                                        AttributeSource newTokenSource = 
>> this.cloneAttributes();
>>
>>                                        CharTermAttribute newTermAtt =
>> newTokenSource.addAttribute(CharTermAttribute.class);
>>                                        newTermAtt.setEmpty();
>>                                        newTermAtt.append(cell);
>>                                        OffsetAttribute newOffsetAtt =
>> newTokenSource.addAttribute(OffsetAttribute.class);
>>                                        PositionIncrementAttribute 
>> newPosIncAtt =
>> newTokenSource.addAttribute(PositionIncrementAttribute.class);
>>                                        newOffsetAtt.setOffset(0,0);
>>                                        
>> newPosIncAtt.setPositionIncrement(first ? 1 : 0);
>>                                        generated.add(newTokenSource);
>>                                        first = false;
>>                                }
>>
>>                        }
>>                        if(!generated.isEmpty()){
>>                                copy(this, generated.remove(0));
>>                                return true;
>>                        }
>>
>>                        return false;
>>
>>                }
>>
>>                private void copy(AttributeSource target, AttributeSource 
>> source) {
>>                        if (target != source)
>>                                source.copyTo(target);
>>                }
>>
>>                private LinkedList buffer;
>>                private LinkedList matched;
>>
>>                private boolean exhausted;
>>
>>                private AttributeSource nextTok() thr

Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread mtheone


ok i was looking at RandomSortField and got confused by "As long as the
index version remains unchanged, and the same field name is reused, the
ordering of the docs will be consistent. " So does that mean it's not really
random if I'm hitting an index which doesn't have an update for a while?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-this-in-Solr-random-result-for-the-first-few-results-tp3728729p3731411.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Empty results with OR filter query

2012-02-09 Thread Erik Hatcher

On Feb 9, 2012, at 20:11 , Steven Ou wrote:

> Sorry, what do you mean "explicit category rather than boolean expression"?

q=category_ids_im:634 for example.  Just to get an idea of what matches each 
category.

> Type was not changed midstream - hasn't really been changed ever, really.
> And I happen to have *just* reindexed, too.
> 
> Don't seem to have a default operator set. Not sure how to do it, either...?

Look at Solr's example schema.xml.  It'll have it spelled out there.

Erik


> --
> Steven Ou | 歐偉凡
> 
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880
> 
> 
> On Thu, Feb 9, 2012 at 5:01 PM, Erik Hatcher  wrote:
> 
>> Extremely odd.
>> 
>> Hmmm... other things to try:
>> 
>> * query on an explicit category, rather than in a boolean expression
>> * try a different field type than sint (say just int, or string)
>> * shouldn't matter (since you're using "OR" explicitly) but double check
>> the default operator in schema.xml
>> * reindex (was the field type ever changed mid-stream?)
>> 
>> Definitely something fishy here.  Nothing obvious pops out yet.
>> 
>>   Erik
>> 
>> 
>> On Feb 9, 2012, at 19:53 , Steven Ou wrote:
>> 
>>> Actually, I take that back. Using q instead of fq still produces the same
>>> problem. Somehow it's *less* inconsistent so at first glance it looked
>> like
>>> it fixed it. However, it does *not* fix it :(
>>> --
>>> Steven Ou | 歐偉凡
>>> 
>>> *ravn.com* | Chief Technology Officer
>>> steve...@gmail.com | +1 909-569-9880
>>> 
>>> 
>>> On Thu, Feb 9, 2012 at 4:48 PM, Steven Ou  wrote:
>>> 
 Well, keeping all other filter queries the same, changing fq=
 category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
 causes results to not show up.
 
 In fact, I took out *all* other filter queries. And while I wasn't able
 to reproduce it exactly, nonetheless when I added the third category id
>> the
 number of results *went down*. Which is consistently inconsistent, per
 se. Adding an OR cannot, logically, reduce the number of results!
 --
 Steven Ou | 歐偉凡
 
 *ravn.com* | Chief Technology Officer
 steve...@gmail.com | +1 909-569-9880
 
 
 
 On Thu, Feb 9, 2012 at 4:39 PM, Erik Hatcher >> wrote:
 
> Yes, certainly should work fine as a filter query... I was merely
>> trying
> to eliminate variables from the equation.  You've got several filters
>> and a
> q=*:* going on below, so it's obviously harder to pinpoint what could
>> be
> going wrong.  I suggest continuing to eliminate variables here, as more
> than likely some other filter is causing the documents you think should
> appear to be filtered out.
> 
>  Erik
> 
> 
> 
> On Feb 9, 2012, at 19:24 , Steven Ou wrote:
> 
>> By turning fq=category_ids_im:(637+OR+639+OR+634) to
>> q=category_ids_im:(637+OR+639+OR+634)
>> it appears to produce the correct results. But... that doesn't seem to
> make
>> sense to me? Shouldn't it work just fine as a filter query?
>> --
>> Steven Ou | 歐偉凡
>> 
>> *ravn.com* | Chief Technology Officer
>> steve...@gmail.com | +1 909-569-9880
>> 
>> 
>> On Thu, Feb 9, 2012 at 4:20 PM, Steven Ou  wrote:
>> 
>>> I don't really know how to analyze the debug output... Here it is for
> the
>>> full query I'm running, which includes other filter queries.
>>> 
>>> 
>>> *:*
>>> *:*
>>> MatchAllDocsQuery(*:*)
>>> *:*
>>> 
>>> LuceneQParser
>>> 
>>> type:Event
>>> displayable_b:true
>>> category_ids_im:(637 OR 639 OR 634)
>>> end_datetime_dt:[2012\-02\-10T00\:17\:52Z TO *]
>>> {!geofilt}
>>> 
>>> 
>>> type:Event
>>> displayable_b:true
>>> 
>>> category_ids_im:637 category_ids_im:639 category_ids_im:634
>>> 
>>> end_datetime_dt:[1328833072000 TO *]
>>> 
>>> 
>>> 
> 
>> SpatialDistanceQuery(geofilt(latlonSource=coordinates_lls(double(coordinates_lls_0_coordinate),double(coordinates_lls_1_coordinate)),latCenter=37.7561438,lonCenter=-122.4325682,dist=50.0,latMin=37.30648363225355,latMax=38.20580396774645,lonMin=-123.0013021058511,lonMax-121.86383429414894,lon2Min=-180.0,lon2Max180.0,calcDist=true,planetRadius=6371.009))
>>> 
>>> 
>>> 
>>> 1.0
>>> 
>>> 1.0
>>> 
>>> 1.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 
>>> 0.0
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 0.0
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Steven Ou | 歐偉凡
>>> 
>>> *ravn.com* | Chief Technology Officer
>>> steve...@gmail.com | +1 909-569-9880
>>> 
>>> 
>>> On Thu, Feb 9, 2012 at 4:15 PM, 

Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
Sorry, what do you mean "explicit category rather than boolean expression"?

Type was not changed midstream - hasn't really been changed ever, really.
And I happen to have *just* reindexed, too.

Don't seem to have a default operator set. Not sure how to do it, either...?
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


On Thu, Feb 9, 2012 at 5:01 PM, Erik Hatcher  wrote:

> Extremely odd.
>
> Hmmm... other things to try:
>
>  * query on an explicit category, rather than in a boolean expression
>  * try a different field type than sint (say just int, or string)
>  * shouldn't matter (since you're using "OR" explicitly) but double check
> the default operator in schema.xml
>  * reindex (was the field type ever changed mid-stream?)
>
> Definitely something fishy here.  Nothing obvious pops out yet.
>
>Erik
>
>
> On Feb 9, 2012, at 19:53 , Steven Ou wrote:
>
> > Actually, I take that back. Using q instead of fq still produces the same
> > problem. Somehow it's *less* inconsistent so at first glance it looked
> like
> > it fixed it. However, it does *not* fix it :(
> > --
> > Steven Ou | 歐偉凡
> >
> > *ravn.com* | Chief Technology Officer
> > steve...@gmail.com | +1 909-569-9880
> >
> >
> > On Thu, Feb 9, 2012 at 4:48 PM, Steven Ou  wrote:
> >
> >> Well, keeping all other filter queries the same, changing fq=
> >> category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
> >> causes results to not show up.
> >>
> >> In fact, I took out *all* other filter queries. And while I wasn't able
> >> to reproduce it exactly, nonetheless when I added the third category id
> the
> >> number of results *went down*. Which is consistently inconsistent, per
> >> se. Adding an OR cannot, logically, reduce the number of results!
> >> --
> >> Steven Ou | 歐偉凡
> >>
> >> *ravn.com* | Chief Technology Officer
> >> steve...@gmail.com | +1 909-569-9880
> >>
> >>
> >>
> >> On Thu, Feb 9, 2012 at 4:39 PM, Erik Hatcher  >wrote:
> >>
> >>> Yes, certainly should work fine as a filter query... I was merely
> trying
> >>> to eliminate variables from the equation.  You've got several filters
> and a
> >>> q=*:* going on below, so it's obviously harder to pinpoint what could
> be
> >>> going wrong.  I suggest continuing to eliminate variables here, as more
> >>> than likely some other filter is causing the documents you think should
> >>> appear to be filtered out.
> >>>
> >>>   Erik
> >>>
> >>>
> >>>
> >>> On Feb 9, 2012, at 19:24 , Steven Ou wrote:
> >>>
>  By turning fq=category_ids_im:(637+OR+639+OR+634) to
>  q=category_ids_im:(637+OR+639+OR+634)
>  it appears to produce the correct results. But... that doesn't seem to
> >>> make
>  sense to me? Shouldn't it work just fine as a filter query?
>  --
>  Steven Ou | 歐偉凡
> 
>  *ravn.com* | Chief Technology Officer
>  steve...@gmail.com | +1 909-569-9880
> 
> 
>  On Thu, Feb 9, 2012 at 4:20 PM, Steven Ou  wrote:
> 
> > I don't really know how to analyze the debug output... Here it is for
> >>> the
> > full query I'm running, which includes other filter queries.
> >
> > 
> > *:*
> > *:*
> > MatchAllDocsQuery(*:*)
> > *:*
> > 
> > LuceneQParser
> > 
> > type:Event
> > displayable_b:true
> > category_ids_im:(637 OR 639 OR 634)
> > end_datetime_dt:[2012\-02\-10T00\:17\:52Z TO *]
> > {!geofilt}
> > 
> > 
> > type:Event
> > displayable_b:true
> > 
> > category_ids_im:637 category_ids_im:639 category_ids_im:634
> > 
> > end_datetime_dt:[1328833072000 TO *]
> > 
> >
> >
> >>>
> SpatialDistanceQuery(geofilt(latlonSource=coordinates_lls(double(coordinates_lls_0_coordinate),double(coordinates_lls_1_coordinate)),latCenter=37.7561438,lonCenter=-122.4325682,dist=50.0,latMin=37.30648363225355,latMax=38.20580396774645,lonMin=-123.0013021058511,lonMax-121.86383429414894,lon2Min=-180.0,lon2Max180.0,calcDist=true,planetRadius=6371.009))
> > 
> > 
> > 
> > 1.0
> > 
> > 1.0
> > 
> > 1.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 
> > 0.0
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 0.0
> > 
> > 
> > 
> > 
> > --
> > Steven Ou | 歐偉凡
> >
> > *ravn.com* | Chief Technology Officer
> > steve...@gmail.com | +1 909-569-9880
> >
> >
> > On Thu, Feb 9, 2012 at 4:15 PM, Steven Ou 
> wrote:
> >
> >> Heh, yeah, I bolded the numbers for emphasis. The field type
> follows.
> >>
> >> *Dynamically Created From Pattern: **_IM<
> >>> http://192.168.1.30:8080/solr/admin/schema.jsp#>
> >>
> >> *Field Type: *SINT 

Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread mtheone
I'd rather boost premium ads than go the way of elevation.

@wunder

i think I'll go with your suggestion, thanks all

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-this-in-Solr-random-result-for-the-first-few-results-tp3728729p3731393.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Empty results with OR filter query

2012-02-09 Thread Erik Hatcher
Extremely odd.

Hmmm... other things to try:

  * query on an explicit category, rather than in a boolean expression
  * try a different field type than sint (say just int, or string)
  * shouldn't matter (since you're using "OR" explicitly) but double check the 
default operator in schema.xml
  * reindex (was the field type ever changed mid-stream?)

Definitely something fishy here.  Nothing obvious pops out yet.

Erik


On Feb 9, 2012, at 19:53 , Steven Ou wrote:

> Actually, I take that back. Using q instead of fq still produces the same
> problem. Somehow it's *less* inconsistent so at first glance it looked like
> it fixed it. However, it does *not* fix it :(
> --
> Steven Ou | 歐偉凡
> 
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880
> 
> 
> On Thu, Feb 9, 2012 at 4:48 PM, Steven Ou  wrote:
> 
>> Well, keeping all other filter queries the same, changing fq=
>> category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
>> causes results to not show up.
>> 
>> In fact, I took out *all* other filter queries. And while I wasn't able
>> to reproduce it exactly, nonetheless when I added the third category id the
>> number of results *went down*. Which is consistently inconsistent, per
>> se. Adding an OR cannot, logically, reduce the number of results!
>> --
>> Steven Ou | 歐偉凡
>> 
>> *ravn.com* | Chief Technology Officer
>> steve...@gmail.com | +1 909-569-9880
>> 
>> 
>> 
>> On Thu, Feb 9, 2012 at 4:39 PM, Erik Hatcher wrote:
>> 
>>> Yes, certainly should work fine as a filter query... I was merely trying
>>> to eliminate variables from the equation.  You've got several filters and a
>>> q=*:* going on below, so it's obviously harder to pinpoint what could be
>>> going wrong.  I suggest continuing to eliminate variables here, as more
>>> than likely some other filter is causing the documents you think should
>>> appear to be filtered out.
>>> 
>>>   Erik
>>> 
>>> 
>>> 
>>> On Feb 9, 2012, at 19:24 , Steven Ou wrote:
>>> 
 By turning fq=category_ids_im:(637+OR+639+OR+634) to
 q=category_ids_im:(637+OR+639+OR+634)
 it appears to produce the correct results. But... that doesn't seem to
>>> make
 sense to me? Shouldn't it work just fine as a filter query?
 --
 Steven Ou | 歐偉凡
 
 *ravn.com* | Chief Technology Officer
 steve...@gmail.com | +1 909-569-9880
 
 

Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
I'm really sorry to be spamming everyone. I know I've sent out a ton of
emails, but I ran it without *any* other filters (just
solr/select?q=category_ids_im:(637+OR+639+OR+634)&debugQuery=true) and
here's the debug. This produces 1 result only. Removing category 634
produces 11 results. Can anyone help? I noticed the parsedquery_toString
has weird symbols in it:


category_ids_im:(637 OR 639 OR 634)
category_ids_im:(637 OR 639 OR 634)

category_ids_im:637 category_ids_im:639 category_ids_im:634


category_ids_im:€#0;ɽ category_ids_im:€#0;ɿ category_ids_im:€#0;ɺ



11.743038 = (MATCH) sum of: 4.007905 = (MATCH) weight(category_ids_im:€#0;ɽ
in 4268), product of: 0.5842093 = queryWeight(category_ids_im:€#0;ɽ),
product of: 6.860392 = idf(docFreq=187, maxDocs=65962) 0.08515684 =
queryNorm 6.860392 = (MATCH) fieldWeight(category_ids_im:€#0;ɽ in 4268),
product of: 1.0 = tf(termFreq(category_ids_im:€#0;ɽ)=1) 6.860392 =
idf(docFreq=187, maxDocs=65962) 1.0 = fieldNorm(field=category_ids_im,
doc=4268) 3.959362 = (MATCH) weight(category_ids_im:€#0;ɿ in 4268), product
of: 0.58066064 = queryWeight(category_ids_im:€#0;ɿ), product of: 6.8187194
= idf(docFreq=195, maxDocs=65962) 0.08515684 = queryNorm 6.8187194 =
(MATCH) fieldWeight(category_ids_im:€#0;ɿ in 4268), product of: 1.0 =
tf(termFreq(category_ids_im:€#0;ɿ)=1) 6.8187194 = idf(docFreq=195,
maxDocs=65962) 1.0 = fieldNorm(field=category_ids_im, doc=4268) 3.7757707 =
(MATCH) weight(category_ids_im:€#0;ɺ in 4268), product of: 0.56703854 =
queryWeight(category_ids_im:€#0;ɺ), product of: 6.658755 = idf(docFreq=229,
maxDocs=65962) 0.08515684 = queryNorm 6.658755 = (MATCH)
fieldWeight(category_ids_im:€#0;ɺ in 4268), product of: 1.0 =
tf(termFreq(category_ids_im:€#0;ɺ)=1) 6.658755 = idf(docFreq=229,
maxDocs=65962) 1.0 = fieldNorm(field=category_ids_im, doc=4268)


LuceneQParser
...

--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


On Thu, Feb 9, 2012 at 4:53 PM, Steven Ou  wrote:

> Actually, I take that back. Using q instead of fq still produces the same
> problem. Somehow it's *less* inconsistent so at first glance it looked
> like it fixed it. However, it does *not* fix it :(
>
> --
> Steven Ou | 歐偉凡
>
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880
>
>

Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
Actually, I take that back. Using q instead of fq still produces the same
problem. Somehow it's *less* inconsistent so at first glance it looked like
it fixed it. However, it does *not* fix it :(
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


On Thu, Feb 9, 2012 at 4:48 PM, Steven Ou  wrote:

> Well, keeping all other filter queries the same, changing fq=
> category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
> causes results to not show up.
>
> In fact, I took out *all* other filter queries. And while I wasn't able
> to reproduce it exactly, nonetheless when I added the third category id the
> number of results *went down*. Which is consistently inconsistent, per
> se. Adding an OR cannot, logically, reduce the number of results!
> --
> Steven Ou | 歐偉凡
>
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880

Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
Well, keeping all other filter queries the same, changing fq=
category_ids_im:(637+OR+639) to fq=category_ids_im:(637+OR+639+OR+634)
causes results to not show up.

In fact, I took out *all* other filter queries. And while I wasn't able to
reproduce it exactly, nonetheless when I added the third category id the
number of results *went down*. Which is consistently inconsistent, per se.
Adding an OR cannot, logically, reduce the number of results!
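The logical claim above — a pure OR (disjunction) is a set union, so adding a clause can never shrink the match count — can be sketched with plain Java sets (the document IDs below are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OrMonotonicity {
    // Pretend these are the documents matched by each single-term filter.
    static Set<Integer> docs(Integer... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        Set<Integer> cat637 = docs(1, 2, 3);
        Set<Integer> cat639 = docs(3, 4);
        Set<Integer> cat634 = docs(5);

        // A disjunction of filters is the union of their match sets.
        Set<Integer> twoClauses = new HashSet<>(cat637);
        twoClauses.addAll(cat639);

        Set<Integer> threeClauses = new HashSet<>(twoClauses);
        threeClauses.addAll(cat634);

        // The union can only stay the same size or grow.
        System.out.println("two clauses: " + twoClauses.size());     // 4
        System.out.println("three clauses: " + threeClauses.size()); // 5
    }
}
```

So when adding `OR 634` makes the count drop, the boolean logic itself cannot be the cause — which is why Erik suggests looking at how the extra term is parsed and analyzed instead.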
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880


On Thu, Feb 9, 2012 at 4:39 PM, Erik Hatcher  wrote:

> Yes, certainly should work fine as a filter query... I was merely trying
> to eliminate variables from the equation.  You've got several filters and a
> q=*:* going on below, so it's obviously harder to pinpoint what could be
> going wrong.  I suggest continuing to eliminate variables here, as more
> than likely some other filter is causing the documents you think should
> appear to be filtered out.
>
>Erik
>

Re: Range facet - Count in facet menu != Count in search results

2012-02-09 Thread Jan Høydahl
Hi,

If you use trunk (4.0) version, you can say fq=price:[10 TO 20} and have the 
upper bound be exclusive.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. feb. 2012, at 00:58, Yuhao wrote:

> I've changed the "facet.range.include" option to every possible value (lower, 
> upper, edge, outer, all)**.  It only changes the count shown in the "Ranges" 
> facet menu on the left.  It has no effect on the count and results shown in 
> search results, which ALWAYS is inclusive of both the lower AND upper bounds 
> (which is equivalent to "include = all").  Is this by design?  I would like 
> to make the search results include the lower bound, but not the upper bound.  
> Can I do that?
> 
> My range field is multi-valued, but I don't think that should be the problem.
> 
> ** Actually, it doesn't like "outer" for some reason, which leaves the facet 
> completely empty.
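Jan's mixed-bracket syntax goes straight into an fq parameter; a small sketch of building such a request URL by hand follows (the host, port, and field name are placeholders, not taken from the thread):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class HalfOpenRangeFq {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // '[' includes the lower bound, '}' excludes the upper bound,
        // so this filter matches 10 <= price < 20 (trunk/4.0 syntax).
        String fq = "price:[10 TO 20}";

        // The colon, brackets, and space must be URL-encoded on the wire.
        String url = "http://localhost:8983/solr/select?q=*:*&fq="
                + URLEncoder.encode(fq, "UTF-8");
        System.out.println(url);
    }
}
```

On older releases that reject mixed brackets, the usual workaround is an exclusion clause like `price:[10 TO 20] -price:20`; note that with a multi-valued field (as in Yuhao's case) excluding the boundary value can also drop documents that hold other in-range values.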



Re: Empty results with OR filter query

2012-02-09 Thread Erik Hatcher
Yes, certainly should work fine as a filter query... I was merely trying to 
eliminate variables from the equation.  You've got several filters and a q=*:* 
going on below, so it's obviously harder to pinpoint what could be going wrong. 
 I suggest continuing to eliminate variables here, as more than likely some 
other filter is causing the documents you think should appear to be filtered 
out.

Erik



On Feb 9, 2012, at 19:24 , Steven Ou wrote:

> By turning fq=category_ids_im:(637+OR+639+OR+634) to
> q=category_ids_im:(637+OR+639+OR+634)
> it appears to produce the correct results. But... that doesn't seem to make
> sense to me? Shouldn't it work just fine as a filter query?
> --
> Steven Ou | 歐偉凡
> 
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880
> 



Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
By turning fq=category_ids_im:(637+OR+639+OR+634) to
q=category_ids_im:(637+OR+639+OR+634)
it appears to produce the correct results. But... that doesn't seem to make
sense to me? Shouldn't it work just fine as a filter query?
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880




Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
I don't really know how to analyze the debug output... Here it is for the
full query I'm running, which includes other filter queries.


*:*
*:*
MatchAllDocsQuery(*:*)
*:*

LuceneQParser

type:Event
displayable_b:true
category_ids_im:(637 OR 639 OR 634)
end_datetime_dt:[2012\-02\-10T00\:17\:52Z TO *]
{!geofilt}


type:Event
displayable_b:true

category_ids_im:637 category_ids_im:639 category_ids_im:634

end_datetime_dt:[1328833072000 TO *]

SpatialDistanceQuery(geofilt(latlonSource=coordinates_lls(double(coordinates_lls_0_coordinate),double(coordinates_lls_1_coordinate)),latCenter=37.7561438,lonCenter=-122.4325682,dist=50.0,latMin=37.30648363225355,latMax=38.20580396774645,lonMin=-123.0013021058511,lonMax=-121.86383429414894,lon2Min=-180.0,lon2Max=180.0,calcDist=true,planetRadius=6371.009))



1.0

1.0

1.0


0.0


0.0


0.0


0.0


0.0



0.0

0.0


0.0


0.0


0.0


0.0


0.0




--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880




Re: Empty results with OR filter query

2012-02-09 Thread Steven Ou
Heh, yeah, I bolded the numbers for emphasis. The field type follows.

*Dynamically Created From Pattern:
**_IM

*Field Type: *SINT 

*Schema: *Indexed, Multivalued, Omit Norms

*Index: *(unstored field)

*Index Analyzer: *org.apache.solr.schema.FieldType$DefaultAnalyzer

*Query Analyzer: *org.apache.solr.schema.FieldType$DefaultAnalyzer

*Docs: *33730

*Distinct: *528
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880




Re: Empty results with OR filter query

2012-02-09 Thread Erik Hatcher
What type of field is category_ids_im?

And I'm assuming that the *'s below are for emphasis and not really in your 
query?

Try your query in the q parameter and turn on debug (&debugQuery=true) and see 
how your query is parsing.  That'll likely tell all.

Erik

On Feb 9, 2012, at 18:42 , Steven Ou wrote:

> Hey guys, I'm stumped - hope someone can help!
> 
> Basically, I'm running a filter query that filters by category (e.g.
> fq=category_ids_im:(637 OR 639 OR 634)). However, it often produces no
> results whatsoever even though each individual query *does* produce results.
> 
> So, for example, fq=category_ids_im:*637* produces
> results. fq=category_ids_im:*639* produces results.
> fq=category_ids_im:*634* produces
> results. Even fq=category_ids_im:(*637* OR *639*) produces results, as well
> as fq=category_ids_im:(*639* OR *634*).
> 
> BUT as soon as I do fq=category_ids_im:(*637* OR *639* OR *634*), it
> produces NO RESULTS!
> 
> Any ideas what might be wrong? Really appreciate any help!
> --
> Steven Ou | 歐偉凡
> 
> *ravn.com* | Chief Technology Officer
> steve...@gmail.com | +1 909-569-9880



Range facet - Count in facet menu != Count in search results

2012-02-09 Thread Yuhao
I've changed the "facet.range.include" option to every possible value (lower, 
upper, edge, outer, all)**.  It only changes the count shown in the "Ranges" 
facet menu on the left.  It has no effect on the count and results shown in 
search results, which ALWAYS is inclusive of both the lower AND upper bounds 
(which is equivalent to "include = all").  Is this by design?  I would like to 
make the search results include the lower bound, but not the upper bound.  Can 
I do that?

My range field is multi-valued, but I don't think that should be the problem.

** Actually, it doesn't like "outer" for some reason, which leaves the facet 
completely empty.


Re: custom TokenFilter

2012-02-09 Thread Robert Muir
If you are writing a custom tokenstream, I recommend using some of the
resources in Lucene's test-framework.jar to test it.
These find lots of bugs! (including thread-safety bugs)

For a filter: I recommend using the assertions in
BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesTo,
and especially checkRandomData
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java

When testing your filter, for even more checks, don't use Whitespace
or Keyword Tokenizer, use MockTokenizer, it has more checks:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/MockTokenizer.java

For some examples, you can look at the tests in modules/analysis.

And of course enable assertions (-ea) when testing!
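Beyond the Lucene-specific test helpers, the failure mode in the quoted question — one mutable instance field shared by several threads — can be reproduced in miniature with no Lucene dependency at all (a hedged sketch; the class and method names are illustrative, not real APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedStateHazard {
    // Analogous to the 'generated' field in the filter below: a single
    // mutable field that every thread reusing this instance writes to.
    private List<String> generated;

    List<String> tokenize(String input) {
        generated = new ArrayList<>();
        for (char c : input.toCharArray()) {
            generated.add(String.valueOf(c)); // races with other threads
            Thread.yield();                   // widen the race window
        }
        return generated; // may now point at another thread's list
    }

    public static void main(String[] args) throws Exception {
        SharedStateHazard shared = new SharedStateHazard();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> results = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            results.add(pool.submit(() -> shared.tokenize("abcdefghijk")));
        }
        int corrupted = 0;
        for (Future<List<String>> f : results) {
            try {
                if (f.get().size() != 11) corrupted++; // expected 11 tokens
            } catch (ExecutionException e) {
                corrupted++; // concurrent ArrayList mutation blew up
            }
        }
        pool.shutdown();
        // Usually non-zero: the instance is not safe to share across threads.
        System.out.println("corrupted results: " + corrupted);
    }
}
```

Lucene analyzers normally reuse token streams per thread (via a ThreadLocal), so a correctly written filter should only ever see single-threaded access; symptoms like the above usually point at state that is not cleared between documents (e.g. in reset()) rather than true cross-thread sharing.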

On Thu, Feb 9, 2012 at 6:30 PM, Jamie Johnson  wrote:
> I have the need to take user input and index it in a unique fashion,
> essentially the value is some string (say "abcdefghijk") and needs to
> be converted into a set of tokens (say 1 2 3 4).  I am currently have
> implemented a custom TokenFilter to do this, is this appropriate?  In
> cases where I am indexing things slowly (i.e. 1 at a time) this works
> fine, but when I send 10,000 things to solr (all in one thread) I am
> noticing exceptions where it seems that the generated instance
> variable is being used by several threads.  Is my implementation
> appropriate or is there another more appropriate way to do this?  Are
> TokenFilters reused?  Would it be more appropriate to convert the
> stream to a single space-separated string, then run that through a
> WhiteSpaceTokenizer?  Any guidance on this would be greatly
> appreciated.
>
>        class CustomFilter extends TokenFilter {
>                private final CharTermAttribute termAtt =
> addAttribute(CharTermAttribute.class);
>                private final PositionIncrementAttribute posAtt =
> addAttribute(PositionIncrementAttribute.class);
>                protected CustomFilter(TokenStream input) {
>                        super(input);
>                }
>
>                Iterator replacement;
>                @Override
>                public boolean incrementToken() throws IOException {
>
>
>                        if(generated == null){
>                                //setup generated
>                                if(!input.incrementToken()){
>                                        return false;
>                                }
>
>                                //clearAttributes();
>                                List cells = 
> StaticClass.generateTokens(termAtt.toString());
>                                generated = new 
> ArrayList(cells.size());
>                                boolean first = true;
>                                for(String cell : cells) {
>                                        AttributeSource newTokenSource = 
> this.cloneAttributes();
>
>                                        CharTermAttribute newTermAtt =
> newTokenSource.addAttribute(CharTermAttribute.class);
>                                        newTermAtt.setEmpty();
>                                        newTermAtt.append(cell);
>                                        OffsetAttribute newOffsetAtt =
> newTokenSource.addAttribute(OffsetAttribute.class);
>                                        PositionIncrementAttribute 
> newPosIncAtt =
> newTokenSource.addAttribute(PositionIncrementAttribute.class);
>                                        newOffsetAtt.setOffset(0,0);
>                                        
> newPosIncAtt.setPositionIncrement(first ? 1 : 0);
>                                        generated.add(newTokenSource);
>                                        first = false;
>                                        generated.add(newTokenSource);
>                                }
>
>                        }
>                        if(!generated.isEmpty()){
>                                copy(this, generated.remove(0));
>                                return true;
>                        }
>
>                        return false;
>
>                }
>
>                private void copy(AttributeSource target, AttributeSource 
> source) {
>                        if (target != source)
>                                source.copyTo(target);
>                }
>
>                private LinkedList buffer;
>                private LinkedList matched;
>
>                private boolean exhausted;
>
>                private AttributeSource nextTok() throws IOException {
>                        if (buffer != null && !buffer.isEmpty()) {
>                                return buffer.removeFirst();
>                        } else {
>                                if (!exhausted && input.incrementToken()) {
>                                        return this;
>                                } else {
>                                        exhausted = tr

Empty results with OR filter query

2012-02-09 Thread Steven Ou
Hey guys, I'm stumped - hope someone can help!

Basically, I'm running a filter query that filters by category (e.g.
fq=category_ids_im:(637 OR 639 OR 634)). However, it often produces no
results whatsoever even though each individual query *does* produce results.

So, for example, fq=category_ids_im:637 produces
results. fq=category_ids_im:639 produces results.
fq=category_ids_im:634 produces
results. Even fq=category_ids_im:(637 OR 639) produces results, as well
as fq=category_ids_im:(639 OR 634).

BUT as soon as I do fq=category_ids_im:(637 OR 639 OR 634), it
produces NO RESULTS!

Any ideas what might be wrong? Really appreciate any help!
--
Steven Ou | 歐偉凡

*ravn.com* | Chief Technology Officer
steve...@gmail.com | +1 909-569-9880
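
An editorial aside, not part of the original thread: when a boolean filter behaves unexpectedly, echoing how Solr parsed it is usually the quickest diagnostic. A sketch of the same filter with the standard debug parameter added (field name and ids taken from the message above):

```
fq=category_ids_im:(637 OR 639 OR 634)&debugQuery=true
```

The parsed_filter_queries section of the debug output shows the clause list Solr actually ran, which makes it easy to spot a default-operator or parsing surprise.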


indexing with DIH (and with problems)

2012-02-09 Thread alessio crisantemi
hi all,
I would like to index in Solr my pdf files, which are included in my directory c:\myfile\

so, I added to my solr/conf directory the file data-config.xml, like the
following:









 





before that, I added this part to solrconfig.xml:




  c:\solr\conf\data-config.xml

  


but this is the result:


delta-import
idle
0:0:2.512
0
0
0
0
2012-02-09 23:37:07
Indexing failed. Rolled back all changes.
2012-02-09 23:37:07
This response format is experimental. It is likely to change in the future.

suggestions?
thanks
alessio
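
The data-config.xml above was lost in the archive. For reference, a minimal DIH configuration for walking a directory of PDFs with Tika usually looks roughly like the sketch below; the entity names, field names and paths here are assumptions, not recovered from the original message:

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- outer entity lists the files; inner entity extracts text with Tika -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="c:/myfile" fileName=".*\.pdf"
            rootEntity="false" dataSource="null">
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a config of this shape, an "Indexing failed. Rolled back all changes." status usually points at a per-file parse error, so onError="skip" on the Tika entity and a look at the Solr log are the first things to try.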


RE: regular expression in solrcore.config to be passed to dataConfig via DataImportHandler

2012-02-09 Thread Dyer, James
I wouldn't feel too bad about this.  This is a pretty common gotcha and going 
forward it would be nice if we can make it easier to parameterize 
data-config.xml...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Zajkowski, Radoslaw [mailto:radoslaw.zajkow...@proximity.ca] 
Sent: Thursday, February 09, 2012 4:16 PM
To: solr-user@lucene.apache.org
Subject: RE: regular expression in solrcore.config to be passed to dataConfig 
via DataImportHandler

Never mind, everybody; my mistake. The correct way to access these vars
is to prefix them with dataimporter.request

Got the answer here earlier today:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3E



Radoslaw Zajkowski
Senior Developer
O°
proximity
CANADA
t: 416-972-1505 ext.7306
c: 647-281-2567
f: 416-944-7886

2011 ADCC Interactive Agency of the Year
2011 Strategy Magazine Digital Agency of the Year

http://www.proximityworld.com/

Join us on:
Facebook - http://www.facebook.com/ProximityCanada
Twitter - http://twitter.com/ProximityWW
YouTube - http://www.youtube.com/proximitycanada

-Original Message-

From: Zajkowski, Radoslaw [mailto:radoslaw.zajkow...@proximity.ca]
Sent: Thursday, February 09, 2012 5:03 PM
To: solr-user@lucene.apache.org
Subject: regular expression in solrcore.config to be passed to dataConfig via 
DataImportHandler

Hi,

I have a good number of files which will be broken into a dozen + cores.

To make config management easier I have been using global xml files and passing 
settings to them as needed. My settings reside in solrcore.config and are 
passed to solr config and dataConfig as default values of the dataimporthandler

Settings file looks like this:
For Spanish core:
core.languagegroup=es
core.filenamefilter=.*(spa|spl|sppr|spus|esci|ese|esep|eses)\.(xml)

For the English core:
core.languagegroup=en
core.filenamefilter=.*(eeau|eaw|eez|eep|eeap|eeat|eebe|eeci|eedk)\.(xml)

I am adding the core.filenamefilter value to the dataimporthandler as a default 
value like this:



../../global_core_configs/DataConfig.xml
${core.languagegroup}
${core.filenamefilter}

  

and then accessing in the dataConfig section like this



It seems that the value gets ignored or not passed correctly or parsed as 
proper regex at the dataConfig level.

Any help greatly appreciated, thank you,

Radek.









Please consider the environment before printing this e-mail.

This message and any attachments contain information, which may be confidential 
or privileged. If you are not the intended recipient, please refrain from any 
disclosure, copying, distribution or use of this information. Please be aware 
that such actions are prohibited. If you have received this transmission in 
error, kindly notify us by e-mail to mailto:helpd...@bbdo.com. We appreciate 
your cooperation.



-
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.1913 / Virus Database: 2112/4798 - Release Date: 02/09/12

 -Original Message-





RE: regular expression in solrcore.config to be passed to dataConfig via DataImportHandler

2012-02-09 Thread Zajkowski, Radoslaw
Never mind, everybody; my mistake. The correct way to access these vars
is to prefix them with dataimporter.request

Got the answer here earlier today:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3E




-Original Message-

From: Zajkowski, Radoslaw [mailto:radoslaw.zajkow...@proximity.ca]
Sent: Thursday, February 09, 2012 5:03 PM
To: solr-user@lucene.apache.org
Subject: regular expression in solrcore.config to be passed to dataConfig via 
DataImportHandler

Hi,

I have a good number of files which will be broken into a dozen + cores.

To make config management easier I have been using global xml files and passing 
settings to them as needed. My settings reside in solrcore.config and are 
passed to solr config and dataConfig as default values of the dataimporthandler

Settings file looks like this:
For Spanish core:
core.languagegroup=es
core.filenamefilter=.*(spa|spl|sppr|spus|esci|ese|esep|eses)\.(xml)

For the English core:
core.languagegroup=en
core.filenamefilter=.*(eeau|eaw|eez|eep|eeap|eeat|eebe|eeci|eedk)\.(xml)

I am adding the core.filenamefilter value to the dataimporthandler as a default 
value like this:



../../global_core_configs/DataConfig.xml
${core.languagegroup}
${core.filenamefilter}

  

and then accessing in the dataConfig section like this



It seems that the value gets ignored or not passed correctly or parsed as 
proper regex at the dataConfig level.

Any help greatly appreciated, thank you,

Radek.













 -Original Message-





regular expression in solrcore.config to be passed to dataConfig via DataImportHandler

2012-02-09 Thread Zajkowski, Radoslaw
Hi,

I have a good number of files which will be broken into a dozen + cores.

To make config management easier I have been using global xml files and passing 
settings to them as needed. My settings reside in solrcore.config and are 
passed to solr config and dataConfig as default values of the dataimporthandler

Settings file looks like this:
For Spanish core:
core.languagegroup=es
core.filenamefilter=.*(spa|spl|sppr|spus|esci|ese|esep|eses)\.(xml)

For the English core:
core.languagegroup=en
core.filenamefilter=.*(eeau|eaw|eez|eep|eeap|eeat|eebe|eeci|eedk)\.(xml)

I am adding the core.filenamefilter value to the dataimporthandler as a default 
value like this:



../../global_core_configs/DataConfig.xml
${core.languagegroup}
${core.filenamefilter}

  

and then accessing in the dataConfig section like this



It seems that the value gets ignored or not passed correctly or parsed as 
proper regex at the dataConfig level.

Any help greatly appreciated, thank you,

Radek.
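
As a side note, the filename filters above are plain java.util.regex patterns, so they can be sanity-checked outside Solr before being wired into the DIH config. A small self-contained check (the file names are made up for illustration):

```java
import java.util.regex.Pattern;

public class FilenameFilterCheck {
    public static void main(String[] args) {
        // The Spanish-core filter from the settings file above
        Pattern es = Pattern.compile(".*(spa|spl|sppr|spus|esci|ese|esep|eses)\\.(xml)");
        // matches() anchors the whole string, the same way DIH applies fileName
        System.out.println(es.matcher("catalog_spa.xml").matches());
        System.out.println(es.matcher("catalog_eeau.xml").matches());
    }
}
```

If the pattern behaves here but not in Solr, the value is most likely not reaching the entity at all — which was the actual problem in this thread (missing dataimporter.request prefix), not the regex itself.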













Specify a cores roles through core add command

2012-02-09 Thread Jamie Johnson
per SOLR-2765 we can add roles to specific cores such that it's
possible to give custom roles to solr instances, is it possible to
specify this when adding a core through curl
'http://host:port/solr/admin/cores...'?


https://issues.apache.org/jira/browse/SOLR-2765


correct usage of StreamingUpdateSolrServer?

2012-02-09 Thread T Vinod Gupta
Hi,
I wrote a hello world program to add documents to solr server. When I
use CommonsHttpSolrServer, the program exits but when I
use StreamingUpdateSolrServer, the program never exits. And I couldn't find
a way to close it? Are there any best practices here? Do I have to do
anything differently at the time of documents adds/updates when
using StreamingUpdateSolrServer? I am following the add/commit cycle. Is
that ok?

thanks


Keyword Tokenizer Phrase Issue

2012-02-09 Thread Zac Smith
Hi,

I have a simple field type that uses the KeywordTokenizerFactory. I would like 
to use this so that values in this field are only matched with the full text of 
the field.
e.g. If I indexed the text 'chicken stock', searches on this field would only
match when searching for 'chicken stock'. Searching for just 'chicken' or
just 'stock' should not match.

This mostly works, except if there is more than one word in the text I only get 
a match when searching with quotes. e.g.
"chicken stock" (matches)
chicken stock (doesn't match)

Is there any way I can set this up so that I don't have to provide quotes? I am 
using dismax and if I put quotes in it will mess up the search for the rest of 
my fields. I had an idea that I could issue a separate search using the regular 
query parser, but couldn't work out how to do this:
I thought I could do something like this: qt=dismax&q=fish OR 
_query_:ingredient:"chicken stock"

I am using solr 3.5.0. My field type is:









Thanks
Zac
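
For what it's worth, Solr's nested-query syntax can do roughly what the last attempt above is reaching for: keep dismax for the free-text part and use the field query parser, which builds one phrase-like query over the entire value, for the KeywordTokenizer field. A hedged sketch — the qf list is an assumption, and the outer query must go through the default lucene parser for _query_ to be recognized:

```
q=_query_:"{!dismax qf='name ingredients'}fish" OR _query_:"{!field f=ingredient}chicken stock"
```

Because {!field} treats "chicken stock" as a single value against the keyword-tokenized field, no user-visible quoting is needed and the dismax clause is untouched.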


Re: SolrCloud on Trunk shard update error

2012-02-09 Thread Jamie Johnson
There is more, but it was lost since I was doing so many inserts.  The
issue looks like it's getting thrown by a custom FilterFactory I have
and is completely unrelated to SolrCloud.  I am trying to confirm now
though.

On Thu, Feb 9, 2012 at 3:03 PM, Mark Miller  wrote:
> Is that the entire stack trace - no other exception logged?
>
> On Feb 9, 2012, at 2:44 PM, Jamie Johnson wrote:
>
>> I just ran a test with a very modest cluster (exactly the same as
>> http://outerthought.org/blog/491-ot.html).  I then indexed 10,000
>> documents into the cluster.  From what I can tell everything worked
>> properly but I'm seeing the following errors in the logs.  I'm
>> randomly choosing the solr instance to write to and the documents are
>> getting forwarded properly, but I wonder what the exception means and
>> why it's getting thrown.  Any thoughts?
>>
>>
>> Feb 9, 2012 2:39:01 PM org.apache.solr.common.SolrException log
>> SEVERE: shard update error StdNode:
>> http://jamiesmac:8502/solr/slice1_shard2/:org.apache.solr.common.SolrException:
>> null  java.lang.NullPointerException
>>
>> null  java.lang.NullPointerException
>>
>> request: 
>> http://jamiesmac:8502/solr/slice1_shard2/update?wt=javabin&version=2&leader=true&wt=javabin&version=2
>>       at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
>>       at 
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
>>       at 
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
>>       at 
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>       at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>       at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>       at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>       at java.lang.Thread.run(Thread.java:680)
>
> - Mark Miller
> lucidimagination.com
>


Re: SolrCloud on Trunk shard update error

2012-02-09 Thread Mark Miller
Is that the entire stack trace - no other exception logged?

On Feb 9, 2012, at 2:44 PM, Jamie Johnson wrote:

> I just ran a test with a very modest cluster (exactly the same as
> http://outerthought.org/blog/491-ot.html).  I then indexed 10,000
> documents into the cluster.  From what I can tell everything worked
> properly but I'm seeing the following errors in the logs.  I'm
> randomly choosing the solr instance to write to and the documents are
> getting forwarded properly, but I wonder what the exception means and
> why it's getting thrown.  Any thoughts?
> 
> 
> Feb 9, 2012 2:39:01 PM org.apache.solr.common.SolrException log
> SEVERE: shard update error StdNode:
> http://jamiesmac:8502/solr/slice1_shard2/:org.apache.solr.common.SolrException:
> null  java.lang.NullPointerException
> 
> null  java.lang.NullPointerException
> 
> request: 
> http://jamiesmac:8502/solr/slice1_shard2/update?wt=javabin&version=2&leader=true&wt=javabin&version=2
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
>   at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
>   at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)

- Mark Miller
lucidimagination.com


SolrCloud on Trunk shard update error

2012-02-09 Thread Jamie Johnson
I just ran a test with a very modest cluster (exactly the same as
http://outerthought.org/blog/491-ot.html).  I then indexed 10,000
documents into the cluster.  From what I can tell everything worked
properly but I'm seeing the following errors in the logs.  I'm
randomly choosing the solr instance to write to and the documents are
getting forwarded properly, but I wonder what the exception means and
why it's getting thrown.  Any thoughts?


Feb 9, 2012 2:39:01 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
http://jamiesmac:8502/solr/slice1_shard2/:org.apache.solr.common.SolrException:
null  java.lang.NullPointerException

null  java.lang.NullPointerException

request: 
http://jamiesmac:8502/solr/slice1_shard2/update?wt=javabin&version=2&leader=true&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
at 
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)


RE: spell checking and filtering in the same query

2012-02-09 Thread Dyer, James
Mark,

I'm not as familiar with the Suggester, but with normal spellcheck if you set 
"spellcheck.maxCollationTries" to something greater than 0 it will check the 
collations with the index.  This checking includes any "fq" params you had.  So 
in this sense the SpellCheckComponent does work with "fq".
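
In concrete terms, a request along these lines asks for collations that have been re-checked against the index with the fq applied; the parameter names are real, while the chef filter value is a placeholder:

```
q=ban&fq=chef:smith&spellcheck=true&spellcheck.dictionary=foodtypes
  &spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.maxCollations=5
```

Only collations that actually return hits under the given fq survive the tries, which is what makes this usable as a "suggestions filtered by chef" query.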

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark Swinson [mailto:mark.swin...@bbc.co.uk] 
Sent: Thursday, February 09, 2012 7:38 AM
To: solr-user@lucene.apache.org
Subject: spell checking and filtering in the same query

Background:

I have an solr index containing foodtypes, chefs, and courses. This is
an initial setup to test my configuration.


Here is the problem I'm trying to solve :

-When I query for a misspelt foodtype 'x' and filter by chef 'c' I should
get a suggested list of foodtypes prepared by chef 'c'


ok:

I've managed to set up a spellcheck component so I can make the
following query:

/suggest?q=ban&spellcheck.dictionary=foodtypes

This gets me the results
'banana bread'
'banoffee pie'

How can I modify this query and the solr configuration to allow me to
filter by another field?

I'm aware that the fq parameter does not work with the SpellCheck
component.
Is there anyway of passing the results of the first query to a filter
query? I've seen various posts
on this topic, but no solutions. The best suggestion was to make the
client make a second request,
which is something I do not want to do.

Is it possible to write a SearchComponent or SearchHandler that chains
results?


Thanks for any help.


Mark








http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Geospatial search with multivalued field

2012-02-09 Thread Mikhail Khludnev
Some time ago I tested backported patch from
https://issues.apache.org/jira/browse/SOLR-2155
it works.

Regards

On Thu, Feb 9, 2012 at 6:36 PM, Marian Steinbach  wrote:

> Hi!
>
> I'm trying to figure out how to enable spatial search for my use case.
>
> I have documents that are in many cases associated with multiple geo
> locations. I'd like to filter documents by the minimum distance to a
> reference point (which is given at query time).
>
> What this means is: If at least one of the locations of a document
> lies within a certain radius of the point, it should be included in
> the result.
>
> Which field type can I use for this and how would I have to do the
> filtering?
>
> Sorting (by distance) isn't relevant at this point, but it might be in
> the future.
>
> The example in Solr 3.4 states in schema.xml for the fieldType
> "location" (field "store"): "A specialized field for geospatial
> search. If indexed, this fieldType must not be multivalued." If I used
> a field of type solr.LatLonType this would mean that I could have
> multivalued="true", but no indexing? This means that I couldn't do
> fast bounding box / range queries on the locations in order to narrow
> down the result for a distance filter, correct? So wich one is better?
>
> Thanks!
>
> Marian
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics
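
For reference, the distance filter itself looks the same whether the field is the stock single-valued LatLonType or the multi-valued geohash type from SOLR-2155; a sketch with a placeholder field name and coordinates:

```
fq={!geofilt sfield=location pt=52.52,13.40 d=10}
```

With the SOLR-2155 field type, a document matches when any one of its indexed points falls within d kilometers of pt, which is exactly the "minimum distance over multiple locations" behavior asked about above.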


 


Re: Improving performance for SOLR geo queries?

2012-02-09 Thread Yonik Seeley
2012/2/9 Matthias Käppler :
> 
> {!bbox cache=false d=50 sfield=location_ll pt=54.1434,-0.452322}
> 
> 
> 
> WrappedQuery({!cache=false
> cost=0}+location_ll_0_coordinate:[53.69373983225355 TO
> 54.59306016774645] +location_ll_1_coordinate:[-1.2199462259963294 TO
> 0.31530222599632934])
> 
> 

Yep, bbox normally just evaluates to two range queries.
In the example schema, *_coordinate uses tdouble:

   
   

And tdouble is defined to be:



One way to speed up numeric range queries (at the cost of increased
index size) is to lower the precisionStep.  You could try changing
this from 8 to 4 and then re-indexing to see how that affects your
query speed.

-Yonik
lucidimagination.com
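
For reference, the relevant pieces of the example schema look roughly like the sketch below (reconstructed from the stock Solr 3.x example schema, so treat names and defaults as assumptions); lowering precisionStep as suggested means editing the tdouble definition and re-indexing:

```xml
<!-- lowered from the example default precisionStep="8" to speed up range queries -->
<fieldType name="tdouble" class="solr.TrieDoubleField"
           precisionStep="4" omitNorms="true" positionIncrementGap="0"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```

A smaller precisionStep indexes more trie terms per value, so range queries visit fewer terms at the cost of a larger index.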


Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Erick Erickson
Tika is not guaranteed to be able to parse every PDF file that a reader can display. There
are significant differences in how pdf files are constructed by different
"compatible" vendors, and the reader is quite forgiving about still displaying
them.

Sometimes you can get around this by re-writing the PDF with an app that
Tika seems to be able to handle the output from.

Also, you haven't said what version of Solr you're using. Tika has been
upgraded to 1.0 in the 3.6 build, which has not been released yet. You might
try using that, you can get the build from:
https://builds.apache.org//view/S-Z/view/Solr/job/Solr-3.x/

Best
Erick

2012/2/9 Vivek Shrivastava :
> I think you might need to figure out what files are not coming in the index, 
> and see if you can find command pattern in  those files. Since these are pdf 
> files, please make sure the file's security settings allow content extraction 
> etc..
>
> Regards,
>
> Vivek
>
> -Original Message-
> From: 荣康 [mailto:whuiss_cs2...@163.com]
> Sent: Wednesday, February 08, 2012 11:30 PM
> To: solr-user@lucene.apache.org
> Subject: Help:Solr can't put all pdf files into index
>
> Hey ,
> I am using solr as my search engine to search my pdf files. I have 18219
> files (different file names) and all the files are in the same directory. But
> when I use solr to import the files into the index using the Dataimport method, solr
> reports importing only 17233 files. It's very strange. This problem has stopped
> our project for a few days. I can't handle it.
>
>
>  please help me!
>
>
> Schema.xml
>
>
> 
>termVectors="true" termPositions="true" termOffsets="true"/>
>termVectors="true" termPositions="true" termOffsets="true"/>
>   
>  
>  id
>  
>
>
> and
> 
>
>  
>  rootEntity="false"
>  dataSource="null"  baseDir="H:/pdf/cls_1_16800_OCRed/1"
> fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" onError="skip">
>
>
>  url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
>
> 
>  
>  
> 
>
> 
>
>
>
>
> sincerecly
> Rong Kang
>
>
>


Re: Index Start Question

2012-02-09 Thread Erick Erickson
Hmmm. You say:

"The DBA opens a command line prompt and initiates an index build/rebuild"

How? By issuing a curl command? Running a program? It seems to me that the
easiest thing to do here would be to create a small program that kicks
off the indexing process and have *that* program send the e-mails when
it starts and perhaps a completion e-mail after it's done.

Seems a lot surer than trying to infer the action from the Solr logs...

Best
Erick
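
An editorial sketch of that suggestion, assuming the build is kicked off over HTTP (for example a DataImportHandler full-import; the host, core and addresses are all placeholders):

```
mail -s "index build starting on SOLRBOX1" dba-team@example.com < /dev/null
curl "http://SOLRBOX1:8983/solr/dataimport?command=full-import"
mail -s "index build kicked off on SOLRBOX1" dba-team@example.com < /dev/null
```

Wrapping the kick-off this way announces the run from the same place it starts, so nothing has to be inferred from the Solr logs.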

On Thu, Feb 9, 2012 at 10:43 AM, Hoffman, Chase  wrote:
> Erick,
>
> My understanding of the process is this:
>
> 1. The DBA opens a command line prompt and initiates an index build/rebuild
> 2. SOLR performs said index build/rebuild
> 3. Index finishes
>
> I don't think we're appending documents to the SOLR index - it's indexing 
> MSSQL tables.  The servers these are running on aren't beefy enough to run 
> multiple SOLR index builds at the same time.  So the hope is to find some key 
> in the logs that shows the start of the index rebuild so that I can put in 
> some automation to blast out an email saying "Server X is currently running 
> an index, do not kick off an index run on Server X".
>
> Thanks so much for your help.
>
> Best,
>
> --Chase
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, February 09, 2012 9:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index Start Question
>
> OK, what do you mean by "index is kicked off"? You mean starting Solr or 
> actually adding a document to a running Solr?
>
> If the latter, you're probably looking for something like this:
> Feb 9, 2012 10:34:26 AM
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {add=[eoe32]} 0 6
>
> The important bits are solr.update.processor and the add=blahblah bit where 
> the stuff after the = will be a list of unique ids for the document(s)
> added.
>
> However, this will be somewhat fragile, the format of the logged messages is 
> not guaranteed in future versions.
>
> Although this is happening, I think, after the doc has been added to the 
> index, so it may be too late for your problem.
>
> Best
> Erick
>
> On Wed, Feb 8, 2012 at 3:13 PM, Hoffman, Chase  wrote:
>> Please forgive me if this is a dumb question.  I've never dealt with SOLR 
>> before, and I'm being asked to determine from the logs when a SOLR index is 
>> kicked off (it is a Windows server).  The TOMCAT service runs continually, 
>> so no love there.  In parsing the logs, I think 
>> "org.apache.solr.core.SolrResourceLoader " is the indicator, since 
>> "org.apache.solr.core.SolrCore execute" seems to occur even when I know an 
>> index has not been started.
>>
>> Any advice you could give me would be wonderful.
>>
>> Best,
>>
>> --Chase
>>
>> Chase Hoffman
>> Infrastructure Systems Administrator, Performance Technologies The
>> Advisory Board Company
>> 512-681-2190 direct | 512-609-1150 fax
>> hoffm...@advisory.com |
>> www.advisory.com
>>
>> Don't miss out-log in now
>> Unlock thousands of members-only tools, events, best practices, and more at 
>> www.advisory.com.
> Get started
>
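
Erick's log-scanning suggestion above can be automated. The sketch below is a
hypothetical Python detector for the LogUpdateProcessor lines he quotes; as he
notes, the log format is not guaranteed across Solr versions, so the regexes
here are assumptions tied to this era's two-line JUL output, not a stable API.

```python
import re

# Heuristic detector for document-add activity in a Solr log, based on the
# LogUpdateProcessor output quoted in the thread above. The header line names
# the class; the following INFO line carries the {add=[...]} payload.
ADD_HEADER = re.compile(r"LogUpdateProcessor")
DOC_IDS = re.compile(r"\{add=\[([^\]]*)\]\}")

def added_doc_ids(log_lines):
    """Yield document ids mentioned in add=[...] entries of a log stream."""
    pending = False
    for line in log_lines:
        if ADD_HEADER.search(line):
            pending = True  # header line; the ids follow on the INFO line
            continue
        if pending:
            m = DOC_IDS.search(line)
            if m:
                yield from (i.strip() for i in m.group(1).split(","))
            pending = False

sample = [
    "Feb 9, 2012 10:34:26 AM org.apache.solr.update.processor.LogUpdateProcessor finish",
    "INFO: {add=[eoe32]} 0 6",
]
print(list(added_doc_ids(sample)))  # ['eoe32']
```

A watcher like this could tail the log and send the "Server X is currently
running an index" email Chase describes, at the cost of being fragile across
Solr upgrades.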


Re: Zookeeper view not displaying on latest trunk

2012-02-09 Thread Mark Miller

On Feb 9, 2012, at 12:09 PM, Jamie Johnson wrote:

> To get this to work I had to modify my solr.xml to add a
> defaultCoreName, then everything worked fine on the old interface
> (/solr/admin).  The new interface was still unhappy and looking at the
> response that comes back I see the following
> 
> {"status": 404, "error" : "Zookeeper is not configured for this Solr
> Core. Please try connecting to an alternate zookeeper address."}
> 
> Does the new interface support multiple cores?

It should, but someone else wrote it, so I don't know offhand - sounds like an
issue we need to look at.


>  Should the old
> interface require that defaultCoreName be set?

No - another thing we should look at.

> 
> On Thu, Feb 9, 2012 at 10:29 AM, Jamie Johnson  wrote:
>> I'm looking at the latest code on trunk and it seems as if the
>> zookeeper view does not work.  When trying to access the information I
>> get the following in the log
>> 
>> 
>> 2012-02-09 10:28:49.030:WARN::/solr/zookeeper.jsp
>> java.lang.NullPointerException
>>at 
>> org.apache.jsp.zookeeper_jsp$ZKPrinter.(org.apache.jsp.zookeeper_jsp:55)
>>at 
>> org.apache.jsp.zookeeper_jsp._jspService(org.apache.jsp.zookeeper_jsp:533)
>>at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
>>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>at 
>> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
>>at 
>> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
>>at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
>>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>at 
>> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>at 
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:280)
>>at 
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>at 
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>at 
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>at 
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>at 
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>at 
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>at 
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>at 
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>at 
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>at org.mortbay.jetty.Server.handle(Server.java:326)
>>at 
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>at 
>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>at 
>> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>>at 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

- Mark Miller
lucidimagination.com













Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread Erick Erickson
Hmmm, try looking at anything you've done in solrconfig.xml
related to the request handler (probably called "search") with
default="true" set.

Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?

Best
Erick

On Thu, Feb 9, 2012 at 12:02 PM, geeky2  wrote:
>
>>>
> OK, first question is why are you searching on two different values?
> Is that intentional?
> <<
>
> yes - our users have to be able to locate a part or model number (that may
> or may not have periods in that number) even if they do NOT enter the number
> with the embedded periods.
>
> example:
>
> actual part number in our database is BP2.1UAA
>
> however the user needs to be able to search on BP21UAA and find that part.
>
> there are business reasons why a user may see something different in the
> field than what is actually in the database.
>
> does this make sense?
>
>
>
>>>
> If I'm reading your problem right, you should
> be able to get/not get any response just by toggling whether the
> period is in the search URL, right?
> <<
>
> yes - simply put - the user MUST get a hit on the above mentioned part if
> they enter BP21UAA or BP2.1UAA.
>
>>>
> But assuming that's not the problem, there's something you're
> not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
> <<
>
> sorry - i did not know i was doing this or how it happened - it was not
> intentional and i did not notice this until your posting.  i am not sure of
> the implications related to this or what it means to have something as a
> MultiPhraseQuery.
>
>>>
> Are you putting quotes in somehow, either through the URL or by
> something in your solrconfig.xml?
> <<
>
> i did not use quotes in the url - i cut and pasted the urls for my tests in
> the message thread.  i do not see quotes as part of the url in my previous
> post.
>
> what would i be looking for in the solrconfig.xml file that would force the
> MultiPhraseQuery?
>
> it seems that this is the crux of the issue - but i am not sure how to
> determine what is manifesting the quotes?  as previously stated - the quotes
> are not being entered via the url - they are pasted (in this message thread)
> exactly as i pulled them from the browser.
>
> thank you,
> mark
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3730070.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Help:Solr can't put all pdf files into index

2012-02-09 Thread Vivek Shrivastava
I think you might need to figure out which files are not making it into the index, 
and see if you can find a common pattern in those files. Since these are pdf 
files, please make sure the files' security settings allow content extraction, 
etc.

Regards,

Vivek

-Original Message-
From: 荣康 [mailto:whuiss_cs2...@163.com] 
Sent: Wednesday, February 08, 2012 11:30 PM
To: solr-user@lucene.apache.org
Subject: Help:Solr can't put all pdf files into index

Hey,
I am using solr as my search engine to search my pdf files. I have 18219 
files (with different file names) and all the files are in the same directory. But 
when I use solr to import the files into the index using the Dataimport method, solr 
reports only importing 17233 files. It's very strange. This problem has stopped our 
project for a few days. I can't handle it.


 please help me!


Schema.xml



   
   

 
 id 
 


and 
 
 
  
 



  
 
 
  
 
 
 




sincerecly
Rong Kang





Re: Zookeeper view not displaying on latest trunk

2012-02-09 Thread Jamie Johnson
To get this to work I had to modify my solr.xml to add a
defaultCoreName, then everything worked fine on the old interface
(/solr/admin).  The new interface was still unhappy and looking at the
response that comes back I see the following

{"status": 404, "error" : "Zookeeper is not configured for this Solr
Core. Please try connecting to an alternate zookeeper address."}

Does the new interface support multiple cores?  Should the old
interface require that defaultCoreName be set?

On Thu, Feb 9, 2012 at 10:29 AM, Jamie Johnson  wrote:
> I'm looking at the latest code on trunk and it seems as if the
> zookeeper view does not work.  When trying to access the information I
> get the following in the log
>
>
> 2012-02-09 10:28:49.030:WARN::/solr/zookeeper.jsp
> java.lang.NullPointerException
>        at 
> org.apache.jsp.zookeeper_jsp$ZKPrinter.(org.apache.jsp.zookeeper_jsp:55)
>        at 
> org.apache.jsp.zookeeper_jsp._jspService(org.apache.jsp.zookeeper_jsp:533)
>        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>        at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
>        at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
>        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>        at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:280)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:326)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>        at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>        at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread geeky2

>>
OK, first question is why are you searching on two different values?
Is that intentional? 
<<

yes - our users have to be able to locate a part or model number (that may
or may not have periods in that number) even if they do NOT enter the number
with the embedded periods.  

example: 

actual part number in our database is BP2.1UAA

however the user needs to be able to search on BP21UAA and find that part.

there are business reasons why a user may see something different in the
field than what is actually in the database.

does this make sense?



>>
If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right? 
<<

yes - simply put - the user MUST get a hit on the above mentioned part if
they enter BP21UAA or BP2.1UAA.

>>
But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
<<

sorry - i did not know i was doing this or how it happened - it was not
intentional and i did not notice this until your posting.  i am not sure of
the implications related to this or what it means to have something as a
MultiPhraseQuery.

>>
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?
<<

i did not use quotes in the url - i cut and pasted the urls for my tests in
the message thread.  i do not see quotes as part of the url in my previous
post.

what would i be looking for in the solrconfig.xml file that would force the
MultiPhraseQuery?

it seems that this is the crux of the issue - but i am not sure how to
determine what is manifesting the quotes?  as previously stated - the quotes
are not being entered via the url - they are pasted (in this message thread)
exactly as i pulled them from the browser.

thank you,
mark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3730070.html
Sent from the Solr - User mailing list archive at Nabble.com.
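
A compact way to see why the indexing side of this can work: the sketch below
is a deliberately simplified, hypothetical model of WordDelimiterFilter-style
behaviour (split on punctuation and on letter/digit transitions, plus a
catenateAll-style catenated token). The real filter has many more options
(catenateWords, catenateNumbers, preserveOriginal, splitOnCaseChange), so this
only illustrates why BP2.1UAA and BP21UAA can end up sharing an indexed token;
it is not the Solr implementation.

```python
import re

def word_delimiter_tokens(text, catenate_all=True):
    """Rough sketch of word-delimiter splitting: break on punctuation and on
    letter<->digit transitions; optionally also emit the fully catenated
    token (the effect of catenateAll="1" in the real filter)."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", text) if p]
    subwords = []
    for p in parts:
        subwords.extend(re.findall(r"[A-Za-z]+|[0-9]+", p))
    tokens = [t.lower() for t in subwords]
    if catenate_all:
        tokens.append("".join(subwords).lower())
    return tokens

# Both spellings of the part number produce the catenated token "bp21uaa",
# so a query for BP21UAA can match a document indexed as BP2.1UAA:
print(word_delimiter_tokens("BP2.1UAA"))  # ['bp', '2', '1', 'uaa', 'bp21uaa']
print(word_delimiter_tokens("BP21UAA"))   # ['bp', '21', 'uaa', 'bp21uaa']
```

Whether those query-side tokens are then searched as separate terms or forced
into a phrase (the MultiPhraseQuery seen in this thread) is controlled
separately, e.g. by autoGeneratePhraseQueries in 3.x-era schemas.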


Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Michael Kuhlmann

I don't know much about Tika, but this seems to be a bug in PDFBox.

See: https://issues.apache.org/jira/browse/PDFBOX-797

You might also have a look at this: 
http://stackoverflow.com/questions/7489206/error-while-parsing-binary-files-mostly-pdf


At least that's what I found when I googled the NPE.

Greetings,
Kuli

On 09.02.2012 17:13, Rong Kang wrote:

I tested one file that is missing from the Solr index, and solr responds as below

[...]


Exception in entity : 
tika-test:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable 
to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.ParserDecorator$1@190725e
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:109)
at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:943)
at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:108)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 10 more


I think this is because tika can't read the pdf file, or this pdf file's format 
has some error. But I can read this pdf file in Adobe Reader.
Regards,

Rong Kang
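
If you want to know exactly which PDFs are being dropped, the pattern below
may help: instead of relying on onError="skip" (which discards failures
silently), wrap extraction in an explicit per-file try/except and record the
failing paths. This is a hypothetical Python sketch; extract_text stands in
for whatever extractor you actually call (Tika, pdftotext, etc.) and is not a
real library function.

```python
import logging

def index_pdfs(paths, extract_text):
    """Sketch: per-file error handling so failing PDFs are reported rather
    than silently dropped. Returns (indexed, failed) bookkeeping lists."""
    indexed, failed = [], []
    for path in paths:
        try:
            text = extract_text(path)
            indexed.append((path, text))
        except Exception as exc:  # e.g. the PDFBox NPE shown in this thread
            failed.append((path, repr(exc)))
            logging.warning("could not extract %s: %s", path, exc)
    return indexed, failed

# Toy extractor that fails on one "broken" file, to show the bookkeeping:
def toy_extract(path):
    if "broken" in path:
        raise RuntimeError("NullPointerException in PDPageNode.getCount")
    return "text of " + path

ok, bad = index_pdfs(["a.pdf", "broken.pdf"], toy_extract)
print(len(ok), len(bad))  # 1 1
```

Comparing the failed list against the 18219 source files would show which of
the ~986 missing documents fail extraction versus never being attempted.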


Re:Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Rong Kang
I tested one file that is missing from the Solr index, and solr responds as below


...
0
1
0
2012-02-10 00:03:23

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.

..

I see tomcat's log file and find this



Exception in entity : 
tika-test:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable 
to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:617)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.ParserDecorator$1@190725e
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:109)
at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:943)
at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:108)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 10 more


I think this is because tika can't read the pdf file, or this pdf file's format 
has some error. But I can read this pdf file in Adobe Reader.
Regards,

Rong Kang
At 2012-02-09 23:49:28,"Michael Kuhlmann"  wrote:
>I'd suggest that you check which documents *exactly* are missing in Solr 
>index. Or find at least one that's missing, and try to figure out how 
>this document differs from the other ones that can be found in Solr.
>
>Maybe we can then find out what exact problem there is.
>
>Greetings,
>-Kuli
>
>On 09.02.2012 16:37, Rong Kang wrote:
>>
>> Yes, I put all files in one directory and I have tested the file names using 
>> code.
>>
>>
>>
>>
>> At 2012-02-09 20:45:49,"Jan Høydahl"  wrote:
>>> Hi,
>>>
>>> Are you 100% sure that the filename is globally unique, since you use it as 
>>> the uniqueKey?
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> Solr Training - www.solrtraining.com
>>>
>>> On 9. feb. 2012, at 08:30, 荣康 wrote:
>>>
 Hey ,
 I am using solr as my search engine to search my pdf files. I have 18219 
 files (with different file names) and all the files are in the same 
 directory. But when I use solr to import the files into the index using the 
 Dataimport method, solr reports only importing 17233 files. It's very strange. 
 This problem has stopped our project for a few days. I can't handle it.


 please help me!


 Schema.xml


 
>>> termVectors="true" termPositions="true" termOffsets="true"/>
>>> required="true" termVectors="true" termPositions="true" 
 termOffsets="true"/>

 
 id
 


 and
 
 
 
 >>> rootEntity="false"
 dataSource="null"  baseDir="H:/pdf/cls_1_16800_OCRed/1"
 fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" 
 onError="skip">


 >>> url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
 
 
 
 
 
 
 




 sincerecly
 Rong Kang



>>>
>


Re: Wildcard ? issue?

2012-02-09 Thread Erick Erickson
You can pull down 3.5 (aka 3.x) from the nightly build if you want, see:
https://builds.apache.org//view/S-Z/view/Solr/job/Solr-3.x/
the "last successful artifacts" link will probably be what you want.

Best
Erick

On Thu, Feb 9, 2012 at 5:35 AM, Dalius Sidlauskas
 wrote:
> Okay, I get it, 3.6 is not released yet. Thanks for the help, fellas!
>
> Regards!
> Dalius Sidlauskas
>
>
>
> On 09/02/12 10:19, Dalius Sidlauskas wrote:
>>
>> It seems it is applicable to Solr 3.6 and 4.0. My version is 3.5
>>
>> Regards!
>> Dalius Sidlauskas
>>
>>
>> On 08/02/12 17:26, Ahmet Arslan wrote:

 I have already tried this and it did
 not helped because it does not
 highlight matches if wild-card is used. The field
 configuration turns
 data to:
>>>
>>> This writeup should explain your scenario :
>>> http://wiki.apache.org/solr/MultitermQueryAnalysis


Re: Latest SolrCloud Issue

2012-02-09 Thread Jamie Johnson
done
https://issues.apache.org/jira/browse/SOLR-3117

On Thu, Feb 9, 2012 at 11:02 AM, Mark Miller  wrote:
>
> On Feb 9, 2012, at 10:14 AM, Jamie Johnson wrote:
>
>> So I think the change I made should still be done
>
> If you create a JIRA issue, I'd be happy to pop it in.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: How do i do group by in solr with multiple shards?

2012-02-09 Thread Erick Erickson
You have to provide many more details, particularly what you've
tried and what the results have been. For instance, from the
Wiki:
Grouping is also supported for distributed searches from version
Solr 3.5 and from version Solr 4.0. Currently group.truncate and
group.func are the only parameters that aren't supported for
distributed searches.

So are you using at least Solr 3.5?

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Thu, Feb 9, 2012 at 1:04 AM, Kashif Khan  wrote:
> Hi all,
>
> I have tried group by in solr with multiple shards but it does not work.
> Basically I want to do a simple GROUP BY statement, like in SQL, in solr with
> multiple shards. Please suggest how I can do this, as it is not currently
> supported OOB by solr.
>
> Thanks & regards,
> Kashif Khan
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-do-i-do-group-by-in-solr-with-multiple-shards-tp3728555p3728555.html
> Sent from the Solr - User mailing list archive at Nabble.com.
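
For pre-3.5 setups where distributed grouping is unavailable, one client-side
workaround is to query each shard separately and merge the per-group counts
yourself. The sketch below is a hypothetical illustration of that merge step;
the shard dictionaries are made-up example data, not Solr response objects.

```python
from collections import defaultdict

def merge_shard_groups(per_shard_counts):
    """Merge per-group document counts from several shards into a single
    GROUP BY-style result, sorted by descending count. Each input is a dict
    like {"books": 2, "video": 1} derived from one shard's response."""
    totals = defaultdict(int)
    for shard in per_shard_counts:
        for group, count in shard.items():
            totals[group] += count
    return sorted(totals.items(), key=lambda kv: -kv[1])

shard1 = {"books": 2, "video": 1}
shard2 = {"books": 3}
print(merge_shard_groups([shard1, shard2]))  # [('books', 5), ('video', 1)]
```

This works for additive statistics like counts and sums; non-additive results
(e.g. top document per group) need smarter merging, which is what the built-in
distributed grouping in 3.5+ provides.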


Re: Latest SolrCloud Issue

2012-02-09 Thread Mark Miller

On Feb 9, 2012, at 10:14 AM, Jamie Johnson wrote:

> So I think the change I made should still be done

If you create a JIRA issue, I'd be happy to pop it in.

- Mark Miller
lucidimagination.com













Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Michael Kuhlmann
I'd suggest that you check which documents *exactly* are missing in Solr 
index. Or find at least one that's missing, and try to figure out how 
this document differs from the other ones that can be found in Solr.


Maybe we can then find out what exact problem there is.

Greetings,
-Kuli

On 09.02.2012 16:37, Rong Kang wrote:


Yes, I put all files in one directory and I have tested the file names using code.




At 2012-02-09 20:45:49,"Jan Høydahl"  wrote:

Hi,

Are you 100% sure that the filename is globally unique, since you use it as the 
uniqueKey?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9. feb. 2012, at 08:30, 荣康 wrote:


Hey ,
I am using solr as my search engine to search my pdf files. I have 18219 
files (with different file names) and all the files are in the same directory. But 
when I use solr to import the files into the index using the Dataimport method, solr 
reports only importing 17233 files. It's very strange. This problem has stopped our 
project for a few days. I can't handle it.


please help me!


Schema.xml



   
   
   

id



and


















sincerecly
Rong Kang









Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread François Schiettecatte
Have you tried checking any logs?

Have you tried identifying a file which did not make it in and submitting just 
that one and seeing what happens?

François

On Feb 9, 2012, at 10:37 AM, Rong Kang wrote:

> 
> Yes, I put all files in one directory and I have tested the file names using 
> code.
> 
> 
> 
> 
> At 2012-02-09 20:45:49,"Jan Høydahl"  wrote:
>> Hi,
>> 
>> Are you 100% sure that the filename is globally unique, since you use it as 
>> the uniqueKey?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 9. feb. 2012, at 08:30, 荣康 wrote:
>> 
>>> Hey ,
>>> I am using solr as my search engine to search my pdf files. I have 18219 
>>> files (with different file names) and all the files are in the same directory. But 
>>> when I use solr to import the files into the index using the Dataimport method, 
>>> solr reports only importing 17233 files. It's very strange. This problem has 
>>> stopped our project for a few days. I can't handle it.
>>> 
>>> 
>>> please help me!
>>> 
>>> 
>>> Schema.xml
>>> 
>>> 
>>> 
>>>  >> termVectors="true" termPositions="true" termOffsets="true"/>
>>>  >> termVectors="true" termPositions="true" termOffsets="true"/>
>>>   
>>> 
>>> id 
>>> 
>>> 
>>> 
>>> and 
>>>  
>>>
>>>  
>>> >> rootEntity="false" 
>>> dataSource="null"  baseDir="H:/pdf/cls_1_16800_OCRed/1" 
>>> fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" 
>>> onError="skip"> 
>>> 
>>> 
>>> >> url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
>>> 
>>>  
>>> 
>>>  
>>>  
>>>
>>>  
>>> 
>>> 
>>> 
>>> 
>>> sincerecly
>>> Rong Kang
>>> 
>>> 
>>> 
>> 



RE: Index Start Question

2012-02-09 Thread Hoffman, Chase
Erick,

My understanding of the process is this:

1. The DBA opens a command line prompt and initiates an index build/rebuild
2. SOLR performs said index build/rebuild
3. Index finishes

I don't think we're appending documents to the SOLR index - it's indexing MSSQL 
tables.  The servers these are running on aren't beefy enough to run multiple 
SOLR index builds at the same time.  So the hope is to find some key in the 
logs that shows the start of the index rebuild so that I can put in some 
automation to blast out an email saying "Server X is currently running an 
index, do not kick off an index run on Server X".

Thanks so much for your help.

Best,

--Chase

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, February 09, 2012 9:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Index Start Question

OK, what do you mean by "index is kicked off"? You mean starting Solr or 
actually adding a document to a running Solr?

If the latter, you're probably looking for something like this:
Feb 9, 2012 10:34:26 AM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[eoe32]} 0 6

The important bits are solr.update.processor and the add=blahblah bit where the 
stuff after the = will be a list of <uniqueKey>s for the document(s) added.

However, this will be somewhat fragile, the format of the logged messages is 
not guaranteed in future versions.

Although this is happening, I think, after the doc has been added to the index, 
so it may be too late for your problem.

Best
Erick

On Wed, Feb 8, 2012 at 3:13 PM, Hoffman, Chase  wrote:
> Please forgive me if this is a dumb question.  I've never dealt with SOLR 
> before, and I'm being asked to determine from the logs when a SOLR index is 
> kicked off (it is a Windows server).  The TOMCAT service runs continually, so 
> no love there.  In parsing the logs, I think 
> "org.apache.solr.core.SolrResourceLoader " is the indicator, since 
> "org.apache.solr.core.SolrCore execute" seems to occur even when I know an 
> index has not been started.
>
> Any advice you could give me would be wonderful.
>
> Best,
>
> --Chase
>
> Chase Hoffman
> Infrastructure Systems Administrator, Performance Technologies The 
> Advisory Board Company
> 512-681-2190 direct | 512-609-1150 fax 
> hoffm...@advisory.com | 
> www.advisory.com
>



Re: Index Start Question

2012-02-09 Thread Erick Erickson
OK, what do you mean by "index is kicked off"? You mean starting Solr
or actually adding a document to a running Solr?

If the latter, you're probably looking for something like this:
Feb 9, 2012 10:34:26 AM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[eoe32]} 0 6

The important bits are solr.update.processor and the add=blahblah bit
where the stuff after the =
will be a list of <uniqueKey>s for the document(s) added.

However, this will be somewhat fragile, the format of the logged messages is not
guaranteed in future versions.

Although this is happening, I think, after the doc has been added to
the index, so it
may be too late for your problem.

Best
Erick

On Wed, Feb 8, 2012 at 3:13 PM, Hoffman, Chase  wrote:
> Please forgive me if this is a dumb question.  I've never dealt with SOLR 
> before, and I'm being asked to determine from the logs when a SOLR index is 
> kicked off (it is a Windows server).  The TOMCAT service runs continually, so 
> no love there.  In parsing the logs, I think 
> "org.apache.solr.core.SolrResourceLoader " is the indicator, since 
> "org.apache.solr.core.SolrCore execute" seems to occur even when I know an 
> index has not been started.
>
> Any advice you could give me would be wonderful.
>
> Best,
>
> --Chase
>
> Chase Hoffman
> Infrastructure Systems Administrator, Performance Technologies
> The Advisory Board Company
> 512-681-2190 direct | 512-609-1150 fax
> hoffm...@advisory.com | 
> www.advisory.com
>


Re:Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Rong Kang

Yes, I put all files in one directory and I have tested the file names using code.




At 2012-02-09 20:45:49,"Jan Høydahl"  wrote:
>Hi,
>
>Are you 100% sure that the filename is globally unique, since you use it as 
>the uniqueKey?
>
>--
>Jan Høydahl, search solution architect
>Cominvent AS - www.cominvent.com
>Solr Training - www.solrtraining.com
>
>On 9. feb. 2012, at 08:30, 荣康 wrote:
>
>> Hey ,
>> I am using solr as my search engine to search my pdf files. I have 18219 
>> files (with different file names) and all the files are in the same directory. But 
>> when I use solr to import the files into the index using the Dataimport method, solr 
>> reports only importing 17233 files. It's very strange. This problem has stopped 
>> our project for a few days. I can't handle it.
>> 
>> 
>> please help me!
>> 
>> 
>> Schema.xml
>> 
>> 
>> 
>>   > termVectors="true" termPositions="true" termOffsets="true"/>
>>   > termVectors="true" termPositions="true" termOffsets="true"/>
>>
>> 
>> id 
>> 
>> 
>> 
>> and 
>>  
>> 
>>  
>> > rootEntity="false" 
>> dataSource="null"  baseDir="H:/pdf/cls_1_16800_OCRed/1" 
>> fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" 
>> onError="skip"> 
>> 
>> 
>> > url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
>>  
>>  
>> 
>>  
>>  
>> 
>>  
>> 
>> 
>> 
>> 
>> sincerecly
>> Rong Kang
>> 
>> 
>> 
>


Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread Erick Erickson
OK, first question is why are you searching on two different values?
Is that intentional? If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right?

But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?

Because this works fine for me, using your schema definition and
without using quotes. I get, however, this as the parsed query:
eoe:b eoe:12 eoe:0123 eoe:120123 eoe:b120123
not a phrase in sight.

If I *do* put quotes around the version without the period, I get
no results returned and a MultiPhraseQuery.

Best
Erick



On Wed, Feb 8, 2012 at 11:54 AM, geeky2  wrote:
> hello,
>
> thanks for sticking with me on this ...very frustrating
>
> ok - i did perform the query with the debug parms using two scenarios:
>
> 1) a successful search (where i insert the period / dot) in to the itemNo
> field and the search returns a document.
>
> itemNo:BP2.1UAA
>
> http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on
>
> results from debug
>
> 
> 
>
> 
>  0
>  1
>  
>    on
>    10
>
>    2.2
>    on
>    0
>    itemNo:BP2.1UAA
>  
> 
> 
>  
>
>    PHILIPS
>    0333500
>    0333500,1549  ,BP2.1UAA                           
>    PLASMA TELEVISION
>    BP2.1UAA                           
>    2
>
>    BP2.1UAA                           
>    Plasma Television^
>    0
>    1549  
>  
> 
> 
>  itemNo:BP2.1UAA
>
>  itemNo:BP2.1UAA
>  MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa
> bp21uaa)")
>  itemNo:"bp 2 (1 21) (uaa bp21uaa)"
>  
>    
> 22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993),
> product of:
>  0.9994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
>    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
>    0.02218287 = queryNorm
>  22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in
> 134993), product of:
>    1.0 = tf(phraseFreq=1.0)
>    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
>    0.5 = fieldNorm(field=itemNo, doc=134993)
> 
>  
>
>  LuceneQParser
>  
>    1.0
>    
>      0.0
>      
>        0.0
>
>      
>      
>        0.0
>      
>      
>        0.0
>      
>      
>
>        0.0
>      
>      
>        0.0
>      
>      
>        0.0
>
>      
>    
>    
>      1.0
>      
>        1.0
>      
>      
>
>        0.0
>      
>      
>        0.0
>      
>      
>        0.0
>
>      
>      
>        0.0
>      
>      
>        0.0
>      
>    
>
>  
> 
> 
>
>
>
>
>
>
>
> 2) a NON-successful search (where i do NOT insert a period / dot) in to the
> itemNo field and the search does NOT return a document
>
>  itemNo:BP21UAA
>
> http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on
>
> 
> 
>
> 
>  0
>  1
>  
>    on
>    10
>
>    2.2
>    on
>    0
>    itemNo:BP21UAA
>  
> 
> 
> 
>
>  itemNo:BP21UAA
>  itemNo:BP21UAA
>  MultiPhraseQuery(itemNo:"bp 21 (uaa
> bp21uaa)")
>  itemNo:"bp 21 (uaa bp21uaa)"
>  
>  LuceneQParser
>
>  
>    1.0
>    
>      1.0
>      
>        1.0
>      
>
>      
>        0.0
>      
>      
>        0.0
>      
>      
>        0.0
>
>      
>      
>        0.0
>      
>      
>        0.0
>      
>    
>
>    
>      0.0
>      
>        0.0
>      
>      
>        0.0
>
>      
>      
>        0.0
>      
>      
>        0.0
>      
>      
>
>        0.0
>      
>      
>        0.0
>      
>    
>  
> 
>
> 
>
> the parsedquery part of the debug output looks like it DOES contain the term
> that i am entering for my search criteria on the itemNo field ??
>
> does this make sense?
>
> thank you,
> mark
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3726614.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Zookeeper view not displaying on latest trunk

2012-02-09 Thread Jamie Johnson
I'm looking at the latest code on trunk and it seems as if the
zookeeper view does not work.  When trying to access the information I
get the following in the log


2012-02-09 10:28:49.030:WARN::/solr/zookeeper.jsp
java.lang.NullPointerException
at 
org.apache.jsp.zookeeper_jsp$ZKPrinter.(org.apache.jsp.zookeeper_jsp:55)
at 
org.apache.jsp.zookeeper_jsp._jspService(org.apache.jsp.zookeeper_jsp:533)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
at 
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:280)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Re: Latest SolrCloud Issue

2012-02-09 Thread Jamie Johnson
So I think the change I made should still be done, but the issue was
on my end: I was missing the quotes ('') surrounding the URL. After changing that
things are moving again.
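For anyone else hitting "Core needs a name": as noted above, the problem is the missing quotes, not Solr. An unquoted '&' is a shell control operator, so everything after it never reaches curl. A minimal sketch using the URL from the quoted message:

```shell
# Unquoted, the shell treats each '&' as a command separator, so curl only
# sends the part up to the first '&' (action=CREATE) and the 'name'
# parameter never reaches Solr -- hence "Core needs a name".
# Quoting the whole URL keeps the query string intact:
url='http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1'
echo "$url"   # pass this quoted string to curl
```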

On Thu, Feb 9, 2012 at 9:56 AM, Jamie Johnson  wrote:
> This morning I pulled the latest code from trunk and was trying to
> walk through the adding of cores here
>
> http://outerthought.org/blog/491-ot.html
>
> curl 
> http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1
>
> when attempting to do this I'm seeing the following issue
>
> SEVERE: org.apache.solr.common.SolrException: Error executing default
> implementation of CREATE
>        at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:380)
>        at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:135)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:292)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:166)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:326)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>        at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>        at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.lang.RuntimeException: Core needs a name
>        at org.apache.solr.core.CoreDescriptor.(CoreDescriptor.java:47)
>        at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:330)
>        ... 21 more
>
> I modified CoreDescriptor.java as follows
>
>    if (name == null) {
>        throw new RuntimeException("Core needs a name");
>      }
>
>    if(coreContainer != null && coreContainer.getZkController() != null) {
>      this.cloudDesc = new CloudDescriptor();
>      // cloud collection defaults to core name
>      cloudDesc.setCollectionName(name.isEmpty() ?
> coreContainer.getDefaultCoreName() : name);
>    }
>
> otherwise a null pointer exception was getting thrown from line 49.
>
> From what I can tell I've specified the parameter being looked for
> (name), but it's not getting picked up, any thoughts on this?


Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Walter Underwood
Or you can do a search for two ads with random ordering, then a second search 
for ads in the desired order with excludes for the two ads returned in the 
first. 

You don't have to do everything inside Solr.
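A rough sketch of that two-request approach as query strings (the type field, the ids, and the random_* dynamic field are invented for illustration; Solr's RandomSortField is one way to get the random ordering):

```shell
# Request 1: two premium ads in random order. A dynamic RandomSortField
# (e.g. random_<seed>) gives a reproducible random order per seed, so vary
# the seed per page load for a fresh shuffle.
seed=1234
q1="q=type:premium&sort=random_${seed}+asc&rows=2"

# Request 2: the regular ads in the normal ranking, excluding the two ids
# that request 1 returned (ad17/ad42 are made up here).
id1=ad17; id2=ad42
q2="q=*:*&fq=-id:($id1 OR $id2)&rows=10"

echo "$q1"
echo "$q2"
```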

wunder
Search Guy, Chegg

On Feb 9, 2012, at 1:04 AM, Tommaso Teofili wrote:

> I think you may use/customize the query elevation component to achieve that.
> http://wiki.apache.org/solr/QueryElevationComponent
> Tommaso
> 
> 2012/2/9 mtheone 
> 
>> Say I have a classified ads site, I want to display 2 random items (premium
>> ads) in the beginning of the search result and the rest are regular ads,
>> how
>> do I do it?
>> 
>> Thanks
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-to-do-this-in-Solr-random-result-for-the-first-few-results-tp3728729p3728729.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 







Problem Multi word synonyms in solr 3.4

2012-02-09 Thread Pravin Agrawal
Hi All,



I am trying to use synonyms in solr 3.4 and facing below issue with multiword 
synonyms.



I am using edismax query parser with following fields in qf and pf



qf: name^1.2,name_synonym^0.5

pf: phrase_name^3



The analyzers that I am using for name_synonym is as follows

[analyzer configuration stripped by the mailing list archive]

With above configuration the below type of synonyms works fine

foobar => foo bar

FnB => foo and bar

aaa,bbb,ccc





However, for the following multiword synonym, the dismax query is incorrectly 
formed for the qf field

xxx zzz, aaa bbb, mmm nnn, aaabbb





The parsedquery_tostring that gets formed for the query aaabbb is as follows



+(name:aaabbb^1.2 | name_synonym:" xxx zzz aaa bbb mmm (nnn aaabbb)"^0.5)~0.5 
(phrase_name:" xxx zzz aaa bbb mmm (nnn aaabbb)"~5^3.0)~0.5



I am expecting a query like



+(name:aaabbb^1.2 | ((name_synonym:xxx zzz name_synonym:aaa bbb 
name_synonym:mmm nnn name_synonym:aaabbb)^0.5))~0.5



Similarly, for the query xxx zzz I am getting the following parsedquery_tostring 
from dismax



+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | 
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0)~0.5



But I'm expecting the following query



+((name:xxx^1.2 | name_synonym:xxx^0.5 | name:zzz^1.2 | 
name_synonym:zzz^0.5)~0.5) (phrase_name:"xxx zzz"~5^3.0 | phrase_name:"aaa 
bbb"~5^3.0 | phrase_name:"mmm nnn"~5^3.0 | phrase_name:"aaabbb"~5^3.0)~0.5





However it's not the case.

Please let me know if I am missing something or if it's the expected behavior. Also 
please let me know what should be done to get my desired output.



Thanks in advance.

Pravin

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


Re: Which Tokeniser (and/or filter)

2012-02-09 Thread Erick Erickson
Give it a try. I was surprised the first time I tried ngramming,
the actual increase in my index size was much less than I feared.

Best
Erick

On Wed, Feb 8, 2012 at 11:41 AM, Robert Brown  wrote:
> Attempting to re-produce legacy behaviour (i know!) of simple SQL
> substring searching, with and without phrases.
>
> I feel simply NGram'ing 4m CV's may be pushing it?
>
>
> ---
>
> IntelCompute
> Web Design & Local Online Marketing
>
> http://www.intelcompute.com
>
>
> On Wed, 8 Feb 2012 11:27:24 -0500, Erick Erickson
>  wrote:
>> You'll probably have to index them in separate fields to
>> get what you want. The question is always whether it's
>> worth it, is the use-case really well served by having a
>> variant that keeps dots and things? But that's always more
>> a question for your product manager
>>
>> Best
>> Erick
>>
>> On Wed, Feb 8, 2012 at 9:23 AM, Robert Brown  wrote:
>>> Thanks Erick,
>>>
>>> I didn't get confused with multiple tokens vs multiValued  :)
>>>
>>> Before I go ahead and re-index 4m docs, and believe me I'm using the
>>> analysis page like a mad-man!
>>>
>>> What do I need to configure to have the following both indexed with and
>>> without the dots...
>>>
>>> .net
>>> sales manager.
>>> £12.50
>>>
>>> Currently...
>>>
>>> <filter class="solr.WordDelimiterFilterFactory"
>>>        generateWordParts="1"
>>>        generateNumberParts="1"
>>>        catenateWords="1"
>>>        catenateNumbers="1"
>>>        catenateAll="1"
>>>        splitOnCaseChange="1"
>>>        splitOnNumerics="1"
>>>        types="wdftypes.txt"
>>> />
>>>
>>> with nothing specific in wdftypes.txt for full-stops.
>>>
>>> Should there also be any difference when quoting my searches?
>>>
>>> The analysis page seems to just drop the quotes, but surely actual
>>> calls don't do this?
>>>
>>>
>>>
>>> ---
>>>
>>> IntelCompute
>>> Web Design & Local Online Marketing
>>>
>>> http://www.intelcompute.com
>>>
>>>
>>> On Wed, 8 Feb 2012 07:38:42 -0500, Erick Erickson
>>>  wrote:
 Yes, WDDF creates multiple tokens. But that has
 nothing to do with the multiValued suggestion.

 You can get exactly what you want by
 1> setting multiValued="true" in your schema file and re-indexing. Say
 positionIncrementGap is set to 100
 2> When you index, add the field for each sentence, so your doc
       looks something like:
      
         i am a sales-manager in here
        using asp.net and .net daily
          .
       
 3> search like "sales manager"~100

 Best
 Erick

 On Wed, Feb 8, 2012 at 3:05 AM, Rob Brown  wrote:
> Apologies if things were a little vague.
>
> Given the example snippet to index (numbered to show searches needed to
> match)...
>
> 1: i am a sales-manager in here
> 2: using asp.net and .net daily
> 3: working in design.
> 4: using something called sage 200. and i'm fluent
> 5: german sausages.
> 6: busy A&E dept earning £10,000 annually
>
>
> ... all with newlines in place.
>
> able to match...
>
> 1. sales
> 1. "sales manager"
> 1. sales-manager
> 1. "sales-manager"
> 2. .net
> 2. asp.net
> 3. design
> 4. sage 200
> 6. A&E
> 6. £10,000
>
> But do NOT match "fluent german" from 4 + 5 since there's a newline
> between them when indexed, but not when searched.
>
>
> Do the filters (wdf in this case) not create multiple tokens, so if
> splitting on period in "asp.net" would create tokens for all of "asp",
> "asp.", "asp.net", ".net", "net".
>
>
> Cheers,
> Rob
>
> --
>
> IntelCompute
> Web Design and Online Marketing
>
> http://www.intelcompute.com
>
>
> -Original Message-
> From: Chris Hostetter 
> Reply-to: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: Which Tokeniser (and/or filter)
> Date: Tue, 7 Feb 2012 15:02:36 -0800 (PST)
>
> : This all seems a bit too much work for such a real-world scenario?
>
> You haven't really told us what your scenerio is.
>
> You said you want to split tokens on whitespace, full-stop (aka:
> period) and comma only, but then in response to some suggestions you added
> comments other things that you never mentioned previously...
>
> 1) evidently you don't want the "." in foo.net to cause a split in tokens?
> 2) evidently you not only want token splits on newlines, but also
> positition gaps to prevent phrases matching across newlines.
>
> ...these are kind of important details that affect suggestions people
> might give you.
>
> can you please provide some concrete examples of hte types of data you
> have, the types of queries you want them to match, and the types of
> queries you *don't* want to match?
>
>
> -Hoss
>
>>>
>


Geospatial search with multivalued field

2012-02-09 Thread Marian Steinbach
Hi!

I'm trying to figure out how to enable spatial search for my use case.

I have documents that are in many cases associated with multiple geo
locations. I'd like to filter documents by the minimum distance to a
reference point (which is given at query time).

What this means is: If at least one of the locations of a document
lies within a certain radius of the point, it should be included in
the result.

Which field type can I use for this and how would I have to do the filtering?

Sorting (by distance) isn't relevant at this point, but it might be in
the future.

The example in Solr 3.4 states in schema.xml for the fieldType
"location" (field "store"): "A specialized field for geospatial
search. If indexed, this fieldType must not be multivalued." If I used
a field of type solr.LatLonType this would mean that I could have
multivalued="true", but no indexing? This means that I couldn't do
fast bounding box / range queries on the locations in order to narrow
down the result for a distance filter, correct? So wich one is better?

Thanks!

Marian


Re: multiple cores in a single instance vs multiple instances with single core

2012-02-09 Thread Mark Miller

On Feb 8, 2012, at 10:14 PM, Jamie Johnson wrote:

> Thanks Mark, in regards to failover I completely agree, I am wondering more
> about performance and memory usage if the indexes are large and wondering
> if the separate Java instances under heavy load would more or less
> performant.  Currently we deploy a single core per instance but deploy
> multiple instances per machine

I've heard reports that you can eke out more performance in certain situations 
(imagine cases where a large data structure is built with a single thread) by 
using more java instances or more cores rather than one huge index. I don't 
know that it's generally worth the extra headache, and there are fewer and 
fewer hotspots around this sort of thing all the time, but you can likely push 
things a little harder with more jvms (or probably just cores). I think 
generally you don't need to do it, but if you are managing lots of shards 
anyway it may be worth trying your own tests.

- Mark Miller
lucidimagination.com













Re: solr cloud concepts

2012-02-09 Thread Mark Miller

On Feb 8, 2012, at 11:27 PM, Adeel Qureshi wrote:

> to
> create new collections its not that automated right ..

It can be fairly automated...if you have uploaded the configuration sets for 
both collections, you can basically then create new collections that use one of 
those configuration sets using CoreAdminHandler commands. You would just create 
as many SolrCores as the number of instances (or shards) that you wanted, 
specifying which collection they belong to when you do. Essentially a new 
collection is created the first time a SolrCore that has been set to a new 
collection name starts.

So if you wanted a new collection with new configuration you would do something 
like:

* upload new configuration files and call them config2

* Using the CoreAdminHandler, create a new core on Solr instance 1 with 
collection name 'collection2' and use the conf set 'config2' and shard it into 
2 so that the index will span 2 Solr instances. This will get auto assigned 
shard1.

*  Using the CoreAdminHandler, create a new core on Solr instance 2 with 
collection name 'collection2' and use the conf set 'config2'. This will get 
auto assigned shard2.

*  Using the CoreAdminHandler,create a new core on Solr instance 3 with 
collection name 'collection2' and use the conf set 'config2'. This will 
replicate 1 or 2 for query load and data redundancy.

*  Using the CoreAdminHandler,create a new core on Solr instance 4 with 
collection name 'collection2' and use the conf set 'config2'. This will host 
shard2.  This will replicate 1 or 2 for query load and data redundancy.

That would give you 4 instances with half your index for the collection on 2 
instances, the other half on 2 other instances. Each half will have a 
duplicate instance, so you have 2 copies of the index in the cluster.
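Those four steps could be sketched as CoreAdminHandler calls along these lines (host1..host4 and port 8983 are hypothetical; remember to quote the URLs so the shell doesn't swallow the '&'s):

```shell
# Sketch only: hostnames and port are invented. Instances 1 and 2 create the
# two shards of 'collection2' with the uploaded 'config2' set; instances 3
# and 4 join the same slices and therefore become replicas.
cmds=""
for spec in "host1 shard1" "host2 shard2" "host3 shard1" "host4 shard2"; do
  set -- $spec   # $1 = host, $2 = shard
  cmds="$cmds
curl 'http://$1:8983/solr/admin/cores?action=CREATE&name=collection2_$2&collection=collection2&shard=$2&collection.configName=config2'"
done
printf '%s\n' "$cmds"
```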

- Mark Miller
lucidimagination.com













spell checking and filtering in the same query

2012-02-09 Thread Mark Swinson
Background:

I have a Solr index containing foodtypes, chefs, and courses. This is
an initial setup to test my configuration.


Here is the problem I'm trying to solve :

-When I query for a misspelt foodtype 'x' and filter by chef 'c' I should
get a suggested list of foodtypes prepared by chef 'c'


ok:

I've managed to set up a spellcheck component so I can make the
following query:

/suggest?q=ban&spellcheck.dictionary=foodtypes

This gets me the results
'banana bread'
'banoffee pie'

How can I modify this query and the solr configuration to allow me to
filter by another field?

I'm aware that the fq parameter does not work with the SpellCheck
component.
Is there any way of passing the results of the first query to a filter
query? I've seen various posts
on this topic, but no solutions. The best suggestion was to make the
client make a second request,
which is something I do not want to do.

Is it possible to write a SearchComponent or SearchHandler that chains
results?


Thanks for any help.


Mark








http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Jan Høydahl
Hi,

Are you 100% sure that the filename is globally unique, since you use it as the 
uniqueKey?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9. feb. 2012, at 08:30, 荣康 wrote:

> Hey ,
> I am using solr as my search engine to search my pdf files. I have 18219 
> files (different file names) and all the files are in the same directory. But 
> when I use Solr to import the files into the index using the DataImport method, Solr 
> reports only 17233 files imported. It's very strange. This problem has stopped 
> our project for a few days. I can't handle it.
> 
> 
> please help me!
> 
> 
> Schema.xml
> 
> 
> 
>termVectors="true" termPositions="true" termOffsets="true"/>
>termVectors="true" termPositions="true" termOffsets="true"/>
>
> 
> id 
> 
> 
> 
> and 
>  
> 
>  
>  rootEntity="false" 
> dataSource="null"  baseDir="H:/pdf/cls_1_16800_OCRed/1" 
> fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" 
> onError="skip"> 
> 
> 
>  url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
>  
>  
> 
>  
>  
> 
>  
> 
> 
> 
> 
> sincerely
> Rong Kang
> 
> 
> 



Re:Re: solr search speed is so slow.

2012-02-09 Thread Rong Kang
Thanks, I am trying your method. And do you think 460ms is a long time for my 
computer (2 cores, 2 GB memory)?


Regards,
Rong Kang


At 2012-02-09 18:37:15,"Rafał Kuć"  wrote:
>Hello!
>
>I don't know what Your queries will look like, but let's assume that
>you will use 3 fields for searching like title, description and
>author, you will also filter on category field and sort on title_sort
>field; then you could use a warming query like the following:
>
>
> 
><listener event="newSearcher" class="solr.QuerySenderListener">
>  <arr name="queries">
>    <lst>
>      <str name="q">common query</str>
>      <str name="qf">title description author</str>
>      <str name="fq">category:books</str>
>      <str name="sort">title_sort asc</str>
>      <str name="defType">dismax</str>
>    </lst>
>  </arr>
></listener>
> 
>
>
>Please consider this only an example. You need to look at Your
>application and choose common queries and common query parameters like
>sorting, filtering and such.
>
>-- 
>Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>
>> Hi!
>
>
>> Thanks for your reply. I read some material at
>> http://wiki.apache.org/solr/SolrConfigXml#newSearcher , but it doesn't give 
>> me some example.
>> Could you tell me how to modify it or give some links about incorporating 
>> warming queries ?
>>  
>
>> Regards ,
>> Rong Kang
>
>
>> At 2012-02-09 18:01:38,"Rafał Kuć"  wrote:
>>>Hello!
>>>
>>>You may want to incorporate warming queries to your solrconfig.xml
>>>file. Edit the solrconfig.xml file and look for the default warming
>>>queries, they are in the following places:
>>>
>>><listener event="newSearcher" class="solr.QuerySenderListener">
>>>and
>>><listener event="firstSearcher" class="solr.QuerySenderListener">
>>>
>>>Modify the default ones to match your needs - for example include
>>>sorting there, fields you use for searching, filtering and etc.
>>>
>>>-- 
>>>Regards,
>>> Rafał Kuć
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>
 My solr 's index data size is 1.1GB,and solr only has one core. 
>>>
>>>
 When I first search one word in solr . its response time is 460ms.
 When I search the same word the second time. its response time is
 under 70ms. I can't tolerate 460ms . Does anyone know  how to improve 
 performance?
>>>
>>>
 My Computer:
 1 cpu(with  2 core)
 2G memory (800MHz)
>>>
>>>
 and my search configuration
  dismax
filename^5.0 text^1.5
>>>
>>>
   *:*
   on
   filename text
  true
  
 
 100
   filename
   3 
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
>
>


Re: solr search speed is so slow.

2012-02-09 Thread Rafał Kuć
Hello!

I don't know what Your queries will look like, but let's assume that
you will use 3 fields for searching like title, description and
author, you will also filter on category field and sort on title_sort
field; then you could use a warming query like the following:


 
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">common query</str>
      <str name="qf">title description author</str>
      <str name="fq">category:books</str>
      <str name="sort">title_sort asc</str>
      <str name="defType">dismax</str>
    </lst>
  </arr>
</listener>
 


Please consider this only an example. You need to look at Your
application and choose common queries and common query parameters like
sorting, filtering and such.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> Hi!


> Thanks for your reply. I read some material at
> http://wiki.apache.org/solr/SolrConfigXml#newSearcher , but it doesn't give 
> me some example.
> Could you tell me how to modify it or give some links about incorporating 
> warming queries ?
>  

> Regards ,
> Rong Kang


> At 2012-02-09 18:01:38,"Rafał Kuć"  wrote:
>>Hello!
>>
>>You may want to incorporate warming queries to your solrconfig.xml
>>file. Edit the solrconfig.xml file and look for the default warming
>>queries, they are in the following places:
>>
>><listener event="newSearcher" class="solr.QuerySenderListener">
>>and
>><listener event="firstSearcher" class="solr.QuerySenderListener">
>>
>>Modify the default ones to match your needs - for example include
>>sorting there, fields you use for searching, filtering and etc.
>>
>>-- 
>>Regards,
>> Rafał Kuć
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>
>>> My solr 's index data size is 1.1GB,and solr only has one core. 
>>
>>
>>> When I first search one word in solr . its response time is 460ms.
>>> When I search the same word the second time. its response time is
>>> under 70ms. I can't tolerate 460ms . Does anyone know  how to improve 
>>> performance?
>>
>>
>>> My Computer:
>>> 1 cpu(with  2 core)
>>> 2G memory (800MHz)
>>
>>
>>> and my search configuration
>>>  dismax
>>>filename^5.0 text^1.5
>>
>>
>>>   *:*
>>>   on
>>>   filename text
>>>  true
>>>  
>>> 
>>> 100
>>>   filename
>>>   3 
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>






Re: Wildcard ? issue?

2012-02-09 Thread Dalius Sidlauskas

Okay, I get it, 3.6 is not released yet. Thanks for the help, fellas!

Regards!
Dalius Sidlauskas


On 09/02/12 10:19, Dalius Sidlauskas wrote:

It seems it is applicable to Solr 3.6 and 4.0. Mine is version 3.5

Regards!
Dalius Sidlauskas


On 08/02/12 17:26, Ahmet Arslan wrote:

I have already tried this and it did
not help because it does not
highlight matches if wild-card is used. The field
configuration turns
data to:

This writeup should explain your scenario :
http://wiki.apache.org/solr/MultitermQueryAnalysis


Re: solr cloud concepts

2012-02-09 Thread Bruno Dumon
On Thu, Feb 9, 2012 at 5:27 AM, Adeel Qureshi wrote:

> Thanks for the explanation. It makes sense but I am hoping that you can
> clarify things a bit more ..
>
> so now it sounds like in solrcloud the concept of cores have changed a bit
> .. as you explained that for me to have 2 cores with different schemas I
> will need 2 different collections .. and one good thing about solrcores was
> that you could create new ones with coreadmin api or the http calls .. to
> create new collections its not that automated right ..
>
> secondly if collections represent what kind of used to be solrcore then
> once i have a collection .. why would i ever want to add multiple cores to
> it .. i mean i am trying to think of a reason why it would make sense to do
> that.
>

Hi Adeel,

A core is still what it was before: it provides indexing & search for one
physical index. The concepts of collections and slices layer on top of it.
A core corresponds onto-to-one with a shard.

So you have:
collection -> slice -> shard = core

Each slice contains a subset of the data of the collection. All the shards
within one slice are replicas, and thus contain the same data. All the actual
data/indexes are in the cores. Collections, slices and shards are logical
concepts that only exist in ZooKeeper. Thus a collection in itself isn't a
physical index; it is only the cores below it that contain actual data.

All the cores within one collection will use the same schema (the schema
associated with the collection), since they are part of the same logical
index.

You can still use the coreadmin API to create cores (that's what I've done
in my blog), but in SolrCloud a core must always be associated with a
[slice in a] collection. Thus when you create a core it either becomes part
of an existing collection, or a new collection is created.

HTH,

Bruno.

-- 
Bruno Dumon
Outerthought
http://outerthought.org/


Re:Re: solr search speed is so slow.

2012-02-09 Thread 荣康

Hi!


Thanks for your reply. I read some material at 
http://wiki.apache.org/solr/SolrConfigXml#newSearcher , but it doesn't give me 
any examples.
Could you tell me how to modify it, or give me some links about incorporating 
warming queries? 
 

Regards ,
Rong Kang


At 2012-02-09 18:01:38,"Rafał Kuć"  wrote:
>Hello!
>
>You may want to incorporate warming queries to your solrconfig.xml
>file. Edit the solrconfig.xml file and look for the default warming
>queries, they are in the following places:
>
><listener event="newSearcher" class="solr.QuerySenderListener">
>and
><listener event="firstSearcher" class="solr.QuerySenderListener">
>
>Modify the default ones to match your needs - for example include
>sorting there, fields you use for searching, filtering and etc.
>
>-- 
>Regards,
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>
>> My solr 's index data size is 1.1GB,and solr only has one core. 
>
>
>> When I first search one word in solr . its response time is 460ms.
>> When I search the same word the second time. its response time is
>> under 70ms. I can't tolerate 460ms . Does anyone know  how to improve 
>> performance?
>
>
>> My Computer:
>> 1 cpu(with  2 core)
>> 2G memory (800MHz)
>
>
>> and my search configuration
>>  dismax
>>filename^5.0 text^1.5
>
>
>>   *:*
>>   on
>>   filename text
>>  true
>>  
>> 
>> 100
>>   filename
>>   3 


Re: Improving performance for SOLR geo queries?

2012-02-09 Thread Matthias Käppler
Hi Ryan,

>>I'm trying to understand how you have your data indexed so we can give
>>reasonable direction.
>>
>>What field type are you using for your locations?  Is it using the
>>solr spatial field types?  What do you see when you look at the debug
>>information from &debugQuery=true?

we query against a LatLonType using plain latitudes and longitudes and
the bbox function. We send the bbox filter in a filter query that is
uncached (we had to do this to get the eviction rate down in the
filter cache; we had problems with that). Our filter cache is set up
as follows:

Concurrent LRU Cache(maxSize=32768, initialSize=8192, minSize=29491,
acceptableSize=31129, cleanupThread=false, autowarmCount=8192,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@2fd1fc5c)

We've just restarted the slaves 30 minutes ago, so these values are
not really giving away much, but we see a hit rate of up to 97% on the
filter caches:

lookups : 13003
hits : 12440
hitratio : 0.95
inserts : 563
evictions : 0
size : 8927
warmupTime : 116891
cumulative_lookups : 9990103
cumulative_hits : 9583913
cumulative_hitratio : 0.95
cumulative_inserts : 406191
cumulative_evictions : 0

The warmup time looks a bit worrying, is that a high value by your experience?

As for debugQuery, here's the relevant snippet for the kind of geo
queries we send:


{!bbox cache=false d=50 sfield=location_ll pt=54.1434,-0.452322}



WrappedQuery({!cache=false
cost=0}+location_ll_0_coordinate:[53.69373983225355 TO
54.59306016774645] +location_ll_1_coordinate:[-1.2199462259963294 TO
0.31530222599632934])
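As an aside, the coordinate ranges in that rewritten query can be reproduced from the bbox parameters (d=50 km around pt=54.1434,-0.452322). A small sketch, assuming a spherical earth with a mean radius of ~6371.009 km:

```python
import math

# Reproduce (approximately) how {!bbox d=50 pt=54.1434,-0.452322}
# expands into the lat/lon range query shown in the debug output.
EARTH_RADIUS_KM = 6371.0087714  # assumed mean earth radius
lat, lon, d = 54.1434, -0.452322, 50.0

delta_lat = math.degrees(d / EARTH_RADIUS_KM)        # ~0.45 degrees
delta_lon = delta_lat / math.cos(math.radians(lat))  # wider at high latitudes

lat_range = (lat - delta_lat, lat + delta_lat)  # close to [53.6937 TO 54.5931]
lon_range = (lon - delta_lon, lon + delta_lon)  # close to [-1.2199 TO 0.3153]
print(lat_range, lon_range)
```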



>>
>> From my experience, there is no single best practice for spatial
>> queries -- it will depend on your data density and distribution.
>>
>>You may also want to look at:
>>http://code.google.com/p/lucene-spatial-playground/
>>but note this is off lucene trunk -- the geohash queries are super fast
>>though

thanks, I will look into that! I still haven't really considered geo
hashes. As far as I understand, documents with a lat/lon are already
assigned a geo hash upon indexing, is that correct? In which way does
a query get faster though when I query by a geo hash rather than a
lat/lon? Doesn't local lucene already map documents to a cartesian
grid upon indexing, thus reducing lookup time? Moreover, will this
mean the results get less accurate since different lat/lons may
collapse into the same hash?

Thanks!

-- 
Matthias Käppler
Lead Developer API & Mobile

Qype GmbH
Großer Burstah 50-52
20457 Hamburg
Telephone: +49 (0)40 - 219 019 2 - 160
Skype: m_kaeppler
Email: matth...@qype.com

Managing Director: Ian Brotherston
Amtsgericht Hamburg
HRB 95913

This e-mail and its attachments may contain confidential and/or
privileged information. If you are not the intended recipient (or have
received this e-mail in error) please notify the sender immediately
and destroy this e-mail and its attachments. Any unauthorized copying,
disclosure or distribution of this e-mail and  its attachments is
strictly forbidden. This notice also applies to future messages.


Re: Wildcard ? issue?

2012-02-09 Thread Dalius Sidlauskas

It seems it is applicable to Solr 3.6 and 4.0. Mine is version 3.5.

Regards!
Dalius Sidlauskas


On 08/02/12 17:26, Ahmet Arslan wrote:

I have already tried this and it did not help, because it does not
highlight matches if a wild-card is used. The field configuration turns
data to:

This writeup should explain your scenario :
http://wiki.apache.org/solr/MultitermQueryAnalysis


Re: solr search speed is so slow.

2012-02-09 Thread 荣康
Hi!


Thanks for your reply. I read some material at
http://wiki.apache.org/solr/SolrConfigXml#newSearcher , but it doesn't give
an example.
Could you tell me how to modify it, or give some links about incorporating
warming queries?
 
 
-- Original --
From:  "Rafał Kuć";
Date:  Thu, Feb 9, 2012 06:01 PM
To:  "solr-user"; 

Subject:  Re: solr search speed is so slow.

 
Hello!

You may want to incorporate warming queries to your solrconfig.xml
file. Edit the solrconfig.xml file and look for the default warming
queries, they are in the following places:

<listener event="newSearcher" class="solr.QuerySenderListener">

and

<listener event="firstSearcher" class="solr.QuerySenderListener">

Modify the default ones to match your needs - for example include
sorting there, fields you use for searching, filtering, etc.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> My Solr's index data size is 1.1GB, and Solr only has one core.


> When I first search one word in Solr, its response time is 460ms.
> When I search the same word a second time, its response time is
> under 70ms. I can't tolerate 460ms. Does anyone know how to improve
> performance?


> My Computer:
> 1 cpu(with  2 core)
> 2G memory (800MHz)


> and my search configuration
>  dismax
>filename^5.0 text^1.5


>   *:*
>   on
>   filename text
>  true
>  
> 
> 100
>   filename
>   3

Re: solr search speed is so slow.

2012-02-09 Thread Rafał Kuć
Hello!

You may want to incorporate warming queries to your solrconfig.xml
file. Edit the solrconfig.xml file and look for the default warming
queries, they are in the following places:

<listener event="newSearcher" class="solr.QuerySenderListener">

and

<listener event="firstSearcher" class="solr.QuerySenderListener">

Modify the default ones to match your needs - for example include
sorting there, fields you use for searching, filtering, etc.
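For illustration, a warming entry could look roughly like this (a minimal sketch, not a complete configuration; the query and field values mirror the poster's dismax setup):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">drugs</str>
      <str name="defType">dismax</str>
      <str name="qf">filename^5.0 text^1.5</str>
    </lst>
  </arr>
</listener>
```

The firstSearcher queries run once at startup; put the same kind of entries under the newSearcher listener to warm searchers opened after commits.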

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> My Solr's index data size is 1.1GB, and Solr only has one core.


> When I first search one word in Solr, its response time is 460ms.
> When I search the same word a second time, its response time is
> under 70ms. I can't tolerate 460ms. Does anyone know how to improve
> performance?


> My Computer:
> 1 cpu(with  2 core)
> 2G memory (800MHz)


> and my search configuration
>  dismax
>filename^5.0 text^1.5


>   *:*
>   on
>   filename text
>  true
>  
> 
> 100
>   filename
>   3 


Re: linking documents in solr

2012-02-09 Thread Ahmet Arslan
> I have a question around document linking in Solr and want to know if
> it's possible. Let's say I have a set of blogs and their authors that I
> want to index separately. Is it possible to link a document describing a
> blog to another document describing an author? If yes, can I search for
> blogs with filters on attributes of the author? If yes, if I update an
> attribute of an author (by its id), will the search results then reflect
> the updated attribute(s)?

You may find these relevant:

http://wiki.apache.org/solr/Join
https://issues.apache.org/jira/browse/SOLR-3076
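For the query-time join, a request could be sketched like this. All field names here (id, author_id, type) are hypothetical, and the feature requires a Solr build that includes the join support linked above:

```python
from urllib.parse import urlencode

# Return blog documents whose linked author matches the inner query:
# the join takes "id" values from matching author docs and selects
# docs whose "author_id" field contains one of those values.
params = urlencode({
    "q": "{!join from=id to=author_id}type:author AND country:US",
    "fq": "type:blog",
})
print("/select?" + params)
```

Since the join is evaluated at query time, updating and committing an author document is reflected in subsequent joined searches.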


solr search speed is so slow.

2012-02-09 Thread 荣康
My Solr's index data size is 1.1GB, and Solr only has one core.


When I first search one word in Solr, its response time is 460ms. When I
search the same word a second time, its response time is under 70ms. I can't
tolerate 460ms. Does anyone know how to improve performance?


My Computer:
1 cpu(with  2 core)
2G memory (800MHz)


and my search configuration
 dismax
   filename^5.0 text^1.5


  *:*
  on
  filename text
 true
 

100
  filename
  3 


Re: Sorting solrdocumentlist object after querying

2012-02-09 Thread Tommaso Teofili
Hi Kashif,
maybe the field collapsing feature [1] can help you with your requirement.
Hope this helps,
Tommaso

[1] :  http://wiki.apache.org/solr/FieldCollapsing
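A minimal field-collapsing request might look like this (the field name below is a placeholder, not from the original thread):

```python
from urllib.parse import urlencode

# Collapse the result set on a field, keeping the top document per group;
# group.sort (not shown) would control the ordering inside each group.
params = urlencode({
    "q": "*:*",
    "group": "true",
    "group.field": "category",  # hypothetical field to collapse on
    "group.limit": "1",
})
print("/select?" + params)
```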


Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Tommaso Teofili
I think you may use/customize the query elevation component to achieve that.
http://wiki.apache.org/solr/QueryElevationComponent
Tommaso

2012/2/9 mtheone 

> Say I have a classified ads site, I want to display 2 random items (premium
> ads) in the beginning of the search result and the rest are regular ads,
> how
> do I do it?
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-do-this-in-Solr-random-result-for-the-first-few-results-tp3728729p3728729.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Ahmet Arslan

> Say I have a classified ads site, I
> want to display 2 random items (premium
> ads) in the beginning of the search result and the rest are
> regular ads, how
> do I do it? 

http://wiki.apache.org/solr/QueryElevationComponent is used for paid inclusion. 
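As a hedged sketch, the component's elevate.xml pins chosen document ids to the top of the results for a given query text (both the query text and the ids below are hypothetical):

```xml
<elevate>
  <query text="apartment">
    <doc id="premium-ad-1"/>
    <doc id="premium-ad-2"/>
  </query>
</elevate>
```

Note that elevation ties fixed ids to fixed query strings; serving 2 random premium ads per request would require customizing the component, as suggested above.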


Re: How to reindex about 10Mio. docs

2012-02-09 Thread Vadim Kisselmann
Hi Otis,
thanks for your response :)
We found a solution yesterday. It works with a Ruby script, curl and
Saxon/XSLT. The performance is great. We moved all the docs in 5 batches
to prevent overloading our machines.
Best regards
Vadim



2012/2/8 Otis Gospodnetic :
> Vadim,
>
> Would using xslt output help?
>
> Otis
> 
> Performance Monitoring SaaS for Solr - 
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>
>
>>
>> From: Vadim Kisselmann 
>>To: solr-user@lucene.apache.org
>>Sent: Wednesday, February 8, 2012 7:09 AM
>>Subject: Re: How to reindex about 10Mio. docs
>>
>>Another problem appeared ;)
>>how can I export my docs in CSV format?
>>In Solr 3.1+ I can use the query param &wt=csv, but what about Solr 1.4.1?
>>Best Regards
>>Vadim
>>
>>
>>2012/2/8 Vadim Kisselmann :
>>> Hi Ahmet,
>>> thanks for the quick response :)
>>> I've already thought the same...
>>> And it will be a pain to export and import this huge doc set as CSV.
>>> Do I have another solution?
>>> Regards
>>> Vadim
>>>
>>>
>>> 2012/2/8 Ahmet Arslan :
> I want to reindex about 10Mio. docs from one Solr (1.4.1) to another
> Solr (1.4.1).
> I changed my schema.xml (field type sint to slong), so standard
> replication would fail.
> What is the fastest and smartest way to manage this?
> This here sounds great (EntityProcessor):
> http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
> But would it work with Solr 1.4.1?

 SolrEntityProcessor is not available in 1.4.1. I would dump the stored fields
 into a comma-separated file and use http://wiki.apache.org/solr/UpdateCSV
 to feed it into the new Solr instance.
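The suggested CSV route can be sketched as follows. This is a minimal illustration: the field names and the update URL in the trailing comment are assumptions to adapt to your schema and host:

```python
import csv
import io

# Build a CSV of the stored fields; in practice you would stream the
# documents out of the old Solr instance and write them to a file.
docs = [
    {"id": "1", "title": "solr book", "price": "10.00"},
    {"id": "2", "title": "lucene in action", "price": "12.00"},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "title", "price"])
writer.writeheader()
writer.writerows(docs)
print(buf.getvalue())
# Then POST the file to the new instance's CSV handler, e.g.:
#   curl 'http://localhost:8983/solr/update/csv?commit=true' \
#        --data-binary @dump.csv -H 'Content-type: text/csv'
```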
>>
>>
>>