Re: spellchecking multiple fields?

2008-07-15 Thread Shalin Shekhar Mangar
One way would be to create a copyField containing both the fields and use it
as the dictionary's source.

If you do want to keep separate dictionaries for both the fields then I
guess we can introduce per-dictionary overridable parameters like the
per-field overridden facet parameters. That would be cleaner than json
params. What do you think?
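For the copyField approach, a sketch of what the wiring might look like (field and component names here are illustrative, not taken from the thread):

```xml
<!-- schema.xml: funnel both source fields into one spell-source field -->
<field name="spell" type="textSpell" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="location" dest="spell"/>
<copyField source="person" dest="spell"/>

<!-- solrconfig.xml: build the spellcheck dictionary from the merged field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
  </lst>
</searchComponent>
```

The trade-off is a single merged dictionary: suggestions can no longer be attributed to "location" vs "person", which is exactly what the per-dictionary parameters would address.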

On Wed, Jul 16, 2008 at 6:26 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> I have a use case where I want to spellcheck the input query across
> multiple fields:
>  Did you mean: location = washington
>  vs
>  Did you mean: person = washington
>
> The current parameter / response structure for the spellcheck component
> does not support this kind of thing.  Any thoughts on how/if the component
> should handle this?  Perhaps it could be in a requestHandler where the
> params are passed in as json?
>
>  spelling={ dictionary="location", onlyMorePopular=true}&spelling={
> dictionary="person", onlyMorePopular=false }
>
> Thoughts?
> ryan
>



-- 
Regards,
Shalin Shekhar Mangar.


spellchecking multiple fields?

2008-07-15 Thread Ryan McKinley
I have a use case where I want to spellcheck the input query across  
multiple fields:

 Did you mean: location = washington
  vs
 Did you mean: person = washington

The current parameter / response structure for the spellcheck  
component does not support this kind of thing.  Any thoughts on how/if  
the component should handle this?  Perhaps it could be in a  
requestHandler where the params are passed in as json?


 spelling={ dictionary="location",  
onlyMorePopular=true}&spelling={ dictionary="person",  
onlyMorePopular=false }


Thoughts?
ryan


Re: Slow deleteById request

2008-07-15 Thread Renaud Delbru

Hi,

I think the reason was indeed maxPendingDeletes, which was configured to
1000.
After updating to a Solr nightly build with Lucene 2.4, the issue
seems to have disappeared.


Thanks for your advice.
--
Renaud Delbru

Mike Klaas wrote:


On 1-Jul-08, at 10:44 PM, Chris Hostetter wrote:

>
> : Yes, updating to a newer version of nightly Solr build could solve the
> : problem, but I am a little afraid to do it since solr-trunk has switched to
> : lucene 2.4-dev.
>
> but did you check whether or not you have maxPendingDeletes configured as
> yonik asked?
>
> That would explain exactly what you are seeing ... after a certain number
> of deletes have passed, the next one would automatically force a commit (and
> a newSearcher) and (i believe) subsequent deletes would block until the
> commit is done ... which sounds like exactly what you describe.

It shouldn't cause a commit, just a flushing of deletes.  However,
deletes count toward both maxDocs and maxTime for autoCommit
purposes, so that is the likely explanation.


-Mike
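The maxDocs/maxTime that Mike mentions are the autoCommit thresholds in solrconfig.xml; a typical block looks like this (the numbers are placeholders, not values from the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- pending adds AND deletes both count -->
    <maxTime>60000</maxTime>  <!-- ms after the first uncommitted change -->
  </autoCommit>
</updateHandler>
```

If deletes are filling this quota, a burst of deleteById calls can trip an autoCommit and stall the next request, which matches the slowdown Renaud described.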



Re: solr synonyms behaviour

2008-07-15 Thread swarag


Yonik Seeley wrote:
> 
> On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]>
> wrote:
>> To my understanding, this means I am using synonyms at index time and NOT
>> query time. And yet, I am still having these problems with synonyms.
> 
> Can you give a specific example?  Use debugQuery=true to see what the
> resulting query is.
> You can also use the admin analysis page to see what the output of the
> index and query analyzers is.
> 
> -Yonik
> 
> 

So it sounds like using the '=>' operator for synonyms that may or may not
contain multiple words causes problems.  So I changed my synonyms.txt to the
following:

club,bar,night cabaret

In schema.xml, I now have the following (the fieldType XML was stripped by
the mail archive):


As you can see, 'night cabaret' is my only multi-word synonym term. Searches
for 'bar' and 'club' now behave as expected.  However, if I search for JUST
'night' or JUST 'cabaret', it looks like it is still using the synonyms
'bar' and 'club', which is not what is desired.  I only want 'bar' and
'club' to be returned if a search for the complete 'night cabaret' is
submitted.
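The fieldType XML above was stripped by the mail archive; an index-time-only synonym setup of the kind described usually looks roughly like this (a generic sketch, not the poster's exact config):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that with expand="true", a document containing "bar" is indexed with every member of the synonym group, including the two tokens of "night cabaret" at consecutive positions — which would explain why a query for just 'night' or 'cabaret' still matches 'bar' and 'club' documents.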

Since query-time synonyms are turned "off", the resulting
parsedquery_toString is simply "name:night", "name:cabaret", etc...

Thanks!
-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18476205.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 2 IDs in schema.xml

2008-07-15 Thread Shalin Shekhar Mangar
Multiple uniqueKeys are not supported. You must use only one field as the
uniqueKey.
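A common workaround (an illustrative sketch, not something from this thread) is to build one composite key on the client side and declare only that field as the uniqueKey:

```xml
<!-- schema.xml -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="userID" type="string" indexed="true" stored="true"/>
<field name="companyID" type="string" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>
```

The indexing client then sets id to something like userID + "_" + companyID for each document, so both values participate in uniqueness without Solr needing a multi-field key.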

On Tue, Jul 15, 2008 at 11:52 PM, dudes dudes <[EMAIL PROTECTED]> wrote:

>
> Hi
>
> For some strange reason hotmail doesn't send any XML tags through. I have
> attached a file with all the necessary xml tags there, thanks :)
>
> I have a rare situation and I'm not too sure how to resolve it.
> I have defined 2 fields: one is called userID and the other is called
> companyID, in the schema.xml file. Please see part 1 of the attached xml file.
>
> Then I have both of those fields specified as uniqueKeys. Please see part 2
> of the attached document.
>
> When I try to post test6.xml (i.e. java -jar post.jar test6.xml) it gives
> me the following error:
>
> SimplePostTool:FATAL:Solr returned an error:
> Document_null_missing_required_field_userID
>
> However, if I replace companyID with userID in the test6.xml file, it
> commits without any problems.
>
> Any thoughts about this?
>
> Many thanks to all
> ak
>
>
> _
> The John Lewis Clearance - save up to 50% with FREE delivery
> http://clk.atdmt.com/UKM/go/101719806/direct/01/




-- 
Regards,
Shalin Shekhar Mangar.


Re: FileBasedSpellChecker behavior?

2008-07-15 Thread Shalin Shekhar Mangar
Also see https://issues.apache.org/jira/browse/SOLR-622

On Wed, Jul 16, 2008 at 2:25 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Tue, Jul 15, 2008 at 4:19 PM, Grant Ingersoll <[EMAIL PROTECTED]>
> wrote:
> > agreed, but there is a problem in Solr, AIUI, with regards to when the
> > readers are available and when inform() gets called.  The workaround is
> to
> > have a warming query, I believe.
>
> Right... see https://issues.apache.org/jira/browse/SOLR-593
>
> -Yonik
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: FileBasedSpellChecker behavior?

2008-07-15 Thread Yonik Seeley
On Tue, Jul 15, 2008 at 4:19 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> agreed, but there is a problem in Solr, AIUI, with regards to when the
> readers are available and when inform() gets called.  The workaround is to
> have a warming query, I believe.

Right... see https://issues.apache.org/jira/browse/SOLR-593

-Yonik


Re: FileBasedSpellChecker behavior?

2008-07-15 Thread Grant Ingersoll


On Jul 15, 2008, at 3:49 PM, Ryan McKinley wrote:


Hi-

I'm messing with spellchecking and running into behavior that seems  
peculiar.  We have an index with many words including:

"swim" and "slim"

If I search for "slim", it returns "swim" as an option -- likewise,
if I search for "swim" it returns "slim"


why does it check words that are in the dictionary?  This does not  
seem to be the behavior for IndexBasedSpellChecker.


I think it can depend on your options, but there are reasons to check
even if a word is in the dictionary (although w/ FileBased, it's not
as obvious.)  Namely, there can be "better" spellings available.  The
strange thing is, I believe, the Lucene spell checker should be
handling this, but you're not the first to report the oddity.





- - - -

Perhaps the FileBasedSpellChecker should load the configs at
startup.  It is too strange to have to call load each time the index
starts.  It should just implement SolrCoreAware and then load the
file at startup.


agreed, but there is a problem in Solr, AIUI, with regards to when the  
readers are available and when inform() gets called.  The workaround  
is to have a warming query, I believe.





thanks
ryan




FileBasedSpellChecker behavior?

2008-07-15 Thread Ryan McKinley

Hi-

I'm messing with spellchecking and running into behavior that seems  
peculiar.  We have an index with many words including:

"swim" and "slim"

If I search for "slim", it returns "swim" as an option -- likewise, if
I search for "swim" it returns "slim"


why does it check words that are in the dictionary?  This does not  
seem to be the behavior for IndexBasedSpellChecker.


- - - -

Perhaps the FileBasedSpellChecker should load the configs at startup.
It is too strange to have to call load each time the index starts.  It
should just implement SolrCoreAware and then load the file at startup.


thanks
ryan


RE: Wiki for 1.3

2008-07-15 Thread sundar shankar
THANKS!!!

> Date: Tue, 15 Jul 2008 11:38:06 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: RE: Wiki for 1.3
>
> : Thanks. Do we expect the same some time soon. I agree that the user
> : community have shed light in with a lot of examples. Just wanna know if
> : there was more that could be done. I am looking at the java docs of the
> : same too and that helps to some extent. But have felt the wiki was very
> : very useful in the past for me.
>
> The wiki has never been (nor attempted to be) a comprehensive list of
> every "plugin" type available in Solr -- just a pointer to where that info
> can be found in the javadocs. The specific items listed on the
> AnalyzersTokenizersTokenFilters page are just the ones that are particularly
> common, or have subtleties about them that people wanted to make notes
> about.
>
> You can feel free to add any tips & tricks about any analysis plugin you
> want to that page.
>
> SOLR-555 is an attempt at generating more user-friendly docs about all
> out-of-the-box plugins. Once it's ready for prime time, we'll still need
> more class-level javadocs for the various plugins to really make it useful
> - so any patches along those lines will eventually help.
>
> -Hoss

Re: solr synonyms behaviour

2008-07-15 Thread Yonik Seeley
On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]> wrote:
> To my understanding, this means I am using synonyms at index time and NOT
> query time. And yet, I am still having these problems with synonyms.

Can you give a specific example?  Use debugQuery=true to see what the
resulting query is.
You can also use the admin analysis page to see what the output of the
index and query analyzers is.

-Yonik


Re: Solr stops responding

2008-07-15 Thread Fuad Efendi

Sorry for the bunch of short self-replies, just trying to analyse...

The CPU may get overloaded by the GC constantly running to defragment
and compact memory in a loop (with a constant queue of requests);
response times will be a few minutes (in the best case) and contain 500s...
so sometimes we can't even see the OOM in the log files (overloaded CPU).


At least during troubleshooting we need to comment this block out in  
SolrServlet:



} catch (Throwable e) {
  SolrException.log(log,e);
  sendErr(500, SolrException.toStr(e), request, response);
}



I also can't understand why it happened several times yesterday with
Sun Java 5 (AMD64), and has not yet happened with BEA JRockit. I had
different problems with JRockit (HttpClient did not work with it),
which is why I avoided it until now...
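For troubleshooting OOM-induced hangs, it can also help to let the JVM capture the failure at the source rather than relying on the servlet's catch block. These Sun JVM launcher flags are a sketch (flag availability varies by JVM vendor and version, so verify against yours):

```
java -Xmx4096m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/log/solr \
     -jar start.jar
```

With a heap dump on disk, the allocation pattern that exhausted memory can be inspected offline instead of guessing from truncated catalina.out entries.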



==
http://www.linkedin.com/in/liferay


Quoting Fuad Efendi <[EMAIL PROTECTED]>:


Just as a sample, SolrCore contains blocks like
} catch (Throwable e) {
 SolrException.logOnce(log,null,e);
}


And SolrServlet:
} catch (Throwable e) {
  SolrException.log(log,e);
  sendErr(500, SolrException.toStr(e), request, response);
}



What will happen with an OutOfMemoryError? If memory is not
'enough'-enough it won't even output to catalina.out, and the JVM/Solr
will stop responding instead of exiting 'abnormally'...



Quoting Fuad Efendi <[EMAIL PROTECTED]>:



I suspect that SolrException is used to catch ALL exceptions in order
to show "500 OutOfMemory" in HTML/XML/JSON etc., so that JVM simply
hangs... weird HTTP understanding...


Quoting Fuad Efendi <[EMAIL PROTECTED]>:


Following lines are strange, looks like SOLR deals with OOM and
rethrows own exception (so that in some cases JVM simply hangs instead
of exit):
Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space







RE: Wiki for 1.3

2008-07-15 Thread Chris Hostetter

: Thanks.  Do we expect the same some time soon. I agree that the user 
: community have shed light in with a lot of examples. Just wanna know if 
: there was more that could be done. I am looking at the java docs of the 
: same too and that helps to some extent. But have felt the wiki was very 
: very useful in the past for me.

The wiki has never been (nor attempted to be) a comprehensive list of 
every "plugin" type available in Solr -- just a pointer to where that info 
can be found in the javadocs.  The specific items listed on the
AnalyzersTokenizersTokenFilters page are just the ones that are particularly
common, or have subtleties about them that people wanted to make notes
about.

You can feel free to add any tips&tricks about any analysis plugin you 
want to that page.

SOLR-555 is an attempt at generating more user-friendly docs about all
out-of-the-box plugins.  Once it's ready for prime time, we'll still need
more class-level javadocs for the various plugins to really make it useful
- so any patches along those lines will eventually help.


-Hoss



Re: solr synonyms behaviour

2008-07-15 Thread swarag


matt connolly wrote:
> 
> You won't have the multiple word problem if you use synonyms at index time
> instead of query time.
> 
> 
> swarag wrote:
>> 
>> Here is a basic example of some synonyms in my synonyms.txt:
>> club=>club,bar,night cabaret
>> bar=>bar,club
>> 
>> As you can see, a search for 'bar' will return any documents with 'bar'
>> or 'club' in the name. This works fine. However, a search for 'club'
>> SHOULD return any documents with 'club', 'bar' or 'night cabaret' in the
>> name, but it does not. It only returns 'bar' and 'club'.  
>> 
>> Interestingly, a search for 'night cabaret' gives me all 'night
>> cabaret's, 'bar's and 'club's...which is quite unexpected since I'm using
>> uni-directional synonym config (using the => symbol)
>> 
>> Does your config give you my desired behavior?
>> 
> 
> 

Is there something I am missing here? This is an excerpt from my schema.xml
(the fieldType XML was stripped by the mail archive):


To my understanding, this means I am using synonyms at index time and NOT
query time. And yet, I am still having these problems with synonyms.

-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18471922.html
Sent from the Solr - User mailing list archive at Nabble.com.



2 IDs in schema.xml

2008-07-15 Thread dudes dudes

Hi

For some strange reason hotmail doesn't send any XML tags through. I have
attached a file with all the necessary xml tags there, thanks :)

I have a rare situation and I'm not too sure how to resolve it.
I have defined 2 fields: one is called userID and the other is called
companyID, in the schema.xml file. Please see part 1 of the attached xml file.


Then I have both of those fields specified as uniqueKeys. Please see part 2 of
the attached document.


When I try to post test6.xml (i.e. java -jar post.jar test6.xml) it gives me
the following error:

SimplePostTool:FATAL:Solr returned an error:
Document_null_missing_required_field_userID

However, if I replace companyID with userID in the test6.xml file, it commits
without any problems.

Any thoughts about this?

Many thanks to all
ak


//I have defined 2 fields as shown below:
(field definitions stripped by the mail archive)

//UniqueKeys

  userID
  companyID

//copy field commands
(copyField directives stripped by the mail archive)

   44



Re: Solr stops responding

2008-07-15 Thread Fuad Efendi

Just as a sample, SolrCore contains blocks like
} catch (Throwable e) {
 SolrException.logOnce(log,null,e);
}


And SolrServlet:
} catch (Throwable e) {
  SolrException.log(log,e);
  sendErr(500, SolrException.toStr(e), request, response);
}



What will happen with OutOfMemoryError? If memory is not  
'enough'-enough it won't even output to catalina.out, and JVM/SOLR  
will stop responding instead of 'abnormal' exit...




Quoting Fuad Efendi <[EMAIL PROTECTED]>:



I suspect that SolrException is used to catch ALL exceptions in order
to show "500 OutOfMemory" in HTML/XML/JSON etc., so that JVM simply
hangs... weird HTTP understanding...


Quoting Fuad Efendi <[EMAIL PROTECTED]>:


Following lines are strange, looks like SOLR deals with OOM and
rethrows own exception (so that in some cases JVM simply hangs instead
of exit):
 Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: Java heap space







Re: Duplicate content

2008-07-15 Thread Ryan McKinley


On Jul 15, 2008, at 10:31 AM, Fuad Efendi wrote:


Thanks Ryan,

Is uniqueKey really unique if we allow duplicates? I had a similar
problem...




if you allowDups, then uniqueKey may not be unique...

however, it is still used as the key for many items.




Quoting Ryan McKinley <[EMAIL PROTECTED]>:



On Jul 15, 2008, at 2:45 AM, Sunil wrote:


Hi All,

I want to change the duplicate content behavior in solr. What I  
want to

do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one.

Means, if I add duplicate content in solr and the content already
exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?



Check the "allowDups" option for the <add> update command:
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef





Thanks,
Sunil










Re: solr synonyms behaviour

2008-07-15 Thread matt connolly

You won't have the multiple word problem if you use synonyms at index time
instead of query time.


swarag wrote:
> 
> Here is a basic example of some synonyms in my synonyms.txt:
> club=>club,bar,night cabaret
> bar=>bar,club
> 
> As you can see, a search for 'bar' will return any documents with 'bar' or
> 'club' in the name. This works fine. However, a search for 'club' SHOULD
> return any documents with 'club', 'bar' or 'night cabaret' in the name,
> but it does not. It only returns 'bar' and 'club'.  
> 
> Interestingly, a search for 'night cabaret' gives me all 'night cabaret's,
> 'bar's and 'club's...which is quite unexpected since I'm using
> uni-directional synonym config (using the => symbol)
> 
> Does your config give you my desired behavior?
> 

-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18471373.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter by Type increases search results.

2008-07-15 Thread Yonik Seeley
On Tue, Jul 15, 2008 at 11:10 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Tue, 15 Jul 2008 18:07:43 +0530
> "Preetam Rao" <[EMAIL PROTECTED]> wrote:
>
>> When I say filter, I meant q=fish&fq=type:idea
>
> btw, this *seems* to only work for me with the standard search handler. dismax
> and fq don't seem to get along nicely... but maybe it is just late and i'm
> not testing it properly...

It should work the same... the only thing dismax does differently now
is change the type of the base query to "dismax".

-Yonik
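In other words, a request like the following (reusing the q/fq form from earlier in the thread) should filter dismax results the same way it does for the standard handler:

```
/select?qt=dismax&q=fish&fq=type:idea
```

The fq clause is applied as a separate cached filter in either case; only the parsing of q changes between handlers.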


Re: solr synonyms behaviour

2008-07-15 Thread swarag


matt connolly wrote:
> 
> 
> swarag wrote:
>> 
>> Knowing the Lucene struggles with multi-word query-time synonyms, my
>> question is, does this also affect index-time synonyms? What other
>> alternatives do we have if we require there to be multiple word synonyms?
>> 
> 
> No the multiple word problem doesn't happen with index synonyms, only
> query synonyms.
> 
> See:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
> 
> I ended up using index time synonyms, but ideally, I'd like to see a
> filter factory that does something like the SynsExpand tool does (which
> was written for lucene, not solr).
> 

I've tried this and it doesn't seem to work. Here are the basics of my
config (the fieldType XML was stripped by the mail archive):

...
Synonyms at query time are turned off.

Here is a basic example of some synonyms in my synonyms.txt:
club=>club,bar,night cabaret
bar=>bar,club

As you can see, a search for 'bar' will return any documents with 'bar' or
'club' in the name. This works fine. However, a search for 'club' SHOULD
return any documents with 'club', 'bar' or 'night cabaret' in the name, but
it does not. It only returns 'bar' and 'club'.  

Interestingly, a search for 'night cabaret' gives me all 'night cabaret's,
'bar's and 'club's...which is quite unexpected since I'm using
uni-directional synonym config (using the => symbol)

Does your config give you my desired behavior?
-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18469995.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr stops responding

2008-07-15 Thread Fuad Efendi


I suspect that SolrException is used to catch ALL exceptions in order  
to show "500 OutOfMemory" in HTML/XML/JSON etc., so that JVM simply  
hangs... weird HTTP understanding...



Quoting Fuad Efendi <[EMAIL PROTECTED]>:


Following lines are strange, looks like SOLR deals with OOM and
rethrows own exception (so that in some cases JVM simply hangs instead
of exit):
  Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.OutOfMemoryError: Java heap space







Re: WordDelimiterFilter splits at non-ASCII chars

2008-07-15 Thread Yonik Seeley
On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher
<[EMAIL PROTECTED]> wrote:
> as I understand the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does however. The word
> "hälse" (german plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am i missing
> something or is this a bug?
> I'm using solr 1.3 built from trunk.

Look for charset issues in communicating with Solr.  I just tried this
with the "text" field via Solr's analysis.jsp and it works fine.

-Yonik
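Yonik's charset hint is worth unpacking: if a client sends UTF-8 bytes but the server (or container) decodes them as ISO-8859-1, each multibyte character turns into two separate characters before analysis ever runs. A small Python illustration of the mismatch (Python is used only to demonstrate the encoding round-trip; the original context is a Java servlet container):

```python
# "hälse" encoded as UTF-8 but decoded as Latin-1: the two bytes of "ä"
# (0xC3 0xA4) become the two characters "Ã" and "¤".
word = "hälse"
mojibake = word.encode("utf-8").decode("latin-1")
print(mojibake)  # hÃ¤lse
assert mojibake == "hÃ¤lse"
```

A token mangled like this contains a non-alphanumeric character ("¤") that WordDelimiterFilter will split on, which matches the fragmenting Stefan reported. For Tomcat, setting URIEncoding="UTF-8" on the HTTP connector is the usual fix for GET-parameter decoding.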


Re: WordDelimiterFilter splits at non-ASCII chars

2008-07-15 Thread Shalin Shekhar Mangar
Hi Stefan,

I wrote a test case for the problem you described but it is working fine. I
used the following definition (the fieldType XML was stripped by the mail
archive):

What configuration are you using? If it is different, please share it so
that I can test with it.

On Tue, Jul 15, 2008 at 7:59 PM, Stefan Oestreicher <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> as I understand the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does however. The word
> "hälse" (german plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am i missing
> something or is this a bug?
> I'm using solr 1.3 built from trunk.
>
> TIA,
>
> Stefan Oestreicher
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr stops responding

2008-07-15 Thread Fuad Efendi
The following lines are strange; it looks like Solr deals with the OOM and
rethrows its own exception (so that in some cases the JVM simply hangs
instead of exiting):

  Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.OutOfMemoryError: Java heap space



This is the full thread dump after the OOM, made in April with Tomcat 6.
A deadlock in Tomcat? It looks like some queries succeed, but I was forced
to KILL -9.

=


SEVERE: Error allocating socket processor
java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:57:36 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Exception in thread "catalina-exec-4" java.lang.OutOfMemoryError: Java  
heap space

Apr 4, 2008 1:58:18 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 1:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 1:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 1:59:01 PM org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:39 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 1:59:53 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=webcam&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.excaliberpc.com"&hl=true 0  
18

Apr 4, 2008 2:00:51 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=pepe+jeans&qt=dismax&version=2.2&facet.field=country&facet.field=host&hl=true 0  
38544

Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 2:02:11 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=10&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=category:"core"&qt=standard&version=2.2&facet.field=country&facet.field=host&hl=true 0  
79439

Apr 4, 2008 2:02:21 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=robot&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.clickonit.com"&hl=true 0  
17

Apr 4, 2008 2:02:35 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=Cognac&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.designersimports.com"&hl=true 0  
19

Apr 4, 2008 2:03:12 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=prada&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.theprincessescloset.com"&hl=true 0  
1

Apr 4, 2008 2:04:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

Apr 4, 2008 2:04:55 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&sort=price+desc&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=velodyne+DD+15&qt=dismax&version=2.2&facet.field=country&facet.field=host&hl=true 0  
53

Apr 4, 2008 2:05:21 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=velodyne+DD+15&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.hometheaterstore.com"&hl=true 0  
3

Apr 4, 2008 2:06:06 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=100&start=0&sort=id+asc&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=sex&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.moviesunlimited.com"&fq=category:"video"&hl=true 0  
39

Apr 4, 2008 2:06:24 PM org.apache.solr.core.SolrCore execute
INFO: /select  
wt=xml&facet.limit=100&rows=10&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=id:[*+TO+*]&qt=standard&version=2.2&facet.field=country&facet.field=host&hl=true 0  
859

Apr 4, 2008 2:07:03 PM org.apac

Best way to return ExternalFileField in the results

2008-07-15 Thread climbingrose
Hi all,
I've been trying to return a field of type ExternalFileField in the search
results. Upon examining the XMLWriter class, it seems Solr can't do this out
of the box, so I've tried to hack Solr to enable this behaviour.
The goal is to call to ExternalFileField.getValueSource(SchemaField
field,QParser parser) in XMLWriter.writeDoc(String name, Document
document,...) method. There are two issues with doing this:

1) I need to create an instance of QParser in the writeDoc method. What is
the best way to do this? And what is the overhead of creating a new QParser
for every document returned?

2) I have to modify writeDoc method to include the internal Lucene document
Id because I need it to retrieve the ExternalFileField:

fileField.getValueSource(schemaField,
qparser).getValues(request.getSearcher().getIndexReader()).floatVal(docId)

The immediate effect is that it breaks the writeVal() method (because that
method references writeDoc()).

Any comments?

Thanks in advance.


-- 
Regards,

Cuong Hoang


Re: Solr stops responding

2008-07-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
Can we collect more information? It would be nice to know what the
threads are doing when it hangs.
If you are using *nix, issue kill -3 <pid>;
it will print out the stack traces of all the threads in the VM. That
may tell us the state of each thread, which could help us
suggest something.
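A minimal sketch of that procedure (the pgrep pattern is an assumption — adjust it to however your Solr JVM is actually launched; SIGQUIT does not kill the process, it only makes the JVM print a thread dump to its stdout, e.g. catalina.out under Tomcat):

```shell
# Find the container JVM and ask it for a full thread dump.
SOLR_PID=$(pgrep -f catalina | head -n 1)
if [ -n "$SOLR_PID" ]; then
  kill -3 "$SOLR_PID"   # SIGQUIT: JVM dumps all thread stacks, keeps running
  MSG="thread dump requested for pid $SOLR_PID"
else
  MSG="no matching JVM found"
fi
echo "$MSG"
```

Capturing two or three dumps a few seconds apart during a hang makes it much easier to distinguish a deadlock (same stacks every time) from mere overload.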


On Tue, Jul 15, 2008 at 8:59 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> I constantly have the same problem; sometimes I have OutOfMemoryError in
> logs, sometimes
> not. Not-predictable. I minimized all caches, it still happens even with
> 8192M. CPU usage
> is 375%-400% (two double-core Opterons), SUN Java 5. Moved to BEA JRockit 5
> yesterday,
> looks 30 times faster (25% CPU load with 4096M RAM); no any problem yet,
> let's see...
>
> Strange: Tomcat simply hangs instead of exit(...)
>
> There are some posts related to OutOfMemoryError in solr-user list.
>
>
> ==
> http://www.linkedin.com/in/liferay
>
> Quoting Doug Steigerwald <[EMAIL PROTECTED]>:
>
>> Since we pushed Solr out to production a few weeks ago, we've seen a
>> few issues with Solr not responding to requests (searches or admin
>> pages).  There doesn't seem to be any reason for it from what we can
>> tell.  We haven't seen it in QA or development.
>>
>> We're running Solr with basically the example Solr setup with Jetty
>> (6.1.3).  We package our Solr install by using 'ant example' and
>> replacing configs/etc.  Whenever Solr stops responding, there are no
>> messages in the logs, nothing.  Requests just time out.
>>
>> We have also only seen this on our slaves.  The master doesn't seem to
>> be hitting this issue.  All the boxes are the same, version of java is
>> the same, etc.
>>
>> We don't have a stack trace and no JMX set up.  Once we see this issue,
>> our support folks just stop and start Solr on that machine.
>>
>> Has anyone else run into anything like this with Solr?
>>
>> Thanks.
>> Doug
>
>
>
>



-- 
--Noble Paul


Re: Solr stops responding

2008-07-15 Thread Doug Steigerwald
We haven't seen an OutOfMemoryError.  The load on the server doesn't  
go up either (hovers around 1-2).  We're on Java 1.6.0_03-b05.   
4x3.8GHz Xeons, 8GB RAM.


Doug

On Jul 15, 2008, at 11:29 AM, Fuad Efendi wrote:

I constantly have the same problem; sometimes I have OutOfMemoryError in
the logs, sometimes not. Not predictable. I minimized all caches; it still
happens even with 8192M. CPU usage is 375%-400% (two dual-core Opterons),
Sun Java 5. Moved to BEA JRockit 5 yesterday; it looks 30 times faster
(25% CPU load with 4096M RAM). No problems yet, let's see...


Strange: Tomcat simply hangs instead of exit(...)

There are some posts related to OutOfMemoryError in solr-user list.


==
http://www.linkedin.com/in/liferay

Quoting Doug Steigerwald <[EMAIL PROTECTED]>:


Since we pushed Solr out to production a few weeks ago, we've seen a
few issues with Solr not responding to requests (searches or admin
pages).  There doesn't seem to be any reason for it from what we can
tell.  We haven't seen it in QA or development.

We're running Solr with basically the example Solr setup with Jetty
(6.1.3).  We package our Solr install by using 'ant example' and
replacing configs/etc.  Whenever Solr stops responding, there are no
messages in the logs, nothing.  Requests just time out.

We have also only seen this on our slaves.  The master doesn't seem  
to
be hitting this issue.  All the boxes are the same, version of java  
is

the same, etc.

We don't have a stack trace and no JMX set up.  Once we see this  
issue,

our support folks just stop and start Solr on that machine.

Has anyone else run into anything like this with Solr?

Thanks.
Doug







Re: Duplicate content

2008-07-15 Thread Fuad Efendi

Thanks Ryan,

Is the uniqueKey field really unique if we allow duplicates? I had a similar problem...


Quoting Ryan McKinley <[EMAIL PROTECTED]>:



On Jul 15, 2008, at 2:45 AM, Sunil wrote:


Hi All,

I want to change the duplicate content behavior in solr. What I want to
do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one.

Means, if I add duplicate content in solr and the content already
exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?



Check the "allowDups" option on the <add> command:
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef





Thanks,
Sunil








Re: Solr stops responding

2008-07-15 Thread Fuad Efendi
I constantly have the same problem; sometimes I have OutOfMemoryError in
the logs, sometimes not. Not predictable. I minimized all caches; it still
happens even with 8192M. CPU usage is 375%-400% (two dual-core Opterons),
Sun Java 5. Moved to BEA JRockit 5 yesterday; it looks 30 times faster
(25% CPU load with 4096M RAM). No problems yet, let's see...


Strange: Tomcat simply hangs instead of exit(...)

There are some posts related to OutOfMemoryError in solr-user list.


==
http://www.linkedin.com/in/liferay

Quoting Doug Steigerwald <[EMAIL PROTECTED]>:


Since we pushed Solr out to production a few weeks ago, we've seen a
few issues with Solr not responding to requests (searches or admin
pages).  There doesn't seem to be any reason for it from what we can
tell.  We haven't seen it in QA or development.

We're running Solr with basically the example Solr setup with Jetty
(6.1.3).  We package our Solr install by using 'ant example' and
replacing configs/etc.  Whenever Solr stops responding, there are no
messages in the logs, nothing.  Requests just time out.

We have also only seen this on our slaves.  The master doesn't seem to
be hitting this issue.  All the boxes are the same, version of java is
the same, etc.

We don't have a stack trace and no JMX set up.  Once we see this issue,
our support folks just stop and start Solr on that machine.

Has anyone else run into anything like this with Solr?

Thanks.
Doug






RE: Wiki for 1.3

2008-07-15 Thread sundar shankar
Thanks. Do we expect the same some time soon? I agree that the user community 
has shed light on this with a lot of examples; I just want to know if there is 
more that could be done. I am looking at the javadocs as well, and that helps 
to some extent, but I have found the wiki very, very useful in the past.
 
 



> Date: Tue, 15 Jul 2008 11:26:16 +1000
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Wiki for 1.3
>
> On Mon, 14 Jul 2008 23:25:25 +
> sundar shankar <[EMAIL PROTECTED]> wrote:
>
> > Thanks for your patient response. I dont wanna know the classes changed,
> > but I wanna get a hand on the wiki page for the same. I tried to search
> > for these classes in the solr wiki. I was getting a page does not exist.
> > This is the result of the search I did on solr wiki site.
>
> Hi Sundar,
> indeed, some pages havent been written yet.
>
> If you check the mail archives, there are a few exchanges with working
> configurations on *NGram*.
>
> b
> _
> {Beto|Norberto|Numard} Meijome
>
> "At times, to be silent is to lie."
>    Miguel de Unamuno
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet.
> Reading disclaimers makes you go blind. Writing them is worse. You have been
> Warned.
_
Wish to Marry Now? Join Shaadi.com FREE! 
http://www.shaadi.com/registration/user/index.php?ptnr=mhottag

Re: Solr stops responding

2008-07-15 Thread Jarek Zgoda
Doug Steigerwald pisze:

> We're running Solr with basically the example Solr setup with Jetty
> (6.1.3).  We package our Solr install by using 'ant example' and
> replacing configs/etc.  Whenever Solr stops responding, there are no
> messages in the logs, nothing.  Requests just time out.
> 
> We have also only seen this on our slaves.  The master doesn't seem to
> be hitting this issue.  All the boxes are the same, version of java is
> the same, etc.
> 
> We don't have a stack trace and no JMX set up.  Once we see this issue,
> our support folks just stop and start Solr on that machine.
> 
> Has anyone else run into anything like this with Solr?

Yes, I saw such behaviour on many Ubuntu 6.06 servers running in virtual
environments (like VMWare). Either Jetty was unable to bind to specified
port (for unknown reason) or the whole process was lost somewhere in
space (killable only by kill -9, not responding to signals, etc.).
Though, I can only confirm, no advice here, as this was a mystery to me too.

-- 
We read Knuth so you don't have to. -- Tim Peters

Jarek Zgoda
re:define


Re: solr:sorting on what type is faster

2008-07-15 Thread Shalin Shekhar Mangar
If a sort is not specified then documents are returned in decreasing order
of their score. You can get more details on the scoring at
http://lucene.apache.org/java/docs/scoring.html

On Tue, Jul 15, 2008 at 6:03 PM, sumantht <[EMAIL PROTECTED]> wrote:

>
> hi,
> in databases, sorting based on text fields is faster and preferable, if i
> am
> not wrong.
> similarly, which type of fields are to be chosen to sort in 'solr'? how the
> ties are broken?
> sorry for mistakes, if any ..
>
> thank you
> --
> View this message in context:
> http://www.nabble.com/solr%3Asorting-on-what-type-is-faster-tp18464118p18464118.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Solr stops responding

2008-07-15 Thread Doug Steigerwald
Since we pushed Solr out to production a few weeks ago, we've seen a  
few issues with Solr not responding to requests (searches or admin  
pages).  There doesn't seem to be any reason for it from what we can  
tell.  We haven't seen it in QA or development.


We're running Solr with basically the example Solr setup with Jetty  
(6.1.3).  We package our Solr install by using 'ant example' and  
replacing configs/etc.  Whenever Solr stops responding, there are no  
messages in the logs, nothing.  Requests just time out.


We have also only seen this on our slaves.  The master doesn't seem to  
be hitting this issue.  All the boxes are the same, version of java is  
the same, etc.


We don't have a stack trace and no JMX set up.  Once we see this  
issue, our support folks just stop and start Solr on that machine.


Has anyone else run into anything like this with Solr?

Thanks.
Doug


Re: Filter by Type increases search results.

2008-07-15 Thread Norberto Meijome
On Tue, 15 Jul 2008 18:07:43 +0530
"Preetam Rao" <[EMAIL PROTECTED]> wrote:

> When I say filter, I meant q=fish&fq=type:idea

btw, this *seems* to only work for me with the standard search handler. dismax 
and fq don't seem to get along nicely... but maybe it is just late and I'm not 
testing it properly..

_
{Beto|Norberto|Numard} Meijome

"Mix a little foolishness with your serious plans;
it's lovely to be silly at the right moment."
   Horace

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Duplicate content

2008-07-15 Thread Ryan McKinley


On Jul 15, 2008, at 2:45 AM, Sunil wrote:


Hi All,

I want to change the duplicate content behavior in solr. What I want  
to

do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one.

Means, if I add duplicate content in solr and the content already
exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?



Check the "allowDups" option on the <add> command:
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef





Thanks,
Sunil






WordDelimiterFilter splits at non-ASCII chars

2008-07-15 Thread Stefan Oestreicher
Hi,

as I understand it, the WordDelimiterFilter should split on case changes, word
delimiters, and changes from character to digit, but it should not
differentiate between ASCII and multibyte chars. It does, however. The word
"hälse" (German plural of "neck") gets split into "h", "ä" and "lse", which
unfortunately renders this filter quite unusable for me. Am I missing
something, or is this a bug?
I'm using solr 1.3 built from trunk.

TIA,
 
Stefan Oestreicher



Re: which type of fields are to be compressed

2008-07-15 Thread Erick Erickson
Compression is only relevant for the original text, not the indexed
part. So in terms of searching, it's irrelevant.

Where it is relevant is when you *fetch* the document (e.g.
doc = hits.doc(32)); that's when the de-compression work is done
(for stored fields). Depending upon your app, this may or
may not matter.

Here's a writeup I did that will shed some light on this, even
though it talks about FieldSelector (which, if you really need
to compress data you probably care about too).

http://wiki.apache.org/lucene-java/FieldSelectorPerformance

Best
Erick
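Erick's point (compression costs nothing at search time, but every fetch pays a decompression cost) can be sketched with plain Python and zlib. This is a toy illustration of the trade-off, not the actual Lucene stored-field code:

```python
import zlib

# A long stored field, e.g. the original document body. Only the *stored*
# value is compressed; the inverted index is built from analyzed tokens
# and is untouched by compression.
body = ("Solr stores the original field value; the inverted index is built "
        "separately from analyzed tokens. " * 50)

stored = zlib.compress(body.encode("utf-8"))  # paid once, at index time
print(len(body.encode("utf-8")), "->", len(stored))  # stored copy is smaller

# Searching never reads the stored value, so compression is irrelevant there.
# Fetching the document pays the decompression cost on every retrieval:
fetched = zlib.decompress(stored).decode("utf-8")
assert fetched == body
```

So compressing every field only helps if the stored values are large and fetched rarely; for small fields the per-fetch CPU cost can outweigh the disk savings.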

On Tue, Jul 15, 2008 at 8:29 AM, sumantht <[EMAIL PROTECTED]> wrote:

>
> hi
> is it preferable to compress each and every field, if not why.?
> how exactly it helps?
> --
> View this message in context:
> http://www.nabble.com/which-type-of-fields-are-to-be-compressed-tp18464056p18464056.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


RE: Duplicate content

2008-07-15 Thread Sunil
Thanks guys.


-Original Message-
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 15, 2008 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Duplicate content

On Tue, 15 Jul 2008 10:48:14 +0200
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> >> 2) I don't want to overwrite old content with new one. 
> >>
> >> Means, if I add duplicate content in solr and the content already
> >> exists, the old content should not be overwritten.  
> > 
> > before inserting a new document, query the index - if you get a
result back,
> > then don't insert. I don't know of any other way.  
> 
> This operation is not atomic, so you get a race condition here. Other
> than that, it seems fine. ;)

of course - but i am not sure you can control atomicity at the SOLR
level
(yet? ;) ) for /update handler - so it'd have to either be a custom
handler, or
your app being the only one accessing and controlling write access to it
that
way. It definitely gets more interesting if you start adding shards ;)

_
{Beto|Norberto|Numard} Meijome

"All parts should go together without forcing. You must remember that
the parts
you are reassembling were disassembled by you. Therefore, if you can't
get them
together again, there must be a reason. By all means, do not use
hammer." IBM
maintenance manual, 1975

I speak for myself, not my employer. Contents may be hot. Slippery when
wet.
Reading disclaimers makes you go blind. Writing them is worse. You have
been
Warned.




Re: Filter by Type increases search results.

2008-07-15 Thread matt connolly

Of course - it's so obvious now. Thanks!
-- 
View this message in context: 
http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18464457.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter by Type increases search results.

2008-07-15 Thread Preetam Rao
Hi Matt,

When I say filter, I meant q=fish&fq=type:idea

What you are trying is a boolean OR of defaultSearchField:fish OR
type:idea.

It's not a filter, it's an OR. Obviously you will get a union of results...

--
Preetam

On Tue, Jul 15, 2008 at 5:37 PM, matt connolly <[EMAIL PROTECTED]> wrote:

>
> Yes, the same, except for the filter.
>
> For example:
>
> http://localhost:8983/solr/select?q=fish
> returns:
> etc (followed by 2
> docs)
>
> http://localhost:8983/solr/select?q=fish+type:idea
> returns:
> . (followed by 9
> docs)
>
>
> -Matt
>
>
> Preetam Rao wrote:
> >
> > Hi Matt,
> >
> > Other than applying one more fq, is everything else remains same between
> > the
> > two queries, like q and all other parameters ?
> >
> > My understanding is that, fq is an intersection on the set of results
> > returned from q. So it should always be a subset of results returned from
> > q.
> > So if one uses just q, and other uses q and fq, for the same q, the
> second
> > will have equal or less number of documents.
> >
> > 
> > Preetam
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18463448.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


solr:sorting on what type is faster

2008-07-15 Thread sumantht

hi,
in databases, sorting based on text fields is faster and preferable, if I am
not wrong.
Similarly, which types of fields should be chosen for sorting in Solr? How are
ties broken?
Sorry for mistakes, if any.

thank you
-- 
View this message in context: 
http://www.nabble.com/solr%3Asorting-on-what-type-is-faster-tp18464118p18464118.html
Sent from the Solr - User mailing list archive at Nabble.com.



which type of fields are to be compressed

2008-07-15 Thread sumantht

hi
is it preferable to compress each and every field? If not, why?
How exactly does it help?
-- 
View this message in context: 
http://www.nabble.com/which-type-of-fields-are-to-be-compressed-tp18464056p18464056.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter by Type increases search results.

2008-07-15 Thread matt connolly

Yes, the same, except for the filter.

For example: 

http://localhost:8983/solr/select?q=fish
returns:
etc (followed by 2
docs)

http://localhost:8983/solr/select?q=fish+type:idea
returns:
. (followed by 9
docs)


-Matt


Preetam Rao wrote:
> 
> Hi Matt,
> 
> Other than applying one more fq, is everything else remains same between
> the
> two queries, like q and all other parameters ?
> 
> My understanding is that, fq is an intersection on the set of results
> returned from q. So it should always be a subset of results returned from
> q.
> So if one uses just q, and other uses q and fq, for the same q, the second
> will have equal or less number of documents.
> 
> 
> Preetam
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18463448.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filter by Type increases search results.

2008-07-15 Thread Preetam Rao
Hi Matt,

Other than applying one more fq, does everything else remain the same between
the two queries, like q and all other parameters?

My understanding is that fq is an intersection with the set of results
returned from q, so it should always be a subset of the results returned from
q. So if one query uses just q, and the other uses q and fq, for the same q,
the second will have an equal or smaller number of documents.


Preetam
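Preetam's distinction can be sketched with plain sets; this is a toy model of the semantics, not Solr internals:

```python
# Toy corpus: doc id -> (set of text terms, type field)
docs = {
    1: ({"fish", "story"}, "story"),
    2: ({"fish", "tank"}, "idea"),
    3: ({"garden"}, "idea"),
    4: ({"bike"}, "idea"),
}

q_fish = {d for d, (terms, _) in docs.items() if "fish" in terms}
type_idea = {d for d, (_, t) in docs.items() if t == "idea"}

# q=fish&fq=type:idea -> intersection: always a subset of q=fish
print(sorted(q_fish & type_idea))  # [2]

# q=fish type:idea (default OR in one q) -> union: can be *larger* than q=fish,
# which is exactly the surprise matt saw
print(sorted(q_fish | type_idea))  # [1, 2, 3, 4]
```

This is why matt's query returned more documents: `q=fish+type:idea` is an OR inside q, not a filter.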

On Tue, Jul 15, 2008 at 4:10 PM, matt connolly <[EMAIL PROTECTED]> wrote:

>
> I'm using Solr with a Drupal site, and one of the fields in the schema is
> "type".
>
> In my example development site, searching for the word "fish" returns 2
> documents, one type='story', and the other type='idea'.
>
> If I filter by type:idea then I get 9 results, the correct first result,
> followed by 8 results that are of type='idea' but do not use the word
> "fish"
> at all. I have completely disabled synonyms (and rebuilt indexes) and this
> makes no difference.
>
> Any ideas why filtering the type results in more search documents matched?
> --
> View this message in context:
> http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18462188.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Dismax request handler and sub phrase matches... suggestion for another handler..

2008-07-15 Thread Preetam Rao
I agree. If we do decide to implement another kind of request handler, it
should be through the StandardRequestHandler's defType attribute, which
selects the registered QParser that generates the appropriate queries for
Lucene.


Preetam

On Tue, Jul 15, 2008 at 3:59 PM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:

>
> On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote:
>
>> What are your thoughts on having one more request handler like dismax, but
>> which uses a sub-phrase query instead of dismax query ?
>>
>
> It'd be better to just implement a QParser(Plugin) such that the
> StandardRequestHandler can use it (&defType=dismax, for example).
>
> No need to have additional actual request handlers just to swap out query
> parsing logic anymore.
>
>Erik
>
>


Filter by Type increases search results.

2008-07-15 Thread matt connolly

I'm using Solr with a Drupal site, and one of the fields in the schema is
"type".

In my example development site, searching for the word "fish" returns 2
documents, one type='story', and the other type='idea'.

If I filter by type:idea then I get 9 results, the correct first result,
followed by 8 results that are of type='idea' but do not use the word "fish"
at all. I have completely disabled synonyms (and rebuilt indexes) and this
makes no difference.

Any ideas why filtering by type results in more matched documents?
-- 
View this message in context: 
http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18462188.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Dismax request handler and sub phrase matches... suggestion for another handler..

2008-07-15 Thread Erik Hatcher


On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote:
What are your thoughts on having one more request handler like  
dismax, but

which uses a sub-phrase query instead of dismax query ?


It'd be better to just implement a QParser(Plugin) such that the  
StandardRequestHandler can use it (&defType=dismax, for example).


No need to have additional actual request handlers just to swap out  
query parsing logic anymore.


Erik



RE: Solr searching issue..

2008-07-15 Thread dudes dudes

thanks ! I think I fixed the issue and it's doing good :)


> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: RE: Solr searching issue..
> Date: Mon, 14 Jul 2008 20:12:00 +
> 
> Copy field dest="text". I am not sure if you can copy into "text" or something 
> like that. We copy it into a field of type text or string, etc. Plus, what is 
> your query string? What gives you no results? How do you index it?
> Need more clues to figure out an answer, dude :)
> 
> 
> 
>> From: [EMAIL PROTECTED]
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr searching issue..
>> Date: Mon, 14 Jul 2008 09:34:47 +0100
>>
>> again, whatever I have pasted didn't work! I have attached the schema.xml
>> file instead... sorry for spamming you all
>>
>> thanks
>> ak
>>
>>> From: [EMAIL PROTECTED]
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr searching issue..
>>> Date: Mon, 14 Jul 2008 09:28:16 +0100
>>>
>>> for some strange reason my copy and paste didn't work!!! sorry to trouble
>>> you all.. hope you can see them now..
>>>
>>>> From: [EMAIL PROTECTED]
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr searching issue..
>>>> Date: Mon, 14 Jul 2008 09:17:32 +0100
>>>>
>>>> Hi again,
>>>>
>>>> I have done the following, but I get zero results... please let me know
>>>> what I have done wrong... thanks
>>>>
>>>> version type: nightly build solr-2008-07-07
>>>> [the n-gram field type definition was stripped from the archive]
>>>>
>>>> So, if I search for john, john will be found without any problems... if I
>>>> search for "joh" I'm not getting any results back
>>>>
>>>> thanks
>>>> ak
>>>>
>>>>> Date: Fri, 11 Jul 2008 20:14:11 +0530
>>>>> From: [EMAIL PROTECTED]
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Solr searching issue..
>>>>>
>>>>> You can use EdgeNGramTokenizer, available with Solr 1.3, to achieve
>>>>> this. But I'd think again about introducing this kind of search, as
>>>>> n-grams can bloat your index size.
>>>>>
>>>>> On Fri, Jul 11, 2008 at 3:58 PM, dudes dudes wrote:
>>>>>
>>>>>> Hi solr-users,
>>>>>>
>>>>>> version type: nightly build solr-2008-07-07
>>>>>>
>>>>>> If I search for the name John, it finds it without any issues. On the
>>>>>> other hand, if I search for Joh*, it also finds all the possible
>>>>>> matches. However, if I search for "Joh" it doesn't find any possible
>>>>>> match; in other words, it doesn't find the name john if you don't
>>>>>> specify the exact name..
>>>>>>
>>>>>> Does anybody know what I'm missing here?
>>>>>>
>>>>>> thanks
>>>>>> ak
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.

_
Invite your Facebook friends to chat on Messenger
http://clk.atdmt.com/UKM/go/101719649/direct/01/
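The EdgeNGramTokenizer advice in the thread above can be sketched in plain Python. This is a toy model of index-time edge n-grams, not the actual Lucene tokenizer: indexing every leading prefix of a term is what lets the bare query "joh" match "john" without a wildcard.

```python
def edge_ngrams(term, min_len=1, max_len=None):
    """All leading prefixes of `term`, as an edge n-gram tokenizer would emit."""
    max_len = max_len or len(term)
    return [term[:n] for n in range(min_len, min(max_len, len(term)) + 1)]

# Index time: the field value "john" is expanded into its prefixes.
indexed = set(edge_ngrams("john"))
print(sorted(indexed))  # ['j', 'jo', 'joh', 'john']

# Query time: a bare "joh" (no wildcard) now matches directly...
assert "joh" in indexed
# ...at the cost of a bigger index (4 terms instead of 1), which is the
# "index bloat" warned about in the thread.
```

Without this expansion the index contains only the full term "john", so an exact query for "joh" finds nothing, which matches the behavior ak reported.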

Re: solr synonyms behaviour

2008-07-15 Thread matt connolly


swarag wrote:
> 
> Knowing the Lucene struggles with multi-word query-time synonyms, my
> question is, does this also affect index-time synonyms? What other
> alternatives do we have if we require there to be multiple word synonyms?
> 

No, the multiple-word problem doesn't happen with index-time synonyms, only
query-time synonyms.

See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

I ended up using index time synonyms, but ideally, I'd like to see a filter
factory that does something like the SynsExpand tool does (which was written
for lucene, not solr).
-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18461507.html
Sent from the Solr - User mailing list archive at Nabble.com.
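The index-time expansion matt describes can be illustrated with a toy token-stream rewrite. This is plain Python, not the actual SynonymFilter (which also preserves token positions so phrase queries keep working; this sketch ignores positions):

```python
# Toy index-time synonym expansion: multi-word source phrases are matched
# greedily and every variant is indexed, so any variant matches at query time.
SYNONYMS = {
    ("sea", "biscuit"): [["seabiscuit"], ["sea", "biscuit"]],
}

def expand(tokens):
    out, i = [], 0
    while i < len(tokens):
        for src, variants in SYNONYMS.items():
            if tuple(tokens[i:i + len(src)]) == src:
                for v in variants:
                    out.extend(v)  # index all variants of the matched phrase
                i += len(src)
                break
        else:
            out.append(tokens[i])  # no synonym: keep the token as-is
            i += 1
    return out

print(expand("the sea biscuit won".split()))
# ['the', 'seabiscuit', 'sea', 'biscuit', 'won']
```

Because the expansion happens once at index time, the query analyzer never has to tokenize a multi-word synonym, which sidesteps the query-time problem described in the wiki.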



Re: solr synonyms behaviour

2008-07-15 Thread Guillaume Smet
Chris,

On Sat, Jan 26, 2008 at 2:30 AM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
> : I have the synonym filter only at query time coz i can't re-index data (or
> : portion of data) everytime i add a synonym and a couple of other reasons.
>
> Use cases like yours will *never* work as a query time synonym ... hence
> all of the information about multi-word synonyms and the caveats about
> using them in the wiki...
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter

Considering these problems, it might be better to move the
SynonymFilter from type="query" to type="index" in the example file.
This file is very often used as a reference.

Or perhaps we should just mention potential problems and a link to the
documentation in the existing comment: "in this example, we will only
use synonyms at query time".

Thoughts?

-- 
Guillaume


Re: Duplicate content

2008-07-15 Thread Norberto Meijome
On Tue, 15 Jul 2008 10:48:14 +0200
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> >> 2) I don't want to overwrite old content with new one. 
> >>
> >> Means, if I add duplicate content in solr and the content already
> >> exists, the old content should not be overwritten.  
> > 
> > before inserting a new document, query the index - if you get a result back,
> > then don't insert. I don't know of any other way.  
> 
> This operation is not atomic, so you get a race condition here. Other
> than that, it seems fine. ;)

of course - but i am not sure you can control atomicity at the SOLR level
(yet? ;) ) for /update handler - so it'd have to either be a custom handler, or
your app being the only one accessing and controlling write access to it that
way. It definitely gets more interesting if you start adding shards ;)

_
{Beto|Norberto|Numard} Meijome

"All parts should go together without forcing. You must remember that the parts
you are reassembling were disassembled by you. Therefore, if you can't get them
together again, there must be a reason. By all means, do not use hammer." IBM
maintenance manual, 1975

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Duplicate content

2008-07-15 Thread Jarek Zgoda
Norberto Meijome pisze:

>> 2) I don't want to overwrite old content with new one. 
>>
>> Means, if I add duplicate content in solr and the content already
>> exists, the old content should not be overwritten.
> 
> before inserting a new document, query the index - if you get a result back,
> then don't insert. I don't know of any other way.

This operation is not atomic, so you get a race condition here. Other
than that, it seems fine. ;)

-- 
We read Knuth so you don't have to. -- Tim Peters

Jarek Zgoda
re:define


Dismax request handler and sub phrase matches... suggestion for another handler..

2008-07-15 Thread Preetam Rao
Hi,

Apologies if you are receiving this a second time... having a tough time with
the mail server..

I take the user-entered query as-is and run it with the dismax query handler.
The document fields have been filled from structured data, where different
fields hold different attributes like number of beds, number of baths, city
name, etc. A sample user query would look like "3 bed homes in new york". I
would like this to match against city:new york and beds:3. When I use the
dismax handler with boosts and the tie parameter, I do not always get the most
relevant top 10 results, because there seem to be many factors in play, one of
which is not being able to recognize the presence of sub-phrases, and another
is not being able to ignore unwanted matches in unwanted fields.

What are your thoughts on having one more request handler like dismax, but
which uses a sub-phrase query instead of a dismax query?
It would also provide the parameters below, on a per-field basis, to help
customize the behavior of the request handler and give more flexibility in
different scenarios.

phraseBoost - how much better a 3-word sub-phrase match is than a 2-word
sub-phrase match.
useOnlyMaxMatch - if many sub-phrases match in the field, only the best
score is used.
ignoreDuplicates - if a field has duplicate matches, pick only one match for
scoring.
matchOnlyOneField - if a match is found in the first field, remove the matched
terms while querying the other fields. For example, for me a city match is
more important than matches in other fields, so I do not want the "new" in
"new york" to match all the other fields and skew the results, which is what I
am seeing with dismax, irrespective of the high boosts.
ignoreSomeLuceneScoreFactors - ignore the Lucene tf, idf, query norm or any
such criteria not needed for this field, since if I want exact matches only,
they are really not important. They also seem to play a big role in my not
being able to get the most relevant top 10 results.

I see this handler might be useful in the below use cases:
a) the data is mostly exact, in that I am not trying to search free text
like mails, reviews, articles, web pages, etc.
b) numbers and their bindings are important
c) exact phrase or sub-phrase matches are more important than rankings
derived from tf, idf, query norm, etc.
d) I need to make sure that in some cases some fields affect the scoring and
in some they don't. I found this the most difficult task: separating the
noise matches from the required ones for my use case.

Your thoughts and suggestions on alternatives are welcome.

I have also posted a question on sub-phrase matching on lucene-user, which is
separate from this suggestion of a Solr handler with additional features like
sub-phrase matching for user-entered queries.

Thanks
Preetam
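The sub-phrase scoring Preetam describes could be prototyped roughly like this. It is a toy illustration in Python, not a proposed Solr patch; the scoring constant (phraseBoost) and the function names are made up. The idea: enumerate the query's contiguous sub-phrases, longest first, and score a field by the longest one it contains, ignoring tf/idf entirely.

```python
def subphrases(query):
    """All contiguous word sub-phrases of the query, longest first."""
    words = query.split()
    out = []
    for n in range(len(words), 0, -1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens appears as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

def best_match(field_value, query, phrase_boost=2.0):
    """Score = phrase_boost ** (length of longest sub-phrase found), else 0."""
    tokens = field_value.split()
    for p in subphrases(query):  # longest first -> useOnlyMaxMatch behavior
        pt = p.split()
        if contains_phrase(tokens, pt):
            return phrase_boost ** len(pt)
    return 0.0

q = "3 bed homes in new york"
print(best_match("new york", q))  # 2-word sub-phrase match -> 4.0
print(best_match("3 beds", q))    # only "3" matches (1 word) -> 2.0
```

A real implementation would live in a QParser, as suggested elsewhere in the thread, but the relative scoring above shows how a longer sub-phrase can dominate scattered single-term matches.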


Re: Duplicate content

2008-07-15 Thread Norberto Meijome
On Tue, 15 Jul 2008 13:15:41 +0530
"Sunil" <[EMAIL PROTECTED]> wrote:

> 1) I don't want duplicate content.

SOLR uses the field you define as the unique field to determine whether a
document should be replaced or added. The rest of the fields are in your hands.
You could devise a setup whereby the document id is generated by hashing all
the other fields in your schema, thereby ensuring that a unique document id
means unique content (of course, for a meaning of 'uniqueness' that is
"different bytes" ;) )
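The hashing setup described above could look like this. It is a sketch under my own assumptions (the field names and the choice of SHA-1 are illustrative, not anything Solr prescribes): derive the uniqueKey from the content itself, so re-adding identical content just replaces the document with identical bytes.

```python
import hashlib

def content_id(doc: dict) -> str:
    """Hash all field names and values (in a stable order) into a doc id."""
    h = hashlib.sha1()
    for name in sorted(doc):               # stable order, regardless of input order
        h.update(name.encode("utf-8"))
        h.update(b"\x00")                  # separator so "ab"+"c" != "a"+"bc"
        h.update(str(doc[name]).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()

a = {"title": "Duplicate content", "body": "same bytes"}
b = {"body": "same bytes", "title": "Duplicate content"}  # same content, other order
assert content_id(a) == content_id(b)
assert content_id(a) != content_id({"title": "Duplicate content", "body": "edited"})
```

Note this only guarantees byte-level uniqueness, as the caveat above says, and it does not by itself prevent an overwrite; it just makes the overwrite harmless because the old and new documents are identical.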

> 2) I don't want to overwrite old content with new one. 
> 
> Means, if I add duplicate content in solr and the content already
> exists, the old content should not be overwritten.

before inserting a new document, query the index - if you get a result back,
then don't insert. I don't know of any other way.

b
_
{Beto|Norberto|Numard} Meijome

"The real voyage of discovery consists not in seeking new landscapes, but in
having new eyes." Marcel Proust

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Duplicate content

2008-07-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
You must do a check before adding documents

On Tue, Jul 15, 2008 at 1:15 PM, Sunil <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I want to change the duplicate content behavior in solr. What I want to
> do is:
>
> 1) I don't want duplicate content.
> 2) I don't want to overwrite old content with new one.
>
> Means, if I add duplicate content in solr and the content already
> exists, the old content should not be overwritten.
>
> Can anyone suggest how to achieve it?
>
>
> Thanks,
> Sunil
>
>
>



-- 
--Noble Paul


Duplicate content

2008-07-15 Thread Sunil
Hi All,

I want to change the duplicate content behavior in solr. What I want to
do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one. 

Means, if I add duplicate content in solr and the content already
exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?


Thanks,
Sunil