Re: SnowballPorterFilterFactory stemming word question

2009-09-07 Thread darniz

Thanks, Hoss.
Could you please provide an example?

Does Solr provide an implementation of a dictionary-based stemmer? Please let
me know.

Thanks
Rashid


hossman wrote:
 
 
 : If I give "machine", why does it stem to "machin"? Where does this
 : word come from?
 : If I give "revolutionary" it stems to "revolutionari"; I thought it
 : should stem to "revolution".
 : 
 : How does stemming work?
 
 The Porter stemmer, like all of the stemmers provided with Solr, is a 
 programmatic stemmer ... they don't actually know the root of any word; they 
 use an approximate algorithm to compute a *token* from a word based on a 
 set of rules ... these tokens aren't necessarily real words (and most of 
 the time they aren't words), but the same token tends to be produced from 
 words with similar roots.
 
 If you want to see the actual root word, you'll have to use a 
 dictionary-based stemmer.
 
 
 -Hoss
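
To make the point about rule-based stemming concrete, here is a minimal sketch of a toy suffix stripper. It is NOT the Porter algorithm and not anything shipped with Solr; the function name and the two rules are invented purely for illustration. It only shows how a few mechanical rules, with no dictionary, can turn related words into the same non-word token.

```python
def toy_stem(word):
    """Toy rule-based stemmer (hypothetical; far simpler than Porter)."""
    word = word.lower()
    # Step 1: strip a plural "s", but leave "ss" alone (as in "class").
    if word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 2: apply the first matching suffix rule to words longer than 3 letters.
    for suffix, replacement in (("e", ""), ("y", "i")):
        if word.endswith(suffix) and len(word) > 3:
            return word[: -len(suffix)] + replacement
    return word


for w in ("machine", "machines", "revolutionary"):
    print(w, "->", toy_stem(w))
```

Both "machine" and "machines" end up as the same token, which is all the index needs for matching, even though that token is not a real word.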
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25325738.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-07 Thread Uri Boness


Great. Nice site and very similar to my requirements.

thanks.


So, right now, you get all field values by default?
Right now, no field values are returned for the collapsed documents. The 
patch which will be committed soon will add this functionality.


R. Tan wrote:

Great. Nice site and very similar to my requirements.

  

There's work on the patch that is being done now which will enable you to
ask for specific field values of the collapsed documents using a dedicated
request parameter.




So, right now, you get all field values by default?


On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness ubon...@gmail.com wrote:

  

You can check out http://www.ilocal.nl. If you search for a bank in
Amsterdam then you'll see that a lot of the results are collapsed. For this
we used an older version of this patch (which works on 1.3) but a lot has
changed since then. We're currently using this patch on another project, but
it's not live yet.


Uri

R. Tan wrote:



Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
your advice. We're still early in development and 1.4 would be a good
choice. I hope I can get field collapsing to work with my requirements. Do
you know any live site using field collapsing already?

On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness ubon...@gmail.com wrote:



  

Work is being done on the patch now that will enable you to ask for
specific field values of the collapsed documents using a dedicated request
parameter. This work is not yet committed to the latest patch, but will be
very soon. There is of course a drawback as well: the set of collapsed
documents can be very large (depending on your data, of course), in which
case the returned result, which includes the field values, can be rather
large, and that will impact performance. This is why the feature will be
enabled only if you specify this extra parameter; by default no field
values will be returned.

AFAIK, the latest patch should work fine with the latest build. Martijn
(who is the main maintainer of this patch) tries to keep it up to date
with the latest builds. But I guess the safest way is to work with the
nightly build of the same date as the latest patch (though I would give it
a try first with the latest build).

BTW, it's not an official suggestion from the Solr development team, but
if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I
would go for the latter. 1.4 is supposed to be released in the upcoming
week or two and it brings loads of bug fixes, enhancements and extra
functionality. But again, this is my personal suggestion.


cheers,
Uri

R. Tan wrote:





Okay. Thanks for giving an insight into how it works in general. Without
trying it myself, are the field values for the collapsed ones also part of
the results data?
What is the latest build that is safe to use in a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness ubon...@gmail.com wrote:





  

The collapsed documents are represented by one master document, which can
be part of the normal search result (the doc list), so pagination just
works as expected, meaning it takes only the returned documents into
account (ignoring the collapsed ones). As for the scoring, the master
document is actually the document with the highest score in the collapsed
group.

As for Solr 1.3 compatibility... well... it's very hard to tell. The
latest patches are certainly *not* 1.3 compatible (I think they also
depend on some changes in Lucene which are not available for Solr 1.3). I
guess you'll have to try some of the old patches, but I'm not sure about
their stability.

cheers,
Uri
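
The master-document and paging behaviour described in this message can be modelled in a few lines. This is only a sketch under stated assumptions: the function name, the `score` key, and the `start`/`rows` arguments are hypothetical stand-ins, and the real patch works inside Solr's search components rather than on Python dicts.

```python
def collapse_and_page(docs, field, start, rows):
    """Keep the highest-scoring doc per field value, then page over those."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    # The master of each group is its highest-scoring document.
    masters = [max(group, key=lambda d: d["score"]) for group in groups.values()]
    masters.sort(key=lambda d: d["score"], reverse=True)
    # Paging counts only the masters; collapsed documents are ignored.
    return masters[start:start + rows]


docs = [
    {"id": 1, "bank": "A", "score": 0.9},
    {"id": 2, "bank": "A", "score": 0.7},
    {"id": 3, "bank": "B", "score": 0.8},
    {"id": 4, "bank": "C", "score": 0.5},
]
print([d["id"] for d in collapse_and_page(docs, "bank", 0, 2)])
```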


R. Tan wrote:







Thanks Uri. How do paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote:







  

Development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing, adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each
other in the otherwise-non-collapsed result set. The latter (non-adjacent)
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There's currently discussion
about extending this support so that, in addition to collapsing the
documents, extra information will be returned for the collapsed documents
(see the discussion on the issue page).

Uri
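
The difference between the two flavors described above is easiest to see on a small ranked list. The following is an illustrative sketch only; the function names and sample data are made up, and it says nothing about how the patch is actually implemented.

```python
def collapse_adjacent(docs, field):
    """Group only runs of neighbouring docs that share the field value."""
    groups = []
    for doc in docs:
        if groups and groups[-1][0][field] == doc[field]:
            groups[-1].append(doc)
        else:
            groups.append([doc])
    return groups


def collapse_non_adjacent(docs, field):
    """Group all docs sharing the field value, wherever they rank."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    return list(groups.values())


ranked = [
    {"id": 1, "bank": "A"},
    {"id": 2, "bank": "A"},
    {"id": 3, "bank": "B"},
    {"id": 4, "bank": "A"},
]
print([[d["id"] for d in g] for g in collapse_adjacent(ranked, "bank")])
print([[d["id"] for d in g] for g in collapse_non_adjacent(ranked, "bank")])
```

With adjacent collapsing, document 4 stays in its own group because document 3 separates it from the first run; non-adjacent collapsing folds it into the first group.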


R. Tan wrote:









I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan 

Faceting optimization

2009-09-07 Thread Sébastien Lamy

Hi

I'm currently trying to optimize the response time of my Solr server.
I found one anomaly and hope you may be able to help me solve it:
if, across the whole document index, there are many possible 
values for a field, asking for facets on that field dramatically increases 
response time, even if the search returns only one document with only 
one facet value for that field. This is shown by the three requests at 
the bottom of this mail.


It seems to me that Solr looks at all the possible values in the whole 
index for the faceted field, whereas it should look at the possible 
values only for the documents in the results, which would be a lot 
faster. Is there a way to ask it to do so?



---
Let's look at these three requests:

1- This request returns only one document and takes 3ms
http://localhost:8983/solr/select/?
rows=10
q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T9]


2- This request returns one document and its facets for one field. It 
takes about 1000ms. The facet on a_10_alpha_sort returns only one value: 
air du temps. But across the whole index there are many values 
(10 000) for a_10_alpha_sort.

http://localhost:8983/solr/select/?
facet=true
rows=10
q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T9]
facet.field=a_10_alpha_sort
f.a_10_alpha_sort.facet.mincount=1
f.a_10_alpha_sort.facet.sort=true
f.a_10_alpha_sort.facet.limit=8

3- This request includes the value air du temps in the search string. 
It takes 3ms

http://localhost:8983/solr/select/?
rows=10
q=(available_owner_display_name_s_facet:%22mag%22+AND+a_10_alpha_sort:air+du+temps)+AND+type_s:[T0+TO+T9]
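
For anyone wanting to reproduce the timings, the second request above can be assembled programmatically. This is a minimal sketch that assumes the parameters listed line by line are ordinary query-string parameters joined with `&`; the helper names are hypothetical. Note that `urlencode` percent-encodes more characters than the hand-written URLs do, which decodes to the same query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8983/solr/select/"
QUERY = '(available_owner_display_name_s_facet:"mag") AND type_s:[T0 TO T9]'


def build_facet_url():
    """Build request 2: the base query plus facet parameters for one field."""
    params = [
        ("facet", "true"),
        ("rows", "10"),
        ("q", QUERY),
        ("facet.field", "a_10_alpha_sort"),
        ("f.a_10_alpha_sort.facet.mincount", "1"),
        ("f.a_10_alpha_sort.facet.sort", "true"),
        ("f.a_10_alpha_sort.facet.limit", "8"),
    ]
    return BASE_URL + "?" + urlencode(params)


def decode_params(url):
    """Decode the query string back into a dict of lists, for inspection."""
    return parse_qs(urlsplit(url).query)


print(build_facet_url())
```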


Here is the description of the faceted field in my schema: this is a 
single-valued field, with no tokens.


<dynamicField name="*_alpha_sort" type="alphaOnlySort" indexed="true" 
stored="false" multivalued="false"/>
<fieldType name="alphaOnlySort" class="solr.TextField" 
sortMissingLast="true" omitNorms="true">

 <analyzer>
   <!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token -->
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.ISOLatin1AccentFilterFactory" />
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.TrimFilterFactory" />
 </analyzer>
</fieldType>


Re: Exact Word Search

2009-09-07 Thread bhaskar chandrasekar
 
Hi Shalin,
 
My search is based on the following fields in schema.xml
 

<field name="url" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="true"/>
<field name="description" type="string" indexed="true" stored="true"/>
 
 
Let me know if you need anything else.
Regards
Bhaskar

--- On Fri, 9/4/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:


From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Exact Word Search
To: solr-user@lucene.apache.org
Date: Friday, September 4, 2009, 5:51 AM


On Fri, Sep 4, 2009 at 6:06 PM, bhaskar chandrasekar
bas_s...@yahoo.co.in wrote:


 Hi,


 I have integrated Solr with Carrot2 Cluster Engine (v 3.1.0).

 Carrot2 is used as a presentation layer. Carrot2 sends the requested query
 to an external source (Solr) and gets results from Solr.
 Carrot2 may not be responsible for forming the query. That would be
 handled on the Solr end.


Can you post the exact query that your application or Carrot2 is sending to
Solr? Can you also list the Solr field and type defined in schema.xml which
is being searched?



 Please help me with the below scenarios.

 Scenario: (please ignore case sensitivity)

 Assuming I give "bhaskar" as the input string,
 it should give me search results pertaining to the word 'bhaskar' only.

 I am expecting output like that of the database query below:
 Select * from MASTER where name = 'bhaskar';

 The above query is supposed to return matching records for 'bhaskar'.


Use a solr.TextField with KeywordTokenizer and LowerCaseFilter and search
with q=field-name:field-value

-- 
Regards,
Shalin Shekhar Mangar.
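
The effect of the suggested analysis chain can be modelled roughly as follows. This is a sketch of the matching semantics only, not Solr code: the function names are hypothetical, and it assumes the same chain (keep the whole value as one token, then lowercase it) runs at both index time and query time.

```python
def keyword_lowercase_analyze(text):
    """Whole input becomes a single lowercased token (no whitespace splitting)."""
    return [text.lower()]


def exact_match(query_value, field_value):
    """True only if the entire field value equals the query, ignoring case."""
    return keyword_lowercase_analyze(query_value) == keyword_lowercase_analyze(field_value)


print(exact_match("bhaskar", "Bhaskar"))
print(exact_match("bhaskar", "bhaskar chandrasekar"))
```

This mirrors the SQL `name = 'bhaskar'` comparison: a value that merely contains the word does not match.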



  

Re: Netbeans and Solr : Whac-A-Mole

2009-09-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
This testcase is quite independent of anything in Solr. It is a
standalone utility and the only dependency is stax.
Disclaimer: I run these test cases from IntelliJ and the command line.
BTW are you using XpathRecordReader outside of DIH?

On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:
 Hello all,

 I would appreciate help from somebody who has set up Solr within
 NetBeans. I want to do more work with DIH, particularly its
 XPathEntityProcessor. I wish to perform the following from
 within the IDE:

   ant -Dtestcase=TestXPathRecordReader.java test

 I have spent a few hours playing Whac-A-Mole with classpath and source
 settings. In the end I got it down to zero flags, but I then added
 some test cases, and the scanner went off and flagged dozens of
 files with undefined classes. I removed my change, but the rescan did not
 remove the dozens of flagged files.

 PS: I am a total netbeans newbie.

 --

 ===
 Fergus McMenemie               Email:fer...@twig.me.uk
 Techmore Ltd                   Phone:(UK) 07721 376021

 Unix/Mac/Intranets             Analyst Programmer
 ===




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Netbeans and Solr : Whac-A-Mole

2009-09-07 Thread rajan chandi
We use the command line for most things except editing and debugging!

2009/9/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 This testcase is quite independent of anything in Solr. It is a
 standalone utility and the only dependency is stax.
 Disclaimer: I run these test cases from IntelliJ and the command line.
 BTW are you using XpathRecordReader outside of DIH?

 On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:
  Hello all,
 
  I would appreciate help from somebody who has set up Solr within
  NetBeans. I want to do more work with DIH, particularly its
  XPathEntityProcessor. I wish to perform the following from
  within the IDE:
 
    ant -Dtestcase=TestXPathRecordReader.java test
 
  I have spent a few hours playing Whac-A-Mole with classpath and source
  settings. In the end I got it down to zero flags, but I then added
  some test cases, and the scanner went off and flagged dozens of
  files with undefined classes. I removed my change, but the rescan did not
  remove the dozens of flagged files.
 
  PS: I am a total netbeans newbie.
 
  --
 
  ===
  Fergus McMenemie               Email:fer...@twig.me.uk
  Techmore Ltd                   Phone:(UK) 07721 376021
 
  Unix/Mac/Intranets             Analyst Programmer
  ===
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Netbeans and Solr : Whac-A-Mole

2009-09-07 Thread Fergus McMenemie
This testcase is quite independent of anything in Solr. It is a
standalone utility and the only dependency is stax.
Disclaimer: I run these test cases from IntelliJ and the command line.
BTW are you using XpathRecordReader outside of DIH?

Noble,

Is there a better way to test and play with XPathRecordReader.java
other than

 ant -Dtestcase=TestXPathRecordReader test

which takes 8 secs to run here? I am not using XPathRecordReader
outside of DIH, but am looking to see how I would add support for
xpaths such as //a.

Fergus.


On Mon, Sep 7, 2009 at 3:26 PM, Fergus McMenemie fer...@twig.me.uk wrote:
 Hello all,

 I would appreciate help from somebody who has set up Solr within
 NetBeans. I want to do more work with DIH, particularly its
 XPathEntityProcessor. I wish to perform the following from
 within the IDE:

   ant -Dtestcase=TestXPathRecordReader.java test

 I have spent a few hours playing Whac-A-Mole with classpath and source
 settings. In the end I got it down to zero flags, but I then added
 some test cases, and the scanner went off and flagged dozens of
 files with undefined classes. I removed my change, but the rescan did not
 remove the dozens of flagged files.

 PS: I am a total netbeans newbie.

 --

 ===
 Fergus McMenemie               Email:fer...@twig.me.uk
 Techmore Ltd                   Phone:(UK) 07721 376021

 Unix/Mac/Intranets             Analyst Programmer
 ===




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Netbeans and Solr : Whac-A-Mole

2009-09-07 Thread Shalin Shekhar Mangar
On Mon, Sep 7, 2009 at 5:58 PM, Fergus McMenemie fer...@twig.me.uk wrote:

 This testcase is quite independent of anything in Solr. It is a
 standalone utility and the only dependency is stax.
 Disclaimer: I run these test cases from IntelliJ and the command line.
 BTW are you using XpathRecordReader outside of DIH?

 Noble,

 Is there a better way to test and play with XPathRecordReader.java
 other than

  ant -Dtestcase=TestXPathRecordReader test

 which takes 8 secs to run here? I am not using XPathRecordReader
 outside of DIH, but am looking to see how I would add support for
 xpaths such as //a.


The target takes a lot of time because it has to go through all the
test-cases in core and contribs trying to match the value given in
-Dtestcase.

You could also do ant -Dtestcase=TestXPathRecordReader test-contrib which
should be a little faster. I run individual test cases directly through IDEA
which avoids these extra steps.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Netbeans and Solr : Whac-A-Mole

2009-09-07 Thread Fergus McMenemie
On Mon, Sep 7, 2009 at 5:58 PM, Fergus McMenemie fer...@twig.me.uk wrote:

 This testcase is quite independent of anything in Solr. It is a
 standalone utility and the only dependency is stax.
 Disclaimer: I run these test cases from IntelliJ and the command line.
 BTW are you using XpathRecordReader outside of DIH?

 Noble,

 Is there a better way to test and play with XPathRecordReader.java
 other than

  ant -Dtestcase=TestXPathRecordReader test

 which takes 8 secs to run here? I am not using XPathRecordReader
 outside of DIH, but am looking to see how I would add support for
 xpaths such as //a.


The target takes a lot of time because it has to go through all the
test-cases in core and contribs trying to match the value given in
-Dtestcase.

You could also do ant -Dtestcase=TestXPathRecordReader test-contrib which
should be a little faster. I run individual test cases directly through IDEA
which avoids these extra steps.

Shalin,

Hmm, 6 seconds. I looked up IDEA and I guess I should be able
to use it for free while working on Solr. Is it easier to 
set up, and is the learning curve gentler?

Regards Fergus.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: capturing field length into a stored document field

2009-09-07 Thread mike.schultz

Here's a hybrid solution. Add a filter to the field in question that counts
all the tokens and at the end outputs a token of the form
__numtokens.numTokens__. This eliminates the need to retokenize the field.
Also, bucket the numbers, either by some factor of ten or in base 2,
so that there aren't so many different token types produced. This has a
space advantage over storing in a field, especially since the information
isn't needed at query time anyway.
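
The bucketing idea can be sketched outside of Solr like this. It is an illustration only: the function names are hypothetical, the marker format `__numtokens.<bucket>__` is one reading of the form given above, and a real implementation would be a token filter in the analysis chain rather than a function over a Python list.

```python
def bucket_base2(count):
    """Round a token count down to the nearest power of two (0 stays 0)."""
    if count < 1:
        return 0
    return 1 << (count.bit_length() - 1)


def append_length_token(tokens):
    """Pass tokens through unchanged, then add one bucketed length marker."""
    tokens = list(tokens)
    return tokens + ["__numtokens.%d__" % bucket_base2(len(tokens))]


print(append_length_token(["the", "quick", "brown", "fox", "jumps"]))
```

With base-2 buckets, a field of up to a million tokens produces at most about twenty distinct marker terms, which keeps the extra entries in the term dictionary small.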



mike.schultz wrote:
 
 For various statistics I collect from an index it's important for me to
 know the length (measured in tokens) of a document field.  I can get that
 information to some degree from the norms for the field but a) the
 resolution isn't that great, and b) more importantly, if boosts are used
 it's almost impossible to get lengths from this.
 
 Here are two ideas I was thinking about that maybe someone can comment on.
 
 1) Use copyField to copy the field in question, fieldA, to an additional
 field, fieldALength, which has an extra filter that just counts the tokens
 and only outputs a token representing the length of the field.  This has the
 disadvantage of retokenizing basically the whole document (because the
 field in question is basically the body).  Plus I would think littering
 the term space with these tokens might be bad for performance; I'm not
 sure.
 
 2) Add a filter to the field in question which again counts the tokens. 
 This filter allows the regular tokens to be indexed as usual but somehow
 manages to get the token-count into a stored field of the document.  This
 has the advantage of not having to retokenize the field and instead of
 littering the token space, the count becomes docdata for each doc.  Can
 this be done?  Maybe using threadLocal to temporarily store the count?
 
 Thanks.
 
 

-- 
View this message in context: 
http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297690p25339584.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can solr return documents which don't match a query?

2009-09-07 Thread Chantal Ackermann
Or add a new tag, NO TAG, at index time and search for that (if you 
are able to reindex, at least this one first time). 
It would make things clearer for developers/admins looking at this in a 
few months...


Chantal

Yonik Seeley wrote:

return all documents which either match query1 or don't match query 2


query1 (*:* -query2)

-Yonik
http://www.lucidimagination.com
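
The query shape in Yonik's reply can be generated and sanity-checked with a small sketch. Assumptions to note: the helper names and the `tag:...` clauses are hypothetical, the two top-level clauses are taken to be OR-ed (i.e. the default operator is OR), and the set function only models the intended result; it does not run anything against Solr.

```python
def q1_or_not_q2(query1, query2):
    """Build 'query1 (*:* -query2)', parenthesising each sub-query."""
    return "(%s) (*:* -(%s))" % (query1, query2)


def expected_ids(all_ids, ids_matching_q1, ids_matching_q2):
    """Set model: docs matching query1, plus every doc not matching query2."""
    return ids_matching_q1 | (all_ids - ids_matching_q2)


print(q1_or_not_q2("tag:foo", "tag:bar"))
print(sorted(expected_ids({1, 2, 3, 4}, {1}, {1, 2})))
```

The `*:*` is needed because a purely negative clause has nothing to subtract from; matching all documents first gives the exclusion a starting set.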