Multi tokenizer

2008-12-10 Thread Antonio Zippo
Hi all, I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes, but if I use HTMLStripStandardTokenizerFactory it strips only HTML but no apostrophes. If I use PatternTokenizerFactory I don't know if I can create a pattern to tokenize all of these characters...(html,

dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr
Hi, I would like to get the difference between q=text:+toto AND q=toto ? /select?fl=*&qt=dismax&q=text:+toto : 4 docs found. lst name=params str name=fl*/str str name=qtext: toto/str str name=qtdismax/str /select?fl=*&qt=dismax&q=toto : 5682 docs found. lst name=params str name=fl*/str str

Value based boosting - Design Help

2008-12-10 Thread ayyanar
We have a requirement for a keyword search in one of our projects and we are using Solr/Lucene for the same. We have the data, link_id, title, url and a collection of keywords associated to a link_id. Right now we have indexed link_id, title, url and keywords (multivalued field) in a single

Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi, I have a request handler in my solrconfig.xml : /spellCheckCompRH It utilizes the search component spellcheck. When I specify the following query in the browser, I get correct spelling suggestions from the file dictionary. http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=rel

Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread Erik Hatcher
dismax doesn't support field selection in its query syntax, only via the qf parameter. Add debugQuery=true to see how the queries are being parsed; that'll reveal what is going on. Erik On Dec 10, 2008, at 5:07 AM, sunnyfr wrote: Hi, I would like to get the difference
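
The advice in this reply can be sketched as a request that names the field through qf rather than inside q, with debugQuery=true switched on. This is only an illustration: the class and method names are made up, the field name "text" is taken from the original question, and the values are assumed to be URL-safe already (no encoding is done here).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative helper, not part of Solr: assembles a dismax request URL.
class DismaxRequest {

    // Builds a /select URL that passes the field via qf instead of "field:term" in q.
    // Assumes q and qf contain only URL-safe characters.
    static String build(String q, String qf) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "dismax");      // request handler, as in the thread
        params.put("q", q);              // plain terms only, no field prefix
        params.put("qf", qf);            // fields to search
        params.put("debugQuery", "true"); // ask Solr to report how the query was parsed

        StringJoiner url = new StringJoiner("&", "/select?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.add(e.getKey() + "=" + e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("toto", "text"));
    }
}
```

Comparing the parsed query in the debug output for this request with the one for q=text:+toto should show why the two result counts differ.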

Re: full-import and empty ./core/data/index

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Is there any way to start Solr having the index folder empty without having an error? What I would like to do is start with the empty folder, do a full import (which would create the index from 0) and from there keep

Re: Value based boosting - Design Help

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:54 PM, ayyanar [EMAIL PROTECTED]wrote: Also, in our requirement each keyword value has a weight associated to it and this weight is calculated based on certain factors like (if the keyword exist in title then it takes a specific weight etc…). This weight should

Re: Problems with SOLR-236 (field collapsing)

2008-12-10 Thread Doug Steigerwald
The first output is from the query component. You might just need to make the collapse component first and remove the query component completely. We perform geographic searching with localsolr first (if we need to), and then try to collapse those results (if collapse=true). If we don't

Re: How can i look for tom jerry

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:12 PM, sunnyfr [EMAIL PROTECTED] wrote: When I look for this expression it does stop the search at the &, taking that for a parameter I guess. You will need to URL encode the query parameter before you make the request. URLEncoder.encode("tom & jerry", "UTF-8"); If you
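
A minimal sketch of the encoding step suggested in this reply, assuming the search phrase contains an ampersand (which would explain the search stopping at a character taken for a parameter separator). The class and method names are illustrative; only URLEncoder.encode comes from the reply itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper: percent-encodes the user's query before it goes into the URL.
class EncodeQuery {

    // Returns a /select URL whose q value is safe to send as a single parameter.
    static String selectUrl(String userQuery) {
        try {
            return "/select?q=" + URLEncoder.encode(userQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The ampersand becomes %26 and spaces become +, so the phrase stays in q.
        System.out.println(selectUrl("tom & jerry"));
    }
}
```

Without the encoding, everything after the raw ampersand would be read as the start of another request parameter.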

Re: full-import and empty ./core/data/index

2008-12-10 Thread Marc Sturlese
Thanks, it did work. Shalin Shekhar Mangar wrote: On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Is there any way to start Solr having the index folder empty without having an error? What I would like to do is start with the empty folder, do a full import

Can we extract contents from two Core folders

2008-12-10 Thread payalsharma
Hi All, Issue: Need to fetch the data available in different core folders. Scenario: We are storing the information on different core folders specific to website ids (such as CoreUSA,CoreUK,CoreIndia ..). Thus information specific to any region get store in specific core folder. for e.g. for

Re: Can we extract contents from two Core folders

2008-12-10 Thread Shalin Shekhar Mangar
On Wed, Dec 10, 2008 at 5:19 PM, payalsharma [EMAIL PROTECTED] wrote: We are storing the information on different core folders specific to website ids (such as CoreUSA,CoreUK,CoreIndia ..). Thus information specific to any region get store in specific core folder. for e.g. for india specific

Re: dismax difference between q=text:+toto AND q=toto

2008-12-10 Thread sunnyfr
Thanks Erik, Have a good day, Erik Hatcher wrote: dismax doesn't support field selection in its query syntax, only via the qf parameter. Add debugQuery=true to see how the queries are being parsed; that'll reveal what is going on. Erik On Dec 10, 2008, at 5:07 AM,

Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread RaghavPrabhu
Hi all, I want to index the rich text documents like .doc, .xls, .ppt files. I had done the patch for updating the rich documents by followed the instructions in this below url. http://wiki.apache.org/solr/UpdateRichDocuments When i indexing the doc file, im getting this following error in the

Re: Can we extract contents from two Core folders

2008-12-10 Thread Mark Miller
payalsharma wrote: Hi All, Issue: Need to fetch the data available in different core folders. Scenario: We are storing the information on different core folders specific to website ids (such as CoreUSA,CoreUK,CoreIndia ..). Thus information specific to any region get store in specific core

Re: Can we extract contents from two Core folders

2008-12-10 Thread payalsharma
Hi, Will you please explain what exactly you mean by : Distributed search over the cores. Please provide some context around this. Thanks markrmiller wrote: payalsharma wrote: Hi All, Issue: Need to fetch the data available in different core folders. Scenario: We are storing the

Re: snappuller issue with multicore

2008-12-10 Thread Bill Au
I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core? Bill On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, We are seeing a strange behavior with snappuller We have 2 cores Hotel Location Here are

RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
Bill, Yes I do have scripts.conf for each core. However, all the options needed for snappuller is specified in the command line itself (-D -S etc...) -Raghu -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:17 AM To:

RE: Can we extract contents from two Core folders

2008-12-10 Thread Kashyap, Raghu
-Original Message- From: payalsharma [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:11 AM To: solr-user@lucene.apache.org Subject: Re: Can we extract contents from two Core folders Hi, Will you please explain what exactly you mean by : Distributed search over the cores.

Re: Look for three words, just two are weighted ?

2008-12-10 Thread Erik Hatcher
On Dec 10, 2008, at 9:58 AM, sunnyfr wrote: Second question, if I want to weight status_official:true^2 should I do it this way ??? for weighting the true one? thanks /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true Use bq

Re: Look for three words, just two are weighted ?

2008-12-10 Thread sunnyfr
Yes, but when I check the debug output, there is no weight for it ??? /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&bq=status_official:true^12&qf=owner_login^10+title^3&debugQuery=true and it's as if it doesn't weight my word cartoontv either ?? ok maybe the doc which contains these three words is not

Re: snappuller issue with multicore

2008-12-10 Thread Doug Steigerwald
Try using the -d option with the snappuller so you can specify the path to the directory holding index data on local machine. Doug On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote: Bill, Yes I do have scripts.conf for each core. However, all the options needed for snappuller is

Re: Setting Request Handler

2008-12-10 Thread Grant Ingersoll
Inline below... Also, though, you should note that the /spellCheckCompRH that is packaged with the example is not necessarily the best way to actually use the SpellCheckComponent. It is intended to be used as a component in whatever your MAIN Request Handler is, it merely shows the how

RE: snappuller issue with multicore

2008-12-10 Thread Kashyap, Raghu
Ok I think the problem is what Bill mentioned earlier. The rsync port was the same for both the cores and due to which it was copying the same snapshot for both the cores Thanks for all the help -Raghu -Original Message- From: Kashyap, Raghu [mailto:[EMAIL PROTECTED] Sent: Wednesday,

Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Otis Gospodnetic
Hi, There is a ClassNotFound exception in there. Make sure you rebuild the war, completely remove the old one, and properly deploy the new one. Peek into the war and look for the class that the error below is missing to make sure the class is really there. Get the latest code for

Re: Error, when i update the rich text documents such as .doc, .ppt files.

2008-12-10 Thread Grant Ingersoll
Hi Raghav, Recently, integration with Tika was completed for SOLR-284 and it is now committed on the trunk (but does not use the old RichDocumentHandler approach). See http://wiki.apache.org/solr/ExtractingRequestHandler for how to use and configure. Otherwise, it looks to me like the

Dates in Solr

2008-12-10 Thread Tricia Williams
Hi All, I'm curious about what people have done with dates. We Require: 1. multiple granularities to query and facet on: by year, by year/month, by year/month/day 2. sortability: sort/order by date 3. time typically isn't important to us 4. some of these items don't have a day or

Solr Newbie question

2008-12-10 Thread Rakesh Sinha
Hi - I am a new user of Solr tool and came across the introductory tutorial here - http://lucene.apache.org/solr/tutorial.html . I am planning to use Solr in one of my projects . I see that the tutorial mentions about a REST api / interface to add documents and to query the same. I would like

Re: Dates in Solr

2008-12-10 Thread Otis Gospodnetic
Tricia, I think you might have missed the key nugget at the bottom of http://wiki.apache.org/jakarta-lucene/DateRangeQueries Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tricia Williams [EMAIL PROTECTED] To: solr-user@lucene.apache.org

Re: Limitations of Distributed Search ....

2008-12-10 Thread Otis Gospodnetic
Hi, I have not worked with a 50 node Solr cluster, but I've worked with pure Lucene clusters of that size, very high query and data volumes. I don't imagine a dist search involving 50 nodes will be a problem for Solr. As for handling query slave failures, I'm sure you'll want to involve a LB

Re: solr performance

2008-12-10 Thread Ryan McKinley
For a similar idea, check: https://issues.apache.org/jira/browse/SOLR-906 This opens a single stream and writes all documents to that. It could easily be extended to have multiple threads draining the same Queue On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: I guess this

Re: Dates in Solr

2008-12-10 Thread Tricia Williams
Hi Otis, Absolutely, I missed that nugget. I didn't think of using prefix filters/queries. This works really well with how we had already stored dates in a MMDD string. Thanks for pointing me in the right direction. Tricia Otis Gospodnetic wrote: Tricia, I think you might have
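
A rough sketch of why prefix queries cover the year, year/month, and year/month/day granularities when dates are stored as fixed-width strings. It assumes a yyyyMMdd layout (the snippet above only says "MMDD string", so the exact layout is a guess) and a hypothetical field name date_s; the in-memory filter merely mimics what a prefix query would match.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows prefix matching on date strings, not Solr internals.
class DatePrefixes {

    // Query-syntax prefix clause, e.g. date_s:200812* for December 2008.
    static String prefixClause(String field, String datePrefix) {
        return field + ":" + datePrefix + "*";
    }

    // Returns the stored dates that a prefix query for datePrefix would match.
    static List<String> matching(List<String> storedDates, String datePrefix) {
        List<String> hits = new ArrayList<>();
        for (String date : storedDates) {
            if (date.startsWith(datePrefix)) {
                hits.add(date);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dates = List.of("20081210", "20081101", "20071210");
        System.out.println(prefixClause("date_s", "2008") + " -> " + matching(dates, "2008"));
        System.out.println(prefixClause("date_s", "200812") + " -> " + matching(dates, "200812"));
    }
}
```

With a fixed-width year-first layout, plain string order is also chronological order, which would address the sortability requirement from the original question.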

Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
I'm trying to see if anyone has any recommendations on the maximum number of cores that should be used within Solr. Is there significant overhead to each core? Should it be 10 or less, or is 100 or 1,000 cores acceptable. Thanks, Ryan

Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan McKinley
it depends! yes there is overhead to each core -- how much it matters will depend entirely on your setup and typical usage pattern. sorry this is not a particularly useful answer. I think the choice of how many cores will come down to your domain logic needs more than hardware. If you

Re: multiValued multiValued fields

2008-12-10 Thread Chris Hostetter
: I want to index a field with an array of arrays, is that possible in Solr? Not out of the box ... you can implement custom FieldTypes that store any data you want using a byte[] but you'd still need to do some tricks with your FieldType to get the ResponseWriter to write it out in a

Re: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Ryan Peterson
We are considering a migration to SOLR from a home grown Lucene solution. Currently we have 27,000 separate Lucene indexes that are separated based on business logic. Collectively the indexes are about 1.5 terabytes in size. We have some very small indexes and some that are quite large (up to

RE: Dealing with field values as key/value pairs

2008-12-10 Thread Chris Hostetter
: This is really cool. U... How does it integrate with the Data Import : Handler? my DIH knowledge is extremely limited, but i'm guessing approach #1 is trivial (there is an easy way to concat DB values to build up solr field values right?); approach #2 would probably be possible using

Sum of Fields and Record Count

2008-12-10 Thread John Martyniak
Hi, I am a new solr user. I have an application that I would like to show the results but one result may be the part of larger set of results. So for example result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to

SolrConfig.xml Replication

2008-12-10 Thread Jeff Newburn
I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on

Re: Sum of Fields and Record Count

2008-12-10 Thread Grant Ingersoll
Hi John, What is your process for determining that #1 is part of the other result set? My gut says this is a faceting problem, i.e. #1 has a field contain its category that is also shared by the 10 other results, and that all you need to do is facet on the category field. The other

Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak
Grant, Basically I have created a text field that has the grouping value. All of the records would have the same value in this text field. This is accomplished with some pre-processing. When I capture the data, but before it is submitted into the index. -John On Dec 10, 2008, at 8:46

Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak
Grant, For the more like this that would show the grouped results, once you have clicked on the item, so basically making another query, would it show a count of the more like this results? Something like cxxc and a collection 10 other items. -John On Dec 10, 2008, at 8:46 PM, Grant

Ordinal Field value and exact value for date.

2008-12-10 Thread amit rohatgi
Hi All, I am trying to use the ord() function query on created_date. I am concerned about the warning on ord behaviour, as it uses actual entry creation in indices instead of the created_date value. Will all entries created initially with different created_date have the same or nearly ordinal

RE: Multi Core - Max Core Count Recommendation

2008-12-10 Thread Lance Norskog
1) Our limit is: how big a file do we want to copy around? We switched to multiple indexes because of the logistics of replicating/backing up giant Lucene index files. 2) Searching takes a little memory, sorting takes a lot of memory, and faceting eats like a black hole. There is an

Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John, This sounds a lot like field collapsing functionality that a few people are working on in SOLR-236: https://issues.apache.org/jira/browse/SOLR-236 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: John Martyniak [EMAIL PROTECTED]

Re: SolrConfig.xml Replication

2008-12-10 Thread Otis Gospodnetic
Jeff, Are you using Solr 1.3 replication scripts? If so, I think it would be pretty simple to: 1) put all additional files to replicate to slaves to a specific location (or use a special naming scheme) on the master 2) write another script that uses scp or rsync to look for those additional

ExtractingRequestHandler and XmlUpdateHandler

2008-12-10 Thread Jacob Singh
Hey folks, I'm looking at implementing ExtractingRequestHandler in the Apache_Solr_PHP library, and I'm wondering what we can do about adding meta-data. I saw the docs, which suggest you use different post headers to pass field values along with ext.literal. Is there any way to use the

Re: Sum of Fields and Record Count

2008-12-10 Thread John Martyniak
Otis, Thanks for the information. It looks like the field collapsing is similar to what I am looking for. But is that in the current release? Is it stable? Is there any way to do it in Solr 1.3? -John On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote: Hi John, This sounds a lot like

RE: Setting Request Handler

2008-12-10 Thread Deshpande, Mukta
Hi Grant, Thanks for the help. So now I can have multiple components, configured as last-components of standard request handler. Best Regards, Mukta -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:25 PM To:

Re: Dealing with field values as key/value pairs

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Dec 11, 2008 at 4:41 AM, Chris Hostetter [EMAIL PROTECTED] wrote: : This is really cool. U... How does it integrate with the Data Import : Handler? my DIH knowledge is extremely limited, but i'm guessing approach #1 is trivial (there is an easy way to concat DB values to build up

Re: SolrConfig.xml Replication

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821 On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn [EMAIL PROTECTED] wrote: I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication.

Re: Solr Newbie question

2008-12-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Dec 10, 2008 at 11:00 PM, Rakesh Sinha [EMAIL PROTECTED] wrote: Hi - I am a new user of Solr tool and came across the introductory tutorial here - http://lucene.apache.org/solr/tutorial.html . I am planning to use Solr in one of my projects . I see that the tutorial mentions about

jboss and solr

2008-12-10 Thread Neha Bhardwaj
I am trying to configure jboss with solr. As stated in wiki docs I copied the solr.war but there is no web-apps folder currently present in jboss. So should I create web-apps manually and paste the war file there? I tried configuring solr with tomcat as well. I paste the war file in

Re: Sum of Fields and Record Count

2008-12-10 Thread Otis Gospodnetic
Hi John, It's not in the current release, but the chances are it will make it into 1.4. You can try one of the recent patches and apply it to your Solr 1.3 sources. Check list archives for more discussion, this field collapsing was just discussed again today/yesterday. markmail.org is a

Re: jboss and solr

2008-12-10 Thread Akshay
On Thu, Dec 11, 2008 at 11:21 AM, Neha Bhardwaj [EMAIL PROTECTED] wrote: I am trying to configure jboss with solr. As stated in wiki docs I copied the solr.war but there is no web-apps folder currently present in jboss. So should I create web-apps manually and paste the war file there?

minimum match issue with dismax

2008-12-10 Thread vinay kumar kaku
Hi, does anyone know how to make sure minimum match in dismax is working? I change the values and try doing solrCtl restart indexname but I don't see it taking effect. Does anybody have an idea on this? thank you vinay _ You

Newbie Question boosting

2008-12-10 Thread ayyanar
I have read many articles on boosting, but I am still not clear on it. Can anyone explain the following questions with examples? 1) Can you give an example of field level boosting and document level boosting and the difference between the two? 2) If we set the boost at field level (index time),

Nwebie Question on boosting

2008-12-10 Thread ayyanar
I have read many articles on boosting, but I am still not clear on it. Can anyone explain the following questions with examples? 1) Can you give an example of field level boosting and document level boosting and the difference between the two? 2) If we set the boost at field level (index time),

Re: Nwebie Question on boosting

2008-12-10 Thread Robert Young
On Thu, Dec 11, 2008 at 6:49 AM, ayyanar [EMAIL PROTECTED] wrote: 1) Can you give an example of field level boosting and document level boosting and the difference between the two? Field level boosting is used when one field is considered more or less important than another. For example, you may