CommonGrams phrase query

2011-01-17 Thread Salman Akram
Hi, I have built an index using CommonGrams. Now when I query a b and explain it, Solr turns it into +MultiPhraseQuery(Contents:(a a_b) b). Shouldn't it just be searching for a_b? I am asking because, even though I am using CommonGrams, it is much slower than a normal index that just searches on a b. Note:

Re: spell suggest response

2011-01-17 Thread satya swaroop
Hi Grijesh, even though I use autosuggest, I may not get the exact results, and the order is not accurate. For example, if I type http://localhost:8080/solr/terms/?terms.fl=spell&terms.prefix=solr&terms.sort=index&terms.lower=solr&terms.upper.incl=true I get results

Re: spell suggest response

2011-01-17 Thread Grijesh
Hi Satya, in this example you are not using spellchecking. I am suggesting that you use the spellcheck component together with the Terms component, so that you also get spellcheck suggestions. Then combine both lists. - Thanx: Grijesh -- View this message in context:

Re: CommonGrams phrase query

2011-01-17 Thread Salman Akram
OK, sorry, it was my fault. I wasn't using CommonGramsQueryFilter at query time; I only had the filter for indexing. The query seems fine now. On Mon, Jan 17, 2011 at 1:44 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: Hi, I have made an index using CommonGrams. Now when I query a b and
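To illustrate why the query-side filter matters, here is a simplified Python model of the two behaviours. It is only a sketch of the idea, not the Lucene implementation: the function names are made up for illustration, and it assumes "a" is the only common word.

```python
def index_time_tokens(tokens, common_words):
    """Keep every unigram and add a bigram wherever a common word is involved."""
    out = []
    for i, token in enumerate(tokens):
        out.append(token)
        if i + 1 < len(tokens) and (token in common_words or tokens[i + 1] in common_words):
            out.append(f"{token}_{tokens[i + 1]}")
    return out


def query_time_tokens(tokens, common_words):
    """Emit a bigram where one exists; keep a unigram only if no bigram covers it."""
    out = []
    previous_formed_bigram = False
    for i, token in enumerate(tokens):
        forms_bigram = i + 1 < len(tokens) and (
            token in common_words or tokens[i + 1] in common_words
        )
        if forms_bigram:
            out.append(f"{token}_{tokens[i + 1]}")
        elif not previous_formed_bigram:
            out.append(token)
        previous_formed_bigram = forms_bigram
    return out


common = {"a"}  # assumption: "a" is the only configured common word
index_tokens = index_time_tokens(["a", "b"], common)
query_tokens = query_time_tokens(["a", "b"], common)
print(index_tokens)
print(query_tokens)
```

In this model, analysing the query with the index-style filter yields a, a_b and b, which resembles the (a a_b) b shape seen in the explain output, while the query-style filter leaves only a_b.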

sort problem

2011-01-17 Thread Philippe VINCENT-ROYOL
Hi guys, I use Solr with the UTF-8 charset and I have a sort problem. For example, when I sort on a name field, the results look like: Article Banana Foo aviation brunch ... So my question is, how can I force Solr to ignore case when sorting results? I would like the results to be: Article aviation Banana

Re: sort problem

2011-01-17 Thread Grijesh
Use the lowercase filter to lowercase your data at both index time and search time; that will make it case-insensitive. - Thanx: Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/sort-problem-tp2271207p2271231.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: spell suggest response

2011-01-17 Thread satya swaroop
Hi Grijesh, I added both the Terms component and the spellcheck component to the terms request handler. When I send a query as http://localhost:8080/solr/terms?terms.fl=text&terms.prefix=java&rows=7&omitHeader=true&spellcheck=true&spellcheck.q=java&spellcheck.count=20 the result I get is response

Re: sort problem

2011-01-17 Thread Philippe VINCENT-ROYOL
On 17/01/11 10:32, Grijesh wrote: Use Lowercase filter to lowering your data at both index time and search time it will make case insensitive - Thanx: Grijesh Thanks, so tell me if I'm wrong: I need to modify my schema.xml to add the lowercase filter and reindex my content?

Re: sort problem

2011-01-17 Thread Salman Akram
Yes. On Mon, Jan 17, 2011 at 2:44 PM, Philippe VINCENT-ROYOL vincent.ro...@gmail.com wrote: On 17/01/11 10:32, Grijesh wrote: Use Lowercase filter to lowering your data at both index time and search time it will make case insensitive - Thanx: Grijesh Thanks, so tell me if i

latest patches and big picture of search grouping

2011-01-17 Thread Marc Sturlese
I need to dive into search grouping / field collapsing again. I've seen there are lots of issues about it now. Can someone point me to the minimum patches I need to run this feature in trunk? I want to see the code of the most optimised version and what's being done in distributed search. I

Re: exception obtaining write lock on startup

2011-01-17 Thread samarth s
In that case why is there a separate lock factory of SingleInstanceLockFactory? On Fri, Dec 31, 2010 at 6:25 AM, Lance Norskog goks...@gmail.com wrote: This will not work. At all. You can only have one Solr core instance changing an index. On Thu, Dec 30, 2010 at 4:38 PM, Tri Nguyen

Re: Single value vs multi value setting in tokenized field

2011-01-17 Thread kenf_nc
No, I have both, a single field (for free form text search), and individual fields (for directed search). I already duplicate the data and that's not a problem, disk space is cheap. What I wanted to know was whether it is best to make the single field multiValued=true or not. That is, should my

solrconfig.xml settings question

2011-01-17 Thread kenf_nc
In the Wiki and the book by Smiley and Pugh, and in the comments inside the solrconfig.xml file itself, it always talks about the various settings in the context of a blended use solr index. By that I mean, it assumes you are indexing and querying from the same solr instance. However, if I have a

Re: boilerpipe solr tika howto please

2011-01-17 Thread arnaud gaudinat
Thanks Ken, this is what I wanted to know. I'm not very familiar with this kind of modification. However, I will try to do it and ask you for more information if needed. Regards, Arno On 14.01.2011 18:04, Ken Krugler wrote: Hi Arno, On Jan 14, 2011, at 3:57am, arnaud gaudinat wrote:

Re: solrconfig.xml settings question

2011-01-17 Thread Ahmet Arslan
In the Wiki and the book by Smiley and Pugh, and in the comments inside the solrconfig.xml file itself, it always talks about the various settings in the context of a blended use solr index. By that I mean, it assumes you are indexing and querying from the same solr instance. However, if I

Clustering using Carrot2 clustering component

2011-01-17 Thread Isha Garg
Dear All, Can anyone tell me how to use carrot2 clustering component to cluster search results. What are its dependencies ? Which type of changes are required in solr.config or anywhere else. Thanks! Isha

FilterQuery reaching maxBooleanClauses, alternatives?

2011-01-17 Thread Stefan Matheis
Hi List, we are sometimes reaching the maxBooleanClauses limit (which is 1024 by default). The query we use looks like: ?q=name:Stefan&fq=5 10 12 15 16 [...] where the values are IDs of users that the current user is allowed to see. So far, nothing special. sometimes the filter-query

RE: sort problem

2011-01-17 Thread Brad Dewar
Haha, Yes, you're not wrong. The field you are sorting on should be a fieldtype that has the lowercase filter applied. You'll probably have to re-index your data, unless you happen to already have such a field (via copyField, perhaps). Brad -Original Message- From: Salman Akram

Re: Single value vs multi value setting in tokenized field

2011-01-17 Thread Erick Erickson
Functionally, the two options are equivalent, and I've never really heard of any speed difference. Assuming it's not that big a programming change, though, you probably want to just test... Do be aware of one subtle difference in the approaches, though. If the increment gap is != 1 then

Re: FilterQuery reaching maxBooleanClauses, alternatives?

2011-01-17 Thread Salman Akram
You can index a field that holds the user type, e.g. UserType (possible values can be TypeA, TypeB and so on), and then you can just do ?q=name:Stefan&fq=UserType:TypeB BTW you can even increase the size of maxBooleanClauses, but in this case that is definitely not a good idea. Also you would hit
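A minimal Python sketch of the request Salman describes, replacing the long list of user IDs with one filter on a type field. The helper name build_query is made up for illustration; UserType and TypeB come from the message above.

```python
from urllib.parse import urlencode, parse_qs


def build_query(name, user_type):
    """Build a query string with one filter query instead of many ID clauses."""
    params = [("q", f"name:{name}"), ("fq", f"UserType:{user_type}")]
    return "?" + urlencode(params)


query_string = build_query("Stefan", "TypeB")
print(query_string)
# Round-trip to show the two parameters the server would receive.
print(parse_qs(query_string[1:]))
```

Because the filter is a single term, the number of boolean clauses no longer grows with the number of users the current user may see.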

Re: sort problem

2011-01-17 Thread Erick Erickson
Note two things: 1) the LowerCaseFilter is NOT applied to the STORED data. So the display will still have the original case although the sorting should be what you want. 2) you should NOT be sorting on a tokenized field. Use something like KeywordTokenizer followed by the lowercase
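The distinction between the sort key and the stored value can be sketched in plain Python, using the names from the original question. This only mimics the idea and is not how Solr sorts internally.

```python
names = ["Article", "Banana", "Foo", "aviation", "brunch"]

# Default ordering compares code points, so uppercase letters sort first.
default_order = sorted(names)

# Sorting on a lowercased key changes the order but not the values themselves,
# much like a lowercased sort field next to an unchanged stored field.
case_insensitive_order = sorted(names, key=str.lower)

print(default_order)
print(case_insensitive_order)
```

The second list still contains "Article" and "Banana" with their original capitalisation, which corresponds to point 1 above.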

Re: FilterQuery reaching maxBooleanClauses, alternatives?

2011-01-17 Thread Stefan Matheis
Thanks Salman, talking with others about problems really helps. Adding another FilterQuery is a bit too much - but combining both is working fine! not seen the wood for the trees =) Thanks, Stefan On Mon, Jan 17, 2011 at 2:07 PM, Salman Akram salman.ak...@northbaysolutions.net wrote: You

Re: FilterQuery reaching maxBooleanClauses, alternatives?

2011-01-17 Thread Salman Akram
You are welcome. By new field I meant if you don't have a field for UserType already. On Mon, Jan 17, 2011 at 6:22 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Thanks Salman, talking with others about problems really helps. Adding another FilterQuery is a bit too much - but

Re: Tika Update, no Data

2011-01-17 Thread Jörg Agatz
Hey! Thanks a lot, nice tip, it works fine. But I have one more problem, with indexing a ZIP. I tried: curl http://192.168.105.66:8983/solr/update/extract?literal.id=zip&uprefix=attr_&commit=true; -F myfile@constellio_standalone-1.0.zip and I get: Warning: Illegally formatted input field! curl:

Re: Tika Update, no Data

2011-01-17 Thread Stefan Matheis
Are you missing the = character between myfile and @filename.ext? On Mon, Jan 17, 2011 at 2:47 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: Hey! Thanks a lot, nice tip.. works fine.. But one Problem i have too... to indexing ZIP. i tryed : curl
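As a sketch of the corrected call, the following Python builds the curl argument list with the name=@file form that -F expects. The helper extract_command is hypothetical; the URL, parameters and file name are taken from the message quoted above.

```python
from urllib.parse import urlencode


def extract_command(base_url, params, field_name, file_path):
    """Return curl arguments that upload file_path as the form field field_name."""
    url = f"{base_url}?{urlencode(params)}"
    # curl's -F option takes name=@path; without the "=" curl rejects the field.
    return ["curl", url, "-F", f"{field_name}=@{file_path}"]


command = extract_command(
    "http://192.168.105.66:8983/solr/update/extract",
    {"literal.id": "zip", "uprefix": "attr_", "commit": "true"},
    "myfile",
    "constellio_standalone-1.0.zip",
)
print(command)
```

Passing the arguments as a list (for example to subprocess.run) also avoids the shell interpreting the & characters in the URL.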

Re: Tika Update, no Data

2011-01-17 Thread Jörg Agatz
Ohh, you're right.. embarrassing! I have tried it, and it works, but it seems it does not work perfectly: the txt documents inside the ZIP are not indexed, only the names of the documents inside the ZIP.. King

CommonGrams and SOLR - 1604

2011-01-17 Thread Salman Akram
Hi, I am trying to use CommonGrams with the SOLR - 1604 patch but it doesn't seem to work. If I don't add {!complexphrase} it uses CommonGramsQueryFilterFactory and proper bi-grams are made, but of course it doesn't use this patch. If I add {!complexphrase} it simply does it the old way i.e. ignore

resetting the statistics

2011-01-17 Thread Roxana Angheluta
Hi everybody, Is it possible to reset Solr statistics without restarting Solr or reloading cores? According to the thread here http://osdir.com/ml/solr-user.lucene.apache.org/2010-03/msg01078.html this was not possible in March 2010. I am wondering if something like this has been implemented in

spellchecking even the key is true....

2011-01-17 Thread satya swaroop
Hi All, can we get spellchecking results even when the keyword is spelled correctly? Spellchecking only gives suggestions for misspelled keywords; can't we get similar and nearby words for the keyword even though the spellcheck.q is correct.. as an example

partitioning documents with fields

2011-01-17 Thread Claudio Martella
Hi, I'm crawling different intranets, so I developed a Nutch plugin to add a static field for each of these crawls. I now have my documents in Solr with their specific crawl field. If I search within Solr I can see my documents being returned with that field. The field definition in the schema

Re: partitioning documents with fields

2011-01-17 Thread Erick Erickson
String fields are unanalyzed, so case matters. Are you sure you're not using a different case (try KeywordTokenizer + lowercaseFilter if you want these normalized to, say, lower case). If that isn't the problem, could we see the results if you add debugQuery=on to your URL? That often helps

Re: partitioning documents with fields

2011-01-17 Thread Claudio Martella
Thanks for your answer. Yes, the schema browser shows that the field contains the right values, as I expect. From debugQuery=on I see there must be some problem though: <str name="rawquerystring">crawl:DIGITALDATA</str> <str name="querystring">crawl:DIGITALDATA</str> <str

Re: partitioning documents with fields

2011-01-17 Thread Ahmet Arslan
It looks like there's some problem with my dismax query handler. It doesn't recognize the query with the colon format. Here's the handler definition: It is expected behavior of dismax. You can append/use defType=lucene for colon format.

Re: what would cause large numbers of executeWithRetry INFO messages?

2011-01-17 Thread sakunthalakishan
I am facing the exact same issue. Did you find the root cause? Please let me know any information you have -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p2274077.html Sent from the Solr -

Re: partitioning documents with fields

2011-01-17 Thread Erick Erickson
As Ahmet says, this is what dismax does. You could also append a filter query (fq=crawl:DIGITALDATA) to your query. eDismax supports fielded queries, see: https://issues.apache.org/jira/browse/SOLR-1553 This is already in the trunk and 3.x code lines I'm pretty sure. Best Erick On Mon, Jan 17,
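The two workarounds mentioned in this thread (switching the parser with defType=lucene, or keeping dismax and adding a filter query) can be sketched as request parameters in Python. The helper names and the free-text query "annual report" are made up for illustration; crawl:DIGITALDATA comes from the thread.

```python
from urllib.parse import urlencode, parse_qs


def fielded_query_params(field, value):
    """Ask for the standard Lucene parser so field:value syntax is understood."""
    return {"q": f"{field}:{value}", "defType": "lucene"}


def filtered_query_params(user_query, field, value):
    """Keep the free-text query as it is and restrict it with a filter query."""
    return {"q": user_query, "fq": f"{field}:{value}"}


lucene_qs = urlencode(fielded_query_params("crawl", "DIGITALDATA"))
filtered_qs = urlencode(filtered_query_params("annual report", "crawl", "DIGITALDATA"))
print(parse_qs(lucene_qs))
print(parse_qs(filtered_qs))
```

The filter query variant leaves the user's text to dismax and only narrows the result set to one crawl.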

RE: Spell Checking a multi word phrase

2011-01-17 Thread Dyer, James
Camden, You may also want to be aware that there is a new feature added to Spell Check's collate functionality that will guarantee the collations will return hits. It also is able to return more than one collation and tell you how many hits each one would result in if re-queried. This might

RE: spellchecking even the key is true....

2011-01-17 Thread Dyer, James
Add spellcheck.onlyMorePopular=true to your query and I think it'll do what you want. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular for more info. One caveat is if you use spellcheck.collate, this will likely result in useless, nonsensical collations most of

Re: Spell Checking a multi word phrase

2011-01-17 Thread Camden Daily
James, Thank you, but I'm not sure that will work for my needs. I'm very interested in contextual spell checking. Take for example the author stephenie meyer. stephenie is a far less popular spelling than stephanie, but in this context it's the correct option. I feel like shingles with an un

RE: solrj http client 4

2011-01-17 Thread Steven A Rowe
Hi Stevo, Thanks for reviewing the Maven POMs in LUCENE-2657 - I appreciate it. In those poms, not all modules have explicit version and groupId which is a bad practice. Really? According to the POM best practices section in Sonatype's Maven book

RE: Spell Checking a multi word phrase

2011-01-17 Thread Dyer, James
Camden, Have you seen Smiley & Pugh's Solr book? They describe something very similar to what you're trying to do on p180ff. The difference seems to be they use a field that only has a couple of terms so they don't bother with shingles. The book makes a big point about using spellcheck.q in

what is the diff between katta and solrcloud?

2011-01-17 Thread Sean Bigdatafun
Are their goals fundamentally different at all, or are they just different approaches to solving the same problem (sharding)? Can someone give a technical review? Thanks, --Sean

Does field collapsing (with facet) reduce performance?

2011-01-17 Thread Andy
Just wanted to know how efficient field collapsing is. And if there is a performance penalty, how big is it likely to be? I'm interested in using field collapsing with faceting. Thanks.

Any way to query by offset?

2011-01-17 Thread 5 Diamond IT
Say I do a query that matches 4000 documents. Is there a query syntax or parser that would allow me to say retrieve offsets 1000, 2000, 3000? I would prefer to not do multiple starts and limit 1's. Thanks in advance. Steve

Re: Any way to query by offset?

2011-01-17 Thread Erick Erickson
Have you seen the start and rows parameters? If they don't work, perhaps you could explain what you need that they don't provide. Best Erick On Mon, Jan 17, 2011 at 4:58 PM, 5 Diamond IT i...@smallbusinessconsultingexperts.com wrote: Say I do a query that matches 4000 documents. Is there a

Re: Any way to query by offset?

2011-01-17 Thread Markus Jelsma
I think Steve wants the 1000th, 2000th and 3000th document from the query. And since there's no method of doing so, you're constrained to executing three queries with rows=1 and start set to 1000, 2000 and 3000 respectively. If you want these documents to return you will have to do multiple queries with
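A small Python sketch of the three-request approach Markus describes, one request per offset with rows=1. The helper name and the match-all query *:* are placeholders for illustration.

```python
from urllib.parse import urlencode, parse_qs


def single_row_requests(query, offsets):
    """Build one query string per offset, each asking for exactly one row."""
    return [urlencode({"q": query, "start": offset, "rows": 1}) for offset in offsets]


query_strings = single_row_requests("*:*", [1000, 2000, 3000])
for qs in query_strings:
    # Decode again to show the parameters each request would carry.
    print(parse_qs(qs))
```

Each request differs only in its start value, so the underlying query stays the same across the three calls.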

Re: Any way to query by offset?

2011-01-17 Thread 5 Diamond IT
I want to start at row 1000, 2000, and 3000 and retrieve those 3 rows ONLY from the result set of whatever search was used. Yes, I can do 3 queries, start=1000 and limit 1, etc., but I want ONE query to get those 3 rows from the result set. It's the poor man's way of doing price buckets the way I

Re: Does field collapsing (with facet) reduce performance?

2011-01-17 Thread Markus Jelsma
There is always CPU and RAM involved for every nice component you use. Just how much the penalty is depends completely on your hardware, index and type of query. Under heavy load the numbers will change. Since we don't know your situation and it's hard to predict without benchmarks, you should

Re: Is deduplication possible during Tika extract?

2011-01-17 Thread Markus Jelsma
In my opinion it should work for every update handler. If you're really sure your configuration is fine and it still doesn't work you might have to file an issue. Your configuration looks alright but don't forget you've configured overwriteDupes=false! Hello, here is an excerpt of my

NRT

2011-01-17 Thread Dennis Gearon
How is NRT doing, being used in production? Which Solr is it in? And is there built in Spatial in that version? How is Solr 4.x doing? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from

Re: Does field collapsing (with facet) reduce performance?

2011-01-17 Thread Andy
I understand that the specific figures differ for everybody. I just wanted to see if anyone who has used this feature could share their experience. A ballpark figure -- e.g. 50% slowdown or 10 times slower -- would be helpful. --- On Mon, 1/17/11, Markus Jelsma markus.jel...@openindex.io

Re: Spell Checking a multi word phrase

2011-01-17 Thread Camden Daily
James, Thanks, the spellcheck.q was exactly what I needed to be using! -Camden On Mon, Jan 17, 2011 at 3:54 PM, Dyer, James james.d...@ingrambook.com wrote: Camden, Have you seen Smiley & Pugh's Solr book? They describe something very similar to what you're trying to do on p180ff. The

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-17 Thread Chamnap Chhorn
Is there no other way to meet this requirement? On Sat, Jan 15, 2011 at 10:01 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Ahh, thanks guys for helping me! For Adam solution, it doesn't work for me. Here is my Field, FieldType, and solr query: <fieldType name="text_keyword"

Re: NRT

2011-01-17 Thread Jason Rutherglen
How is NRT doing, being used in production? It works and there are not any lingering bugs as it's been available for quite a while. Which Solr is it in? Per-segment field cache is used transparently by Solr, IndexWriter.getReader is what's not used yet. I'm not sure where per-segment

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-17 Thread Lance Norskog
Solr itself does all three things. There is no need for Nutch- that is needed for crawling web sites, not file systems (as the original question specifies). Solr operates as a web service, running in any Java servlet container. Detecting changes to files is more tricky: there is no

just got 'the book' already have a question

2011-01-17 Thread Dennis Gearon
First of all, seems like a good book, Solr-14-Enterprise-Search-Server.pdf Question, is it possible to choose locale at search time? So if my customer is querying across cultural/national/linguistic boundaries and I have the data for him in different languages in the same index, can I sort based

Carrot2 clustering Component

2011-01-17 Thread Isha Garg
Hi, Please tell me how I can get the libraries and plugins for the Carrot2 clustering component in Solr 1.4. Tell me the site where I can get them. Thanks! Isha

Carrot2 clustering component

2011-01-17 Thread Isha Garg
Hi, I am not able to understand the Carrot2 clustering component from http://wiki.apache.org/solr/ClusteringComponent Please provide more detailed information if someone has already worked on this: how to run it and how to use it during a search query. Thanks! Isha

Re: Carrot2 clustering component

2011-01-17 Thread Otis Gospodnetic
Isha, You'll get more and better help if you provide more details about what you have done, what you have tried, what isn't working, what errors or behaviour you are seeing, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

explicit field type descriptions

2011-01-17 Thread Dennis Gearon
Is there any tabular data anywhere on ALL field types and ALL options? For example, I've looked everywhere in the last hour, and I don't see anywhere on Solr site, google, or in the 1.4 manual where it says whether a copyField 'directive' can be made ' required=true '. Dennis Gearon

Getting started with writing parser

2011-01-17 Thread Dinesh
how to write a parser program that will convert log files into XML.. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2278092.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Not storing, but highlighting from document sentences

2011-01-17 Thread Tarjei Huse
On 01/12/2011 12:02 PM, Otis Gospodnetic wrote: Hello, I'm indexing some content (articles) whose text I cannot store in its original form for copyright reason. So I can index the content, but cannot store it. However, I need snippets and search term highlighting. Any way to

Re: Carrot2 clustering component

2011-01-17 Thread Isha Garg
On Tuesday 18 January 2011 11:12 AM, Otis Gospodnetic wrote: Isha, You'll get more and better help if you provide more details about what you have done, what you have tried, what isn't working, what errors or behaviour you are seeing, etc. Otis Sematext ::http://sematext.com/ :: Solr -

Re: Carrot2 clustering component

2011-01-17 Thread Otis Gospodnetic
Isha, Next, you need to run the actual search so Carrot2 has some search results to cluster. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Isha Garg isha.g...@orkash.com To:

Re: NRT

2011-01-17 Thread Otis Gospodnetic
Hi, How is NRT doing, being used in production? Which Solr is it in? Unless I missed it, I don't think there is true NRT in Solr just yet. And is there built in Spatial in that version? How is Solr 4.x doing? Well :) 3 ways to know this sort of stuff: * follow the dev list - high

Re: just got 'the book' already have a question

2011-01-17 Thread Otis Gospodnetic
Hi, Don't think so. If you search across multiple languages and sort, I think the sort is based on UTF8 order. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dennis Gearon

Re: just got 'the book' already have a question

2011-01-17 Thread Otis Gospodnetic
I could be wrong, have a look at http://search-lucene.com/?q=locale+sort&fc_project=Solr plus: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CollationKeyFilterFactory Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

Re: Not storing, but highlighting from document sentences

2011-01-17 Thread Otis Gospodnetic
Hi Tarjei, :) Yeah, that is the solution we are going with, actually. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Tarjei Huse tar...@scanmine.com To: solr-user@lucene.apache.org

Re: what is the diff between katta and solrcloud?

2011-01-17 Thread Otis Gospodnetic
Sean, First 2 things that come to mind: * Katta keeps shards on HDFS and they then get deployed to regular servers/FS * SolrCloud doesn't involve HDFS at all. * Katta is a Lucene-level system * SolrCloud is a Solr-level system Both make heavy use of ZooKeeper. Otis Sematext ::

Re: explicit field type descriptions

2011-01-17 Thread Gora Mohanty
On Tue, Jan 18, 2011 at 11:55 AM, Dennis Gearon gear...@sbcglobal.net wrote: Is there any tabular data anywhere on ALL field types and ALL options? There is this: http://search.lucidimagination.com/search/document/CDRG_ch04_4.4.2 Not sure if it meets your needs. For example, I've looked

Re: Getting started with writing parser

2011-01-17 Thread Gora Mohanty
On Tue, Jan 18, 2011 at 11:59 AM, Dinesh mdineshkuma...@karunya.edu.in wrote: how to write a parser program that will convert log files into XML.. [...] There is no point to starting multiple threads on this issue, hoping that someone will somehow solve your problem. You have been given the

Re: what is the diff between katta and solrcloud?

2011-01-17 Thread Sean Bigdatafun
Otis, Any pointer to an architecture view of either system? Thanks, Sean On Mon, Jan 17, 2011 at 11:27 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Sean, First 2 things that come to mind: * Katta keeps shards on HDFS and they then get deployed to regular servers/FS * SolrCloud