Re: indexing rich documents

2010-07-14 Thread satya swaroop
Yes, I checked the extraction request handler but couldn't get the info. I installed tika-0.7 and copied the jar files into the Solr home library. When I started sending PDF/HTML files, I got a lazy error. I am using Tomcat and Solr 1.4.

Cache full text into memory

2010-07-14 Thread Li Li
I want to cache full text in memory to improve performance. Full text is only used for highlighting in my application (but it's very time consuming; my avg query time is about 250ms, and I guess it will cost about 50ms if I just get the top 10 full texts. Things get worse when getting more full text because

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options: 1. Store the compressed text as a stored field in Solr. 2. Use external caching. http://www.findbestopensource.com/tagged/distributed-caching You could use ehcache / Memcache / Membase. The problem with external caching is that you need to synchronize the deletions and

Re: Ranking position in solr

2010-07-14 Thread Chamnap Chhorn
I sent this command: curl http://localhost:8081/solr/update -F stream.body='<commit/>', but it doesn't reload. It doesn't reload automatically after every commit or optimize unless I add a new document and then commit. Any idea? On Tue, Jul 13, 2010 at 4:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

Re: Cache full text into memory

2010-07-14 Thread Li Li
I have already stored it in the Lucene index. But it is on disk, and when a query comes, it must seek the disk to get it. I am not familiar with the Lucene cache. I just want to make full use of my memory: load 10GB of it into memory, with an LRU strategy when the cache is full. To load more into memory, I want to compress

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks for your kind response. I checked PositionFilterFactory (and re-indexed as well) but that also didn't solve the problem. Interestingly, the problem is not reproducible from Solr's Field Analysis page; it manifests only when it's in a query. I guess the subject for this post is not very

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I have just given you two options. Since you already store it as part of the index, you could try external caching. Try using ehcache / Membase http://www.findbestopensource.com/tagged/distributed-caching . The caching system will do LRU and is much more efficient. On Wed, Jul 14, 2010 at 12:39

Re: Ranking position in solr

2010-07-14 Thread Ahmet Arslan
I sent this command: curl http://localhost:8081/solr/update -F stream.body='<commit/>', but it doesn't reload. It doesn't reload automatically after every commit or optimize unless I add a new document and then commit. Hmm. Maybe there is an easier way to force it? (add empty/dummy doc) But

MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
Hi everyone, I was wondering if the following was possible somehow: <dynamicField name="*_m_i" type="text" indexed="true" stored="false" required="false" multiValued="true"/> <dynamicField name="*_m_i_f" type="string" indexed="true" stored="true" required="false" multiValued="true"/> <copyField source="*_m_i" dest="*_m_i_f"/>

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve, Thanks, wrapping with PositionFilter actually worked for the search and score -- I made a mistake while re-indexing last time. Trying to analyze PositionFilter: I didn't understand why the earlier search for 'Nina Simone I Put' failed, since at least the phrase 'Nina Simone' should have matched

Re: Cache full text into memory

2010-07-14 Thread Li Li
Thank you. I don't know which cache system to use. In my application, the cache system must support a compression algorithm with a high compression ratio and fast decompression speed (because each time an entry is fetched from the cache, it must be decompressed). 2010/7/14 findbestopensource

Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Trying to analyze PositionFilter: I didn't understand why the earlier search for 'Nina Simone I Put' failed, since at least the phrase 'Nina Simone' should have matched against the title_0 field. Any clue? Please note that I have configured the ShingleFilter as bigrams without unigrams. [Honestly, I am

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I doubt it. A caching system is a key-value store. You have to use some compression library to compress and decompress your data. The caching system only helps with fast retrieval. Anyway, please take a look at the features of each caching system. Regards Aditya www.findbestopensource.com On Wed, Jul
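
To illustrate the reply above (the cache is only a key-value store, so compression is the application's job), here is a minimal sketch in Python. It assumes zlib as the compression library and uses an in-process OrderedDict as a stand-in for ehcache / Memcache; the class name CompressedLRUCache and its methods are hypothetical and not part of any of those products.

```python
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """Hypothetical in-process LRU cache that stores zlib-compressed text."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()  # doc id -> compressed bytes

    def put(self, doc_id, text):
        # Compress before storing; level 1 favours speed over ratio.
        self._data[doc_id] = zlib.compress(text.encode("utf-8"), 1)
        self._data.move_to_end(doc_id)
        # Evict least recently used entries while over capacity.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def get(self, doc_id):
        blob = self._data.get(doc_id)
        if blob is None:
            return None  # cache miss: the caller falls back to the index
        self._data.move_to_end(doc_id)  # mark as recently used
        return zlib.decompress(blob).decode("utf-8")
```

Every get pays one decompression, which is why the thread asks for fast decompression. With an external cache the same compress/decompress calls would simply wrap the client's set and get.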

DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi all, Can someone help me with this? When importing 2 different entities one by one (specified through the entity parameter), why does the second import delete the previously created index for the first entity, and vice versa? The documentation provided by the Solr website reports that:

Re: DataImporter

2010-07-14 Thread Bilgin Ibryam
Is it possible that you have the same IDs in both entities? Could you show here your entity mappings? Bilgin Ibryam On Wed, Jul 14, 2010 at 11:48 AM, Amdebirhan, Samson, VF-Group samson.amdebir...@vodafone.com wrote: Hi all, Can someone help me in this ? Importing 2 different entities

question on wild card

2010-07-14 Thread Mark N
I have a database field = hello world and I am indexing it to the *text* field with the standard analyzer (text is a copy field in Solr). Now when a user gives the query text:hello world% , how is the query interpreted in the background? Are we actually searching text: hello OR text: world%(

RE: DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi Bilgin, you're right, I have the same primary key. I tested with the preImportDeleteQuery property in the entity tag of data_config.xml, and now it is working: it deletes only the indexes/docs for which I run the full-import, based on the field I declare for the

Re: Strange the when search with dismax

2010-07-14 Thread kenf_nc
Sounds like you want the 'text' fieldType (or equivalent) and are using 'string' or 'lowercase'. Those must match all exactly (well, case insensitively in the case of 'lowercase'). The TextType field types (like 'text') do tokenizations so matches will occur under many more conditions. -- View

Re: MultiValue dynamicField and copyField

2010-07-14 Thread kenf_nc
Yep, my schema does this all day long. -- View this message in context: http://lucene.472066.n3.nabble.com/MultiValue-dynamicField-and-copyField-tp965941p966536.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strange the when search with dismax

2010-07-14 Thread Jonathan Rochkind
"the" sounds like it might be a stopword. Are you using stopwords in any of your fields covered by the dismax search? But not in some of the other fields covered by dismax? The combination of dismax and stopwords can result in unexpected behavior if you aren't careful. I wrote about this a bit

DIH: post-delta-import DB cleanup hook?

2010-07-14 Thread Joachim M
I'm updating my Solr index using a queue table in my database. When records get updated, a row gets inserted into the queue table with pk, timestamp, deleted flag, and status. DIH made it easy to use this to identify new/updated records as well as deletes. I need to do some post processing

AW: MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
I figured out where the problem was. The destination wildcard was actually matching the wrong field. I changed the fieldnames around a bit and now everything works fine. Thanks! -Original Message- From: kenf_nc [mailto:ken.fos...@realestate.com] Sent: Wednesday, 14 July

RE: Foreign characters question

2010-07-14 Thread Blargy
Thanks for the reply but that didn't help. Tomcat is accepting foreign characters, but for some reason when it reads the synonyms file and encounters the character ñ, it doesn't appear correctly in the Field Analysis admin. It shows up as �. If I query exactly for ñ it will work but the

Re: Foreign characters question

2010-07-14 Thread Robert Muir
is your synonyms file in UTF-8 encoding? On Wed, Jul 14, 2010 at 11:11 AM, Blargy zman...@hotmail.com wrote: Thanks for the reply but that didnt help. Tomcat is accepting foreign characters but for some reason when it reads the synonyms file and it encounters that character ñ it doesnt

date boosting and dismax

2010-07-14 Thread Shawn Heisey
I've started a couple of previous threads on this topic, but I did not have a good date field in my index to use at the time. I now have a schema with the document's post_date in tdate format, so I would like to actually do some implementation. Right now, we are not doing relevancy ranking

RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
I used this before my search term and it works well: {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)} It's enough that when I search for *:* the articles appear in chronological order. Tim -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, July 14, 2010
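
For readers wondering what the constants in that boost do: Solr's recip(x,m,a,b) evaluates a/(m*x+b), and 3.16e-11 is roughly one divided by the number of milliseconds in a year. A small Python sketch of the arithmetic follows; the names date_boost and MS_PER_YEAR are made up for illustration and are not Solr identifiers.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10 ms


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    # Same constants as the query above: m=3.16e-11, a=1, b=1.
    return recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    # Print the boost for documents of increasing age.
    for years in (0, 1, 2, 10):
        print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

A brand-new document gets a boost of 1.0, a one-year-old document about half that, and the value keeps falling smoothly with age.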

Re: Foreign characters question

2010-07-14 Thread Blargy
How can I tell and/or create a UTF-8 synonyms file? Do I have to instruct solr that this file is UTF-8? -- View this message in context: http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967037.html Sent from the Solr - User mailing list archive at Nabble.com.
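
One way to answer the "how can I tell" part outside of Solr is to try decoding the file's bytes as UTF-8. The sketch below is plain Python and makes no claim about how Solr reads the file; the helper names is_utf8 and write_utf8 and the file name in the usage note are assumptions for illustration.

```python
def is_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def write_utf8(path, text):
    """Write text to path, explicitly encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Usage would be along the lines of is_utf8(open("synonyms.txt", "rb").read()). A file saved as Latin-1 that contains ñ fails the check, because the single byte 0xF1 is not a valid UTF-8 sequence on its own.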

Re: Foreign characters question

2010-07-14 Thread Blargy
Nevermind. Apparently my IDE (Netbeans) was set to No encoding... wtf. Changed it to UTF-8 and recreated the file and all is good now. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967058.html Sent from the Solr - User mailing

Re: date boosting and dismax

2010-07-14 Thread Shawn Heisey
One of the replies I got on a previous thread mentioned range queries, with this example: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 Something like this seems more flexible, and into it, I read an implication that the

RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
Re: flexibility. This boost does decay over time: the further it gets from now, the less of a boost it receives. You are right though, it doesn't allow a fine degree of control, particularly if you don't want to smoothly decay the boost. I hadn't considered your suggestion, so I'll keep it in

Re: Foreign characters question

2010-07-14 Thread Robert Muir
On Wed, Jul 14, 2010 at 12:59 PM, Blargy zman...@hotmail.com wrote: Nevermind. Apparently my IDE (Netbeans) was set to No encoding... wtf. Changed it to UTF-8 and recreated the file and all is good now. Thanks! fyi I created an issue with your example here:

Re: dismax and date boosts

2010-07-14 Thread Shawn Heisey
I have finally figured out how to turn this off in Thunderbird 3: Go to Tools, Options, Display, and turn off Display emoticons as graphics. On 4/12/2010 12:04 PM, Shawn Heisey wrote: On 4/12/2010 11:55 AM, Shawn Heisey wrote: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0

Re: date boosting and dismax

2010-07-14 Thread Jonathan Rochkind
Shawn Heisey wrote: [* TO NOW-2YEARS]^1.0 I also seem to remember seeing something about how to do less than in range queries as well as the less than or equal to implied by the above, but I cannot find it now. Ranges with square brackets [] are inclusive. Ranges with parens () are

Re: Using hl.regex.pattern to print complete lines

2010-07-14 Thread Peter Spam
Any other thoughts, Chris? I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want. 1) I don't care how many characters it returns, I'd like entire lines all the time 2) I just want it to always return 3 lines: the line before, the actual line, and the line after.

Multiple cores or not?

2010-07-14 Thread scrapy
Hi, We are planning to host different websites that will use Solr on the same server. What would be best? One core with a field in the schema (site1, site2, etc.) that is then added to every query, or one core per site? Thanks for your help

limiting the total number of documents matched

2010-07-14 Thread Paul
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by title, then very high relevancy

RE: limiting the total number of documents matched

2010-07-14 Thread Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against
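
The "top N by score, then re-sort" idea can be sketched client-side in a few lines of Python. This is only an illustration of the ordering logic, assuming the results of the first query are already in memory as dictionaries with id, score and title keys; the function name top_n_then_sort is hypothetical and nothing here is a Solr API.

```python
def top_n_then_sort(docs, n, sort_key):
    """Keep the n highest-scoring docs, then reorder them by sort_key."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    return sorted(top, key=lambda d: d[sort_key])


if __name__ == "__main__":
    hits = [
        {"id": 1, "score": 0.2, "title": "Aardvark"},
        {"id": 2, "score": 0.9, "title": "Zebra"},
        {"id": 3, "score": 0.7, "title": "Mongoose"},
    ]
    # The low-relevancy "Aardvark" is cut before the title sort happens.
    print([d["title"] for d in top_n_then_sort(hits, 2, "title")])
```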

setting up clustering

2010-07-14 Thread Justin Lolofie
I'm trying to enable clustering in solr 1.4. I'm following these instructions: http://wiki.apache.org/solr/ClusteringComponent However, `ant get-libraries` fails for me. Before it tries to download the 4 jar files, it tries to compile lucene? Is this necessary? Has anyone gotten clustering

Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom Request Handler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult? On Wed, Jul 14, 2010

Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score that relevancy as a parameter. Is it possible to do

Less convoluted way to query for an empty string?

2010-07-14 Thread Mat Brown
Hi all, I can't seem to find a way to query for an empty string that is simpler than this: field_name:[* to ] Things that don't work: field_name: field_name[ TO ] Is the one I'm using the simplest option? If so, is there a particular reason the other ones I mention don't work? Just curious

Re: csv response writer

2010-07-14 Thread Tommy Chheng
I fixed the path of the queryResponseWriter class in the example solrconfig.xml. This was successfully applied against solr 4.0 trunk. A few quirks: * When I didn't specify a default Delimiter, it printed out null as delimiter. I couldn't figure out why because init(NamedList args)

Re: Less convoluted way to query for an empty string?

2010-07-14 Thread Lukas Kahwe Smith
On 15.07.2010, at 00:09, Mat Brown wrote: Hi all, I can't seem to find a way to query for an empty string that is simpler than this: field_name:[* to ] Things that don't work: field_name: field_name[ TO ] Is the one I'm using the simplest option? If so, is there a particular

Re: Solr search streaming/callback

2010-07-14 Thread Chris Hostetter
: I was wondering if anyone was aware of any existing functionality where : clients/server components could register some search criteria and be : notified of newly committed data matching the search when it becomes : available you can register a postCommit listener in your solrconfig.xml file

Re: Solr index optimizing help

2010-07-14 Thread Erick Erickson
Does your schema have a unique id specified? If so, is it possible that you indexed many documents that had the same ID, thus deleting previous documents with the same ID? That would account for it, but it's a shot in the dark... Best Erick On Tue, Jul 13, 2010 at 6:20 AM, Karthik K

Re: stemmed terms and phrases in a combined query

2010-07-14 Thread Chris Hostetter
: My question is how do i query that? : q=text_clean:Nike's new text_orig:running shoes : seems like it would work, but not sure its the best way. that's a perfectly good way to do it. : Is there a way i can tell the parser(or extend it) so that every phrase : query it will use one field and

Re: Using stored terms for faceting

2010-07-14 Thread Chris Hostetter
: is it possible to use the stored terms of a field for a faceted search? No, the only thing stored fields can be used for is document-centric operations (ie: once you have a small set of individual docIds, you can access the stored fields to return to the user, or highlight, etc...) : I

Re: question on wild card

2010-07-14 Thread Erick Erickson
The best way to understand how things are parsed is to go to the solr admin page (Full interface link?) and click the debug info box and submit your query. That'll tell you exactly what happens. Alternatively, you can put debugQuery=on on your URL... HTH Erick On Wed, Jul 14, 2010 at 8:48 AM,
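
For the URL route mentioned above, the query has to be URL-encoded before debugQuery=on is appended. A small Python sketch of building such a URL follows; the base URL (host, port and path) in the usage note is an assumption about a local install, and debug_url is a made-up helper name.

```python
from urllib.parse import urlencode


def debug_url(base, query):
    """Build a select URL that asks Solr for query debug output."""
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})


if __name__ == "__main__":
    # Hypothetical local Solr address; adjust to your own install.
    print(debug_url("http://localhost:8983/solr/select", "text:hello world%"))
```

The debug section of the response then shows how the query string was actually parsed.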

Re: Strange the when search with dismax

2010-07-14 Thread Erick Erickson
If the other suggestions don't work, you need to show us the relevant portions of your schema.xml, and probably query output with debug=on tacked on... Here are some pointers for getting help... http://wiki.apache.org/solr/UsingMailingLists Best Erick 2010/7/14 Jonathan Rochkind

Re: How to find first document for the ALL search

2010-07-14 Thread Chris Hostetter
: I have found that this search crashes: : : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id Ouch .. that exception is kind of hairy. it suggests that your index may have been corrupted in some way -- do you have any idea what happened? have you tried using the CheckIndex tool to see what it

Re: range faceting with integers

2010-07-14 Thread Chris Hostetter
: Subject: range faceting with integers : References: aanlktinhis-wwfljo3yd-wzhkezgmmr2qy8_junmw...@mail.gmail.com : In-Reply-To: aanlktinhis-wwfljo3yd-wzhkezgmmr2qy8_junmw...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new

Re: Solr index optimizing help

2010-07-14 Thread Karthik K
Yeah, that happened :( I lost a lot of data because of it. Can someone explain the terms numDocs and maxDoc? Will the difference indicate the duplicates? Thank you, karthik

about warm up

2010-07-14 Thread Li Li
I want to load full text into an external cache, so I added some code in newSearcher, where I found the warm-up takes place. I added my code before the Solr warm-up, which is configured in solrconfig.xml like this: <listener event="firstSearcher" class="solr.QuerySenderListener"> <arr name="queries">

Re: Solr index optimizing help

2010-07-14 Thread Otis Gospodnetic
Hi, The difference indicates deletes. Optimize the index (which expunges docs marked as deleted) and the difference disappears. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Karthik
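
As a worked example of the point above: maxDoc still counts documents that are marked as deleted, numDocs does not, so the gap between them is the number of deletes waiting to be expunged. A tiny Python sketch with made-up numbers (the function name pending_deletes is hypothetical):

```python
def pending_deletes(num_docs, max_doc):
    """Documents marked deleted but not yet expunged from the index."""
    return max_doc - num_docs


if __name__ == "__main__":
    # Made-up figures: 1,000 adds, 400 of which overwrote an existing ID.
    print(pending_deletes(num_docs=600, max_doc=1000))
```

After an optimize both counters report the same value, so the difference is zero.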

Re: Multiple cores or not?

2010-07-14 Thread Otis Gospodnetic
Hello there, I'm guessing the sites will be searched separately. In that case I'd recommend a core for each site. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: scr...@asia.com

How to speed up solr search speed

2010-07-14 Thread marship
Hi all, I have a problem with distributed Solr search. The issue is that I have 76M documents spread over 76 Solr instances, each instance handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each time it runs, it will take several times,

how to eliminating scoring from a query?

2010-07-14 Thread oferiko
In http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf under the performance section it mentions: Queries that don’t sort by score can eliminate scoring, which speeds up queries. How exactly can I do that? If I don't