Re: Query time noun, verb boosting

2011-06-23 Thread Anshum
Pooja, You could use UIMA's (or any other) Part-of-Speech tagger. You can read a little more about it here: http://uima.apache.org/downloads/sandbox/hmmTaggerUsersGuide/hmmTaggerUsersGuide.html#sandbox.tagger.annotatorDescriptor This would help you annotate and segregate nouns from verbs in the
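Here is a minimal Python sketch of the query side. It assumes a tagger has already labeled each query term with Penn Treebank-style tags (NN*, VB*). The function name and the boost values (2.0 for nouns, 1.5 for verbs) are hypothetical. The output uses the standard Lucene `term^boost` syntax.

```python
# Hypothetical boost table keyed on the first two letters of a Penn Treebank tag.
DEFAULT_BOOSTS = {"NN": 2.0, "VB": 1.5}

def boost_query(tagged_terms, boosts=None):
    """Turn [(term, tag), ...] into a Lucene query string with per-POS boosts."""
    boosts = DEFAULT_BOOSTS if boosts is None else boosts
    parts = []
    for term, tag in tagged_terms:
        boost = boosts.get(tag[:2])  # NN, NNS, NNP -> "NN"; VB, VBD, VBZ -> "VB"
        parts.append(f"{term}^{boost}" if boost is not None else term)
    return " ".join(parts)

# Tags are supplied by hand here; in practice they would come from the tagger.
print(boost_query([("run", "VB"), ("fast", "RB"), ("shoes", "NNS")]))
# run^1.5 fast shoes^2.0
```

Terms with other tags (adverbs, adjectives, and so on) are passed through unboosted.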

Re: Read past EOF error due to broken connection

2011-06-23 Thread pravesh
Did you manually copy the index from the master to the slave server? I suspect it wasn't copied properly. If that is the case, you can check the size of the indexes on both servers. Otherwise, you would have to recreate the indexes. Thanks, Pravesh

Re: Read past EOF error due to broken connection

2011-06-23 Thread Anuj Kumar
Hi Pravesh, I was just indexing some documents remotely on a single-node instance when the connection broke, so I didn't do any manual copy. I think I will go ahead and re-index. Just curious: is there any option to specify a checkpoint for the last commit and roll back to

Re: Complex situation

2011-06-23 Thread roySolr
Hello, I have changed my DB dates to the correct format, like 2011-01-11T00:00:00Z. Now I have the following data:
Manchester Store  2011-01-01T00:00:00Z  2011-31-03T00:00:00Z  18:00
Manchester Store  2011-01-04T00:00:00Z  2011-31-12T00:00:00Z  20:00
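The end dates in the data above, such as 2011-31-03T00:00:00Z, put the day before the month. Solr's date format is year-month-day (for example 2011-03-31T00:00:00Z). The following Python sketch catches such values before indexing. It assumes the values never carry fractional seconds.

```python
from datetime import datetime

# Solr's canonical UTC date form (assumed here without fractional seconds).
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def is_valid_solr_date(value):
    """Return True if value parses as year-month-day in Solr's canonical form."""
    try:
        datetime.strptime(value, SOLR_DATE_FORMAT)
    except ValueError:
        return False
    return True

for d in ["2011-01-01T00:00:00Z", "2011-31-03T00:00:00Z", "2011-03-31T00:00:00Z"]:
    print(d, is_valid_solr_date(d))
```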

Re: Problem with SolrTestCaseJ4

2011-06-23 Thread Tarjei Huse
On 06/20/2011 01:51 PM, Robert Muir wrote: you must use junit 4.7.x, not junit 4.8.x. Is there a way around this? Depending on a specific JUnit version is bound to cause problems when working with other packages. For example, the Spring 2.5.6 test framework does not work with JUnit versions newer than 4.4.

solr scale on trie fields

2011-06-23 Thread Omri Cohen
Hello, I am trying to normalize the values of a certain field and then use them in a function query. For that I need to know the maximum and minimum values the field takes. I am thinking of using the scale(x, minTarget, maxTarget) function query, but I read in the Solr book (Solr 1.4 Enterprise Search
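scale() is a linear min-max mapping. Each value x becomes (x - min) / (max - min) * (maxTarget - minTarget) + minTarget, where min and max are the smallest and largest values of the field. The Python sketch below applies that arithmetic to an in-memory list. Returning minTarget for a constant field is an assumption made here.

```python
def scale(values, min_target, max_target):
    """Linearly map values onto [min_target, max_target] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant field maps everything to min_target.
        return [float(min_target) for _ in values]
    span, target_span = hi - lo, max_target - min_target
    return [(v - lo) * target_span / span + min_target for v in values]

print(scale([10, 20, 30], 0, 1))  # [0.0, 0.5, 1.0]
```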

Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
How can I remove very similar documents from search results? My scenario is that there are documents in the index which are nearly identical (people submitting the same stuff multiple times, sometimes different people submitting the same stuff). Now when a search is performed for a keyword, in the top N

Re: Removing duplicate documents from search results

2011-06-23 Thread Omri Cohen
What you need to do is calculate a hash (using any message digest algorithm you want: MD5, SHA-1 and so on), then do some reading on Solr's field collapsing capabilities. It should not be too complicated.
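The hash idea can be sketched in Python as follows. Compute a digest of each document's text (MD5 here, via the standard hashlib module) and store it in a field. Then keep only the first document per digest, which is roughly what collapsing on that field would give you. The document texts are made up for illustration.

```python
import hashlib

def signature(text):
    """MD5 hex digest of the document text, suitable for a string field."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def dedupe_exact(docs):
    """Keep the first (id, text) pair for each distinct signature, in order."""
    seen, kept = set(), []
    for doc_id, text in docs:
        sig = signature(text)
        if sig not in seen:
            seen.add(sig)
            kept.append(doc_id)
    return kept

docs = [("a", "Solr rocks"), ("b", "Solr rocks"), ("c", "Solr  rocks")]
print(dedupe_exact(docs))  # ['a', 'c']: one extra space in "c" gives a new hash
```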

Re: Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
This approach would definitely work if the two documents are *exactly* the same. But it is very fragile: even if one extra space has been added, the whole hash would change. What I am really looking for is some percentage similarity between documents, and to remove those documents which are more than 95%
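One common way to get a percentage similarity is Jaccard similarity over word shingles. This is not necessarily what Solr does internally. Each text is split into overlapping k-word sequences, and the size of the intersection is divided by the size of the union. Because the text is split on whitespace, an extra space no longer matters.

The Python sketch below uses k=3 and the 0.95 threshold as illustrative choices:

```python
def shingles(text, k=3):
    """Set of overlapping k-word tuples; whitespace and case are normalized."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """|A intersect B| / |A union B| over the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def near_duplicates(docs, threshold=0.95):
    """Return (id1, id2) pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i][1], docs[j][1]) >= threshold:
                pairs.append((docs[i][0], docs[j][0]))
    return pairs

docs = [("a", "Solr rocks for search"),
        ("b", "Solr  rocks for  search"),
        ("c", "Lucene is a library")]
print(near_duplicates(docs))  # [('a', 'b')]
```

The pairwise loop is quadratic, which is fine for a small batch. A large index would need something like MinHash with locality-sensitive hashing instead.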

Re: Complex situation

2011-06-23 Thread lee carroll
Hi Roy, You have no relationship between time and date due to the denormalising of your data. I don't have a good answer to this, and I guess this is a classic question. One approach might be the following: make sure you have field collapsing available (trunk or a patch, maybe); index not

Re: Problem with SolrTestCaseJ4

2011-06-23 Thread Robert Muir
On Thu, Jun 23, 2011 at 4:10 AM, Tarjei Huse tar...@scanmine.com wrote: On 06/20/2011 01:51 PM, Robert Muir wrote: you must use junit 4.7.x, not junit 4.8.x. Is there a way around this? No, the only option we have is to decide to require 4.8. Depending on a specific JUnit version is

Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-23 Thread Israel Ekpo
I am working on that; I hope to have an answer within a month or so. On Tue, Jun 21, 2011 at 9:51 AM, roySolr royrutten1...@gmail.com wrote: Are you working on some changes to support earlier versions of PHP?

Re: Removing duplicate documents from search results

2011-06-23 Thread pravesh
Do you even need to index the duplicate documents? Detecting duplicates in content fields is not as easy as in an untokenized/keyword field. Maybe you could do this filtering at indexing time, before sending the documents to Solr. Then the question becomes which document should go (from a

Re: Complex situation

2011-06-23 Thread roySolr
Hello Lee, I thought maybe this is a solution: every night I can index the correct opening hours for the next day. So tonight (00:01) I can index the opening hours for 2011-24-06. The query in my DIH could look like this: select *

Re: Dismax + spatial constraints

2011-06-23 Thread kaiserwaseem
I am using dismax to boost my fields as placeName^1.8 schemeName^1.5 text^1.0. Now I also want to boost my results with respect to distance, to show the closest areas first. I sort with geodist, but it shows irrelevant results on top. I also tried q={!boost b=recip(geodist(50.1, -0.86, myGeoField),
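Solr's recip(x, m, a, b) computes a / (m*x + b). It is largest at x = 0 and falls off as the distance (in km for geodist) grows. Used through {!boost}, it multiplies the dismax relevance score rather than replacing it, so relevance still counts, which is not the case when sorting purely by geodist.

The Python sketch below shows the arithmetic, using m=2, a=200, b=20 purely as example values:

```python
def recip(x, m, a, b):
    """Solr-style reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)

def boosted_score(relevance, distance_km, m=2.0, a=200.0, b=20.0):
    """Multiplicative boost, the way {!boost b=...} combines it with the score."""
    return relevance * recip(distance_km, m, a, b)

for d in (0, 10, 90):
    print(d, recip(d, 2, 200, 20))  # 10.0, 5.0, 1.0
```

Tuning m, a and b changes how quickly distance outweighs text relevance.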

Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x

2011-06-23 Thread Markus Jelsma
The usual ant clean won't help either. A fresh checkout did the trick. On Thursday 23 June 2011 03:24:42 Yonik Seeley wrote: I just tried branch_3x and couldn't reproduce this. Looks like maybe there is something wrong with your build, or some old class files left over somewhere being picked

Re: DIH Scheduling

2011-06-23 Thread simon
The wiki page describes a design for a scheduler, which has not been committed to Solr yet (I checked). I did see a patch the other day (see https://issues.apache.org/jira/browse/SOLR-2305), but it didn't look well tested. I think that you're basically stuck with something like cron at this time.
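Here is a minimal Python sketch of the cron approach. It assumes the DataImportHandler is registered at /dataimport on a single-core Solr at localhost:8983; both are assumptions, so adjust them to your solrconfig.xml. The code only builds and sends the HTTP request, and cron supplies the schedule.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the DIH handler; check your solrconfig.xml.
DIH_URL = "http://localhost:8983/solr/dataimport"

def dih_url(command, base=DIH_URL, **params):
    """Build a DataImportHandler URL such as ...?command=delta-import."""
    query = urlencode({"command": command, **params})
    return f"{base}?{query}"

def trigger(command="delta-import", **params):
    """Send the command to Solr and return the HTTP status code."""
    with urlopen(dih_url(command, **params), timeout=30) as resp:
        return resp.status
```

A small script run from a crontab entry would simply call trigger(), or trigger("full-import", clean="false") for a full run that keeps existing documents.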

Re: response time for pdf indexing

2011-06-23 Thread simon
How long are the documents? Indexing a large document can be slow (although 2 seconds is very slow indeed). 2011/6/22 Rode González (libnova) r...@libnova.es: Hi! We are using Zend Search, based on Lucene. Our PDF indexing requests take longer than 2 seconds. We want to change to

Re: Removing duplicate documents from search results

2011-06-23 Thread simon
Have you checked out the deduplication process that's available at indexing time? It includes a fuzzy hash algorithm. http://wiki.apache.org/solr/Deduplication -Simon On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote: This approach would definitely work if the two

Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Ariel
I'm sorry to bother you again, but this doesn't work. I have written this configuration in my schema.xml file:
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter

RE: How to index correctly a text save with tinyMCE

2011-06-23 Thread Steven A Rowe
Hi Ariel, On 6/23/2011 at 12:34 PM, Ariel wrote: But it still doesn't convert the code to the correct character; for instance, Espa&amp;ntilde;a must be converted to España, but it still remains as Espa&amp;ntilde;a. So it looks like your text processing tool(s) escape markup meta-characters
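The example string is the entity &ntilde; escaped a second time, so one round of entity decoding only gets back to Espa&ntilde;a. If the source cannot be fixed, one workaround is to decode repeatedly before sending the text to Solr.

The Python sketch below uses the standard html module; the round limit is an arbitrary safeguard:

```python
import html

def unescape_fully(text, max_rounds=3):
    """Decode HTML entities repeatedly until the text stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text

print(html.unescape("Espa&amp;ntilde;a"))   # Espa&ntilde;a (one round)
print(unescape_fully("Espa&amp;ntilde;a"))  # España
```

Repeated decoding also turns intentionally escaped text (such as a literal "&amp;lt;") into markup characters. Fixing the escaping where the text is produced is the cleaner option.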

Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Marek Tichy
Or fix the problem at its source: I think you need to google for entity_encoding: raw in TinyMCE. Hi Ariel, On 6/23/2011 at 12:34 PM, Ariel wrote: But it still doesn't convert the code to the correct character; for instance, Espa&amp;ntilde;a must be converted to España, but it still

Server Restart Required for Schema Changes After Document Delete All?

2011-06-23 Thread Brandon Fish
Are there any schema changes that would cause problems with the following procedure from the FAQ (http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F)? 1. Use the match-all-docs query in a delete-by-query command before shutting down Solr:
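Step 1 of that FAQ procedure is a delete-by-query with the match-all query *:*. It can be sent as an XML update message. The Python sketch below only builds the request. It assumes the default single-core update URL http://localhost:8983/solr/update, which is an assumption.

```python
from urllib.request import Request

# Assumed update handler URL for a single-core setup.
UPDATE_URL = "http://localhost:8983/solr/update"

def delete_all_request(base=UPDATE_URL):
    """POST <delete><query>*:*</query></delete>, committing immediately."""
    body = b"<delete><query>*:*</query></delete>"
    return Request(f"{base}?commit=true", data=body,
                   headers={"Content-Type": "text/xml; charset=utf-8"},
                   method="POST")
```

Passing the result to urllib.request.urlopen would send it to a running Solr instance.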

Re: velocity: hyperlinking to documents

2011-06-23 Thread okayndc
Yes, from the handy /browse view. I'll give this a try. Thanks Erik!

Updating the data-config file

2011-06-23 Thread sabman
So I have some RSS feeds that I want to index using Solr. I am using the DataImportHandler, and I have added the instructions on how to parse the feeds in the data-config file. Now if a user wants to add more RSS feeds to index, do I have to programmatically instruct Solr to update the config

testing subscription.

2011-06-23 Thread Esteban Donato

Re: Updating the data-config file

2011-06-23 Thread Ahmet Arslan
So I have some RSS feeds that I want to index using Solr. I am using the DataImportHandler, and I have added the instructions on how to parse the feeds in the data-config file. Now if a user wants to add more RSS feeds to index, do I have to programmatically instruct Solr to update the

Re: How to index correctly a text save with tinyMCE

2011-06-23 Thread Ariel
Steven A Rowe, the solution you proposed doesn't work; thanks anyway. Regards On 6/23/11, Steven A Rowe sar...@syr.edu wrote: Hi Ariel, On 6/23/2011 at 12:34 PM, Ariel wrote: But it still doesn't convert the code to the correct character; for instance, Espa&amp;ntilde;a must be converted

Re: Updating the data-config file

2011-06-23 Thread sabman
So you mean I cannot update the data-config programmatically? I don't understand how the request parameters would be of use to me. This is how my data-config file looks:
<dataConfig>
<dataSource type="HttpDataSource" />
<document>
<entity name="slashdot"

Re: Updating the data-config file

2011-06-23 Thread Ahmet Arslan
So you mean I cannot update the data-config programmatically? Yes, you can update it and reload it via the command dataimport?command=reload-config. However, there is no built-in mechanism for this in Solr. I don't understand how the request parameters would be of use to me. Maybe you can use

Re: Understanding query explain information

2011-06-23 Thread Alexander Ramos Jardim
Yes, I am using synonyms at index time. 2011/6/22 lee carroll lee.a.carr...@googlemail.com: Hi, are you using synonyms? On 22 June 2011 10:30, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote: Hi guys, I have some doubts about how to correctly understand the

Garbage Collection: I have given bad advice in the past!

2011-06-23 Thread Shawn Heisey
In the past I have told people on this list and in the IRC channel #solr what I use for Java GC settings. A couple of days ago, I cleaned up my testing methodology to more closely mimic real production queries, and discovered that my GC settings were woefully inadequate. Here's what I was

Re: Updating the data-config file

2011-06-23 Thread sabman
Ahh! That's interesting! I understand what you mean. Since RSS and Atom feeds have the same structure, parsing them would be the same, but I can do that for each different URL. These URLs can be obtained from a DB, a file, or through the request parameters, right?
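The idea can be sketched as follows. Assume the entity's url attribute in data-config.xml is written as url="${dataimporter.request.feedUrl}", so that DIH takes the feed address from a request parameter; feedUrl is a hypothetical name. The list of feeds could come from a database or a file; here it is a plain Python list. The handler URL is also an assumption. The code builds one full-import URL per feed, with clean=false so that earlier feeds are not wiped.

```python
from urllib.parse import urlencode

# Assumed DIH handler URL; "feedUrl" is a hypothetical parameter name that
# data-config.xml would reference as ${dataimporter.request.feedUrl}.
DIH_URL = "http://localhost:8983/solr/dataimport"

def import_urls(feed_urls, base=DIH_URL):
    """One full-import request per feed, keeping previously indexed documents."""
    urls = []
    for feed in feed_urls:
        params = {"command": "full-import", "clean": "false", "feedUrl": feed}
        urls.append(f"{base}?{urlencode(params)}")
    return urls

for u in import_urls(["http://example.com/rss", "http://example.org/atom"]):
    print(u)
```

A DIH handler runs one import at a time, so in practice each request should be sent only after the previous import has finished (for example by polling command=status).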

How to index data in Solr from a database automatically

2011-06-23 Thread Romi
I have a MySQL database for my application. I implemented Solr search and used the DataImportHandler (DIH) to index data from the database into Solr. My question is: is there any way that, when the database gets updated, my Solr index automatically gets updated with the new data added to the database? It means I
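DIH's usual answer here is delta-import. A deltaQuery on the entity, typically along the lines of select id from item where last_modified > '${dataimporter.last_index_time}', returns the keys of rows changed since the previous import. A deltaImportQuery then fetches those rows. A cron job (or similar) runs command=delta-import periodically.

The selection logic can be sketched with an in-memory SQLite database. The table and column names are hypothetical, and the sketch assumes a last_modified column that your application keeps up to date.

```python
import sqlite3

def changed_ids(conn, last_index_time):
    """IDs of rows modified after the previous import (cf. DIH's deltaQuery)."""
    rows = conn.execute(
        "SELECT id FROM item WHERE last_modified > ? ORDER BY id",
        (last_index_time,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "old row", "2011-06-20 10:00:00"), (2, "new row", "2011-06-23 09:30:00")],
)
# Timestamps are ISO-formatted strings, so string comparison orders them correctly.
print(changed_ids(conn, "2011-06-22 00:00:00"))  # [2]
```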