Pooja,
You could use the UIMA part-of-speech tagger (or any other). You can read a
little more about it here:
http://uima.apache.org/downloads/sandbox/hmmTaggerUsersGuide/hmmTaggerUsersGuide.html#sandbox.tagger.annotatorDescriptor
This would help you annotate and segregate nouns from verbs in the
Did you do a manual copy of the index from the Master to the Slave server? I
suspect it wasn't copied properly.
If that is the case, you can check the size of the indexes on both servers.
Otherwise, you would have to recreate the indexes.
Thanx
Pravesh
Hi Pravesh,
I was just indexing some documents remotely on a single node instance
when the connection broke.
So, there isn't any manual copy that I did.
I think I will go ahead and re-index. Just curious to know, if there
is any option to specify the check-point for last commit and rollback
to
Hello,
I have changed my db dates to the correct format, like 2011-01-11T00:00:00Z.
Now I have the following data:
Manchester Store  2011-01-01T00:00:00Z  2011-31-03T00:00:00Z  18:00
Manchester Store  2011-01-04T00:00:00Z  2011-31-12T00:00:00Z  20:00
On 06/20/2011 01:51 PM, Robert Muir wrote:
you must use junit 4.7.x, not junit 4.8.x
Is there a way around this?
Depending on a specific JUnit version is bound to cause problems when
working with other packages. For example, the Spring 2.5.6 test framework does
not work with JUnit versions newer than 4.4.
Hello,
I am trying to normalize values of a certain field, and then use them in a
function query. For that I need to know the maximum and minimum values the
field gets. I am thinking of using the scale(x, minTarget, maxTarget) query
function, but I read in the Solr book (Solr 1.4 Enterprise Search
How can I remove very similar documents from search results?
My scenario is that there are documents in the index which are almost
identical (people submitting the same stuff multiple times, sometimes different
people submitting the same stuff). Now when a search is performed for a keyword,
in the top N
What you need to do is calculate some hash (using any message-digest
algorithm you want: MD5, SHA-1, and so on), then do some reading on Solr's
field collapsing capabilities. It should not be too complicated.
*Omri Cohen*
Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295
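A minimal sketch of the hashing idea, assuming the duplicated content lives in a plain text field; the function name and the normalization step are illustrative choices, not a Solr API:

```python
import hashlib

def content_signature(text: str) -> str:
    """Return a hex digest usable as a signature field for exact-duplicate detection."""
    # Normalize lightly so trivial whitespace/case differences don't change the hash.
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

sig_a = content_signature("Same   stuff submitted twice")
sig_b = content_signature("same stuff submitted TWICE")
# identical after normalization, so the two digests match
```

The digest would be stored in its own field at index time, and field collapsing (or a filter) applied on that field at query time.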
This approach would definitely work if the two documents are *exactly* the
same. But it is very fragile: even if one extra space has been added, the
whole hash changes. What I am really looking for is some percentage
similarity between documents, so I can remove those documents which are more than
95%
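A rough sketch of the percentage-similarity idea using word shingles and Jaccard similarity; the shingle size and any cutoff threshold are arbitrary choices, not something Solr provides out of the box:

```python
def shingles(text: str, k: int = 3) -> set:
    """Word-level k-shingles of a lower-cased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the two shingle sets, in [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

score = similarity("people submitting same stuff twice",
                   "people submitting same stuff once")
# 2 shared shingles out of 4 distinct ones -> 0.5
```

Documents scoring above the chosen threshold (e.g. 0.95) against an already-kept document would be dropped from the result list.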
Hi Roy,
You have no relationship between time and date due to the
de-normalising of your data.
I don't have a good answer to this and I guess this is a classic question.
One approach is maybe to do the following:
make sure you have field collapsing available. trunk or a patch maybe
index not
On Thu, Jun 23, 2011 at 4:10 AM, Tarjei Huse tar...@scanmine.com wrote:
On 06/20/2011 01:51 PM, Robert Muir wrote:
you must use junit 4.7.x, not junit 4.8.x
Is there a way around this?
No, the only option we have is to decide to require 4.8
Depending on a specific Junit version is
I am working on that, I hope to have an answer within a month or so.
On Tue, Jun 21, 2011 at 9:51 AM, roySolr royrutten1...@gmail.com wrote:
Are you working on some changes to support earlier versions of PHP?
Would you care to even index the duplicate documents? Finding duplicates in
content fields would not be as easy as in some untokenized/keyword field.
Maybe you could do this filtering at indexing time, before sending the
document to Solr. Then the question becomes: which one document should go (from
a
Hello Lee,
I thought maybe this is a solution:
I can index, every night, the correct opening hours for the next day. So
tonight (00:01) I can index the opening hours for 2011-24-06. The query in my
DIH can look like this:
select *
I am using dismax to boost my fields as placeName^1.8 schemeName^1.5 text^1.0.
Now I also want to boost my results with respect to distance, to show the closest
areas first. I sort with geodist, but it shows irrelevant results on top.
I also tried
q={!boost b=recip(geodist(50.1, -0.86, myGeoField),
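One way to sketch the multiplicative-boost variant is with edismax and its `boost` parameter; the host, query term, field names beyond myGeoField, and the constants inside recip() are all assumptions to adapt:

```python
from urllib.parse import urlencode

# Hypothetical request parameters; recip(geodist(),1,10,10) maps small
# distances to boosts near 1 and large distances toward 0.
params = {
    "defType": "edismax",
    "q": "coffee",
    "qf": "placeName^1.8 schemeName^1.5 text^1.0",
    "sfield": "myGeoField",      # geodist() reads sfield/pt when called with no args
    "pt": "50.1,-0.86",
    "boost": "recip(geodist(),1,10,10)",
}
query_string = urlencode(params)
```

Unlike sorting by geodist, this keeps the keyword relevance score and only scales it by distance, which usually avoids the irrelevant-results-on-top problem.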
The usual ant clean won't help either. A fresh check out did the trick.
On Thursday 23 June 2011 03:24:42 Yonik Seeley wrote:
I just tried branch_3x and couldn't reproduce this.
Looks like maybe there is something wrong with your build, or some old
class files left over somewhere being picked
The Wiki page describes a design for a scheduler, which has not been
committed to Solr yet (I checked). I did see a patch the other day
(see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
look well tested.
I think that you're basically stuck with something like cron at this
time.
How long are the documents? Indexing a large document can be slow
(although 2 seconds is very slow indeed).
2011/6/22 Rode González (libnova) r...@libnova.es:
Hi !
We are using Zend Search, based on Lucene. Our queries over indexed PDFs
take longer than 2 seconds.
We want to change to
have you checked out the deduplication process that's available at
indexing time? It includes a fuzzy hash algorithm.
http://wiki.apache.org/solr/Deduplication
-Simon
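For reference, a solrconfig.xml sketch along the lines of that wiki page; the chain name, signature field, and input field are placeholders to adapt:

```xml
<!-- Sketch of the dedup chain from the Deduplication wiki page.
     TextProfileSignature is the fuzzy hash mentioned above. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```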
On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote:
This approach would definitely work if the two
I'm sorry to bother you again, but this doesn't work. I have written
this configuration in my schema.xml file:
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter
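For what it's worth, a guess at why a snippet like this fails: the char filter, tokenizer, and filters need to be nested inside an <analyzer> element of a <fieldType> definition. A sketch (the field type name here is made up):

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- charFilter must come first, before the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```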
Hi Ariel,
On 6/23/2011 at 12:34 PM, Ariel wrote:
But it still doesn't convert the code to the correct character; for
instance, Espa&amp;ntilde;a must be converted to España but it still
remains as Espa&amp;ntilde;a.
So it looks like your text processing tool(s) escape markup meta-characters
Or fix the problem at its source. I think you need to google for
entity_encoding : raw
on tinyMCE.
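To illustrate the double-escaping that the raw setting avoids, a small Python sketch (not part of the tinyMCE fix itself):

```python
import html

# "Espa&amp;ntilde;a" is double-escaped: the ampersand of the &ntilde;
# entity was itself escaped to &amp; by an upstream tool (e.g. tinyMCE).
doubly_escaped = "Espa&amp;ntilde;a"
once = html.unescape(doubly_escaped)   # "Espa&ntilde;a"
twice = html.unescape(once)            # "España"
```

Unescaping twice works around the symptom, but stopping the editor from escaping entity markup in the first place is the cleaner fix.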
Hi Ariel,
On 6/23/2011 at 12:34 PM, Ariel wrote:
But it still doesn't convert the code to the correct character; for
instance, Espa&amp;ntilde;a must be converted to España but it still
Are there any schema changes that would cause problems with the following
procedure from the FAQ?
http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F
1. Use the match-all-docs query in a delete-by-query command before
shutting down Solr:
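A sketch of that step as an HTTP update request; the host and update path are the usual defaults but may differ in your setup, and the payload is the FAQ's match-all delete:

```python
from urllib.request import Request, urlopen

# Match-all delete-by-query, sent to the update handler as XML.
payload = b"<delete><query>*:*</query></delete>"
req = Request(
    "http://localhost:8983/solr/update?commit=true",  # assumption: default Solr URL
    data=payload,
    headers={"Content-Type": "text/xml"},
)
# urlopen(req)  # uncomment to run against a live Solr instance
```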
Yes, from the handy /browse view.
I'll give this a try. Thanks Erik!
--
View this message in context:
http://lucene.472066.n3.nabble.com/velocity-hyperlinking-to-documents-tp3091504p3100957.html
Sent from the Solr - User mailing list archive at Nabble.com.
So I have some RSS feeds that I want to index using Solr. I am using the
DataImportHandler and I have added the instructions on how to parse the
feeds in the data-config file.
Now if a user wants to add more RSS feeds to index, do I have to
programmatically instruct Solr to update the config
Steven A Rowe, the solution you proposed doesn't work; thanks anyway.
Regards
On 6/23/11, Steven A Rowe sar...@syr.edu wrote:
Hi Ariel,
On 6/23/2011 at 12:34 PM, Ariel wrote:
But it still doesn't convert the code to the correct character; for
instance, Espa&amp;ntilde;a must be converted
So you mean I cannot update the data-config programmatically? I don't
understand how the request parameters would be of use to me.
This is how my data-config file looks:
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="slashdot"
So you mean I cannot update the data-config programmatically?
Yes, you can update it, and reload it via the command
dataimport?command=reload-config. However, there is no built-in mechanism for
this in Solr.
I don't understand how the request parameters would be of use to me.
Maybe you can use
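A sketch of how those DIH commands could be driven from client code; the host and core path are assumptions:

```python
from urllib.parse import urlencode

# Hypothetical Solr host; "dataimport" must match the DIH handler name
# registered in solrconfig.xml.
base = "http://localhost:8983/solr/dataimport"
reload_url = base + "?" + urlencode({"command": "reload-config"})
full_import_url = base + "?" + urlencode({"command": "full-import", "clean": "false"})
```

After editing data-config.xml on disk, hitting reload_url makes DIH pick up the change before the next import run.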
Yes, I am using synonyms at index time.
2011/6/22 lee carroll lee.a.carr...@googlemail.com
Hi are you using synonyms ?
On 22 June 2011 10:30, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
Hi guys,
I am having some doubts about how to correctly understand the
In the past I have told people on this list and in the IRC channel #solr
what I use for Java GC settings. A couple of days ago, I cleaned up my
testing methodology to more closely mimic real production queries, and
discovered that my GC settings were woefully inadequate. Here's what I
was
Ahh! That's interesting!
I understand what you mean. Since RSS and Atom feeds have the same structure,
parsing them would be the same, but I can do that for each of the different URLs.
These URLs can be obtained from a db, a file, or through the request
parameters, right?
I have a MySQL database for my application. I implemented Solr search and used
the DataImportHandler (DIH) to index data from the database into Solr. My question is:
is there any way that, if the database gets updated, my Solr indexes
automatically get updated for new data added in the database? It means I
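One common answer is DIH's delta-import, typically triggered from cron; a data-config sketch with hypothetical table and column names:

```xml
<!-- Sketch: deltaQuery finds rows changed since the last import,
     deltaImportQuery fetches each changed row by its id. -->
<entity name="item"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item
                          WHERE id = '${dih.delta.id}'">
</entity>
```

A scheduled request to dataimport?command=delta-import then keeps the index in step with the database without re-indexing everything.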