--
Thanks and Regards,
Prashant Ullegaddi,
Search and Information Extraction Lab,
IIIT-Hyderabad, India.
there has
got to be another, faster way to achieve the same.
If you want to modify the way Lucene scores documents, I guess you need to
extend the Similarity class and provide your own implementation. Take a look at:
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html
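To make the knobs concrete, here is a standalone sketch with no Lucene dependency. It mirrors three per-term factors as given in the DefaultSimilarity javadoc linked above (tf, idf, lengthNorm) and adds a hypothetical subclass that flattens tf. In real Lucene 2.4 code you would subclass org.apache.lucene.search.DefaultSimilarity instead, override only the methods you care about, and install it with Searcher.setSimilarity(...). Note that Lucene's own lengthNorm also receives the field name and is applied at indexing time, so changing it requires reindexing. The class names below are made up for illustration.

```java
/**
 * Standalone mirror of three per-term factors documented for Lucene 2.4's
 * DefaultSimilarity. Illustration only; no Lucene classes are used.
 */
class DefaultFactors {
    /** tf: square root of the term's frequency in the document. */
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** idf: 1 + ln(numDocs / (docFreq + 1)); rarer terms weigh more. */
    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** lengthNorm: 1 / sqrt(number of terms in the field); short fields weigh more. */
    float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

/** Hypothetical customization: count a term once, however often it repeats. */
class FlatTfFactors extends DefaultFactors {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

A flat tf like this is sometimes tried for short fields such as titles, where repeating a word should not inflate the score.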
Hi,
How do I normalize the Lucene score to the range [0, 1]?
Thanks,
Prashant.
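One common approach, sketched here as an assumption rather than anything settled in this thread, is to divide every hit's score by the top hit's score for that query; in Lucene 2.4, TopDocs.getMaxScore() gives that maximum. The result only ranks hits within one query; a 1.0 for one query is not comparable to a 1.0 for another. The helper below is hypothetical and works on a plain float array.

```java
class ScoreNormalizer {
    /**
     * Divides every score by the maximum, so the best hit gets 1.0 and the
     * rest fall in [0, 1]. Returns all zeros if no score is positive.
     */
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out; // nothing to scale against
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }
}
```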
Hi,
I have some indexes. As you all know, each has these files:
_0.fdt _0.fdx _hqy.fnm _hqy.frq _hqy.nrm _hqy.prx _hqy.tii _hqy.tis
segments_2 segments.gen
Once I merge those indexes into a single index (using IndexWriter's
addIndexes()), the merged index has
only 3 files:
_0.cfs segments_2
to me. The
distinction is yours to draw
On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi
prashullega...@gmail.com wrote:
I'm running it on a quad-core machine (2.4GHz per core) with 4GB RAM.
Prashant.
On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com
wrote:
With such a large index
performance numbers, you really ought to tell us about your
hardware, types of queries, etc.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
----- Original Message ----
From: prashant ullegaddi prashullega
Hi,
I've indexed some 50 million documents. I've indexed the target URL of each
document as the url field, using
StandardAnalyzer with Field.Index.ANALYZED. Suppose there is a Wikipedia page
with title: Rahul Dravid and
url: http://en.wikipedia.org/wiki/Rahul_Dravid.
But when I search for +title:Rahul
that exists in the index (look at the last 3 tokens produced by the Analyzer
in the output above).
3) Searching url: with the entire string also works, since you index all of
it under the url field.
Does this explain the behavior you see?
Shai
On Sun, Aug 2, 2009 at 1:27 PM, prashant ullegaddi
prashullega
(document.toString()) before you add it to the
index, and paste the output here?
Shai
On Sun, Aug 2, 2009 at 4:47 PM, prashant ullegaddi
prashullega...@gmail.com
wrote:
Firstly, I'm indexing the string in the url field only.
I've never used Luke and don't know how to use it.
What I'm trying
Hi Phil,
The query you gave did work. Well, that proves StandardAnalyzer has a
different way
of tokenizing URLs.
Thanks,
Prashant.
On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan phil...@gmail.com wrote:
Hi Prashant,
I agree with Shai that using Luke and printing out what the Document
looks
receives the HOST token type and breaks it further into its
components (e.g., extracts en, wikipedia and org). You can also return
the original HOST token along with its components.
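The splitting step itself is simple; what follows is a standalone sketch of just that logic, assuming the host arrives as one string such as en.wikipedia.org. In Lucene this would sit inside a custom TokenFilter placed after StandardTokenizer that checks the token type; that wiring is not shown, and HostSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class HostSplitter {
    /**
     * Splits a host name such as "en.wikipedia.org" on dots. If keepOriginal
     * is true, the full host is emitted first, followed by its components.
     */
    static List<String> split(String host, boolean keepOriginal) {
        List<String> tokens = new ArrayList<>();
        if (keepOriginal) {
            tokens.add(host);
        }
        for (String part : host.split("\\.")) {
            if (!part.isEmpty()) {
                tokens.add(part); // skip empties from leading/double dots
            }
        }
        return tokens;
    }
}
```

Keeping the original token as well lets an exact url: query on the whole host still match, while url:wikipedia also works.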
I hope this helps.
Shai
On Sun, Aug 2, 2009 at 8:58 PM, prashant ullegaddi
prashullega...@gmail.com
wrote:
Hi
Hi,
I have a single 87GB index containing around 50M documents. The best
search time I've observed for any query is 8 sec, and when the query is
expanded with synonyms, search takes minutes (~2-3 min). Is there a better
way to search that reduces the overall search time?
Thanks,
Given a term, say apache, I want to look up the Lucene index
programmatically to find out its frequency in the corpus.
On Fri, Jul 31, 2009 at 12:23 AM, oh...@cox.net wrote:
prashant ullegaddi prashullega...@gmail.com wrote:
How to get the number of times a term occurs in the Lucene
Thanks Ahmet. This answers my question.
On Fri, Jul 31, 2009 at 1:30 PM, AHMET ARSLAN iori...@yahoo.com wrote:
Given a term, say apache, I want to look up the Lucene index
programmatically to find out its frequency in the corpus.
I think you are asking for the collection frequency of a term. Term
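The distinction between document frequency and collection frequency can be shown with a toy in-memory postings map. This is a standalone sketch, not Lucene code: in Lucene 2.4, IndexReader.docFreq(Term) returns the number of documents containing a term, and one way to get total occurrences is to iterate IndexReader.termDocs(Term) and sum TermDocs.freq() over all matching documents, which is what collectionFreq below imitates. The class and its data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CollectionFrequency {
    // Hypothetical toy postings: term -> (docId -> occurrences in that doc)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    /** Records every token of one document. */
    void add(int docId, String... tokens) {
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** Number of documents containing the term. */
    int docFreq(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return p == null ? 0 : p.size();
    }

    /** Total occurrences across all documents: sum of per-document freqs. */
    long collectionFreq(String term) {
        Map<Integer, Integer> p = postings.get(term);
        if (p == null) {
            return 0;
        }
        long total = 0;
        for (int f : p.values()) {
            total += f;
        }
        return total;
    }
}
```

For a large index, the summing loop touches every posting of the term, so frequent terms cost more to count than rare ones.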
It might be because there are hardly any documents containing both
words.
Try an exact search: "tall fat"
On Fri, Jul 31, 2009 at 3:31 PM, bourne71 gary...@live.com wrote:
Hi, new here.
I recently started using lucene and had encounter a problem.I crawl and
index a number of documents.
With MultiFieldQueryParser, you can specify several fields of the document
to be searched.
E.g., if besides the contents of the document you index fields such as URL,
BOLD and ITALIC, you can search over all of them.
Additionally, there is a provision to boost one field over the other as
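Roughly, a multi-field parser turns one term into a disjunction of field:term clauses, each optionally boosted; Lucene 2.4's MultiFieldQueryParser has a constructor that accepts a Map of per-field boosts for this. The standalone sketch below only builds the equivalent query string so the expansion is visible. The field names and boost values are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;

class BoostedFieldQuery {
    /**
     * Expands one term into "field:term^boost" clauses joined by spaces,
     * which the default QueryParser operator treats as OR. A boost of 1.0
     * is left implicit.
     */
    static String expand(String term, Map<String, Float> fieldBoosts) {
        StringJoiner sj = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0f) {
                clause += "^" + e.getValue();
            }
            sj.add(clause);
        }
        return sj.toString();
    }
}
```

Pass a LinkedHashMap if the clause order should be predictable; a HashMap gives the same query with clauses in arbitrary order.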
How do I get the number of times a term occurs in the Lucene index?
Regards,
Prashant.
Yes, you can use Hadoop with Lucene. Borrow some code from Nutch: look at
org.apache.nutch.indexer.IndexerMapReduce and
org.apache.nutch.indexer.Indexer.
Prashant.
On Wed, Jul 22, 2009 at 2:00 PM, m.harig m.ha...@gmail.com wrote:
Thanks Shai
So there won't be a problem when
, prashant ullegaddi wrote:
Hi,
We have some 50M pages, and we also have computed PageRanks of those
pages.
What's the best way to combine Lucene's score with PageRank?
Regards,
Prashant.
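One simple option, sketched here under stated assumptions and not as an answer given in this thread, is to let Lucene retrieve the top hits and then re-rank them by a weighted blend of the max-normalized Lucene score and the max-normalized PageRank. The weight alpha is a hypothetical tuning parameter. An alternative with Lucene 2.4 is an index-time boost via Document.setBoost(float) derived from PageRank, which folds it into the score directly but requires reindexing to change.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PageRankRerank {
    /** One hit: document id, raw Lucene score, and precomputed PageRank. */
    static final class Hit {
        final String id;
        final float luceneScore;
        final double pageRank;

        Hit(String id, float luceneScore, double pageRank) {
            this.id = id;
            this.luceneScore = luceneScore;
            this.pageRank = pageRank;
        }
    }

    /** alpha * normalized score + (1 - alpha) * normalized PageRank. */
    static double blend(Hit h, double alpha, double maxScore, double maxPr) {
        double s = maxScore > 0 ? h.luceneScore / maxScore : 0.0;
        double p = maxPr > 0 ? h.pageRank / maxPr : 0.0;
        return alpha * s + (1 - alpha) * p;
    }

    /** Returns document ids ordered by the blended score, best first. */
    static List<String> rerank(List<Hit> hits, double alpha) {
        double maxScore = 0.0;
        double maxPr = 0.0;
        for (Hit h : hits) {
            maxScore = Math.max(maxScore, h.luceneScore);
            maxPr = Math.max(maxPr, h.pageRank);
        }
        final double ms = maxScore;
        final double mp = maxPr;
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> blend(h, alpha, ms, mp)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Both signals are scaled to [0, 1] first so that alpha actually controls their relative weight; otherwise whichever signal has the larger raw range would dominate.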
--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene
, prashant ullegaddi
prashullega...@gmail.com wrote:
Hi
I'm unable to find this class in lucene-core-2.4.1.jar. Is there another
jar file I need to
download to get it?
Regards,
Prashant.
* get any hits. Although I admit that not getting jakarta lucene in
50M pages seems unlikely.
But Ian's suggestion that you start with a smaller index is spot on.
Best
Erick
On Thu, Jul 16, 2009 at 8:42 AM, prashant ullegaddi
prashullega...@gmail.com wrote:
50 million HTML pages (part
Hi,
I tried searching:
Apache Jakarta~10
Nothing was returned. What might be wrong?
Regards,
Prashant.
Sorry, the subject should have been: Unable to do proximity search.
Also, how do I do an exact search in Lucene?
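In the Lucene query syntax, the ~N proximity operator applies to a quoted phrase, and a quoted phrase without ~ is the exact (adjacent, in-order) search. Without quotes, as in Apache Jakarta~10, the ~ attaches to the single term Jakarta and is read as fuzzy-match syntax rather than proximity. The hypothetical helper below just builds the two correctly quoted forms; it does no escaping of special characters inside the terms.

```java
class PhraseQueryStrings {
    /** Exact phrase: terms must appear adjacent and in this order. */
    static String exact(String... terms) {
        return "\"" + String.join(" ", terms) + "\"";
    }

    /** Proximity: the quoted phrase followed by ~slop. */
    static String proximity(int slop, String... terms) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        return exact(terms) + "~" + slop;
    }
}
```

So the search above would be written as "Apache Jakarta"~10, and the exact form as "Apache Jakarta".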
~
Prashant