Re: Scaling Lucene to 1bln docs

2010-08-10 Thread prashant ullegaddi
-- Thanks and Regards, Prashant Ullegaddi, Search and Information Extraction Lab, IIIT-Hyderabad, India.

Retrieving field information for each hit when using MultiFieldQueryParser

2010-02-03 Thread prashant ullegaddi
there has got to be another fast alternative to achieve the same. -- Thanks and Regards, Prashant Ullegaddi, Search and Information Extraction Lab, IIIT-Hyderabad, INDIA.

Re: How to give a score for all documents?

2009-08-21 Thread prashant ullegaddi
If you want to modify the way Lucene scores documents, I guess you need to extend the Similarity class and provide your own implementation. Take a look at: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html

How to normalize Lucene score?

2009-08-16 Thread prashant ullegaddi
Hi, How to normalize the Lucene score to the range [0, 1]? Thanks, Prashant.
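One common way to get scores into [0, 1] is to divide every score in a result list by the top score of that list. The snippet below is only a sketch of that idea in plain Python, not Lucene API: `normalize_scores` and the sample scores are hypothetical names and values chosen for illustration, and the sketch assumes raw scores are non-negative.

```python
def normalize_scores(scores):
    """Scale non-negative raw scores into [0, 1] by dividing by the top score."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Nothing matched meaningfully; avoid dividing by zero.
        return [0.0 for _ in scores]
    return [s / top for s in scores]


# Hypothetical raw scores for one query's hit list.
raw = [4.0, 2.0, 1.0]
normalized = normalize_scores(raw)
```

Scores normalized this way are relative to one hit list only, so a value of 1.0 for one query does not mean the same thing as 1.0 for another query.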

What happens after merging?

2009-08-05 Thread prashant ullegaddi
Hi, I have some indexes. As you all know, each has these files: _0.fdt _0.fdx _hqy.fnm _hqy.frq _hqy.nrm _hqy.prx _hqy.tii _hqy.tis segments_2 segments.gen. Once I merge those indexes into a single index (using IndexWriter's addIndexes()), the merged index has only 3 files: _0.cfs segments_2

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
to me. The distinction is yours to draw On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi prashullega...@gmail.com wrote: I'm running it on Quadcore, 2.4GHz each, 4GB RAM. Prashant. On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote

Re: How to improve search time?

2009-08-04 Thread prashant ullegaddi
On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi prashullega...@gmail.com wrote: I'm running it on Quadcore, 2.4GHz each, 4GB RAM. Prashant. On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: With such a large index

Re: How to improve search time?

2009-08-03 Thread prashant ullegaddi
performance numbers, you really ought to tell us about your hardware, types of queries, etc. Otis

Weird behaviour

2009-08-02 Thread prashant ullegaddi
Hi, I've indexed some 50 million documents. I've indexed the target URL of each document as the url field, using StandardAnalyzer with Field.Index.ANALYZED. Suppose there is a Wikipedia page with title:Rahul Dravid and url: http://en.wikipedia.org/wiki/Rahul_Dravid. But when I search for +title:Rahul

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
that exists in the index (look at the last 3 tokens produced by the Analyzer, in the output above). 3) url:entire string also works, since you index all of it under the url field. Does this explain the behavior you see? Shai On Sun, Aug 2, 2009 at 1:27 PM, prashant ullegaddi prashullega

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
(document.toString()) before you add it to the index, and paste the output here? Shai On Sun, Aug 2, 2009 at 4:47 PM, prashant ullegaddi prashullega...@gmail.com wrote: Firstly, I'm indexing the string in url field only. I've never used Luke, I don't know how to use it. What I'm trying

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
Hi Phil, The query you gave did work. Well, that proves StandardAnalyzer has a different way of tokenizing URLs. Thanks, Prashant. On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan phil...@gmail.com wrote: Hi Prashant, I agree with Shai, that using Luke and printing out what the Document looks

Re: Weird behaviour

2009-08-02 Thread prashant ullegaddi
receives the HOST token type, and breaks it further to its components (e.g., extract en, wikipedia and org). You can also return the original HOST token and its components. I hope this helps. Shai On Sun, Aug 2, 2009 at 8:58 PM, prashant ullegaddi prashullega...@gmail.com wrote: Hi
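The idea described above, breaking a host token into its components while optionally keeping the original token, can be sketched in plain Python. This is not a Lucene TokenFilter; `host_tokens` and its `keep_original` flag are hypothetical names used only for illustration.

```python
def host_tokens(host, keep_original=True):
    """Split a host name into its dot-separated components.

    When keep_original is True, the full host is emitted first,
    followed by each component.
    """
    host = host.lower()
    parts = [p for p in host.split(".") if p]
    tokens = [host] if keep_original else []
    tokens.extend(parts)
    return tokens


tokens = host_tokens("en.wikipedia.org")
components_only = host_tokens("en.wikipedia.org", keep_original=False)
```

Emitting both the full host and its parts lets a query match either the whole host name or a single component such as wikipedia.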

How to improve search time?

2009-08-02 Thread prashant ullegaddi
Hi, I have a single 87GB index containing around 50M documents. For any query, the best search time I have observed is 8 sec, and when the query is expanded with synonyms, the search takes minutes (~2-3 min). Is there a better way to search so that overall search time is reduced? Thanks,

Re: Term's frequency

2009-07-31 Thread prashant ullegaddi
Given a term say apache, I want to look up the lucene index programmatically to find out its frequency in the corpus. On Fri, Jul 31, 2009 at 12:23 AM, oh...@cox.net wrote: prashant ullegaddi prashullega...@gmail.com wrote: How to get the number of times a term occurs in the Lucene

Re: Term's frequency

2009-07-31 Thread prashant ullegaddi
Thanks Ahmet. This answers my question. On Fri, Jul 31, 2009 at 1:30 PM, AHMET ARSLAN iori...@yahoo.com wrote: Given a term say apache, I want to look up the lucene index programmatically to find out its frequency in the corpus. I think you are asking collection frequency of a term. Term
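Collection frequency (total occurrences of a term across all documents) is easy to confuse with document frequency (number of documents that contain the term). The sketch below computes both over a toy corpus in plain Python; it does not use the Lucene API, and `term_statistics` and the sample documents are hypothetical.

```python
from collections import Counter


def term_statistics(docs, term):
    """Return (document_frequency, collection_frequency) for a term.

    Tokenization here is a naive lowercase whitespace split, which is
    an assumption of this sketch and not what a Lucene analyzer does.
    """
    doc_freq = 0
    coll_freq = 0
    for text in docs:
        counts = Counter(text.lower().split())
        if counts[term] > 0:
            doc_freq += 1
            coll_freq += counts[term]
    return doc_freq, coll_freq


docs = ["apache lucene apache", "apache solr", "hadoop hbase"]
df, cf = term_statistics(docs, "apache")
```

Here "apache" appears in two documents but three times overall, so the two numbers differ.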

Re: Boosting Search Results

2009-07-31 Thread prashant ullegaddi
It might be because there are hardly any documents containing both the words. Try an exact search: "tall fat" On Fri, Jul 31, 2009 at 3:31 PM, bourne71 gary...@live.com wrote: Hi, new here. I recently started using lucene and had encountered a problem. I crawl and index a number of documents.

Re: Is there any difference between using QueryParser and MultiFieldQueryParser when have single default search field ?

2009-07-31 Thread prashant ullegaddi
In MultiFieldQueryParser, you can specify different fields of the document to be searched. E.g., if you index different fields of the document's contents, such as URL, BOLD, ITALIC, you can search over all of them. Additionally, there is a provision to boost one field over another as
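To make the multi-field idea concrete, the sketch below expands a single term into one clause per field, with an optional boost, using Lucene's query syntax (`field:term^boost`). It is a plain Python string builder, not MultiFieldQueryParser itself; `expand_query`, the field names, and the boost values are hypothetical.

```python
def expand_query(term, field_boosts):
    """Build a query string that searches one term across several fields.

    field_boosts maps field name -> boost; a boost of 1.0 is left implicit.
    """
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1.0:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


query = expand_query("lucene", {"url": 2.0, "bold": 1.5, "contents": 1.0})
```

The resulting string is roughly the kind of expanded query a multi-field search boils down to: the same term tried against each field, with matches in boosted fields counting for more.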

Term's frequency

2009-07-30 Thread prashant ullegaddi
How to get the number of times a term occurs in the Lucene index? Regards, Prashant.

Re: indexing 100GB of data

2009-07-22 Thread prashant ullegaddi
Yes, you can use Hadoop with Lucene. Borrow some code from Nutch. Look at org.apache.nutch.indexer.IndexerMapReduce and org.apache.nutch.indexer.Indexer. Prashant. On Wed, Jul 22, 2009 at 2:00 PM, m.harig m.ha...@gmail.com wrote: Thanks Shai So there won't be problem when

Re: PageRanking with Lucene

2009-07-22 Thread prashant ullegaddi
, prashant ullegaddi wrote: Hi, We have some 50M pages, and we also have computed PageRanks of those pages. What's the best way to combine lucene's score with PageRank? Regards, Prashant. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene

PageRanking with Lucene

2009-07-19 Thread prashant ullegaddi
Hi, We have some 50M pages, and we also have computed PageRanks of those pages. What's the best way to combine lucene's score with PageRank? Regards, Prashant.
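One simple option, shown here purely as an illustration and not as the answer given in the thread, is a weighted linear combination of a normalized text score and a normalized PageRank. The sketch assumes both inputs are already scaled to [0, 1]; `combine`, `alpha`, and the sample hits are hypothetical.

```python
def combine(text_score, pagerank, alpha=0.7):
    """Blend a text relevance score with a PageRank value.

    Both inputs are assumed to be in [0, 1]; alpha is the weight
    given to the text score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * text_score + (1.0 - alpha) * pagerank


# Hypothetical hits: (doc id, normalized text score, normalized PageRank).
hits = [("a", 0.9, 0.1), ("b", 0.6, 0.9)]
ranked = sorted(hits, key=lambda h: combine(h[1], h[2]), reverse=True)
```

With these sample values document "b" overtakes "a" because its much higher PageRank outweighs its lower text score; the right value of alpha would have to be tuned on real queries.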

Re: Unable to find: org.apache.lucene.index.memory.AnalyzerUtil

2009-07-17 Thread prashant ullegaddi
, prashant ullegaddi prashullega...@gmail.com wrote: Hi, I'm unable to find this class in lucene-core-2.4.1.jar. Is there another jar file I need to download to get this? Regards, Prashant.

Re: Unable to do exact search with Lucene.

2009-07-17 Thread prashant ullegaddi
* get any hits. Although I admit not getting jakarta lucene in 50M pages seems unlikely. But Ian's suggestion that you start with a smaller index is spot on. Best Erick On Thu, Jul 16, 2009 at 8:42 AM, prashant ullegaddi prashullega...@gmail.com wrote: 50 million HTML pages (part

Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
Hi, I tried searching: Apache Jakarta~10 Nothing was returned. What might be wrong? Regards, Prashant.

Re: Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
Sorry, subject should have been: Unable to do proximity search. Also, how to do exact search in Lucene? ~ Prashant On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi prashullega...@gmail.com wrote: Hi, I tried searching: Apache Jakarta~10 Nothing was returned. What might be wrong
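In Lucene's query parser syntax, an exact (phrase) search puts the words in double quotes, and a proximity search adds a tilde and a distance after the closing quote. The helper below just builds those query strings in plain Python; `phrase_query` and its `slop` parameter are hypothetical names, and the sketch does not call Lucene.

```python
def phrase_query(words, slop=None):
    """Build a quoted phrase query, optionally with a proximity distance."""
    phrase = '"' + " ".join(words) + '"'
    if slop is None:
        return phrase
    return f"{phrase}~{slop}"


exact = phrase_query(["Apache", "Jakarta"])          # exact phrase
near = phrase_query(["Apache", "Jakarta"], slop=10)  # within 10 positions
```

The quotes matter: the tilde applies to the quoted phrase, whereas a tilde placed after a single unquoted term is the fuzzy-search operator in the same syntax.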

Re: Unable to do exact search with Lucene.

2009-07-16 Thread prashant ullegaddi
, 2009 at 6:04 PM, prashant ullegaddi prashullega...@gmail.com wrote: Hi, I tried searching: Apache Jakarta~10 Nothing was returned. What might be wrong? Regards, Prashant.

Unable to find: org.apache.lucene.index.memory.AnalyzerUtil

2009-07-16 Thread prashant ullegaddi
Hi, I'm unable to find this class in lucene-core-2.4.1.jar. Is there another jar file I need to download to get this? Regards, Prashant.