Hi, based on the error messages you listed below, please move the 'reader.close()'
call to the bottom of the method.
I think that if you invoke it first, the underlying stream is closed, so
exceptions are encountered.
ohaya wrote:
Hi,
I changed the beginning of the try to:
try {
Don't overlook Solr: http://lucene.apache.org/solr
Erik
On Aug 1, 2009, at 5:43 AM, mschipperheyn wrote:
http://code.google.com/p/bobo-browse
looks like it may be the ticket.
Marc
Hi,
I've indexed some 50 million documents. I've indexed the target URL of each
document as a url field using
StandardAnalyzer with Field.Index.ANALYZED. Suppose there is a Wikipedia page
with title: Rahul Dravid and
url: http://en.wikipedia.org/wiki/Rahul_Dravid.
But when I search for +title:Rahul
You write that you index the string under the url field. Do you also index
it under title? If not, that can explain why title:Rahul Dravid does not
work for you.
Also, did you try to look at the index w/ Luke? It will show you what are
the terms in the index.
Another thing which is always good
Firstly, I'm indexing the string in the url field only.
I've never used Luke; I don't know how to use it.
What I'm trying to do is search for those documents which are from
some particular site, and have a given title.
On Sun, Aug 2, 2009 at 4:07 PM, Shai Erera ser...@gmail.com wrote:
You write
Hi,
BTW, my indexer app is basically the same as the demo IndexFiles.java. Here's
part of the main:
try {
IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(),
true, IndexWriter.MaxFieldLength.LIMITED);
System.out.println("Indexing to directory '" + INDEX_DIR +
Hi,
How can I get the score of a span that is the result of SpanQuery.getSpans()
? The score can be the same for each document, but if it's unique per
span, that's even better.
I tried looking for a way to expose this functionality through the Spans
class but it looks too complicated.
I'm
How do you parse/convert the page to a Document object? Are you sure the
title "Rahul Dravid" is extracted properly and put in the title field?
You can read about Luke here: http://www.getopt.org/luke/.
Can you do System.out.println(document.toString()) before you add it to the
index, and paste
Hi Jim,
On Sun, Aug 2, 2009 at 1:32 AM, oh...@cox.net wrote:
I first noticed the problem that I'm seeing while working on this latter app.
Basically, what I noticed was that while I was adding 13 documents to the
index, when I listed the path terms, there were only 12 of them.
Field text
Hi Jim,
On Sun, Aug 2, 2009 at 9:08 AM, Phil Whelan phil...@gmail.com wrote:
So then, I reviewed the index using Luke, and what I saw with that was that
there were indeed only 12 path terms (under "Term Count" on the left),
but, when I clicked "Show Top Terms" in Luke, there were 13 terms
Yes, I'm sure that title:Rahul Dravid is extracted properly, and there is
a document relevant to this query as well.
The following query and its results proves it:
Enter query:
Searching for: +title:rahul dravid +url:wiki
4 total matching documents
trec-id: clueweb09-enwp02-13-14368, URL:
Hi Prashant,
I agree with Shai, that using Luke and printing out what the Document
looks like before it goes into the index, are going to be your best
bet for debugging this problem.
The problem you're having is that StandardAnalyzer does not break up
the hostname into separate terms, as it has
Woops sorry for the confusion!
Mike
On Sat, Aug 1, 2009 at 1:03 PM, Phil Whelan phil...@gmail.com wrote:
Hi Mike,
It's Jibo, not me, having the problem. But thanks for the link. I was
interested to look at the code. Will be buying the book soon.
Phil
On Sat, Aug 1, 2009 at 2:08 AM,
Hi Phil,
The query you gave did work. Well, that proves StandardAnalyzer has a
different way
of tokenizing URLs.
Thanks,
Prashant.
On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan phil...@gmail.com wrote:
Hi Prashant,
I agree with Shai, that using Luke and printing out what the Document
looks
On Sun, Aug 2, 2009 at 10:58 AM, Andrzej Bialecki a...@getopt.org wrote:
Thank you Phil for spotting this bug - this fix will be included in the next
release of Luke.
Glad to help. Thanks for building this great tool!
Phil
-
You can always create your own Analyzer which creates a TokenStream just
like StandardAnalyzer, but instead of using StandardFilter, write another
TokenFilter which receives the HOST token type and breaks it further into its
components (e.g., extracts "en", "wikipedia", and "org"). You can also return
the
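The splitting step Shai describes can be sketched in plain Java. This is only the string-handling core of such a filter; the real TokenFilter would also need Lucene's TokenStream plumbing, which is omitted here, and the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class HostSplitter {
    // Break a HOST-type token like "en.wikipedia.org" into its components,
    // the way a custom TokenFilter would before emitting extra tokens.
    static List<String> split(String host) {
        return Arrays.asList(host.toLowerCase().split("\\."));
    }

    public static void main(String[] args) {
        System.out.println(split("en.wikipedia.org")); // [en, wikipedia, org]
    }
}
```

With the host broken up this way, a query like url:wikipedia can match, which the unsplit HOST token would not allow.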
Thank you Phil and Shai.
I will write a different Analyzer.
On Sun, Aug 2, 2009 at 11:50 PM, Shai Erera ser...@gmail.com wrote:
You can always create your own Analyzer which creates a TokenStream just
like StandardAnalyzer, but instead of using StandardFilter, write another
TokenFilter which
the fact is, plurals (as an example) are not supported, and that is one of
the most common things that a person doing a search will expect to
Walid, I'm not sure this is true. Many plurals are supported
(certainly not exceptional cases or broken plurals).
This is no different than the other
Hi Phil,
As for the problem with my app, it wasn't what you suggested (about the tokens, etc.).
For some later things, my indexer creates both a path field that is analyzed
(and thus tokenized, etc.) and another field, fullpath, which is not analyzed
(and thus, not tokenized).
The problem with my
Hi,
I thought that, in the code that I posted, there was a close() in the finally?
Or, are you saying that when an IndexReader is opened, it somehow
persists in the system, even past my Java app terminating?
FYI, I'm doing this testing on Windows, under Eclipse...
Jim
se3g2011
Hi Jim,
On Sun, Aug 2, 2009 at 12:12 PM, oh...@cox.net wrote:
i.e., I was ignoring the 1st term in the TermEnum (since the .next() bumps
the TermEnum to the 2nd term, initially).
Great! Glad you found the problem. I couldn't see it.
Phil
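Jim's off-by-one (seeing 12 of 13 terms, because the enum starts positioned on the first term and .next() moves it to the second) can be reproduced with a small stand-in cursor. TermCursor below is a hypothetical sketch of that positioning behavior, not Lucene's API:

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a cursor that starts positioned ON its first element,
// matching Jim's description of the TermEnum behavior (illustrative only).
public class TermCursor {
    private final List<String> terms;
    private int pos = 0;

    TermCursor(List<String> terms) { this.terms = terms; }

    String term() { return pos < terms.size() ? terms.get(pos) : null; }
    boolean next() { return ++pos < terms.size(); }

    // Buggy loop: calling next() before reading skips the first term.
    static int countSkippingFirst(List<String> terms) {
        TermCursor c = new TermCursor(terms);
        int n = 0;
        while (c.next()) n++;          // never sees terms.get(0)
        return n;
    }

    // Correct loop: read the current term first, then advance.
    static int countAll(List<String> terms) {
        TermCursor c = new TermCursor(terms);
        int n = 0;
        if (c.term() != null) {
            do { n++; } while (c.next());
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("a", "b", "c");
        System.out.println(countSkippingFirst(paths)); // 2: first term lost
        System.out.println(countAll(paths));           // 3: all terms seen
    }
}
```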
I've seen Eclipse get into weird states, but I don't think that's your
problem.
You open the IndexReader and set up a TermEnum on it. Then, no matter
what, you close the underlying IndexReader in the finally block. Then later
you use the TermEnum *even though the underlying reader has been
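The close-before-use failure described above can be demonstrated with a plain java.io reader standing in for the IndexReader/TermEnum pair (the class and method names here are illustrative, not Lucene's API):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CloseOrderDemo {
    // Returns true if reading succeeds, false if the reader was already closed.
    static boolean readAfterClose(boolean closeFirst) {
        BufferedReader reader = new BufferedReader(new StringReader("path1\npath2"));
        try {
            if (closeFirst) reader.close();   // mimics closing the IndexReader early
            reader.readLine();                // mimics using the TermEnum afterwards
            return true;
        } catch (IOException e) {
            return false;                     // "Stream closed"
        } finally {
            // Correct pattern: close last, after all use, in the finally block.
            try { reader.close(); } catch (IOException ignored) {}
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterClose(true));   // false: use-after-close fails
        System.out.println(readAfterClose(false));  // true: close-last works
    }
}
```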
Hello,
I have a question about the KEYWORD type and searching/updating. I am getting
strange behavior that I can't quite comprehend.
My index is created using the standard analyzer, which is used for both writing
and searching. It has three fields:
userpin - alphanumeric field which is stored as TEXT
documentkey
Thanks for all the replies. They helped me understand the problem better, but is it
possible to create a query that will give an additional boost to a result if
and only if both of the words are found inside it? This would
definitely make sure those results end up higher in the list.
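One way to express that with Lucene's query syntax is to OR the individual terms with a boosted clause that requires both; the field name `contents`, the terms, and the boost value below are all hypothetical:

```
contents:apache contents:lucene (+contents:apache +contents:lucene)^2
```

Documents containing both terms match the boosted third clause and score higher, while documents containing only one term still match via the first two clauses.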
Hi Satish,
Lucene doesn't enforce an index schema, so each document can have a different
set of fields. It sounds like you need to write a custom indexer that follows
your custom rules and creates Lucene Documents with different Fields, depending
on what you want indexed.
You also mention
Hi,
I've a single index of size 87GB containing around 50M documents. When I
search for any query, the best search time I observed was 8 sec. And when a
query is expanded with synonyms, the search takes minutes (~2-3 min). Is there
a better way to search so that overall search time is reduced?
Thanks,
Hi Prashant,
Take a look at this...
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
Cheers,
Phil
On Sun, Aug 2, 2009 at 9:33 PM, prashant
ullegaddi prashullega...@gmail.com wrote:
Hi,
I've a single index of size 87GB containing around 50M documents. When I
search for any query,
Hello there,
I'd like to know about the boosting of search results.
Thanks
--- On Sun, 8/2/09, bourne71 gary...@live.com wrote:
From: bourne71 gary...@live.com
Subject: Re: Boosting Search Results
To: java-user@lucene.apache.org
Date: Sunday, August 2, 2009, 8:14 PM
Thanks for all the reply.