from:"\"Pleasant, Tracy\""

lucene usage without website

2004-03-24 Thread Pleasant, Tracy


I want to create a knowledge base, but it must not require a server
running constantly (as JSP would). It just needs to run on the Windows
platform. Lucene works well on Windows from an applet, right?

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Lucene and Mysql

2003-12-16 Thread Pleasant, Tracy

You would just take the records from the MySQL database and create a document for each 
record. Then index all the documents.
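
As a rough illustration of "one document per record", here is a minimal sketch in plain Java. It does not use Lucene's API: the class RecordIndexSketch, its methods, and the id/title/body field names are hypothetical, and the rows are hard-coded where a real program would read them from the database (for example through JDBC).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RecordIndexSketch {
    // One "document" per database row: field name -> value.
    static Map<String, String> toDocument(String id, String title, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("title", title);
        doc.put("body", body);
        return doc;
    }

    // Inverted index: lower-cased word -> ids of the documents containing it.
    static Map<String, Set<String>> buildIndex(List<Map<String, String>> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String text = doc.get("title") + " " + doc.get("body");
            for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.get("id"));
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in for rows fetched from the database.
        List<Map<String, String>> docs = new ArrayList<>(Arrays.asList(
            toDocument("1", "Lucene intro", "Indexing text with Lucene"),
            toDocument("2", "MySQL notes", "Storing text in MySQL")));
        Map<String, Set<String>> index = buildIndex(docs);
        System.out.println(index.get("lucene")); // [1]
        System.out.println(index.get("text"));   // [1, 2]
    }
}
```

With Lucene itself, the same shape applies: loop over the rows, build one Document per row, and add each to the index writer.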

-Original Message-
From: Stefan Trcko [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 16, 2003 3:31 PM
To: [EMAIL PROTECTED]
Subject: Lucene and Mysql

Hello

I'm new to Lucene. I want users to be able to search text that is stored in a MySQL database.
Is there a tutorial on how to implement this kind of search feature?

Best regards,
Stefan


Word Documents

2003-12-15 Thread Pleasant, Tracy

As a spinoff, I was wondering if anyone has been happy with indexing and searching 
Word docs. What about reading the contents? Any problems?


-Original Message-
From: Ryan Ackley [mailto:[EMAIL PROTECTED]
Sent: Friday, December 12, 2003 5:59 PM
To: Zhou, Oliver; Lucene Users List
Subject: Re: textmining: document title


Check out jakarta POI (http://jakarta.apache.org/poi ) particularly the HPSF
API. It allows you to extract metadata like Title, Author, etc. from OLE
documents.

-Ryan

- Original Message - 
From: "Zhou, Oliver" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 12, 2003 5:26 PM
Subject: textmining: document title


> Ryan,
>
> I'm using textmining and lucene to index word documents but don't know how
> to get word document title.  Your advice on this matter is appreciated.
>
> Thanks,
> Oliver Zhou
>
>



RE: Unindexed fields

2003-12-08 Thread Pleasant, Tracy

If you don't index something then it's not going to be searched.
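
To make the distinction concrete, here is a minimal sketch in plain Java, not Lucene's API. The Field class and its indexed flag are hypothetical stand-ins for the difference between a field that is only stored and one that is also indexed.

```java
import java.util.Arrays;
import java.util.List;

public class StoredVsIndexedSketch {
    // Hypothetical field model: "indexed" controls whether search can see the value.
    static final class Field {
        final String name;
        final String value;
        final boolean indexed;

        Field(String name, String value, boolean indexed) {
            this.name = name;
            this.value = value;
            this.indexed = indexed;
        }
    }

    // A document matches only if the term occurs in one of its indexed fields.
    static boolean matches(List<Field> doc, String term) {
        for (Field f : doc) {
            if (!f.indexed) {
                continue; // stored-only fields can be displayed but never searched
            }
            List<String> words = Arrays.asList(f.value.toLowerCase().split("\\W+"));
            if (words.contains(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Field> doc = Arrays.asList(
            new Field("contents", "quarterly sales report", true),
            new Field("summary", "internal draft only", false));
        System.out.println(matches(doc, "sales")); // true: "contents" is indexed
        System.out.println(matches(doc, "draft")); // false: "summary" is stored only
    }
}
```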

-Original Message-
From: Chong, Herb [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2003 11:14 AM
To: Lucene Users List
Subject: Unindexed fields

Is there a limit to the size of an UnIndexed field? I changed my code to increase the 
maximum string size per document from 300 bytes to 10,000, and although the index run 
completes without errors, I never find any documents while searching.

Herb


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Yes, it is in the array of fields that I want searched.

-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 3:32 PM
To: Lucene Users List
Subject: Re: Returning one result



On Fri, Dec 05, 2003 at 03:14:08PM -0500, Pleasant, Tracy wrote:
> What do you mean 'add' in MultiFieldQueryParser?  I am using all the
> fields 

Sorry, that was wrong. What I meant to say is are you adding the field
to the array of fields that need to be searched? 

You need to use a MultiFieldQueryParser and pass it the array of fields
that you want searched.
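
The effect of leaving a field out of that array can be sketched in plain Java. This is not the MultiFieldQueryParser API: the class MultiFieldSketch and the method matchesAny are hypothetical, and each field value is treated as a single untokenized term.

```java
import java.util.HashMap;
import java.util.Map;

public class MultiFieldSketch {
    // Returns true if the term equals the value of any field named in "fields".
    // Fields that are not listed are never consulted.
    static boolean matchesAny(Map<String, String> doc, String[] fields, String term) {
        for (String field : fields) {
            String value = doc.get(field);
            if (value != null && value.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "AR345");
        doc.put("title", "Sample record");

        // "id" is missing from the field list, so the id is not found.
        System.out.println(matchesAny(doc, new String[] {"title", "contents"}, "AR345"));       // false
        // Once "id" is listed, the same term matches.
        System.out.println(matchesAny(doc, new String[] {"id", "title", "contents"}, "AR345")); // true
    }
}
```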

Dror

> 
> When I index it does 
> 
>  add (Field.Keyword(..,..))
> 
> 
> But I don't want the user to have to type ID: It would be
> nice to just type ID Number. On your site if you just put: 11183 in the
> search box there are no results. 
> 
> well, right now I'll just do it as text and query that field for the id
> # to display the document.  It can't hurt, right? :)  Unless the Keyword
> is a better way
> 
> 
> 
> -Original Message-
> From: Dror Matalon [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 05, 2003 3:06 PM
> To: Lucene Users List
> Subject: Re: Returning one result
> 
> 
> On Fri, Dec 05, 2003 at 02:45:34PM -0500, Pleasant, Tracy wrote:
> > Maybe we are having some communication issues. 
> > 
> > At any rate, I did index it as a KEYWORD and when displaying used the
> > TermQuery.
> > 
> > The only problem with this though is by storing the ID (i.e. AR345) as a
> > Keyword, if I search for AR345 no results are returned when I use the
> > MultiFieldQueryParser .
> > 
> > *sigh* *arg*
> 
> OK. 
> 
> Go to http://www.fastbuzz.com/search/index.jsp and type "lucene" without
> the quotes  and hit search. You get results from different channels/rss
> feeds.
> 
> Now type "lucene channel:11183" without the quotes and hit search. You
> get results only from Java-Channel. 
> 
> We're inserting the field channel as a keyword, and it does what I
> understand you want to use AR345.
> 
> I would guess that in MultiFieldQueryParser you are not doing an add()
> of the field for AR345 which is why the search fails. 
> 
> Regards,
> 
> Dror
> 
> 
> > 
> > 
> > 
> > -Original Message-
> > From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 05, 2003 2:13 PM
> > To: Lucene Users List
> > Subject: Re: Returning one result
> > 
> > 
> > On Friday, December 5, 2003, at 01:25  PM, Pleasant, Tracy wrote:
> > > Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> > > order for it to be searchable then it would have to be indexed and not
> > > a keyword.
> > 
> > *arg* - we're having a serious communication issue here.  My advice to
> > you is to actually write some simple tests (test-driven learning using
> > JUnit is a wonderful way to experiment with Lucene, especially thanks
> > to the RAMDirectory).  Please refer to my articles at java.net as well
> > as the other great Lucene articles out there.
> > 
> > Let me try again a Field.Keyword *IS* indexed!  Even Lucene's 
> > javadocs say this for this method:
> > 
> >    /** Constructs a String-valued Field that is not tokenized, but is
> >  >>>indexed<<<
> >  and stored.  Useful for non-text fields, e.g. date or url.  */
> > 
> > [I added the emphasis there]
> > 
> > 
> > > So after using
> > > TermQuery query = new TermQuery(new Term("id", term));
> > >
> > > How would I return the other fields in the document?
> > >
> > > For instance to display a record it would get the record with the id #
> > > and then display the title, contents, etc.
> > 
> > Umm you'd use *exactly* the same way as if you had used 
> > QueryParser.  QueryParser would create a TermQuery for you, in fact,
> > except it would analyze your text first, which is what you want to 
> > avoid, right?
> > 
> > Hits.doc(n) gives you back a Document.  And then 
> > Document.get("fieldName") gives you back the fields (as long as you
> > >>> stored <<< them in the index too).
> > 
> > Again, please attempt some of these things in code.  It is a trivial
> > matter to index and search using RAMDirectory and experiment with 
> > TermQuery, QueryParser, Analy

RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

What do you mean 'add' in MultiFieldQueryParser?  I am using all the
fields 

When I index it does 

 add (Field.Keyword(..,..))


But I don't want the user to have to type ID: It would be
nice to just type ID Number. On your site if you just put: 11183 in the
search box there are no results. 

well, right now I'll just do it as text and query that field for the id
# to display the document.  It can't hurt, right? :)  Unless the Keyword
is a better way



-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 3:06 PM
To: Lucene Users List
Subject: Re: Returning one result


On Fri, Dec 05, 2003 at 02:45:34PM -0500, Pleasant, Tracy wrote:
> Maybe we are having some communication issues. 
> 
> At any rate, I did index it as a KEYWORD and when displaying used the
> TermQuery.
> 
> The only problem with this though is by storing the ID (i.e. AR345) as a
> Keyword, if I search for AR345 no results are returned when I use the
> MultiFieldQueryParser .
> 
> *sigh* *arg*

OK. 

Go to http://www.fastbuzz.com/search/index.jsp and type "lucene" without
the quotes  and hit search. You get results from different channels/rss
feeds.

Now type "lucene channel:11183" without the quotes and hit search. You
get results only from Java-Channel. 

We're inserting the field channel as a keyword, and it does what I
understand you want to use AR345.

I would guess that in MultiFieldQueryParser you are not doing an add()
of the field for AR345 which is why the search fails. 

Regards,

Dror


> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 05, 2003 2:13 PM
> To: Lucene Users List
> Subject: Re: Returning one result
> 
> 
> On Friday, December 5, 2003, at 01:25  PM, Pleasant, Tracy wrote:
> > Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> > order for it to be searchable then it would have to be indexed and not
> > a keyword.
> 
> *arg* - we're having a serious communication issue here.  My advice to
> you is to actually write some simple tests (test-driven learning using
> JUnit is a wonderful way to experiment with Lucene, especially thanks
> to the RAMDirectory).  Please refer to my articles at java.net as well
> as the other great Lucene articles out there.
> 
> Let me try again a Field.Keyword *IS* indexed!  Even Lucene's 
> javadocs say this for this method:
> 
>    /** Constructs a String-valued Field that is not tokenized, but is 
>  >>>indexed<<<
>  and stored.  Useful for non-text fields, e.g. date or url.  */
> 
> [I added the emphasis there]
> 
> 
> > So after using
> > TermQuery query = new TermQuery(new Term("id", term));
> >
> > How would I return the other fields in the document?
> >
> > For instance to display a record it would get the record with the id #
> > and then display the title, contents, etc.
> 
> Umm you'd use *exactly* the same way as if you had used 
> QueryParser.  QueryParser would create a TermQuery for you, in fact, 
> except it would analyze your text first, which is what you want to 
> avoid, right?
> 
> Hits.doc(n) gives you back a Document.  And then 
> Document.get("fieldName") gives you back the fields (as long as you
> >>> stored <<< them in the index too).
> 
> Again, please attempt some of these things in code.  It is a trivial 
> matter to index and search using RAMDirectory and experiment with 
> TermQuery, QueryParser, Analyzers, etc.
> 
>   Erik
> 
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Thanks, but using it as a Keyword, it will not get returned with my
search results when I use MultiFieldQueryParser.

If I could I would use just parse(query) but that is not a static
method, only parse(query,field,analyzer) is... So when I do that and use
an analyzer, the keyword field isn't searched.



-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:14 PM
To: Lucene Users List
Subject: Re: Returning one result


On Fri, Dec 05, 2003 at 01:25:23PM -0500, Pleasant, Tracy wrote:
> What I meant is.
> 
> Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> order for it to be searchable then it would have to be indexed and not a
> keyword.

No. You should store it as a keyword. 

From the javadocs:
Keyword(String name, String value)
  Constructs a String-valued Field that is not tokenized, but is
indexed and stored.


> 
> So after using
> TermQuery query = new TermQuery(new Term("id", term));
> 
> How would I return the other fields in the document?
> 
> For instance to display a record it would get the record with the id #
> and then display the title, contents, etc.
> 
> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 05, 2003 11:32 AM
> To: Lucene Users List
> Subject: Re: Returning one result
> 
> 
> On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
> > Maybe I should have been more clear.
> >
> > static Field Keyword(String name, String value)
> >   Constructs a String-valued Field that is not tokenized, but is
> > indexed and stored.
> >
> > I need to have it tokenized because people will search for that also
> > and it needs to be searchable.
> 
> Search for *what* also?  Tokenized means that it is broken into pieces
> which will be separate terms.  For example: "see spot" is tokenized 
> into "see" and "spot", and searching for either of those terms will 
> match.
> 
> Just try it and see, please!  :)
> 
> > Should I have two fields - one as a keyword and one as text?
> 
> Depends on what you're doing... but to me an "id" field indicates 
> Field.Keyword only.
> 
> > How would I do that when I want to return search results..
> >
> >  Searcher searcher = new IndexSearcher("index");
> >  String term = request.getParameter("id");
> >
> >  Query query = QueryParser.parse(term, "id", new StandardAnalyzer());
> >
> >  Hits hits  = searcher.search(query);
> >
> > Would it have to be something like:
> >  TermQuery query = ???
> 
> Yes.  TermQuery query = new TermQuery(new Term("id", term));
> 
> Use searcher.search exactly as you did before.  Just don't use 
> QueryParser to construct a query.
> 
>   Erik
> 
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Maybe we are having some communication issues. 

At any rate, I did index it as a KEYWORD and when displaying used the
TermQuery.

The only problem with this though is by storing the ID (i.e. AR345) as a
Keyword, if I search for AR345 no results are returned when I use the
MultiFieldQueryParser .

*sigh* *arg*

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:13 PM
To: Lucene Users List
Subject: Re: Returning one result

On Friday, December 5, 2003, at 01:25  PM, Pleasant, Tracy wrote:
> Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
> order for it to be searchable then it would have to be indexed and not
> a keyword.

*arg* - we're having a serious communication issue here.  My advice to 
you is to actually write some simple tests (test-driven learning using 
JUnit is a wonderful way to experiment with Lucene, especially thanks 
to the RAMDirectory).  Please refer to my articles at java.net as well 
as the other great Lucene articles out there.

Let me try again a Field.Keyword *IS* indexed!  Even Lucene's 
javadocs say this for this method:

   /** Constructs a String-valued Field that is not tokenized, but is 
 >>>indexed<<<
 and stored.  Useful for non-text fields, e.g. date or url.  */

[I added the emphasis there]

> So after using
> TermQuery query = new TermQuery(new Term("id", term));
>
> How would I return the other fields in the document?
>
> For instance to display a record it would get the record with the id #
> and then display the title, contents, etc.

Umm you'd use *exactly* the same way as if you had used 
QueryParser.  QueryParser would create a TermQuery for you, in fact, 
except it would analyze your text first, which is what you want to 
avoid, right?

Hits.doc(n) gives you back a Document.  And then 
Document.get("fieldName") gives you back the fields (as long as you >>> 
stored <<< them in the index too).

Again, please attempt some of these things in code.  It is a trivial 
matter to index and search using RAMDirectory and experiment with 
TermQuery, QueryParser, Analyzers, etc.
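
The difference between an exact term lookup and a lookup that goes through an analyzer first can be sketched in plain Java. This is not Lucene's API: the class KeywordLookupSketch and its methods are hypothetical, and the lower-casing step is only an assumption about what an analyzer might do to the query text. If that assumption holds, it is one possible reason an id such as AR345, kept as a single case-preserved term, is not found by a parsed query.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordLookupSketch {
    // Term in the "id" field -> stored fields of the matching record.
    static final Map<String, Map<String, String>> ID_INDEX = new HashMap<>();

    // Untokenized: the whole id is kept as one term, with its case preserved.
    static void add(String id, String title) {
        Map<String, String> stored = new HashMap<>();
        stored.put("id", id);
        stored.put("title", title);
        ID_INDEX.put(id, stored);
    }

    // Exact lookup, comparable to building the term by hand.
    static Map<String, String> exactLookup(String term) {
        return ID_INDEX.get(term);
    }

    // Lookup after a lower-casing analysis step (an assumed analyzer behaviour).
    static Map<String, String> analyzedLookup(String text) {
        return ID_INDEX.get(text.toLowerCase());
    }

    public static void main(String[] args) {
        add("AR345", "Sample record");
        // The exact term finds the record, and its stored fields can be read back.
        System.out.println(exactLookup("AR345").get("title")); // Sample record
        // The analyzed text becomes "ar345", which is not a term in the index.
        System.out.println(analyzedLookup("AR345"));           // null
    }
}
```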

Erik


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Also what I am indexing is not a bunch of separate documents - or then
it would be easy to simply have a field called "url" and then the link
would go directly to that document. 

However, there is a text URL with many records.
During indexing, a function parses each record and puts each into a
document with appropriate fields. 

When I go to display a particular Document (Lucene Document) I just
query the index for that unique ID rather than go through and parse
through the URL with all the records. 

Wouldn't querying the index for that unique ID be better than going
through that entire page and parsing through it - there is more room for
error that way.  

It's a long story why there isn't a database but it can't be done (don't
ask ... long story). 

-Original Message-
From: Pleasant, Tracy 
Sent: Friday, December 05, 2003 1:25 PM
To: Lucene Users List
Subject: RE: Returning one result

What I meant is.

Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
order for it to be searchable then it would have to be indexed and not a
keyword.

So after using
TermQuery query = new TermQuery(new Term("id", term));

How would I return the other fields in the document?

For instance to display a record it would get the record with the id #
and then display the title, contents, etc.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 11:32 AM
To: Lucene Users List
Subject: Re: Returning one result

On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
> Maybe I should have been more clear.
>
> static Field Keyword(String name, String value)
>   Constructs a String-valued Field that is not tokenized, but is
> indexed and stored.
>
> I need to have it tokenized because people will search for that also
> and it needs to be searchable.

Search for *what* also?  Tokenized means that it is broken into pieces 
which will be separate terms.  For example: "see spot" is tokenized 
into "see" and "spot", and searching for either of those terms will 
match.

Just try it and see, please!  :)

> Should I have two fields - one as a keyword and one as text?

Depends on what you're doing... but to me an "id" field indicates 
Field.Keyword only.

> How would I do that when I want to return search results..
>
>  Searcher searcher = new IndexSearcher("index");
>  String term = request.getParameter("id");
>
>  Query query = QueryParser.parse(term, "id", new StandardAnalyzer());
>
>  Hits hits  = searcher.search(query);
>
> Would it have to be something like:
>  TermQuery query = ???

Yes.  TermQuery query = new TermQuery(new Term("id", term));

Use searcher.search exactly as you did before.  Just don't use 
QueryParser to construct a query.

Erik


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

What I meant is.

Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
order for it to be searchable then it would have to be indexed and not a
keyword.

So after using
TermQuery query = new TermQuery(new Term("id", term));

How would I return the other fields in the document?

For instance to display a record it would get the record with the id #
and then display the title, contents, etc.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 11:32 AM
To: Lucene Users List
Subject: Re: Returning one result

On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
> Maybe I should have been more clear.
>
> static Field Keyword(String name, String value)
>   Constructs a String-valued Field that is not tokenized, but is
> indexed and stored.
>
> I need to have it tokenized because people will search for that also
> and it needs to be searchable.

Search for *what* also?  Tokenized means that it is broken into pieces 
which will be separate terms.  For example: "see spot" is tokenized 
into "see" and "spot", and searching for either of those terms will 
match.

Just try it and see, please!  :)

> Should I have two fields - one as a keyword and one as text?

Depends on what you're doing... but to me an "id" field indicates 
Field.Keyword only.

> How would I do that when I want to return search results..
>
>  Searcher searcher = new IndexSearcher("index");
>  String term = request.getParameter("id");
>
>  Query query = QueryParser.parse(term, "id", new StandardAnalyzer());
>
>  Hits hits  = searcher.search(query);
>
> Would it have to be something like:
>  TermQuery query = ???

Yes.  TermQuery query = new TermQuery(new Term("id", term));

Use searcher.search exactly as you did before.  Just don't use 
QueryParser to construct a query.

Erik


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Maybe I should have been more clear.

static Field Keyword(String name, String value) 
  Constructs a String-valued Field that is not tokenized, but is
indexed and stored. 

I need to have it tokenized because people will search for that also and
it needs to be searchable. 

Should I have two fields - one as a keyword and one as text? 

How would I do that when I want to return search results..

Right now, in the results page it will have something like
Record AR334 

Then in display_record.jsp:
 Searcher searcher = new IndexSearcher("index");
 String term = request.getParameter("id");

 Query query = QueryParser.parse(term, "id", new StandardAnalyzer());

 Hits hits  = searcher.search(query);

Would it have to be something like:
 TermQuery query = ???

or 
 Query query = QueryParser.Term("id");

? ? ? 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result

You really should use a TermQuery in this case anyway, rather than 
using QueryParser.  You wouldn't have to worry about the analyzer at 
that point anyway (and I assume you're using Field.Keyword during 
indexing).

Erik

On Thursday, December 4, 2003, at 05:01  PM, Pleasant, Tracy wrote:

> Ok I realized the Simple Analyzer does not index numbers, so I switched
> back to Standard.
>
> -Original Message-
> From: Pleasant, Tracy
> Sent: Thursday, December 04, 2003 4:53 PM
> To: Lucene Users List
> Subject: Returning one result
>
>
>  I am indexing a group of items and one field, id, is unique.  When the
> user clicks on a result I want just that one result to show.
>
>  I index and search using SimpleAnalyzer.
>
>
>  Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());
>
>  It should return only one result but returns 200.
>
>
>
>
>


RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Actually Erik, no, I'm using Field.Text.
When I used Field.Keyword and tried to retrieve the value to show with the
search results, it would not display correctly... 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result


You really should use a TermQuery in this case anyway, rather than 
using QueryParser.  You wouldn't have to worry about the analyzer at 
that point anyway (and I assume you're using Field.Keyword during 
indexing).

Erik


On Thursday, December 4, 2003, at 05:01  PM, Pleasant, Tracy wrote:

> Ok I realized the Simple Analyzer does not index numbers, so I switched
> back to Standard.
>
> -Original Message-
> From: Pleasant, Tracy
> Sent: Thursday, December 04, 2003 4:53 PM
> To: Lucene Users List
> Subject: Returning one result
>
>
>  I am indexing a group of items and one field, id, is unique.  When the
> user clicks on a result I want just that one result to show.
>
>  I index and search using SimpleAnalyzer.
>
>
>  Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());
>
>  It should return only one result but returns 200.
>
>
>
>
>



RE: Returning one result

2003-12-05 Thread Pleasant, Tracy

Ok thanks, but I still can't use the Simple analyzer since it won't even
index the whole ID. I'll give TermQuery a try. Thanks.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result


You really should use a TermQuery in this case anyway, rather than 
using QueryParser.  You wouldn't have to worry about the analyzer at 
that point anyway (and I assume you're using Field.Keyword during 
indexing).

Erik


On Thursday, December 4, 2003, at 05:01  PM, Pleasant, Tracy wrote:

> Ok I realized the Simple Analyzer does not index numbers, so I switched
> back to Standard.
>
> -Original Message-
> From: Pleasant, Tracy
> Sent: Thursday, December 04, 2003 4:53 PM
> To: Lucene Users List
> Subject: Returning one result
>
>
>  I am indexing a group of items and one field, id, is unique.  When the
> user clicks on a result I want just that one result to show.
>
>  I index and search using SimpleAnalyzer.
>
>
>  Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());
>
>  It should return only one result but returns 200.
>
>
>
>
>



RE: Returning one result

2003-12-04 Thread Pleasant, Tracy

Ok I realized the Simple Analyzer does not index numbers, so I switched
back to Standard.
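
A letters-only tokenizer of the kind SimpleAnalyzer is built on can be sketched in plain Java. This is an illustration, not Lucene's code: the class LetterTokenizerSketch and the method letterTokens are hypothetical. It shows how ids that differ only in their digits end up as the same token, which may explain why a query on one id returned 200 results.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenizerSketch {
    // Splits text at every non-letter and lower-cases what remains.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a digit or symbol ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The digits are dropped, so both ids reduce to the same single token.
        System.out.println(letterTokens("AR345")); // [ar]
        System.out.println(letterTokens("AR346")); // [ar]
    }
}
```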

-Original Message-
From: Pleasant, Tracy 
Sent: Thursday, December 04, 2003 4:53 PM
To: Lucene Users List
Subject: Returning one result

 I am indexing a group of items and one field, id, is unique.  When the
user clicks on a result I want just that one result to show.  

 I index and search using SimpleAnalyzer.

 Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());

 It should return only one result but returns 200.


Returning one result

2003-12-04 Thread Pleasant, Tracy

 I am indexing a group of items and one field, id, is unique.  When the
user clicks on a result I want just that one result to show.  

 I index and search using SimpleAnalyzer.

 
 Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());
  
 It should return only one result but returns 200.
  
 
 
 


RE: FileDocument.java

2003-11-28 Thread Pleasant, Tracy

Can you give some info on the file type and how you are printing the results?

Does the contents display correctly?

-Original Message-
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2003 10:59 AM
To: Lucene user list
Subject: FileDocument.java

Hi Lucene experts,
Can you help on this?
I have included the following code in FileDocument to print out the summary but
I have funny output like:
The result after searching, the summary is displayed as below:
ÐÏà¡±á>þÿ
UWþÿÿÿTÿ

ÿ
FileInputStream is = new FileInputStream(f);
try
{
    Reader reader = new BufferedReader(new InputStreamReader(is));
    char [] buf = new char[512];
    reader.read(buf);

    String a = new String(buf, 0, 510);
    doc.add(Field.Text("contents", reader));
    doc.add(Field.UnIndexed("summary", a ) );// return the document
}catch (IOException e)
{
    e.printStackTrace();
}
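
The characters in the garbled summary appear consistent with the first bytes of an OLE2 compound file (the container used by binary Word documents) read as text. A minimal sketch of a check for that signature, in plain Java, might look like the following. The class Ole2CheckSketch and its methods are hypothetical names; a file that matches would need a proper text extractor instead of being read through a Reader.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class Ole2CheckSketch {
    // The eight signature bytes at the start of an OLE2 compound file.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given leading bytes start with the OLE2 signature.
    static boolean looksLikeOle2(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads the first eight bytes of the file and checks them.
    static boolean looksLikeOle2(File f) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readFully(head);
        } catch (EOFException e) {
            return false; // file is shorter than the signature
        }
        return looksLikeOle2(head);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeOle2(OLE2_MAGIC));              // true
        System.out.println(looksLikeOle2("plain text".getBytes())); // false
    }
}
```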


RE: Eliminating duplicate result

2003-11-26 Thread Pleasant, Tracy

If you are searching for the same term and searching the same index twice, it 
will return the same results... 

I don't get what you are asking.

-Original Message-
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 3:19 AM
To: Lucene Users List
Subject: Re: Eliminating duplicate result

> When you are doing two searches are you searching for two different terms?
> 

No, I am searching for the same term.

What is the easiest way to eliminate duplicate documents if one is doing two searches 
on the same index?

Has anybody done something similar?
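
One common approach is to merge the two result lists by a unique key and keep only the first occurrence of each. A minimal sketch in plain Java follows. It assumes each hit can be reduced to a unique id string, and the class MergeHitsSketch and method mergeUnique are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeHitsSketch {
    // Merges two lists of document ids, dropping duplicates and keeping
    // the order in which each id was first seen.
    static List<String> mergeUnique(List<String> first, List<String> second) {
        Set<String> seen = new LinkedHashSet<>(first);
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("AR1", "AR2");
        List<String> b = Arrays.asList("AR2", "AR3");
        System.out.println(mergeUnique(a, b)); // [AR1, AR2, AR3]
    }
}
```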


RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy

Erik,

I think there may be a typo in the website.

When I run the AnalyzerDemo :

Analzying "xy&z corporation - [EMAIL PROTECTED]"
org.apache.lucene.analysis.standard.StandardAnalyzer:
[xy&z] [corporation] [EMAIL PROTECTED] 

Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xy&z] [corporation] [EMAIL PROTECTED] [com] 

When I run it, it keeps the entire email '[EMAIL PROTECTED]',
but according to your website it separates the '[EMAIL PROTECTED]' from the
'com'.

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus I think what I want is a StandardAnalyzer with a little tweaking.
The simple one was fine until I realized that it doesn't handle numbers,
which I need as part of my search since numbers are important for what
I'm doing. The Standard one handles numbers but I need it to behave a
little differently. Thanks for the site.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results

On Tuesday, November 25, 2003, at 12:11  PM, Pleasant, Tracy wrote:
>
> The documents I have indexed contain information regarding file names 
> also.
>
> For instance 'return_results.pl' or something like that may be in the 
> document fields.
>
> I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return anything
> 2. If I search for 'results' or 'return', the search does not return anything
> 3. If I search for 'results.pl', the search does return the document
>    containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document
>    containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return anything
>
> What is going on?
>
> I want it to return the document in all of the situations.
>
> I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :)  Analysis!

Please refer to my article at java.net:

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code.  Copy it over and try it out on the text 
you're using and the Analyzer you're using.  The bracketed text that 
comes out are the "tokens" that you can search on.  It is very very 
important to understand this process and to really know what terms come 
out of text you hand it - otherwise it is a mystery why some things can 
be found and some things cannot despite your expectations to the 
contrary.

A follow-up to the Analysis is querying - and QueryParser has its own 
set of quirks and caveats related to how things are tokenized/analyzed. 
  And, I've got just the follow-up article for you handy...

http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first please) then I 
think a lot of questions that get asked on this list will be implicitly 
answered.  Understanding analysis is key.

Erik


RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy

It seems like what I should be using is something more like a
SimpleAnalyzer or StopAnalyzer.

I've changed my code and the query to use SimpleAnalyzer.

But now I have another question.

Let's say I have 'return_results.pl' in the document in one of the
fields. 

When I search for return_res* or return_res~ it won't return the
document.

But searching for any of these does return the document:
1. 'return_results'
2. 'results' or 'return'
3. 'results.pl'
4. 'results~'
5. 'return_results~'

I guess I have to read more about the '*' and '~'?
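
One way to see why the 'return_res*' query above fails: SimpleAnalyzer breaks text at every non-letter character and lowercases it, so 'return_results.pl' is indexed as the three terms return, results and pl. As far as I know, QueryParser analyzes ordinary terms the same way, but a prefix pattern such as return_res* is compared with the indexed terms as typed, and no indexed term starts with 'return_res'. The sketch below approximates this in plain Java with no Lucene dependency. The class and method names are made up for illustration, it is not Lucene's actual code, and the '~' case is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleAnalysisSketch {

    // Approximates SimpleAnalyzer: break at every non-letter, lowercase each token.
    // Digits are non-letters here, which is why numbers disappear.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumption: a prefix pattern is matched against indexed terms without analysis.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("return_results.pl");
        System.out.println(terms);                              // the indexed terms
        System.out.println(prefixMatches(terms, "return_res")); // underscore never survives indexing
        System.out.println(prefixMatches(terms, "res"));        // a prefix of "results"
    }
}
```

Under these assumptions, searching for 'return*' or 'res*' would find the document, while any pattern containing the underscore would not.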


RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy

Thanks this helps a lot :)


RE: Hits Highlighting

2003-11-25 Thread Pleasant, Tracy

I have seen that one, but it doesn't include the source code, only the
jar with classes.

I need something to actually highlight, as if you took a yellow
marker and highlighted the text, not just render it in bold.  
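
A minimal sketch of the "yellow marker" idea in plain Java, assuming the results are rendered as HTML and the text has already been HTML-escaped. It is not the contributed highlighter and uses no Lucene classes; the class name and the inline style are illustrative choices. It matches whole words case-insensitively and handles several search terms at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YellowHighlighter {

    // Wraps whole-word, case-insensitive occurrences of each term in a yellow span.
    // Assumes 'text' is already HTML-escaped.
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term)); // treat the term literally
        }
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        return matcher.replaceAll("<span style=\"background-color: yellow\">$1</span>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("The red car and the Red bus", "red", "bus"));
    }
}
```

Because it works on the raw text rather than on analyzed tokens, this sketch will miss matches that only exist after analysis (stemming, for example).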

-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:29 PM
To: Lucene Users List
Subject: Re: Hits Highlighting

Hi,

The lucene home page has a lot of resources, including the FAQs,
articles, javadocs and contributions. 

For instance, there's a query highlighter in the contributions page. 

On Tue, Nov 25, 2003 at 12:17:41PM -0500, Pleasant, Tracy wrote:
> 
>  Are there any  hits highlighting functions? 
> 
>  I have a simple one, but it gets complicated with searching multiple
> words, having tokens, etc.

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com


RE: Search Question

2003-11-25 Thread Pleasant, Tracy

Searching 'red_*' also returns nothing.





-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:22 PM
To: Lucene Users List
Subject: Re: Search Question


No, but if you use the standard analyzer, searching "red*" will return
documents with "red_car"

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
> 
>  If I have words within a document like 
>  
>  red_car
>  
>  If I search for 'red' would it return documents containing 'red_car'?


-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com


RE: Search Question

2003-11-25 Thread Pleasant, Tracy

 How come if I search for 'red_car*' it returns nothing?

 I am using standard analyzer, too. 


Hits Highlighting

2003-11-25 Thread Pleasant, Tracy


 Are there any  hits highlighting functions? 

 I have a simple one, but it gets complicated with searching multiple
words, having tokens, etc.


 


Search Question - not returning desired results

2003-11-25 Thread Pleasant, Tracy


The documents I have indexed contain information regarding file names also.

For instance 'return_results.pl' or something like that may be in the document fields.

I am not understanding Lucene's way of searching:

1. If I search for 'return_results', the search does not return anything
2. If I search for 'results' or 'return', the search does not return anything
3. If I search for 'results.pl', the search does return the document containing 
'return_results.pl' 
4. If I search for 'results~', the search does return the document containing 
'return_results.pl' 
5. If I search for 'return_results~', the search does not return anything

What is going on? 

I want it to return the document in all of the situations.

I also don't want to have to use '~' all the time.




Search Question

2003-11-25 Thread Pleasant, Tracy


 If I have words within a document like 
 
 red_car
 
 If I search for 'red' would it return documents containing 'red_car'? 

 


Searching different types of words

2003-11-25 Thread Pleasant, Tracy

If I search for "like" I would want the search to return documents
containing "like", "liked", "likes", etc.. variations of the word.

Is there a way to tell Lucene to do this? 
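
The usual technique for this is stemming: reduce every word to a common root at index time and again at query time, so that "like", "liked" and "likes" all become the same term. Lucene ships a PorterStemFilter that can be used inside a custom Analyzer. The sketch below is not that filter and not the Porter algorithm. It is a deliberately crude suffix stripper in plain Java, with made-up names, that only shows the principle.

```java
import java.util.Locale;

final class NaiveStemmer {

    // Crude suffix stripping: drop one of "ing", "ed", "s", then a trailing "e".
    // Real stemmers handle far more cases; this only illustrates the idea.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
        } else if (w.length() > 4 && w.endsWith("ed")) {
            w = w.substring(0, w.length() - 2);
        } else if (w.length() > 3 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"like", "liked", "likes", "liking"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

The root does not have to be a real word. What matters is that the same function is applied to both the indexed text and the query, so the variations meet at one term.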




RE: Lucene refresh index function (incremental indexing).

2003-11-25 Thread Pleasant, Tracy

I vaguely remember I had a problem back when I used 0.6.2. I reverted and used 
0.6.1 instead. I haven't had any problems since.


-Original Message-
From: Zhou, Oliver [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:03 AM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).


I do have other problems with PDFBox-0.6.4.  For one, it prints annoying debug
information from the very low-level parsing process.  For another, I got an infinite
loop while indexing pdf files although the release notes say the infinite loop bug has
been fixed.  Does anybody know what's going on?

Thanks,
Oliver

 

-Original Message-
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 9:45 AM
To: Lucene Users List
Subject: RE: Lucene refresh index function (incremental indexing).



Yes, just add the log4j configuration.  The easiest way to do that is as a
system parameter like this

java -Dlog4j.configuration=log4j.xml org.apache.lucene.demo.IndexHTML
-create -index c:\\index ..

Where log4j.xml is the path to your log4j config, PDFBox has an example
one you can use.

Ben
http://www.pdfbox.org

On Tue, 25 Nov 2003, Zhou, Oliver wrote:

> Lucene doesn't have a PDF parser.  In order to index PDF files you have to add
> one yourself.  PDFBox is a good choice.  You may just ignore the warning
> for log4j or you can add log4j to your classpath.
>
> Oliver
>
>
> -Original Message-
> From: Tun Lin [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2003 10:07 PM
> To: 'Lucene Users List'
> Subject: RE: Lucene refresh index function (incremental indexing).
>
>
> Does it support indexing the contents of pdf files? I have found one project
> called PDFBox that can be integrated with Lucene to search inside of the pdf
> files. Currently, Lucene can only search for the pdf filename. I tried with
> PDFBox and I got the following message when I typed the command: java
> org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
>
> log4j:WARN No appenders could be found for logger
> (org.pdfbox.pdfparser.PDFParser).
> log4j:WARN Please initialize the log4j system properly.
>
> Can anyone advise?
>
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 25, 2003 5:01 AM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> Tun Lin wrote:
> > These are the steps I took:
> >
> > 1) I compile all the files in a particular directory using the command:
> > java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> > , putting all the indexed files in c:\\index.
> > 2) Every time I add an additional file to that directory, I need to
> > reindex/recompile that directory to generate the indexes again. As the
> > directory gets larger, the indexing takes a longer time.
> >
> > My question is how do I generate the indexes automatically every time a
> > new document is added in that directory without me recompiling every time
> > manually?
>
> To update, try removing the '-create' from the command line.  The demo code
> supports incremental updates.  It will re-scan the directory and figure out
> which files have changed, what new files have appeared and which previously
> existing files have been removed.
>
> Doug

RE: Eliminating duplicate result

2003-11-25 Thread Pleasant, Tracy

When you are doing two searches are you searching for two different terms?

-Original Message-
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:14 AM
To: Lucene Users List
Subject: Eliminating duplicate result

What is the easiest way to eliminate duplicate documents if one is doing two searches 
on the same index?

Has anybody done something similar?
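
One possible approach, sketched in plain Java: read a unique key for each hit from both result sets and merge them through an insertion-ordered set, so a document returned by both searches appears only once. This assumes every document was indexed with some unique key field (the name and format of that key are hypothetical here); the sketch works on lists of such keys rather than on Lucene's own result objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MergeHits {

    // Concatenates two lists of unique document keys, keeping only the
    // first occurrence of each key and preserving the original order.
    static List<String> mergeWithoutDuplicates(List<String> first, List<String> second) {
        Set<String> seen = new LinkedHashSet<>(first);
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> searchOne = Arrays.asList("doc-1", "doc-7", "doc-3");
        List<String> searchTwo = Arrays.asList("doc-7", "doc-9");
        System.out.println(mergeWithoutDuplicates(searchOne, searchTwo));
    }
}
```

Documents from the first search keep their ranking order; documents found only by the second search are appended after them.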


RE: XML support in Lucene

2003-11-25 Thread Pleasant, Tracy

This may help you: 

http://www.jguru.com/faq/view.jsp?EID=1074235

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:07 AM
To: [EMAIL PROTECTED]
Subject: XML support in Lucene

Hello group,

does Lucene offer an effective and flexible way to treat XML files? I know
that as soon as an InputStream is provided Lucene can basically index
(possibly after cleaning) everything. How is it with XML files?

If there is a way, is it possible to have one big XML file with many
individual parts in it? These parts should be treated as documents and the
repeated XML tags as fields.

Here an example:

Tim
How are you? Tom

 Linda
bla bla bla

Has somebody already developed classes which go through this XML file,
create TWO documents with the fields "From" and "Content" and fill in the text
between the tags? The indexing business should then be the same since it is
abstracted behind the Document object. The same goes for the search process. The
search process however could be optimised with structural information (i.e.
only search in "Content")...
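
A minimal sketch of the splitting step, using only the JDK's DOM parser. The field names "From" and "Content" come from the question above; the wrapper element names "mailbox" and "message" are assumptions, since the tags of the sample did not survive in the archive. Each returned map would then become one Lucene document with one field per entry; that indexing step is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlRecordSplitter {

    // Returns one field map per <message> element ("message" is a hypothetical name).
    static List<Map<String, String>> split(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList records = dom.getElementsByTagName("message");
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                fields.put("From", textOf(record, "From"));
                fields.put("Content", textOf(record, "Content"));
                result.add(fields);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first child element with the given tag, or "" if it is absent.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<mailbox>"
                + "<message><From>Tim</From><Content>How are you? Tom</Content></message>"
                + "<message><From>Linda</From><Content>bla bla bla</Content></message>"
                + "</mailbox>";
        System.out.println(split(xml));
    }
}
```

For one very large file a streaming parser would use less memory than DOM; DOM is used here only to keep the sketch short.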

Cheers,
Ralph


RE: Tokenizing text custom way

2003-11-25 Thread Pleasant, Tracy

Not exactly an answer to the question, but I haven't yet used the Token 
classes/functionality that come with Lucene. Can someone give me an idea of how and 
why one might use them?

-Original Message-
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 6:42 AM
To: Lucene Users List
Subject: Tokenizing text custom way

Hi. I need to tokenize text while indexing but I don't want space to be the delimiter. 
The delimiter should be my custom character (for example a comma). I understand that I would 
probably need to implement my own analyzer, but could someone help me with where to start? 
Is there any other way to do this without writing a custom analyzer?

This is what I want to achieve.
If I have some text that will be indexed like following:

man, people, time out, sun

and if I enter 'time' as a search word, I don't want to get "time out" in results. I 
need exact keyword matching. I would achieve this if I tokenized "time out" as one 
token while indexing.

Has someone had a similar problem? If someone knows how to handle this, please help me.
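
The behaviour asked for can be sketched in plain Java: split only at commas, trim each piece, and treat each piece as one keyword, so "time out" stays a single token. In Lucene this logic would probably live in a small custom Tokenizer inside a custom Analyzer; that wiring is not shown, and the class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CommaKeywords {

    // Splits at commas only; inner spaces are kept, so "time out" is one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Exact keyword matching: the whole keyword must equal a whole token.
    static boolean matches(List<String> tokens, String keyword) {
        return tokens.contains(keyword.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("man, people, time out, sun");
        System.out.println(tokens);
        System.out.println(matches(tokens, "time"));     // not a whole keyword
        System.out.println(matches(tokens, "time out")); // a whole keyword
    }
}
```

The query side needs the same care: the search word has to be looked up as one whole term rather than being split at spaces again.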

Dragan Jotanovic


RE: Score

2003-11-25 Thread Pleasant, Tracy

Thanks for your input. 
I am using the standard analyzer for everything. I haven't created my
own analyzer yet.

The documents I am using: 

Plain text
PDF Documents

(I have two indexes) 

When I create my index: 
   IndexWriter writer = new IndexWriter(index_name, new StandardAnalyzer(), true);

When I search:
   Analyzer analyzer = new StandardAnalyzer();
   query = MultiFieldQueryParser.parse(queryString, fields, analyzer); 
(where queryString is the term to search and fields is the array of
fields)

When searching it does the one index then it does the other. 

When you say you use different analyzers for different fields in your
index, how would you accomplish that? When I create the index it takes a
single analyzer parameter. Unless you create different indexes, how do
you use two different ones? 

-Original Message-
From: Gerret Apelt [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 3:25 PM
To: Lucene Users List
Subject: Re: Score

Tracey --

it would help if you could give more detail on the types of documents, 
fields and analyzers you're using. Also what do you mean by "Multi Field
Search"? I presume you're using the MultiFieldQueryParser to have query 
terms in a user-submitted query be searched for in each field in your
index.

If I am understanding your problem, then it might be the same one I had 
a few weeks ago -- highly relevant matches would not receive a high 
ranking. (This paragraph will apply to you only if you use more than 
just one Analyzer for the set of your fields). I had six fields in my 
index, most of which were populated with a standard analyzer. I used 
self-made Analyzers for two of the fields. This turned out to be my 
problem when using MultiFieldQueryParser: I told my 
MultiFieldQueryParser instance to use only the standard analyzer. 
Instead I discovered that I needed to make use of 
org.apache.lucene.analysis.PerFieldAnalyzerWrapper and feed that to the 
MultiFieldQueryParser. Unless you do this, your problem is what's 
described here: 
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15
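
The idea behind PerFieldAnalyzerWrapper can be sketched without Lucene: one object holds a default analyzer plus per-field overrides and picks the right one by field name, so a single object can be handed to both the index writer and the query parser. The sketch below models an analyzer as a plain function from text to tokens. The method name addAnalyzer mirrors the real class as I remember it, but everything here is an illustrative stand-in, not Lucene code, and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class PerFieldAnalysisSketch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers an override for one field; all other fields use the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Stand-in for a general-purpose analyzer: lowercase, split at whitespace.
    static List<String> whitespaceLowercase(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Stand-in for a keyword-style analyzer: the whole value is one token.
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::whitespaceLowercase);
        wrapper.addAnalyzer("filename", PerFieldAnalysisSketch::wholeValue);
        System.out.println(wrapper.analyze("contents", "Red Car"));
        System.out.println(wrapper.analyze("filename", "return_results.pl"));
    }
}
```

The point of the design is that indexing and searching go through the same dispatch, so a field is never indexed with one analyzer and queried with another.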

Most likely, if your scoring is off, you're "doing something wrong" in 
the way you use the Lucene API -- at least, that's what I've discovered 
to be the case when my ranking is off.

If you're interested in the nitty-gritty of how scoring is done, check 
this FAQ entry:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

cheers,
Gerret

Pleasant, Tracy wrote:

>Hi,
>
>I'm using the Multi Field Search to search all the fields of my
>documents during the search. 
>
>When it returns results the scores are numerically low - .06, .17, etc.
>I would think if I searched for "Dog" and there was a doc with "Dog" in
>the title and several times in the contents of a document that it would
>receive a score more like 1.0 or close to it.
>
>Is there a way that I can tweak the score?
>
>I tried using Boost but that did absolutely nothing.

RE: Lucene refresh index function (incremental indexing).

2003-11-25 Thread Pleasant, Tracy

I was able to get PDFBox to work with my JSP webpages. 

I think you will have to write some of your own code to handle the PDF
files (while still calling the Lucene functions), for example:

 doc = LucenePDFDocument.getDocument(file);

-Original Message-
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 11:07 PM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).

Does it support indexing the contents of pdf files? I have found one project
called PDFBox that can be integrated with Lucene to search inside of the pdf
files. Currently, Lucene can only search for the pdf filename. I tried with
PDFBox and I got the following message when I typed the command: java
org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 

log4j:WARN No appenders could be found for logger
(org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
> 
> 1) I compile all the files in a particular directory using the command: 
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
> , putting all the indexed files in c:\\index.
> 2) Every time I add an additional file to that directory, I need to 
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes a longer time.
> 
> My question is how do I generate the indexes automatically every time a
> new document is added in that directory without me recompiling every time
> manually?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.
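
The comparison described above can be sketched in plain Java. This is not the demo's actual code: it assumes the index can report, for each indexed file, the path and the last-modified time recorded when it was indexed, and it compares that with a fresh scan of the directory. A changed file shows up in both sets, because its stale entry has to be deleted before the new version is added.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IncrementalScan {

    // Files that are new, or whose modification time differs from the indexed one.
    static Set<String> toAdd(Map<String, Long> indexed, Map<String, Long> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long indexedTime = indexed.get(entry.getKey());
            if (indexedTime == null || !indexedTime.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Index entries whose file is gone, or whose file has changed since indexing.
    static Set<String> toDelete(Map<String, Long> indexed, Map<String, Long> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : indexed.entrySet()) {
            Long currentTime = current.get(entry.getKey());
            if (currentTime == null || !currentTime.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("a.html", 100L);
        indexed.put("b.html", 200L);
        indexed.put("c.html", 300L);
        Map<String, Long> current = new HashMap<>();
        current.put("a.html", 100L); // unchanged
        current.put("b.html", 250L); // modified
        current.put("d.html", 400L); // new; c.html was removed
        System.out.println("delete: " + toDelete(indexed, current));
        System.out.println("add:    " + toAdd(indexed, current));
    }
}
```

Unchanged files are never touched, which is why an incremental run stays fast even as the directory grows.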

Doug


Score

2003-11-24 Thread Pleasant, Tracy

Hi,

I'm using the Multi Field Search to search all the fields of my
documents during the search. 

When it returns results the scores are numerically low - .06, .17, etc.
I would think if I searched for "Dog" and there was a doc with "Dog" in
the title and several times in the contents of a document that it would
receive a score more like 1.0 or close to it.

Is there a way that I can tweak the score?

I tried using Boost but that did absolutely nothing.

