Hi,
This is a kind of followup to a thread a couple of weeks ago.
In my indexer, I want to pre-pend a string to certain terms to make it easier
to search. So for example, if I have a string XXX, I want to add, say,
field1 to it, to get field1XXX before I index it.
To make it easier to see
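A minimal sketch of the prefixing idea described above, in plain Java with no Lucene calls. The class and method names (TermPrefixer, prefix) are hypothetical and only for illustration; "field1" and "XXX" are the example values from the message. Presumably the same prefix would also have to be applied to the search term at query time, otherwise the prefixed term would never match.

```java
// Hypothetical helper, not part of Lucene: builds the prefixed form of a term
// before it is handed to the indexer.
final class TermPrefixer {
    private TermPrefixer() {}

    // Returns the tag followed directly by the term, e.g. "field1" + "XXX".
    static String prefix(String fieldTag, String term) {
        return fieldTag + term;
    }
}
```

For example, TermPrefixer.prefix("field1", "XXX") gives the string field1XXX, which is what would then be added to the document in place of XXX.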
Hi,
I guess that, in short, what I'm really trying to find out is:
If I construct a Lucene query, can I (somehow) use that to query a String
object that I have, rather than querying against a Lucene index?
Thanks,
Jim
oh...@cox.net wrote:
Hi,
This question is going to be a little
Paul Cowan co...@aconex.com wrote:
oh...@cox.net wrote:
Document1 subdoc1 term1 term2
subdoc2 term1a term2a
subdoc3 term1b term2b
However, I've now been asked to implement the ability to
Paul Cowan co...@aconex.com wrote:
oh...@cox.net wrote:
- I'd have to create a (very small) index, for each sub-document, where I
do the Document.add() with just the (for example) two terms, then
- Run a query against the 1-entry index, which
- Would either give me a yes or no
Andrzej,
Hah!
I tried as you suggested using Luke, and I found at least part of my problem.
Luke was defaulting to KeywordAnalyzer.
I changed that to StandardAnalyzer, and did queries for:
path:x
and
path:xx.dat
For the first, the Rewritten was:
Hi Matt,
Good catch! As I just posted, I *just* noticed that (Luke uses
KeywordAnalyzer) :)!!!
Once I switched Luke to using StandardAnalyzer, the Luke search results
matched my web query results.
Thanks!
Jim
Matthew Hall mh...@informatics.jax.org wrote:
Luke defaults to
Hi,
I've been developing my indexer app, which uses StandardAnalyzer, on a
Windows machine, and today I deployed an initial version onto a Red Hat Linux
(RHEL) machine.
On my development machine, I have the files that are being indexed in something
like:
Hi,
In my indexer app (based on the IndexFiles.java demo), I am adding the path
field:
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.ANALYZED));
Per Luke, the full path (e.g., c:\\.yyy) gets parsed, and one of the
terms (again, per Luke) is , i.e., the
Phil,
Both my indexer and the webapp are basically from the Lucene demos, the indexer
starting with the IndexFiles.java demo code, so I think they're both using the
StandardAnalyzer.
What appears in Luke, when I select path is just the filename part, without
the extension, i.e., the
Phil,
I need to be more precise...
The files that I have are at, say:
C:\dir1\dir2\
so, for example, I have
C:\dir1\dir2\file-1-1.dat
C:\dir1\dir2\file-1-2.dat
C:\dir1\dir2\file-1-3.dat
C:\dir1\dir2\file-1-4.dat
C:\dir1\dir2\file-1-5.dat
After indexing, and, using Luke, I look at the path
Hi Phil,
Well, kind of... but...
Then, why, when I do the search in Luke, do I get the results I cited:
== succeeds
.yyy == fails (no results)
I guess that I've been assuming that the search in Luke is correct and I've
been using that to test my understanding, but maybe that's an
Hi,
I have an app to initially create a Lucene index, and to populate it with
documents. I'm now working on that app to insert new documents into that
Lucene index.
In general, this new app, which is based loosely on the demo apps (e.g.,
IndexFiles.java), is working, i.e., I can run it with
Hi Ian,
Thanks for the quick response.
I forgot to mention, but in our case, the producers are part of a commercial
package, so we don't have a way to get them to change anything, so I think the
first 3 suggestions are not feasible for us.
I have considered something like the 4th suggestion
Ian,
One question about the 4th alternative: I was wondering how you implemented
the sleep() in Java, esp. in such a way as not to mess up any of the Lucene
stuff (in case there's threading)?
Right now, my indexer/inserter app doesn't explicitly do any threading stuff.
Thanks,
Jim
Hi Ian,
Ok, thanks for the additional info.
I've implemented a check on both file.lastModified() and file.length(), and it
seems to work in my dev environment (Windows), so I'll have to test on a real
system.
Thanks again,
Jim
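A rough sketch of the kind of check mentioned above: sample file.lastModified() and file.length(), sleep, then sample again, and treat the file as finished only if neither value changed. This is an assumption about how such a check could look, not the code from this thread; the class name FileStability, the method name isStable, and the wait interval are all hypothetical. Thread.sleep() only pauses the thread that calls it, so in an indexer that does no threading of its own it simply delays that one file.

```java
import java.io.File;

// Hypothetical helper: decides whether a file still appears to be changing.
final class FileStability {
    private FileStability() {}

    // Returns true if the file exists and its timestamp and size are
    // identical before and after waiting waitMillis milliseconds.
    static boolean isStable(File file, long waitMillis) throws InterruptedException {
        if (!file.isFile()) {
            return false; // missing file or directory: nothing to index yet
        }
        long modifiedBefore = file.lastModified();
        long lengthBefore = file.length();

        Thread.sleep(waitMillis); // pauses only the calling thread

        return file.isFile()
                && file.lastModified() == modifiedBefore
                && file.length() == lengthBefore;
    }
}
```

The indexer would call something like FileStability.isStable(f, 2000) before indexing f and skip the file for now if the result is false. The 2000 ms value is a guess that would need tuning against how the producer actually writes its files.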
Ian Lea ian@gmail.com wrote:
Jim
The sleep is
Hi,
BTW, my indexer app is basically the same as the demo IndexFiles.java. Here's
part of the main:
try {
IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(),
true, IndexWriter.MaxFieldLength.LIMITED);
System.out.println("Indexing to directory '" +INDEX_DIR+
Hi Phil,
As for the problem with my app, it wasn't what you suggested (about the tokens, etc.).
For some later things, my indexer creates both a path field that is analyzed
(and thus tokenized, etc.) and another field, fullpath, which is not analyzed
(and thus, not tokenized).
The problem with my
se3g2...@gmail.com wrote:
Hi, going by the error messages you listed below, please move the
'reader.close()' block to the bottom of the method.
I think that if you invoke it first, the underlying stream is closed, so the
exceptions are encountered.
ohaya wrote:
Hi,
I changed the beginning
Hi,
I'm starting to work on an app to list all of the terms in the path field.
I'm including the beginning of my code below.
When I run this, pointing it to a directory named index containing the Lucene
indexes, I am getting a java.io.IOException.
Here's the output when I run:
Index in
Phil,
Yes, that exception is not very helpful :)!!
I'll try your suggestions and post back.
Thanks,
Jim
Phil Whelan phil...@gmail.com wrote:
Hi Jim,
I cannot see anything obvious, but both open() and terms() throw
IOException's. You could try putting these in separate try..catch
Phil,
I posted in haste. Actually, from the output that I posted, doesn't it look
like the .next() itself is throwing the exception?
That is what has been puzzling me. It looks like it got through the open() and
terms() with no problem, then it blew up when calling the next()?
Jim
Hi,
I changed the beginning of the try to:
try {
System.out.println("About to call .next()...");
boolean foo = termsEnumerator.next();
System.out.println("Finished calling first .next()");
Hi,
I don't know what happened, but all of a sudden, it started working :(...
Jim
oh...@cox.net wrote:
Hi,
I changed the beginning of the try to:
try {
System.out.println("About to call .next()...");
boolean foo =
Phil Whelan phil...@gmail.com wrote:
On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote:
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by "special" is characters that the analyzer considers "break"
characters.
For example, if I
Hi Ahmet,
Thanks for the clarification and information! That was exactly what I was
looking for.
Jim
AHMET ARSLAN iori...@yahoo.com wrote:
I guess that the obvious question is "Which characters are
considered 'punctuation characters'?".
Punctuation = (_|-|/|.|,)
In
Hi,
I'm still new to Lucene, but I think I have an initial indexer app (based on
the demo IndexFiles app) working, and also a web app, based on the demo
luceneweb web app.
I'm still busy tweaking both, but am starting to think ahead, about operational
type issues, esp.
Hi,
Phil and Ian,
Thanks for the responses and confirmations about this.
Assuming that our requirements (as I described earlier) don't change, it looks
like this updating/inserting thing should be pretty easy :)!
Later, and have a great weekend!
Jim
Phil Whelan phil...@gmail.com
Hi,
Sorry to jump in, but I've been following this thread with interest
:)...
Am I misunderstanding your original observation, that
ThreadedIndexWriter produced a smaller index? Did the ThreadedIndexWriter
also finish faster (I'm assuming that it should)?
If the index is smaller, and
Hi,
I don't know the answer to your questions, but I'm guessing that the answer to
#3 is probably because of the answers to #1 and #2.
Did you try to look at the indexes using Luke? That shows the top 50 terms
when it starts, so it might be obvious what the differences are, and that might
Hi,
I am trying to index information in some proprietary-formatted files.
In particular, these files contain some IP addresses in dotted notation, e.g.,
aa.bb.cc.dd.
For my initial test, I have a Document implementation, and after I extract what
I need into a String named Info, I do:
Hi,
I am working with a modified version of the demo IndexFiles.
In that code, when it builds the index, it has:
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.NOT_ANALYZED));
In Luke, I can see all the file paths in the path field.
I am also using the demo luceneweb
Hi,
Oh. Ok, thanks! I'll give that a try.
Jim
Armasu wrote:
Keyword: Field.Index.NOT_ANALYZED
-Original Message-
From: oh...@cox.net [mailto:oh...@cox.net]
Sent: Thursday, July 30, 2009 4:36 PM
To: java-user@lucene.apache.org
Subject: How to index IP addresses?
Hi,
Ian,
I'll respond to this msg, re. searching path.
I made the change you suggested, to Field.Index.ANALYZED, and that fixed the
problem I was having with searching for components of the path field.
Thanks!
Jim
Ian Lea ian@gmail.com wrote:
In contrast to your last question and
Hi Matthew and Narcis,
I think that I found the (original) problem.
It looks like all those other terms that I was getting, which looked to me
like the octets, weren't the octets :)...
When I was doing the doc.add(), there were some other numbers (not IP
addresses) in the String
prashant ullegaddi prashullega...@gmail.com wrote:
How to get the number of times a term occurs in the Lucene index?
Regards,
Prashant.
Hi,
You didn't mention if you were looking for something programmatic or not, but
there's a tool called Luke, and when you start that up and point
Hi,
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by "special" is characters that the analyzer considers "break"
characters. For example, if I have something like foo=something, apparently
the analyzer considers this as two terms, foo and
Hi,
I'm just starting to work with Lucene, and I guess that I learn best by
working with code, so I've started with the demos in the Lucene
distribution.
I got the IndexFiles.java and IndexHTML.java working, and also the
luceneweb.war is deployed to Tomcat.
I used IndexFiles.java to index
Ian and Matthew,
I've tried foofoo, summary:foofoo, FooFoo, and summary:FooFoo. No
results returned for any of those :(.
Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think
that's the problem either :(...
I looked at the SearchFiles.java code, and it looks like it's
Matthew,
I'll keep your comments in mind, but I'm still confused about something.
I currently haven't changed much in the demo, other than adding that doc.add
for summary.
With JUST that doc.add, having done my reading, I kind of expected NOT to be
able to search on the summary at all, but it
Matthew,
Ok, thanks for the clarifications.
When I have some quiet time, I'll try to re-do the tests I did earlier and post
back if any questions.
Thanks again,
Jim
Matthew Hall mh...@informatics.jax.org wrote:
Oh.. no.
If you specifically include a fieldname: blah in your clause,