Are there any non-alpha/numeric characters that StandardAnalyzer won't treat as a break?

2009-08-21 Thread ohaya
Hi, This is a kind of follow-up to a thread from a couple of weeks ago. In my indexer, I want to prepend a string to certain terms to make them easier to search. For example, if I have a string XXX, I want to add, say, field1 to it, to get field1XXX before I index it. To make it easier to see
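A minimal sketch of the prefixing idea, assuming the field is indexed and queried with Lucene's StandardAnalyzer in its default setup. Gluing the prefix on with no separator sidesteps the break-character question: StandardAnalyzer keeps an unbroken run of letters and digits as one token and lowercases it, so the term actually stored is field1xxx. TermPrefixer and prefixed are hypothetical names, and the sketch assumes the original term itself contains only letters and digits.

```java
import java.util.Locale;

class TermPrefixer {
    // Hypothetical helper: join prefix and term with no separator, so an
    // analyzer that splits on punctuation still sees one alphanumeric run.
    // Lowercasing mirrors what StandardAnalyzer's LowerCaseFilter does to
    // ASCII letters (assumption: default StandardAnalyzer configuration).
    static String prefixed(String prefix, String term) {
        return (prefix + term).toLowerCase(Locale.ROOT);
    }
}
```

For example, prefixed("field1", "XXX") yields field1xxx, which is the form to look for in Luke.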

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Hi, I guess that, in short, what I'm really trying to find out is: If I construct a Lucene query, can I (somehow) use it to query a String object that I have, rather than querying against a Lucene index? Thanks, Jim oh...@cox.net wrote: Hi, This question is going to be a little

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Paul Cowan co...@aconex.com wrote: oh...@cox.net wrote: Document1 subdoc1 term1 term2 subdoc2 term1a term2a subdoc3 term1b term2b However, I've now been asked to implement the ability to

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Paul Cowan co...@aconex.com wrote: oh...@cox.net wrote: - I'd have to create a (very small) index, for each sub-document, where I do the Document.add() with just the (for example) two terms, then - Run a query against the 1-entry index, which - Would either give me a yes or no

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Andrzej, Hah! I tried as you suggested using Luke, and I found at least part of my problem. Luke was defaulting to KeywordAnalyzer. I changed that to StandardAnalyzer, and did queries for: path:x and path:xx.dat For the first, the Rewritten was:

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Hi Matt, Good catch! As I just posted, I *just* noticed that (Luke uses KeywordAnalyzer) :)!!! Once I switched Luke to using StandardAnalyzer, the Luke search results matched my web query results. Thanks! Jim Matthew Hall mh...@informatics.jax.org wrote: Luke defaults to

StandardAnalyzer and Windows vs. Linux path

2009-08-07 Thread ohaya
Hi, I've been developing my indexer app, which uses StandardAnalyzer, on a Windows machine, and today I deployed an initial version onto a Red Hat Linux (RHEL) machine. On my development machine, the files being indexed are in something like:
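The rest of this message is truncated, so the exact discrepancy is not shown. If it comes down to path separators (\ on Windows, / on Linux), one hedged option is to normalize the string before it goes into the path field, so both platforms index the same text. PathNormalizer and normalize are hypothetical names, and this sketch does not handle drive letters or different root directories.

```java
class PathNormalizer {
    // Hypothetical helper: rewrite Windows backslashes as forward slashes,
    // so the same relative layout produces the same field text on both OSes.
    // Paths that already use '/' are returned unchanged.
    static String normalize(String path) {
        return path.replace('\\', '/');
    }
}
```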

Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi, In my indexer app (based on the IndexFiles.java demo), I am adding the path field: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED)); Per Luke, the full path (e.g., c:\\.yyy) gets parsed, and one of the terms (again, per Luke) is , i.e., the

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, Both my indexer and the webapp are basically from the Lucene demos, the indexer starting with the IndexFiles.java demo code, so I think they're both using StandardAnalyzer. What appears in Luke when I select path is just the filename part, without the extension, i.e., the

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, I need to be more precise... The files that I have are at, say: C:\dir1\dir2\ so, for example, I have C:\dir1\dir2\file-1-1.dat C:\dir1\dir2\file-1-2.dat C:\dir1\dir2\file-1-3.dat C:\dir1\dir2\file-1-4.dat C:\dir1\dir2\file-1-5.dat After indexing, and, using Luke, I look at the path

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi Phil, Well, kind of... but... Then, why, when I do the search in Luke, do I get the results I cited: == succeeds .yyy == fails (no results) I guess that I've been assuming that the search in Luke is correct and I've been using that to test my understanding, but maybe that's an

Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi, I have an app to initially create a Lucene index, and to populate it with documents. I'm now working on that app to insert new documents into that Lucene index. In general, this new app, which is based loosely on the demo apps (e.g., IndexFiles.java), is working, i.e., I can run it with

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Thanks for the quick response. I forgot to mention that in our case the producer is part of a commercial package, so we have no way to get it changed, and I think the first three suggestions are not feasible for us. I have considered something like the 4th suggestion

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Ian, One question about the 4th alternative: I was wondering how you implemented the sleep() in Java, esp. in such a way as not to mess up any of the Lucene stuff (in case there's threading)? Right now, my indexer/inserter app doesn't explicitly do any threading stuff. Thanks, Jim

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Ok, thanks for the additional info. I've implemented a check on both file.lastModified() and file.length(), and it seems to work in my dev environment (Windows), so I'll have to test it on a real system. Thanks again, Jim Ian Lea ian@gmail.com wrote: Jim The sleep is
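A minimal sketch of the check described in this thread: record a file's length and last-modified time, wait, and only treat the file as ready to index if neither changed. FileStabilityCheck, isStable and the wait interval are assumptions, not taken from the thread; tune the interval to how fast the producer writes.

```java
import java.io.File;

class FileStabilityCheck {
    // Hypothetical helper: a file counts as "finished" only if its length and
    // last-modified time are the same before and after a short wait.
    static boolean isStable(File f, long waitMillis) {
        if (!f.isFile()) {
            return false; // missing, or a directory
        }
        long lengthBefore = f.length();
        long modifiedBefore = f.lastModified();
        try {
            // Thread.sleep pauses only the calling thread; in a single-threaded
            // indexer nothing else is running while it waits.
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        return f.isFile()
                && f.length() == lengthBefore
                && f.lastModified() == modifiedBefore;
    }
}
```

On some file systems the last-modified timestamp is coarse (a second or more), so for short waits the length comparison does much of the work; a file rewritten in place with the same length inside that window would slip through.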

Re: Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread ohaya
Hi, BTW, my indexer app is basically the same as the demo IndexFiles.java. Here's part of the main: try { IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED); System.out.println("Indexing to directory '" +INDEX_DIR+

Re: Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread ohaya
Hi Phil, As for the problem with my app, it wasn't what you suggested (about the tokens, etc.). For some later work, my indexer creates both a path field that is analyzed (and thus tokenized, etc.) and another field, fullpath, which is not analyzed (and thus not tokenized). The problem with my

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-02 Thread ohaya
se3g2...@gmail.com wrote: Hi, judging by the error messages you listed below, please put the 'reader.close()' block at the bottom of the method. I think that if you invoke it first, the underlying stream is closed, so the exceptions are encountered. ohaya wrote: Hi, I changed the beginning
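The close-ordering problem can be reproduced without Lucene. This sketch swaps in java.io.BufferedReader for IndexReader/TermEnum purely as an analogy: closing the reader before iterating makes the next read fail with an IOException, while iterating first and closing afterwards (here via try-with-resources, which also closes if an exception is thrown) works. CloseOrderDemo and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {
    // Wrong order: close first, then read. Returns false because
    // readLine() on a closed BufferedReader throws IOException.
    static boolean closeThenRead() {
        BufferedReader r = new BufferedReader(new StringReader("a\nb\n"));
        try {
            r.close();
            r.readLine();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Right order: iterate everything, then close (try-with-resources
    // closes the reader automatically when the block exits).
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```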

java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I'm starting to work on an app to list all of the terms in the path field. I'm including the beginning of my code below. When I run this, pointing it to a directory named index containing the Lucene indexes, I am getting a java.io.IOException. Here's the output when I run: Index in

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Phil, Yes, that exception is not very helpful :)!! I'll try your suggestions and post back. Thanks, Jim Phil Whelan phil...@gmail.com wrote: Hi Jim, I cannot see anything obvious, but both open() and terms() throw IOExceptions. You could try putting these in separate try..catch

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Phil, I posted in haste. Actually, from the output that I posted, doesn't it look like the .next() itself is throwing the exception? That is what has been puzzling me. It looks like it got through the open() and terms() with no problem, then blew up when calling next()? Jim

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I changed the beginning of the try to: try { System.out.println("About to call .next()..."); boolean foo = termsEnumerator.next(); System.out.println("Finished calling first .next()");

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I don't know what happened, but all of a sudden, it started working :(... Jim oh...@cox.net wrote: Hi, I changed the beginning of the try to: try { System.out.println("About to call .next()..."); boolean foo =

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread ohaya
Phil Whelan phil...@gmail.com wrote: On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote: I was wondering if there is a list of special characters for the standard analyzer? What I mean by special is characters that the analyzer considers break characters. For example, if I

Re: Is there a list of special characters for standard analyzer?

2009-07-31 Thread ohaya
Hi Ahmet, Thanks for the clarification and information! That was exactly what I was looking for. Jim AHMET ARSLAN iori...@yahoo.com wrote: I guess that the obvious question is Which characters are considered 'punctuation characters'?. Punctuation = (_|-|/|.|,) In

Seeking guidance for updating indexes

2009-07-31 Thread ohaya
Hi, I am still new to Lucene, but I think I have an initial indexer app (based on the demo IndexFiles app) working, and also have a web app, based on the demo luceneweb web app, working. I'm still busy tweaking both, but am starting to think ahead about operational issues, esp.

Re: Seeking guidance for updating indexes

2009-07-31 Thread ohaya
Hi, Phil and Ian, Thanks for the responses and confirmations about this. Assuming that our requirements (as I described earlier) don't change, it looks like this updating/inserting thing should be pretty easy :)! Later, and have a great weekend! Jim Phil Whelan phil...@gmail.com

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, Sorry to jump in, but I've been following this thread with interest :)... Am I misunderstanding your original observation that ThreadedIndexWriter produced a smaller index? Did the ThreadedIndexWriter also finish faster (I'm assuming that it should)? If the index is smaller, and

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, I don't know the answer to your questions, but I'm guessing that the answer to #3 probably follows from the answers to #1 and #2. Did you try looking at the indexes using Luke? It shows the top 50 terms when it starts, so it might be obvious what the differences are, and that might

How to index IP addresses?

2009-07-30 Thread ohaya
Hi, I am trying to index information in some proprietary-formatted files. In particular, these files contain some IP addresses in dotted notation, e.g., aa.bb.cc.dd. For my initial test, I have a Document implementation, and after I extract what I need into a String named Info, I do:

How to search path?

2009-07-30 Thread ohaya
Hi, I am working with a modified version of the demo IndexFiles. In that code, when it builds the index, it has: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED)); In Luke, I can see all the file paths in the path field. I am also using the demo luceneweb

RE: How to index IP addresses?

2009-07-30 Thread ohaya
Hi, Oh. Ok, thanks! I'll give that a try. Jim Armasu wrote: Keyword: Field.Index.NOT_ANALYZED -Original Message- From: oh...@cox.net [mailto:oh...@cox.net] Sent: Thursday, July 30, 2009 4:36 PM To: java-user@lucene.apache.org Subject: How to index IP addresses? Hi,

Re: How to search path?

2009-07-30 Thread ohaya
Ian, I'll respond to this msg, re. searching path. I made the change you suggested, to Field.Index.ANALYZED, and that fixed the problem I was having with searching for components of the path field. Thanks! Jim Ian Lea ian@gmail.com wrote: In contrast to your last question and

Re: How to index IP addresses?

2009-07-30 Thread ohaya
Hi Matthew and Narcis, I think that I found the (original) problem. It looks like the reason that I was getting all those other terms, which looked to me like the octets, weren't the octets :)... When I was doing the doc.add(), there were some other numbers (not IP addresses) in the String
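Since the extra terms turned out to come from other numbers in the same String, one option is to pull out only the dotted-quad addresses before indexing. A minimal sketch, assuming IPv4 addresses in plain aa.bb.cc.dd form; IpExtractor is a hypothetical name, and the pattern does not check that each octet is at most 255. Each returned string could then be added as its own NOT_ANALYZED field value, as Armasu suggested, e.g. doc.add(new Field("ip", ip, Field.Store.YES, Field.Index.NOT_ANALYZED)), where the field name "ip" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpExtractor {
    // Four groups of 1-3 digits separated by dots, bounded by word breaks,
    // so plain numbers such as "8080" or "42" are not matched.
    // Octet range (0-255) is deliberately not validated.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    static List<String> extract(String info) {
        List<String> ips = new ArrayList<>();
        Matcher m = DOTTED_QUAD.matcher(info);
        while (m.find()) {
            ips.add(m.group());
        }
        return ips;
    }
}
```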

Re: Term's frequency

2009-07-30 Thread ohaya
prashant ullegaddi prashullega...@gmail.com wrote: How do I get the number of times a term occurs in the Lucene index? Regards, Prashant. Hi, You didn't mention if you were looking for something programmatic or not, but there's a tool called Luke, and when you start that up and point

Is there a list of special characters for standard analyzer?

2009-07-30 Thread ohaya
Hi, I was wondering if there is a list of special characters for the standard analyzer? What I mean by special is characters that the analyzer considers break characters. For example, if I have something like foo=something, apparently the analyzer considers this as two terms, foo and
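To make the idea of "break characters" concrete, here is a deliberately simplified approximation, not StandardTokenizer's real grammar: every character that is not a letter or digit ends the current token, and tokens are lowercased. The real StandardTokenizer keeps some punctuation inside tokens in specific patterns (host names such as www.example.com, e-mail addresses, acronyms, apostrophes, and numbers using the _ - / . , punctuation class Ahmet mentions), so this sketch over-splits those cases. BreakCharDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class BreakCharDemo {
    // Simplified approximation: any non-letter, non-digit character is a
    // break; letters are lowercased as StandardAnalyzer would do.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // break character ends token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this rule, foo=something becomes the two terms foo and something, while an all-alphanumeric string such as Field1XXX stays one term, field1xxx.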

New to Lucene - some questions about demo

2009-07-28 Thread Ohaya
Hi, I'm just starting to work with Lucene, and I guess that I learn best by working with code, so I've started with the demos in the Lucene distribution. I got the IndexFiles.java and IndexHTML.java working, and also the luceneweb.war is deployed to Tomcat. I used IndexFiles.java to index

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Ian and Matthew, I've tried foofoo, summary:foofoo, FooFoo, and summary:FooFoo. No results returned for any of those :(. Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think that's the problem either :(... I looked at the SearchFiles.java code, and it looks like it's

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, I'll keep your comments in mind, but I'm still confused about something. I currently haven't changed much in the demo, other than adding that doc.add for summary. With JUST that doc.add, having done my reading, I kind of expected NOT to be able to search on the summary at all, but it

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, Ok, thanks for the clarifications. When I have some quiet time, I'll try to re-do the tests I did earlier and post back if any questions. Thanks again, Jim Matthew Hall mh...@informatics.jax.org wrote: Oh.. no. If you specifically include a fieldname: blah in your clause,