Re: What's the correct way to do normalisation?

2006-11-08 Thread Joe
Hi, http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a Are Wildcard, Prefix, and Fuzzy queries case sensitive? Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that pe

Re: What's the correct way to do normalisation?

2006-11-09 Thread Joe
Hi, : I want "Überraschung" to be found by : : Überr* : Ueberr* : : So the best I can do is to do the normalisation manually (not by an : analyzer) before the indexing/searching process? Or use an Analyzer at index time that puts both the UTF-8 version of the string and the Latin-1 version of the st
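A minimal sketch of the "normalise manually" option discussed in this thread, in plain Java without any Lucene classes. The idea is to run the same folding function over the text before indexing and over the literal part of a prefix query before searching, so that both "Überr*" and "Ueberr*" end up as the same prefix. The class and method names (GermanFold, fold, prefixMatches) are made up for illustration, and the umlaut-to-digraph mapping is an assumption about what "normalisation" should mean here.

```java
/** Hypothetical helper (not a Lucene class): folds German umlauts to ASCII digraphs. */
final class GermanFold {
    private GermanFold() {}

    /** Lower-cases the text and rewrites umlauts and sharp s as two-letter digraphs. */
    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u00e4': case '\u00c4': out.append("ae"); break; // a-umlaut
                case '\u00f6': case '\u00d6': out.append("oe"); break; // o-umlaut
                case '\u00fc': case '\u00dc': out.append("ue"); break; // u-umlaut
                case '\u00df': out.append("ss"); break;                // sharp s
                default: out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    /** Simulates a prefix query such as "Ueberr*" against a term that was folded at index time. */
    static boolean prefixMatches(String foldedTerm, String userQuery) {
        String prefix = userQuery.endsWith("*")
                ? userQuery.substring(0, userQuery.length() - 1)
                : userQuery;
        return foldedTerm.startsWith(fold(prefix));
    }

    public static void main(String[] args) {
        String indexed = fold("\u00dcberraschung");                       // folded once, at index time
        System.out.println(prefixMatches(indexed, "\u00dcberr*"));       // umlaut spelling
        System.out.println(prefixMatches(indexed, "Ueberr*"));           // digraph spelling
    }
}
```

Because wildcard and prefix queries bypass the analyzer, the folding has to be applied by the calling code on the query side; on the index side it could equally live inside a custom analyzer.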

WildcardQuery and memory

2007-03-09 Thread Joe
Hi, Here we use Lucene to index our emails, currently 500,000 documents. When searching the body with a WildcardQuery, problems arise. I did some profiling with JProfiler. I see that the more BooleanClause instances are used, the more memory is required during search. Most memory is used by instances

Re: WildcardQuery and memory

2007-03-09 Thread Joe
Hi Rob, For indexing e-mail, I recommend that you tokenise the e-mail addresses into fragments and query on the fragments as whole terms rather than using wildcards. [example] Hm, for email addresses this isn't a big problem here. The real problem is the query on the body part of an email, wh

Lucene code injection?

2007-05-24 Thread Joe
Hi, I indexed emails, and now I want to restrict the search functionality so that users can only search for emails to or from themselves. I know the email address of the user, so my plan is to do it in the following way: The user enters some search parameters, and they are combined in a query. This is a mi

Re: Lucene code injection?

2007-05-24 Thread Joe
Hi, This sounds good. As for the code injection it is up to you to sanitize the request before it goes to lucene, probably by filling the email field yourself and not rely on the user input for the email address I had hoped I wouldn't have to sanitize the user input, because the email address query is ANDed

Re: Lucene code injection?

2007-05-24 Thread Joe
Damien McCarthy wrote: Hi Joe, It would probably be cleaner to use a QueryFilter rather than doing the AND. Take a look at http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/search/QueryFilter.html OK, if it's not too slow I'll go this way. Also I'm not sure that using the se

Re: Lucene code injection?

2007-05-24 Thread Joe
Hi, Hi Joe, It might be possible when you append the restriction before parsing the user query with the QueryParser, but I'm not sure. I recommend first parsing the query, and then constructing a BooleanQuery with the parsed user query and the e-mail term both as must. Yes, that's the

adding/searching documents during optimize

2007-05-29 Thread Joe
Hi, I am not sure, so I need your opinion on these two questions: Is it safe to search an index while it's being optimized by another Java process? Is it safe to add documents to an index while it's being optimized by another Java process?

Re: adding/searching documents during optimize

2007-05-30 Thread Joe
Hi, 1. Yes, it is safe to search while optimizing and adding documents to an index. 2. No, you cannot add documents to an index while it is being optimized. You can only have one instance of IndexWriter working on an index. HTH Yes, it did, thanks.

How to modify a document Field before the document is indexed?

2010-07-19 Thread Joe Hansen
your time! Regards, Joe - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: How to modify a document Field before the document is indexed?

2010-07-19 Thread Joe Hansen
Thanks for your reply Koji! Your suggestion worked fine. I thought adding a field named "contents" to a document, even though it contains a field already named "contents" would NOT do anything. But looks like I am wrong! Thank you for your kind help! :) Regards, Joe On Mon,

Re: "GROUP BY" query

2011-01-01 Thread Joe Scanlon
"), > i.e. > > id text > 1 User not found > 3 Address not found > 4 Fatal error > > > Regards, > Benzion. > > > -- Joe Scanlon jscan...@element115.net Mobile: 603 459 3242 Office: 312 445 0018

Re: full text searching in cloud for minor enterprises

2011-07-05 Thread Joe Scanlon
-- Joe Scanlon jscan...@element115.net Mobile: 603 459 3242 Office: 312 445 0018

RE: No subsearcher in Lucene 3.3?

2011-08-30 Thread Joe MA
Thanks for the replies. Here is why I need the subreader (or subsearcher in earlier Lucene versions): I have multiple collections of documents, say broken out by years (it's more complex than this, but this illustrates the use case): Collection1 >>> D:/some folder/2009/*.pdf

RE: No subsearcher in Lucene 3.3?

2011-08-31 Thread Joe MA
> -Original Message- > From: Devon H. O'Dell [mailto:devon.od...@gmail.com] > Sent: Tuesday, August 30, 2011 8:04 PM > To: java-user@lucene.apache.org > Subject: Re: No subsearcher in Lucene 3.3? > > 2011/8/30 Joe MA : > > When searching a single collection, no proble

SearcherTaxonomyManager usage

2013-10-26 Thread Joe Eckard
Hello, I'm new to lucene and I am having some trouble figuring out the right way to use a SearcherTaxonomyManager for NRT faceted search. Assuming I set up the STM with a reopen thread: // Index Writer Directory indexDir = FSDirectory.open(new File(indexDirectoryPath)); IndexWriterCon

Index + Taxonomy Replication

2013-10-30 Thread Joe Eckard
Hello, I'm attempting to set up a master/slave arrangement between two servers where the master uses a SearcherTaxonomyManager to index and search, and the slave is read-only - using just an IndexSearcher and TaxonomyReader. So far I am able to publish new IndexAndTaxonomyRevisions on the master and

Re: Index + Taxonomy Replication

2013-11-01 Thread Joe Eckard
cking the commit data / index epoch to see if taxonomy directory had been inadvertently replaced. Thanks again, Joe On Fri, Nov 1, 2013 at 12:29 PM, Shai Erera wrote: > Opened https://issues.apache.org/jira/browse/LUCENE-5320. > > Shai > > > On Fri, Nov 1, 2013 at 4:59 PM,

Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
be if input == ILLEGAL_STATE_READER? Regards, Joe

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
r.setReader(Tokenizer.java:89) at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:307) at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:145) at LuceneStemmer.stem(LuceneStemmer.java:28) at LuceneStemmerTest.stem(LuceneStemmerTest.java:16) Thanks. Regards, Joe On Thu, Mar 20, 2014 at 1:40

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
ings. There is a second method in Analyzer that > takes a String to analyze (instead of Reader). This one uses an optimized > workflow internally. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetap

Problem running demo

2014-04-22 Thread Joe Cabrera
terminal is there a specific way it should be run. Thanks, Joe Cabrera

Re: Problem running demo

2014-04-22 Thread Joe Cabrera
I was able to get the demo jars built using Ant and Ivy. It might be a good idea to include in the documentation a reference to Ant and Ivy and exactly which targets should be used. Cheers, On Tue, Apr 22, 2014 at 8:10 AM, Joe Cabrera wrote: > Hi. I am trying to run the demo as specified

Best way to search by pages

2016-11-26 Thread Joe MA
Greetings, I am trying to use Lucene to search large documents and return the pages where one or more terms match. For example, say I am indexing 500 auto manuals with around 1000 pages each. So if the user searched for "Taurus" and "flat" and "tire", a good result could be "2006 Ford Ta

Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-12 Thread Joe Ye
esn't update its associated StoredField value. What do I miss here? I would highly appreciate your help! Regards, Joe

Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Joe Ye
Hi, Could anyone help with my issue described below? If I'm not posting on the right mailing list please direct me to the correct one. Many thanks, Joe On Mon, Jun 12, 2017 at 3:05 PM, Joe Ye wrote: > Hi, > > I have a few NumericDocValuesField fields and also added separate

Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-19 Thread Joe Ye
ociated stored field? Is there anything similar/equivalent to useDocValuesAsStored in Lucene core? We're trying to use docValues to avoid a full update (delete + create new)... Yet, we still need to retrieve the updated values. Regards, Joe On Mon, Jun 19, 2017 at 4:16 PM, Michael McCandless <

Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-22 Thread Joe Ye
to get the correct value for the target docId (and I'm not sure why 4 values here)? What did I miss/do wrong? Could you point me to the right direction (with examples) please? Many thanks, Joe On Tue, Jun 20, 2017 at 12:14 AM, Michael McCandless < luc...@mikemccandless.com> wrote:

Re: Updating the DocValues field doesn't seem to update its associated StoredField value

2017-06-23 Thread Joe Ye
Thanks very much Mike! That's very helpful! I got MultiDocValues.getNumericValues to work. A follow up question: what's the best way/how do I retrieve binaryDocValues? Regards, Joe On Fri, Jun 23, 2017 at 11:00 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Try

DocValue update methods don't appear to throw exception if the document doesn't exist

2017-07-04 Thread Joe Ye
t to check the existence of the document before each docValue update. Kind regards, Joe

Re: DocValue update methods don't appear to throw exception if the document doesn't exist

2017-07-06 Thread Joe Ye
sk? If so, what happens if a crash occurs before those updates are committed? Many thanks, Joe On Tue, Jul 4, 2017 at 10:53 PM, Trejkaz wrote: > On Tue, 4 Jul 2017 at 22:39, Joe Ye wrote: > > > Hi, > > > > I'm using Lucene core 6.6. > > >

Not-indexed, Stored Thumbnails or NoSQL?

2018-12-02 Thread Joe MA
Greetings, I have an index where I import documents such as powerpoint, PDF, and so forth. One nice feature I added is that for each document, I store a thumbnail of the first page as an encoded String (uuencode) using a stored,not-indexed field. This thumbnail gets displayed when the user fi

RAMDirectory or Redis

2018-12-02 Thread Joe MA
Greetings, Has anyone looked into using Redis or some other in-memory cache with Lucene? It seems that ElasticSearch may do this. Are there advantages to doing this versus, say, the RAMDirectory class? Thanks in advance, J

Calculated terms during a query

2009-01-07 Thread Joe MarkAnthony
Greetings, I would like to search for items based on 'calculated' terms. Specifically, say I am using Lucene to search a collection of tasks, with fields "start_date" and "end_date", among others. The question to solve is: "Find all tasks that took longer than 100 days". So the easy answer
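The message is cut off before its "easy answer", so the following is only one possible reading: compute the derived value once, when the document is built, and store it in its own field so that "longer than 100 days" becomes an ordinary range condition on that field. The sketch below is plain Java with no Lucene calls; the class name TaskDuration, the 5-digit padding width, and the idea of a separate duration field are all assumptions for illustration. The zero-padding matters only if the field is compared as text, as term-based range queries did in Lucene releases of that period.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: derives a duration value from start_date and end_date at index time. */
final class TaskDuration {
    private TaskDuration() {}

    /** Whole days between two ISO dates (yyyy-MM-dd); the end date itself is not counted. */
    static long days(String startDate, String endDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startDate), LocalDate.parse(endDate));
    }

    /** Zero-padded form, so that lexicographic order of the strings equals numeric order. */
    static String asTerm(long days) {
        if (days < 0 || days > 99_999) {
            throw new IllegalArgumentException("duration out of range for 5-digit padding: " + days);
        }
        return String.format("%05d", days);
    }

    public static void main(String[] args) {
        String term = asTerm(days("2008-01-01", "2008-04-20"));
        // A task "took longer than 100 days" when its term sorts after the term for 100.
        System.out.println(term + " > " + asTerm(100) + " : " + (term.compareTo(asTerm(100)) > 0));
    }
}
```

The trade-off is that the threshold (100 days) stays a query-time parameter, while the subtraction itself is paid once per document instead of once per search.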

Re: Lucene indexing error

2007-10-08 Thread Joe Attardi
r.open(IndexReader.java:141) > at org.apache.lucene.index.IndexReader.open(IndexReader.java:136) Looks like you don't have permission to access the Lucene index file(s). -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/

Re: Lucene indexing error

2007-10-08 Thread Joe Attardi
ex - do you get the same error then? -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/

Re: Please unsubscribe me from this group mail

2007-10-09 Thread Joe Attardi
Uttam, To unsubscribe you need to send an email to [EMAIL PROTECTED] -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 10/9/07, Barik, Uttam <[EMAIL PROTECTED]> wrote: > > > > > Regards, > Uttam Kumar Barik > > IT Services > Fidelity Bus

WildCardQuery and TooManyClauses

2008-04-10 Thread Joe K
Hello everybody, I know tons of words have been written about this issue, but I'm still not clear about it. I have these facts: 1. My query is always one letter and *, e.g. M* 2. I always want to get at most 200 results, no more! 3. I don't want to fix this issue by setting maxClauseCount I jus

Re: WildCardQuery and TooManyClauses

2008-04-10 Thread Joe K
arch, Mathematical Sciences Department > IBM T.J. Watson Research Center > (914) 945-2472 > http://www.research.ibm.com/people/g/donnagresh > [EMAIL PROTECTED] > > > "Joe K" <[EMAIL PROTECTED]> wrote on 04/10/2008 08:53:06 AM: > > > Hello everybody, > >

Re: WildCardQuery and TooManyClauses

2008-04-10 Thread Joe K
Watson Research Center > (914) 945-2472 > http://www.research.ibm.com/people/g/donnagresh > [EMAIL PROTECTED] > > > "Joe K" <[EMAIL PROTECTED]> wrote on 04/10/2008 08:53:06 AM: > > > Hello everybody, > > I know there was written a tons of words about

Re: WildCardQuery and TooManyClauses

2008-04-18 Thread Joe K
bits.set(termDocs.doc()); >} >} else { >break; >} >} while (enumerator.next()); >} finally { > termDocs.close(); >enumerator.close(); >
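The quoted fragment appears to walk a term enumeration and set one bit per matching document, instead of expanding the wildcard into one BooleanClause per term (which is what triggers TooManyClauses). The sketch below models that idea with JDK collections only; it is not Lucene code. The sorted map stands in for the term dictionary, the int arrays stand in for postings, and the names PrefixBits and docsWithPrefix are invented for this example.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Stand-alone model of a prefix filter: one BitSet, no per-term query clauses. */
final class PrefixBits {
    private PrefixBits() {}

    /**
     * Sets a bit for every document that contains at least one term starting with the prefix.
     * The map must be sorted by term, which is what lets the loop stop at the first non-match.
     */
    static BitSet docsWithPrefix(NavigableMap<String, int[]> postings, String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> entry : postings.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // sorted order: no later term can carry the prefix
            }
            for (int doc : entry.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        NavigableMap<String, int[]> postings = new TreeMap<>();
        postings.put("lucene", new int[] {3});
        postings.put("mail", new int[] {0, 2});
        postings.put("memo", new int[] {2, 5});
        postings.put("note", new int[] {1});
        System.out.println(docsWithPrefix(postings, "m", 6));
    }
}
```

Memory use is bounded by one bit per document regardless of how many terms share the prefix; capping the output at 200 results would then be a separate step over the set bits.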

Multiple indexes or many small documents?

2006-10-05 Thread Joe Shaw
ts in the index, although the content of those documents will be substantially less. I can also do this in one index and not search indexes ParallelReader-style. What are people's gut feelings on how this approach will impact the indexing and search performance in terms of both speed and memor

Re: index architectures

2006-10-18 Thread Joe Shaw
in the worst case (my second example) searches are 50% slower, but in almost all other cases they're quite a bit faster. Hope this helps, Joe

Re: Announcement: Lucene powering Monster job search index (Beta)

2006-10-30 Thread Joe Shaw
nges to the Lucene core? Thanks, Joe

Re: First search is slow after updating index .. subsequent searches very fast

2006-12-21 Thread Joe Shaw
posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL); // tell the kernel you will need the whole file. posix_fadvise (fd, 0, 0, POSIX_FADV_WILLNEED); I don't know offhand if Java binds these APIs, though. Joe

Re: Use the Luke, Force

2007-01-11 Thread Joe Shaw
Hi, Benson Margulies wrote: My experience tonight is that the stock 1.9-based Luke won't open my 2.0 indices. So I fixed up a version of the source. I've been seeing this too. Anyone else want it? That would be great, if you don't mind. A jar would be ni

How to not tokenize HTML tag from input string

2007-02-07 Thread Joe Tang
My work is to index keywords with a document. In my case, the document is made up of HTML tags, which I don't want to index. For example: Input Document: You are welcome Testing text Expected Keywords: keywords:You keywords:are keywords:welcome keywords:Testing keywords:text Is there
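A rough sketch of one way to get the expected keywords: strip the markup before the text ever reaches the indexer, then split what is left into words. This is plain Java, not a Lucene analyzer. The archive has already removed the tags from the poster's example, so the HTML string in main is an assumed reconstruction, and HtmlKeywords is a made-up name. The regular expression is deliberately naive (it does not handle comments, scripts, or a ">" inside an attribute value); a real HTML parser would be the safer choice for untrusted pages.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-processing step: drop tags, keep the words between them. */
final class HtmlKeywords {
    private HtmlKeywords() {}

    static List<String> keywords(String html) {
        // Replace each tag with a space so that words on either side do not get glued together.
        String text = html.replaceAll("<[^>]*>", " ");
        List<String> words = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed markup; the original message's tags were lost in the archive.
        System.out.println(keywords("<p>You are welcome</p><b>Testing text</b>"));
    }
}
```

Each returned word would then be added to the "keywords" field as usual.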

Re: Using ParallelReader over large immutable index and small updatable index

2007-03-07 Thread Joe Shaw
y index and set its bit. It probably doesn't scale to millions of matches, but it scales pretty well to tens of thousands. I'd suggest breaking down into smaller indexes if you can, and run this process across each of them

How to sort on a tokenised field?

2007-04-10 Thread Joe Tang
My task is to index lots of documents with different fields. Some of the fields are tokenized and will later be used for sorting, when a result list needs to be ordered by a particular field. Unfortunately, Lucene complains about sorting on a tokenized field. So is there any way to work around it? Thank

Re: How to sort on a tokenised field?

2007-04-10 Thread Joe Tang
that field? There's no good > *general* answer that I've been able to see. > > So I suspect you really want to do something that's not > document sorting, and if you'd make a clearer statement of > what you're trying to accomplish I'm sure you'd ge

Re: Searching on a Rapidly changing Index

2007-05-24 Thread Joe Shaw
few times on the various lists. What constitutes warming up a searcher, simply running a dummy query? Joe

Number of documents in an index with filter

2007-05-27 Thread Joe MarkAnthony
Greetings, I would like to add the number of possible hits in my queries, for example, "found 18 hits out of a possible 245,000 documents". I am assuming that IndexReader.numDocs() is the best way to get this value. However, I would like to use a filter as part of the query. What is the most e

IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
een by a reader on the same index (flush may happen only after the add)." Thanks for your help.. -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Hi Erick, I'm guessing that your problem is what gets indexed. What analyzer are you using when indexing? One that breaks words apart on, say, periods? I am using the StandardAnalyzer. When I do a test query using Luke, it returns the object I'm looking for. The query I use is: id:"com.mycomp

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
om working? Thanks... -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 7/3/07, Joe Attardi <[EMAIL PROTECTED]> wrote: Hi Erick, I'm guessing that your problem is what gets indexed. What analyzer > are you using when indexing? One that breaks words apart on, say

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Hi Chris, That did it! Thanks for the help. I should have read the javadocs for Field.Index more closely! Thanks to everyone else for their input too. -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 7/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: It sounds lik

Assembling a query from multiple fields

2007-07-25 Thread Joe Attardi
I don't want to get ahead of myself. One at a time... :) Appreciate any help you all might have! -- Joe Attardi

Indexing/Analyzer question - case-insensitive "contains" search

2007-07-30 Thread Joe Attardi
ored/not indexed("Joe's Devices") ? Or can I accomplish this case-insensitive "contains" search some other way - would I have to write a custom Analyzer, or something? Thanks in advance! -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/

Re: Indexing/Analyzer question - case-insensitive "contains" search

2007-07-30 Thread Joe Attardi
> > It does sound very strange to me, to default to a WildCardQuery! Suppose I > am looking for "bold", I am getting hits for "old". I know - but that's what the requirements dictate. A better example might be a MAC or IP address, where someone might be searching for a string in the middle - like,

Running query text through an Analyzer without QueryParser?

2007-07-30 Thread Joe Attardi
Following up on my recent question. It has been suggested to me that I can run the query text through an Analyzer without using the QueryParser. For example, if I know what field to be searched I can create a PrefixQuery or WildcardQuery, but still want to process the search text with the same Anal

Re: Running query text through an Analyzer without QueryParser?

2007-07-30 Thread Joe Attardi
So then would I just concatenate the tokens together to form the query text? -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 7/30/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Would this work? > > TokenStream ts = StandardAnalyzer.tokenStream(); &

Re: Running query text through an Analyzer without QueryParser?

2007-07-30 Thread Joe Attardi
should I just index a MAC address as UN_TOKENIZED ? Thanks -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 7/30/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote: > > > > > > So then would I just concatenate the tokens together to form > > the query te

Re: Problem Search using lucene

2007-07-31 Thread Joe Attardi
You are probably using the StandardAnalyzer which removes stop words such as "and". -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 8/1/07, masz-wow <[EMAIL PROTECTED]> wrote: > > > I understand that only document that has been indexed will be able

More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
ken per octet ("00", "17", "fd", "14", "d3", "2a"). Many searches will be for partial IPs or MACs ("192.168", "00:17:fd", etc). Are either of these methods of indexing the addresses (single token vs per-octet token)

Re: More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
Hi Erick, First, consider using your own analyzer and/or breaking the IP addresses > up by substituting ' ' for '.' upon input. Do you mean breaking the IP up into one token for each segment, like ["192", "168", "1", "100"] ? > But on to your question. Please post what you mean by > "a large n

Re: More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Use a SpanNearQuery with a slop of 0 and specify true for ordering. > What that will do is require that the segments you specify must appear > in order with no gaps. You have to construct this yourself since there's > no support for SpanQueri
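To make the quoted suggestion concrete: with one token per octet, a span query with slop 0 and ordering enabled matches only when the query octets appear next to each other, in the same order, inside the indexed address. The sketch below reproduces just that matching rule with JDK collections so it can be tried without an index; it does not call SpanNearQuery, and the names OctetMatch, tokens and containsInOrder are invented here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Stand-alone model of "in order, no gaps" matching over per-octet tokens. */
final class OctetMatch {
    private OctetMatch() {}

    /** Splits an IP or MAC address into lower-cased octet tokens. */
    static List<String> tokens(String address) {
        return Arrays.asList(address.toLowerCase(Locale.ROOT).split("[.:]"));
    }

    /** True if the query tokens occur as one contiguous, ordered run inside the document tokens. */
    static boolean containsInOrder(List<String> docTokens, List<String> queryTokens) {
        return Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> mac = tokens("00:17:FD:14:D3:2A");
        System.out.println(containsInOrder(mac, tokens("17:fd:14"))); // adjacent octets, same order
        System.out.println(containsInOrder(mac, tokens("00:fd")));    // octets present, but with a gap
    }
}
```

One consequence of per-octet tokens is that only whole octets match: a partial search such as "192.16" would not find "192.168.1.100" under this rule unless the last query token is treated as a prefix.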

How do YOU detect corrupt indexes?

2007-08-02 Thread Joe R
Hello, I've been asked to devise some way to discover and correct data in Lucene indexes that have been "corrupted." The word "corrupt", in this case, has a few different meanings, some of which strike me as exceedingly difficult to grok. What concerns me are the cases where we don't know that

Re: How do YOU detect corrupt indexes?

2007-08-03 Thread Joe R
We're planning on using encryption at the filesystem level (whole-disk encryption) and, to be honest, I don't have a mechanism that can produce the changes I'm talking about. Neither does my boss, unfortunately ;) He came along one day and asked, "how do we know when data changed on disk without

Lucene PhraseQuery Problem No Hits

2006-02-11 Thread Joe Amstadt
Problem. I can add one or multiple TermQuery instances to the BooleanQuery for searching and I am getting Hits when I perform the search on various indexes. If I add a PhraseQuery to the BooleanQuery on a search I get zero hits. Some Background Information: Indexing using standard analyzer. I

Re: IndexFiles.java

2006-03-14 Thread Joe Scanlon
you need to specify it from the command line, i.e., java org.apache.lucene.demo.IndexFiles 'type in your starting directory here' On 3/14/06, Miki Sun <[EMAIL PROTECTED]> wrote: > > I think I did. I modified these code: > > //create a directory to write the indices to > static final File INDEX_DIR =

Technical Lead - Search

2006-04-17 Thread Joe Taylor
All- We are looking for someone with search experience (we leverage Lucene) to lead a small team of developers as described below. If you are interested, send your resume to [EMAIL PROTECTED] Thanks. Joe Job Title: Technical Lead/Engineering Manager - Ariba Content Summary: Ariba

MultiFieldQueryParser Search On C++ problem

2006-06-14 Thread Joe Amstadt
I'm trying to do a search on ( Java PHP C++ ) with lucene 1.9. I am using a MultiFieldQueryParser to parse with StandardAnalyzer. Before I parse the string I clean up the search string and it looks like this ( Java PHP C\+\+ ). The query is only searching on "c" and not "c++" any ideas as to what

Re: Lucene Dynamic http Web Page Search

2006-06-29 Thread joe kim
Hi Clive, Lucene is a general purpose search engine. If you need crawling capabilities on top of Lucene take a look at Nutch: http://lucene.apache.org/nutch/ On 6/29/06, Clive. <[EMAIL PROTECTED]> wrote: Hi, I am working on adding a search feature to a web site that uses single database dri

Re: Lock File

2006-06-29 Thread joe kim
Lucene uses this lock to ensure the index does not become corrupt when IndexReaders and IndexWriters are working on the same index. What are the conditions that cause corruption? If there is just one writer and multiple readers, is that safe? ---