slow brown fox jumped over the lazy dog
If I searched for "quick brown", is there a way I could see that it was hit
4 times within the document?
Thanks,
Jeff
If I am not mistaken, that is for a term. Is it possible for a query? In
the example below, I don't want to know how many times "brown" is in the
document; I want to know how many times "quick brown" is in the document.
Thanks,
Jeff
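One way to get per-document counts for a phrase is the span query API. A rough
sketch, assuming a Lucene 2.x-era index with a "contents" field (the path and
field name are only illustrative):
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.*;

public class PhraseCounts {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        // "quick" immediately followed by "brown": slop 0, in order
        SpanNearQuery phrase = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("contents", "quick")),
                new SpanTermQuery(new Term("contents", "brown")) }, 0, true);
        Spans spans = phrase.getSpans(reader);
        int doc = -1, count = 0;
        while (spans.next()) {                 // one step per matching position
            if (spans.doc() != doc && doc != -1) {
                System.out.println("doc " + doc + ": " + count + " occurrences");
                count = 0;
            }
            doc = spans.doc();
            count++;
        }
        if (doc != -1) System.out.println("doc " + doc + ": " + count + " occurrences");
        reader.close();
    }
}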
On Dec 20, 2007 3:03 PM, Mark Miller <[EMAIL
I found that reducing my index from 8G to 4G (through not stemming) gave me
about a 10% performance improvement.
How did you do this? I don't see this as an option.
Jeff
eperator. Is there an easy way to add ',' as a token separator?
Thanks,
-Jeff
Hi all,
I only want to index the latest week's data; the previous data can
be deleted. So I'd like to know about Lucene's delete performance and
whether it will have an impact on search performance when I do lots of
delete operations in the meantime. Thanks
--
Best Regards
been fixed and/or reduced in later versions (say 5.x or 6.x)?
Thank you for any info.
Jeff Wallace
Software Development, FileNet
IBM Corp.
1540 Scenic Ave.
Costa Mesa, CA 92626
(714) 327-7163 direct
Hello,
I have been looking into tuning the garbage collector for Solr. I found
this entry on the Lucene wiki that seems to be out of date.
The bug it references is reported as resolved now. Could someone confirm
whether it is safe to use G1 garbage collection with Lucene?
"Do not, under any circum
document and I thought I would treat each field as
a keyword to minimize processing.
Assuming you have clusters operating on independent datasets (so I guess it
would scale linearly) and you want to process Terabytes of logs per day,
is such a solution even feasible?
Thank you,
Jeff Capone
ligion" in documents published within a range of dates.
Thanks
Jeff
On May 10, 2009, at 11:35 AM, Uwe Schindler wrote:
You can get this list using IndexReader.terms(new Term(fieldname, "")). This
returns an enumeration of all terms starting with the given one (the field
name). Just
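A minimal sketch of that loop, assuming the TermEnum API of that era (the
field name and index path are illustrative):
import org.apache.lucene.index.*;

public class ListFieldTerms {
    public static void main(String[] args) throws Exception {
        String field = "category";
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms(new Term(field, ""));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;  // ran past the field
                System.out.println(t.text());
            } while (terms.next());
        } finally {
            terms.close();
            reader.close();
        }
    }
}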
omplish this? Right now I am having to
hit a lookup table to translate the city before searching against the
main index - not a fan of this option.
Thanks.
-Jeff Plater
Thanks - I tried it out and it seems to work for "Philadelphid~0.75 PA" but I
can't get it working for "Phil* PA" yet. Perhaps it is an issue with my
Analyzer (I am using WhitespaceAnalyzer)? Have you used it with wildcards
before?
-Jeff
-Original Messag
Thanks for the suggestion - I double checked the case and it was OK.
Turned out I needed to use the StandardAnalyzer instead of the
WhitespaceAnalyzer.
-Jeff
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, November 11, 2009 6:52 PM
To: java-user
words and such) which can produce an invalid sort order?
Thanks.
-Jeff
Thanks - so if my sort field is a single term then I should be ok with
using an analyzer (to lowercase it for example).
-Jeff
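A small sketch of that setup, assuming a separate single-token field reserved
for sorting (the field names are illustrative; this is a fragment of indexing
and search code, not a complete program):
// At index time: the searchable field stays analyzed; the sort field is one
// lowercased, untokenized token.
doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("titleSort", title.toLowerCase(), Field.Store.NO,
                  Field.Index.NOT_ANALYZED));

// At search time: sort on the single-token field.
Sort sort = new Sort(new SortField("titleSort", SortField.STRING));
TopDocs results = searcher.search(query, null, 10, sort);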
-Original Message-
From: J.J. Larrea [mailto:j...@panix.com]
Sent: Monday, November 16, 2009 11:19 AM
To: java-user@lucene.apache.org
Subject: Re: Sort fields
h time you won't be able to use wildcard searching (unless you don't care
about wildcard searching).
-Jeff
-Original Message-
From: Michel Nadeau [mailto:aka...@gmail.com]
Sent: Mon 12/14/2009 4:36 PM
To: java-user@lucene.apache.org
Subject: Lower/Uppercase problem when searchi
>
>
--
Best Regards
Jeff Zhang
ow which one is better, any help is appreciated.
--
Best Regards
Jeff Zhang
>
>
>
>
--
Best Regards
Jeff Zhang
a.org/wiki/DRBD
Thanks,
Jeff
ere a way to default to no
slop, preferably without changing all of our queries?
Thanks for any pointers.
Jeff
We're trying to perform a query where if our intended search term/phrase is
part of a specific larger phrase, we want to ignore that particular match,
but not the entire document (unless of course there are no other hits with
our intended term/phrase). For example, a query like:
"white house"
I want to be able to put sets of data in a very structured way and
query Lucene for only 100% matches. Is there a way to do this? The best I seem
to be getting back is 0.30685282. I appreciate any help and insight.
Jeff Richley, Vice President
Southeast Virginia Java Users Group
[EMAIL
Ah, good question. The data that I need to query is not a fixed set of
tables or columns like a database has. Let me give two
examples:
1.) I have data like name="Jeff" lastname="Richley" age="33" and I need to
be able to query by any combination such
help would be greatly appreciated.
>
> : 1.) I have data like name="Jeff" lastname="Richley" age="33" and I need
> to
> : be able to query by any combination such as name="Jeff" age="33". But
> if
> : I query with name=&qu
;, "/a/b/c",
Field.Store.YES,
Field.Index.UN_TOKENIZED);
document.add(location);
Field name = new Field("name", "Jeff Richley",
Field.Store.YES,
ueryParser to build your queries for you, use the KeywordAnalyzer
> to
> : > make sure no lowercasing or stemming takes place.
> : > 2) OMIT_NORMs when indexing .. they only matter if you want the
> lengths
> : > of fields to affect the score, and you don't -- you only want t
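A short sketch of what those two suggestions look like together, assuming
2.x-era classes (the field and value names are illustrative):
// 1) Exact-match field, indexed untokenized and with norms omitted.
Document doc = new Document();
doc.add(new Field("name", "Jeff", Field.Store.YES, Field.Index.NO_NORMS));

// 2) Parse queries with KeywordAnalyzer so no lowercasing or stemming happens.
QueryParser parser = new QueryParser("name", new KeywordAnalyzer());
Query q = parser.parse("name:Jeff");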
Hi. I'm using Lucene to do some searching (using the Searcher object and
passing it a parsed Query). I search for a word such as "long" and it is
returning partial matches, such as "belong" and "along." Is there a way
to turn off this behavior and only match whole words?
Thank you,
Jeff
IndexSearcher to search the parsed
query created in Step 3.
That's it. Is this the proper way to be doing searching?
Thanks.
Jeff
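For reference, a minimal sketch of that flow, assuming an existing on-disk
index (the path, field names, and query text are illustrative):
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;

public class SimpleSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("contents", new StandardAnalyzer()).parse("long");
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.score(i) + "\t" + hits.doc(i).get("title"));
        }
        searcher.close();
    }
}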
-Original Message-
From: Paul Borgermans [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 3:06 PM
To: java-user@lucene.apache.org
Subject: Re: Partial
arch for the term "yellow~" I might get something like "bellow." Is
there a way to list what Lucene found in the document that made it
relevant?
Thanks for all the help.
Jeff
-Original Message-
From: Paul Borgermans [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11,
Erick,
Very useful answers -- I'll be reading up more with the links you've
provided.
Thanks.
Jeff
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 5:51 PM
To: java-user@lucene.apache.org
Subject: Re: Partial Word Matches
Thanks for the quick reply. I'll be implementing this in the next couple
of days. Appreciate it!
Jeff
-Original Message-
From: Stephan Spat [mailto:[EMAIL PROTECTED]
Sent: Monday, November 20, 2006 8:43 AM
To: java-user@lucene.apache.org
Subject: Re: Q: Highlighter + Search sy
rmB"~99)
I did this playing around with table cells, and it seems to work so far.
Jeff
rossini wrote:
>
> Actually no,
>
>Because I'd like to retrieve terms that were computed on the same
> instance of Field. Taking your example to illustrate better, I have 2
>
od to the buffer for each parent
element. Then I removed the current element and added its content as a
Field.
I should add that I am also fairly new to Lucene, so just because I did it
that way doesn't mean it's the best or even a good way.
Jeff
Spencer Tickner wrote:
>
&
do something like this (in search
pseudocode):
sent:(expired num[1 TO 5] "days ago")
I don't see how to do this using either Lucene's QueryParser or the
QsolParser. Is it possible to do it using the Query API (and the appropriate
indexing changes)?
Thanks for any pointers.
thanks Erik
On 10/26/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
>
> On 26 Oct 2005, at 02:50, Jeff Rodenburg wrote:
> > I'm considering building out an index that will flatten a data
> > structure,
> > such that some Document "A" will have
Kevin -
Maybe I'm misunderstanding, but how is this not a BooleanQuery with two
clauses?
- j
On 10/26/05, Kevin L. Cobb <[EMAIL PROTECTED]> wrote:
>
> I've been using Lucene happily for a couple of years now. But, this new
> search functionality I'm trying to add is somewhat different that what
Hi John -
It sounds like you're thinking of your index in terms of sql constructs --
multiple rows for the same record. We do this very same thing with
categories; if you have a record that lives in multiple categories, just add
additional category field/value pairs for your original record. It's
've seen performance in terms of requests/second
drop by a factor of 10, compared to similar tests executing only search
requests (no sorts). CPU appears to be our bottleneck, and I'm trying to
determine if this is expected behavior or if we're outside the bounds of
typical performance.
Thanks,
jeff
(especially for numeric fields).
>
> If you haven't already, you should compare the query times of a
> "warmed" searcher. Sorted queries will still take longer, but I
> haven't measured how much longer.
>
> -Yonik
> Now hiring -- http://forms.cnet.com/slink?
On 11/30/05, Daniel Pfeifer <[EMAIL PROTECTED]> wrote:
>
>
> 1.) Does Lucene's MultiSearcher implement some kind of automatic failover
> and/or load-balancing mechanism if both Searchables which I supply in
> MultiSearchers constructor go to two different servers but to the very same
> index-files?
George -
There are a number of SQL Server specific ways you can do this. Email me
off-list as the solution is not relevant to Lucene.
-- j
On 12/2/05, George Abraham <[EMAIL PROTECTED]> wrote:
>
> All,
> I have created a Lucene index from data in a SQL Server db. When I conduct
> a
> Lucene sea
In one of the Google Labs whitepapers (
http://labs.google.com/papers/mapreduce-osdi04.pdf), a programming construct
known as MapReduce is used in a variety of jobs/tasks within Google's
operation. As an example of the application of MapReduce, the whitepaper
refers to Distributed Sorting.
Essent
thanks Erik
On 12/3/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
>
> On Dec 3, 2005, at 1:26 PM, Jeff Rodenburg wrote:
>
> > In one of the Google Labs whitepapers (
> > http://labs.google.com/papers/mapreduce-osdi04.pdf), a programming
> > construct
> >
Check out Chris Hostetter's methodology for doing this at cnet.
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200508.mbox/[EMAIL
PROTECTED]
This sounds like it matches your requirements.
cheers,
j
On 12/7/05, Ching-Pei Hsing <[EMAIL PROTECTED]> wrote:
>
> Has anyway solved the foll
Well done, Grant. Very informative.
Question on Term Vectors: with their inclusion in an index, have you noticed
any degradation in performance, either from a search efficiency or
maintenance point-of-view? Given the power of term vectors, if the perf
impact is negligible, I'm curious to the re
index file?
I start the JVM with 800MB.
thanks,
Jeff
field that should retrieve a lot of
records, it normally throws the exception.
I will look at MultiSearcher. Do you think splitting the index based
on a date field is a good choice? I somehow feel it requires a lot of
coding to create many indexes based on the date field.
Thanks,
I'm very interested in incorporating smart geographic querying capabilities
(distance calcs are just scratching the surface) into Lucene and came across
this whitepaper:
http://www.clef-campaign.org/2005/working_notes/workingnotes2005/leidner05.pdf
Just curious, has anyone ventured down this path
One way to do this (depending on your system and index size) is to remove
and add every url you find. This would ensure that every document in the
index is unique. No need to worry about sorting and iteration and doc_ids
and the like.
It rebuilds your entire index, but if you have a duplication
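A rough sketch of the remove-then-add idea, assuming the URL is indexed as a
single untokenized token in a "url" field (variable names are illustrative;
later Lucene versions wrap the two steps in IndexWriter.updateDocument):
// Make the URL the unique key: delete any existing doc with it, then add the new one.
IndexReader reader = IndexReader.open(dir);
reader.deleteDocuments(new Term("url", url));   // no-op if the URL isn't indexed yet
reader.close();

IndexWriter writer = new IndexWriter(dir, analyzer, false);  // false = append to existing index
writer.addDocument(doc);
writer.close();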
Have you considered evaluating doc-score thresholds for limiting your
results? Since the perfect answers to these situations lie in the constant
tweaking and twiddling of analysis and tokenization, one way I've found to
help is to evaluate result scores. In your "Ontario CA" example, limiting
res
Vikas -
Start with the RemoteSearchable class. Technology will be RMI.
Hope this helps.
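A rough sketch of the moving parts, assuming an RMI registry is already
running on each host (host names and index paths are illustrative):
// Server side: export a local searcher over RMI.
Searchable local = new IndexSearcher("/path/to/index");
Naming.rebind("//serverA/LuceneSearchable", new RemoteSearchable(local));

// Client side: look up each remote index and search them together.
Searchable a = (Searchable) Naming.lookup("//serverA/LuceneSearchable");
Searchable b = (Searchable) Naming.lookup("//serverB/LuceneSearchable");
Searcher searcher = new MultiSearcher(new Searchable[] { a, b });
Hits hits = searcher.search(query);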
On 2/2/06, Vikas Khengare <[EMAIL PROTECTED]> wrote:
>
> Hi Friends
>
> How do I send one search query to multiple search Indexes which are
> on remote machines ?
>
> Which Technology will help me (A
to tackle this problem with Lucene or another api if doing so makes more
sense?
Thanks,
Jeff
You can generate a token stream for a block of text without having to index
it. Take a look at the highlighter code, it does this very thing.
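A minimal sketch of doing that directly, assuming the TokenStream API of that
era (the field name and sample text are illustrative):
import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TokenizeOnly {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        // Tokenize a block of text without touching an index.
        TokenStream stream = analyzer.tokenStream("contents",
                new StringReader("The quick brown fox jumped over the lazy dog"));
        for (Token token = stream.next(); token != null; token = stream.next()) {
            System.out.println(token.termText());
        }
        stream.close();
    }
}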
On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote:
>
> I am trying to figure out whether or not Lucene is an appropriate solution
> for a p
The site will have a million+ posts. I am not familiar with Bayesian
algorithms. Is there an off-the-shelf API that can provide this type of
capability? As for performance, would Bayesian be the way to go over Lucene?
Thanks for the help,
Jeff
-Original Message-
From: gekkokid [mailto
ted approximately 145
clauses within the final constructed query. In validation testing, this
approach has proven to be:
1) Accurate.
2) Performant (thus far).
At last, my question to everyone who cares to respond (and read this far):
feedback?
Thanks,
-- jeff
el [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, February 28, 2006 2:49 PM
> To: java-user@lucene.apache.org
> Subject: RE: Hacking proximity search: looking for feedback
>
> Jeff -
>
> This is an interesting approach. On our end, we have experimented with
> two variants:
>
&g
component
of relevance? We have a need for distance sorting, but I'm trying to slay
that beast at a later stage.
-- jeff
On 2/28/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote:
>
> Jeff -
>
> This is an interesting approach. On our end, we have experimented with
> two va
the notes.
-- jeff
On 2/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : Geo definition:
> : Boxing around a center point. It's not critical to do a radius search
> with
> : a given circle. A boxed approach allows for taller or wider frames of
> : reference
FunctionQueries to influence your scores based on distance from the
> center of the box.
>
> :
> : Great feedback, thanks for the notes.
> :
> : -- jeff
> :
> : On 2/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : >
> : >
> : > : Geo d
Very good note, I missed that. I need the development environment in front
of me to remember all the different class names correctly. ;-)
-- j
On 3/1/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Jeff Rodenburg wrote:
> > Following on the Range Query approach, how is per
Raul -
You'll want to look at the MultiSearcher and ParallelMultiSearcher classes
for this.
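A minimal sketch, assuming two local indexes (the paths are illustrative); the
hits come back as one relevance-ordered list:
Searchable[] indexes = {
    new IndexSearcher("/path/to/index1"),
    new IndexSearcher("/path/to/index2")
};
Searcher searcher = new MultiSearcher(indexes);   // or new ParallelMultiSearcher(indexes)
Hits hits = searcher.search(query);               // one ranked result set across both indexes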
On 3/3/06, Raul Raja Martinez <[EMAIL PROTECTED]> wrote:
>
> Is it possible to search many indexes in one query and get back the Hits
> ordered by relevance?
>
> Can someone point me out to some document o
We've done this, and it's not that complex. (Sorry, client won't allow me
to release the code.)
It's AJAX on the front end, so that background call is simply executing a
search against an index that consists of the aggregated search terms. We do
wildcard queries to get the results we want. For u
data types, etc.
I'm working on this mostly for myself, but if anyone is interested just send
me an email off-list.
cheers,
-- jeff r.
Does anyone have a lead on "business" stop words? Things like "inc", "llc",
"md", etc.
I'd rather not reinvent this wheel. :-)
cheers,
jeff
I run Lucene.Net as well, and your indexing performance depends on many
factors besides whether you're using the Java or C# version. As a basic
suggestion, learn what you can about minMergeDocs and mergeFactor as well as
the compound file format. Try different combinations to understand w
that use a high number of clauses, but another set that needs a low number
of clauses (different indexes searched, and efficiencies dictate the
high/low clause range.)
cheers,
jeff
y can
sometimes cause problems when both types of queries need to execute
simultaneously.
-- j
On 4/15/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> On Saturday 15 April 2006 18:20, Jeff Rodenburg wrote:
> > What was the thinking behind making the BooleanQuery maxClauseCount a
> &
Marc -
We built our index maintenance operation to assume a breakdown would occur
in process (because it happened several times.) We exist in an environment
where "always on, always available" is a business requirement. We also do a
lot of updates on a cyclical basis (every 10 minutes), so malf
The Keyword analyzer does no stemming or input modification of any sort:
think of it as WYSIWYG for index population. The Whitespace analyzer simply
splits your input on whitespace (still no stemming), so the tokens are the
individual words. I don't have the code in front of me, so I'm not sure
3.6Ghz I think.) I frankly haven't tested out
scalability yet.
Jeff
Emptoris, Inc.
-Original Message-
From: Vladimir Olenin [mailto:[EMAIL PROTECTED]
Sent: Monday, June 26, 2006 7:56 AM
To: java-user@lucene.apache.org
Subject: search performance benchmarks
Hi,
I'm evaluat
I have a clustered environment, with a load-balancer in the front
assigning connections. Is it better to have one of the cluster running
a searcher as a webservice (to be accessed by the other machines in the
cluster) or to have a IndexReader/Searcher for each machine in the
cluster?
Jeff
Heh, you said it better than I. I was just about to reply with the
witty "Nutch is Lucene, isn't it?"
Jeff
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Friday, July 07, 2006 10:28 AM
To: java-user@lucene.apache.org
Subject: Re: Nutch- Bet
but I
would like to understand the bounds of the problem a bit better.
Any advice?
Thanks,
Jeff Schnitzer
SubEtha Mailing List Manager - http://subetha.tigris.org/
Hi Mark -
Having gone down this path for the past year, I echo comments from others
that scalability/availability/failover is a lot of work. We migrated away
from a custom system based on Lucene running on Windows to Solr running on
Linux. It took us 6 months to get our system to a solid five-n
Why is a single server so important? I can scale horizontally much cheaper
than I scale vertically.
On 8/11/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I've made a nice little archive application with lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
The single server is important because I think it will take a lot of
work to scale it to multiple servers. The index must allow for close to
real-time updates and additions. It must also remain searchable at all
times (other than than during the
Have you considered left-padding your numbers with zeros to make each
number a string of the same length?
e.g., the number 5 would be indexed/queried as "00005", which can be
correctly compared to 10 ("00010"), 2345 ("02345"), etc. in a lexical
comparison...
Jeff
O
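For example, a minimal sketch of the padding, assuming a fixed width of five
digits (the width is just for illustration):
import java.text.DecimalFormat;

public class PadNumbers {
    private static final DecimalFormat PAD = new DecimalFormat("00000");

    // Pad numbers to a fixed width so that lexical order equals numeric order.
    public static String pad(long n) {
        return PAD.format(n);            // 5 -> "00005", 10 -> "00010", 2345 -> "02345"
    }

    public static void main(String[] args) {
        System.out.println(pad(5));                            // index and query this padded form
        System.out.println("00005".compareTo("00010") < 0);    // true, as desired
    }
}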
this a bug, or is it intentional,
or am I missing something?
Thanks,
Jeff
That fix works perfectly, as far as I can tell.
As for the unit test, it should actually be:
assertEquals("192.168.0.15\\public", discardEscapeChar
("192.168.0.15\\\\public"));
Jeff
On 7/20/05, Eyal <[EMAIL PROTECTED]> wrote:
> I think this sh
s. I'm running it through the
query parser at present; if I end up sticking with this method I'll
just build a BooleanQuery with a clause for each log type to avoid the
parsing overhead.
Other than the BooleanQuery, is there a more efficient way of
explored the concept of executing multiple sub-searches
to get filters and their subsequent counts, but my requirements allows me a
sustainable, smaller set of potential dynamic filters. This is the concept,
haven't put it into practice so have no idea if it scales any better than
the brute force method.
-- jeff
tc.? What would
you do differently if you were starting from scratch?
Cheers from sunny Seattle,
jeff r.
Ids.
Should I be looking at BooleanQuery, QueryFilter or a custom filter?
QueryFilter (or customfilter, as I may pull the id values from a db) seem
like the *best* approach for my scenario. I'm just looking for feedback from
others who have gone this route and what their experiences yielded.
Thanks,
jeff
Is there a consensus or estimate on when v1.9 will be considered a stable
release? I'm prepping a deployment on v1.4.3 but would like an idea of when
1.9 might be considered stable in the eyes of the community.
-- Jeff Rodenburg
Mayday, mayday
Has anyone had recent contact with George Aroush? He's presently managing
the C# port of Lucene.
Thanks,
Jeff Rodenburg
nitial
thought is the problem lies in the custom filter I've created.
myCustomFilter extends Filter, and I'm following the BitSet comparative
example as found in the LIA book. I've done nothing in myCustomFilter
regarding caching.
I'm doubting this is a bug, but rather something I've overlooked.
thanks,
jeff r.
Might be the same issue, haven't been able to determine during a
step-through on the code exec.
You're right, no need to add a new FilteredQuery to the statement, just a
search on combinedQuery with a new myCustomFilter.
Unfortunately, no joy; same response.
-- j
On 9/13/05, Chris Hostetter <[E
uals() then there's your problem.
Will do the step-through following this manner and post the results.
-- j
: Date: Tue, 13 Sep 2005 17:22:49 -0700
> : From: Jeff Rodenburg <[EMAIL PROTECTED]>
> : Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
> : To: Chris Hoste
Good call, Chris. I followed the BitSet comparison route and found that
the custom filter was working exactly as it should, but *I* wasn't passing
it correct data. Rookie mistake.
Doh! I hate it when that happens.
-- j
On 9/13/05, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
>
indexed.
This is an operational question, so the *best* way depends on your overall
operation, as both of these approaches have consequences on index
maintenance operations.
Hope this helps.
-- jeff
On 9/17/05, Ben Gill <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I am storing
trimming the post further:
On 9/18/05, James Huang <[EMAIL PROTECTED]> wrote:
>
> >The problem is quite generic, I believe. What I like to do is similar to
> LIA-ch6, i.e. to find a "good Chinese Hunan-style restaurant near me." I
> prefer Hunan-style; however, if a good Hunan-style one is 12 m
plenty
of support on this mailing list, but you can educate yourself much more
effectively with that book. The authors lurk on this list. It's the cheapest
consulting ($40) you can get.
Cheers,
jeff
On 9/18/05, Kevin Stembridge <[EMAIL PROTECTED]> wrote:
>
>
> Would Lucene
I like Erik's suggestion here as a starting point. I would guess you might
find some direction in the Scorer class, but I haven't gone through this in
detail.
Conceptually a sliding weight based on proximity sounds correct...
-- jeff
On Sep 18, 2005, at 3:39 PM, James Huang wrote:
This is interesting, one I had not considered.
Mark - are there any code samples that implement this approach? Or maybe
something similar in approach?
thanks,
jeff
On 9/19/05, mark harwood <[EMAIL PROTECTED]> wrote:
>
> I think the HitCollector approach was fine but needed
&
ple, a search for "Wedgewood WA" would ideally not match "Wedgewood GA".
I'm starting with the StandardAnalyzer and thinking of possibly extending it
to carry in some of the business rules meant to come into play for
tie-breakers.
Comments appreciated.
Thanks,
jeff r.
Are there known limitations or issues with sorting and RemoteSearchable? I'm
encountering problems attempting to sort through a MultiSearcher
(ParallelMultiSearcher, actually). I'm using an array of RemoteSearchable
objects as the Searchable[] source. If I change the source indexes to be
local Inde
Thanks Rasik.
If this is the case, why is this exposed in the API? Should the overloaded
search method on ParallelMultiSearcher that takes a Sort object be removed?
I'm using the 1.4.3 codebase.
-j
On 10/5/05, Rasik Pandey <[EMAIL PROTECTED]> wrote:
>
> Hi Jeff,
>
> Sor
p the exceptions appropriately.
-- j
On 10/5/05, Rasik Pandey <[EMAIL PROTECTED]> wrote:
>
> Hi Jeff,
>
> Sorting needs access to an IndexReader so it can do Term lookups, and
> I don't think there is a remote impl of IndexReader probably because,
> among other reasons
lit them out in a string[] similar
to the LIA example?
cheers,
jeff r.
so need to pass in a *HitCollector* implementation
that subclasses UnicastRemoteObject, so that the callbacks can return to the
original VM.
So, if you can, it's considerably simpler and more efficient to use
TopDocs-based search when you're working remotely."
Is this still consider