Hi folks, have any of you successfully implemented LSH (MinHash) in
Solr? If so, could you share some details of how you went about it?
I know LSH is available in Mahout, but was hoping someone has a
Solr or Lucene implementation.
Thanks
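For anyone unfamiliar with the technique being asked about, here is a minimal MinHash sketch in plain Python. It is not a Solr or Lucene implementation; the function names, the 3-word shingle size, the 64 hash functions and the 16 bands are all assumptions chosen for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k=3 is an arbitrary choice)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """Deterministic 64-bit hash of an item, salted with a seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=64):
    """One minimum per seeded hash function; items must be non-empty."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; each (band, values) pair is one bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]
```

Two documents that share at least one bucket key become candidate near-duplicates. One hypothetical way to use this with a search index would be to store the bucket keys as tokens in a field and query on them, but that part is untested here.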
Hi all, I have a use-case where I would need to search a set of
numeric values, using a query set. My business case is
1. I have various Rock samples from various locations {R1...Rn} with
multiple measurements like Porosity [255] - an array of values,
Conductivity [1028] - also an array of
You can ask on this site http://patents.stackexchange.com/
On Sat, Sep 14, 2013 at 10:03 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
On 9/13/2013 9:14 PM, Zaizen Ushio wrote:
Hello
I have a question about patent. I believe Apache license is protecting
Solr developers from
Here is a paper that I found useful:
http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI furkankam...@gmail.com wrote:
Thanks for your comments.
2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com
if you need a specialized
n-grams might help, followed by an edit distance metric such as Jaro-Winkler
or Smith-Waterman-Gotoh to further filter out.
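A rough sketch of that two-stage idea in Python: a cheap character n-gram overlap picks candidates, then an edit distance filters them further. Plain Levenshtein distance stands in here for Jaro-Winkler or Smith-Waterman-Gotoh, and the function names and thresholds are assumptions for illustration only.

```python
def char_ngrams(s, n=3):
    """Set of character n-grams of a lowercased string."""
    s = s.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard overlap of the two n-gram sets (the cheap first-pass filter)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similar_candidates(query, candidates, min_overlap=0.2, max_distance=2):
    """Stage 1: n-gram overlap. Stage 2: edit distance on the survivors."""
    coarse = [c for c in candidates if ngram_overlap(query, c) >= min_overlap]
    return [c for c in coarse
            if levenshtein(query.lower(), c.lower()) <= max_distance]
```

The point of the first stage is that the more expensive metric only runs on strings that already share some n-grams with the query.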
On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:
Interesting problem. The first thing that comes to mind is to do
word expansion
Better still start here: http://en.wikipedia.org/wiki/Inverted_index
http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html
And there are several books on search engines and related algorithms.
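To make the linked material concrete, here is a toy inverted index in Python. It is only a sketch: whitespace tokenization, the function names and the AND-only query are simplifying assumptions, nothing like what Lucene actually does internally.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents containing every term of the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings)
```

Looking up a term is a dictionary access rather than a scan over every document, which is the whole idea behind the data structure.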
On Tue, May 28, 2013 at 10:41 PM, Alexandre Rafalovitch
Look up multicore Solr. Another choice could be ElasticSearch - which
is more straightforward in managing multiple indexes IMO.
On Tue, Jul 17, 2012 at 7:53 PM, Zhang, Lisheng
lisheng.zh...@broadvision.com wrote:
Hi,
We have an application where we index data into many different directories
the doc, so we need to put each core name into
Solr config XML, if we add another core and change XML, do we
need to restart Solr?
Best regards, Lisheng
-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com]On Behalf Of
Shashi Kant
Sent: Tuesday, July 17, 2012 5
We have used both Solr and graph databases for our XML file indexing. Both
are equivalent in terms of performance, but a graph db (such as Neo4j)
offers a lot more flexibility in joining across the nodes and traversing.
If your data is strictly hierarchical Solr might do it, alternately suggest
is not a static one. It must update
on the fly. As I know, Lucene index is not suitable to be updated too
frequently. If so, how to deal with that?
Best regards,
Bing
On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:
Lucene has a mechanism to boost up/down documents using
Lucene has a mechanism to boost up/down documents using your custom
ranking algorithm. So if you come up with something like Pagerank
you might do something like doc.setBoost(myboost) before writing to the index.
On Sat, Jan 21, 2012 at 5:07 PM, Bing Li lbl...@gmail.com wrote:
Hi, Kai,
Thanks
For a simple, hackish (albeit inefficient) approach, look up wildcard searches,
e.g. foo*, *bar
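A small Python illustration of why this is inefficient, assuming the terms sit in a sorted list (a stand-in for a sorted term dictionary; the function names are made up for this sketch). A trailing wildcard like foo* can jump straight to the matching range, while a leading wildcard like *bar has to test every term.

```python
import bisect
from fnmatch import fnmatchcase


def prefix_matches(sorted_terms, prefix):
    """foo*: binary-search to the first candidate, then read until the prefix ends."""
    start = bisect.bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        out.append(term)
    return out


def wildcard_matches(terms, pattern):
    """*bar (or any pattern): falls back to checking every term in the list."""
    return [t for t in terms if fnmatchcase(t, pattern)]
```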
On Thu, Dec 29, 2011 at 12:38 PM, Devon Baumgarten
dbaumgar...@nationalcorp.com wrote:
I have been tinkering with Solr for a few weeks, and I am convinced that it
could be very helpful in many of
You can also look at cosine similarity (or related metrics) to measure
document similarity.
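A minimal sketch of cosine similarity over raw term counts, in Python. Whitespace tokenization and unweighted counts are assumptions made to keep it short; a real setup would more likely use tf-idf weights.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Usage would be something like cosine_similarity(term_vector(doc1), term_vector(doc2)), giving a value between 0 (no shared terms) and 1 (same term proportions).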
On Tue, Dec 27, 2011 at 6:51 AM, vibhoreng04 vibhoren...@gmail.com wrote:
Hi iorixxx,
Thanks for the quick update. I hope I can take it from here!
Regards,
Vibhor
--
View this message in
https://wiki.apache.org/lucene-java/ScoresAsPercentages
On Mon, Aug 15, 2011 at 8:13 PM, Bill Bell billnb...@gmail.com wrote:
How do I change the score to scale it between 0 and 100 regardless of the
score?
q.alt=*:*&bq=lang:Spanish&defType=dismax
Bill Bell
Sent from mobile
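For reference, the rescaling asked about in the question above can be done client-side on the returned scores. The Python sketch below (the function name is made up) scales each score relative to the top hit of the same result set. Note that the top hit always comes out as 100 no matter how good the match really is, so the numbers are not comparable across queries.

```python
def scale_scores(scores):
    """Scale raw scores to 0-100 relative to the best score in this result set."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [100.0 * s / top for s in scores]
```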
Betamax VCR? Really? :-)
On Tue, Aug 9, 2011 at 3:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: A quick question - is it possible to have 2 cores in Solr on two
different
: machines?
your question is a little vague ... like asking is it possible to
have two betamax
Check your Stop words list
On Jul 31, 2011 6:25 PM, François Schiettecatte fschietteca...@gmail.com
wrote:
That seems a little far fetched, have you checked your analysis?
François
On Jul 31, 2011, at 4:58 PM, randohi wrote:
One of our clients (a hot girl!) brought this to our attention:
Range query
On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
Hi,
Let's say I have 10^10 documents in an index, with the unique id being a document
id assigned to each of them, from 1 to 10^10.
Now I want to search a particular query string in a subset of these
Here is a very interesting comparison
http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
-Original Message-
From: Mark
Sent: May-31-11 10:33 PM
To: solr-user@lucene.apache.org
Subject: Solr vs ElasticSearch
I've been hearing more and more about
You might be better off looking for freelancers on sites such as
odesk.com, guru.com, rentacoder.com, elance.com, and many more
On Tue, May 17, 2011 at 4:09 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
Check this out:
http://wiki.apache.org/solr/Support
Hi,
I am looking for an
Have you tried posting on odesk.com? I have had decent success finding
Solr/Lucene resources there.
On Thu, Nov 11, 2010 at 7:52 PM, AC acanuc...@yahoo.com wrote:
Hi,
Not sure if this is the correct place to post but I'm looking for someone
to
help finish a Solr install on our LAMP based
On Fri, Oct 29, 2010 at 6:00 PM, Ron Mayer r...@0ape.com wrote:
I have some documents with a bunch of attachments (images, thumbnails
for them, audio clips, word docs, etc); and am currently dealing with
them by just putting a path on a filesystem to them in solr; and then
jumping through
What I am envisioning (at least to start) is to have all this add two fields in
the index. One would be for color information for the color similarity
search. The other would be a simple multivalued text field that we put
keywords into based on what OpenCV can detect about the image. If it
Lire looks promising, but how hard is it to integrate the content-based
search into Solr as opposed to Lucene? I myself am not a Java developer. I
have access to people who are, but their time is scarce.
Lire is a nascent effort and based on a cursory overview a while back,
IMHO was an
q=*:*
On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote:
I have some queries that I'm running against a solr instance (older,
1.2 I believe), and I would like to get *all* the results back (and
not have to put an absurdly large number as a part of the rows
parameter).
to have it return all the rows in the
results?
-- Chris
On Thu, Sep 16, 2010 at 4:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:
q=*:*
On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote:
I have some queries that I'm running against a solr instance (older,
1.2 I
Shawn, I have done some research into this, machine-vision especially
on a large scale is a hard problem, not to be entered into lightly. I
would recommend starting with OpenCV - a comprehensive toolkit for
extracting various features such as Color, Edge etc from images. Also
there is a project
On a related note, I'm curious if anyone has run across a good set of
algorithms (or hopefully a library) for doing naive image
classification. I'm looking for something that can classify images
into something similar to the broad categories that Google image
search has (Face, Photo, Clip
I'm sure there's some post doctoral types who could get a graphic shape
analyzer, color analyzer, to at least say it's a flower.
However, even Google would have to build new datacenters to have the
horsepower to do that kind of graphic processing.
Not necessarily true. Like.com - which
If you are on Windows try the Microsoft IFilter API - it supports
current Office versions.
http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en
On Tue, Apr 27, 2010 at 6:08 AM, Roland Villemoes r...@alpha-solutions.dk
wrote:
Hi All,
Why do these approaches have to be mutually exclusive?
Do a dictionary lookup, if no satisfactory match found use an
algorithmic stemmer. Would probably save a few CPU cycles by
algorithmic stemming iff necessary.
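A sketch of that combination in Python: look the word up in a dictionary first and only fall back to the algorithm on a miss. The dictionary entries and the suffix list are hypothetical, and the suffix stripper is a toy standing in for a real algorithmic stemmer such as Porter.

```python
# Hypothetical exception dictionary; a real one would be far larger.
LEMMA_DICTIONARY = {"mice": "mouse", "ran": "run", "better": "good"}

# Toy suffix list, checked in order; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def algorithmic_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stem(word, dictionary=LEMMA_DICTIONARY):
    """Dictionary lookup first; algorithmic stemming only if there is no entry."""
    word = word.lower()
    hit = dictionary.get(word)
    if hit is not None:
        return hit
    return algorithmic_stem(word)
```

The dictionary handles the irregular forms an algorithm gets wrong, and the algorithm covers everything the dictionary has never seen.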
On Wed, Apr 21, 2010 at 1:31 PM, Robert Muir rcm...@gmail.com wrote:
sy to look
In that case, you could just calculate an offset from 00:00:00 in
seconds (ignore the date)
Pretty simple.
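In Python the offset calculation could look like the sketch below (function names are made up for illustration). The resulting integer could then go into a numeric field and be range-queried, though that part is an assumption about how it would be used.

```python
from datetime import datetime, time


def seconds_since_midnight(t):
    """Offset of a time of day from 00:00:00, in seconds."""
    return t.hour * 3600 + t.minute * 60 + t.second


def time_of_day_offset(dt):
    """Same offset for a full datetime; the date part is ignored."""
    return seconds_since_midnight(dt.time())
```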
On Wed, Mar 31, 2010 at 4:57 PM, abhatna...@vantage.com
abhatna...@vantage.com wrote:
Hi Sashi,
Could you elaborate on point no. 1 in light of the case wherein a field should
have just
Look at Payload.
On Thu, Jan 28, 2010 at 6:48 AM, murali k ilar...@gmail.com wrote:
Say I have a clothes store, I have ladies' clothes, men's clothes.
When someone searches for clothes, I want to prioritize men's clothing
results.
How can I achieve this?
This logic should only apply for this
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
On Thu, Jan 28, 2010 at 6:54 AM, Shashi Kant sk...@sloan.mit.edu wrote:
Look at Payload.
On Thu, Jan 28, 2010 at 6:48 AM, murali k ilar...@gmail.com wrote:
Say I have a clothes store, I have ladies' clothes, men's
http://lmgtfy.com/?q=lucene+basics
On Sun, Dec 13, 2009 at 1:01 PM, Faire Mii faire@gmail.com wrote:
Hi,
I am a beginner and I wonder how a document, entity and field relate to
a database.
And I wonder if there are some good tutorials that teach you how to design
your schema.
Here is a link that might be helpful:
http://sesat.no/moving-from-fast-to-solr-review.html
The site is chock-a-block with great information on their migration
experience.
On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto tommymo...@gmail.com wrote:
Hi,
I'm new to Solr and I need to make a test
I think it would be useful for members of this list to realize that not
everyone uses the same metrology and terms.
It is very easy for Americans to use the imperial system and presume
everyone does the same; Europeans to use the metric system etc. Hopefully
members on this list would be
This post describes the search-within-search implementation.
http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
Shashi
On Sat, Apr 4, 2009 at 1:21 PM, Vernon Chapman chapman.li...@gmail.com wrote:
Bess,
I think that might work I'll try it out and see how it works
Have you looked at http://wiki.apache.org/solr/SolrPerformanceData ?
On Tue, Mar 24, 2009 at 4:51 PM, solr s...@highbeam.com wrote:
We have three Solr servers (several two processor Dell PowerEdge
servers). I'd like to get three newer servers and
Another project worth investigating is Tesseract.
http://code.google.com/p/tesseract-ocr/
----- Original Message -----
From: Hannes Carl Meyer m...@hcmeyer.com
To: solr-user@lucene.apache.org
Sent: Thursday, February 26, 2009 11:35:14 AM
Subject: Re: Use of scanned documents for text
Can anyone back that up?
IMHO Tesseract is the state-of-the-art in OCR, but not sure that Ocropus
builds on Tesseract.
Can you confirm that Vikram has a point?
Shashi
----- Original Message -----
From: Vikram Kumar vikrambku...@gmail.com
To: solr-user@lucene.apache.org; Shashi Kant sk
One man's crap is another man's treasure. :-P
So how would you decide what is worth posting?
If you feel the list is overwhelming your email, set some filters.
Shashi
----- Original Message -----
From: Tony Wang ivyt...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, February 18,
Steve - could you not just subscribe to the list from another email account
(Gmail or Yahoo, for example) that is not on your mobile device?
We discourage using corporate email for subscribing to mailing lists precisely for
such reasons: volume, spam, malware risks etc.
Shashi
----- Original Message -----
From:
42 matches