LSH in Solr/Lucene

2014-01-20 Thread Shashi Kant
Hi folks, have any of you successfully implemented LSH (MinHash) in Solr? If so, could you share some details of how you went about it? I know LSH is available in Mahout, but I was hoping someone has a Solr or Lucene implementation. Thanks
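For readers unfamiliar with the technique being asked about, here is a minimal MinHash sketch in plain Python. It is not a Solr or Lucene implementation; the function names, the word-shingle size, and the use of salted MD5 as the hash family are all assumptions made for illustration. The banding step that turns signatures into LSH buckets is not shown.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (one shingle if the text is short)."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """For each of num_hashes salted hash functions, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode("utf-8")
        signature.append(min(
            int.from_bytes(
                hashlib.md5(salt + b":" + item.encode("utf-8")).digest()[:8], "big"
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Two documents with identical shingle sets always get identical signatures, and the share of matching positions approximates the Jaccard similarity of the sets. One possible route into a search index would be to store the signature values as tokens in a field, but that is a design assumption rather than anything stated in the thread.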

Searching Numeric Data

2014-01-11 Thread Shashi Kant
Hi all, I have a use case where I need to search a set of numeric values using a query set. My business case is: 1. I have various rock samples from various locations {R1...Rn} with multiple measurements like Porosity [255] - an array of values, Conductivity [1028] - also an array of

Re: Solr Patent

2013-09-14 Thread Shashi Kant
You can ask on this site http://patents.stackexchange.com/ On Sat, Sep 14, 2013 at 10:03 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 9/13/2013 9:14 PM, Zaizen Ushio wrote: Hello I have a question about patent. I believe Apache license is protecting Solr developers from

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Shashi Kant
Here is a paper that I found useful: http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for your comments. 2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com if you need a specialized

Re: Search for misspelled words in corpus

2013-06-09 Thread Shashi Kant
n-grams might help, followed by an edit distance metric such as Jaro-Winkler or Smith-Waterman-Gotoh to filter the candidates further. On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting problem. The first thing that comes to mind is to do word expansion
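The two-stage idea in this reply (cheap n-gram overlap to pick candidates, then an edit distance to filter them) can be sketched as follows. This is only an illustration: it uses plain Levenshtein distance as a stand-in for the Jaro-Winkler or Smith-Waterman-Gotoh metrics named above, and the function names and thresholds are hypothetical.

```python
def char_ngrams(word, n=2):
    """Character n-grams of a word, padded with '$' to mark its start and end."""
    padded = f"${word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_misspellings(target, vocabulary, min_overlap=0.3, max_distance=2):
    """Stage 1: keep words whose bigram sets overlap enough with the target.
    Stage 2: keep only those within max_distance edits of the target."""
    target_grams = char_ngrams(target)
    hits = []
    for word in vocabulary:
        grams = char_ngrams(word)
        overlap = len(target_grams & grams) / len(target_grams | grams)
        if overlap >= min_overlap and levenshtein(target.lower(), word.lower()) <= max_distance:
            hits.append(word)
    return hits
```

In a real corpus the first stage would typically come from an n-gram index rather than a loop over the whole vocabulary; the loop here just keeps the sketch self-contained.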

Re: How apache solr stores indexes

2013-05-28 Thread Shashi Kant
Better still start here: http://en.wikipedia.org/wiki/Inverted_index http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html And there are several books on search engines and related algorithms. On Tue, May 28, 2013 at 10:41 PM, Alexandre Rafalovitch
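As a rough companion to the links above, the following toy example shows what an inverted index is: a map from each term to the set of documents that contain it. It is a teaching sketch under simple assumptions (whitespace tokenization, lowercase terms, AND-only queries) and says nothing about how Lucene actually lays out its index files; the function names are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the sorted ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)
```

Looking up a term is a dictionary access followed by set intersections, which is why a query does not need to scan every document.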

Re: Could I use Solr to index multiple applications?

2012-07-17 Thread Shashi Kant
Look up multicore solr. Another choice could be ElasticSearch - which is more straightforward in managing multiple indexes IMO. On Tue, Jul 17, 2012 at 7:53 PM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Hi, We have an application where we index data into many different directories

Re: Could I use Solr to index multiple applications?

2012-07-17 Thread Shashi Kant
the doc, so we need to put each core name into Solr config XML, if we add another core and change XML, do we need to restart Solr? Best regards, Lisheng -----Original Message----- From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant Sent: Tuesday, July 17, 2012 5

Re: Does Solr fit my needs?

2012-04-27 Thread Shashi Kant
We have used both Solr and graph databases for our XML file indexing. Both are equivalent in terms of performance, but a graph db (such as Neo4j) offers a lot more flexibility in joining across the nodes and traversing. If your data is strictly hierarchical Solr might do it, alternately suggest

Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-23 Thread Shashi Kant
is not a static one. It must update on the fly. As I know, Lucene index is not suitable to be updated too frequently. If so, how to deal with that? Best regards, Bing On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant sk...@sloan.mit.edu wrote: Lucene has a mechanism to boost up/down documents using

Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-21 Thread Shashi Kant
Lucene has a mechanism to boost up/down documents using your custom ranking algorithm. So if you come up with something like Pagerank you might do something like doc.SetBoost(myboost), before writing to index. On Sat, Jan 21, 2012 at 5:07 PM, Bing Li lbl...@gmail.com wrote: Hi, Kai, Thanks

Re: Solr, SQL Server's LIKE

2011-12-29 Thread Shashi Kant
For a simple, hackish (albeit inefficient) approach, look up wildcard searches, e.g. foo*, *bar. On Thu, Dec 29, 2011 at 12:38 PM, Devon Baumgarten dbaumgar...@nationalcorp.com wrote: I have been tinkering with Solr for a few weeks, and I am convinced that it could be very helpful in many of

Re: How to run the solr dedup for the document which match 80% or match almost.

2011-12-27 Thread Shashi Kant
You can also look at cosine similarity (or related metrics) to measure document similarity. On Tue, Dec 27, 2011 at 6:51 AM, vibhoreng04 vibhoren...@gmail.com wrote: Hi iorixxx, Thanks for the quick update.I hope I can take it from here ! Regards, Vibhor -- View this message in
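To make the suggestion concrete, here is a small sketch of cosine similarity over raw term counts, with a near-duplicate check on top. The 0.8 default threshold echoes the "80%" in the thread subject but is otherwise an arbitrary assumption, as are the function names and the whitespace tokenization; this is not a Solr feature.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(text_a, text_b, threshold=0.8):
    """Treat two texts as near-duplicates when their cosine similarity reaches the threshold."""
    return cosine_similarity(text_a, text_b) >= threshold
```

Weighting terms by TF-IDF instead of raw counts is a common refinement, since it keeps very frequent words from dominating the score.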

Re: Score

2011-08-15 Thread Shashi Kant
https://wiki.apache.org/lucene-java/ScoresAsPercentages On Mon, Aug 15, 2011 at 8:13 PM, Bill Bell billnb...@gmail.com wrote: How do I change the score to scale it between 0 and 100 irregardless of the score? q.alt=*:*&bq=lang:Spanish&defType=dismax Bill Bell Sent from mobile

Re: Multiple Cores on different machines?

2011-08-09 Thread Shashi Kant
Betamax VCR? really ? :-) On Tue, Aug 9, 2011 at 3:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : A quick question - is it possible to have 2 cores in Solr on two different : machines? your question is a little vague ... like asking is it possible to have two betamax

Re: Solr can not index F**K!

2011-07-31 Thread Shashi Kant
Check your Stop words list On Jul 31, 2011 6:25 PM, François Schiettecatte fschietteca...@gmail.com wrote: That seems a little far fetched, have you checked your analysis? François On Jul 31, 2011, at 4:58 PM, randohi wrote: One of our clients (a hot girl!) brought this to our attention:

Re: searching a subset of SOLR index

2011-07-05 Thread Shashi Kant
Range query On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote: Hi, Let say, I have got 10^10 documents in an index with unique id being document id which is assigned to each of those from 1 to 10^10 . Now I want to search a particular query string in a subset of these
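A hedged illustration of the one-line answer: if the wanted subset is a contiguous span of ids, a range clause in a filter query restricts the search to it. The sketch below only builds the request parameters; `q`, `fq`, and the `field:[low TO high]` range syntax are standard Solr, while the field name `doc_id`, the helper name, and the assumption that ids are indexed as a numeric field are hypothetical.

```python
from urllib.parse import urlencode


def subset_search_params(query, id_field, low, high):
    """Build URL-encoded Solr parameters that run `query` only over ids in [low, high]."""
    return urlencode({
        "q": query,
        "fq": f"{id_field}:[{low} TO {high}]",  # inclusive range filter
    })
```

The resulting string would be appended to a select URL for the core in question. A subset that is not contiguous would need a different filter, for example a field that marks membership.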

Re: Solr vs ElasticSearch

2011-05-31 Thread Shashi Kant
Here is a very interesting comparison http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/ -Original Message- From: Mark Sent: May-31-11 10:33 PM To: solr-user@lucene.apache.org Subject: Solr vs ElasticSearch I've been hearing more and more about

Re: I need an available solr lucene consultant

2011-05-17 Thread Shashi Kant
You might be better off looking for freelancers on sites such as odesk.com, guru.com, rentacoder.com, elance.com, and many more. On Tue, May 17, 2011 at 4:09 PM, Markus Jelsma markus.jel...@openindex.io wrote: Check this out: http://wiki.apache.org/solr/Support Hi, I am looking for an

Re: Looking for help with Solr implementation

2010-11-12 Thread Shashi Kant
Have you tried posting on odesk.com? I have had decent success finding Solr/Lucene resources there. On Thu, Nov 11, 2010 at 7:52 PM, AC acanuc...@yahoo.com wrote: Hi, Not sure if this is the correct place to post but I'm looking for someone to help finish a Solr install on our LAMP based

Re: Would it be nuts to store a bunch of large attachments (images, videos) in stored but-not-indexed fields

2010-10-29 Thread Shashi Kant
On Fri, Oct 29, 2010 at 6:00 PM, Ron Mayer r...@0ape.com wrote: I have some documents with a bunch of attachments (images, thumbnails for them, audio clips, word docs, etc); and am currently dealing with them by just putting a path on a filesystem to them in solr; and then jumping through

Re: Color search for images

2010-09-17 Thread Shashi Kant
What I am envisioning (at least to start) is have all this add two fields in the index.  One would be for color information for the color similarity search.  The other would be a simple multivalued text field that we put keywords into based on what OpenCV can detect about the image.  If it

Re: Color search for images

2010-09-16 Thread Shashi Kant
Lire looks promising, but how hard is it to integrate the content-based search into Solr as opposed to Lucene?  I myself am not a Java developer.  I have access to people who are, but their time is scarce. Lire is a nascent effort and based on a cursory overview a while back, IMHO was an

Re: Get all results from a solr query

2010-09-16 Thread Shashi Kant
q=*:* On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote: I have some queries that I'm running against a solr instance (older, 1.2 I believe), and I would like to get *all* the results back (and not have to put an absurdly large number as a part of the rows parameter).

Re: Get all results from a solr query

2010-09-16 Thread Shashi Kant
to have it return all the rows in the results? -- Chris On Thu, Sep 16, 2010 at 4:43 PM, Shashi Kant sk...@sloan.mit.edu wrote: q=*:* On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote: I have some queries that I'm running against a solr instance (older, 1.2 I

Re: Color search for images

2010-09-15 Thread Shashi Kant
Shawn, I have done some research into this, machine-vision especially on a large scale is a hard problem, not to be entered into lightly. I would recommend starting with OpenCV - a comprehensive toolkit for extracting various features such as Color, Edge etc from images. Also there is a project

Re: Color search for images

2010-09-15 Thread Shashi Kant
On a related note, I'm curious if anyone has run across a good set of algorithms (or hopefully a library) for doing naive image classification. I'm looking for something that can classify images into something similar to the broad categories that Google image search has (Face, Photo, Clip

Re: Color search for images

2010-09-15 Thread Shashi Kant
I'm sure there's some post doctoral types who could get a graphic shape analyzer, color analyzer, to at least say it's a flower. However, even Google would have to build new datacenters to have the horsepower to do that kind of graphic processing. Not necessarily true. Like.com - which

Re: Indexing all versions of Microsoft Office Documents

2010-04-27 Thread Shashi Kant
If you are on Windows try the Microsoft IFilter API - it supports current Office versions. http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en On Tue, Apr 27, 2010 at 6:08 AM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi All,

Re: LucidWorks Solr

2010-04-21 Thread Shashi Kant
Why do these approaches have to be mutually exclusive? Do a dictionary lookup, if no satisfactory match found use an algorithmic stemmer. Would probably save a few CPU cycles by algorithmic stemming iff necessary. On Wed, Apr 21, 2010 at 1:31 PM, Robert Muir rcm...@gmail.com wrote: sy to look
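The combined approach proposed here (dictionary lookup first, algorithmic stemming only as a fallback) might look like the sketch below. The tiny lemma dictionary and the naive suffix stripper are placeholders invented for this example; a real setup would use a full lexicon and a proper stemmer such as Porter.

```python
# Toy lemma dictionary: a hypothetical stand-in for a real lexicon.
LEMMA_DICTIONARY = {"mice": "mouse", "geese": "goose", "ran": "run"}


def suffix_stem(word):
    """Very naive algorithmic stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def normalize(word, dictionary=LEMMA_DICTIONARY):
    """Use the dictionary when it has an entry; fall back to the stemmer only if it does not."""
    word = word.lower()
    lemma = dictionary.get(word)
    if lemma is not None:
        return lemma
    return suffix_stem(word)
```

The stemmer runs only for words the dictionary does not cover, which is the "iff necessary" part of the suggestion.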

Re: Query time only Ranges

2010-03-31 Thread Shashi Kant
In that case, you could just calculate an offset from 00:00:00 in seconds (ignore the date). Pretty simple. On Wed, Mar 31, 2010 at 4:57 PM, abhatna...@vantage.com abhatna...@vantage.com wrote: Hi Sashi, Could you elaborate on point no. 1 in light of the case wherein a field should have just
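The offset calculation described in this reply is short enough to show in full. The sketch below converts a timestamp to seconds since midnight and formats a range clause over a field holding that value; the field name `time_of_day_s` and both function names are assumptions for illustration.

```python
from datetime import datetime


def seconds_since_midnight(timestamp):
    """Offset of a timestamp from 00:00:00 on its own day, ignoring the date."""
    return timestamp.hour * 3600 + timestamp.minute * 60 + timestamp.second


def time_range_query(field, start, end):
    """Format an inclusive range clause over a field that stores seconds since midnight."""
    return f"{field}:[{seconds_since_midnight(start)} TO {seconds_since_midnight(end)}]"
```

The value would be computed once at index time and stored in an integer field, so a time-only range becomes an ordinary numeric range. A range that wraps past midnight (for example 22:00 to 02:00) would need to be split into two clauses.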

Re: boost on certain keywords

2010-01-28 Thread Shashi Kant
Look at Payload. On Thu, Jan 28, 2010 at 6:48 AM, murali k ilar...@gmail.com wrote: Say I have a clothes store,  i have ladies clothes, mens clothes when someone searches for clothes, i want to prioritize mens clothing results, how can I achieve this ? this logic should only apply for this

Re: boost on certain keywords

2010-01-28 Thread Shashi Kant
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ On Thu, Jan 28, 2010 at 6:54 AM, Shashi Kant sk...@sloan.mit.edu wrote: Look at Payload. On Thu, Jan 28, 2010 at 6:48 AM, murali k ilar...@gmail.com wrote: Say I have a clothes store,  i have ladies clothes, mens

Re: HI

2009-12-13 Thread Shashi Kant
http://lmgtfy.com/?q=lucene+basics On Sun, Dec 13, 2009 at 1:01 PM, Faire Mii faire@gmail.com wrote: Hi, I am a beginner and i wonder what a document, entity and a field relates to in a database? And i wonder if there are some good tutorials that learn you how to design your schema.

Re: Migrating to Solr

2009-11-24 Thread Shashi Kant
Here is a link that might be helpful: http://sesat.no/moving-from-fast-to-solr-review.html The site is chock-a-block with great information on their migration experience. On Tue, Nov 24, 2009 at 8:55 AM, Tommy Molto tommymo...@gmail.com wrote: Hi, I'm new at Solr and i need to make a test

Re: Solr - Load Increasing.

2009-11-16 Thread Shashi Kant
I think it would be useful for members of this list to realize that not everyone uses the same metrology and terms. It is very easy for Americans to use the imperial system and presume everyone does the same; Europeans to use the metric system etc. Hopefully members on this list would be

Re: Search Within

2009-04-04 Thread Shashi Kant
This post describes the search-within-search implementation. http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html Shashi On Sat, Apr 4, 2009 at 1:21 PM, Vernon Chapman chapman.li...@gmail.com wrote: Bess, I think that might work I'll try it out and see how it works

Re: Hardware Questions...

2009-03-24 Thread Shashi Kant
Have you looked at http://wiki.apache.org/solr/SolrPerformanceData ? On Tue, Mar 24, 2009 at 4:51 PM, solr s...@highbeam.com wrote: We have three Solr servers (several two processor Dell PowerEdge servers). I'd like to get three newer servers and

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Shashi Kant
Another project worth investigating is Tesseract. http://code.google.com/p/tesseract-ocr/ - Original Message From: Hannes Carl Meyer m...@hcmeyer.com To: solr-user@lucene.apache.org Sent: Thursday, February 26, 2009 11:35:14 AM Subject: Re: Use of scanned documents for text

Re: Use of scanned documents for text extraction and indexing

2009-02-26 Thread Shashi Kant
Can anyone back that up? IMHO Tesseract is the state-of-the-art in OCR, but not sure that Ocropus builds on Tesseract. Can you confirm that Vikram has a point? Shashi - Original Message From: Vikram Kumar vikrambku...@gmail.com To: solr-user@lucene.apache.org; Shashi Kant sk

Re: why don't we have a forum for discussion?

2009-02-18 Thread Shashi Kant
one man's crap is another man's treasure. :-P So how would you decide what is worth posting? If you feel the list is overwhelming your email, set some filters. Shashi - Original Message From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, February 18,

Re: why don't we have a forum for discussion?

2009-02-18 Thread Shashi Kant
Steve - could you not just subscribe to the list from another (off-mobile device) email (Gmail or Yahoo) for example? We discourage using corporate email for subscribing mailing lists precisely for such reasons : volume, spam, malware risks etc. Shashi - Original Message From: