RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread Ahmet Arslan
 The reason why I use useFastVectorHighlighter is because I
 want to set stored=false, and with more settings
 like  termVectors=true termPositions=true
 termOffsets=true. If stored=true, what is the difference
 between normal highlight and useFastVectorHighlighter? What
 is the right situation for using useFastVectorHighlighter?

term*=true makes sense only together with stored=true. FastVectorHighlighter 
requires term*=true and uses the term vectors to speed up highlighting.
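
For illustration, a minimal field definition of the kind FastVectorHighlighter
needs (field and type names here are only placeholders):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

and it is enabled per request with hl=true&hl.useFastVectorHighlighter=true&hl.fl=content.
Without stored="true" there is no original text to build fragments from, so the
term* options alone are not enough.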


RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread ZHANG Liang F
So for highlighting, stored=true is required in any circumstance, right?

 

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: May 29, 2012 16:04
To: solr-user@lucene.apache.org
Subject: RE: useFastVectorHighlighter doesn't work

 The reason why I use useFastVectorHighlighter is because I want to set 
 stored=false, and with more settings like  termVectors=true 
 termPositions=true
 termOffsets=true. If stored=true, what is the difference between 
 normal highlight and useFastVectorHighlighter? What is the right 
 situation for using useFastVectorHighlighter?

term*=true makes sense only together with stored=true. FastVectorHighlighter 
requires term*=true and uses the term vectors to speed up highlighting.


[SolrCloud] Replication Factor

2012-05-29 Thread Antoine LE FLOC'H
Hello all,

The page http://wiki.apache.org/solr/NewSolrCloudDesign mentions a

Replication Factor

It is a feature supported by Katta. Is it actually supported by SolrCloud?

A more general question: Katta had some pretty good features like this one.
Why is Katta not active anymore? Is there a way to get equivalent
functionality with another Solr-based framework today, if it doesn't
exist in SolrCloud yet?

Thank you.


A few random questions about solr queries.

2012-05-29 Thread santamaria2
*1)* With faceting, how does facet.query perform in comparison to
facet.field? I'm just wondering this as in my use case, I need to facet over
a field -- which would get me the top n facets for that field, but I also
need to show the count for a selected filter which might have a relatively
low count so it doesn't appear in the top n returned facets. So the solution
would be to 'ensure' its presence by adding a 'facet.query=cat:val' in
addition to my facet.field=cat.

I want to do this to quite a few fields.

Related/example-based question:
When I facet over a field and something gets returned, e.g. John Smith (83),
and I also 'ensure' this facet's presence by having it in
facet.query=author:"John Smith", are two different calculations performed?
Or is the facet returned by facet.field also used by facet.query to obtain
the count?
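
For what it's worth, here is a sketch of the combined request being described
(field and value names are only examples):

  ...&facet=true&facet.field=author&facet.limit=10&facet.query=author:"John Smith"

As far as I know the two are computed independently: facet.field counts terms in
the field, while each facet.query is counted as its own set intersection, so the
facet.query count is not derived from the facet.field results.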



*2)* Is there a performance issue if I have around, say, 20 facet.query
conditions along with 10 facet.fields? 3 of those 10 fields have around
100,000 possible values. The remaining fields have a few hundred each.



*3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
want to clear my doubts for a certain use case.

Where should my date range queries go? In q or fq? The default settings in
my site show results from the past 90 days with buttons to show stuff from
the last month and week as well. But the user is allowed to use a slider to
apply any date range... this is allowed, but it's not /that/ common. 
I definitely use fq for filtering various tags. Choosing a tag is a common
activity.

Should the date range query go in fq? As I mentioned, the default view shows
stuff from the past 90 days. So on each new day, does this invalidate the
entry in the filter cache? Or is the filter cache populated in some way that
makes it easy to reuse the past 89 days' worth of results when a query is
performed the next day?
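
As an illustration (the field name is invented), writing the filter with rounded
date math, e.g.

  fq=pub_date:[NOW/DAY-90DAYS TO NOW/DAY+1DAY]

keeps the filter string identical for every query issued on the same day, so the
filterCache entry can be reused all day and only goes stale at midnight. An
unrounded NOW would make every request a slightly different filter and defeat the
cache.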



--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query elevation / boosting or something else to guarantee document position

2012-05-29 Thread Wenca

Hi all,

I have an index with thousands of products with various fields 
(manufacturer, price, popularity, type, color, ...) and I want to 
guarantee at least one product by a particular manufacturer to be within 
the first 5 results.


The search is done mainly by using filter params, and the results are ordered 
by a function, e.g. product(price, popularity) asc, or by discount desc.


And I need to guarantee that if there is any product matching the given 
filters made by a concrete manufacturer, then it will be on the 5th 
position at worst, even if the position by the order function is worse.


It seems to me that the Query elevation component is not the right thing 
for me. I don't know the query in advance (or the set of filter 
criteria) and I don't know concrete product that will be the best for 
the criteria within the order.


And also I don't think that I can construct a function with such 
requirements to use it directly for ordering the results.


Of course I can make a second query in case there is no desired product 
on the first page of results and put it there, but that requires an 
additional request to Solr and complicates result processing and 
further pagination.


Can anybody suggest any solution?

Thanks
Wenca


Re: Multi-words synonyms matching

2012-05-29 Thread Bernd Fehling
Hello Elisabeth,

my synonyms.txt is like your 2nd example:

naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, természetes\ 
erdő,
natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, 
naturskov,
forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
las\ naturalny, natürlicher\ wald


An example from my system with debugging turned on and searching for 
naturwald:

<lst name="debug">
  <str name="rawquerystring">naturwald</str>
  <str name="querystring">naturwald</str>
  <str name="parsedquery">textth:naturwald textth:φυσικό δάσος 
textth:естествена гора
textth:prírodný les textth:naravni gozd textth:foresta naturale 
textth:natuurbos
textth:natural forest textth:bosque natural textth:természetes erdő
textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
textth:floresta natural textth:naturskov textth:forêt naturelle 
textth:naturskog
textth:přírodní les textth:luonnonmetsä textth:pădure naturală textth:las 
naturalny
textth:natürlicher wald</str>
...

As you can see my search for naturwald extends to single and multiword 
synonyms e.g. forêt naturelle


My SynonymFilterFactory has the following settings:

org.apache.solr.analysis.SynonymFilterFactory
{tokenizerFactory=solr.KeywordTokenizerFactory, 
synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, 
ignoreCase=true,
luceneMatchVersion=LUCENE_36}

But as I already mentioned, there is much more work to be done to get it 
running than
just using SynonymFilterFactory.

Regards
Bernd



Am 23.05.2012 08:49, schrieb elisabeth benoit:
 Hello Bernd,
 
 Thanks for your advice.
 
 I have one question: how did you manage to map one word to a multiwords
 synonym???
 
 I've tried (in synonyms.txt)
 
 mairie, hotel de ville
 
 mairie, hotel\ de\ ville
 
 mairie => mairie, hotel de ville
 
 mairie => mairie, hotel\ de\ ville
 
 but nothing prevents mairie from matching with hotel...
 
 The only way I found is to use
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms declaration
 in schema.xml, but then since mairie is not alone in my index field, it
 doesn't match.
 
 
 best regards,
 Elisabeth
 
 
 
 
 the only way I found, I schema.xml, is to use
 
 
 
 2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
 Without reading the whole thread let me say that you should not trust
 the solr admin analysis. It takes the whole multiword search and runs
 it all together at once through each analyzer step (factory).
 But this is not how the real system works. First pitfall, the query parser
 is also splitting at white space (if not a phrase query). Due to this,
 a multiword query is sent chunk after chunk through the analyzer and,
 second pitfall, each chunk runs through the whole analyzer on its own.

 So if you are dealing with multiword synonyms you have the following
 problems. Either you turn your query into a phrase so that the whole
 phrase is analyzed at once and therefore looked up as multiword synonym
 but phrase queries are not analyzed !!! OR you send your query chunk
 by chunk through the analyzer but then they are not multiwords anymore
 and are not found in your synonyms.txt.

 From my experience I can say that it requires some deep work to get it done
 but it is possible. I have connected a thesaurus to solr which is doing
 query time expansion (no need to reindex if the thesaurus changes).
 The thesaurus holds synonyms and used for terms in 24 languages. So
 it is also some kind of language translation. And naturally the thesaurus
 translates from single term to multi term synonyms and vice versa.

 Regards,
 Bernd


 Am 14.05.2012 13:54, schrieb elisabeth benoit:
 Just for the record, I'd like to conclude this thread

 First, you were right, there was no behaviour difference between fq and q
 parameters.

 I realized that:

 1) my synonym (hotel de ville) has a stopword in it (de) and since I used
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration,
 there was no stopword removal in the indexed expression, so when
 requesting
 hotel de ville, after stopwords removal in query, Solr was comparing
 hotel de ville
 with hotel ville

 but my queries never even got to that point since

 2) I made a mistake using mairie alone in the admin interface when
 testing my schema. The real field was something like collectivités
 territoriales mairie,
 so the synonym hotel de ville was not even applied, because of the
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonym definition
 not splitting field into words when parsing

 So my problem is not solved, and I'm considering solving it outside of
 Solr
 scope, unless someone else has a clue

 Thanks again,
 Elisabeth



 2012/4/25 Erick Erickson erickerick...@gmail.com

 A little farther down the debug info output you'll find something
 like this (I specified fq=name:features)

 <arr name="parsed_filter_queries">
 <str>name:features</str>
 </arr>


 so 

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-29 Thread sudarshan
Hi Walter,
 Thank you. Do you mean that optimize need not be used at all?
If Solr merges segments (when needed, as you said), is there a criterion
by which Solr does this automatically? If I want the search to be faster
and Solr does not optimize for quite a long time, would it not compromise my
query processing rate?

To All,
 I have another doubt. If I optimize and replicate, the
first time it would transfer all the segments from the master to the slave
irrespective of the modified segment(s). After the first replication, how would the
transfer be made - are all segments replicated again, or only the
modified segments? I believe that after the first replication
(master and slave in sync), only the modified segments would be transferred,
just like a non-optimized index transfer. Am I right?

Regards,
Sudarshan  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore Issue - Server Restart

2012-05-29 Thread lboutros
Hi Suajtha,

Does each webapp have its own Solr home?

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multicore-Issue-Server-Restart-tp3986516p3986602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - 1143

2012-05-29 Thread Jack Krupansky
That issue is marked as a duplicate of SOLR-3134, which has a patch for Solr 
3.5.


https://issues.apache.org/jira/browse/SOLR-3134

-- Jack Krupansky

-Original Message- 
From: Ramprakash Ramamoorthy

Sent: Tuesday, May 29, 2012 3:03 AM
To: solr-user@lucene.apache.org
Subject: Solr - 1143

Dear all,

 A small doubt. I realised I will have to apply the patch
mentioned in Solr Jira 1143 to return partial results when one of my shards
is dead/slow.

 But the patch has no version explicitly specified. I am using
Solr 3.5.0; can I apply the patch to my installation as is?

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420 



Re: suggestions developing a multi-version concurrency control (MVCC) mechanism

2012-05-29 Thread Nicholas Ball

Hmmm interesting, that will definitely work and may be the way to go.
Ideally, I'd rather store the older versions within a field of the newest
if possible.
Can one create a custom field that holds other objects?

Nick

On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog goks...@gmail.com
wrote:
 You can use the document id and timestamp as a compound unique id.
 Then the search would also sort by id, then by timestamp. Result
 grouping might let you pick the most recent document from each of the
 sorted docs.
 
 On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball
 nicholas.b...@nodelay.com wrote:

 Hello all,

 For the first step of the distributed snapshot isolation system I'm
 developing for Solr, I'm going to need to have a MVCC mechanism as
 opposed
 to the single-version concurrency control mechanism already developed
 (DistributedUpdateProcessor class). I'm trying to find the very best
way
 to
 develop this into Solr 4.x (trunk) and so any help would be greatly
 appreciated!

 Essentially I need to be able to store multiple versions of a document
so
 that when you look up a document with a given timestamp, you're given
the
 correct version (anything the same or older, not fresher). The older
 versioned documents need to be stored in the index itself to ensure
they
 are durable and can be manipulated as other Solr data can be.

 One way to do this is to store the old versioned Solr documents within
 the
 latest Solr Document, but I'm not sure this is even possible?
 Alternatively, I could have the latest versioned Document store the
 unique
 keys which point to other older documents. The problem with this is
that
 it
 complicates things, having various partial objects which all combine as
 one logical document.

 Are there any suggestions as to the best way to develop this feature?

 Thank you in advance for any help you can spare!

 Nicholas


Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-29 Thread Walter Underwood
You do not need to use optimize at all.

Solr continually merges segments (optimizes) as needed.
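
Merging is controlled by the merge settings in solrconfig.xml; an illustrative
snippet along the lines of the 3.x example config:

  <indexDefaults>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>

A lower mergeFactor keeps fewer segments (closer to an optimized index) at the
cost of more merging work during indexing.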

wunder

On May 29, 2012, at 6:08 AM, sudarshan wrote:

 Hi Walter,
 Thank you. Do you mean that optimize need not be used at all?
 If Solr merges segments (when needed as you said), is there a criteria
 during which Solr does this automatically. If I want the search to be faster
 and Solr does not optimize for quite a long time, would it not compromise my
 query processing rate?
 
 To All,
 I have another doubt. If I optimize and replicate, for the
 first time it would transfer all the segments from the master to slave
 irrespective of the modified segment(s). After first replication, how the
 transfer would be made  - again all segments are replicated or only the
 modified segments are replicated? I believe after the first replication
 (master and slave in sync), only the modified segments would be transferred
 just like the  non-optimized index transfer. Am I right? 
 
 Regards,
 Sudarshan  
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html
 Sent from the Solr - User mailing list archive at Nabble.com.







sort in local params and rows parameter

2012-05-29 Thread jhusman
Hello,

we're having some issues with a Solr query and are unsure if we've
encountered a bug or just don't understand the expected behaviour. Any help
would be appreciated.

The problem is this: we're running a query using the browser that for
debugging purposes looks like this:
q={!sort%3DeventId%20asc}a&rows=2

here eventId is a long field in our schema. The sort works fine, but the
query returns 10 results (out of 35), clearly ignoring the rows parameter.
For reference, q=a&rows=2 only returns 2 results (again out of 35).

We can go around this by introducing rows as a local parameter instead:
q={!sort%3DeventId%20asc+rows%3D2}a

this only returns 2 results, as expected.
So, it seems that using sort as a local parameter causes Solr to ignore the
external rows parameter. This does not seem to happen with other local
parameters, only with sort (as far as we can tell).

Why is this happening?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-in-local-params-and-rows-parameter-tp3986615.html
Sent from the Solr - User mailing list archive at Nabble.com.


TF-IDF vector

2012-05-29 Thread Allen
Hi List,

I am curious about the meaning of tf-idf vector after reading this
http://wiki.apache.org/solr/TermVectorComponent.

The tf flag returns me the tf vector for just one doc. The df flag
returns me the df vector of all the docs in the index.

Does the tf-idf vector represent one doc or a set of docs?

Also, can I specify a subset of docs over which the df vector is calculated,
rather than the entire set of docs?


Re: TF-IDF vector

2012-05-29 Thread Jack Krupansky

Does the tf-idf vector represents one doc or set of docs?

IDF is calculated across all docs that contain the term.

TF is calculated for a single document containing the term.

Each term of each doc will have its own tf-idf.
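
A request sketch for seeing those numbers (handler and field names depend on your
config; the TermVectorComponent has to be registered on the handler):

  /select?q=id:1234&fl=id&tv=true&tv.fl=text&tv.tf=true&tv.df=true&tv.tf_idf=true

tv.tf and tv.tf_idf are relative to each returned document, while tv.df is the
collection-wide document frequency; as far as I know there is no built-in way to
restrict the df calculation to a subset of documents.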

-- Jack Krupansky

-Original Message- 
From: Allen 
Sent: Tuesday, May 29, 2012 12:11 PM 
To: solr-user@lucene.apache.org 
Subject: TF-IDF vector 


Hi List,

I am curious about the meaning of tf-idf vector after reading this
http://wiki.apache.org/solr/TermVectorComponent.

The tf flag returns me the tf vector for just one doc. The df flag
returns me the df vector of all the docs in the index.

Does the tf-idf vector represents one doc or set of docs?

Too, can I specify a subset of docs which the df vector is calculated
on rather than the entire set of docs?


RE: Many Cores with Solr

2012-05-29 Thread Klostermeyer, Michael
IMO it would be better (from Solr's perspective) to handle the security w/ 
the application code.  Each query could include a ?fq=userID:12345... which 
would limit results to only what that user is allowed to see.
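
For example (the userID field name is just illustrative), the application would
append something like

  &fq=userID:12345

to every request after authenticating the user, and never let the client supply
that parameter itself.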
 
Mike

-Original Message-
From: Mike Douglass [mailto:mikeadougl...@gmail.com] 
Sent: Wednesday, May 23, 2012 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Many Cores with Solr

My interest in this is the desire to create one index per user of a system - 
the issue here is privacy - data indexed for one user should not be visible to 
other users.

For this purpose Solr will be hidden behind a proxy which steers authenticated 
sessions to the appropriate core.

Does this seem like a valid/feasible approach?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3985789.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
That's what we do. It has the advantage of letting the general queries
be cached once across all users.

Michael

On Tue, May 29, 2012 at 12:39 PM, Klostermeyer, Michael
mklosterme...@riskexchange.com wrote:
 IMO it would be a better (from Solr's perspective) to handle the security w/ 
 the application code.  Each query could include a ?fq=userID:12345... which 
 would limit results to only what that user is allowed to see.

 Mike


-- 
Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com


Re: UpdateRequestProcessor : flattened values

2012-05-29 Thread Chris Hostetter

: And it might make sense to have a multi-value flattening attribute for Solr
: itself rather than in SolrCell.

Coming in 4.0...

https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

DOC
Concatenates multiple values for fields matching the specified conditions 
using a configurable delimiter which defaults to ", ".

By default, this processor concatenates the values for any field name 
which according to the schema is multiValued=false and uses TextField or 
StrField
DOC
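
An illustrative chain definition for solrconfig.xml (field name and delimiter are
placeholders; see the javadoc above for the full set of field selectors):

  <updateRequestProcessorChain name="concat-fields">
    <processor class="solr.ConcatFieldUpdateProcessorFactory">
      <str name="fieldName">description</str>
      <str name="delimiter">; </str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>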



-Hoss


Solr backup / replication internals

2012-05-29 Thread Ganesh
Hi,

Could anyone explain the internals of backup / replication to me? Please 
give me more information, like the do's and don'ts of backup / replication.

1. Is the backup / replication incremental?

2. While a backup / replication is being taken, can Solr still add to / update the index?

3. The backup command and the backup script both do a file copy. Is there any difference 
between them?

Regards
Ganesh


Relevancy ranking for synonym matches

2012-05-29 Thread Gau
I was wondering if there is any solution for this.
Currently I expand my results to match the synonyms at query time.

So if I entered James, I would get results for Jim, Gomes, Game etc. as they
would be expanded by matching the synonyms for James. But then, since this is
just a one-word match, tf, idf and other parameters don't make sense. I have
reset those factors to 1. Hence the results I get all have an equal score.

What I really want to do is sort these results by Levenshtein distance
without using the ~ sign. The issue with using ~ is that if I have a synonym
which is radically different (say Greg for James), then with James~0 Greg
would not even come close to matching James, and the number of results returned
would be less than the actual number of synonym matches.

So my use case is: without reducing the number of results, I want to sort
them by Levenshtein distance, i.e. the closest string match to the original query.
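
One option that might fit (assuming a single-valued string field holding the
name, here called name_exact): Solr's strdist() function can be used in a sort,
roughly

  q=name:james&sort=strdist("james", name_exact, edit) desc

The edit measure is a Levenshtein-based similarity between 0 and 1 (1 = identical),
so sorting descending puts the closest matches first without shrinking the result
set the way a fuzzy query would.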

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Relevancy-ranking-for-synonym-matches-tp3986634.html
Sent from the Solr - User mailing list archive at Nabble.com.


MongoDB and Solr

2012-05-29 Thread rjain15
Hi

I am building web app/mobile app, where users can update information
frequently and there is a search function to quick search the information
using different types of searches. 

Most of the data is going to be posted in JSON Format and stored in JSON
format

I have a few questions on the architecture choices; I am relatively new to
Solr and MongoDB.

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in its data directory?

2. Does Solr require a specific schema for the JSON documents?


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UpdateRequestProcessor : flattened values

2012-05-29 Thread Jack Krupansky
Sounds good. Then all that will be needed is a way to disable the SolrCell 
flattening so that other update processors can see the unflattened field 
values before they are handed off to a ConcatFieldUpdateProcessor.


-- Jack Krupansky

-Original Message- 
From: Chris Hostetter

Sent: Tuesday, May 29, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: UpdateRequestProcessor : flattened values


: And it might make sense to have a multi-value flattening attribute for 
Solr

: itself rather than in SolrCell.

Coming in 4.0...

https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

DOC
Concatenates multiple values for fields matching the specified conditions
using a configurable delimiter which defaults to ", ".

By default, this processor concatenates the values for any field name
which according to the schema is multiValued=false and uses TextField or
StrField
DOC



-Hoss 



Re: MongoDB and Solr

2012-05-29 Thread Jack Krupansky
Although Solr uses XML format for document update and query, JSON is a 
supported option.


To post documents in JSON, see:
http://wiki.apache.org/solr/UpdateJSON

To retrieve query results in JSON, see:
http://wiki.apache.org/solr/SolJSON

That works well for relatively flat data (each field has a simple value or 
list of values), but less well if you have complex structure within an 
individual field value (e.g., multi-level nesting of JSON for a single field 
value.) For the latter, you would have to store the JSON as a string for 
such a field.


-- Jack Krupansky

-Original Message- 
From: rjain15

Sent: Tuesday, May 29, 2012 12:57 PM
To: solr-user@lucene.apache.org
Subject: MongoDB and Solr

Hi

I am building web app/mobile app, where users can update information
frequently and there is a search function to quick search the information
using different types of searches.

Most of the data is going to be posted in JSON Format and stored in JSON
format

I have a few questions on the architecture choices, I am relatively new to
Solr and MongoDB.

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in the data directory

2. Does Solr require a specific schema for the JSON document.


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi Jack

Thanks for the information. I do have multi-level nesting of JSON data. 

So back to my questions, apologize for repeating...

1. Should I use MongoDB to store the JSON documents, or does Solr natively 
store the documents in the data directory 

2. Does Solr require a specific schema for the JSON document. 

Thanks
Rajesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Michael Della Bitta
1. Yes, and 2. Yes. :)

Solr's adding more NoSQL-like features for 4.0, but in the meantime,
you're better off storing documents with a complex schema in a
document store and using Solr for findability. Basically the schema
for a document in Solr/Lucene is flat (although it can contain
arbitrarily-named fields), so your document will require some sort of
transformation for indexing.
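
As a rough sketch of that transformation (all field names invented), one flat Solr
document per logical entity could be posted to /update/json along these lines:

  {"add": {"doc": {
      "id": "class-0001",
      "name": "Speech Class",
      "credits": 3,
      "student_name": ["ABC", "PQQ", "AAA"],
      "instructor_name": ["ASAS", "ASAA"],
      "raw_json": "{ ... the original nested JSON kept verbatim as a string ... }"
  }}}

i.e. repeated child objects become multi-valued fields, and the untouched JSON can
be kept in a plain string field if you need to return it.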

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 2:20 PM, rjain15 rjai...@gmail.com wrote:
 Hi Jack

 Thanks for the information. I do have multi-level nesting of JSON data.

 So back to my questions, apologize for repeating...

 1. Should I use MongoDB to store the JSON documents, or does Solr natively
 store the documents in the data directory

 2. Does Solr require a specific schema for the JSON document.

 Thanks
 Rajesh


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Jack Krupansky
Could you give us an example of one of your documents. Then we can give you 
better feedback on what makes sense within Solr.


-- Jack Krupansky

-Original Message- 
From: rjain15

Sent: Tuesday, May 29, 2012 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: MongoDB and Solr

Hi Jack

Thanks for the information. I do have multi-level nesting of JSON data.

So back to my questions, apologize for repeating...

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in the data directory

2. Does Solr require a specific schema for the JSON document.

Thanks
Rajesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Example setup of using Solr 3.6.0 with Jetty 7 (7.6.3)?

2012-05-29 Thread Aaron Daubman
Greetings,

Has anybody gotten Solr 3.6.0 to work well with Jetty 7.6.3, and if so,
would you mind sharing your config files / directory structure / other
useful details?

Thanks,
 Aaron


Re: Many Cores with Solr

2012-05-29 Thread Mike Douglass
Thank you.

That sounds good - are we sure to get no leakage with this approach?

I'd be indexing personal information which must not be delivered without
authentication.

The solr instance is front-ended by bedework which can handle the auth and
adding a query term.

 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi 

This is a sample schema, but it can be more nested as I build the app. As
more students enroll, or more classes are added, it will grow. 





colleges:
[
    {
        "college":
        {
            "id": "college Id",
            "classes":
            [
                {
                    "id": "0001",
                    "type": "speech",
                    "name": "Speech Class",
                    "credits": 3,
                    "students":
                    [
                        { "id": "1001", "name": "ABC" },
                        { "id": "1002", "name": "PQQ", ... },
                        { "id": "1003", "name": "AAA", ... },
                        { "id": "1004", "name": "ASA", ... }
                    ],
                    "instructors":
                    [
                        { "id": "5001", "name": "ASAS" },
                        { "id": "5002", "name": "ASAA" }
                    ]
                }
            ],
            "locations":
            [
                { "id": "6001", "address": "Address-1" },
                { "id": "6001", "address": "Address-2" }
            ]
        }
    }
]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986676.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Gora Mohanty
On 29 May 2012 22:27, rjain15 rjai...@gmail.com wrote:
 Hi

 I am building web app/mobile app, where users can update information
 frequently and there is a search function to quick search the information
 using different types of searches.

 Most of the data is going to be posted in JSON Format and stored in JSON
 format

 I have a few questions on the architecture choices, I am relatively new to
 Solr and MongoDB.

 1. Should I use MongoDB to store the JSON documents, or does Solr natively
 store the documents in the data directory

Sorry, but you do not provide nearly enough information
for people to be able to make sensible suggestions. What
is your use case? MongoDB is largely a different beast from
Solr. What do you think merits its use, and where does it
fit in your scheme of things? In many cases, one could have
both MongoDB, and Solr. In other cases, one or the other
might better fit the bill.

 2. Does Solr require a specific schema for the JSON document.

You can POST a JSON document to Solr, and get
JSON output back. Not sure if this meets your needs,
but please take a look at:
http://wiki.apache.org/solr/UpdateJSON
http://wiki.apache.org/solr/SolJSON

Regards,
Gora


Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
It's a similar approach as using SQL to filter the rows brought back
for a particular user from a table. It's strong as long as you write
your queries correctly, you store your data properly, and you guard
against injection and privilege escalation. There's an added bonus in
this case in that the user's submitted text isn't in the same query as
the part that limits the rows they have access to, but if you're doing
proper escaping of the query text, that shouldn't be relied on anyway.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 3:07 PM, Mike Douglass mikeadougl...@gmail.com wrote:
 Thank you.

 That sounds good - are we sure to get no leakage with this approach?

 I'd be indexing personal information which must not be delivered without
 authentication.

 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.


Re: Many Cores with Solr

2012-05-29 Thread Erik Hatcher
You do get relevancy-related leakage though.  With users' content all in the 
same index and using the same field names, term and document frequencies across 
the index will be used for scoring.  This may be (and has been) a good reason 
to keep separately searchable content in different indexes/cores.

Erik


On May 29, 2012, at 15:07 , Mike Douglass wrote:

 Thank you.
 
 That sounds good - are we sure to get no leakage with this approach?
 
 I'd be indexing personal information which must not be delivered without
 authentication.
 
 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.
 
 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
In our particular case, we're using this index to do prefix searches
for autocomplete of sparse keyword data, so we don't have much to
worry about on this front, but I do agree that it's a consideration
for those use cases that do reveal information via ranking.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 4:00 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 You do get relevancy related leakage though.  With users content all in the 
 same index and using the same field names, term and document frequencies 
 across the index will be used for scoring.  This may be (and has been) a good 
 reason to keep separately searchable content in different indexes/cores.

        Erik


 On May 29, 2012, at 15:07 , Mike Douglass wrote:

 Thank you.

 That sounds good - are we sure to get no leakage with this approach?

 I'd be indexing personal information which must not be delivered without
 authentication.

 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.

 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: suggestions developing a multi-version concurrency control (MVCC) mechanism

2012-05-29 Thread Lance Norskog
Solr uses a flat schema. You can store old versions, but you have to
encode them somehow and save them as data.

On Tue, May 29, 2012 at 7:20 AM, Nicholas Ball
nicholas.b...@nodelay.com wrote:

 Hmmm interesting, that will definitely work and may be the way to go.
 Ideally, I'd rather store the older versions within a field of the newest
 if possible.
 Can one create a custom field that holds other objects?

 Nick

 On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog goks...@gmail.com
 wrote:
 You can use the document id and timestamp as a compound unique id.
 Then the search would also sort by id, then by timestamp. Result
 grouping might let you pick the most recent document from each of the
 sorted docs.

 On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball
 nicholas.b...@nodelay.com wrote:

 Hello all,

 For the first step of the distributed snapshot isolation system I'm
 developing for Solr, I'm going to need to have a MVCC mechanism as
 opposed
 to the single-version concurrency control mechanism already developed
 (DistributedUpdateProcessor class). I'm trying to find the very best
 way
 to
 develop this into Solr 4.x (trunk) and so any help would be greatly
 appreciated!

 Essentially I need to be able to store multiple version of a document
 so
 that when you look up a document with a given timestamp, you're given
 the
 correct version (anything the same or older, not fresher). The older
 versioned documents need to be stored in the index itself to ensure
 they
 are durable and can be manipulated as other Solr data can be.

 One way to do this is to store the old versioned Solr documents within
 the
 latest Solr Document, but I'm not sure this is even possible?
 Alternatively, I could have the latest versioned Document store the
 unique
 keys which point to other older documents. The problem with this is
 that
 it
 complicates things having various partial objects which all combine as
 one
 logically document.

 Are there any suggestions as to the best way to develop this feature?

 Thank you in advance for any help you can spare!

 Nicholas



-- 
Lance Norskog
goks...@gmail.com


Re: Many Cores with Solr

2012-05-29 Thread Mike Douglass
That was one of my concerns. To date I've been using lucene directly and
pointing it at an index for the current authenticated user. solr cores
seemed to come close to that.

Is the issue with a lot of cores just in creating that many, or in using many
cores concurrently?


Erik Hatcher-4 wrote
 
 You do get relevancy related leakage though.  With users content all in
 the same index and using the same field names, term and document
 frequencies across the index will be used for scoring.  This may be (and
 has been) a good reason to keep separately searchable content in different
 indexes/cores.
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986710.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-05-29 Thread elisabeth benoit
Hello Bernd,

Thanks a lot for your answer. I'll work on this.

Best regards,
Elisabeth

2012/5/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de

 Hello Elisabeth,

 my synonyms.txt is like your 2nd example:

 naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
 foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
 természetes\ erdő,
 natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural,
 naturskov,
 forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
 las\ naturalny, natürlicher\ wald


 An example from my system with debugging turned on and searching for
 naturwald:

 <lst name="debug">
  <str name="rawquerystring">naturwald</str>
  <str name="querystring">naturwald</str>
  <str name="parsedquery">textth:naturwald textth:φυσικό δάσος
 textth:естествена гора
 textth:prírodný les textth:naravni gozd textth:foresta naturale
 textth:natuurbos
 textth:natural forest textth:bosque natural textth:természetes erdő
 textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
 textth:floresta natural textth:naturskov textth:forêt naturelle
 textth:naturskog
 textth:přírodní les textth:luonnonmetsä textth:pădure naturală
 textth:las naturalny
 textth:natürlicher wald</str>
 ...

 As you can see my search for naturwald extends to single and multiword
 synonyms e.g. forêt naturelle


 My SynonymFilterFactory has the following settings:

 org.apache.solr.analysis.SynonymFilterFactory
 {tokenizerFactory=solr.KeywordTokenizerFactory,
 synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
 ignoreCase=true,
 luceneMatchVersion=LUCENE_36}

 But as I already mentioned, there is much more work to be done to get it
 running than
 just using SynonymFilterFactory.

 Regards
 Bernd



 Am 23.05.2012 08:49, schrieb elisabeth benoit:
  Hello Bernd,
 
  Thanks for your advice.
 
  I have one question: how did you manage to map one word to a multiwords
  synonym???
 
  I've tried (in synonyms.txt)
 
  mairie, hotel de ville
 
  mairie, hotel\ de\ ville
 
  mairie => mairie, hotel de ville
 
  mairie => mairie, hotel\ de\ ville
 
  but nothing prevents mairie from matching with hotel...
 
  The only way I found is to use
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration
  in schema.xml, but then since mairie is not alone in my index field, it
  doesn't match.
 
 
  best regards,
  Elisabeth
 
 
 
 
  the only way I found, I schema.xml, is to use
 
 
 
  2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
  Without reading the whole thread let me say that you should not trust
  the solr admin analysis. It takes the whole multiword search and runs
  it all together at once through each analyzer step (factory).
  But this is not how the real system works. First pitfall, the query
 parser
  is also splitting at white space (if not a phrase query). Due to this,
  a multiword query is send chunk after chunk through the analyzer and,
  second pitfall, each chunk runs through the whole analyzer by its own.
 
  So if you are dealing with multiword synonyms you have the following
  problems. Either you turn your query into a phrase so that the whole
  phrase is analyzed at once and therefore looked up as multiword synonym
  but phrase queries are not analyzed !!! OR you send your query chunk
  by chunk through the analyzer but then they are not multiwords anymore
  and are not found in your synonyms.txt.
 
  From my experience I can say that it requires some deep work to get it
 done
  but it is possible. I have connected a thesaurus to solr which is doing
  query time expansion (no need to reindex if the thesaurus changes).
  The thesaurus holds synonyms and used for terms in 24 languages. So
  it is also some kind of language translation. And naturally the
 thesaurus
  translates from single term to multi term synonyms and vice versa.
 
  Regards,
  Bernd
 
 
  Am 14.05.2012 13:54, schrieb elisabeth benoit:
  Just for the record, I'd like to conclude this thread
 
  First, you were right, there was no behaviour difference between fq
 and q
  parameters.
 
  I realized that:
 
  1) my synonym (hotel de ville) has a stopword in it (de) and since I
 used
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
  declaration,
  there was no stopword removal in the indewed expression, so when
  requesting
  hotel de ville, after stopwords removal in query, Solr was comparing
  hotel de ville
  with hotel ville
 
  but my queries never even got to that point since
 
  2) I made a mistake using mairie alone in the admin interface when
  testing my schema. The real field was something like collectivités
  territoriales mairie,
  so the synonym hotel de ville was not even applied, because of the
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonym
 definition
  not splitting field into words when parsing
 
  So my problem is not solved, and I'm considering solving it outside of
  Solr
  scope, unless someone else has a clue
 
  

Re: Multi-words synonyms matching

2012-05-29 Thread Lance Norskog
I recently had the same use case. I wound up doing this: at both
index and query time, the synonyms file is used with 'expand=false'. All
multi-word synonyms map to one single-word synonym (per group). This
way, only the main word is indexed or queried.

If the synonym file changes, you have to re-index the matching content.
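
A sketch of what one such group looks like in synonyms.txt under this scheme
(reusing the example from earlier in the thread):

  mairie, hotel de ville

With expand=false every entry in the comma-separated group is reduced to the first
one, so occurrences of "hotel de ville" end up indexed (and, when the filter sees
them as a unit, queried) as the single token "mairie".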

On Tue, May 29, 2012 at 1:27 PM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 Hello Bernd,

 Thanks a lot for your answer. I'll work on this.

 Best regards,
 Elisabeth

 2012/5/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de

 Hello Elisabeth,

 my synonyms.txt is like your 2nd example:

 naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
 foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
 természetes\ erdő,
 natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural,
 naturskov,
 forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
 las\ naturalny, natürlicher\ wald


 An example from my system with debugging turned on and searching for
 naturwald:

  <lst name="debug">
   <str name="rawquerystring">naturwald</str>
   <str name="querystring">naturwald</str>
   <str name="parsedquery">textth:naturwald textth:φυσικό δάσος
  textth:естествена гора
  textth:prírodný les textth:naravni gozd textth:foresta naturale
  textth:natuurbos
  textth:natural forest textth:bosque natural textth:természetes erdő
  textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
  textth:floresta natural textth:naturskov textth:forêt naturelle
  textth:naturskog
  textth:přírodní les textth:luonnonmetsä textth:pădure naturală
  textth:las naturalny
  textth:natürlicher wald</str>
 ...

 As you can see my search for naturwald extends to single and multiword
 synonyms e.g. forêt naturelle


 My SynonymFilterFactory has the following settings:

 org.apache.solr.analysis.SynonymFilterFactory
 {tokenizerFactory=solr.KeywordTokenizerFactory,
 synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
 ignoreCase=true,
 luceneMatchVersion=LUCENE_36}

 But as I already mentioned, there is much more work to be done to get it
 running than
 just using SynonymFilterFactory.

 Regards
 Bernd



 Am 23.05.2012 08:49, schrieb elisabeth benoit:
  Hello Bernd,
 
  Thanks for your advice.
 
  I have one question: how did you manage to map one word to a multiwords
  synonym???
 
  I've tried (in synonyms.txt)
 
  mairie, hotel de ville
 
  mairie, hotel\ de\ ville
 
   mairie => mairie, hotel de ville
  
   mairie => mairie, hotel\ de\ ville
 
  but nothing prevents mairie from matching with hotel...
 
  The only way I found is to use
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration
  in schema.xml, but then since mairie is not alone in my index field, it
  doesn't match.
 
 
  best regards,
  Elisabeth
 
 
 
 
  the only way I found, I schema.xml, is to use
 
 
 
  2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
  Without reading the whole thread let me say that you should not trust
  the solr admin analysis. It takes the whole multiword search and runs
  it all together at once through each analyzer step (factory).
  But this is not how the real system works. First pitfall, the query
 parser
  is also splitting at white space (if not a phrase query). Due to this,
  a multiword query is send chunk after chunk through the analyzer and,
  second pitfall, each chunk runs through the whole analyzer by its own.
 
  So if you are dealing with multiword synonyms you have the following
  problems. Either you turn your query into a phrase so that the whole
  phrase is analyzed at once and therefore looked up as multiword synonym
  but phrase queries are not analyzed !!! OR you send your query chunk
  by chunk through the analyzer but then they are not multiwords anymore
  and are not found in your synonyms.txt.
 
  From my experience I can say that it requires some deep work to get it
 done
  but it is possible. I have connected a thesaurus to solr which is doing
  query time expansion (no need to reindex if the thesaurus changes).
  The thesaurus holds synonyms and used for terms in 24 languages. So
  it is also some kind of language translation. And naturally the
 thesaurus
  translates from single term to multi term synonyms and vice versa.
 
  Regards,
  Bernd
 
 
  Am 14.05.2012 13:54, schrieb elisabeth benoit:
  Just for the record, I'd like to conclude this thread
 
  First, you were right, there was no behaviour difference between fq
 and q
  parameters.
 
  I realized that:
 
  1) my synonym (hotel de ville) has a stopword in it (de) and since I
 used
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
  declaration,
  there was no stopword removal in the indewed expression, so when
  requesting
  hotel de ville, after stopwords removal in query, Solr was comparing
  hotel de ville
  with hotel ville
 
  but my queries never even got to that point since
 
  2) I made a mistake using mairie alone in the admin interface when

Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-29 Thread srini
hi iorixxx,

Sorry I missed your reply. Let me put my requirement in another way.

I have a description field which holds a lot of text (2-3 paragraphs) and it is
indexed.

When a user searches for any word, and Solr finds that word in the description, I want
to show the matching content, probably the 2-3 lines that contain the search word. Any
ideas how to do this?
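
If I understand the requirement, this sounds like standard highlighting with
fragments; an illustrative request (assuming the field is called description and
is stored):

  ...&q=word&hl=true&hl.fl=description&hl.snippets=3&hl.fragsize=150

hl.snippets controls how many fragments come back per document and hl.fragsize
roughly how many characters each fragment holds.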

Thanks in Advance!!!
Srini


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-reduce-the-result-size-to-2-3-lines-and-expand-based-on-user-interest-tp3985692p3986727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi Gora, 

I am working on a Mobile App, which is updating/accessing/searching data and
I have created a simple prototype using Solr and the Update JSON / Get JSON
functions of Solr. 

I came across some discussion of MongoDB and how it natively stores JSON
data, and as I was looking at the scalability of data storage/indexing, I
paused to check whether I am on the right track just using Solr, or whether I
should combine Solr with MongoDB, as in this blog post:

http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html

Maybe this is an incorrect question, as you say -- MongoDB might be an
entirely different beast. 

Apologies for a novice question. My point was: for mobile / consumer web
apps, what are the architectural considerations? I don't want it to be
overkill, so if Solr can natively store/index/search JSON documents, then
that is the solution I can build on top of.


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986636p3986729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Walter Underwood
Solr does not natively store/index/search arbitrary JSON documents.

It accepts JSON in a specific format for document input.

wunder

On May 29, 2012, at 3:21 PM, rjain15 wrote:

 Hi Gora, 
 
 I am working on a Mobile App, which is updating/accessing/searching data and
 I have created a simple prototype using Solr and the Update JSON / Get JSON
 functions of Solr. 
 
 I came across some discussion on MongoDB and how it natively stores JSON
 data, and hence as I was looking at scalability of data storage/indexing, I
 was pausing to understand if I am on the right track of just using Solr or
 should I combine Solr with MongoDB as I am reading this blog post...
 
 http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html
 
 Maybe this is an incorrect question, as you say -- MongoDB might be an
 entirely different beast. 
 
 Apologies for a novice question. My point was, for Mobile / Consumer Web
 Apps -- what are the architectural considerations. I don't want it to be a
 overkill, hence if solr can natively store/index/search json documents, then
 that is the solution I can build on top of. 
 
 
 Thanks
 Rajesh
 






RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread ZHANG Liang F
Thanks a lot, It's quite clear now. 

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: May 29, 2012 16:37
To: solr-user@lucene.apache.org
Subject: RE: useFastVectorHighlighter doesn't work

 So for highlight, stored=true is
 required in any circumstance, right?

Exactly. http://wiki.apache.org/solr/FieldOptionsByUseCase



Re: MongoDB and Solr

2012-05-29 Thread Gora Mohanty
On 30 May 2012 03:51, rjain15 rjai...@gmail.com wrote:
 Hi Gora,

 I am working on a Mobile App, which is updating/accessing/searching data and
 I have created a simple prototype using Solr and the Update JSON / Get JSON
 functions of Solr.

 I came across some discussion on MongoDB and how it natively stores JSON
 data, and hence as I was looking at scalability of data storage/indexing, I
 was pausing to understand if I am on the right track of just using Solr or
 should I combine Solr with MongoDB as I am reading this blog post...
[...]

A discussion on web architecture is off-topic for
this list, and will also probably draw in people with
strong opinions. Here is a brief personal opinion,
but you are probably better off trying out a couple
of different architectural prototypes, and/or talking
to someone with experience in scalable sites.

First of all, you should consider whether you really
need a NoSQL store. This would depend on the
scale, and requirements of your app. IMHO, RDBMSes
now are proven systems with many years of learning
behind them. Thus, your question should be why
NoSQL, rather than the other way around.

Solr for search should do fine, and you already
know how to get JSON in and out of it. Incidentally,
we also tested out Solr as a NoSQL store (raw data,
and not JSON, though), and were quite happy with
the performance.

Regards,
Gora