Multi-threaded document atomic OR in-place updates

2018-04-04 Thread pravesh
I have a scenario as follows:

There are 2 separate threads, each of which will try to update the same
document in a single index for 2 separate fields, for which we are using
atomic OR in-place updates. For example:

id is the unique field in the index

thread-1 will update following info:
id:1001
field-1:abc1001

thread-2 will update following info:
id:1001
field-2:xyz1002

The updates are done on the same core index asynchronously.
What I need to know is whether there will be any inconsistency in the index
at any point. The two threads will update different fields for the same id.
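
For reference, a sketch of what the two updates could look like against Solr's JSON update handler (the host, core name, and commit strategy here are assumptions, not from the question):

```
# thread-1: atomic 'set' touching only field-1 of document 1001
curl 'http://localhost:8983/solr/mycore/update' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1001","field-1":{"set":"abc1001"}}]'

# thread-2: atomic 'set' touching only field-2 of the same document
curl 'http://localhost:8983/solr/mycore/update' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1001","field-2":{"set":"xyz1002"}}]'
```

Each request names only its own field; for an atomic update, Solr re-reads the stored document, applies the change, and re-indexes the whole document.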



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Problem with Synonyms

2013-09-03 Thread pravesh
SOLR has a nice analysis page. You can use it to get insight into what is
happening after each filter is applied at index/search time.


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Synonyms-tp4087905p4087915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: Surprising score?

2013-07-05 Thread pravesh
Is there a way to omitNorms and still be able to use {!boost b=boost} ? 

Or you could leave /omitNorms=false/ as usual and have your custom
Similarity implementation with the length-normalization method overridden
to return a constant value of 1.


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Surprising-score-tp4075436p4075722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR guidance required

2013-05-10 Thread pravesh
Aditya,

As suggested by others, you should definitely use filter queries directly
when querying SOLR. Just keep your indexes updated.
Keep all your fields indexed/stored as per your requirements. Refer to the
filter-query wiki:

http://wiki.apache.org/solr/CommonQueryParameters

http://wiki.apache.org/solr/SimpleFacetParameters


BTW, almost all the job sites out there (whether small/medium/big) use
SOLR/lucene to power their searches :) 


Best
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-guidance-required-tp4062188p4062422.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread pravesh
Just increase the value of /maxClauseCount/ in your solrconfig.xml. Keep it
large enough.
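
In the solrconfig.xml shipped with Solr, this limit is set via the maxBooleanClauses element (the value below is just an illustration, not a recommendation):

```xml
<!-- solrconfig.xml: raise the BooleanQuery clause limit -->
<maxBooleanClauses>4096</maxBooleanClauses>
```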

Best
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TooManyClauses-maxClauseCount-is-set-to-1024-tp4056965p4056966.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread pravesh
Update:

Also, remove your range queries from the main query and specify them as
filter queries.
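
As an illustration with made-up field names, the rewrite looks like:

```
before:  q=name:shoes AND price:[10 TO 100]
after:   q=name:shoes&fq=price:[10 TO 100]
```

The range in fq is evaluated as a filter (cached by the filterCache) instead of being expanded inside the scored BooleanQuery.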


Best
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TooManyClauses-maxClauseCount-is-set-to-1024-tp4056965p4056969.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Response time in client was much longer than QTime in tomcat

2013-01-21 Thread pravesh
SOLR's QTime represents the actual time spent on searching, whereas your C#
client's response time is the total time spent sending the HTTP request and
getting back the response (which might also include parsing the results).


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Response-time-in-client-was-much-longer-than-QTime-in-tomcat-tp4034148p4034996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need basic information

2012-09-02 Thread pravesh
Do logstash/graylog2 do log processing/searching in real time? Or can they
scale for real-time needs?
I guess harshadmehta is looking for real-time indexing/search.

Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-basic-information-tp4004588p4004996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need basic information

2012-08-31 Thread pravesh
One basic and trivial solution could be a schema like:

Date (of type date/string) -- this would store the 'yyyy-mm-dd' format date
Tag (of type string) -- the text/tag 'Account' goes into this
account-id (of type sint/int) -- an account id like '123' goes into this
action (of type string) -- values like 'created'/'updated' go into this

Then just push your logs into solr: http://wiki.apache.org/solr/UpdateCSV

Then, to get the log activity for account id '123', you could query like:

http://localhost:port/solr/select/?q=account-id:123&fq=Tag:Account&fq=Date:[d1 TO d2]
then process the results for plotting/reporting.

OR you could ask for faceting on the 'action' field, like:
http://localhost:port/solr/select/?q=account-id:123&fq=Tag:Account&fq=Date:[d1 TO d2]&facet=true&facet.field=action

This way you have facet count for created/updated/deleted etc.

Hope this is what you are looking for.

Thanx
Pravesh




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-basic-information-tp4004588p4004637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Maximum index size on single instance of Solr

2012-08-30 Thread pravesh
We have a 48GB index on a single shard, 20+ million documents, recently
migrated to SOLR 3.5.
We have a cluster of SOLR servers for hosting searches, but I do foresee
migrating to SOLR sharding going forward.


Thanx
Pravesh




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Maximum-index-size-on-single-instance-of-Solr-tp4004171p4004418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr indexing slows down after few minutes

2012-08-30 Thread pravesh
Did you check the wiki:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Do you commit often? Do you index with multiple threads? Also try
experimenting with the various MergePolicies available from SOLR 3.4
onwards.

Thanx
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-indexing-slows-down-after-few-minutes-tp4004337p4004421.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: optimum solr core size

2012-08-30 Thread pravesh
How many documents are there in the index? How many stored/indexed fields?
There is no magic number as yet for defining the size of a single core
(whether number of docs OR size of index), but 123GB seems to be on the
higher side, so you could definitely go for sharding your indexes.

BTW, how are your searches/indexing performing over time? Is there any
impact?

Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/optimum-solr-core-size-tp4004251p4004424.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Load Testing in Solr

2012-08-30 Thread pravesh
Hi Dhaivat,
JMeter is a nice tool. But it all depends on what sort of load you are
expecting and how complex the queries are (sorting/filtering/textual
searches). You need to consider all of these to benchmark.

Thanx
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-Testing-in-Solr-tp4004117p4004428.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Time problem on Big Index Solr 3.5

2012-08-30 Thread pravesh
How often are documents added to the index? Try reducing the optimize
frequency.
BTW, do you optimize only on the master (which is the desired way)?

Also, specifically for date ranges, try to use filter queries; this way
they will be cached and thus faster. This also applies to other fields
which require very little analysis or have a limited number of unique values.



Thanx
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Time-problem-on-Big-Index-Solr-3-5-tp4003660p4004437.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Time problem on Big Index Solr 3.5

2012-08-30 Thread pravesh
A 13 GB index isn't too big, but going forward, index sharding is the
solution for large single-core indexes. This way you can scale out.
This link will give you some info:
http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding

Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Time-problem-on-Big-Index-Solr-3-5-tp4003660p4004630.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query during a query

2012-08-30 Thread pravesh
Have you checked SOLR Field Collapsing/Grouping?
http://wiki.apache.org/solr/FieldCollapsing
This may be what you are looking for.


Thanx
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-during-a-query-tp4004624p4004631.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread pravesh
BTW, have you also changed the MergePolicy & MergeScheduler settings? From
Lucene 3.x/3.5 onwards, there have been new MergePolicy & MergeScheduler
implementations available, like TieredMergePolicy & ConcurrentMergeScheduler.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/IndexWrite-in-Lucene-Solr-3-5-is-slower-tp3989764p3989768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically pick dataDirs

2012-05-09 Thread pravesh
While n being a higher value, firing 100 cores wouldn't be a viable
solution. How do I achieve this in solr? In short, I would like to
have a single core and get results out of multiple index searchers, and
that implies multiple index readers.

If you want a single core with multiple index directories (which
currently is not supported by SOLR), then why can't you have a single merged
index within the core?

Lucene supports searching across multiple indexes, but this hasn't been
inherited by SOLR by design (I mean using the MultiSearcher APIs for a
single core with multiple index directories in it).

BTW, how big are your index(es)? Total documents? Total size? etc. If each
core is small (MBs / a few GBs) then you could merge a few of them together.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-pick-dataDirs-tp3973368p3973682.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 3.5 Index Optimization not producing single .cfs file

2012-05-04 Thread pravesh
Thanx Mike,

If you really must have a CFS (how come?) then you can call
TieredMergePolicy.setNoCFSRatio(1.0) -- not sure how/where this is
exposed in Solr though.

BTW, would this impact search performance? I mean, I was just trying a few
random keyword searches (without sorts and filters) on both systems (1.4.1
vs 3.5) and found that 3.5 searches take longer than 1.4.1 (around
10-20% slower). Haven't done any load test till now.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619p3961441.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR 3.5 Index Optimization not producing single .cfs file

2012-05-03 Thread pravesh
Hi,

I've migrated the search servers to the latest stable release (SOLR-3.5)
from SOLR-1.4.1.
We've fully recreated the index for this. After indexing completes, when I'm
optimizing the index it is not merging into a single .cfs file, as was done
with the 1.4.1 version.

We've set <useCompoundFile>true</useCompoundFile>

Is it something related to the new MergePolicy used from SOLR 3.x
onwards (I suppose it is TieredMergePolicy with the 3.x version)? If yes,
should I change it to LogByteSizeMergePolicy?

Does this change require a complete rebuild, OR will it apply incrementally?
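
For context, in a 3.x solrconfig.xml the merge policy can be switched with a fragment along these lines (placement inside indexDefaults/mainIndex may vary by version, so treat this as a sketch):

```xml
<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  <!-- switch back from the 3.x default TieredMergePolicy -->
  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
</indexDefaults>
```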


Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any way to get reference to original request object from within Solr component?

2012-03-17 Thread pravesh
Hi Sujit,

The HTTP parameter ordering is handled above the SOLR level; I don't think
this can be controlled at the SOLR level.
You could append all the required values in a single HTTP param and then
break it apart at your component level.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-way-to-get-reference-to-original-request-object-from-within-Solr-component-tp3833703p3834082.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Basic SOLR help needed

2012-02-19 Thread pravesh
When I do a query using the Admin tool:
INST_NAME:KENTUCKY TECH PADUCAH (there is a document in the db that matches
this INST_NAME exactly)

Try it this way:
INST_NAME:(KENTUCKY TECH PADUCAH)
This way all 3 terms are searched in the field INST_NAME; otherwise only
the first term, KENTUCKY, is searched in INST_NAME, and the remaining terms,
TECH and PADUCAH, are searched in your default search field.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Basic-SOLR-help-needed-tp3759855p375.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting and searching on a field

2011-12-15 Thread pravesh
I have read about the option of copying this to a different field, using
one for searching by tokenizing, and one for sorting.

That would be the optimal way of doing it, since sorting requires the field
not to be analyzed/tokenized, while searching requires that it is. The copy
field is the right solution here.
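
A schema.xml sketch of the copy-field pattern (the field and type names here are made up):

```xml
<!-- analyzed field used for searching -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- untokenized field used for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Queries then search on title but sort with sort=title_sort asc.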

Regds
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-and-searching-on-a-field-tp3584992p3587906.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Search Across Multiple Cores not working when quering on specific field

2011-12-14 Thread pravesh
but when i searched on a specific field then it is not working:
http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=mnemonic_value:United

Why is distributed search not working when I search on a particular
field?

Since you have a multi-shard infra, do the cores share the same
configuration (schema.xml/solrconfig.xml etc.)? What error/output are you
getting for the sharded query?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-Across-Multiple-Cores-not-working-when-quering-on-specific-field-tp3585013p3587890.html
Sent from the Solr - User mailing list archive at Nabble.com.


Generic RemoveDuplicatesTokenFilter

2011-12-12 Thread pravesh
Hi All,

Currently, SOLR's existing RemoveDuplicatesTokenFilter
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory)
filters out duplicate tokens with the same text at the same logical position.

In my case, if the same term appears duplicated one after the other, then I
need to remove all duplicates and consume only a single occurrence of the
term (even if the positionIncrementGap == 1).

For example, if the input stream is: /quick brown brown brown fox jumps jumps
over the little little lazy brown dog/
then the output should be: /quick brown fox jumps over the little lazy brown
dog/.

To achieve this, I implemented my own version of
/RemoveDuplicatesTokenFilter/ with an overridden /process()/ method:

  protected Token process(Token t) throws IOException {
    Token nextTok = peek(1);
    if (t != null && nextTok != null) {
      if (t.termText().equalsIgnoreCase(nextTok.termText())) {
        return null;
      }
    }
    return t;
  }

The above implementation works as desired, and the consecutive duplicates
are getting removed :)

Any advice/feedback on the above implementation? :)
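
Outside of Lucene, the intended collapsing of consecutive duplicates can be illustrated with plain Java (a standalone sketch, not the TokenFilter itself):

```java
import java.util.ArrayList;
import java.util.List;

public class DedupDemo {
    // Keeps one token per run of consecutive, case-insensitively equal
    // tokens -- mirroring what the overridden process() method does.
    public static List<String> dedupConsecutive(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (out.isEmpty() || !out.get(out.size() - 1).equalsIgnoreCase(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = List.of("quick", "brown", "brown", "brown", "fox",
                "jumps", "jumps", "over", "the", "little", "little",
                "lazy", "brown", "dog");
        // prints: quick brown fox jumps over the little lazy brown dog
        System.out.println(String.join(" ", dedupConsecutive(in)));
    }
}
```

Note that, like the filter above, a later non-adjacent repeat ("brown ... brown") survives; only back-to-back duplicates are collapsed.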

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Generic-RemoveDuplicatesTokenFilter-tp3581656p3581656.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr or SQL fultext search

2011-12-08 Thread pravesh
Go ahead with SOLR-based text search. That's what it is meant for, and it
does it great.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-or-SQL-fultext-search-tp3566654p3569894.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr using very high I/O

2011-12-08 Thread pravesh
Can you share more info, like: what is your H/W infra (CPU, RAM, HDD)?
From where do you pick the records/documents to index: RDBMS, files, network?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-very-high-I-O-tp3567076p3569903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to improve facet search?

2011-12-08 Thread pravesh
What is the type of the field on which you are getting facets (string, text,
int, date, etc.)? Is it multivalued or not?
How many unique values do you have for the field?

What is your filterCache setting in your solrconfig.xml?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-improve-facet-search-tp3569910p3569955.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to improve facet search?

2011-12-08 Thread pravesh
How many unique terms do you have in the faceting field?
Since there are a lot of evictions, consider increasing the size of the
filterCache. Try to keep evictions to a minimum.

BTW, how big is your index (GB/MB?), and how much RAM is allocated?

Above all: have you benchmarked your search? Is searching done in
millis/secs/mins? I am trying to understand whether your search might
already be performing quite well.
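
For reference, the filterCache is configured in solrconfig.xml; the numbers below are purely illustrative, not recommendations:

```xml
<filterCache class="solr.FastLRUCache"
             size="4096"
             initialSize="1024"
             autowarmCount="256"/>
```

Watch the cache stats (hit ratio, evictions) after each change rather than guessing a size.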

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-improve-facet-search-tp3569910p3570048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: cache monitoring tools?

2011-12-07 Thread pravesh
facet.limit=50
Your facet.limit seems too high. Do you actually require this many?

Since there are a lot of evictions from the filterCache, increase the
maxSize value to your acceptable limit.

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/cache-monitoring-tools-tp3566645p3566811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr sorting issue : can not sort on multivalued field

2011-12-07 Thread pravesh
Was that field multivalued=true earlier, by any chance? Did you rebuild
the index from scratch after changing it to multivalued=false?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-issue-can-not-sort-on-multivalued-field-tp3564266p3566832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to make effective search with fq and q params

2011-11-22 Thread pravesh
Usually:

Use the 'q' parameter to search for the free-text values entered by users
(where you might want to parse the query and/or apply boosting, phrase
slop, minimum match, tie, etc.).

Use 'fq' to limit the searches to certain criteria like location,
date ranges, etc.

Also, avoid using q=*:*, as it implicitly translates to a MatchAllDocsQuery.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ - threading, http clients, connection managers

2011-11-07 Thread pravesh
1) Is it safe to reuse a single _mgr and _client
across all 28 cores?

Both are thread-safe APIs as per the HttpClient specs. You should go ahead
with this.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-threading-http-clients-connection-managers-tp3485012p3486436.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread pravesh
Did you rebuild the index from scratch? Since this is an index-time factor,
you need to rebuild the complete index from scratch.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread pravesh
Hi Samar,

You can write your own custom Similarity implementation and override the
/lengthNorm()/ method to return a constant value.

Then, in your /schema.xml/, specify your custom implementation as the
default similarity class.

But you need to rebuild your index from scratch for this to take effect
(also set /omitNorms=true/ for the fields where you need this feature).
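
A sketch of such an implementation against the Lucene 3.x-era API (the class name is made up, and this is not compiled here):

```java
// Hypothetical custom Similarity that disables length normalization:
// every field gets the same norm regardless of its token count.
public class ConstantNormSimilarity extends org.apache.lucene.search.DefaultSimilarity {
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f; // constant, so field length no longer affects score
    }
}
```

It would then be referenced in schema.xml with something like `<similarity class="com.example.ConstantNormSimilarity"/>`.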

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best way for sum of fields

2011-11-07 Thread pravesh
I Guess,

This has nothing to do with the search part. You can post-process the search
results (I mean, iterate through your results and sum them).

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: hierarchical synonym

2011-10-21 Thread pravesh
If I understood correctly, it seems you want facets/hierarchical facets.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/hierarchical-synonym-tp344p3440090.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: inconsistent results when faceting on multivalued field

2011-10-21 Thread pravesh
Could you clarify the below:
When I make a search on facet.qua_code=1234567 ??

Are you trying to say: when you fire a fresh search for a facet item, like
q=qua_code:1234567?

This would fetch documents where the qua_code field contains either the
term 1234567 alone OR both terms (1234567 and 9384738, among other terms).
This is because it's a multivalued field, and hence, if you look at the
facet, it is shown for both terms.

If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only
get the expected counts

You will get facets only for documents which have the term 1234567
(facet.query applies to the facets, deciding which facet is picked/shown).

Regds
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/inconsistent-results-when-faceting-on-multivalued-field-tp3438991p3440128.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Painfully slow indexing

2011-10-21 Thread pravesh
Are you posting through HTTP/SOLRJ?

Your script time 'T' includes the time between sending the POST request and
fetching the response after success, right?

Try sending in small batches, like 10-20. BTW, how many documents are you
indexing?

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Painfully-slow-indexing-tp3434399p3440175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting single documents by fq on unique field, performance

2011-10-21 Thread pravesh
This approach seems fine. You might benchmark it through a load test, etc.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-single-documents-by-fq-on-unique-field-performance-tp3440229p3440351.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: upgrading 1.4 to 3.x

2011-10-14 Thread pravesh
Just look into your tomcat logs in more detail.specifically the logs when
tomcat loads the solr application's web context. There you might find some
clues or just post the logs snapshot here.


Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-1-4-to-3-x-tp3415044p3421225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: text search and data aggregation, thoughts?

2011-10-14 Thread pravesh
Hi Esteban,

A lot depends on a lot of things: 1) how much volume (total documents),
2) the size of the index, 3) how you represent the data-aggregated part in
your UI.

Your option-2 seems a suitable way to go. This way you tune each core
separately. Also, the use-cases for updating each document/product in the
two indexes seem different: one is updated when a product is added/updated;
the other is updated when a product is viewed/sold from the search results.

Option-1 can be used in case you are showing the data-aggregation stats
only on the search results page along with each item. If they are shown on
the item-detail page, then option-2 seems better.

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/text-search-and-data-aggregation-thoughts-tp3416330p3421361.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: boosting and relevancy options from solr extensibility points -java-

2011-10-05 Thread pravesh
in a certain time period (say christmas) I will promote a doc in christmas
keyword

You might check the QueryElevation component in SOLR.

or based on users interest I will boost a specific category of products.
or (I am not sure how can I do this one) I will boost docs that current
user's friends (source:facebook) purchased/used/...

You can check apache mahout
(https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation)
for this purpose. It's got a recommendation engine that works pretty well.

Thanx
Pravesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/boosting-and-relevancy-options-from-solr-extensibility-points-java-tp3149916p3395752.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there a way to know which mm value was used?

2011-10-05 Thread pravesh
You can explicitly pass /mm/ for every search and get it back in your
response; otherwise use /debugQuery=true/, which will give you all the
implicitly used defaults (but you wouldn't want to use this in production).

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-a-way-to-know-which-mm-value-was-used-tp3395746p3395765.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hierarchical faceting with Date

2011-10-05 Thread pravesh
You could index the date as a text field (or use a new text field to store
the date as text) and then try it on this new field.

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hierarchical-faceting-with-Date-tp3394521p3395824.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: what is scheduling ? why should we do this?how to achieve this ?

2011-08-29 Thread pravesh
SCHEDULING, in OS terminology, is when you specify cron jobs on linux/unix
machines (or scheduled tasks on windows machines).
Whatever task you schedule, along with a time/date or interval, will be
automatically invoked, so you don't have to manually log into the machine
and call the script/batch.

SOLR scheduling is the same, but with an internal mechanism provided by SOLR
to set a schedule that automatically invokes delta-import, full-import,
commit, etc. This helps because you're not dependent on the OS level, since
different OSes have to be scheduled differently (cron/scheduled tasks).
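
At the OS level, such a schedule is just a crontab entry; for example (the core name and interval are hypothetical), triggering a DataImportHandler delta-import every 15 minutes:

```
# m h dom mon dow  command
*/15 * * * * curl -s 'http://localhost:8983/solr/mycore/dataimport?command=delta-import'
```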

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-scheduling-why-should-we-do-this-how-to-achieve-this-tp3287115p3292068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to update solr cache when i delete records from remote database?

2011-08-29 Thread pravesh
You would have to delete them from SOLR also, and then commit (a commit will
automatically refresh your caches).
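
A sketch of the delete-plus-commit against the XML update handler (the host and core name are assumed):

```
curl 'http://localhost:8983/solr/mycore/update' -H 'Content-Type: text/xml' \
  --data-binary '<delete><id>1001</id></delete>'
curl 'http://localhost:8983/solr/mycore/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'
```

Delete-by-query (`<delete><query>...</query></delete>`) works the same way when the ids aren't known.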

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-update-solr-cache-when-i-delete-records-from-remote-database-tp3291879p3292074.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how i am getting data in my search field eventhough i removed data in my remote database?

2011-08-29 Thread pravesh
http://lucene.472066.n3.nabble.com/how-to-update-solr-cache-when-i-delete-records-from-remote-database-td3291879.html

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-i-am-getting-data-in-my-search-field-eventhough-i-removed-data-in-my-remote-database-tp3289008p3292095.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Viewing the complete document from within the index

2011-08-29 Thread pravesh
Reconstructing the document might not be possible, since only the stored
fields are actually stored document-wise (un-inverted), whereas the
indexed-only fields are stored in an inverted way.
I don't think SOLR/Lucene currently provides any way to re-construct a
document the way you desire. (It's a sort of reverse engineering that isn't
supported.)

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Viewing-the-complete-document-from-within-the-index-tp3288076p3292111.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: what is scheduling ? why should we do this?how to achieve this ?

2011-08-29 Thread pravesh
The Wiki link that you referred to is quite old and not under active
development.
I would prefer OS-based scheduling using cron jobs. You can check the link
below.

http://wiki.apache.org/solr/CollectionDistribution

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-scheduling-why-should-we-do-this-how-to-achieve-this-tp3287115p3292212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: difference between shard and core in solr

2011-07-18 Thread pravesh
a single core is an index with the same schema, is this what a core really is?

YES. A single core is an independent index with its own unique schema. You
go with a new core for cases where your schema/analysis/search requirements
are completely different from your existing core(s).

can a single core contain two separate indexes with different schemas in it?

NO (for the same reason as explained above).

Is a shard a collection of indexes on a single physical machine? Can a
single core be present in different shards?

You can think of a shard as a big index distributed across a cluster of
machines. So all shards belonging to a single core share the same
schema/analysis/search requirements. You go with sharding when the index is
not scalable on a single machine, or when your index grows really big in size.


Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-tp3178214p3178249.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Start parameter messes with rows

2011-07-18 Thread pravesh
I just wanna be clear on the concepts of core and shard.
Is a single core an index with a single schema - is this what a core really is?
Can a single core contain two separate indexes with different schemas in it?
Does a shard refer to a collection of indexes on a single physical machine?
Can a single core be present in different shards?

You might look into the following thread:

http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-td3178214.html
 


Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Start-parameter-messes-with-rows-tp3174637p3178678.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR Shard failover Query

2011-07-18 Thread pravesh
Thanx Shawn,

When I first set things up, I was using SOLR-1537 on Solr 1.5-dev.  By
the time I went into production, I had abandoned that idea and rolled
out a stock 1.4.1 index with two complete server chains, each with 7
shards.

  So, both chains were configured in the cluster in a load-balanced manner?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Shard-failover-Query-tp3178175p3181400.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How could I monitor solr cache

2011-07-18 Thread pravesh
This might be of some help:

http://wiki.apache.org/solr/SolrJmx
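As a sketch, exposing Solr's MBeans is a one-line change in solrconfig.xml plus enabling JMX on the JVM (the port and flags below are illustrative):

```xml
<config>
  <!-- registers Solr's statistics beans with the JVM's MBean server -->
  <jmx/>
</config>
```

Then start the JVM with something like -Dcom.sun.management.jmxremote.port=3000 (plus the usual auth/ssl flags) and point jconsole at it to browse cache hit ratios, evictions, etc.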

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-could-I-monitor-solr-cache-tp3181317p3181407.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR Shard failover Query

2011-07-17 Thread pravesh
Hi,

SOLR has a sharding feature, where we can distribute a single search request
across shards; the results are collected and scored, and then the response
is generated.

Wanted to know: what happens in case of failure of specific shard(s), say,
when one particular shard machine is down? Does the request fail, or is this
handled gracefully by SOLR?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Shard-failover-Query-tp3178175p3178175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-17 Thread pravesh
A commit would be the safest way to make sure the deleted content doesn't
show up.

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178179.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to extract all the tokens from solr?

2011-07-14 Thread pravesh
You can use Lucene for doing this. It provides the TermEnum API to enumerate
all the terms of a field.
SOLR 1.4+ also provides a special request handler (the TermsComponent) for
this purpose. Check if that helps.
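For the Solr 1.4 route, a sketch of wiring up the TermsComponent in solrconfig.xml (the handler name and defaults are just examples):

```xml
<searchComponent name="terms" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

Terms for a field can then be enumerated with a request like /terms?terms.fl=myfield&terms.limit=-1.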

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-extract-all-the-tokens-from-solr-tp3168362p3168589.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to build lucene-solr (espeically if behind a firewall)?

2011-07-13 Thread pravesh
If behind a proxy, then use:

ant dist ${build_files:autoproxy}

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-build-lucene-solr-espeically-if-behind-a-firewall-tp3163038p3165568.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: POST for queries, length/complexity limit of fq?

2011-07-13 Thread pravesh
1. I assume that it's worthwhile to rely on POST method instead of GET
when issuing a search. Right? As I can see, this should work. 

We do restrict users' searches by passing unique ids (sometimes in the
thousands) in 'fq', and we use the POST method.
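As an illustration (assuming a stock Solr at localhost; the ids are made up), a query that would overflow a GET URL can be sent as a POST body:

```shell
curl http://localhost:8983/solr/select \
     --data-urlencode 'q=*:*' \
     --data-urlencode 'fq=id:(1001 1002 1003 1004)'
```

curl switches to POST automatically when given a request body, so the fq list is no longer subject to URL-length limits (though Lucene's maxBooleanClauses limit still applies).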

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-for-queries-length-complexity-limit-of-fq-tp3162405p3165586.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I specify a different analyzer at search-time?

2011-07-13 Thread pravesh
You can configure separate analyzers for index time and search time for each
of your field types in schema.xml.
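A minimal schema.xml sketch (the field type name and filter chain are just examples) showing separate index-time and query-time analyzers:

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms applied only at query time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With no type attribute, a single <analyzer> element applies to both phases.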

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-specify-a-different-analyzer-at-search-time-tp3159463p3165593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re:OOM at solr master node while updating document

2011-07-07 Thread pravesh
You just need to allocate more heap to your JVM.
BTW, are you doing any complex searches while indexing is in progress, like
fetching a large set of documents?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/OOM-at-solr-master-node-while-updating-document-tp3140018p3147475.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Read past EOF error due to broken connection

2011-06-23 Thread pravesh
Did you do a manual copy of the index from the Master to the Slave servers?
I suppose it wasn't copied properly.
If this is the case, then you can check the size of the indexes on both
servers. Otherwise, you would have to recreate the indexes.

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3098737.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Removing duplicate documents from search results

2011-06-23 Thread pravesh
Would you even care to index the duplicate documents? Finding duplicates in
content fields is not as easy as in some untokenized/keyword field.
Maybe you could do this filtering at indexing time, before sending the
document to SOLR. Then the question becomes: which document should go (from
a group of duplicates)? The latest one?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Removing-duplicate-documents-from-search-results-tp3099214p3099432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Read past EOF error due to broken connection

2011-06-22 Thread pravesh
First commit, and then try to search again.

You can also use Lucene's CheckIndex tool to check and fix your index (it
may remove some corrupt segments from your index).
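A sketch of running CheckIndex from the command line (the jar and index paths are placeholders):

```shell
# inspect only (read-only, safe)
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index

# repair: drops corrupt segments, losing the documents they contain
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index -fix
```

Take a backup before using -fix, since removed segments cannot be recovered.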

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3094334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search is taking long-long time.

2011-06-22 Thread pravesh
Were your searches always slow, or only since you made some changes at the
index/config/schema level?
Is it due to the 5-minute index updates? Are you warming your searchers?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-is-taking-long-long-time-tp3095306p3098552.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: relevant result for query with boost factor on parameters

2011-06-20 Thread pravesh
You can try following:

1. Try to increase the boost for fields (say, field-1^100, field-2^20), and
pass field-3 as a filter query (using the fq parameter). This way field-3
won't affect the scoring.
2. Some implicit factors like length normalization can also skew the
results, so you can switch it off as well.
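A sketch of point 1 using a dismax handler in solrconfig.xml (the handler name and boost values are illustrative):

```xml
<requestHandler name="/boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- field-1 weighs far more than field-2 in scoring -->
    <str name="qf">field-1^100 field-2^20</str>
  </lst>
</requestHandler>
```

field-3 is then passed at query time as fq=field-3:somevalue, which restricts the result set without contributing to the score.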

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/relevant-result-for-query-with-boost-factor-on-parameters-tp3079337p3085406.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: relevant result for query with boost factor on parameters

2011-06-20 Thread pravesh
but if suppose field1 does not contain both the terms 'rock and roll' and
'special attention', then field2 results should take priority (show the
results which have both the terms first, and then show the results with
respect to boost factor or relevance)

if both the fields do not contain these terms together (show as normal, with
field1 having more relevance than field2) 

You would have to experiment with different boost values to arrive at some
benchmark.
Start with the same boost for field-1 and field-2, then increase field-1 a
little bit...

:)

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/relevant-result-for-query-with-boost-factor-on-parameters-tp3079337p3085424.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search failed even if it has the keyword .

2011-06-17 Thread pravesh
First check which is your default search field in your schema.xml. Also look
at whether you are using WordDelimiterFilterFactory for the specific field
in schema.xml. It tokenizes words on every capital letter, so a word like
DescribeYourImageWithAMovieTitle will be broken into multiple tokens, and
each will be searchable.
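A sketch of such a field type (the name and surrounding filters are illustrative):

```xml
<fieldType name="text_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnCaseChange="1" turns DescribeYourImageWithAMovieTitle into
         Describe / Your / Image / With / A / Movie / Title -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You can paste a sample value into the analysis page to confirm exactly which tokens this chain produces.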

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: difficult sort

2011-06-17 Thread pravesh
I'm not sure, but have you looked at the Collapsing feature in SOLR yet? You
may have to apply a patch for the 1.4.1 version, if this is what you want.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/difficult-sort-tp3075563p3075661.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search failed even if it has the keyword .

2011-06-17 Thread pravesh
What is the type of the fields defaultquery and title in your schema.xml?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: difficult sort

2011-06-17 Thread pravesh
Yes. Then I believe you would need multiple queries.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/difficult-sort-tp3075563p3075802.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOlR -- Out of Memory exception

2011-06-16 Thread pravesh
If you are sending the whole CSV in a single HTTP request using curl, why
not consider sending it in smaller chunks?
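A small sketch of the chunking idea (the update URL in the comment assumes the stock CSV handler; adjust to your setup). The header row is repeated so each chunk is a valid standalone CSV:

```python
def chunk_csv(lines, chunk_size):
    """Split CSV lines into chunks of at most chunk_size data rows,
    repeating the header so each chunk is a complete CSV document."""
    header, rows = lines[0], lines[1:]
    return [[header] + rows[i:i + chunk_size]
            for i in range(0, len(rows), chunk_size)]

# Each chunk can then be posted separately, e.g.:
#   curl 'http://localhost:8983/solr/update/csv' \
#        -H 'Content-type: text/plain; charset=utf-8' \
#        --data-binary @chunk.csv
```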


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOlR-Out-of-Memory-exception-tp3074636p3075091.html
Sent from the Solr - User mailing list archive at Nabble.com.


High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread pravesh
Hi,

I'm planning to upgrade my system from SOLR1.2.1 to SOLR1.4.1 version.

We had done some lucene level optimizations on the SOLR slaves in the
earlier system(1.2.1), like:
1.  removed the synchronized block from the SegmentReader class's
isDeleted() method
2.  removed the synchronized block from the FSDirectory.FSIndexInput class's
readInternal() method

Since in 1.4.1 we have better alternatives with NIOFSDirectory and the
read-only index reader (used by default by SOLR 1.4.1), we did not apply the
earlier changes to the 1.4.1 version.

Now when load testing with 1.4.1, my CPU usage goes as high as 100%. When i
repeat the load test with my earlier setup(1.2.1) the CPU usage is below
50-55%.

But the total throughput of the new (1.4.1) version is much higher than the
older (1.2.1).

I would need some help in minimizing the CPU load on the new system. Could
NIOFSDirectory possibly be contributing to the high CPU?

Is there a mechanism in 1.4.1 to use the SimpleFSDirectory implementation
for searching (would this require a full re-index)?

Help will be appreciated :)

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3068667.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread pravesh
Hi Yonik,

Thanx for the prompt reply. This is a relief :)

Just 1 more question: wouldn't the 100% CPU load affect the system, as
system processes would starve for the CPU?
 
I tried the load test first with 4 cores and then with 8 cores; the CPU
usage still reached 100%.
We have an index of about 32GB with 100+ fields indexed and 18 fields
stored, and we search using an optimized index.

Thanx
Pravesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3068778.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem with the new IndexSearcher when snpainstaller (and commit script) happen

2011-06-15 Thread pravesh
Look for the snapshot.current file in the logs folder under SOLR home on
your slave server, and check whether it shows the older snapshot.

I also faced a similar issue (but with SOLR 1.2.1), using the
collection-distribution scripts.

The way I resolved it was:
1. Stopped the index replication scripts and index update scripts.
2. Cleaned the slave(s) status directory on the master (keep the status
directory and only delete its contents).
3. Removed the snapshot.current file from the slave's [SOLR-Home]/logs
folder.
4. Restarted the snapshooter on the master and the snappuller on the
slave(s).

Hope this helps

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3068903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread pravesh
Yes Erick,

I did create an artificial load test with 30 users concurrently searching
(around 28000 samples of actual queries). With 1.4.1, the test completes
within 3 hrs without any failures (SOLR 1.2.1 couldn't match this
performance; in 3 hrs it could only do 9700 samples). My actual production
load is much less than that (the 3-hr cycle actually spans 24 hrs in
production). I will repeat this with the actual load now.

Thanks all for your time :)

Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3070663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread pravesh
k0 -- A | C
k1 -- A | B
k2 -- A | B | C
k3 -- B | C 
Now let q=k1, how do I make sure C doesn't appear as a result since it
doesn't contain any occurence of k1? 
We don't need to bother with that; that's exactly what Lucene already does :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-make-sure-the-resulting-documents-contain-the-query-terms-tp3031637p3033451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread pravesh
Since you are using expand=true, every time a matching synonym entry is
found, the analyzer expands the term with all the synonyms in the set in the
index. This may cause the index to grow in size.
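One way to keep the index small is to contract rather than expand at index time; a sketch (the synonym line is an example):

```xml
<!-- synonyms.txt line:   tv, television, telly
     With expand="false", all three are reduced to the first entry ("tv")
     at index time, so only one term is stored per occurrence; with
     expand="true", every matching document gets all three terms indexed. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"/>
```

The trade-off is that queries must then be normalized the same way, usually by applying the same filter at query time.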

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Applying-synonyms-increase-the-data-size-from-MB-to-GBs-tp3028700p3028877.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Feature: skipping caches and info about cache use

2011-06-06 Thread pravesh
SOLR 1.3+ logs only fresh queries. If you re-run the same query, it is
served from the cache and not printed in the logs (unless the cache(s) are
not warmed or the searcher is reopened).

So, Otis's proposal would definitely help in doing some benchmarks and
baselining the search :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Feature-skipping-caches-and-info-about-cache-use-tp3020325p3028894.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strategy -- Frequent updates in our application

2011-06-03 Thread pravesh
You can use the DataImportHandler for your full/incremental indexing. The
NRT indexing delay could vary as per business requirements (I mean the delay
could be 5, 10, 15, or 30 minutes). It also depends on how much volume will
be indexed incrementally.
BTW, do you have a Master+Slave SOLR setup?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting

2011-06-03 Thread pravesh
BTW, why are you sorting on this field?
You could also index and store this field twice: first with its original
value, and second encoded to some unique code/hash; then index that and
sort on it.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strategy -- Frequent updates in our application

2011-06-03 Thread pravesh
You can go ahead with the Master/Slave setup provided by SOLR. It's trivial
to set up, and you also get SOLR's operational scripts for index synching
between the Master and Slave(s), or the Java-based replication feature.

There is no need to re-invent another architecture :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Anyway to know changed documents?

2011-06-01 Thread pravesh
If your index size is smaller (a few hundred MBs), you can consider the
operational script tools provided with the SOLR distribution to sync indexes
from the Master to the Slave servers. They update (copy) the latest index
snapshot from the Master to the Slave(s). The SOLR wiki provides good info
on how to set them up as cron jobs, so no manual intervention is required.
BTW, SOLR 1.4+ also has a feature where only the changed segments get
synched (but then the index should not be optimized).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyway-to-know-changed-documents-tp3009527p3010015.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query problem in Solr

2011-06-01 Thread pravesh
We're using Solr to search on a Shop index and a Product index
Do you have 2 separate indexes (using distributed shard search)? I suspect
you actually have only a single index.


 Currently a Shop has a field `shop_keyword` which also contains the
 keywords of the products assigned to it.

You mean that, for a shop, you are first concatenating all the keywords of
all its products and then saving them in the shop_keyword field for the
shop? In that case there is no way you can identify which keyword occurs in
which product in your index.
You might need to change the index structure: when you post documents, post
a single document per product (with fields like title, price, shop-id,
etc.), instead of a single document per shop.
Hope I make myself clear
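A sketch of what posting one document per product might look like (the field names and values are invented):

```xml
<add>
  <doc>
    <field name="id">prod-101</field>
    <field name="shop_id">shop-9</field>
    <field name="title">Red running shoes</field>
    <field name="price">49.99</field>
    <field name="keyword">shoes running red</field>
  </doc>
</add>
```

Searching keywords then matches individual products, and results can still be filtered by shop_id.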

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-problem-in-Solr-tp3009812p3010072.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Anyway to know changed documents?

2011-06-01 Thread pravesh
The SOLR wiki will provide help on this. You might be interested in the pure
Java-based replication too. I'm not sure whether the SOLR operational
scripts have this feature (synching only changed segments). You might need
to change the configuration in solrconfig.xml.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyway-to-know-changed-documents-tp3009527p3010085.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can we stream binary data with StreamingUpdateSolrServer ?

2011-05-30 Thread pravesh
Hi,

I'm using StreamingUpdateSolrServer to post a batch of content to SOLR
1.4.1. Looking at the StreamingUpdateSolrServer code, it seems it only
allows the content to be streamed in XML format.

Can we use it to stream data in binary format?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-we-stream-binary-data-with-StreamingUpdateSolrServer-tp3001813p3001813.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UniqueKey field in schema.xml

2011-05-26 Thread pravesh
Create a new unique field for this purpose, like myUniqueField; then just
combine (product-id + cust-id) and post it to this new field.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UniqueKey-field-in-schema-xml-tp2987807p2988098.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is omitNorms

2011-05-26 Thread pravesh
omitNorms=true on a field has the following effects:

1. Length normalization will not work on the specific field, which means
matching documents with shorter values will not be preferred/boosted over
matching documents with longer values for that field at search time.
2. Index-time boosting will not be available on the field.

If neither of the above is required by you, then you can set omitNorms=true
for the specific fields.
This has an added advantage: it will save you some (or a lot of) RAM, since
omitNorms=false on a total of N fields in the index will require RAM of
size:

 Total docs in index * 1 byte * N
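For instance, 10 million documents with norms on 100 fields costs roughly 10,000,000 * 1 * 100 bytes, i.e., close to 1 GB of heap just for norms. A schema sketch (field names/types are examples):

```xml
<!-- no boosting or length normalization needed on an id: omit norms -->
<field name="id"    type="string" indexed="true" stored="true" omitNorms="true"/>
<!-- free-text field where shorter matches should rank higher: keep norms -->
<field name="title" type="text"   indexed="true" stored="true"/>
```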

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2988124.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCache

2011-05-26 Thread pravesh
This is because you may have only 10 unique terms in your indexed field.
BTW, what do you mean by controlling the FieldCache?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCache-tp2987541p2988142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is document tag in data-config.xml of Solr

2011-05-26 Thread pravesh
The document tag corresponds to the actual SOLR document that will be posted
by the DIH. This mapping is used by the DIH to map DB rows to index
documents.
You can have multiple entity tags, as you might be pulling data from more
than one table.

You can have only one document tag in your db-data-config.xml (remember, the
purpose of db-data-config.xml is to map the DB structure to the index
structure).


--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-document-tag-in-data-config-xml-of-Solr-tp2978668p2988176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many Boolean Clause and Filter Query

2011-05-26 Thread pravesh
I'm sure you can fix this by increasing the maxBooleanClauses value in
solrconfig.xml.
This should apply to filter queries as well.
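The setting lives in solrconfig.xml; a sketch (the value is arbitrary, raise it only as far as you need):

```xml
<query>
  <!-- default is 1024; each id in a big fq list counts as one clause -->
  <maxBooleanClauses>10240</maxBooleanClauses>
</query>
```

Note the JVM must have enough heap for the larger queries this permits.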

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-Boolean-Clause-and-Filter-Query-tp2974848p2988190.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Out of memory on sorting

2011-05-26 Thread pravesh
For saving Memory:

1. Allocate as much memory as you can to the JVM (especially if you are
using a 64-bit OS).
2. You can set omitNorms=true for your date and id fields (actually for all
fields where index-time boosting and length normalization aren't required;
this will require a full reindex).
3. Are you sorting on all the documents available in the index? Try to limit
them using filter queries.
4. Avoid match-all-docs queries like q=*:* (if you are using them).
5. If you could do away with sorting on the ID field, sort on a field with
fewer unique terms.


Hope this helps

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-on-sorting-tp2960578p2988336.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to integrate solr with spring framework

2011-05-26 Thread pravesh
Just read through:

http://www.springbyexample.org/examples/solr-client.html

http://static.springsource.org/spring-roo/reference/html/base-solr.html

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-integrate-solr-with-spring-framework-tp2955540p2988363.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-05-26 Thread pravesh
Do you really require multiple shards? A single core/shard will do for even
millions of documents, and the search will be faster than searching on
multiple shards.

Consider multiple shards when you cannot scale up on a single shard/machine
(e.g., when CPU, RAM, etc. become the major bottleneck).
 
Also read through the SOLR distributed search wiki to check all the tuning
required at the application server (Tomcat) end, like the HTTP connector
settings. For a single request in a multi-shard setup, internal HTTP
requests are made to all the queried shards, so make sure you set this
parameter high enough.
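As a sketch, the relevant knob in Tomcat's server.xml is the connector thread pool (the values below are illustrative, not recommendations):

```xml
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400" acceptCount="100"
           connectionTimeout="20000"/>
```

Each incoming multi-shard request fans out into one internal HTTP request per shard on the same container, so the pool must cover both the external and the internal requests.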

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Huge-performance-drop-in-distributed-search-w-shards-on-the-same-server-container-tp2938421p2988464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How does Solr's MoreLikeThis component internally work to get results?

2011-05-26 Thread pravesh
This will help:

http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-Solr-s-MoreLikeThis-component-internally-work-to-get-results-tp2938407p2988487.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is omitNorms

2011-05-26 Thread pravesh
What would be the default value for omitNorms? 
--- The default value is false.

Is the general advice to ignore this or to set the value explicitly? 
--- It depends on your requirements. Do this on a field-per-field basis: set
it to false on fields where you want the norms, or set it to true on
fields where you want to omit the norms.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2988714.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: FieldCache

2011-05-26 Thread pravesh
Since the FieldCache is an expert-level API in Lucene, there is no direct
control provided by SOLR/Lucene over its size.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCache-tp2987541p2989443.html
Sent from the Solr - User mailing list archive at Nabble.com.