How to create optional 'fq' plugin?

2013-07-10 Thread Learner
I am trying to suppress the error messages received when a value is not
passed to a query, 

Ex:

/select?first_name=peter&fq=$first_name&q=*:*

I don't want the above query to throw an error or die whenever the variable
first_name is not passed to the query, so I came up with a plugin that
returns null whenever the variable is missing. The code below works fine
for 'q' but doesn't work for the 'fq' parameter.

Something like...

public class OptionalQParserPlugin extends QParserPlugin {
  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    if (qstr == null || qstr.trim().length() < 1) {
      // No value was passed: return a parser that produces no query.
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() throws SyntaxError {
          return null;
        }
      };
    }
    // Handling of a non-empty qstr is not shown in this snippet.
  }
}

 Can someone let me know how to make fq variables optional?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-create-optional-fq-plugin-tp4077000.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to make a variable in 'fq' optional?

2013-07-10 Thread Learner
I am trying to make a variable in fq optional, 

Ex: 

/select?first_name=peter&fq=$first_name&q=*:*

I don't want the above query to throw an error or die whenever the variable
first_name is not passed to the query; instead it should return the results
for the rest of the query. I can use switch, but it's difficult to handle
each and every case that way (I would need a switch for many variables).
Is there some other way to resolve this?
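
One alternative to a custom parser is to leave the filter out on the client
side. A minimal sketch, assuming the client assembles the request URL itself:
OptionalFqUrl and its methods are hypothetical names (not part of Solr or
SolrJ), and the field:value filter form is an assumption, since the example
above passes the whole filter through $first_name. Values are URL-encoded
but not escaped for query syntax.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class OptionalFqUrl {

    // URL-encodes one value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds "/select?q=...&fq=field:value..." and skips blank filters.
    static String build(String q, Map<String, String> filters) {
        StringBuilder url = new StringBuilder("/select?q=").append(enc(q));
        for (Map.Entry<String, String> f : filters.entrySet()) {
            String value = f.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // optional filter not supplied: emit no fq for it
            }
            url.append("&fq=").append(enc(f.getKey() + ":" + value.trim()));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("first_name", "peter");
        filters.put("last_name", null); // missing: no fq is added
        System.out.println(build("*:*", filters));
    }
}
```

With this approach a missing variable never reaches the server, so no
per-variable switch is needed in solrconfig.xml.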




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-make-a-variable-in-fq-optional-tp4077007.html


How to make a variable in 'fq' optional?

2013-07-10 Thread Learner
I am trying to make a variable in fq optional, 

Ex: 

/select?first_name=peter&fq=$first_name&q=*:*

I don't want the above query to throw an error or die whenever the variable
first_name is not passed to the query; instead it should return the results
for the rest of the query. I can use switch, but it's difficult to handle
each and every case that way (I would need a switch for many variables).
Is there some other way to resolve this?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-make-a-variable-in-fq-optional-tp4077008.html


How to make 'fq' optional?

2013-07-10 Thread Learner
I am trying to make a variable in fq optional, 

Ex: 

/select?first_name=peter&fq=$first_name&q=*:*

I don't want the above query to throw an error or die whenever the variable
first_name is not passed to the query; instead it should return the results
for the rest of the query. I can use switch, but it's difficult to handle
each and every case that way (I would need a switch for many variables).
Is there some other way to resolve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-make-fq-optional-tp4077042.html


How to form / return filter (Query object)?

2013-07-10 Thread Learner
I was able to form a TermQuery as below,

Query query=new TermQuery(new Term(id,value));

I am trying to form a filter query, i.e. something that returns just a
filter that can be used with any query type (q or fq).

if (qstr == null || qstr.trim().length() < 1) {
  return new QParser(qstr, localParams, params, req) {
    @Override
    public Query parse() throws SyntaxError {
      return null;
    }
  };
}

As of now I am returning null; instead I am trying to return a Query object
that acts as a filter (e.g. *:*). Can someone let me know how to implement
that?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-form-return-filter-Query-object-tp4077067.html


Best way to call asynchronously - Custom data import handler

2013-07-08 Thread Learner

I wrote a custom data import handler to import data from files. I am trying
to figure out a way to make an asynchronous call instead of waiting for the
data import response. Is there an easy way to invoke it asynchronously
(other than using futures and callables)?

public class CustomFileImportHandler extends RequestHandlerBase
    implements SolrCoreAware {
  public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1) {
    indexer a = new indexer();  // constructor
    String status = a.Index();  // method to do indexing, trying to make it async
  }
}
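
One way to do this without futures or callables is to hand the job to an
executor as a plain Runnable and return right away. A minimal sketch using
only java.util.concurrent: AsyncImportSketch, startImport and the status
strings are hypothetical names, and no Solr classes are involved, so the
wiring into handleRequestBody is left as an assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the handler; no Solr classes are used.
class AsyncImportSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicReference<String> status = new AtomicReference<>("idle");

    // Returns immediately; the slow job runs on the worker thread.
    String startImport(Runnable indexingJob) {
        if (!status.compareAndSet("idle", "busy")) {
            return "already running"; // refuse to start a second import
        }
        worker.execute(() -> {
            try {
                indexingJob.run();
            } finally {
                status.set("idle"); // reset even if the job throws
            }
        });
        return "started";
    }

    // Current state, suitable for a separate "status" request.
    String status() {
        return status.get();
    }

    // Stops accepting work and waits for the running job to finish.
    boolean shutdownAndWait(long seconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncImportSketch handler = new AsyncImportSketch();
        System.out.println(handler.startImport(() -> System.out.println("indexing...")));
        handler.shutdownAndWait(5);
    }
}
```

The request thread only records that the import was started; a later request
can poll status() to see whether it is still running.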




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-call-asynchronously-Custom-data-import-handler-tp4076475.html


Is it possible to use nested switch - urgent

2013-07-03 Thread Learner
I am using the switch parser plugin as below. Whenever there is a value for
$latlong it invokes $fq_bbox; otherwise it invokes $fq_simple. Now I need
to add one more case: whenever $where is not present in the query, I need
to call fq_all and ignore both fq_bbox and fq_simple. Can someone let me
know a way to implement that?

Thanks a lot for your response..

<lst name="appends">
  <str name="fq">{!switch case=$fq_simple default=$fq_bbox v=$latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">
    (
    _query_:"{!bbox pt=$latlong sfield=geo d=$dist}" OR
    _query_:"{!wp_optional df='addr_location_clean_i' qs=1 v=$where}" OR
    _query_:"{!wp_optional df='addr_location_i' qs=1 v=$where}"
    )
  </str>
  <str name="fq_simple">
    ( {!wp_optional df='addr_location_clean_i' qs=1 v=$where} OR
      {!wp_optional df='addr_location_i' qs=1 v=$where})
  </str>
  <str name="fq_all">
    _query_:"*:*"
  </str>
</lst>
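
The intended selection can be written out as a decision table to make the
nesting explicit. This is plain Java for illustration only, not Solr
configuration: FilterChoice and choose are hypothetical names, and the
returned strings are the parameter names from the config above.

```java
// Hypothetical illustration of the nested decision; not Solr code.
class FilterChoice {

    // True when a request parameter is missing or blank.
    static boolean blank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns the name of the filter parameter that should be applied.
    static String choose(String where, String latlong) {
        if (blank(where)) {
            return "fq_all";    // outer decision: no $where, match everything
        }
        if (blank(latlong)) {
            return "fq_simple"; // inner decision: no $latlong, location text only
        }
        return "fq_bbox";       // inner decision: $latlong present, use the bbox
    }

    public static void main(String[] args) {
        System.out.println(choose(null, "35.22,-80.84"));
        System.out.println(choose("Charlotte", null));
        System.out.println(choose("Charlotte", "35.22,-80.84"));
    }
}
```

In other words, the check on $where has to wrap the existing check on
$latlong rather than sit beside it.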



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-nested-switch-urgent-tp4075277.html


Error- missing sfield for spatial request

2013-06-28 Thread Learner
I am trying to combine geospatial query (latlong) with the below query inside
a search component but I am getting the below error..

Error:

<lst name="error">
  <str name="msg">missing sfield for spatial request</str>
  <int name="code">400</int>
</lst>
</response>

<str name="fq_bbox">
  (
    _query_:"{!wp_optional df='addr_location_clean_i' qs=1 v=$fps_where}"^6.2 OR
    _query_:"{!wp_optional df='addr_location_i' qs=1 v=$fps_where}"^6.2 OR
    _query_:"{!geofilt}"&amp;sfield=latlong&amp;pt=$fps_latlong&amp;d=50&amp;sort=geodist() asc
  )
</str>

I initially thought that it might be an issue with latlong field, but this
query works fine.

*:*&fq=_query_:%22{!geofilt}%22&sfield=latlong&pt=47.601234,-122.330466&d=50&sort=geodist()%20asc


Can someone let me know how to form the geospatial (LatLonType) query in
search component?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-missing-sfield-for-spatial-request-tp4073940.html


solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
I was using ConcurrentUpdateSolrServer for indexing documents to Solr. Later
I needed portable indexing, so I started using the embedded Solr server.

I created a multithreaded program that creates and submits documents in
batches of 100 to the embedded Solr server (running inside the SolrJ
indexing process), but for some reason it takes more time to index the data
than ConcurrentUpdateSolrServer (CUSS) does. I assumed the embedded server
would take less time than the HTTP updates made when using CUSS, so I am
not sure why it takes more time.

Is there a way to speed up indexing when using the embedded Solr server
(something like specifying the thread count and queue size, as with CUSS)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636.html


Re: solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
Shawn,

Thanks a lot for your reply.

I have pasted my entire code below; it would be great if you could let me
know if I am doing anything wrong in terms of running the code in a
multithreaded environment.

http://pastebin.com/WRLn3yWn



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636p4073711.html


Re: Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread Learner
This might not be useful, but a workaround would be to divide all scores by
the max score to get scores between 0 and 1.
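
A minimal sketch of that workaround, assuming the scores of one result page
are already available as a float array on the client. ScoreNormalizer is a
hypothetical name; the values are only comparable within a single query, not
across queries.

```java
import java.util.Arrays;

// Hypothetical client-side helper; not part of Solr or SolrJ.
class ScoreNormalizer {

    // Divides every score by the largest one and returns a new array.
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out; // nothing to scale: leave everything at 0
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1.0, 0.5, 0.25]
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f})));
    }
}
```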



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797p4073829.html


Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Learner
Thanks a lot for your response. I created a multithreaded program that
creates and submits documents in batches of 100 to the embedded SOLR server,
but for some reason it takes more time to index the data than
ConcurrentUpdateSolrServer does. I assumed the embedded server would take
less time than HTTP calls, so I am not sure why it takes more time.

Is there a way to speed up the indexing by increasing the queue size etc.
(something similar to ConcurrentUpdateSolrServer)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383p4073509.html


Question on forming query when using switch parser plugin?

2013-06-27 Thread Learner
Hi,

I currently have a query as below. I use the fq only if the latlong value is
not empty (using the switch plugin); otherwise I don't use fq at all.

Whenever the latlong value is empty, I just use the value of the $where
parameter (in q) to return results based on location.

Now, whenever the latlong value is available, I need to use both $where and
the results returned by $latlong (geospatial search). Currently the results
are first filtered by 'q' and then passed to fq, so the results returned are
always a subset of those returned by 'q'. I need 'q' to boost the score of
the documents.

Can someone let me know how to return the values corresponding to fq
(without getting filtered by 'q')?

Example:

If I search for a place like Charlotte, NC (by passing the latitude and
longitude with a distance of 20 miles), I get only the results belonging to
Charlotte, NC when I use the below query. I need to return all the results
based on distance. If I don't pass the latitude and longitude but just pass
Charlotte, the geospatial function won't kick in, so the results will be
based only on the $where value in 'q'.

<lst name="defaults">
  <str name="q">
    (
      _query_:"{!cust1 qf=person_name_lname_i v=$lname}"^8.3 OR
      _query_:"{!cust1 qf=person_name_lname_phonetic_i v=$lname}"^8.6
    )
    (
      _query_:"{!cust df='addr_location_clean_i' qs=1 v=$where}"^6.2 OR
      _query_:"{!cust df='addr_location_i' qs=1 v=$where}"^6.2
    )
  </str>
</lst>

<lst name="appends">
  <str name="fq">{!switch case='*:*' default=$fq_bbox v=$latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">_query_:"{!bbox pt=$latlong sfield=geo d=$dist}"^0.2</str>
</lst>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-forming-query-when-using-switch-parser-plugin-tp4073847.html


Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-26 Thread Learner
I currently have a SOLRJ program which I am using for indexing data in
SOLR. I am trying to figure out a way to build the index without depending
on a running instance of SOLR. I should be able to supply solrconfig.xml
and schema.xml to the indexing program, which in turn creates index files
that I can use with any SOLR instance. Is it possible to implement this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383.html


SOLR online reference document - WIKI

2013-06-25 Thread Learner
I just came across a wonderful online reference wiki for SOLR and thought of
sharing it with the community..

https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html


Re: Name of the couple of popular app/web sites using solar as search engine

2013-06-25 Thread Learner
Check the list here..

http://wiki.apache.org/solr/PublicServers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Name-of-the-couple-of-popular-app-web-sites-using-solar-as-search-engine-tp4073157p4073160.html


Re: Varnish

2013-06-25 Thread Learner
Check this link..
http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Varnish-tp4072057p4073205.html


SOLRJ commit question - SOLR 4.3.1

2013-06-24 Thread Learner
Hi,

I am adding around 100 million records to SOLR using SOLRJ. I do not commit
until all the documents have been added. My program adds the docs very fast
(1 million per minute) for roughly the first 18 million documents, which is
the expected result, but after 18 million records the indexing time keeps
climbing higher and higher.

I am using ConcurrentUpdateSolrServer for adding the docs, with
queueSize=200 and a thread count of 8. I add documents in batches of 250
(I add the document array once its size reaches 250).

My system configuration:
24 cores
32GB RAM
110 GB HD

Am I doing anything wrong in terms of specifying the queue size / document
batch size?

Can someone please help me fix this index issue?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-commit-question-SOLR-4-3-1-tp4072868.html


Restarting SOLR will remove all cache?

2013-06-21 Thread Learner
I have a very simple question. Does restarting SOLR remove all caches
(including disk caches, if any)?

I have disabled all caches in solrconfig.xml, but even then I see that some
caching is happening all the time.

I am currently doing some performance testing and I don't want caching to
play any role for now.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restarting-SOLR-will-remove-all-cache-tp4072200.html


Re: SolrCloud - Score calculation

2013-06-20 Thread Learner
Thanks for your response. 

So in the case of SolrCloud, SOLR/ZooKeeper takes care of managing the
indexing and searching. In that case I assume most of the shards will be of
roughly equal size (I am just going to push the data to a leader), so IDF
won't be a big issue since the shard sizes are almost equal. Am I right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Score-calculation-tp4071805p4071900.html


fq vs q parameter

2013-06-19 Thread Learner
Hi,

I am currently using the below configuration in one of my handlers and I was
thinking of removing the values from the q parameter and including them as
part of the fq parameter.

Can someone let me know if there is any performance improvement when using
fq parameter compared to q? 

  <str name="q">
    (
      _query_:"{!dismax qf=person_name_lname_i v=$fps_lname}"^8.3 OR
    )
  </str>
</lst>
<lst name="appends">
  <str name="fq">{!switch case='*:*' default=$fq_bbox v=$fps_latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">_query_:"{!bbox pt=$fps_latlong sfield=geo d=$fps_dist}"^0.2</str>
</lst>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-vs-q-parameter-tp4071748.html


SolrCloud - Score calculation

2013-06-19 Thread Learner
Hi,

Sorry if it's a very basic question, but I am pretty new to SolrCloud and I
am trying to understand the underlying mechanism for calculating relevancy.

Currently we are using SOLR 3.6.X and we use shards to perform distributed
searching. Our shards are not of equal size, so sometimes the results are
not what we expect.

For example: shard 1 has 30 million documents, shard 2 has 30 million
documents and shard 3 has just 3 million documents (push indexing via
message queue).

When we do a search using shards, documents from shard 1 and shard 2 get
higher priority than documents in shard 3 (since it's smaller). Currently
we add an index-time boost when adding documents to shard 3 so that the
documents from shard 3 also come up (higher) in search results.

Now when using SolrCloud, say one shard has a person name repeated 5 times
(with different unique ids) and shard 2 has one more document with the same
person name (with a different id). When we do a search, how does SOLR
calculate the score? Does it do something like constant scoring across
shards in order to rank the search results from the various shards? How
does the score get calculated? Do all 6 documents (5 from shard 1 and 1
from shard 2) get the same score if all the fields have the same value
except for the unique id?

Thanks,
BB 
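
A worked illustration of why shard size and local term counts matter,
assuming the classic Lucene TF-IDF formula idf = 1 + ln(numDocs / (docFreq +
1)) and assuming each shard scores with its own local statistics. ShardIdfSketch
is a hypothetical name, and the document counts are taken from the example
above.

```java
// Hypothetical illustration; the formula is the classic Lucene TF-IDF idf.
class ShardIdfSketch {

    // Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Same name: 5 matches in a 30M-document shard, 1 match in a 3M-document shard.
        System.out.println("large shard idf: " + idf(5, 30_000_000L));
        System.out.println("small shard idf: " + idf(1, 3_000_000L));
        // With statistics pooled across both shards, every match gets one idf.
        System.out.println("pooled idf:      " + idf(6, 33_000_000L));
    }
}
```

Under these assumptions the two shards produce different idf values for the
same term, so otherwise identical documents would not get identical scores
unless the statistics are pooled.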



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Score-calculation-tp4071805.html


ConcurrentUpdateSolrserver - Queue size not working

2013-06-18 Thread Learner
I am using ConcurrentUpdateSolrServer to create 4 threads (threadCount=4)
with a queueSize of 3.

Indexing works as expected.

My issue is that documents are being added to the server even before the
queue reaches its size limit. Am I doing anything wrong? Or is queueSize
not implemented yet?

Also, I don't see a big performance improvement when I increase or decrease
the number of threads. Can someone let me know the best way to improve
indexing performance when using ConcurrentUpdateSolrServer?

FYI, I am running this program on a 4-core machine.

Sample snippet:

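ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer(solrServer, 3, 4); // URL, queueSize, threadCount
try {
  while ((line = bReader.readLine()) != null) {
    inputDocument = line.split("\t");
    // ... do some processing to build doc ...
    server.add(doc);
  }
} catch (IOException | SolrServerException e) {
  e.printStackTrace();
}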
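
As far as I understand it, queueSize is the capacity of an internal buffer
rather than a batch threshold, which would explain why documents are sent
before the queue fills. The sketch below is an analogy built only on
java.util.concurrent, not the ConcurrentUpdateSolrServer code;
BoundedQueueSketch and run are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy for a bounded producer/consumer queue.
class BoundedQueueSketch {

    // Pushes `total` items through a queue of the given capacity and
    // returns how many items the consumer thread received.
    static int run(int capacity, int total) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger consumed = new AtomicInteger();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    queue.take(); // takes an item as soon as one is available
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < total; i++) {
                queue.put(i); // blocks only while the queue is full
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(3, 10)); // prints 10
    }
}
```

Here the capacity of 3 only limits how far the producer can run ahead of the
consumer; nothing waits for the queue to fill before draining it.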




--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrserver-Queue-size-not-working-tp4071408.html


Re: preserve special characters

2013-06-18 Thread Learner
You can use keyword tokenizer..

Creates org.apache.lucene.analysis.core.KeywordTokenizer.

Treats the entire field as a single token, regardless of its content.

Example: "http://example.com/I-am+example?Text=-Hello" ==>
"http://example.com/I-am+example?Text=-Hello"



--
View this message in context: 
http://lucene.472066.n3.nabble.com/preserve-special-characters-tp4071488p4071496.html


Merge tool based on mergefactor

2013-06-18 Thread Learner
We have a SOLR master, primarily for indexing, and a SOLR slave, primarily
for searching. I see that the merge factor plays a key role in indexing as
well as searching. I would like to have a high merge factor for my master
instance and a low merge factor for the slave.

As of now, since I just replicate the data from the master, the number of
segments on the slave remains the same as on the master. Maybe it already
exists, but it would be great to have a tool that merges the segments on the
slave after replication, based on the merge factor specified in the slave's
config. Is there any workaround to change the number of segments without
doing a complete reindex?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merge-tool-based-on-mergefactor-tp4071506.html