Re: Solr Segments, Segment Merges, Optimize

2014-02-23 Thread KNitin
Commit Parameters: Server does an auto commit every 30 seconds with open_searcher=false. The pipeline does a hard commit only at the very end of its run. The high CPU issue I am seeing is only during the reads and not during the writes. Right now I see a direct correlation between latencies and #

Fwd: configuration for heavy system

2014-02-23 Thread Harish Reddy
Hi, We are testing Solr. We have a document with some 100 indexes and there are around 10 million records. It is failing, either stuck or timed out on query. Is this indexing job possible with Solr? If yes, what should the hardware and Solr configuration be, and how many nodes would be optimum? Now I am

Re: configuration for heavy system

2014-02-23 Thread Erick Erickson
You haven't told us anything about _how_ you're trying to index this document nor what its format is. Nor what 100 indexes and around 10 million records means. 1B total records? 10M total records? Solr easily handles 10s of M records on a single decent-size node; I've seen between 50M and

DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-23 Thread Gregg Donovan
In most of our Solr use-cases, we fetch only fl=uniqueKey or fl=uniqueKey,another_int_field. I'd like to be able to do a distributed search and skip STAGE_GET_FIELDS -- i.e. the stage where each shard is queried for the documents found for the top ids -- as it seems like we could be collecting

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-23 Thread Shalin Shekhar Mangar
What a coincidence - I was about to commit a patch which makes it possible. It will be released with 4.8. See https://issues.apache.org/jira/browse/SOLR-1880 On Sun, Feb 23, 2014 at 11:27 PM, Gregg Donovan gregg...@gmail.com wrote: In most of our Solr use-cases, we fetch only fl=uniqueKey or

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-23 Thread Shalin Shekhar Mangar
I should clarify, though, that this optimization only works with fl=id,score. On Sun, Feb 23, 2014 at 11:34 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: What a coincidence - I was about to commit a patch which makes it possible. It will be released with 4.8. See

Re: DistributedSearch: Skipping STAGE_GET_FIELDS?

2014-02-23 Thread Yonik Seeley
On Sun, Feb 23, 2014 at 1:08 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: I should clarify, though, that this optimization only works with fl=id,score. Although it seems like it should be relatively simple to make it work with other fields as well, by passing down the complete fl

Re: Solr Segments, Segment Merges, Optimize

2014-02-23 Thread KNitin
I should also mention that apart from committing, the pipeline also does a bunch of deletes for stale documents (based on a custom version field). The # of deletes can be very significant, causing the % of deleted documents to easily reach 40-50% of the index itself. Thanks KNitin On Sun, Feb 23,

Re: Wikipedia Data Cleaning at Solr

2014-02-23 Thread Furkan KAMACI
I've compared the results when using WikipediaTokenizer in the index-time analyzer, but there is no difference? 2014-02-23 3:44 GMT+02:00 Ahmet Arslan iori...@yahoo.com: Hi Furkan, There is org.apache.lucene.analysis.wikipedia.WikipediaTokenizer Ahmet On Sunday, February 23, 2014 2:22 AM,

Issue with PHP urlencode and solr encoding

2014-02-23 Thread manju16832003
Hi, I came across an issue with URL encoding between PHP and Solr. I have a field indexed with value *WBE(Honda Edix)* in Solr. From PHP code, if I urlencode($string) and send it to Solr, I do not get accurate results. Here is part of the Solr query: *fq=model:WBE(Honda+Edix)* However, if I

Re: Issue with PHP urlencode and solr encoding

2014-02-23 Thread Shawn Heisey
On 2/23/2014 8:58 PM, manju16832003 wrote: I came across an issue with URL encoding between PHP and Solr. I have a field indexed with value *WBE(Honda Edix)* in Solr. From PHP code, if I urlencode($string) and send it to Solr, I do not get accurate results. Here is part of the Solr

Re: Issue with PHP urlencode and solr encoding

2014-02-23 Thread Rico P
On Mon, Feb 24, 2014 at 11:52 AM, Shawn Heisey s...@elyograg.org wrote: The Solarium library for PHP also says that it does escaping, but I can't find the manual section that they mention about term escaping. Here's a section that has an example of phrase escaping (putting the value in

Re: Issue with PHP urlencode and solr encoding

2014-02-23 Thread manju16832003
Hi Shawn and Rico, Thank you for your suggestions; they are valuable :-). If a phrase query sometimes does not work as expected, I guess we could use *TermQuery* instead. http://blog.florian-hopf.de/2013/01/make-your-filters-match-faceting-in-solr.html This worked fine
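
For readers following this thread, here is a minimal sketch of the TermQuery approach mentioned above, written in Python rather than PHP. It assumes the `model` field is an untokenized string field, so that Solr's `{!term f=...}` query parser can match the stored value literally without parentheses being read as query syntax. The helper name `term_filter` and the `q=*:*` main query are illustrative assumptions, not taken from the original messages.

```python
from urllib.parse import parse_qs, urlencode


def term_filter(field: str, value: str) -> str:
    """Build a filter query for Solr's term query parser.

    The value is passed through as-is: the term parser does not apply
    query-syntax parsing, so characters such as '(' and ')' need no
    backslash escaping here.
    """
    return "{!term f=" + field + "}" + value


# Hypothetical request parameters; only the fq value comes from the thread.
params = {
    "q": "*:*",
    "fq": term_filter("model", "WBE(Honda Edix)"),
}

# urlencode percent-encodes the braces, parentheses and '=' and turns the
# space into '+', so the whole fq value reaches Solr as one parameter.
query_string = urlencode(params)
print(query_string)

# Round trip: decoding the query string yields the original fq value.
decoded = parse_qs(query_string)
print(decoded["fq"][0])
```

The point of the sketch is that two separate layers are involved: URL encoding protects the value in transit, while the choice of query parser decides whether Solr interprets the parentheses as grouping syntax. With the default parser, the value would instead need to be quoted as a phrase or backslash-escaped before URL encoding.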

Cannot index raw binary data stored in a database in BLOB format

2014-02-23 Thread Chandan khatua
Hi, We have raw binary data stored in a database (not Word, Excel, XML, etc. files) in a BLOB. We are trying to index it using TikaEntityProcessor, but nothing seems to get indexed. The same configuration works when XML/Word/Excel files are stored in the BLOB field. Below is our data-config.xml: