Re: Indexing tweet and searching @keyword OR #keyword

2011-08-10 Thread Mohammad Shariq
I tried tweaking WordDelimiterFilterFactory, but it won't accept # or @ symbols;
they are ignored entirely.
I need a solution, please suggest one.

On 4 August 2011 21:08, Jonathan Rochkind rochk...@jhu.edu wrote:

 It's the WordDelimiterFactory in your filter chain that's removing the
 punctuation entirely from your index, I think.

 Read up on what the WordDelimiter filter does and what its settings are;
 decide how you want things to be tokenized in your index to get the behavior
 you want; either get WordDelimiter to do it that way by passing it
 different arguments, or stop using WordDelimiter; come back with any
 questions after trying that!



 On 8/4/2011 11:22 AM, Mohammad Shariq wrote:

 I have indexed around 1 million tweets (using the text fieldType).
 When I search tweets containing # or @, I don't get exact results.
 E.g., when I search for #ipad or @ipad, I get results where ipad is
 mentioned, with the # and @ ignored.
 Please suggest how to tune this, or which filter factories to use, to get
 the desired result.
 I am indexing tweets as text; below is the text fieldType from my
 schema.xml.


 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
             minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             protected="protwords.txt" language="English"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
             minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             protected="protwords.txt" language="English"/>
   </analyzer>
 </fieldType>




-- 
Thanks and Regards
Mohammad Shariq


Re: Indexing tweet and searching @keyword OR #keyword

2011-08-10 Thread Mohammad Shariq
Do you really want a search on ipad to *fail* to match input of #ipad? Or
vice-versa?
My requirement is: I want q='ipad' to match both '#ipad' and 'ipad',
BUT q='#ipad' to match ONLY '#ipad', excluding 'ipad'.


On 10 August 2011 19:49, Erick Erickson erickerick...@gmail.com wrote:

 Please look more carefully at the documentation for WDDF,
 specifically:

 split on intra-word delimiters (all non alpha-numeric characters).

 WordDelimiterFilterFactory will always throw away non alpha-numeric
 characters; you can't tell it to do otherwise. Try some of the other
 tokenizers/analyzers to get what you want, and also look at the
 admin/analysis page to see what the exact effects are of your
 fieldType definitions.

 Here's a great place to start:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 You probably want something like WhitespaceTokenizerFactory
 followed by LowerCaseFilterFactory or some such...

 But I really question whether this is what you want either. Do you
 really want a search on ipad to *fail* to match input of #ipad? Or
 vice-versa?

 KeywordTokenizerFactory is probably not the place you want to start:
 the tokenization process doesn't break anything up; you happen to be
 getting separate tokens because of WDDF, which, as you can see, can't
 process things the way you want.


 Best
 Erick

 On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq shariqn...@gmail.com
 wrote:
  I tried tweaking WordDelimiterFilterFactory, but it won't accept # or @
  symbols; they are ignored entirely.
  I need a solution, please suggest one.
 
  On 4 August 2011 21:08, Jonathan Rochkind rochk...@jhu.edu wrote:
 
  It's the WordDelimiterFactory in your filter chain that's removing the
  punctuation entirely from your index, I think.
 
  Read up on what the WordDelimiter filter does and what its settings are;
  decide how you want things to be tokenized in your index to get the
  behavior you want; either get WordDelimiter to do it that way by passing
  it different arguments, or stop using WordDelimiter; come back with any
  questions after trying that!
 
 
 
  On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
 
  I have indexed around 1 million tweets (using the text fieldType).
  When I search tweets containing # or @, I don't get exact results.
  E.g., when I search for #ipad or @ipad, I get results where ipad is
  mentioned, with the # and @ ignored.
  Please suggest how to tune this, or which filter factories to use, to
  get the desired result.
  I am indexing tweets as text; below is the text fieldType from my
  schema.xml.
 
 
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
              minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"
              protected="protwords.txt" language="English"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
              minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"
              protected="protwords.txt" language="English"/>
    </analyzer>
  </fieldType>
 
 
 
 
  --
  Thanks and Regards
  Mohammad Shariq
 




-- 
Thanks and Regards
Mohammad Shariq
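[Editor's note] Building on Erick's advice, one way to keep # and @ attached to terms is a whitespace-based analyzer. Below is a sketch, not a drop-in config: it relies on WordDelimiterFilterFactory's optional `types` attribute (available in newer Solr releases) pointing at a hypothetical `wdftypes.txt` that maps `#` and `@` to ALPHA so they are treated as word characters.

```xml
<!-- Sketch: a fieldType that preserves # and @ in tokens.
     Assumes a wdftypes.txt with lines like (check escaping rules,
     since # normally starts a comment in that file):
       \# => ALPHA
       @ => ALPHA -->
<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0"
            splitOnCaseChange="0" types="wdftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With # indexed as part of the token, q=#ipad matches only #ipad; making q=ipad also match #ipad would then need a second field (e.g. via copyField) analyzed without the mapping.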


Indexing tweet and searching @keyword OR #keyword

2011-08-04 Thread Mohammad Shariq
I have indexed around 1 million tweets (using the text fieldType).
When I search tweets containing # or @, I don't get exact results.
E.g., when I search for #ipad or @ipad, I get results where ipad is
mentioned, with the # and @ ignored.
Please suggest how to tune this, or which filter factories to use, to get
the desired result.
I am indexing tweets as text; below is the text fieldType from my
schema.xml.


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            protected="protwords.txt" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            protected="protwords.txt" language="English"/>
  </analyzer>
</fieldType>

-- 
Thanks and Regards
Mohammad Shariq


Delete by range query

2011-07-27 Thread Mohammad Shariq
Hi,
I want to delete a bunch of docs from my Solr index using a range query.
I have one field called 'time' which is a tint.

I am deleting using the query:
<delete><query>time:[1296777600+TO+1296778000]</query></delete>

but Solr is returning an error saying Bad Request.
However, I am able to delete one by one using the following delete query:
<delete><query>time:1296777600</query></delete>

Please suggest any solution to this problem.

-- 
Thanks and Regards
Mohammad Shariq


Re: Delete by range query

2011-07-27 Thread Mohammad Shariq
Thanks Koji
Its working now.


On 27 July 2011 19:30, Koji Sekiguchi k...@r.email.ne.jp wrote:

 <delete><query>time:[1296777600+TO+1296778000]</query></delete>


 Should be <delete><query>time:[1296777600 TO 1296778000]</query></delete> ?

 koji
 --
 http://www.rondhuit.com/en/




-- 
Thanks and Regards
Mohammad Shariq
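[Editor's note] A small sketch of why Koji's fix works (assuming the standard Solr XML update format): '+' signs decode to spaces only in a URL query string; in a POSTed XML body they are taken literally, so the range query must contain a real space between the bounds.

```python
# Build a Solr delete-by-query body for a numeric range.
# Inside the XML body, '+' is NOT decoded to a space (that only happens
# in URL query strings), which is why the '+TO+' version got Bad Request.
def delete_by_range(field, start, end):
    """Return the XML body for a Solr delete-by-query over a numeric range."""
    return "<delete><query>%s:[%d TO %d]</query></delete>" % (field, start, end)

body = delete_by_range("time", 1296777600, 1296778000)
```

The resulting body can then be POSTed to the update handler with Content-Type text/xml.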


Re: How to find whether solr server is running or not

2011-07-19 Thread Mohammad Shariq
Check the HTTP response code; if it's anything other than 200, the services are
not OK.

On 19 July 2011 14:39, Romi romijain3...@gmail.com wrote:

 I am running an application that gets search results from a Solr server. But
 when the server is not running I get no response from it. Is there any
 way I can detect that my server is not running, so that I can give a proper
 error message about it?


 -
 Thanks  Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-find-whether-solr-server-is-running-or-not-tp3181870p3181870.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Mohammad Shariq
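[Editor's note] A minimal health-check sketch of the advice above. The URL assumes the standard /solr/admin/ping handler; adjust it for your deployment.

```python
# Check whether a Solr server is up by probing it over HTTP.
# Only a 200 response counts as "services are OK".
from urllib.request import urlopen
from urllib.error import URLError

def status_ok(code):
    """Per the advice above: anything other than 200 means not OK."""
    return code == 200

def solr_is_running(url="http://localhost:8983/solr/admin/ping", timeout=5):
    try:
        with urlopen(url, timeout=timeout) as resp:
            return status_ok(resp.getcode())
    except (URLError, OSError):
        # Connection refused or timeout: server down or unreachable.
        return False
```

The calling application can then show a proper "search is unavailable" message when this returns False.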


Re: Need Suggestion

2011-07-15 Thread Mohammad Shariq
Below are some things you can do about search latency:
1) Do bulk inserts.
2) Commit after every ~5000 docs.
3) Optimize once a day.
4) Use the fq parameter in search queries.

What heap size (JVM) are you using?





On 15 July 2011 17:44, Rohit ro...@in-rev.com wrote:

 I am facing some performance issues on my Solr installation (3-core server).
 I am indexing live Twitter data based on certain keywords; as you can
 imagine, the rate at which documents are received is very high, so updates
 to the cores are very frequent. Given below are the document
 counts on my three cores.



 Twitter  - 26874747

 Core2-  3027800

 Core3-  6074253



 My Server configuration has 8GB RAM, but now we are experiencing server
 performance drop. What can be done to improve this?  Also, I have a few
 questions.



 1.  Do frequent commits take a lot of memory? Will reducing the
 number of commits per hour help?
 2.  Most of my queries are field- or date-faceting based; how can I improve
 those?



 Regards,

 Rohit











-- 
Thanks and Regards
Mohammad Shariq
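[Editor's note] A sketch of points 1 and 2 above: send documents in bulk and commit once per batch instead of once per document. `solr_add` and `solr_commit` are hypothetical callables wrapping whatever HTTP client you use.

```python
# Index documents in batches, committing once per batch (~5000 docs)
# instead of per document, to cut commit overhead.
def index_in_batches(docs, solr_add, solr_commit, batch_size=5000):
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            solr_add(batch)   # one bulk add per batch_size docs
            solr_commit()     # commit per batch, not per document
            batch = []
    if batch:                 # flush any remainder
        solr_add(batch)
        solr_commit()
```

Daily optimization (point 3) would still be scheduled separately, e.g. from cron.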


how to do ExactMatch for PhraseQuery

2011-07-14 Thread Mohammad Shariq
I need exact matching for phrase queries.
When I search for the phrase "call it spring" I get results for:
1) It's spring
2) The spring

but my requirement is an exact match for the phrase query.
My search field is text.
Along with phrase queries I am running regular queries too.
How can I tune Solr to do exact matching for phrase queries without affecting
regular queries?

below is text field of schema.xml.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            maxShingleSize="3" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>



Thanks
Shariq
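[Editor's note] One common approach, sketched under assumptions rather than the only way: keep the analyzed text field for regular queries and copy the content into a second, minimally analyzed field used only for phrase queries. The names below (text_exact) are made up for illustration.

```xml
<!-- Hypothetical companion field: no stop words, no stemming, so the
     phrase "call it spring" only matches documents containing exactly
     that phrase. -->
<field name="text_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>

<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Phrase searches then go to the exact field (q=text_exact:"call it spring") while regular queries keep using the existing text field.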


Re: what s the optimum size of SOLR indexes

2011-07-04 Thread Mohammad Shariq
There are solutions for indexing huge data sets, e.g. SolrCloud,
ZooKeeper integration, multi-core, and multi-shard setups.
Depending on your requirements you can choose one or the other.


On 4 July 2011 17:21, Jame Vaalet jvaa...@capitaliq.com wrote:

 Hi,

 What would be the maximum size of a single SOLR index file that still results
 in optimum search time?
 In case I have to index all the documents in my repository (which is TBs in
 size), what would be the ideal architecture to follow? Distributed
 SOLR?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




-- 
Thanks and Regards
Mohammad Shariq


How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I am using Solr 1.4.
When I search for the keyword "ansys" I get lots of posts,
but when I search for "ansys NOT ansi" I get nothing.
I guess it's because of phonetic matching: "ansys" is converted into "ansi"
(which is the NOT keyword) and nothing returns.

How do I handle this kind of problem?

-- 
Thanks and Regards
Mohammad Shariq


Re: How to disable Phonetic search

2011-06-29 Thread Mohammad Shariq
I was using SnowballPorterFilterFactory for stemming, and that stemmer was
stemming the words.
I added the keyword "ansys" to the protwords.txt file.
Now stemming is not applied to "ansys" and it works fine.

On 29 June 2011 17:12, Ahmet Arslan iori...@yahoo.com wrote:

  I am using solr1.4
  When I search for keyword ansys I get lot of posts.
  but when I search for ansys NOT ansi I get nothing.
  I guess its because of Phonetic search, ansys is
  converted into ansi (
  that is NOT keyword) and nothing returns.
 
  How to handle this kind of problem.

 Find and remove occurrences of solr.PhoneticFilterFactory from your
 schema.xml file.




-- 
Thanks and Regards
Mohammad Shariq


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I also have the problem of duplicate docs.
I am indexing news articles; every news article has a source URL.
If two news articles have the same URL, only one should be indexed,
i.e. removal of duplicates at index time.



On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:

 have you checked out the deduplication process that's available at
 indexing time ? This includes a fuzzy hash algorithm .

 http://wiki.apache.org/solr/Deduplication

 -Simon

 On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote:
  This approach would definitely work if the two documents are *exactly* the
  same. But it is very fragile: even if one extra space has been added, the
  whole hash changes. What I am really looking for is some percentage
  similarity between documents, removing those documents which are more than
  95% similar.
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
  Google http://www.google.com/profiles/pranny
 
 
  On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
  What you need to do, is to calculate some HASH (using any message digest
  algorithm you want, md5, sha-1 and so on), then do some reading on solr
  field collapse capabilities. Should not be too complicated..
 
  *Omri Cohen*
 
 
 
  Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
 +972-3-6036295
 
 
 
 
 
 
 
  -- Forwarded message --
  From: Pranav Prakash pra...@gmail.com
  Date: Thu, Jun 23, 2011 at 12:26 PM
  Subject: Removing duplicate documents from search results
  To: solr-user@lucene.apache.org
 
 
  How can I remove very similar documents from search results?
 
  My scenario is that there are documents in the index which are almost
  identical (people submitting the same stuff multiple times, sometimes
  different people submitting the same stuff). Now when a search is performed
  for a keyword, quite frequently the same document comes up multiple times
  in the top N results. I want to remove those duplicate (or probable
  duplicate) documents, very similar to what Google does when it says "In
  order to show you the most relevant results, duplicates have been
  removed." How can I achieve this functionality using Solr? Does Solr have
  a built-in mechanism or plugin which could help me with it?
 
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com
  
  |
  Google http://www.google.com/profiles/pranny
 
 




-- 
Thanks and Regards
Mohammad Shariq
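[Editor's note] For index-time handling, the Deduplication wiki page linked above describes SignatureUpdateProcessorFactory. A hedged sketch of the solrconfig.xml chain follows; the field names here (signature, url) are illustrative, and note that overwriteDupes=true keeps one doc per signature by overwriting rather than strictly discarding the newcomer.

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- Hash only the source URL, per the news-article use case above -->
    <str name="fields">url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The update handler then needs to reference this chain so that adds pass through it.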


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
I am computing a hash from the URL, but I can't use it as the uniqueKey because
I am using a UUID as the uniqueKey.
Since I am using Solr as an index engine only, with Riak (key-value
storage) as the storage engine, I don't want to overwrite on duplicates;
I just need to discard the duplicates.



2011/6/28 François Schiettecatte fschietteca...@gmail.com

 Create a hash from the url and use that as the unique key, md5 or sha1
 would probably be good enough.

 Cheers

 François

 On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:

  I also have the problem of duplicate docs.
  I am indexing news articles, Every news article will have the source URL,
  If two news-article has the same URL, only one need to index,
  removal of duplicate at index time.
 
 
 
  On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
  have you checked out the deduplication process that's available at
  indexing time ? This includes a fuzzy hash algorithm .
 
  http://wiki.apache.org/solr/Deduplication
 
  -Simon
 
  On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com
 wrote:
  This approach would definitely work if the two documents are *exactly* the
  same. But it is very fragile: even if one extra space has been added, the
  whole hash changes. What I am really looking for is some percentage
  similarity between documents, removing those documents which are more than
  95% similar.
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
  http://blog.myblive.com |
  Google http://www.google.com/profiles/pranny
 
 
  On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
  What you need to do, is to calculate some HASH (using any message
 digest
  algorithm you want, md5, sha-1 and so on), then do some reading on
 solr
  field collapse capabilities. Should not be too complicated..
 
  *Omri Cohen*
 
 
 
  Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
  +972-3-6036295
 
 
 
 
 
 
 
  -- Forwarded message --
  From: Pranav Prakash pra...@gmail.com
  Date: Thu, Jun 23, 2011 at 12:26 PM
  Subject: Removing duplicate documents from search results
  To: solr-user@lucene.apache.org
 
 
  How can I remove very similar documents from search results?
 
  My scenario is that there are documents in the index which are almost
  similar (people submitting same stuff multiple times, sometimes
  different
  people submitting same stuff). Now when a search is performed for
  keyword,
  in the top N results, quite frequently, same document comes up
 multiple
  times. I want to remove those duplicate (or possible duplicate)
  documents.
  Very similar to what Google does when they say In order to show you
  most
  relevant result, duplicates have been removed. How can I achieve this
  functionality using Solr? Does Solr has an implied or plugin which
 could
  help me with it?
 
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
  http://blog.myblive.com
 
  |
  Google http://www.google.com/profiles/pranny
 
 
 
 
 
 
  --
  Thanks and Regards
  Mohammad Shariq




-- 
Thanks and Regards
Mohammad Shariq


Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
Hey François,
thanks for your suggestion. I followed the same link (
http://wiki.apache.org/solr/Deduplication).

They offer two solutions: either make the hash the uniqueKey, OR overwrite on
duplicate.
I don't need either.

I need discard-on-duplicate.



 I have not used it but it looks like it will do the trick.

 François

 On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:

  I found the deduplication thing really useful. Although I have not yet
  started to work on it, as there are some other low hanging fruits I've to
  capture. Will share my thoughts soon.
 
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
  Google http://www.google.com/profiles/pranny
 
 
  2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
  Maybe there is a way to get Solr to reject documents that already exist in
  the index, but I doubt it; maybe someone else can chime in here. You
  could do a search for each document prior to indexing it to see if it is
  already in the index, but that is probably non-optimal. Maybe it is easiest
  to check whether the document exists in your Riak repository: if not, add
  it and index it, and drop it if it already exists.
 
  François
 
  On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:
 
  I am making the Hash from URL, but I can't use this as UniqueKey
 because
  I
  am using UUID as UniqueKey,
  Since I am using SOLR as  index engine Only and using Riak(key-value
  storage) as storage engine, I dont want to do the overwrite on
 duplicate.
  I just need to discard the duplicates.
 
 
 
  2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
  Create a hash from the url and use that as the unique key, md5 or sha1
  would probably be good enough.
 
  Cheers
 
  François
 
  On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
 
  I also have the problem of duplicate docs.
  I am indexing news articles, Every news article will have the source
  URL,
  If two news-article has the same URL, only one need to index,
  removal of duplicate at index time.
 
 
 
  On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
  have you checked out the deduplication process that's available at
  indexing time ? This includes a fuzzy hash algorithm .
 
  http://wiki.apache.org/solr/Deduplication
 
  -Simon
 
  On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com
  wrote:
  This approach would definitely work if the two documents are *exactly* the
  same. But it is very fragile: even if one extra space has been added, the
  whole hash changes. What I am really looking for is some percentage
  similarity between documents, removing those documents which are more than
  95% similar.
 
  *Pranav Prakash*
 
  temet nosce
 
  Twitter http://twitter.com/pranavprakash | Blog 
  http://blog.myblive.com |
  Google http://www.google.com/profiles/pranny
 
 
  On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
  What you need to do, is to calculate some HASH (using any message
  digest
  algorithm you want, md5, sha-1 and so on), then do some reading on
  solr
  field collapse capabilities. Should not be too complicated..
 
  *Omri Cohen*
 
 
 
  Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
  +972-3-6036295
 
 
 
 
 
 
 
  -- Forwarded message --
  From: Pranav Prakash pra...@gmail.com
  Date: Thu, Jun 23, 2011 at 12:26 PM
  Subject: Removing duplicate documents from search results
  To: solr-user@lucene.apache.org
 
 
  How can I remove very similar documents from search results?
 
  My scenario is that there are documents in the index which are
  almost
  similar (people submitting same stuff multiple times, sometimes
  different
  people submitting same stuff). Now when a search is performed for
  keyword,
  in the top N results, quite frequently, same document comes up
  multiple
  times. I want to remove those duplicate (or possible duplicate)
  documents.
  Very similar to what Google does when they say In order to show
 you
  most
  relevant result, duplicates have been removed. How can I achieve
  this
  functionality using Solr? Does

Solr PhraseSearch and ExactMatch

2011-06-27 Thread Mohammad Shariq
Hello,
I am using Solr 1.4 on Ubuntu 10.10.
I have a requirement to do exact matching for phrase search.
I tried googling but did not find an exact solution.

I am doing the search on the 'text' field.
If I send the search query:
http://localhost:8983/solr/select/?q="the search agency"

it applies the stop-word filter and removes the word 'the' from the query, but
my requirement is an exact match.
Please suggest the right solution to this problem.

-- 
Thanks and Regards
Mohammad Shariq


Re: Solr PhraseSearch and ExactMatch

2011-06-27 Thread Mohammad Shariq
I could use 'string' instead of 'text' for exact matching, but I need an exact
match only for phrase searches.

On 27 June 2011 16:29, Gora Mohanty g...@mimirtech.com wrote:

 On Mon, Jun 27, 2011 at 3:42 PM, Mohammad Shariq shariqn...@gmail.com
 wrote:
  Hello,
  I am using Solr 1.4 on Ubuntu 10.10.
  I have a requirement to do exact matching for phrase search.
  I tried googling but did not find an exact solution.
 
  I am doing the search on the 'text' field.
  If I send the search query:
  http://localhost:8983/solr/select/?q="the search agency"
 
  it applies the stop-word filter and removes the word 'the' from the query,
  but my requirement is an exact match.
  Please suggest the right solution to this problem.
 [...]

 Use a string field, which does not have any analyzers or tokenizers.

 Regards,
 Gora




-- 
Thanks and Regards
Mohammad Shariq


Re: Analyzer creates PhraseQuery

2011-06-27 Thread Mohammad Shariq
I guess 'to' may be listed in 'stopWords' .

On 28 June 2011 08:27, entdeveloper cameron.develo...@gmail.com wrote:

 I have an analyzer set up in my schema like so:

 <analyzer>
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
 </analyzer>

 What's happening is that if I index a term like "toys and dolls" and search
 for "to", I get no matches. The debug output in Solr gives me:

 <str name="rawquerystring">to</str>
 <str name="querystring">to</str>
 <str name="parsedquery">PhraseQuery(autocomplete:"t o to")</str>
 <str name="parsedquery_toString">autocomplete:"t o to"</str>

 Which means it looks like the lucene query parser is turning it into a
 PhraseQuery for some reason. The explain seems to confirm that this
 PhraseQuery is what's causing my document to not match:

 0.0 = (NON-MATCH) weight(autocomplete:"t o to" in 82), product of:
   1.0 = queryWeight(autocomplete:"t o to"), product of:
     6.684934 = idf(autocomplete: t=60 o=68 to=14)
     0.1495901 = queryNorm
   0.0 = fieldWeight(autocomplete:"t o to" in 82), product of:
     0.0 = tf(phraseFreq=0.0)
     6.684934 = idf(autocomplete: t=60 o=68 to=14)
     0.1875 = fieldNorm(field=autocomplete, doc=82)

 But why? This seems like it should match to me, and indeed the Solr
 analysis
 tool highlights the matches (see image), so something isn't lining up
 right.


 http://lucene.472066.n3.nabble.com/file/n3116288/Screen_shot_2011-06-27_at_7.55.49_PM.png

 In case you're wondering, I'm trying to implement a semi-advanced
 autocomplete feature that goes beyond using what a simple EdgeNGram
 analyzer
 could do.


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Analyzer-creates-PhraseQuery-tp3116288p3116288.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Mohammad Shariq


Re: how to index data in solr form database automatically

2011-06-24 Thread Mohammad Shariq
First write a script in Python (or Java, PHP, or any language) that reads
the data from the database and indexes it into Solr.

Then set up this script as a cron job to run automatically at a certain
interval.





On 24 June 2011 17:23, Romi romijain3...@gmail.com wrote:

 Would you please tell me how I can use cron to auto-index my database
 tables in Solr?

 -
 Thanks  Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-index-data-in-solr-form-database-automatically-tp3102893p3103768.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Mohammad Shariq
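[Editor's note] A sketch of the cron setup described above. The script path and schedule are hypothetical; the script itself would read new rows from the database and post them to Solr.

```shell
# Hypothetical layout: /opt/scripts/index_db_to_solr.py reads new rows
# from the database and posts them to Solr's update handler.
# Edit the crontab (crontab -e) and add a line to run it every 15 minutes:
*/15 * * * * /usr/bin/python /opt/scripts/index_db_to_solr.py >> /var/log/solr_index.log 2>&1
```

Redirecting stdout and stderr to a log file makes failed indexing runs visible.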


Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
I am running two Solr shards. I have indexed 100 million docs in each shard
(each is 50 GB, and only 'id' is stored).
My searches have become very slow, taking around 2-3 seconds.
Below is my query:

http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc

QUERY and FilterQuery are as follows:

QUERY = "Online Shopping" AND ( Amex OR "Am ex" OR "American express" OR
americanexpress )
FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND
lang:English

How can I boost query performance?
The default search field is title (text).


-- 
Thanks and Regards
Mohammad Shariq


Re: Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
This is how my 'time' field looks in the schema:
<field name="time" type="tint" indexed="true" stored="false"/>

Also, I am updating Solr frequently (every 5 minutes).


On 22 June 2011 18:41, Ahmet Arslan iori...@yahoo.com wrote:

  I am running two solrShards. I have
  indexed 100 million docs in each shard (
  each are 50 GB and only 'id' is stored).
  My search have became very slow. Its taking around 2-3
  seconds.
  below is my query :
 
 
  http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=
  QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time
  desc
 
  QUERY and FilterQuery is below :
 
  QUERY = Online Shopping AND ( Amex OR Am ex OR American
  express OR
  americanexpress )
  FilterQuery = time:[1308659371 TO 1308745771] AND
  category:news AND
  lang:English
 
  How to boost the query perfomance.
  default search filed is title( text).

 If the fieldType of 'time' is not trie-based, you can change it to tdate, tint,
 etc. for range queries.

 If you don't update your index frequently, you can use separate filter
 queries (fq) for your clauses, to benefit from caching:
 fq=category:news&fq=lang:English

 http://wiki.apache.org/solr/SolrPerformanceFactors
 http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
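Applying that suggestion, the query from the first mail could be rebuilt with the stable clauses as separate fq parameters (a sketch only; hosts and values are taken from the mail above, and the phrase quoting in q is my assumption since the original lost its quote marks). Note the frequently-changing time range still makes a poor cache candidate:

```python
from urllib.parse import urlencode

# A list of pairs lets us repeat the fq parameter, which urlencode supports.
params = [
    ("shards", "solrHost1:8080/solr,solrHost2:8080/solr"),
    ("q", 'Online Shopping AND (Amex OR "Am ex" OR "American express" OR americanexpress)'),
    # Separate fq clauses: each doc-set is cached independently in the filterCache.
    ("fq", "time:[1308659371 TO 1308745771]"),
    ("fq", "category:news"),
    ("fq", "lang:English"),
    ("fl", "id"),
    ("start", "0"),
    ("rows", "100"),
    ("sort", "time desc"),
]
url = "http://solrHost1:8080/solr/select?" + urlencode(params)
print(url)
```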




-- 
Thanks and Regards
Mohammad Shariq


Re: fq vs adding to query

2011-06-19 Thread Mohammad Shariq
fq is for filter queries: filtering on category, timestamp, language, etc. But I
don't see any performance improvement if I use a 'keyword' in fq.

Use cases:
fq=lang:English&q=camera AND digital
OR
fq=time:[13023567 TO 13023900]&q=camera AND digital


On 19 June 2011 20:17, Jamie Johnson jej2...@gmail.com wrote:

 Are there any hard and fast rules about when touse fq vs adding to the
 query?  For instance if I started with a search of
 camera

 then wanted to add another keyword say digital, is it better to do

 q=camera AND digital

 or

  q=camera&fq=digital

 I know that fq isn't taken into account when doing highlighting, so what I
 am currently doing is when there are facet based queries I am doing fqs but
 everything else is being added to the query, so in the case above I would
 have done q=camera AND digital.  If however there was a field called
 category with values standard or digital I would have done
  q=camera&fq=category:digital.  Any guidance would be appreciated.
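As a toy illustration of the trade-off being discussed (plain Python, not Solr internals): the set of documents matching each fq clause can be cached once and intersected with any later query, whereas a clause folded into q is re-evaluated and scored each time. Scoring and highlighting are omitted here:

```python
docs = {
    1: "canon digital camera",
    2: "standard film camera",
    3: "digital photo frame",
}

filter_cache = {}  # mimics Solr's filterCache, keyed by the filter string

def matching(term):
    """Doc-id set for a term; computed once, then served from the cache."""
    if term not in filter_cache:
        filter_cache[term] = {i for i, text in docs.items() if term in text}
    return filter_cache[term]

def search(q_term, fq_term):
    """q would normally be scored; the fq just intersects a cached doc-set."""
    return sorted(matching(q_term) & matching(fq_term))

print(search("camera", "digital"))  # -> [1]
print(search("photo", "digital"))   # reuses the cached "digital" set -> [3]
```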




-- 
Thanks and Regards
Mohammad Shariq


Re: Weird optimize performance degradation

2011-06-19 Thread Mohammad Shariq
I also have a Solr index with around 100mn docs.
I optimize once a week, and it takes around 1 hour 30 minutes to
optimize.


On 19 June 2011 20:02, Santiago Bazerque sbazer...@gmail.com wrote:

 Hello Erick, thanks for your answer!

 Yes, our over-optimization is mainly due to paranoia over these strange
 commit times. The long optimize time persisted in all the subsequent
 commits, and this is consistent with what we are seeing in other production
 indexes that have the same problem. Once the anomaly shows up, it never
 commits quickly again.

 I combed through the last 50k documents that were added before the first
 slow commit. I found one with a larger than usual number of fields (didn't
 write down the number, but it was a few thousands).

 I deleted it, and the following optimize was normal again (110 seconds). So
 I'm pretty sure a document with lots of fields is the cause of the
 slowdown.

 If that would be useful, I can do some further testing to confirm this
 hypothesis and send the document to the list.

 Thanks again for your answer.

 Best,
 Santiago

 On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  First, there's absolutely no reason to optimize this often, if at all.
  Older
  versions of Lucene would search faster on an optimized index, but
  this is no longer necessary. Optimize will reclaim data from
  deleted documents, but is generally recommended to be performed
  fairly rarely, often at off-peak hours.
 
  Note that optimize will re-write your entire index into a single new
  segment,
  so following your pattern it'll take longer and longer each time.
 
  But the speed change happening at 500,000 documents is suspiciously
  close to the default mergeFactor of 10 X 50,000. Do subsequent
  optimizes (i.e. on the 750,000th document) still take that long? But
  this doesn't make sense because if you're optimizing instead of
  committing, each optimize should reduce your index to 1 segment and
  you'll never hit a merge.
 
  So I'm a little confused. If you're really optimizing every 50K docs,
 what
  I'd expect to see is successively longer times, and at the end of each
  optimize I'd expect there to be only one segment in your index.
 
  Are you sure you're not just seeing successively longer times on each
  optimize and just noticing it after 10?
 
  Best
  Erick
 
  On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque sbazer...@gmail.com
  wrote:
   Hello!
  
   Here is a puzzling experiment:
  
   I build an index of about 1.2MM documents using SOLR 3.1. The index has
 a
   large number of dynamic fields (about 15.000). Each document has about
  100
   fields.
  
   I add the documents in batches of 20, and every 50.000 documents I
  optimize
   the index.
  
   The first 10 optimizes (up to exactly 500k documents) take less than a
   minute and a half.
  
   But the 11th and all subsequent commits take north of 10 minutes. The
  commit
   logs look identical (in the INFOSTREAM.txt file), but what used to be
  
 Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene
  Merge
   Thread #0]: merge: total 50 docs
  
   Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene
 Merge
   Thread #0]: merge store matchedCount=2 vs 2
  
  
   now eats a lot of time:
  
  
 Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene
  Merge
   Thread #0]: merge: total 55 docs
  
   Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene
 Merge
   Thread #0]: merge store matchedCount=2 vs 2
  
  
   What could be happening between those two lines that takes 10 minutes
 at
   full CPU? (and with 50k docs less used to take so much less?).
  
  
   Thanks in advance,
  
   Santiago
  
 




-- 
Thanks and Regards
Mohammad Shariq


Re: Solr and Tag Cloud

2011-06-18 Thread Mohammad Shariq
I am also looking for the same. Is there any way to build a tag cloud from
all the documents matching a specific query?
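One common approach (my assumption; Solr 3.x has no built-in "tag cloud" feature) is to run the query with faceting on the text field, e.g. facet=true&facet.field=text&facet.limit=50, and let the UI scale each tag by its facet count. The counting step, sketched locally over placeholder documents:

```python
from collections import Counter

# Stand-in for the documents matching some query.
matching_docs = [
    "solr lucene search",
    "lucene index search speed",
    "solr faceting",
]

def tag_cloud(docs, top_n=5):
    """Term -> frequency over the matching docs; a UI scales font size by count."""
    counts = Counter(term for doc in docs for term in doc.split())
    return counts.most_common(top_n)

print(tag_cloud(matching_docs))
```

With faceting, Solr does this counting server-side over the indexed terms, so you never have to pull the documents back.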


On 18 June 2011 09:42, Jamie Johnson jej2...@gmail.com wrote:

 Does anyone have details of how to generate a tag cloud of popular terms
 across an entire data set and then also across a query?




-- 
Thanks and Regards
Mohammad Shariq


Re: Is it true that I cannot delete stored content from the index?

2011-06-18 Thread Mohammad Shariq
I have define uniqueKey in my solr and Deleting the docs from solr using
this uniqueKey.
and then doing optimization once in a day.
is this right way to delete ???
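For reference, the delete-by-uniqueKey request is just a small XML body POSTed to the update handler, followed by a commit (and an occasional optimize to reclaim space). A sketch; the id value and endpoint path are placeholders:

```python
from xml.sax.saxutils import escape

def delete_by_unique_key(doc_id):
    """Build the XML body POSTed to /solr/update for a delete-by-id."""
    return "<delete><id>%s</id></delete>" % escape(doc_id)

print(delete_by_unique_key("tweet-12345"))
# -> <delete><id>tweet-12345</id></delete>
```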

On 19 June 2011 05:14, Erick Erickson erickerick...@gmail.com wrote:

 Yep, you've got to delete and re-add. Although if you have a
 uniqueKey defined you
 can just re-add that document and Solr will automatically delete the
 underlying
 document.

 You might have to optimize the index afterwards to get the data to really
 disappear since the deletion process just marks the document as
 deleted.

 Best
 Erick

 On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I've indexing with the content field stored. Now I'd like to delete all
  stored content, is there how to do that without re-indexing?
 
  It seems not from lucene
  FAQ
 http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
 
  :
  How do I update a document or a set of documents that are already
  indexed? There
  is no direct update procedure in Lucene. To update an index incrementally
  you must first *delete* the documents that were updated, and *then
  re-add*them to the index.
 
  --
  Regards,
  K. Gabriele
 
  --- unchanged since 20/9/10 ---
  P.S. If the subject contains [LON] or the addressee acknowledges the
  receipt within 48 hours then I don't resend the email.
  subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)
   Now + 48h) ⇒ ¬resend(I, this).
 
  If an email is sent by a sender that is not a trusted contact or the
 email
  does not contain a valid code then the email is not received. A valid
 code
  starts with a hyphen and ends with X.
  ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
  L(-[a-z]+[0-9]X)).
 




-- 
Thanks and Regards
Mohammad Shariq


Search failed even if it has the keyword .

2011-06-17 Thread Mohammad Shariq
Hello,
A Solr search fails even though the document contains the keyword.
I am using Solr (3.1 on Ubuntu 10.10) to index tweets.
I index certain tweets, but Solr doesn't return any result when I search for
a keyword from a tweet.

In Solr, the tweet is stored as 'text'.
Below is the tweet which I indexed:
RT @Khan_KK: DescribeYourImageWithAMovieTitle Khan Rais

Once this tweet is indexed, I search with:
http://127.0.0.1:8983/solr/select/?q=DescribeYourImageWithAMovieTitle&version=2.2&start=0&rows=10&indent=on

Nothing is returned from Solr even though this tweet is in the index.
I tried searching many keywords, e.g. describe, image, movie, but nothing is
returned from Solr.
I am using the 'text' field type of Solr 3.1.
Am I using the right tokenizer?
Please help me.




-- 
Thanks and Regards
Mohammad Shariq


Re: Search failed even if it has the keyword .

2011-06-17 Thread Mohammad Shariq
Hi Pravesh,
this is how my schema looks for 'text' field :
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>

My default search field is 'defaultquery', and I copy these fields into it:
<copyField source="title" dest="defaultquery"/>
<copyField source="description" dest="defaultquery"/>
<copyField source="siteurl" dest="defaultquery"/>

And my tweet is indexed into 'title'.



On 17 June 2011 15:46, pravesh suyalprav...@yahoo.com wrote:

 First check, in your schema.xml, which is your default search field. Also
 check whether you are using WordDelimiterFilterFactory in your schema.xml for
 the specific field. It would tokenize your words on every capital letter, so
 the word DescribeYourImageWithAMovieTitle will be broken into multiple
 tokens and each will be searchable.
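A rough sketch of what splitOnCaseChange=1 does to such a token (this is an approximation for illustration, not the real WordDelimiterFilter code; with catenateWords=1 the filter also emits the joined-up token):

```python
import re

def split_on_case_change(token):
    """Approximate WordDelimiterFilter's splitOnCaseChange on a CamelCase token."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", token)

print(split_on_case_change("DescribeYourImageWithAMovieTitle"))
# -> ['Describe', 'Your', 'Image', 'With', 'A', 'Movie', 'Title']
```

So after lowercasing, a query for "describe" or "movie" can match the tweet even though the original hashtag was one long word.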

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Mohammad Shariq


Re: Search failed even if it has the keyword .

2011-06-17 Thread Mohammad Shariq
Many thanks, Pravesh.
My 'title' field was of type 'text', whereas 'defaultquery' was 'query_text'.

I changed my 'defaultquery' field to type 'text' and the problem is solved.

Thanks again.

On 17 June 2011 16:57, pravesh suyalprav...@yahoo.com wrote:

 What is the type of the fields 'defaultquery' and 'title' in your schema.xml?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075797.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Mohammad Shariq


Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Can I specify multiple languages via filter tags in schema.xml, like below?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>

    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Chinese"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <tokenizer class="solr.CJKTokenizerFactory"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>


On 8 June 2011 18:47, Erick Erickson erickerick...@gmail.com wrote:

 This page is a handy reference for individual languages...
 http://wiki.apache.org/solr/LanguageAnalysis

 But the usual approach, especially for Chinese/Japanese/Korean
 (CJK) is to index the content in different fields with language-specific
 analyzers then spread your search across the language-specific
 fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
 particularly give surprising results if you put words from different
 languages in the same field.

 Best
 Erick
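A sketch of the routing Erick describes, applied at indexing time. The language detector here is a crude stand-in (any CJK codepoint means "cjk", else "en"); in practice you would use a real detection library, and one language-specific field per supported language:

```python
def guess_language(text):
    """Placeholder detector: any CJK codepoint -> 'cjk', otherwise 'en'."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "cjk"
    return "en"

def to_language_field(title):
    """Route the value into a language-specific field (title_en, title_cjk, ...)."""
    return {"title_%s" % guess_language(title): title}

print(to_language_field("breaking news today"))
# -> {'title_en': 'breaking news today'}
```

At query time you then search across all the language fields, e.g. with dismax over title_en, title_cjk, and so on, so each field is analyzed by its own language chain.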

 On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq shariqn...@gmail.com
 wrote:
  Hi,
  I had setup solr( solr-1.4 on Ubuntu 10.10) for indexing news articles in
  English, but my requirement extend to index the news of other languages
 too.
 
  This is how my schema looks :
   <field name="news" type="text" indexed="true" stored="false"
       required="false"/>
 
 
  And the text Field in schema.xml looks like :
 
   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
           catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"
           protected="protwords.txt"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
           generateNumberParts="1" catenateWords="0" catenateNumbers="0"
           catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"
           protected="protwords.txt"/>
     </analyzer>
   </fieldType>
 
 
  My Problem is :
  Now I want to index the news articles in other languages to e.g.
  Chinese,Japnese.
  How I can I modify my text field so that I can Index the news in other
 lang
  too and make it searchable ??
 
  Thanks
  Shariq
 
 
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
Thanks and Regards
Mohammad Shariq


Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Thanks Erick for your help.
I have another silly question.
Suppose I create multiple fieldTypes, e.g. news_English, news_Chinese,
news_Japnese, etc.
After creating these fields, can I copy all of them into 'defaultquery',
like below?

<copyField source="news_English" dest="defaultquery"/>
<copyField source="news_Chinese" dest="defaultquery"/>
<copyField source="news_Japnese" dest="defaultquery"/>

And my defaultquery looks like:
<field name="defaultquery" type="query_text" indexed="false" stored="false"
    multiValued="true"/>

Is this the right way to deal with multiple-language indexing and searching?


On 9 June 2011 19:06, Erick Erickson erickerick...@gmail.com wrote:

 No, you'd have to create multiple fieldTypes, one for each language

 Best
 Erick

 On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq shariqn...@gmail.com
 wrote:
  Can I specify multiple language in filter tag in schema.xml ???  like
 below
 
   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
           catenateAll="0" splitOnCaseChange="1"/>

       <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"/>
       <filter class="solr.SnowballPorterFilterFactory" language="Chinese"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <tokenizer class="solr.CJKTokenizerFactory"/>

       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
 
 
  On 8 June 2011 18:47, Erick Erickson erickerick...@gmail.com wrote:
 
  This page is a handy reference for individual languages...
  http://wiki.apache.org/solr/LanguageAnalysis
 
  But the usual approach, especially for Chinese/Japanese/Korean
  (CJK) is to index the content in different fields with language-specific
  analyzers then spread your search across the language-specific
  fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
  particularly give surprising results if you put words from different
  languages in the same field.
 
  Best
  Erick
 
  On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq shariqn...@gmail.com
  wrote:
   Hi,
   I had setup solr( solr-1.4 on Ubuntu 10.10) for indexing news articles
 in
   English, but my requirement extend to index the news of other
 languages
  too.
  
   This is how my schema looks :
    <field name="news" type="text" indexed="true" stored="false"
        required="false"/>
  
  
   And the text Field in schema.xml looks like :
  
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
      </analyzer>
    </fieldType>
  
  
   My Problem is :
   Now I want to index the news articles in other languages to e.g.
   Chinese,Japnese.
   How I can I modify my text field so that I can Index the news in other
  lang
   too and make it searchable ??
  
   Thanks
   Shariq
  
  
  
  
  
   --
   View this message in context:
 
 http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 
 
 
  --
  Thanks and Regards
  Mohammad Shariq
 




-- 
Thanks and Regards
Mohammad Shariq


Re: SolrCloud questions

2011-06-09 Thread Mohammad Shariq
I am also planning to move to SolrCloud.
Since it is still under development, I am not sure about its behavior in
production.
Please update us once you find it stable.


On 10 June 2011 03:56, Upayavira u...@odoko.co.uk wrote:

 I'm exploring SolrCloud for a new project, and have some questions based
 upon what I've found so far.

 The setup I'm planning is going to have a number of multicore hosts,
 with cores being moved between hosts, and potentially with cores merging
 as they get older (cores are time based, so once today has passed, they
 don't get updated).

 First question: The solr/conf dir gets uploaded to Zookeeper when you
 first start up, and using system properties you can specify a name to be
 associated with those conf files. How do you handle it when you have a
 multicore setup, and different configs for each core on your host?

 Second question: Can you query collections when using multicore? On
 single core, I can query:

  http://localhost:8983/solr/collection1/select?q=blah

 On a multicore system I can query:

  http://localhost:8983/solr/core1/select?q=blah

 but I cannot work out a URL to query collection1 when I have multiple
 cores.

 Third question: For replication, I'm assuming that replication in
 SolrCloud is still managed in the same way as non-cloud Solr, that is as
 ReplicationHandler config in solrconfig? In which case, I need a
 different config setup for each slave, as each slave has a different
 master (or can I delegate the decision as to which host/core is its
 master to zookeeper?)

 Thanks for any pointers.

 Upayavira
 ---
 Enterprise Search Consultant at Sourcesense UK,
 Making Sense of Open Source




-- 
Thanks and Regards
Mohammad Shariq


Re: Re: Can I update a specific field in solr?

2011-06-08 Thread Mohammad Shariq
Solr doesn't support partial updates; you have to re-index the whole document.

On 8 June 2011 16:04, ZiLi dangld...@163.com wrote:


 Thanks very much , I'll re-index a whole document : )




 From: Chandan Tamrakar
 Sent: 2011-06-08  18:25:37
 To: solr-user
 Cc:
 Subject: Re: Can I update a specific field in solr?

 I think you can do that, but you need to re-index the whole document again.
 Note that there is nothing like an update in Lucene; it is usually a delete and
 then an add.
 thanks
 On Wed, Jun 8, 2011 at 4:00 PM, ZiLi dangld...@163.com wrote:
  Hi, I try to update a specific field in solr , but I didn't find anyway
 to
  implement this .
  Anyone who knows how to ?
  Any suggestions will be appriciate : )
 
 
  2011-06-08
 
 
 
  ZiLi
 
 --
 Chandan Tamrakar
 *
 *




-- 
Thanks and Regards
Mohammad Shariq


Re: solr speed issues..

2011-06-08 Thread Mohammad Shariq
How frequently do you optimize your Solr index?
Optimization also helps in reducing search latency.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-speed-issues-tp2254823p3038794.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem: zooKeeper Integration with solr

2011-06-07 Thread Mohammad Shariq
How is this method
(http://localhost:8983/solr/select?shards=Machine:Port/Solr
Path,Machine:Port/Solr Path&indent=true&q=query)
better than ZooKeeper? Could you please point me to any performance doc?


On 7 June 2011 08:18, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com
 wrote:

 Instead of integrating ZooKeeper, you could create shards over multiple
 machines and specify the shards while you are querying Solr.
 Eg: http://localhost:8983/solr/select?shards=Machine:Port/Solr
 Path,
 Machine:Port/Solr Path&indent=true&q=query



 On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq shariqn...@gmail.com
 wrote:

  Hi folk,
  I am using solr to index around 100mn docs.
  now I am planning to move to cluster based solr, so that I can scale the
  indexing and searching process.
  since solrCloud is in development  stage, I am trying to index in shard
  based environment using zooKeeper.
 
  I followed the steps from
  http://wiki.apache.org/solr/ZooKeeperIntegration but I am still not
  able to do distributed search.
  Once I index the docs in one shard, not able to query from other shard
 and
  vice-versa, (using the query
 
 
  http://localhost:8180/solr/select/?q=itunes&version=2.2&start=0&rows=10&indent=on
  )
 
  I am running solr3.1 on ubuntu 10.10.
 
  please help me.
 
 
  --
  Thanks and Regards
  Mohammad Shariq
 



 --
 Thanks and Regards,
 DakshinaMurthy BM




-- 
Thanks and Regards
Mohammad Shariq


problem: zooKeeper Integration with solr

2011-06-06 Thread Mohammad Shariq
Hi folks,
I am using Solr to index around 100mn docs.
Now I am planning to move to a cluster-based Solr setup, so that I can scale
the indexing and searching process.
Since SolrCloud is still in development, I am trying to index in a sharded
environment using ZooKeeper.

I followed the steps from
http://wiki.apache.org/solr/ZooKeeperIntegration but I am still not
able to do distributed search.
Once I index docs in one shard, I am not able to query them from the other
shard, and vice-versa (using the query
http://localhost:8180/solr/select/?q=itunes&version=2.2&start=0&rows=10&indent=on
)

I am running Solr 3.1 on Ubuntu 10.10.

Please help me.


-- 
Thanks and Regards
Mohammad Shariq