Re: Stable Versions in Solr 4

2015-12-28 Thread Rajani Maski
Solr 4.10.3

On Mon, Dec 28, 2015 at 5:51 PM, Binoy Dalal  wrote:

> You should take a look at solr's jira.
> That'll give you a pretty good idea of the various feature upgrades across
> versions as well as the bugs present in the various versions.
>
> On Mon, 28 Dec 2015, 17:42 abhi Abhishek  wrote:
>
> > Hi All,
> > I am trying to determine a stable version of Solr 4. Is there a blog
> > which we can refer to? I understand we can read through the Release
> > Notes, but I am interested in user reviews and challenges seen with
> > various versions of Solr 4.
> >
> >
> > Appreciate your contribution.
> >
> > Thanks,
> > Abhishek
> >
> --
> Regards,
> Binoy Dalal
>


Re: Apache Solr SpellChecker Integration with the default select request handler

2015-11-04 Thread Rajani Maski
The attached exception seems to have been stripped off. Anyway,

>>I want to integrate spellcheck handler with default select handler. Please
guide me how can I achieve this.

If you were unable to follow the steps mentioned in the reference guide[2],
here is another link[1] that gives the same spell check setup in quicker
steps, which you may want to have a look at.

[1] https://support.lucidworks.com/hc/en-us/articles/212722027
[2] https://cwiki.apache.org/confluence/display/solr/Spell+Checking
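
Once the component is wired into /select, the request side from SolrJ is
just extra parameters. A minimal sketch (the core URL, field, and query
below are assumptions, not from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/mycore");
    SolrQuery q = new SolrQuery("productname:ipad");
    q.set("spellcheck", true);                  // enable the component per request
    q.set("spellcheck.dictionary", "default");  // must match a spellchecker name
    q.set("spellcheck.count", 5);
    QueryResponse rsp = solr.query(q);
    SpellCheckResponse sc = rsp.getSpellCheckResponse();
    if (sc != null) {
      for (SpellCheckResponse.Suggestion s : sc.getSuggestions()) {
        System.out.println(s.getToken() + " -> " + s.getAlternatives());
      }
    }
    solr.shutdown();
  }
}

The NPE below, by the way, is often what you get when the dictionary named
in spellcheck.dictionary is not actually defined in the spellcheck
searchComponent, so do double-check that the names line up.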

Hope that helps.





On Wed, Nov 4, 2015 at 2:37 AM, Shruthi BN  wrote:

> Hi Team,
>
> I want to integrate spellcheck handler with default select handler.
>
> Please guide me how can I achieve this.
>
>
>
>
>
> I tried like
>
>
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">text</str>
>
>     <str name="spellcheck.dictionary">productname</str>
>     <str name="spellcheck.dictionary">default</str>
>     <str name="spellcheck">on</str>
>     <str name="spellcheck.extendedResults">true</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>   </lst>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
>
>
>
>
>
> But the above code is not working. I got an exception like:
>
> <lst name="error">
> <str name="trace">java.lang.NullPointerException at
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:130) at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242) at
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448) at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269) at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225) at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927) at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999) at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565) at
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1812) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at
> java.lang.Thread.run(Thread.java:745)</str>
> <int name="code">500</int>
> </lst>
>
>
>
> Thanks & Regards,
>
> Shruthi
>
> Ideapoke Technologies
>
>
>
>


Re: `cat /dev/null > solr-8983-console.log` frees host's memory

2015-10-21 Thread Rajani Maski
The details in this link[1] might be of help.

[1]https://support.lucidworks.com/hc/en-us/articles/207072137

On Wed, Oct 21, 2015 at 7:42 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Eric,
> As Shawn explained, memory is freed because it was used to cache portion
> of log file.
>
> Since you are already with Sematext, I guess you are aware, but doesn't
> hurt to remind you that we also have Logsene that you can use to manage
> your logs: http://sematext.com/logsene/index.html
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
>
> On 20.10.2015 17:42, Shawn Heisey wrote:
>
>> On 10/20/2015 9:19 AM, Eric Torti wrote:
>>
>>> I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux
>>> 64-bit box and decided to `cat /dev/null > solr-8983-console.log` to
>>> free space.
>>>
>>> The weird thing is that when I checked Sematext I noticed the OS had
>>> freed a lot of memory at the same exact instant I did that.
>>>
>> On that memory graph, the legend doesn't indicate which of the graph
>> colors represent each of the four usage types at the top -- they all
>> have blue checkboxes, so I can't tell for sure what changed.
>>
>> If the number that dropped is "cached" (which I think is likely) then
>> everything is working exactly as it should.  The OS had simply cached a
>> large chunk of the logfile, exactly as it is designed to do, and once
>> the file was truncated, it stopped reserving that memory and made it
>> available.
>>
>> https://en.wikipedia.org/wiki/Page_cache
>>
>> Thanks,
>> Shawn
>>
>>


Re: sort on fields that are not mandatory in each document

2015-05-27 Thread Rajani Maski
Hi Derek,

They are at the fieldType level. You might find some reference examples
using them in schema.xml.

https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

On Wed, May 27, 2015 at 2:30 PM, Derek Poh d...@globalsources.com wrote:

 Hi Ahmet

 The sortMissingLast and sortMissingFirst attributes are defined at the
 field or fieldType level?

 <field name="P_TSRank" type="int" indexed="true" stored="true" multiValued="false"/>

 <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>


 On 5/27/2015 4:43 PM, Ahmet Arslan wrote:

 Hi,
 I think you are looking for sortMissing* attributes:

 The sortMissingLast and sortMissingFirst attributes are optional attributes
 that are currently supported on types that are sorted internally as strings,
 and on numeric types.

 Ahmet

 On Wednesday, May 27, 2015 11:36 AM, Derek Poh d...@globalsources.com
 wrote:
 Hi

 I am trying to sort on multiple fields. These fields do not necessarily
 exist in every document.
 sort=sppddrank asc, ddrank asc

 From the sorted result, it seems that documents which do not have the
 sppddrank field are at the top.

 How can I make the documents that have the sppddrank field be on top,
 sorted by it, and the documents which do not have the field come below?

 -Derek






Re: A Synonym Searching for Phrase?

2015-05-15 Thread Rajani Maski
Hi Ryan,

I am not really sure whether this[1] solution mentioned in the link below
can work for your case considering its cons. However, I recommend having a
quick look at it.

@Chris, I would eagerly wait for your contribution.


[1] https://support.lucidworks.com/hc/en-us/articles/205359448



On Thu, May 14, 2015 at 11:30 PM, Chris Morley ch...@depahelix.com wrote:

 I have implemented that but it's not open sourced yet.  It will be soon.

  -Chris.




 
  From: Ryan Yacyshyn ryan.yacys...@gmail.com
 Sent: Thursday, May 14, 2015 12:07 PM
 To: solr-user@lucene.apache.org
 Subject: A Synonym Searching for Phrase?
 Hi All,

 I'm running into an issue where I have some tokens that really mean the
 same thing as two. For example, there are a couple of ways users might
 want to search for a certain type of visa called the "s pass": they might
 query for "spass" or "s-pass".

 I thought I could add a line in my synonym file to solve this, such as:

 s-pass, spass => s pass

 This doesn't seem to work. I found an Auto Phrase TokenFilter (
 https://github.com/LucidWorks/auto-phrase-tokenfilter) that looks like it
 might help, but it sounds like it needs to use a specific query parser as
 well (we're using edismax).

 Has anyone come across this specific problem before? Would really
 appreciate your suggestions / help.

 We're using Solr 4.8.x (and lucidWorks 2.9).

 Thanks!
 Ryan





Re: Proximity Search

2015-04-30 Thread Rajani Maski
Hi Vijaya,

I just quickly tried a proximity search with the example set shipped with
Solr 5 and it looked like it was working for me.
Perhaps what you could do is debug the query by enabling debugQuery=true.


Here are the steps that I tried (assuming you are on Solr 5, though this
term proximity functionality should work for 4.x versions too):

1. Go to solr5.0 downloaded folder and navigate to bin.

Rajanis-MacBook-Pro:solr-5.0.0 rajanishivarajmaski$ bin/solr -e techproducts

2. Execute the below query. The field "name" has the value "Test with some
GB18030 encoded characters", and you search for name:"Test GB18030"~10

http://localhost:8983/solr/techproducts/select?q=name:"Test GB18030"~10&wt=json&indent=true

Image : http://postimg.org/image/bjkbufsph/
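
For the SolrJ part of the question, a minimal sketch (the server URL is an
assumption; the slop syntax is the same "term1 term2"~N string you would
put on the URL):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ProximityDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/techproducts");
    // phrase-with-slop: the quotes must be part of the query string itself
    SolrQuery q = new SolrQuery("name:\"Test GB18030\"~10");
    q.set("debugQuery", true);  // inspect the parsed query while testing
    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());
    solr.shutdown();
  }
}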


On Thu, Apr 30, 2015 at 7:14 PM, Vijaya Narayana Reddy Bhoomi Reddy 
vijaya.bhoomire...@whishworks.com wrote:

 I just tried a simple proximity search like "word1 word2"~3 and it is
 not working. Just wondering whether I have to make any configuration
 changes to solrconfig.xml to make proximity search work?

 Thanks
 Vijay


 On 30 April 2015 at 14:32, Vijaya Narayana Reddy Bhoomi Reddy 
 vijaya.bhoomire...@whishworks.com wrote:

  Hi,
 
  I have created my index with the default configurations. Now I am trying
  to use proximity search. However, I am bit not sure on the results and
  where its going wrong.
 
  For example, I want to find two phrases, "this is phrase one" and another
  phrase "this is the second phrase", with not more than a proximity
  distance of 4 words in between them. The query syntax I am using is
  ("this is phrase one") ("this is the second phrase")~4
 
  However, the results I am getting are similar to OR operation. Can anyone
  please let me know whether the syntax is correct?
 
  Also, please let me know how to implement proximity search using SolrJ
  Query API?
 
  Thanks & Regards
  Vijay
 

 --
 The contents of this e-mail are confidential and for the exclusive use of
 the intended recipient. If you receive this e-mail in error please delete
 it from your system immediately and notify us either by e-mail or
 telephone. You should not copy, forward or otherwise disclose the content
 of the e-mail. The views expressed in this communication may not
 necessarily be the view held by WHISHWORKS.



Re: Solr Synonyms, Escape space in case of multi words

2014-10-16 Thread Rajani Maski
Hi David,

  I think you should have the filter class with the tokenizerFactory
specified, as shown below:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>



So your field type should be as shown below:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>


On Wed, Oct 15, 2014 at 7:25 PM, David Philip davidphilipshe...@gmail.com
wrote:

 Sorry, the analysis page clip is getting trimmed off and hence the
 indentation is lost.

 Here it is :

 ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz|
 care

 expected:

 ridemakers | ride | ridemakerz | ride | ridemark | ride | makers |
 makerz| *ride
 care*



 On Wed, Oct 15, 2014 at 7:21 PM, David Philip davidphilipshe...@gmail.com
 
 wrote:

  contd..
 
  The expectation was that "ride care" should not have been split into two
  tokens.
 
  It should have been as below. Please correct me/point me where I am
 wrong.
 
 
  Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark,
 ride\
  care
 
  o/p
 
  ridemakersrideridemakerzrideridemarkridemakersmakerz
 
  *ride care*
 
 
 
 
  On Wed, Oct 15, 2014 at 7:16 PM, David Philip 
 davidphilipshe...@gmail.com
   wrote:
 
  Hi All,
 
  I remember using multi-word synonyms in the Solr 3.x version. In case
  of multi words, I was escaping the space with a backslash [\] and it worked
  as intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
  each other, and so when I searched for "ride makers", I obtained the search
  results for all of them. The field type was the same as below. I have the
  same setup in Solr 4.10, but now the multi-word space escape is getting
  ignored. It is tokenizing on spaces.
 
   synonyms.txt
  ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
  care
 
 
  Analysis page:
 
  ridemakersrideridemakerzrideridemarkridemakersmakerzcare
 
  Field Type
 
  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>
 
 
 
  Could you please tell me what could be the issue? How do I handle
  multi-word cases?
 
 
 
 
  synonyms.txt
  ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
  care
 
 
  Thanks - David
 
 
 
 
 



Re: import solr source to eclipse

2014-10-14 Thread Rajani Maski
Configure Eclipse with the Jetty plugin, create a Solr folder under your
Solr Java project, and run the project ("Run As") on the Jetty server.

This blog[1] may help you to configure Solr within Eclipse.


[1]
http://hokiesuns.blogspot.in/2010/01/setting-up-apache-solr-in-eclipse.html

On Tue, Oct 14, 2014 at 12:06 PM, Ali Nazemian alinazem...@gmail.com
wrote:

 Thank you very much for your guidance, but how can I run the Solr server
 inside Eclipse?
 Best regards.

 On Mon, Oct 13, 2014 at 8:02 PM, Rajani Maski rajinima...@gmail.com
 wrote:

  Hi,
 
  The best tutorial for setting up Solr [Solr 4.7] in Eclipse/IntelliJ is
  documented in the Solr In Action book, Appendix A, *Working with the Solr
  codebase*.
 
 
  On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe 
  tomasflo...@gmail.com wrote:
 
   The way I do this:
   From a terminal:
   svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/
   lucene-solr-trunk
   cd lucene-solr-trunk
   ant eclipse
  
   ... And then, from your Eclipse import existing java project, and
  select
   the directory where you placed lucene-solr-trunk
  
   On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian alinazem...@gmail.com
   wrote:
  
Hi,
I am going to import solr source code to eclipse for some development
purpose. Unfortunately every tutorial that I found for this purpose
 is
outdated and did not work. So would you please give me some hint
 about
   how
can I import solr source code to eclipse?
Thank you very much.
   
--
A.Nazemian
   
  
 



 --
 A.Nazemian



Re: import solr source to eclipse

2014-10-13 Thread Rajani Maski
Hi,

The best tutorial for setting up Solr [Solr 4.7] in Eclipse/IntelliJ is
documented in the Solr In Action book, Appendix A, *Working with the Solr
codebase*.


On Mon, Oct 13, 2014 at 6:45 AM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:

 The way I do this:
 From a terminal:
 svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/
 lucene-solr-trunk
 cd lucene-solr-trunk
 ant eclipse

 ... And then, from your Eclipse import existing java project, and select
 the directory where you placed lucene-solr-trunk

 On Sun, Oct 12, 2014 at 7:09 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Hi,
  I am going to import solr source code to eclipse for some development
  purpose. Unfortunately every tutorial that I found for this purpose is
  outdated and did not work. So would you please give me some hint about
 how
  can I import solr source code to eclipse?
  Thank you very much.
 
  --
  A.Nazemian
 



Re: Document Security Model Question

2013-11-14 Thread Rajani Maski
Hi,

For the case *it requires constant reindexing if a value in this field
changes*: if the ACLs for documents keep changing, a Solr PostFilter is one
of the options. We use it in our system. We have close to a billion
documents and approximately 5000 users.


But it is important to check whether the ACL changes are frequent and
decide on a solution based on that. The first option in your list works
efficiently without affecting search performance. If the value changes
are infrequent, then re-indexing only those documents should not be a
concern. But if changes are frequent, a post filter can be used, at the
cost of some added delay.
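
For reference, a bare-bones sketch of what such a post filter can look
like on the Solr 4.x API (the per-document ACL lookup is a placeholder you
would back with a cache, docValues, or an external store; in practice the
filter is produced by a custom QParserPlugin and attached as an fq):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class AclPostFilter extends ExtendedQueryBase implements PostFilter {
  private final String userId;

  public AclPostFilter(String userId) { this.userId = userId; }

  @Override
  public boolean getCache() { return false; }  // post filters must not be cached

  @Override
  public int getCost() {
    return Math.max(super.getCost(), 100);     // cost >= 100 runs after normal filters
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        if (isAllowed(userId, doc)) {  // hypothetical per-document ACL check
          super.collect(doc);          // only allowed docs reach the next collector
        }
      }
    };
  }

  private boolean isAllowed(String user, int doc) {
    return true;  // placeholder: consult your ACL store here
  }
}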


Thanks












 On Fri, Nov 15, 2013 at 4:32 AM, kchellappa kannan.chella...@gmail.com wrote:

 I had earlier posted a similar discussion in LinkedIn and David Smiley
 rightly advised me that solr-user is a better place for technical
 discussions

 --

 Our product which is hosted supports searching on educational resources.
 Our
 customers can choose to make specific resources unavailable for their users
 and also it depends on licensing. Our current solution uses full text
 search
 support in the database and handles availability as part of the SQL.

 My task is to move the search from the database full text search into Solr.
 I searched through posts and found some that were kind of related and I am
 thinking along the following lines

   a)  Use the authorization model.   I can add fields like allow and/or
 deny
 in the index which contain the list of customers.  At query time, I can add
 the constraint based on the customer Id.  I am concerned about the
 performance if there are lot of values for these fields and also it
 requires
 constant reindexing if a value in this field changes
  b) Use Query-time Join.
  Have the resource-to-availability mapping for customers in separate inner
 documents.
  We are planning to deploy in SolrCloud.  I have read some challenges
 about Query-time join and SolrCloud. So this may not work for us.

 c) Other ideas?

 Excerpts from David Smiley's response

 You're right that there may be some re-indexing as security rules change.
 If
 many Lucene/Solr documents share identical access control with other
 documents, then it may make more sense to externally determine which unique
 set of access-control sets the user has access to, then finally search by
 id
 -- which will hopefully not be a huge number. I've seen this done both
 externally and with a Solr core to join on.






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: character encoding issue...

2013-11-03 Thread Rajani Maski
How are you extracting the text that is on the website[1] you are
referring to? Apache Nutch or some other crawler? If so, first check
whether that crawler engine is giving you data in the correct format before
you invoke the Solr index method.

[1]http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/

URI encoding should resolve this problem.
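
Also worth checking on the client side: decode the fetched bytes
explicitly as UTF-8 before building the beans you pass to addBean(),
because decoding with the platform default charset produces exactly these
replacement characters. A minimal sketch (the fetch helper is made up, and
real pages may declare a different charset that you should honor):

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FetchUtf8 {
  public static String fetch(String url) throws Exception {
    InputStream in = new URL(url).openStream();
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] chunk = new byte[8192];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
      // decode explicitly as UTF-8 instead of the platform default charset
      return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    } finally {
      in.close();
    }
  }
}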




On Fri, Nov 1, 2013 at 10:50 AM, Chris christu...@gmail.com wrote:

 Hi Rajani,

 I followed the steps exactly as in

 http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/

 However, when I send a query to this new instance in Tomcat, I again get
 the error -

   <str name="fulltxt">Scheduled Groups Maintenance
 In preparation for the new release roll-out, Diigo groups won't be
 accessible on Sept 28 (Mon) around midnight 0:00 PST for several
 hours.
 Stay tuned to say hello to Diigo V4 soon!

 location of the text  -
 http://blog.diigo.com/2009/09/28/scheduled-groups-maintenance/

 same problem at - http://cn.nytimes.com/business/20130926/c26alibaba/

 All the text in the title comes like -

  <str name="title"> - � </str>
 <arr name="text">
   <str> - � </str>
 </arr>


 Can you please advise?

 Chris




 On Tue, Oct 29, 2013 at 11:33 PM, Rajani Maski rajinima...@gmail.com
 wrote:

  Hi,
 
 If you are using Apache Tomcat Server, hope you are not missing the
  below mentioned configuration:
 
   <Connector port="portNumber" protocol="HTTP/1.1"
   connectionTimeout="20000"
   redirectPort="8443" URIEncoding="UTF-8"/>
 
  I had faced a similar issue with Chinese characters and had resolved it
  with the above config.
 
  Links for reference :
 
 
 http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
 
 
 http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8
 
 
  Thanks
 
 
 
  On Tue, Oct 29, 2013 at 9:20 PM, Chris christu...@gmail.com wrote:
 
   Hi All,
  
   I get characters like -
  
   �� - CTA -
  
   in the solr index. I am adding Java beans to solr by the addBean()
   function.
  
   This seems to be a character encoding issue. Any pointers on how to
   resolve this one?
  
   I have seen that this occurs mostly for Japanese and Chinese characters.
  
 



Re: Replacing Google Mini Search Appliance with Solr?

2013-10-30 Thread Rajani Maski
Hi Eric,

  I have also developed mini-applications replacing GSA for some of our
clients, using Apache Nutch + Solr to crawl multi-lingual sites and enable
multi-lingual search. Nutch+Solr is very stable, and the Nutch mailing list
provides good support.

Reference link to start:
https://sites.google.com/site/profilerajanimaski/webcrawlers/apache-nutch

Thanks
Rajani




On Thu, Oct 31, 2013 at 12:27 AM, Palmer, Eric epal...@richmond.edu wrote:

 Markus and Jason

 thanks for the info.

 I will start to research Nutch.  Writing a crawler, agree it is a rabbit
 hole.


 --
 Eric Palmer

 Web Services
 U of Richmond

 To report technical issues, obtain technical support or make requests for
 enhancements please visit
 http://web.richmond.edu/contact/technical-support.html





 On 10/30/13 2:53 PM, Jason Hellman jhell...@innoventsolutions.com
 wrote:

 Nutch is an excellent option.  It should feel very comfortable for people
 migrating away from the Google appliances.
 
Apache Droids is another possible way to approach this, and I've found people
using Heritrix or Manifold for various use cases (and usually in
combination with other use cases where the extra overhead was worth the
trouble).

I think the simplest approach will be Nutch... it's absolutely worth taking a
shot at it.
 
 DO NOT write a crawler!  That is a rabbit hole you do not want to peer
 down into :)
 
 
 
 On Oct 30, 2013, at 10:54 AM, Markus Jelsma markus.jel...@openindex.io
 wrote:
 
  Hi Eric,
 
  We have also helped a government institution to replace their
 expensive GSA with open source software. In our case we use Apache Nutch
 1.7 to crawl the websites and index to Apache Solr. It is very
 effective, robust and scales easily with Hadoop if you have to. Nutch
 may not be the easiest tool for the job but is very stable, feature rich
 and has an active community here at Apache.
 
  Cheers,
 
  -Original message-
  From:Palmer, Eric epal...@richmond.edu
  Sent: Wednesday 30th October 2013 18:48
  To: solr-user@lucene.apache.org
  Subject: Replacing Google Mini Search Appliance with Solr?
 
  Hello all,
 
  Been lurking on the list for awhile.
 
  We are at end of life and replacing two Google Mini search
 appliances used to index our public web sites. Google is no longer
 selling the mini appliances and buying the big appliance is not cost
 beneficial.
 
  http://search.richmond.edu/
 
  We would run a Solr replacement on Linux (CentOS, RedHat, or similar) with
 OpenJDK Java or Oracle Java.
 
  Background
  ==
  ~130 sites
  only ~12,000 pages (at a depth of 3)
  probably ~40,000 pages if we go to a depth of 4
 
  We use key matches a lot. In solr terms these are elevated documents
 (elevations)
 
  We would code a search query form in php and wrap it into our design
 (http://www.richmond.edu)
 
  I have played with and love lucidworks and know that their $ solution
 works for our use cases but the cost model is not attractive for such a
 small collection.
 
  So with solr what are my open source options and what are people's
 experiences crawling and indexing web sites with solr + crawler. I
 understand there is not a crawler with solr so that would have to be
 first up to get one working.
 
  We can code in Java, PHP, Python etc. if we have to, but we don't want
 to write a crawler if we can avoid it.
 
  Thanks in advance for any information.
 
  --
  Eric Palmer
  Web Services
  U of Richmond
 
 
 




Re: character encoding issue...

2013-10-29 Thread Rajani Maski
Hi,

   If you are using Apache Tomcat Server, hope you are not missing the
below mentioned configuration:

 <Connector port="portNumber" protocol="HTTP/1.1"
 connectionTimeout="20000"
 redirectPort="8443" URIEncoding="UTF-8"/>

I had faced a similar issue with Chinese characters and had resolved it with
the above config.

Links for reference :
http://zensarteam.wordpress.com/2011/11/25/6-steps-to-configure-solr-on-apache-tomcat-7-0-20/
http://blog.sidu.in/2007/05/tomcat-and-utf-8-encoded-uri-parameters.html#.Um_3P3Cw2X8


Thanks



On Tue, Oct 29, 2013 at 9:20 PM, Chris christu...@gmail.com wrote:

 Hi All,

 I get characters like -

 �� - CTA -

 in the solr index. I am adding Java beans to solr by the addBean()
 function.

 This seems to be a character encoding issue. Any pointers on how to
 resolve this one?

 I have seen that this occurs mostly for Japanese and Chinese characters.



Re: Chinese language search in SOLR 3.6.1

2013-10-22 Thread Rajani Maski
Hi Poornima,

  Your statement, "It works fine with the Chinese strings but is not
working with product code or ISBN even though the fields are defined as
string", is confusing.

Did you mean that the product code and ISBN fields are of type text_chinese?

Is it first or second:
<field name="product_code" type="string" indexed="true" stored="false"/>
or
<field name="product_code" type="text_chinese" indexed="true" stored="false"/>


What do you mean when you say that it's not working? Are you unable to search?

















On Tue, Oct 22, 2013 at 6:09 PM, Poornima Jay poornima...@rocketmail.com wrote:

 Hi,

 Did anyone face a problem with the Chinese language in Solr 3.6.1? Below is
 the analyzer in the schema.xml file.

 <fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 It works fine with the Chinese strings but is not working with product code
 or ISBN even though the fields are defined as string.

 Please let me know how the Chinese schema should be configured.

 Thanks.
 Poornima



Re: Chinese language search in SOLR 3.6.1

2013-10-22 Thread Rajani Maski
A string field will work for any case where you do an exact key search.
text_chinese should also work if you are simply searching with the exact
string "676767667".

Well, the best way to find an answer to this query is by using the Solr
analysis tool: http://localhost:8983/solr/#/collection1/analysis
Enter your field type, the index-time input that you had given, and the
query value that you are searching for.

You should be able to find your answers.





On Tue, Oct 22, 2013 at 8:06 PM, Poornima Jay poornima...@rocketmail.com wrote:

 Hi Rajani,

 Below is what is configured in my schema:
 <fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.ChineseTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ChineseTokenizerFactory"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="product_code" type="string" indexed="true" stored="false" multiValued="true"/>
 <field name="author_name" type="text_chinese" indexed="true" stored="false" multiValued="true"/>
 <field name="author_name_string" type="string" indexed="true" stored="false" multiValued="true"/>
 <field name="simple" type="text_chinese" indexed="true" stored="false" multiValued="true"/>
 <copyField source="product_code" dest="simple"/>
 <copyField source="author_name" dest="author_name_string"/>

 If I search with the query q=simple:总评价 it works, but it doesn't work if I
 search with q=simple:676767667. If the field is defined as string, the
 Chinese characters work but don't if it is defined as text_chinese.

 Regards,
 Poornima




   On Tuesday, 22 October 2013 7:52 PM, Rajani Maski rajinima...@gmail.com
 wrote:
  Hi Poornima,

   Your statement, "It works fine with the Chinese strings but is not
 working with product code or ISBN even though the fields are defined as
 string", is confusing.

 Did you mean that the product code and ISBN fields are of type
 text_chinese?

 Is it first or second:
 <field name="product_code" type="string" indexed="true" stored="false"/>
 or
 <field name="product_code" type="text_chinese" indexed="true" stored="false"/>


 What do you mean when you say that it's not working? Are you unable to search?

















 On Tue, Oct 22, 2013 at 6:09 PM, Poornima Jay 
 poornima...@rocketmail.com wrote:

 Hi,

 Did anyone face a problem with the Chinese language in Solr 3.6.1? Below is
 the analyzer in the schema.xml file.

 <fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.ChineseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 It works fine with the Chinese strings but is not working with product code
 or ISBN even though the fields are defined as string.

 Please let me know how the Chinese schema should be configured.

 Thanks.
 Poornima







Re: Grouping results - set document return count not group.limit

2013-02-10 Thread Rajani Maski
Hi, thank you for the reply.


On Fri, Feb 8, 2013 at 12:32 PM, Prakhar Birla prakharbi...@gmail.com wrote:

 Hi Rajani,

 I recently tried to solve a similar problem as the one you have. (I think)
 Solr doesn't support a param to achieve this because if we were to limit
 the no of documents returned, to get the next result set the starting
 offset of each group will be different based on the number of
 documents/group in the first page.

 My problem was a little more complex as I had to limit the number of
 documents differently per group and paginate them together. We solved this
 by using Solr 4.0 with a patch from JIRA (
 https://issues.apache.org/jira/browse/SOLR-1093) which allowed execution
 of
 multiple queries in parallel threads along with a few enhancements that
 have not been made public yet by the company I work for.

 On 8 February 2013 10:13, Rajani Maski rajinima...@gmail.com wrote:

  Hi all,
 
  Is there any parameter which will set the number of documents returned
  after applying grouping on results? Like we have query.setRows for
  results without grouping?

  I know all the below parameters apply to grouping. But they will not
  limit the number of documents returned. We want to get all the results
  belonging to each group (group.limit=-1) and display only 20 records at
  a time (the documents returned should be limited to a given integer). Is
  there any param to get this?

  rows        (integer)  The number of groups to return. The default value is 10.
  start       (integer)  Specifies an initial offset for the list of groups.
  group.limit (integer)  Specifies the number of results to return for each
                         group. The default value is 1.
 



 --
 Regards,
 Prakhar Birla



Grouping results - set document return count not group.limit

2013-02-07 Thread Rajani Maski
Hi all,

Is there any parameter which will set the number of documents returned
after applying grouping on results? Like we have query.setRows for results
without grouping?

I know all the below parameters apply to grouping. But they will not
limit the number of documents returned. We want to get all the results
belonging to each group (group.limit=-1) and display only 20 records at a
time (the documents returned should be limited to a given integer). Is there
any param to get this?

rows        (integer)  The number of groups to return. The default value is 10.
start       (integer)  Specifies an initial offset for the list of groups.
group.limit (integer)  Specifies the number of results to return for each
                       group. The default value is 1.
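
For reference, the parameters above expressed through SolrJ -- a sketch
(the grouping field is made up); rows bounds the number of groups and
group.limit the documents within each group, and there is no separate
parameter for the overall document count:

import org.apache.solr.client.solrj.SolrQuery;

public class GroupingParamsDemo {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("*:*");
    q.set("group", true);
    q.set("group.field", "category");  // hypothetical grouping field
    q.set("group.limit", -1);          // all documents inside each group
    q.setRows(20);                     // 20 groups per page, not 20 documents
    q.setStart(0);                     // paginate over groups
    return q;
  }
}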


Re: Dynamic fields - names having special characters like

2013-02-06 Thread Rajani Maski
Hi all,


  I found a solution myself - to replace the '<' and '>' characters with
  &lt; and &gt;.


Thanks & Regards
Rajani



On Wed, Feb 6, 2013 at 3:50 PM, Rajani Maski rajinima...@gmail.com wrote:

 Hi all,

   We have a few *dynamic field names* coming with special characters, e.g.
 *first/_str*. Solr throws the error: *org.apache.solr.common.SolrException:
 Unexpected ''*
 I followed this link
 http://lucene.472066.n3.nabble.com/Search-on-dynamic-fields-which-contains-spaces-special-characters-td472100.html
 for escaping - */first//_str* - it didn't work.


 Is there anyway to handle such cases?



 Awaiting a reply.


 Thanks & Regards
 Rajani



Re: Get report of keywords searched.

2012-10-07 Thread Rajani Maski
Hi Davide,  Yes right. This can be done.

 Just one question (I am not sure if I had to create a new thread for
this): can solrmeter or jmeter help me get the list of searched keywords?
I am a novice with solrmeter and just know that it's used for stress
testing. I am interested to know if I can use the same tools for this
case of getting the searched-keywords list.
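
In case it helps in the meantime, a rough sketch of pulling the q and fq
values straight out of the request logs (it assumes the log lines contain
the raw query strings, and the log path comes in as an argument):

import java.io.BufferedReader;
import java.io.FileReader;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLogReport {
  public static void main(String[] args) throws Exception {
    // matches q=... and fq=... parameters embedded in each log line
    Pattern p = Pattern.compile("[?&](q|fq)=([^&\\s]+)");
    Map<String, Integer> counts = new HashMap<String, Integer>();
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = in.readLine()) != null) {
      Matcher m = p.matcher(line);
      while (m.find()) {
        String term = URLDecoder.decode(m.group(2), "UTF-8");
        Integer c = counts.get(term);
        counts.put(term, c == null ? 1 : c + 1);
      }
    }
    in.close();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.println(e.getValue() + "\t" + e.getKey());
    }
  }
}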


Thanks
Rajani

On Fri, Oct 5, 2012 at 7:23 PM, Davide Lorenzo Marino 
davide.mar...@gmail.com wrote:

 If you think this could be a problem for your performances you can try two
 different solutions:

 1 - Make the call to update the db in a different thread
 2 - Make an asynchronous http call to a web application that update the db
 (in this case the web app can be resident in a different machine, so the
 ram, cpu time and disk operations don't slow your solr engine)


 2012/10/5 Rajani Maski rajinima...@gmail.com

  Hi,
 
   Thank you for the reply Davide.
 
 By writing to db, you mean inserting the search queries into the db? I was
  thinking that this might affect search performance?
  Yes, you are right, getting stats for a particular keyword is tough. It
  would suffice if I can get the q param and fq param values (when we search
  using the standard request handler).  Any open source Solr log analysis
  tools? Can we achieve this with solrmeter? Has anyone tried this?
 
  Thank You
 
 
 
 
  On Thu, Oct 4, 2012 at 2:07 PM, Davide Lorenzo Marino 
  davide.mar...@gmail.com wrote:
 
   If you need to analyze the search queries, it is not very difficult: just
   create a search plugin and put them in a db.
   If you need to search the single keywords, it is more difficult, and you
   need to take some decisions before starting. In particular, take the
   following queries and try to answer how you would like to treat them for
   the keywords:
  
   1) apple OR orange
   2) apple AND orange
   3) title:apple AND subject:orange
   4) apple -orange
   5) apple OR (orange AND banana)
   6) title:apple OR subject:orange
  
   Ciao
  
   Davide Marino
  
  
  
  
  
  
  
  
   2012/10/3 Rajani Maski rajinima...@gmail.com
  
Hi All,
   
    I am using SolrJ. When there is a search query hit, I am logging the
    url in a location, and it also gets logged into the Tomcat catalina
    logs. Now I wanted to implement a functionality of periodically (per
    week) analyzing the search logs of Solr and finding out the keywords
    searched. Is there a way to do it using any of the existing
    functionality of Solr? If not, has anybody tried this implementation
    with any open source tools? Suggestions welcome. Awaiting a reply.
   
   
Thank you.
   
  
 



Re: Get report of keywords searched.

2012-10-05 Thread Rajani Maski
Hi,

 Thank you for the reply Davide.

   By writing to db, you mean inserting the search queries into the db? I was
thinking that this might affect search performance?
Yes, you are right, getting stats for a particular keyword is tough. It would
suffice if I can get the q param and fq param values (when we search using
the standard request handler). Any open source Solr log analysis tools? Can we
achieve this with solrmeter? Has anyone tried this?

Thank You




On Thu, Oct 4, 2012 at 2:07 PM, Davide Lorenzo Marino 
davide.mar...@gmail.com wrote:

 If you need to analyze the search queries, it is not very difficult: just
 create a search plugin and put them in a db.
 If you need to search the single keywords, it is more difficult, and you need
 to take some decisions before starting. In particular, take the following
 queries and try to answer how you would like to treat them for the
 keywords:

 1) apple OR orange
 2) apple AND orange
 3) title:apple AND subject:orange
 4) apple -orange
 5) apple OR (orange AND banana)
 6) title:apple OR subject:orange

 Ciao

 Davide Marino








 2012/10/3 Rajani Maski rajinima...@gmail.com

  Hi All,
 
 I am using SolrJ. When there is a search query hit, I am logging the url
  in a location, and it also gets logged into the Tomcat catalina logs.
  Now I wanted to implement a functionality of periodically (per week)
  analyzing the search logs of Solr and finding out the keywords searched.
  Is there a way to do it using any of the existing functionality of Solr?
  If not, has anybody tried this implementation with any open source tools?
  Suggestions welcome. Awaiting a reply.
 
 
  Thank you.
 



Get report of keywords searched.

2012-10-03 Thread Rajani Maski
Hi All,

   I am using SolrJ. When there is a search query hit, I am logging the url
in a location, and it also gets logged into the Tomcat catalina logs.
 Now I wanted to implement a functionality of periodically (per week)
analyzing the search logs of Solr and finding out the keywords searched. Is
there a way to do it using any of the existing functionality of Solr? If not,
has anybody tried this implementation with any open source tools?
Suggestions welcome. Awaiting a reply.


Thank you.


Prefix (facet.prefix) based auto-suggest on Multi-Valued field do not return results

2012-08-17 Thread Rajani Maski
Hi All,

 *When I do facet.prefix on a KEYWORDS field (this field is multi-valued),
I don't get a suggestion for the first key in this field.*

Example  :

I have 2 documents with the field KEYWORDS  containing multiple values.

arr name=KEYWORDS
str偏振式3D成像原理/str
str采用LED边缘发光的新技术/str
str高级降噪运算法及画质增强技术可/str
/arr

arr name=KEYWORDS
str紧凑机身,轻松携带/str
str节能低耗,持久续航/str
/arr



If I do it on the subsequent strings, I get the respective suggestions.

BUT if I do facet.prefix on the first string - facet.field=KEYWORDS&facet.prefix=偏振 -
there are no suggestions.



What can be the reason?





Thanks & Regards
Rajani


Re: Prefix (facet.prefix) based auto-suggest on Multi-Valued field do not return results

2012-08-17 Thread Rajani Maski
Hi,

I think this is because of the space observed - the facet value begins with
an empty space. Please see below:

<lst name="A_KEYS">
<int name=" 偏振式3D成像原理">3</int>
<int name="usb媒体播放">3</int>
<int name="手机智能遥控">3</int>
<int name=" 紧凑机身,轻松携带 ">3</int>
<int name="电脑键盘遥控">3</int>
... so on

*But why is this space inserted?* If you see below, in the list of keywords
taken from the search results, there is no space.

Thanks & Regards
Rajani



On Fri, Aug 17, 2012 at 3:02 PM, Rajani Maski rajinima...@gmail.com wrote:

 Hi All,

  *When I do facet.prefix on a KEYWORDS field (this field is multi-valued),
 I don't get a suggestion for the first key in this field.*

 Example  :

 I have 2 documents with the field KEYWORDS  containing multiple values.

 arr name=KEYWORDS
 str偏振式3D成像原理/str
 str采用LED边缘发光的新技术/str
 str高级降噪运算法及画质增强技术可/str
 /arr

 arr name=KEYWORDS
 str紧凑机身,轻松携带/str
 str节能低耗,持久续航/str
 /arr



 If I do it on the subsequent strings, I get the respective suggestions.

 BUT if I do facet.prefix on the first string - facet.field=KEYWORDS&facet.prefix=偏振 -
 there are no suggestions.



 What can be the reason?





 Thanks & Regards
 Rajani






Chinese character not encoded for facet.prefix but encoded for q field

2012-08-16 Thread Rajani Maski
The Chinese characters are not encoded for facet.prefix but are encoded for
the q field - BODY.

*Why? What might be the problem?*

This is done :
   <Connector port="8090" protocol="HTTP/1.1"
   connectionTimeout="20000"
   redirectPort="8443" URIEncoding="UTF-8"/>




[image: Inline image 2]


Paoding analyzer with solr for chinese

2012-08-08 Thread Rajani Maski
Hi All,

  As said in this blog site (http://java.dzone.com/articles/indexing-chinese-solr),
the paoding analyzer is much better for Chinese text, so I was trying to
implement it to get accurate results for Chinese text.

I followed the instructions specified on the below sites:
Site1: http://androidyou.blogspot.hk/2010/05/chinese-tokenizerlibrary-paoding-with.html
Site2: http://www.opensourceconnections.com/2011/12/23/indexing-chinese-in-solr/


After indexing, when I search on the same field with the same text, there are
no search results (numFound=0).

And the Luke tool is not showing any terms for the field that is indexed
with the below field type. Can anyone comment on what is going wrong?



*Schema field types for paoding:*

1) <fieldType name="paoding" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
     </analyzer>
   </fieldType>


And the analysis page result is:
[image: Inline image 2]

2) <fieldType name="paoding_chinese" class="solr.TextField">
     <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer">
     </analyzer>
   </fieldType>

Analysis on the field paoding_chinese throws this error:
[image: Inline image 3]



Thanks & Regards
Rajani


Re: Paoding analyzer with solr for chinese

2012-08-08 Thread Rajani Maski
Hi All,

  Any reply on this?



On Wed, Aug 8, 2012 at 3:23 PM, Rajani Maski rajinima...@gmail.com wrote:

 Hi All,

   As said in this blog site (http://java.dzone.com/articles/indexing-chinese-solr),
 the paoding analyzer is much better for Chinese text, so I was trying to
 implement it to get accurate results for Chinese text.

 I followed the instructions specified on the below sites:
 Site1: http://androidyou.blogspot.hk/2010/05/chinese-tokenizerlibrary-paoding-with.html
 Site2: http://www.opensourceconnections.com/2011/12/23/indexing-chinese-in-solr/


 After indexing, when I search on the same field with the same text, there
 are no search results (numFound=0).

 And the Luke tool is not showing any terms for the field that is indexed
 with the below field type. Can anyone comment on what is going wrong?



 *Schema field types for paoding:*

 1) <fieldType name="paoding" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
      </analyzer>
    </fieldType>


 And the analysis page result is:
 [image: Inline image 2]

 2) <fieldType name="paoding_chinese" class="solr.TextField">
      <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer">
      </analyzer>
    </fieldType>

 Analysis on the field paoding_chinese throws this error:
 [image: Inline image 3]



 Thanks & Regards
 Rajani





Re: Adding new field before import- using post.jar

2012-08-04 Thread Rajani Maski
Thank you for the reply.

8. How about extending the class XmlUpdateRequestHandler? Is it possible,
and is it a good method?


Regards
Rajani






On Fri, Aug 3, 2012 at 8:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 I hate to also add:

   6. Use DataImportHandler

 It can index Solr XML, and could add field values, either statically or by
 template glue if you need to combine multiple field values somehow.

 And in 4.0 you'll be able to use:

   7: scripting update processor

 Erik


 On Aug 3, 2012, at 10:51 , Jack Krupansky wrote:

  1. Google for XSLT tools.
  2. Write a script that loads the XML, adds the fields, and writes the
 updated XML.
  3. Same as #2, but using Java.
  4. If the fields are constants, set default values in the schema and
 then the documents will automatically get those values when added. Take the
 default value attributes out of the schema once you have input documents
 that actually have the new field values.
  5. Hire a consultant.
 
  -- Jack Krupansky
 
  -Original Message- From: Rajani Maski
  Sent: Friday, August 03, 2012 5:37 AM
  To: solr-user@lucene.apache.org
  Subject: Adding new field before import- using post.jar
 
  Hi all,
 
  I have xmls in a folder in the standard solr xml format. I was simply
 using
  SimplePostTool.java to import these xmls to solr. Now I have to add 3 new
  fields to each document in the xml before doing a post.
 
  What can be the effective way for doing this?
 
 
  Thanks & Regards
  Rajani




Re: Adding new field before import- using post.jar

2012-08-04 Thread Rajani Maski
They are coming from a text file.

The Solr XML input documents are XMLs in a folder location. (To import these
XMLs, I was using the simple post.jar.) Now, for each XML there is a need to
add 3 new external fields, reading values from the text file.
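
One way to do that per-file rewrite in plain Java before posting -- a
sketch (the field name/value handling is illustrative, not from this
thread):

import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AddFieldsToSolrXml {
  // adds one extra <field> to every <doc> in a Solr XML file
  public static void addField(File solrXml, String name, String value) throws Exception {
    Document dom = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(solrXml);
    NodeList docs = dom.getElementsByTagName("doc");
    for (int i = 0; i < docs.getLength(); i++) {
      Element field = dom.createElement("field");
      field.setAttribute("name", name);
      field.setTextContent(value);
      docs.item(i).appendChild(field);
    }
    // write the modified DOM back out, then post the file as before
    TransformerFactory.newInstance().newTransformer()
        .transform(new DOMSource(dom), new StreamResult(solrXml));
  }
}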


Regards
Rajani

On Sat, Aug 4, 2012 at 10:59 PM, Jack Krupansky j...@basetechnology.com wrote:

 Where are the values of the three new fields coming from?

 Are they constant/default values?
 Computed from other fields in the XML?
 From other XML files?
 From a text file?
 From a database?
 Or where?

 So, given a specific Solr XML input document, how will you be accessing
 the three field values to add?

 This may guide the approach that you could/should take.


 -- Jack Krupansky

 -Original Message- From: Rajani Maski
 Sent: Friday, August 03, 2012 5:37 AM
 To: solr-user@lucene.apache.org
 Subject: Adding new field before import- using post.jar

 Hi all,

 I have xmls in a folder in the standard solr xml format. I was simply using
 SimplePostTool.java to import these xmls to solr. Now I have to add 3 new
 fields to each document in the xml before doing a post.

 What can be the effective way for doing this?


 Thanks & Regards
 Rajani



Re: split on white space and then EdgeNGramFilterFactory

2012-08-03 Thread Rajani Maski
Yes this works, Thank you.


Regards
Rajani

On Thu, Aug 2, 2012 at 6:04 PM, Jack Krupansky j...@basetechnology.com wrote:

 Only do the ngram filter at index time. So, add a query-time analyzer to
 that field type but without the ngram filter.

 Also, add debugQuery to your query request to see what Lucene query is
 generated.

 And, use the Solr admin analyzer to validate both index-time and
 query-time analysis of your terms.

 -- Jack Krupansky

 -Original Message- From: Rajani Maski
 Sent: Thursday, August 02, 2012 7:26 AM
 To: solr-user@lucene.apache.org
 Subject: split on white space and then EdgeNGramFilterFactory


 Hi,

   I wanted to split on whitespace and then apply EdgeNGramFilterFactory.

 Example: A field in a document has the text content: "smart phone, i24
 xpress exchange offer, 500 dollars"

 smart s sm sma smar smart
 phone p ph pho phon phone
 i24  i i2 i24
 xpress x xp xpr xpre xpres xpress

 so on.

 If I search on "xpres", I should get this document record matched.

 What field type can support this?

 I was trying with the below one but was not able to achieve the above
 requirement.

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Any suggestions?

 Thanks,
 Rajani



split on white space and then EdgeNGramFilterFactory

2012-08-02 Thread Rajani Maski
Hi,

   I wanted to split on whitespace and then apply EdgeNGramFilterFactory.

Example: A field in a document has the text content: "smart phone, i24
xpress exchange offer, 500 dollars"

smart s sm sma smar smart
phone p ph pho phon phone
i24  i i2 i24
xpress x xp xpr xpre xpres xpress

so on.

If I search on "xpres", I should get this document record matched.

What field type can support this?

I was trying with the below one but was not able to achieve the above
requirement.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Any suggestions?

Thanks,
Rajani


Re: Autocomplete terms from the middle of name/description of a Doc

2012-07-28 Thread Rajani Maski
Hi,

   One approach for this can be to get fact.prefix results for prefix based
suggests and for suggesting names from middle of doc what you can do is
index that name field with white space and edge ngram filter; search on
that field with prefix key word and fl=title only.. Then concatenate both :
facet prefix results and doc fields obtained for that search.

Ex: user searched for lcd
query should be  :  q=name_edgramed=lcdfacet.prefix= lcd fl=
name_edgramed.

You will get documents matched results having this keyword and also faceted
results with this prefix.
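
In SolrJ, that combined request looks roughly like this (a sketch; the
field names follow the example above and are assumptions):

import org.apache.solr.client.solrj.SolrQuery;

public class SuggestQueryDemo {
  public static SolrQuery forPrefix(String prefix) {
    // match documents on the edge-ngram field, facet on the full values
    SolrQuery q = new SolrQuery("name_edgramed:" + prefix);
    q.setFields("name_edgramed");
    q.setFacet(true);
    q.addFacetField("name");
    q.setFacetPrefix(prefix);
    q.setFacetMinCount(1);
    return q;
  }
}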

--Rajani







On Thu, Jul 26, 2012 at 12:21 AM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:


  Suppose I have a product with a title='kMix Espresso maker'. If I
 tokenize
  this and put the result in product_tokens I should get
  '[kMix][Espresso][maker]'.
 
  If now I try to search with facet.field='product_tokens' and
  facet.prefix='espresso' I should get only 'espresso' while I want 'kMix
  Espresso maker'.

 Yes, you are probably right. I did use this approach at some point. Your
 remark has made me check my code again.
 I was using ngram in the end.

 (facet.prefix on tokenized fields might work in certain circumstances
 where you can get the actual value from the string field (or its facet) in
 parallel.)

 This is the jquery autocomplete plugin instantiation:

 $(function() {
     $("#qterm").autocomplete({
         minLength: 1,
         source: function(request, response) {
             jQuery.ajax({
                 url: "/solr/select",
                 dataType: "json",
                 data: {
                     q: "title_ngrams:\"" + request.term + "\"",
                     rows: 0,
                     facet: true,
                     "facet.field": "title",
                     "facet.mincount": 1,
                     "facet.sort": "index",
                     "facet.limit": 10,
                     fq: "end_date:[NOW TO *]",
                     wt: "json"
                 },
                 success: function(data) {
                     /*var result = jQuery.map(
                         data.facet_counts.facet_fields.title,
                         function(item, index) {
                             if (index % 2)
                                 return null;
                             else return {
                                 //label: item,
                                 value: item
                             }
                         });*/
                     var result = [];
                     var facets = data.facet_counts.facet_fields.title;
                     var j = 0;
                     for (i = 0; i < facets.length; i = i + 2) {
                         result[j] = facets[i];
                         j = j + 1;
                     }
                     response(result);
                 }
             });
         }
     });
 });

 And here is the fieldtype "ngram" for title_ngrams; "title" is a string-type
 field.

 <!-- NGram configuration for searching for wordparts
      without the use of wildcards.
      This is for suggesting search terms e.g. sourcing
      an autocomplete widget. -->
 <fieldType name="ngram" class="solr.TextField">
     <analyzer type="index">
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LengthFilterFactory" min="1" max="500"/>
         <filter class="solr.TrimFilterFactory"/>
         <filter class="solr.ISOLatin1AccentFilterFactory"/>
         <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
                 splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
                 generateNumberParts="1" catenateAll="1" preserveOriginal="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
                 side="front"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer 

Re: Significance of Analyzer Class attribute

2012-07-26 Thread Rajani Maski
Hi All,

  Thank you for the replies.



--Regards
Rajani


On Fri, Jul 27, 2012 at 9:58 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 :  When I specify analyzer class in schema,  something
 :  like below and do
 :  analysis on this field in analysis page : I cant  see
 :  verbose output on
 :  tokenizer and filters

 The reason for that is that if you use an explicit Analyzer
 implimentation, the analysis tool doesn't know what the individual phases
 of hte tokenfilters are -- the Analyzer API doesn't expose that
 information (some Analyzers may be monolithic and not made up of
 individual TokenFilters)


  :  fieldType name=text_chinese
 :  class=solr.TextField
 :analyzer
 :  class=org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
 :tokenizer
 ...

 : Above config is somehow wrong. You cannot use both analyzer combined
 : with tokenizer and filter altogether. If you want to use lucene analyzer
 : in schema.xml there should be only analyzer definition.

 Right.  What's happening here is that since a class is specified for the
 analyzer, it is ignoring the tokenizer+tokenfilters listed.  I've opened a
 bug to add better error checking to catch these kinds of configuration
 mistakes...

 https://issues.apache.org/jira/browse/SOLR-3683


 -Hoss


Significance of Analyzer Class attribute

2012-07-25 Thread Rajani Maski
Hi, what is the significance of the analyzer "class" attribute?


When I specify the analyzer class in the schema, something like below, and do
analysis on this field in the analysis page, I can't see verbose output on
the tokenizer and filters:

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>


*But if I don't add the analyzer class, I can see the verbose output based on
the tokenizer and filters applied.*

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>

Why is it that I can't see it for the above case? What happens when I specify
the analyzer class? Does it take any default if I do not mention the class
attribute in the analyzer tag?



Thanks  Regards
Rajani


Re: Facet on all the dynamic fields with *_s feature

2012-07-17 Thread Rajani Maski
Hi Users,

  Any reply for the query below?


On Mon, Jul 16, 2012 at 6:27 PM, Rajani Maski rajinima...@gmail.com wrote:

 In this URL - https://issues.apache.org/jira/browse/SOLR-247 -

 there are *patches*, and one patch with the name *SOLR-247-FacetAllFields*.

 Will that help me to fix this problem?

 If yes, how do I add this as a Solr plugin?


 Thanks & Regards
 Rajani




On Mon, Jul 16, 2012 at 5:04 PM, Darren Govoni dar...@ontrenet.com wrote:

 You'll have to query the index for the fields and sift out the _s ones
 and cache them or something.

 On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:

  Yes, This feature will solve the below problem very neatly.
 
  All,
 
   Is there any approach to achieve this for now?
 
 
  --Rajani
 
  On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky 
 j...@basetechnology.com wrote:
 
   The answer appears to be No, but it's good to hear people express an
   interest in proposed features.
  
   -- Jack Krupansky
  
   -Original Message- From: Rajani Maski
   Sent: Sunday, July 15, 2012 12:02 AM
   To: solr-user@lucene.apache.org
   Subject: Facet on all the dynamic fields with *_s feature
  
  
   Hi All,
  
 Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic
 field
   with facet.field=*_s
  
  Link  :  https://issues.apache.org/jira/browse/SOLR-247
  
  
  
If it is not fixed, any suggestion on how do I achieve this?
  
  
   My requirement is just same as this one :
    http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none
 
  
  
   Regards
   Rajani
  






Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
Yes, This feature will solve the below problem very neatly.

All,

 Is there any approach to achieve this for now?


--Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com wrote:

 The answer appears to be No, but it's good to hear people express an
 interest in proposed features.

 -- Jack Krupansky

 -Original Message- From: Rajani Maski
 Sent: Sunday, July 15, 2012 12:02 AM
 To: solr-user@lucene.apache.org
 Subject: Facet on all the dynamic fields with *_s feature


 Hi All,

   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
 with facet.field=*_s

   Link  :  https://issues.apache.org/jira/browse/SOLR-247



  If it is not fixed, any suggestion on how I can achieve this?


 My requirement is just same as this one :
 http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none


 Regards
 Rajani



Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
In this URL  -  https://issues.apache.org/jira/browse/SOLR-247

there are *patches*, and one patch is named *SOLR-247-FacetAllFields*.

Will that help me to fix this problem?

If yes, how do I add this as a Solr plugin?


Thanks & Regards
Rajani




On Mon, Jul 16, 2012 at 5:04 PM, Darren Govoni dar...@ontrenet.com wrote:

 You'll have to query the index for the fields and sift out the _s ones
 and cache them or something.

 On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:

  Yes, this feature would solve the problem below very neatly.
 
  All,
 
   Is there any approach to achieve this for now?
 
 
  --Rajani
 
  On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com
 wrote:
 
   The answer appears to be No, but it's good to hear people express an
   interest in proposed features.
  
   -- Jack Krupansky
  
   -Original Message- From: Rajani Maski
   Sent: Sunday, July 15, 2012 12:02 AM
   To: solr-user@lucene.apache.org
   Subject: Facet on all the dynamic fields with *_s feature
  
  
   Hi All,
  
 Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic
 field
   with facet.field=*_s
  
  Link  :  https://issues.apache.org/jira/browse/SOLR-247
  
  
  
    If it is not fixed, any suggestion on how I can achieve this?
  
  
   My requirement is just same as this one :
    http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none
 
  
  
   Regards
   Rajani
  





Facet on all the dynamic fields with *_s feature

2012-07-14 Thread Rajani Maski
Hi All,

   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
with facet.field=*_s

   Link  :  https://issues.apache.org/jira/browse/SOLR-247



  If it is not fixed, any suggestion on how I can achieve this?


My requirement is just same as this one :
http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none


Regards
Rajani
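
In the replies archived above, Darren suggests querying the index for its
field names, sifting out the *_s ones, and caching them. A minimal sketch of
that workaround, assuming a default single-core Solr at localhost:8983 with
the Luke request handler and the JSON response writer enabled:

import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumption: default single-core setup

# Ask the Luke request handler for every field actually present in the index.
with urllib.request.urlopen(SOLR + "/admin/luke?numTerms=0&wt=json") as r:
    fields = json.load(r)["fields"]

# Keep only the *_s dynamic fields; cache this list between requests.
s_fields = [name for name in fields if name.endswith("_s")]

# Facet on all of them in a single query.
params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("wt", "json")]
params += [("facet.field", name) for name in s_fields]
with urllib.request.urlopen(SOLR + "/select?" + urllib.parse.urlencode(params)) as r:
    print(json.load(r)["facet_counts"]["facet_fields"])

The field list only changes when documents introduce new dynamic fields, so
caching it (as Darren notes) avoids the extra Luke call on every search.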


Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski
Hi All,

   I tried to index with UTF-8 encoding, but the issue is still not fixed.
Please see my inputs below.

*Indexed XML:*
<?xml version="1.0" encoding="UTF-8" ?>
<add>
  <doc>
    <field name="ID">0.100</field>
    <field name="BODY">µ</field>
  </doc>
</add>
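
One way to be sure this document actually goes over the wire as UTF-8 is to
declare the charset on the POST explicitly; a minimal sketch, assuming the
stock /update handler and the XML above saved as doc.xml (a hypothetical
file name):

import urllib.request

# Read the raw UTF-8 bytes and declare the charset in the Content-Type header.
with open("doc.xml", "rb") as f:
    body = f.read()
req = urllib.request.Request(
    "http://localhost:8983/solr/update?commit=true",
    data=body,
    headers={"Content-Type": "text/xml; charset=UTF-8"})
urllib.request.urlopen(req).read()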

*Search query:* BODY:µ

numFound: 0 results obtained.

*What can be the reason for this? How do I need to form the search query so
that the above document is found?*


Thanks & Regards,
Rajani



2012/4/2 Rajani Maski rajinima...@gmail.com

 Thank you for the reply.



 On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 : We have data having such symbols like :  µ
 : Indexed data has  -Dose:0 µL
 : Now , when  it is searched as  - Dose:0 µL
...
 : Query Q value observed  : <str name="q">S257:0 µL/injection</str>

 First off: your "when searched as" example does not match up to your
 "Query Q observed" value (ie: field queries, extra /injection text at
 the end), suggesting that you maybe cut/paste something you didn't mean to
 -- so take the rest of this advice with a grain of salt.

 If I ignore your "when it is searched as" example and focus entirely on
 what you say you've indexed the data as, and the Q value you are using (in
 what looks like the echoParams output), then the first thing that jumps out
 at me is that it looks like your servlet container (or perhaps your web
 browser, if that's where you tested this) is not dealing with the Unicode
 correctly -- because although I see a "µ" in the first three lines I
 quoted above (UTF8: 0xC2 0xB5), in your "value observed" I'm seeing it
 preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ"
 did not get URL encoded properly when the request was made to your servlet
 container?

 In particular, you might want to take a look at...


 https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
 http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
 The example/exampledocs/test_utf8.sh script included with solr




 -Hoss





Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski
Fine. Thank you. I will look at it.


On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson erickerick...@gmail.com wrote:

 Please review:
 http://wiki.apache.org/solr/UsingMailingLists

 Especially the bit about adding debugQuery=on
 and showing the results. You're asking people
 to guess at solutions without providing much
 in the way of context.

 You might try looking at your index with Luke to
 see what's actually in your index, or perhaps
 TermsComponent


 Best
 Erick

 On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski rajinima...@gmail.com
 wrote:
  Hi All,
 
   I tried to index with UTF-8 encoding, but the issue is still not fixed.
  Please see my inputs below.
 
  *Indexed XML:*
  <?xml version="1.0" encoding="UTF-8" ?>
  <add>
    <doc>
      <field name="ID">0.100</field>
      <field name="BODY">µ</field>
    </doc>
  </add>
 
  *Search query:* BODY:µ

  numFound: 0 results obtained.

  *What can be the reason for this? How do I need to form the search query so
  that the above document is found?*
 
 
  Thanks & Regards,
  Rajani
 
 
 
  2012/4/2 Rajani Maski rajinima...@gmail.com
 
  Thank you for the reply.
 
 
 
  On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter 
 hossman_luc...@fucit.org
   wrote:
 
 
  : We have data having such symbols like :  µ
  : Indexed data has  -Dose:0 µL
  : Now , when  it is searched as  - Dose:0 µL
 ...
  : Query Q value observed  : <str name="q">S257:0 µL/injection</str>

  First off: your "when searched as" example does not match up to your
  "Query Q observed" value (ie: field queries, extra /injection text at
  the end), suggesting that you maybe cut/paste something you didn't mean to
  -- so take the rest of this advice with a grain of salt.

  If I ignore your "when it is searched as" example and focus entirely on
  what you say you've indexed the data as, and the Q value you are using (in
  what looks like the echoParams output), then the first thing that jumps out
  at me is that it looks like your servlet container (or perhaps your web
  browser, if that's where you tested this) is not dealing with the Unicode
  correctly -- because although I see a "µ" in the first three lines I
  quoted above (UTF8: 0xC2 0xB5), in your "value observed" I'm seeing it
  preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ"
  did not get URL encoded properly when the request was made to your servlet
  container?
 
  In particular, you might want to take a look at...
 
 
 
 https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
  http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
  The example/exampledocs/test_utf8.sh script included with solr
 
 
 
 
  -Hoss
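
A quick way to act on Erick's TermsComponent suggestion above is to list the
raw terms indexed in the field; a minimal sketch, assuming the default
/terms handler is registered in solrconfig.xml and the field is named BODY
as in this thread:

import json
import urllib.request

# List the terms indexed in BODY, to check whether the micro sign
# actually made it into the index intact.
url = "http://localhost:8983/solr/terms?terms.fl=BODY&terms.limit=20&wt=json"
with urllib.request.urlopen(url) as r:
    print(json.load(r)["terms"]["BODY"])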
 
 
 



Re: Trouble handling Unit symbol

2012-04-01 Thread Rajani Maski
Thank you for the reply.



On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : We have data having such symbols like :  µ
 : Indexed data has  -Dose:0 µL
 : Now , when  it is searched as  - Dose:0 µL
...
 : Query Q value observed  : <str name="q">S257:0 µL/injection</str>

 First off: your "when searched as" example does not match up to your
 "Query Q observed" value (ie: field queries, extra /injection text at
 the end), suggesting that you maybe cut/paste something you didn't mean to
 -- so take the rest of this advice with a grain of salt.

 If I ignore your "when it is searched as" example and focus entirely on
 what you say you've indexed the data as, and the Q value you are using (in
 what looks like the echoParams output), then the first thing that jumps out
 at me is that it looks like your servlet container (or perhaps your web
 browser, if that's where you tested this) is not dealing with the Unicode
 correctly -- because although I see a "µ" in the first three lines I
 quoted above (UTF8: 0xC2 0xB5), in your "value observed" I'm seeing it
 preceded by a "Â" (UTF8: 0xC3 0x82) ... suggesting that perhaps the "µ"
 did not get URL encoded properly when the request was made to your servlet
 container?

 In particular, you might want to take a look at...


 https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
 http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
 The example/exampledocs/test_utf8.sh script included with solr




 -Hoss
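
In practice, the fix for the diagnosis above is usually the servlet
container's URI encoding. On Tomcat that is a one-attribute change in
server.xml (port and protocol here are illustrative; keep whatever your
existing Connector already uses):

<!-- server.xml: decode request URIs as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />

Without URIEncoding="UTF-8", Tomcat decodes query strings as ISO-8859-1,
which turns the two UTF-8 bytes of µ (0xC2 0xB5) into exactly the "µ"
pair observed above.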


Trouble handling Unit symbol

2012-03-30 Thread Rajani Maski
Hi,

We have data having such symbols like :  µ


Indexed data has  -Dose:0 µL
Language type - English


Now , when  it is searched as  - Dose:0 µL
Number of document matched = 0


Query Q value observed  : <str name="q">S257:0 µL/injection</str>




*Any solution to handle such cases?*

Thanks & Regards,
Rajani


Solr Optimization Fail

2011-12-16 Thread Rajani Maski
Hi,

 When we optimize, it actually reduces the data size, right?

I have an index of size 6 GB (5 million documents). The index was created
with commits every 1 documents.

Now I was trying to optimize with the HTTP optimize command. When I did
that, the data size became 12 GB. Why might this have happened?

Can anyone please suggest a fix?

Thanks
Rajani


Re: Solr Optimization Fail

2011-12-16 Thread Rajani Maski
These parameters are commented out in my solrconfig.xml;
see the parameters below.

<!-- The RunExecutableListener executes an external command from a
  hook such as postCommit or postOptimize.
     exe - the name of the executable to run
     dir - dir to use as the current working directory. default="."
     wait - the calling thread waits until the executable returns.
default="true"
     args - the arguments to pass to the program.  default=nothing
     env - environment variables to set.  default=nothing
  -->
<!-- A postCommit event is fired after every commit or optimize command
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>
-->
<!-- A postOptimize event is fired only after every optimize command
<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
-->


When I optimize an index of size 400 MB, it reduces the size of the data
folder to 200 MB, but when the data is huge it doubles it.
Why is that so?

Should optimization actually reduce the size of the data, or does it
just improve search query performance?






On Fri, Dec 16, 2011 at 5:40 PM, Juan Pablo Mora jua...@informa.es wrote:

 Maybe you are generating a snapshot of your index attached to the
 optimize?
 Look for post-commit or post-optimize events in your solrconfig.xml.

 
 From: Rajani Maski [rajinima...@gmail.com]
 Sent: Friday, 16 December 2011 11:11
 To: solr-user@lucene.apache.org
 Subject: Solr Optimization Fail

 Hi,

  When we optimize, it actually reduces the data size, right?

 I have an index of size 6 GB (5 million documents). The index was created
 with commits every 1 documents.

 Now I was trying to optimize with the HTTP optimize command. When I did
 that, the data size became 12 GB. Why might this have happened?

 Can anyone please suggest a fix?

 Thanks
 Rajani



Re: Solr Optimization Fail

2011-12-16 Thread Rajani Maski
Oh yes, on Windows, using Java 1.6 and Solr 1.4.1.

Ok let me try that one...

Thank you so much.

Regards,
Rajani



2011/12/16 Tomás Fernández Löbbe tomasflo...@gmail.com

 Are you on Windows? There is a JVM bug that makes Solr keep the old files,
 even if they are not used anymore. The files will eventually be removed,
 but if you want them out of there immediately, try optimizing twice; the
 second optimize doesn't do much, but it will remove the old files.

 On Fri, Dec 16, 2011 at 9:10 AM, Juan Pablo Mora jua...@informa.es
 wrote:

  Maybe you are generating a snapshot of your index attached to the
  optimize?
  Look for post-commit or post-optimize events in your solrconfig.xml.
 
  
  From: Rajani Maski [rajinima...@gmail.com]
  Sent: Friday, 16 December 2011 11:11
  To: solr-user@lucene.apache.org
  Subject: Solr Optimization Fail
 
  Hi,
 
   When we optimize, it actually reduces the data size, right?

  I have an index of size 6 GB (5 million documents). The index was created
  with commits every 1 documents.

  Now I was trying to optimize with the HTTP optimize command. When I did
  that, the data size became 12 GB. Why might this have happened?

  Can anyone please suggest a fix?
 
  Thanks
  Rajani
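
For reference, the HTTP optimize command plus the optimize-twice workaround
described above looks roughly like this (a sketch, assuming the default
single-core Solr URL):

import urllib.request

# Issue optimize twice: on Windows the second pass removes the old
# segment files that the JVM bug kept the first pass from deleting.
for _ in range(2):
    urllib.request.urlopen(
        "http://localhost:8983/solr/update?optimize=true").read()

As to why the index doubles during the operation: optimize merges every
segment into one, and the old and new segments coexist on disk until the
merge finishes, so a transient doubling of the data directory is expected
even when nothing goes wrong.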