Re: Issue with getting highlight with hl.maxAnalyzedChars = -1

2013-05-13 Thread Dmitry Kan
You didn't say what exactly is going wrong..


On Fri, May 10, 2013 at 2:19 PM, meghana meghana.rav...@amultek.com wrote:

 I am facing one weird issue while setting hl.maxAnalyzedChars to -1 to
 fetch
 highlight for some random records, for other records its working fine.

 Below is my solr query

 http://localhost:8080/solr/core0/select?q=(text:new year) AND
 (id:2343287)&hl=on&hl.fl=text&hl.fragsize=500&hl.maxAnalyzedChars=-1

 If I remove hl.maxAnalyzedChars=-1 from the above query, or set it to some
 positive value (higher than the text field length), then it returns the
 record with proper highlighting.

 But my text field is very long and I don't want to limit it, which is why I
 need to set hl.maxAnalyzedChars to -1. Please help me solve this.
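A quick sketch of the request above, rebuilt with explicit `&` separators (the archived post lost them); the host, core name, and id are the poster's own values:

```python
from urllib.parse import urlencode

# Rebuild the highlighting request as a properly separated query string.
# hl.maxAnalyzedChars=-1 asks Solr to analyze the entire field value
# instead of stopping after the default character limit.
params = {
    "q": "(text:new year) AND (id:2343287)",
    "hl": "on",
    "hl.fl": "text",
    "hl.fragsize": "500",
    "hl.maxAnalyzedChars": "-1",
}
url = "http://localhost:8080/solr/core0/select?" + urlencode(params)
print(url)
```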



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Issue-with-getting-highlight-with-hl-maxAnalyzedChars-1-tp4062269.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Negative Boosting at Recent Versions of Solr?

2013-05-13 Thread Toke Eskildsen
On Fri, 2013-05-10 at 16:49 +0200, Jason Hellman wrote:
   -23.0 = product(float(price)=11.5,const(-2))
 I wonder how fantastically this can be abused now?

Mmm... Products of negative scores. I foresee "The products matching an
uneven number of search terms gets much higher scores. Why?" questions
in the future.

- Toke Eskildsen, State and University Library, Denmark



Re: Unable to load environment info from /solr/collection1/admin/system?wt=json

2013-05-13 Thread Furkan KAMACI
I have tried to open this URL:

ip:8983/solr/

and I get this error:

INFO: [collection1] CLOSING SolrCore org.apache.solr.core.SolrCore@62ad1b5c
May 13, 2013 10:38:40 AM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=0,autocommit
maxTime=15000ms,autocommits=0,soft
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}

May 13, 2013 10:38:40 AM org.apache.solr.core.SolrCore closeSearcher
INFO: [collection1] Closing main searcher on request.
May 13, 2013 10:38:40 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Already closed
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:336)
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:321)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:244)
at org.apache.solr.core.SolrCore.getIndexDir(SolrCore.java:223)
at org.apache.solr.handler.admin.SystemInfoHandler.getCoreInfo(SystemInfoHandler.java:112)
at org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:78)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)

May 13, 2013 10:38:40 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/system
params={_=136843072&wt=json} status=500 QTime=1
May 13, 2013 10:38:40 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: Already closed
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:336)
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:321)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:244)
at org.apache.solr.core.SolrCore.getIndexDir(SolrCore.java:223)
at org.apache.solr.handler.admin.SystemInfoHandler.getCoreInfo(SystemInfoHandler.java:112)
at org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:78)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at

Re: MultiValue

2013-05-13 Thread manju16832003
Hi All,
I managed to *solve* the issue I had posted earlier with respect to
multiValued.

Here is the query; it is supposed to be configured this way in *data-config.xml*.
Description: in the below, the first query has an associated table, images. Each
person can have many images. Here the JSON/XML returns all the images
associated with the person as one block.

<document name="manju">
  <entity name="list" dataSource="manjudb" query="select
member_id,MemberName FROM member">
    <entity name="imagelist" dataSource="manjudb" query="SELECT imagepath FROM
images WHERE member_id='${list.member_id}'">
    </entity>
  </entity>
</document>

and the *schema.xml* fields for the above query look like

<field name="member_id" type="int" indexed="true" stored="true"
required="true"/>
<field name="MemberName" type="string" indexed="true" stored="true"/>
<field name="imagepath" type="string" indexed="true" stored="true"
*multiValued="true"*/>

Output:
response: {
numFound: 3,
start: 0,
docs: [
  {
MemberName: Vettel,
member_id: 1,
_version_: 1434904021528739800
  },
  {
MemberName: Schumacher,
member_id: 2,
imagepath: [ //As you could see here that Three rows from the
table *images* returned as one JSON object.
  c:images\\etc\\test.jpg,
  c:images\\etc\\test211.jpg,
  c:images\\etc\\test2343434.jpg,
  C:manju
],
_version_: 1434904021541322800
  },
  {
MemberName: J.Button,
member_id: 3,
_version_: 143490402154342
  }
]
  }
}

Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4062863.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multiValued schema example (SOLVED)

2013-05-13 Thread manju16832003
Hi All,
I managed to *solve* the issue I had posted earlier with respect to
multiValued.

Here is the query; it is supposed to be configured this way in *data-config.xml*.
Description: in the below, the first query has an associated table, images. Each
person can have many images. Here the JSON/XML returns all the images
associated with the person as one block.

<document name="manju">
  <entity name="list" dataSource="manjudb" query="select
member_id,MemberName FROM member">
    <entity name="imagelist" dataSource="manjudb" query="SELECT imagepath FROM
images WHERE member_id='${list.member_id}'">
    </entity>
  </entity>
</document>

and the *schema.xml* fields for the above query look like

<field name="member_id" type="int" indexed="true" stored="true"
required="true"/>
<field name="MemberName" type="string" indexed="true" stored="true"/>
<field name="imagepath" type="string" indexed="true" stored="true"
*multiValued="true"*/>

Output:
"response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "MemberName": "Vettel",
        "member_id": 1,
        "_version_": 1434904021528739800
      },
      {
        "MemberName": "Schumacher",
        "member_id": 2,
        "imagepath": [ // as you can see here, the three rows from the
                       // *images* table are returned as one JSON array
          "c:images\\etc\\test.jpg",
          "c:images\\etc\\test211.jpg",
          "c:images\\etc\\test2343434.jpg",
          "C:manju"
        ],
        "_version_": 1434904021541322800
      },
      {
        "MemberName": "J.Button",
        "member_id": 3,
        "_version_": 143490402154342
      }
    ]
  }
}

Thanks





--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiValued-schema-example-SOLVED-tp4062209p4062864.html
Sent from the Solr - User mailing list archive at Nabble.com.


Best way to design a story and comments schema.

2013-05-13 Thread samabhiK
Hi, I wish to know how to best design a schema to store comments in stories /
articles posted.
I have a set of fields:
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content" type="text_general" indexed="false" stored="true"/>
Users can post comments on a post, and I should be able to retrieve
these comments and show them alongside the original post. I only need to show
the last 3 comments plus a facet of the remaining comments, which the user
can click to see the rest of the comments (something like Facebook does).
One alternative I could think of was adding a dynamic field for all
comments:
<dynamicField name="comment_*" type="string" indexed="false" stored="true"/>
So, to store each comment, I would send Solr a text of the form -
For Field Name: comment_n Value: [Commenter Name]:[Commenter ID]:[Actual
Comment Text]
And to keep the count of those comments, I could use another field like so:
<field name="comment_count" type="int" indexed="true" stored="true"/>
With this approach, I will have to do some recalculation when a comment is
deleted by the user, but I can still manage to show the comments correctly.
My idea is to find the best solution for this scenario, one that is fast
and also simple.
Kindly suggest.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
Sent from the Solr - User mailing list archive at Nabble.com.

Solr Sorting Algorithm

2013-05-13 Thread Sandeep Mestry
Good Morning All,

The alphabetical sorting is causing slight issues as below:

I have 3 documents with title value as below:

1) Acer Palmatum (Tree)
2) Aceraceae (Tree Family)
3) Acer Pseudoplatanus (Tree)

I have created title_sort field which is defined with field type as
alphaNumericalSort (that comes with solr example schema)

When I apply the sort order (sort=title_sort asc), I get the results as:

Aceraceae (Tree Family)
Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)

But, the expected order is (spaces first),

Acer Palmatum (Tree)
Acer Pseudoplatanus (Tree)
Aceraceae (Tree Family)

My unit test uses the Collections.sort method and gets the expected
results, but I'm not sure why Solr is doing it differently.

From the Collections.sort API docs, I can see that it uses a modified merge
sort. Could you tell me which algorithm Solr follows for its sorting logic,
and whether there is any other approach I can take?
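As a sanity check, plain codepoint comparison (which is what Collections.sort with natural String ordering does: space U+0020 sorts before letter "a" U+0061) produces the expected order; this is just a sketch of that comparison, not of Solr's sort field analysis:

```python
titles = [
    "Aceraceae (Tree Family)",
    "Acer Palmatum (Tree)",
    "Acer Pseudoplatanus (Tree)",
]

# Python's sorted() compares strings codepoint by codepoint, like
# Java's String.compareTo: "Acer " < "Acera" because ' ' < 'a'.
by_codepoint = sorted(titles)
for t in by_codepoint:
    print(t)
```

If Solr orders "Aceraceae" first, its sort field's analysis chain is presumably discarding or normalizing the whitespace before comparison.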

Many Thanks,
Sandeep


Mandatory words search in SOLR

2013-05-13 Thread Kamal Palei
Hi SOLR Experts
When I search documents with the keywords *java, mysql*, I get the
documents containing either *java* or *mysql* or both.

Is it possible to get only the documents that contain both *java* and *mysql*?

In that case, what would the query look like?

Thanks a lot
Kamal


Re: Mandatory words search in SOLR

2013-05-13 Thread Rafał Kuć
Hello!

Change the default query operator. For example, add q.op=AND to
your query.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi SOLR Experts
 When I search documents with keyword as *java, mysql* then I get the
 documents containing either *java* or *mysql* or both.

 Is it possible to get the documents those contains both *java* and *mysql*.

 In that case, how the query would look like.

 Thanks a lot
 Kamal



CJK question

2013-05-13 Thread Bernd Fehling
A question about CJK: how is U+3000 handled?

U+3000 belongs to CJK Symbols and Punctuation and is named IDEOGRAPHIC
SPACE.

Is it wrong if I just map it to U+0020 (SPACE)?

What does the CJK analyzer do with U+3000?

If two CJK words have U+3000 between them, does that mean the two words
belong together, so that changing U+3000 to U+0020 would break the meaning
of the whole CJK word?

Actually I have no idea about CJK.
Any help welcome.

Bernd


RE: CJK question

2013-05-13 Thread Markus Jelsma
Hi,

It uses the StandardAnalyzer which does split on IDEOGRAPHIC SPACE.
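For what it's worth, a quick sketch of the Unicode properties involved (not Solr itself) suggests the mapping is safe: U+3000 is classified as a space separator, so replacing it with U+0020 preserves the word boundary rather than gluing the two words together:

```python
import unicodedata

# IDEOGRAPHIC SPACE is in Unicode general category "Zs" (Space Separator),
# the same category as the ordinary space U+0020.
category = unicodedata.category("\u3000")
print(category)

# Two CJK words separated by IDEOGRAPHIC SPACE; mapping it to a plain
# space keeps them as two separate words.
text = "\u65e5\u672c\u3000\u8a9e"
normalized = text.replace("\u3000", " ")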

Cheers,
Markus
 
 
-Original message-
 From:Bernd Fehling bernd.fehl...@uni-bielefeld.de
 Sent: Mon 13-May-2013 13:36
 To: solr-user@lucene.apache.org
 Subject: CJK question
 
 A question about CJK, how will U+3000 be handled?
 
 U+3000 belongs to CJK Symbols and Punctuation and is named IDEOGRAPHIC 
 SPACE.
 
 Is it wrong if I just map it to U+0020 (SPACE)?
 
 What is CJK Analyzer doing with U+3000?
 
 If two CJK words have U+3000 inside, does it mean these two CJK words
 belong together and changing U+3000 to U+0020 will break the meaning of the
 whole CJK word?
 
 Actually I have no idea about CJK.
 Any help welcome.
 
 Bernd
 


Re: Mandatory words search in SOLR

2013-05-13 Thread Kamal Palei
Hi Rafał Kuć
I added q.op=AND as you suggested. Although the initial records in the
results contain both keywords (*java* and *mysql*), towards the end I still
see a number of documents that have only one keyword, either *java* or
*mysql*.

Is this the Solr behaviour, or can I ask for a *strict search that fetches a
record only if all my keywords are present*?

BR,
Kamal



On Mon, May 13, 2013 at 4:02 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 Change  the  default  query  operator. For example add the q.op=AND to
 your query.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi SOLR Experts
  When I search documents with keyword as *java, mysql* then I get the
  documents containing either *java* or *mysql* or both.

  Is it possible to get the documents those contains both *java* and
 *mysql*.

  In that case, how the query would look like.

  Thanks a lot
  Kamal




Re: Mandatory words search in SOLR

2013-05-13 Thread François Schiettecatte
Kamal

You could also use the 'mm' parameter to require a minimum match, or you could 
prepend '+' to each required term.
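A tiny sketch of the '+' form: in the Lucene query syntax, prefixing a term with '+' marks it as required regardless of the default operator:

```python
required = ["java", "mysql"]

# Build "+java +mysql": every '+' term must match for a document
# to be returned, even when the default operator is OR.
q = " ".join("+" + term for term in required)
print(q)
```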

Cheers

François


On May 13, 2013, at 7:57 AM, Kamal Palei palei.ka...@gmail.com wrote:

 Hi Rafał Kuć
 I added q.op=AND as per you suggested. I see though some initial record
 document contains both keywords (*java* and *mysql*), towards end I see
 still there are number of
 documents, they have only one key word either *java* or *mysql*.
 
 Is it the SOLR behaviour or can I ask for a *strict search only if all my
 keywords are present, then only* *fetch record* else not.
 
 BR,
 Kamal
 
 
 
 On Mon, May 13, 2013 at 4:02 PM, Rafał Kuć r@solr.pl wrote:
 
 Hello!
 
 Change  the  default  query  operator. For example add the q.op=AND to
 your query.
 
 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
 
 Hi SOLR Experts
 When I search documents with keyword as *java, mysql* then I get the
 documents containing either *java* or *mysql* or both.
 
 Is it possible to get the documents those contains both *java* and
 *mysql*.
 
 In that case, how the query would look like.
 
 Thanks a lot
 Kamal
 
 



Re: Log Monitor System for SolrCloud and Logging to log4j at SolrCloud?

2013-05-13 Thread Furkan KAMACI
Sorry, but do you mean that I can use log4j with Solr 4.2.1?

2013/5/6 Steve Rowe sar...@gmail.com

 Done - see http://markmail.org/message/66vpwk42ih6uxps7

 On May 6, 2013, at 5:29 AM, Furkan KAMACI furkankam...@gmail.com wrote:

  Is there any road map for Solr when will Solr 4.3 be tagged at svn?
 
  2013/4/26 Mark Miller markrmil...@gmail.com
 
  Slf4j is meant to work with existing frameworks - you can set it up to
  work with log4j, and Solr will use log4j by default in the about to be
  released 4.3.
 
  http://wiki.apache.org/solr/SolrLogging
 
  - Mark
 
  On Apr 26, 2013, at 7:19 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
  I want to use GrayLog2 to monitor my logging files for SolrCloud.
  However I
  think that GrayLog2 works with log4j and logback. Solr uses slf4j.
  How can I solve this problem and what logging monitoring system does
  folks
  use?
 
 




Solr Licensing (Sizzle)

2013-05-13 Thread Polhodzik Peter (ext)


In the source code of Apache Solr 4.2.0 there is an unclear license reference in



* \solr-4.2.0\solr\webapp\web\js\lib\jquery-1.7.2.min.js

and

* \solr-4.2.0\solr\webapp\web\js\require.jstxt



Can you please tell me what kind of license this refers to, exactly:

* Sizzle CSS Selector Engine

* Copyright 2011, The Dojo Foundation

* Released under the MIT, BSD, and GPL Licenses.

* More information: http://sizzlejs.com/



* Includes Sizzle.js

* http://sizzlejs.com/

* Copyright 2010, The Dojo Foundation

* Released under the MIT, BSD, and GPL Licenses.





The Dojo Foundation says it's not their business anymore.



1. Which version of the GPL, and which clause and copyright of the BSD and MIT licenses?

2. Is there a choice here - MIT or BSD or GPL? Or do all apply at the same time,
hence the word and?

3. I cannot find Sizzle in the Solr distribution at all. Is it really included?





Thank you in advance



 Peter Polhodzik


Üdvözlettel / Best Regards,

Péter POLHODZIK
License Clearing Specialist
Phone: +36 (46) 5-17894
Fax: +36 (46) 5-17801
peter.polhodzik@evosoft.com
evosoft Hungary Kft.
Arany János tér 1.
H-3508 Miskolc
www.evosoft.com
---



maximum number of simultaneous threads

2013-05-13 Thread venkata
I am seeing the following in solrconfig.xml






Is it possible to specify a max number of threads for query time too?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/maximum-number-of-simultaneous-threads-tp4062903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mandatory words search in SOLR

2013-05-13 Thread Kamal Palei
Hi François
Thanks for the input. The major problem I face is that I use Drupal (as a
framework) and the apachesolr module provided by Drupal, where I am not sure
how to directly modify the query. However, this is not the right forum to
ask Drupal-related questions; if somebody here knows both Drupal 7 and Solr
well, kindly let me know.


One more doubt: let's say I want to search for some mandatory words and some
optional words. Say I want to search all documents that contain all of the
Java, mysql, and php keywords, along with at least one keyword out of TCL,
Perl, Selenium.

*Basically I am looking at a few mandatory keywords and a few optional
keywords.*

Is it possible to search this way? If so, kindly guide me on how the query
should look.
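One possible form, assuming the Lucene query syntax with a default OR operator: mark each mandatory term with '+', and make the optional group itself required with '+(...)' so at least one of its terms must match. This is a sketch of building that string, not a Drupal-specific answer:

```python
mandatory = ["java", "mysql", "php"]
optional = ["tcl", "perl", "selenium"]

# "+java +mysql +php +(tcl perl selenium)":
# every '+' clause is required; inside the group the terms are OR'ed,
# so the group is satisfied when any one of them matches.
q = " ".join("+" + t for t in mandatory) + " +(" + " ".join(optional) + ")"
print(q)
```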

Best Regards
Kamal


On Mon, May 13, 2013 at 5:31 PM, François Schiettecatte 
fschietteca...@gmail.com wrote:

 Kamal

 You could also use the 'mm' parameter to require a minimum match, or you
 could prepend '+' to each required term.

 Cheers

 François


 On May 13, 2013, at 7:57 AM, Kamal Palei palei.ka...@gmail.com wrote:

  Hi Rafał Kuć
  I added q.op=AND as per you suggested. I see though some initial record
  document contains both keywords (*java* and *mysql*), towards end I see
  still there are number of
  documents, they have only one key word either *java* or *mysql*.
 
  Is it the SOLR behaviour or can I ask for a *strict search only if all my
  keywords are present, then only* *fetch record* else not.
 
  BR,
  Kamal
 
 
 
  On Mon, May 13, 2013 at 4:02 PM, Rafał Kuć r@solr.pl wrote:
 
  Hello!
 
  Change  the  default  query  operator. For example add the q.op=AND to
  your query.
 
  --
  Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
  Hi SOLR Experts
  When I search documents with keyword as *java, mysql* then I get the
  documents containing either *java* or *mysql* or both.
 
  Is it possible to get the documents those contains both *java* and
  *mysql*.
 
  In that case, how the query would look like.
 
  Thanks a lot
  Kamal
 
 




Re: Best way to design a story and comments schema.

2013-05-13 Thread Jack Krupansky
Try the simplest, cleanest design first (at least on paper), before you
start resorting to dynamic fields, multi-valued fields, or other
messy approaches. Like: one collection for stories, each with a story
id; a second collection for comments, each with a comment id plus fields
for the associated story id and user id; and a third collection for
users and their profiles. Identify the user to get their user id. Identify
the story (maybe by keyword search) to get the story id. Then select
user comments by story id, user id, and whatever other search criteria,
and facet on that.
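The two-step lookup Jack describes might look like this as query strings; the field names (story_id, user_id, created) and values are hypothetical, just to illustrate the shape:

```python
from urllib.parse import urlencode

# Step 1: find the story (e.g. by keyword search) and return its id.
story_query = urlencode({"q": "title:election", "fl": "id", "wt": "json"})

# Step 2: fetch that story's comments from the comments collection,
# filtered by story id and user id, newest first.
comment_query = urlencode({
    "q": "*:*",
    "fq": "story_id:12345 AND user_id:987",
    "sort": "created desc",
    "rows": "3",   # e.g. the "last 3 comments" the original poster wants
    "wt": "json",
})
print(comment_query)
```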


-- Jack Krupansky

-Original Message- 
From: samabhiK

Sent: Monday, May 13, 2013 5:24 AM
To: solr-user@lucene.apache.org
Subject: Best way to design a story and comments schema.

Hi, I wish to know how to best design a schema to store comments in stories 
/

articles posted.
I have a set of fields:
  <field name="subject" type="text_general" indexed="true" stored="true"/>
  <field name="keywords" type="text_general" indexed="true" stored="true"/>
  <field name="category" type="text_general" indexed="true" stored="true"/>
  <field name="content" type="text_general" indexed="false" stored="true"/>
Users can post their comments on a post and I should be able to retrieve
these comments and show it along side the original post. I only need to show
the last 3 comments and show a facet of the remaining comments which user
can click and see the rest of the comments ( something like facebook does ).
One alternative, I could think of, was adding a dynamic field for all
comments :
<dynamicField name="comment_*" type="string" indexed="false" stored="true"/>
So, to store each comments, I would send a text to solr of the form -
For Field Name: /comment_n/ Value:/[Commenter Name]:[Commenter ID]:[Actual
Comment Text]/
And to keep the count of those comments, I could use another field like so:
<field name="comment_count" type="int" indexed="true" stored="true"/>
With this approach, I will have to do some calculation when a comment is
deleted by the user but I still can manage to show the comments right.
My idea is to find the best solution for this scenario which will be fast
and also be simple.
Kindly suggest.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: maximum number of simultaneous threads

2013-05-13 Thread Dmitry Kan
venkata,

only blank lines between "..in solrconfig.xml" and "Is it possible.." have
arrived.


On Mon, May 13, 2013 at 3:25 PM, venkata vmarr...@yahoo.com wrote:

 I am seeing the following in solrconfig.xml






 It is possible to specific max number of threads for query time too?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/maximum-number-of-simultaneous-threads-tp4062903.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to design a story and comments schema.

2013-05-13 Thread samabhiK
Thanks for your reply.

I generally get confused between a collection and a core. But just FYI, I do
have two cores at the moment: one for the users and another for the stories.
Initially I thought of adding an extra core for the comments too, but
realized that it would mean multiple HTTP calls to fetch both the story and
the comments. Also, when a story is deleted, so should be its comments.
Having that spread across two cores might cause transactional issues when I
delete a story and try to delete the respective comments, or when I delete a
user and all his stories and comments?

I really wish to understand how that works.

Sam



 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867p4062913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Getting explain information of more like this search in a more usable format

2013-05-13 Thread Achim Domma
Hi,

I'm executing a more-like-this search using the MoreLikeThisHandler. I can add
score to the fields to be returned, but that's all I could find about getting
information on how/why documents match. I would like to give my users more
hints about why documents are similar, so I would like to display the important
overlapping terms. If I specify debugQuery=true, the result contains an explain
section which is quite detailed, but in a text format I would have to parse. Is
there a way to get this kind of information in a more usable form that does not
force me to use a debug flag? I'm mainly interested in showing the terms which
each result document has in common with the reference document.

regards,
Achim

Solr fullname search

2013-05-13 Thread trukmuch
Hi,
I'm trying to set up a fullname search in Solr. Until now I thought my work
was fine until I've found something strange, and I can't figure out how to
correct it.

So I want to be able to do searches on full names. My index is a database
where I get first name and last name and put them in one multivalued field
with keyword tokenizer.

Here's my fieldtype :
<fieldType name="text_auto" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

 Everything works fine: I can search only a first name OR last name and it
gives me the full names that exist, and it also works for full names in any
order if there's no misspelling.

I just noticed something wrong! For example, if I ask for Dupont dupont, it
gives me every Dupont that exists, even the ones whose first name doesn't
match dupont. It seems that for each word in the query, it searches the full
name once. The problem is that if someone looks for dupont d, they'll find
every Dupont that exists, because d is contained in Dupont! That's not what
I want; in that case, I want to find every Dupont with a d in their first
name (the other string).

So I need to find a way to make it work. I tried many different tokenizers
and filters, but I'm afraid it's not possible... FYI, I'm using SolrJ,
q.op=AND, and wildcards (*) in front of and behind every word queried. Thank
you for any help you can provide!
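A rough model of the matching the poster describes, using shell-style globbing as a stand-in for Solr wildcards: with `*term*` on both sides, each query term only needs to be a substring of *any* indexed value, so "d" is satisfied by the surname "dupont" itself and the first name is never consulted. The field values here are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical multivalued field for one person: last name and first name,
# each kept whole by the KeywordTokenizer.
values = ["dupont", "jean"]

# Query "dupont d" with q.op=AND and *...* wildcards around each term.
terms = ["dupont", "d"]

# Every term must match SOME value; "d" matches "dupont" itself,
# which is why this person is returned even though the first name
# contains no "d".
matches = all(any(fnmatch(v, "*" + t + "*") for v in values) for t in terms)
print(matches)
```

This suggests the problem is structural: as long as both names live in one field, any short term can match the surname. Separate firstname/lastname fields (or an analyzer that produces per-word tokens) would let each term be constrained to the intended value.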





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-fullname-search-tp4062919.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to design a story and comments schema.

2013-05-13 Thread Jack Krupansky

There are no transactions in Solr. Delete the Story and then the comments.

Core is just the old Solr terminology. A collection is the data itself, 
like the data on the disk. And with SolrCloud, the collection terminology is 
required.


How much data will you have? I mean, a news article could have thousands of
comments. Do you want to be able to search through them? Solr has no
provision for searching across an arbitrary number of dynamic fields. I
mean, if you want a query to search in a field, you need to name the field
in either the query or qf (even for dismax), which makes querying across
arbitrary columns unworkable.


Multiple HTTP requests should not be a problem, especially if each of them 
is shorter. Are you running into some problem?


Technically, you could also do a custom search component that did a lot of 
the multi-query processing inside Solr, but once again, it is best to start 
with a simple design first.


-- Jack Krupansky

-Original Message- 
From: samabhiK

Sent: Monday, May 13, 2013 8:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Best way to design a story and comments schema.

Thanks for your reply.

I generally get confused by a collection and a core. But just FYI, I do have
two cores at the moment - one for the users and another for the Stories.
Initially I thought of adding an extra core for the Comments too but
realized that it would mean multiple HTTP calls to fetch both the story and
the comments. Also, when a story is deleted, so should be its comments.
Having that spread across two cores might cause issues with transaction when
I delete the story and try to delete the respective comments? Or when I
delete the User and all hos stories and comments?

I really wish to understand how that works.

Sam







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867p4062913.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Getting explain information of more like this search in a more usable format

2013-05-13 Thread Jack Krupansky
Try debug.explain.structured=true, which will give you an XML response that 
can be traversed.
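A sketch of the request with both flags set (the query and mlt.fl value are placeholders; debug.explain.structured only has an effect when debug output is enabled):

```python
from urllib.parse import urlencode

# Ask the MoreLikeThis handler for structured (traversable) explain
# output instead of the plain-text explain blob.
params = {
    "q": "id:12345",                    # hypothetical reference document
    "mlt.fl": "text",                   # hypothetical similarity field
    "debugQuery": "true",
    "debug.explain.structured": "true", # nested nodes instead of one string
    "wt": "json",
}
qs = urlencode(params)
print(qs)
```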


Don't worry about the fact that these features are labeled "debug" - they
are there simply to explain what is happening. Is there some particular
concern you have about them being labeled "debug"? Although, you are not the
first person to complain! What if Solr simply renamed these features with
the term "detail" instead of "debug" - would that cure your concern?!


-- Jack Krupansky

-Original Message- 
From: Achim Domma

Sent: Monday, May 13, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Getting explain information of more like this search in a more 
usable format


Hi,

I'm executing a more like this search using the MoreLikeThisHandler. I can 
add score to the fields to be returned, but that's all I could find about 
getting information about how/why documents match. I would like to give my 
users more hints why documents are similar, so I would like to display 
important overlapping terms. If I specify debugQuery=true the result 
contains a explain section which is quite detailed, but in a text format I 
would have to parse. Is there a way to get this kind of information in a 
more usable way which does not force me to use a debug-flag? I'm mainly 
interested in showing the terms which each result document has in common 
with the reference document.


regards,
Achim



Re: Best way to design a story and comments schema.

2013-05-13 Thread samabhiK
I think I got your point.

So, what I will create are three cores (or collections) - one for the users,
one for the stories and the last one for comments. 

When I need to find all the stories posted by a single user, I first need to
search the stories core with a unique userid in the filter and then run
another query to fetch the collection of comments. Correct?

Also, I have no such requirement to search through the comments; it's
mostly a storage field for me. So, do you think I should shift that into a
DB from where I may query the comments? Or will it be too costly for Solr to
just plainly store that data in a core? Which would be the best option here?

Also, the idea of custom search component sounds great. But as you said, I
will first try this out with a simple possible setup and then go from there.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867p4062929.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to design a story and comments schema.

2013-05-13 Thread Jack Park
Jack,

Why are multi-valued fields considered messy?
I think I am about to learn something..

Thanks
Another Jack

On Mon, May 13, 2013 at 5:29 AM, Jack Krupansky j...@basetechnology.com wrote:
 Try the simplest, cleanest design first (at least on paper), before you
 start resorting to either dynamic fields or multi-valued fields or other
 messy approaches. Like, one collection for stories, which would have a story
 id and a second collection for comments, each with a comment id and a field
 that is the associated story id and user id. And a third collection for
 users and their profiles. Identify the user and get their user id. Identify
 the story (maybe by keyword search) to get story id. Then identify and facet
 user comments by story id and user id and whatever other search criteria,
 and then facet on that.

 -- Jack Krupansky

 -Original Message- From: samabhiK
 Sent: Monday, May 13, 2013 5:24 AM
 To: solr-user@lucene.apache.org
 Subject: Best way to design a story and comments schema.


 Hi, I wish to know how to best design a schema to store comments in stories
 /
 articles posted.
 I have a set of fields:
    <field name="subject" type="text_general" indexed="true" stored="true"/>
    <field name="keywords" type="text_general" indexed="true" stored="true"/>
    <field name="category" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="false" stored="true"/>
 Users can post their comments on a post and I should be able to retrieve
 these comments and show it along side the original post. I only need to show
 the last 3 comments and show a facet of the remaining comments which user
 can click and see the rest of the comments ( something like facebook does ).
 One alternative I could think of was adding a dynamic field for all
 comments:
 <dynamicField name="comment_*" type="string" indexed="false" stored="true"/>
 So, to store each comment, I would send a text to Solr of the form -
 For Field Name: comment_n Value: [Commenter Name]:[Commenter ID]:[Actual
 Comment Text]
 And to keep the count of those comments, I could use another field like so:
 <field name="comment_count" type="int" indexed="true" stored="true"/>
 With this approach, I will have to do some calculation when a comment is
 deleted by the user but I still can manage to show the comments right.
 My idea is to find the best solution for this scenario which will be fast
 and also be simple.
 Kindly suggest.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mandatory words search in SOLR

2013-05-13 Thread Kamal Palei
Hi François
As per your suggestion, I used the 'mm' param and was able to do a search for
mandatory fields.

In Drupal, one needs to do:

 $query->addParam('mm', '100%');

in the query alter hook.

Thanks a lot for guiding me.

Best Regards
Kamal







On Mon, May 13, 2013 at 5:56 PM, Kamal Palei palei.ka...@gmail.com wrote:

 Hi François
 Thanks for the input. The major problem I face is that I make use of Drupal
 (as a framework) and the apachesolr_module provided by Drupal, where I am
 not sure how to directly modify the query. However, this is not the right
 forum to ask Drupal-related questions. If somebody here knows both Drupal 7
 and SOLR well, kindly let me know.


 One more doubt: let's say I want to search for some mandatory words and some
 optional words. Say I want to search for all documents that contain all of
 the Java, mysql, php keywords, along with at least one keyword out of TCL,
 Perl, Selenium.

 *Basically I am looking at a few mandatory keywords and a few optional
 keywords.*

 Is it possible to search this way? If so, kindly guide me on how the query
 should look.

 Best Regards
 Kamal














 On Mon, May 13, 2013 at 5:31 PM, François Schiettecatte 
 fschietteca...@gmail.com wrote:

 Kamal

 You could also use the 'mm' parameter to require a minimum match, or you
 could prepend '+' to each required term.

 Cheers

 François


 On May 13, 2013, at 7:57 AM, Kamal Palei palei.ka...@gmail.com wrote:

  Hi Rafał Kuć
  I added q.op=AND as you suggested. Though some of the initial documents
  contain both keywords (*java* and *mysql*), towards the end I still see a
  number of documents that have only one keyword, either *java* or *mysql*.
 
  Is this the SOLR behaviour, or can I ask for a *strict search that fetches
  a record only if all my keywords are present*?
 
  BR,
  Kamal
 
 
 
  On Mon, May 13, 2013 at 4:02 PM, Rafał Kuć r@solr.pl wrote:
 
  Hello!
 
  Change  the  default  query  operator. For example add the q.op=AND to
  your query.
 
  --
  Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
  Hi SOLR Experts
  When I search documents with keyword as *java, mysql* then I get the
  documents containing either *java* or *mysql* or both.
 
  Is it possible to get the documents those contains both *java* and
  *mysql*.
 
  In that case, how the query would look like.
 
  Thanks a lot
  Kamal
 
 





Can we search some mandatory words and some optional words in SOLR

2013-05-13 Thread Kamal Palei
Dear SOLR Experts
Let's say I want to search for some mandatory words and some optional words.
Say I want to search for all documents that contain all of the *Java, mysql,
php* keywords along with at least one keyword out of *TCL, Perl, Selenium*.

*Basically I am looking at a few mandatory keywords and a few optional
keywords.*

Is it possible to search this way? If so, kindly guide me on how the query
should look.

Best Regards
Kamal


Re: Log Monitor System for SolrCloud and Logging to log4j at SolrCloud?

2013-05-13 Thread Shawn Heisey
On 5/13/2013 6:09 AM, Furkan KAMACI wrote:
 Sorry but do you mean that I can use log4j with Solr 4.2.1?

You can.  You need to obtain a war without any slf4j jars, which you can
do by unpacking the original war, deleting the jars, and repackaging it.
 You can also build from source with the dist-excl-slf4j or
dist-war-excl-slf4j build target.

After you have the war without slf4j, then you need to put the proper
slf4j and log4j jars into the classpath - for jetty, this is typically
lib/ext.  See the 4.3.0 download - it has the proper jars in its
example/lib/ext directory.
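The unpack/delete/repackage step can also be scripted. Here is a sketch using Python's zipfile on a synthetic archive — an illustration of the technique, not a tested recipe for a real solr.war (the jar names and the WEB-INF/lib path are assumptions to verify against the actual war):

```python
import io
import zipfile

def strip_jars(war_bytes, unwanted_prefix):
    """Copy a war (zip) archive, dropping WEB-INF/lib entries whose file
    name starts with unwanted_prefix (e.g. 'slf4j')."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(war_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            name = item.filename
            if (name.startswith("WEB-INF/lib/")
                    and name.split("/")[-1].startswith(unwanted_prefix)):
                continue  # drop this jar
            dst.writestr(item, src.read(name))
    return out.getvalue()

# Synthetic archive for illustration (real jar names will differ).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("WEB-INF/lib/slf4j-api.jar", b"...")
    z.writestr("WEB-INF/lib/lucene-core.jar", b"...")
cleaned = strip_jars(buf.getvalue(), "slf4j")
with zipfile.ZipFile(io.BytesIO(cleaned)) as z:
    print(z.namelist())  # the slf4j jar is gone
```

The same effect can of course be had with plain `jar xf` / `jar cf` or `zip -d`.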

Thanks,
Shawn



Re: Can we search some mandatory words and some optional words in SOLR

2013-05-13 Thread Jack Krupansky

That's simply a standard, old-fashioned Lucene query:

+Java +mysql +php TCL Perl Selenium

And you can set min should match (mm) to 0, 1, 2, 3, etc. for the
optional terms (TCL, Perl, Selenium).
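A sketch of how such a query might be assembled as URL parameters (assumptions: the edismax parser is enabled, and mm applies to the non-'+' clauses as described above — verify against your Solr version):

```python
from urllib.parse import urlencode

# Required terms get a leading '+'; optional terms are left bare, and
# mm then controls how many of the optional clauses must match.
required = ["Java", "mysql", "php"]
optional = ["TCL", "Perl", "Selenium"]

q = " ".join(["+" + t for t in required] + optional)
params = urlencode({
    "q": q,
    "defType": "edismax",  # assumption: the edismax parser is in use
    "mm": 1,               # at least one optional term must match
})
print(params)
```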


-- Jack Krupansky

-Original Message- 
From: Kamal Palei

Sent: Monday, May 13, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Can we search some mandatory words and some optional words in SOLR

Dear SOLR Experts
Let's say I want to search for some mandatory words and some optional words.
Say I want to search for all documents that contain all of the *Java, mysql,
php* keywords along with at least one keyword out of *TCL, Perl, Selenium*.

*Basically I am looking at a few mandatory keywords and a few optional
keywords.*

Is it possible to search this way? If so, kindly guide me on how the query
should look.

Best Regards
Kamal 



Re: Best way to design a story and comments schema.

2013-05-13 Thread Jack Krupansky
Multi-valued fields don't have the same full support as simple fields and 
documents (since they are effectively a sub-document). Although we do now 
have the ability to add to a multi-valued field with atomic update, we 
can't directly edit them, like delete/replace the kth item or insert 
before/after an item, sort them by various criteria, etc. And a query won't 
tell you which entry matched. And you can't narrow your query to search a 
subset of a multi-valued field.


They do work well for short lists, but not Big Data. Listing a few 
authors for a book is fine. But trying to do hundreds, thousands, and more, 
is quite problematic. There was a recent issue on the list about how 
multi-valued field values are sometimes handled inefficiently in Solr.
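For reference, the atomic update that appends to a multi-valued field is just a small JSON payload. A sketch (the field names are illustrative; the 'add' semantics should be verified against your Solr version):

```python
import json

# Atomic-update payload appending one value to a multi-valued field.
# Field names are illustrative; 'add' appends, 'set' replaces.
doc = {"id": "story-42", "comments": {"add": "Great article!"}}
payload = json.dumps([doc])
print(payload)  # POST to /update with Content-Type: application/json
```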


-- Jack Krupansky

-Original Message- 
From: Jack Park

Sent: Monday, May 13, 2013 9:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Best way to design a story and comments schema.

Jack,

Why are multi-valued fields considered messy?
I think I am about to learn something..

Thanks
Another Jack

On Mon, May 13, 2013 at 5:29 AM, Jack Krupansky j...@basetechnology.com 
wrote:

Try the simplest, cleanest design first (at least on paper), before you
start resorting to either dynamic fields or multi-valued fields or other
messy approaches. Like, one collection for stories, which would have a 
story
id and a second collection for comments, each with a comment id and a 
field

that is the associated story id and user id. And a third collection for
users and their profiles. Identify the user and get their user id. 
Identify
the story (maybe by keyword search) to get story id. Then identify and 
facet

user comments by story id and user id and whatever other search criteria,
and then facet on that.

-- Jack Krupansky

-Original Message- From: samabhiK
Sent: Monday, May 13, 2013 5:24 AM
To: solr-user@lucene.apache.org
Subject: Best way to design a story and comments schema.


Hi, I wish to know how to best design a schema to store comments in 
stories

/
articles posted.
I have a set of fields:
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content" type="text_general" indexed="false" stored="true"/>
Users can post their comments on a post and I should be able to retrieve
these comments and show it along side the original post. I only need to 
show

the last 3 comments and show a facet of the remaining comments which user
can click and see the rest of the comments ( something like facebook 
does ).

One alternative I could think of was adding a dynamic field for all
comments:
<dynamicField name="comment_*" type="string" indexed="false" stored="true"/>
So, to store each comment, I would send a text to Solr of the form -
For Field Name: comment_n Value: [Commenter Name]:[Commenter ID]:[Actual
Comment Text]
And to keep the count of those comments, I could use another field like so:
<field name="comment_count" type="int" indexed="true" stored="true"/>
With this approach, I will have to do some calculation when a comment is
deleted by the user but I still can manage to show the comments right.
My idea is to find the best solution for this scenario which will be fast
and also be simple.
Kindly suggest.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Best-way-to-design-a-story-and-comments-schema-tp4062867.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Quick question about indexing with SolrJ.

2013-05-13 Thread Luis Cappa Banda
Is it possible to index plain String JSON documents using SolrJ? I already
know annotating POJOs works fine, but I need a more flexible way to index
data without any intermediate POJO.

That's because when changing, adding or removing fields I don't want to
keep changing that POJO again and again.


-- 
- Luis Cappa


Re: Solr Licensing (Sizzle)

2013-05-13 Thread Raymond Wiker
On May 13, 2013, at 14:15 , Polhodzik Peter (ext) 
peter.polhodzik@evosoft.com wrote:
 In the source code of Apache Solr 4.2.0 there is an unclear license reference 
 in
  
 · \solr-4.2.0\solr\webapp\web\js\lib\jquery-1.7.2.min.js
 and
 · \solr-4.2.0\solr\webapp\web\js\require.jstxt
  
 Can you please tell me what kind of license does this refer to exactly:
 “* Sizzle CSS Selector Engine
 
 * Copyright 2011, The Dojo Foundation
 
 * Released under the MIT, BSD, and GPL Licenses.
 
 * More information: http://sizzlejs.com/”
 
  
 
 “* Includes Sizzle.js
 
 * http://sizzlejs.com/
 
 * Copyright 2010, The Dojo Foundation
 
 * Released under the MIT, BSD, and GPL Licenses.”
 
  
  
 Dojo Foundation says its not their business anymore.
  
 1. Which version of GPL, which clause and copyright of BSD and MIT?
 2. Is there a choice here? MIT or BSD or GPL? or all apply at the same time, 
 hence the article and?
 3. I cannot find Sizzle in the Solr distribution at all. Is it really 
 included?
  
  

In my experience, the presence of several licenses normally indicates that you 
get to choose one of them, not that they all apply. 

If you go to sizzlejs.com and follow the link to documentation, you'll find the 
file MIT-LICENSE.txt. This file indicates that the current license for 
sizzle.js is a variation of the MIT license. It also indicates that the current 
licensor of sizzle.js is the jQuery project and other contributors, so you 
should be able to get the definitive answer on licensing terms from the jQuery 
project.

Re: Quick question about indexing with SolrJ.

2013-05-13 Thread Jack Krupansky
Do your POJOs follow a simple flat data model that is 100% compatible with 
Solr?


If so, maybe you can simply ingest them by setting the Content-type to 
application/json and maybe having to put some minimal wrapper around the 
raw JSON.


But... if they DON'T follow a simple, flat data model, then YOU are going to 
have to transform their data into a format that does have a simple, flat 
data model.
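A minimal sketch of the kind of flattening transform described above (the dot-joined key naming is a free choice for illustration, not a Solr convention; dynamic-field suffixes are more idiomatic in practice):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one level with dot-joined key names,
    matching Solr's flat field model. The naming scheme here is a free
    choice; dynamic-field suffixes (e.g. *_s, *_i) are more idiomatic."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

print(flatten({"id": "1", "author": {"name": "Ann", "karma": 7}}))
```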


-- Jack Krupansky

-Original Message- 
From: Luis Cappa Banda

Sent: Monday, May 13, 2013 10:52 AM
To: solr-user@lucene.apache.org
Subject: Quick question about indexing with SolrJ.

Is it possible to index plain String JSON documents using SolrJ? I already
know annotating POJOs works fine, but I need a more flexible way to index
data without any intermediate POJO.

That's because when changing, adding or removing fields I don't want to
keep changing that POJO again and again.


--
- Luis Cappa 



Re: Quick question about indexing with SolrJ.

2013-05-13 Thread Luis Cappa Banda
Hello, Jack.

I don't want to use POJOs, that's the main problem. I know that you can
send AJAX POST HTTP requests with JSON data to index new documents, and I
would like to do that with SolrJ, that's all, but I can't find a way to
do it, :-/ . What I would like to do is simply retrieve a String with
embedded JSON and add() it via an HttpSolrServer object instance. Whether
the JSON matches the Solr server's schema.xml would then be a server-side
problem, not a client-side one. I mean, I want a best-effort, more flexible
way to index data, and using POJOs is not that way: you have to change the
Java class, compile it again, and relaunch whatever process uses that Java
class.

Regards,

- Luis Cappa


2013/5/13 Jack Krupansky j...@basetechnology.com

 Do your POJOs follow a simple flat data model that is 100% compatible with
 Solr?

 If so, maybe you can simply ingest them by setting the Content-type to
 application/json and maybe having to put some minimal wrapper around the
 raw JSON.

 But... if they DON'T follow a simple, flat data model, then YOU are going
 to have to transform their data into a format that does have a simple, flat
 data model.

 -- Jack Krupansky

 -Original Message- From: Luis Cappa Banda
 Sent: Monday, May 13, 2013 10:52 AM
 To: solr-user@lucene.apache.org
 Subject: Quick question about indexing with SolrJ.


 Is it possible to index plain String JSON documents using SolrJ? I already
 know annotating POJOs works fine, but I need a more flexible way to index
 data without any intermediate POJO.

 That's because when changing, adding or removing fields I don't want to
 keep changing that POJO again and again.


 --
 - Luis Cappa




-- 
- Luis Cappa


Re: Quick question about indexing with SolrJ.

2013-05-13 Thread Alexandre Rafalovitch
You can send JSON to Solr as update documents:
http://wiki.apache.org/solr/UpdateJSON. Not sure if SolrJ supports it,
but it is just an HTTP post, so you may not even need SolrJ.

But the issue is that your own JSON probably does not match JSON
expected by Solr. So, you need to map it somehow, right? Unless you
figured that part already.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
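As a sketch of that plain HTTP POST (the handler URL and field names are placeholders; the update path may differ by Solr version and configuration):

```python
import json
import urllib.request

def post_json_update(update_url, raw_json):
    """POST a raw JSON document list to Solr's JSON update handler."""
    req = urllib.request.Request(
        update_url,
        data=raw_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # the HTTP response

# Build the payload only; the actual POST needs a running Solr.
raw = json.dumps([{"id": "doc-1", "title_t": "hello"}])
print(raw)
# post_json_update("http://localhost:8983/solr/update/json", raw)
```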


On Mon, May 13, 2013 at 11:51 AM, Luis Cappa Banda luisca...@gmail.com wrote:
 Hello, Jack.

 I don't want to use POJOs, that's the main problem. I know that you can
 send AJAX POST HTTP Requests with JSON data to index new documents and I
 would like to do that with SolrJ, that's all, but I don't find the way to
 do that, :-/ . What I would like to do is simple retrieve an String with an
 embedded JSON and add() it via an HttpSolrServer object instance. If the
 JSON matches the Solr server schema.xml or not it would be a server-side
 problem, not a client-side one. I mean, I want to use a best effort and
 more flexible way to index data, and using POJOs is not the way to do that:
 you have to change the Java class, compile it again and relaunch whatever
 the process that uses that Java class.

 Regards,

 - Luis Cappa


 2013/5/13 Jack Krupansky j...@basetechnology.com

 Do your POJOs follow a simple flat data model that is 100% compatible with
 Solr?

 If so, maybe you can simply ingest them by setting the Content-type to
 application/json and maybe having to put some minimal wrapper around the
 raw JSON.

 But... if they DON'T follow a simple, flat data model, then YOU are going
 to have to transform their data into a format that does have a simple, flat
 data model.

 -- Jack Krupansky

 -Original Message- From: Luis Cappa Banda
 Sent: Monday, May 13, 2013 10:52 AM
 To: solr-user@lucene.apache.org
 Subject: Quick question about indexing with SolrJ.


 Is it possible to index plain String JSON documents using SolrJ? I already
 know annotating POJOs works fine, but I need a more flexible way to index
 data without any intermediate POJO.

  That's because when changing, adding or removing fields I don't want to
  keep changing that POJO again and again.


 --
 - Luis Cappa




 --
 - Luis Cappa


Re: maximum number of simultaneous threads

2013-05-13 Thread venkata

   



I see a configuration point for indexing threads.

However, I am not finding anything for search. How many simultaneous
threads can SOLR spin up at search time?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/maximum-number-of-simultaneous-threads-tp4062903p4062982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Making protwords.txt changes effective

2013-05-13 Thread Shane Magee
Hi

I added some words to protwords.txt, but there doesn't seem to be any effect
on the resulting search. Do I need to restart Apache or Solr, or rebuild the
index?


Re: Need solr query help

2013-05-13 Thread smsolr
Hi Abhishek,

I've had a look into this problem and have come up with a solution.

Following instructions assume you have downloaded the 4.3.0 release of Solr
from:-

http://www.apache.org/dyn/closer.cgi/lucene/solr/4.3.0

First add to:-

solr-4.3.0/solr/example/solr/collection1/conf/schema.xml

the following:-

   <field name="shopLocation" type="location" indexed="true" stored="true"/>
   <field name="shopMaxDeliveryDistance" type="float" indexed="true" stored="true"/>

after the id field:-

   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

Then start solr by going to:-

solr-4.3.0/solr/example

and running:-

java -jar start.jar

Then change into your solr-4.3.0/solr/example/exampledocs directory and
write the following text to a new file called shops.xml:-

<add>
  <doc>
    <field name="id">2468</field>
    <field name="name">Shop A</field>
    <field name="shopLocation">0.1,0.1</field>
    <field name="shopMaxDeliveryDistance">10</field>
  </doc>
  <doc>
    <field name="id">2469</field>
    <field name="name">Shop B</field>
    <field name="shopLocation">0.2,0.2</field>
    <field name="shopMaxDeliveryDistance">35</field>
  </doc>
  <doc>
    <field name="id">2470</field>
    <field name="name">Shop C</field>
    <field name="shopLocation">0.9,0.1</field>
    <field name="shopMaxDeliveryDistance">25</field>
  </doc>
  <doc>
    <field name="id">2480</field>
    <field name="name">Shop D</field>
    <field name="shopLocation">0.3,0.2</field>
    <field name="shopMaxDeliveryDistance">50</field>
  </doc>
</add>


Now run:-

./post.sh shops.xml 

You should get back something like:-


Posting file shops.xml to http://localhost:8983/solr/update
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">120</int></lst>
</response>

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">46</int></lst>
</response>


Then do the following queries in your browser:-

All 4 shops:-

http://localhost:8983/solr/select?q=name:shop&fl=name,shopLocation,shopMaxDeliveryDistance


All shops with distance from point 0.0,0.0 and ordered by distance from
point 0.0,0.0 (gives order A, B, D, C):-

http://localhost:8983/solr/select?q=name:shop&fl=name,shopLocation,shopMaxDeliveryDistance,geodist%28shopLocation,0.0,0.0%29&sort=geodist%28shopLocation,0.0,0.0%29%20asc


All shops with distance from point 0.0,0.0 and ordered by distance from
point 0.0,0.0 and filtered to eliminate all shops with distance from point
0.0,0.0 greater than shopMaxDeliveryDistance (gives shops  B and D):-

http://localhost:8983/solr/select?q=name:shop&fl=name,shopLocation,shopMaxDeliveryDistance,geodist%28shopLocation,0.0,0.0%29&sort=geodist%28shopLocation,0.0,0.0%29%20asc&fq={!frange%20u=0}sub%28geodist%28shopLocation,0.0,0.0%29,shopMaxDeliveryDistance%29


To delete all shops so you can edit the file to play with it and repost the
shops:-

http://localhost:8983/solr/update?stream.body=<delete><query>name:shop</query></delete>&commit=true



smsolr



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-solr-query-help-tp4061800p4062591.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disabling tf (term frequency) during indexing and/or scoring

2013-05-13 Thread tasmaniski
This is an old post; there is now a solution in Solr:

omitTermFreqAndPositions="true"

http://wiki.apache.org/solr/SchemaXml#Data_Types



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disabling-tf-term-frequency-during-indexing-and-or-scoring-tp502956p4062595.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need solr query help

2013-05-13 Thread smsolr
Hi Abhishek,

I forgot to explain why it works.  It uses the frange filter which is
mentioned here:-

http://wiki.apache.org/solr/CommonQueryParameters

and it works because it filters in results where the geodist minus the
shopMaxDeliveryDistance is less than zero (that's what the u=0 means, upper
limit=0), i.e.:-

geodist - shopMaxDeliveryDistance < 0
->
geodist < shopMaxDeliveryDistance

i.e. the geodist is less than the shopMaxDeliveryDistance and so the shop is
within delivery range of the location specified.
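The same filter logic can be reproduced locally to sanity-check the results. A sketch using the shop data from this thread (assuming geodist() is essentially a haversine great-circle distance in km, which matches the shops-B-and-D result above):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0087714  # mean Earth radius; close to what Solr uses

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance; geodist() uses the same haversine formula."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Shops from this thread: (name, lat, lon, shopMaxDeliveryDistance in km).
shops = [("Shop A", 0.1, 0.1, 10), ("Shop B", 0.2, 0.2, 35),
         ("Shop C", 0.9, 0.1, 25), ("Shop D", 0.3, 0.2, 50)]
query_lat, query_lon = 0.0, 0.0

# The frange filter keeps a shop when geodist - shopMaxDeliveryDistance <= 0,
# i.e. the query point lies inside the shop's own delivery radius.
deliverable = sorted(
    (haversine_km(query_lat, query_lon, lat, lon), name)
    for name, lat, lon, max_km in shops
    if haversine_km(query_lat, query_lon, lat, lon) - max_km <= 0)
print([name for _, name in deliverable])  # -> ['Shop B', 'Shop D']
```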

smsolr



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-solr-query-help-tp4061800p4062603.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Making protwords.txt changes effective

2013-05-13 Thread Jack Krupansky

Yes, restart Solr. Not to reindex, but simply to reload the file.

Well... depending on where you use the protected words, you may need to 
reindex as well. For a query-time filter you don't need to reindex, but for 
index-time filters, you must reindex.


-- Jack Krupansky

-Original Message- 
From: Shane Magee

Sent: Saturday, May 11, 2013 7:42 AM
To: solr-user@lucene.apache.org
Subject: Making protwords.txt changes effective

Hi

I added some words to protwords.txt, but there doesn't seem to be any effect
in the resulting search. Do I need to restart Apache or Solr or rebuild the
index? 



Re: Looking for Best Practice of Spellchecker

2013-05-13 Thread Nicholas Ding
Thank you for your help, guys. I agree, wall mart should be a synonym;
it's not a good example.

I did an experiment using KeywordTokenizer + DirectSolrSpellChecker, and I
can get a suggestion even for wall mart to walmart. But I don't know
whether that's a good practice or not; it feels much like a workaround to me.
And I haven't tried WordBreakSpellChecker yet. Does this spellchecker
break the words, concatenate them, and then give me collations?

Thanks


On Fri, May 10, 2013 at 11:34 AM, Dyer, James
james.d...@ingramcontent.comwrote:

 Good point, Jason.  In fact, even if you use WordBreakSpellChecker, wall
 mart will not correct to walmart.  The reason is the spellchecker cannot
 both correct a token's spelling *and* fix the word-break issue involving
 that same token.  So in this case a synonym is the way to go.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jason Hellman [mailto:jhell...@innoventsolutions.com]
 Sent: Friday, May 10, 2013 9:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Looking for Best Practice of Spellchecker

 Nicholas,

 Also consider that some misspellings are better handled through Synonyms
 (or injected metadata).

 You can garner a great deal of value out of the spell checker by following
 the great advice James is giving here...but you'll find a well-placed
 helper synonym or metavalue can often save a lot of headache and time.

 Jason

 On May 10, 2013, at 7:32 AM, Dyer, James james.d...@ingramcontent.com
 wrote:

  Nicholas,
 
  It sounds like you might want to use WordBreakSolrSpellChecker, which
 gets obscure mention in the wiki.  Read through this section:
 http://wiki.apache.org/solr/SpellCheckComponent#Configuration and you
 will see some information.
 
  Also, the Solr Example shows how to configure this.  See
 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/solrconfig.xml
 
  Look for...
 
  <lst name="spellchecker">
   <str name="name">wordbreak</str>
   ...
  </lst>
 
  ...and...
 
  <requestHandler name="/spell" ...>
  ...
  </requestHandler>
 
  Also, I'd recommend you take a look at each parameter in the /spell
 request handler and read its section on the spellcheckcomponent wiki
 page.  You probably will want to set many of these parameters as well.
 
  You can get a query to return only spell results simply by specifying
 rows=0.  However, it's one less query to just have it return the results
 too.  If there are no results, your application can check for collations
 and re-issue a collation query.  If there are both results and collations
 returned, you can give the user results with did-you-mean suggestions.
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Nicholas Ding [mailto:nicholas...@gmail.com]
  Sent: Friday, May 10, 2013 8:47 AM
  To: solr-user@lucene.apache.org
  Subject: Looking for Best Practice of Spellchecker
 
  Hi guys,
 
  I'm working on a local search project, I wanna integrate spellchecker for
  the search.
 
  So basically, my search engines is used to search local businesses. For
  example, user could search for wall mart, here is a typo, I wanna
  spellchecker to give me Collation for walmart.
 
  My problems are:
  1. I use DirectSolrSpellChecker on my BusinessNameField and pass wall
  mart as phrase search, but I can't get collation from the spellchecker.
  2. I tried not to pass phrase search, but pass q=Wall AND Mart to force a
  100% match, but spellchecker can't give me collation also.
 
  I read the documents about spellchecker on Solr wiki, but it's very
 brief.
  I'm wondering is there any best practice of spellchecker, I believe it's
  widely used in the search, right?
 
  And I have another idea, I don't know whether it's valid or not. I want
 to
  apply spellchecker everything before doing the search, so that I could
 rely
  on the spellchecker to tell me whether my search could get result or not.
 
  Thanks
  Nicholas
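The no-results-then-collation fallback described above can be sketched as follows (the response shape here is a simplified assumption of what the SpellCheckComponent returns with spellcheck.collate=true; adapt it to your response format):

```python
def pick_retry_query(response):
    """Return a corrected query to retry with, or None. The response
    shape is a simplified assumption of SpellCheckComponent output."""
    num_found = response["response"]["numFound"]
    spell = response.get("spellcheck", {})
    collation = spell.get("collations", {}).get("collation")
    if num_found == 0 and collation:
        return collation  # re-issue the search with the collated query
    return None

# Hypothetical zero-hit response carrying a suggested collation.
resp = {
    "response": {"numFound": 0, "docs": []},
    "spellcheck": {"collations": {"collation": "walmart"}},
}
print(pick_retry_query(resp))  # -> walmart
```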
 






How to improve performance of geodist()

2013-05-13 Thread Nicholas Ding
Hi guys,

I'm using geodist() in a recip boost function, and I noticed a performance
impact on response time. In a profiling session, the geodist() calculation
took 30% of CPU time.

I'm wondering: is there any alternative to the Haversine function that can
reduce CPU load? I don't need very accurate float numbers when I use
geodist() in the boost function.

Thanks
Nicholas
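Not a Solr-internals answer, but one commonly used cheaper distance formula is the equirectangular approximation, which trades a tiny amount of accuracy for fewer trig calls. A sketch comparing it with haversine (an illustration of the trade-off, not a drop-in replacement for geodist()):

```python
from math import asin, cos, radians, sin, sqrt

R_KM = 6371.0087714

def haversine_km(lat1, lon1, lat2, lon2):
    """Exact great-circle distance (two cosines, an asin and a sqrt)."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * R_KM * asin(sqrt(a))

def equirect_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: a single cosine and a sqrt.
    Error is negligible over the distances typical of local search."""
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2))
    y = radians(lat2 - lat1)
    return R_KM * sqrt(x * x + y * y)

exact = haversine_km(45.0, 7.0, 45.3, 7.4)
approx = equirect_km(45.0, 7.0, 45.3, 7.4)
print(exact, approx, abs(exact - approx) / exact)
```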


Re: rename a core to same name of existing core

2013-05-13 Thread Jie Sun
Did anyone verify that the following is true?
 The description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

 *quote*
 If a core with the same name exists, while the new created core is
 initalizing, the old one will continue to accept requests. Once it
 has finished, all new request will go to the new core, and the old
 core will be unloaded.
 */quote*

step1 - I have a core 'abc' with 30 documents in it:
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

<str name="rows">10</str></lst></lst><result name="response" numFound="30" start="0"><doc>

step2 - then I create a new core with same name 'abc':
http://myhost.com:8080/solr/admin/cores?action=create&name=abc&instanceDir=./

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">303</int></lst><str name="core">abc</str><str name="saved">/mxl/var/solr/solr.xml</str></response>

step3 - I cleared out my browser cache

step4 - I did same query as in step1, got same results (30 documents):
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

<str name="rows">10</str></lst></lst><result name="response" numFound="30" start="0"><doc>

I thought the old core should have been unloaded?
Did I misunderstand anything here?

thanks
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/rename-a-core-to-same-name-of-existing-core-tp3090960p4063008.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR guidance required

2013-05-13 Thread Lance Norskog

If this is for the US, remove the age range feature before you get sued.

On 05/09/2013 08:41 PM, Kamal Palei wrote:

Dear SOLR experts
I might be asking a very silly question. As I am new to SOLR kindly guide
me.


I have a job site. Using SOLR to search resumes. When a HR user enters some
keywords say JAVA, MySQL etc, I search resume documents using SOLR,
retrieve 100 records and show to user.

The problem I face is say, I retrieved 100 records, then we do filtering
for experience range, age range, salary range (using mysql query).
Sometimes it so happens that the 100 records I fetch , I do not get a
single record to show to user. When user clicks next link there might be
few records, it looks odd really.


I hope there must be some mechanism, by which I can associate salary,
experience, age etc with resume document during indexing. And when
I search for resumes I can give all filters accordingly and can retrieve
100 records and strait way I can show 100 records to user without doing any
mysql query. Please let me know if this is feasible. If so, kindly give me
some pointer how do I do it.

Best Regards
Kamal





Re: SOLR guidance required

2013-05-13 Thread Furkan KAMACI
Jason can you explain what you mean at here: Where OR operators apply,
this does not matter. But your Solr cache will be much more savvy with the
first construct.

2013/5/13 Lance Norskog goks...@gmail.com

 If this is for the US, remove the age range feature before you get sued.


 On 05/09/2013 08:41 PM, Kamal Palei wrote:

 Dear SOLR experts
 I might be asking a very silly question. As I am new to SOLR kindly guide
 me.


 I have a job site. Using SOLR to search resumes. When a HR user enters
 some
 keywords say JAVA, MySQL etc, I search resume documents using SOLR,
 retrieve 100 records and show to user.

 The problem I face is say, I retrieved 100 records, then we do filtering
 for experience range, age range, salary range (using mysql query).
 Sometimes it so happens that the 100 records I fetch , I do not get a
 single record to show to user. When user clicks next link there might be
 few records, it looks odd really.


 I hope there must be some mechanism, by which I can associate salary,
 experience, age etc with resume document during indexing. And when
 I search for resumes I can give all filters accordingly and can retrieve
 100 records and strait way I can show 100 records to user without doing
 any
 mysql query. Please let me know if this is feasible. If so, kindly give me
 some pointer how do I do it.

 Best Regards
 Kamal





Re: How to improve performance of geodist()

2013-05-13 Thread Yonik Seeley
On Mon, May 13, 2013 at 1:12 PM, Nicholas Ding nicholas...@gmail.com wrote:
 I'm using geodist() in a recip boost function. I noticed a performance
 impact to the response time. I did a profiling session, the geodist()
 calculation took 30% of CPU time.

Are you also using an fq with geofilt to narrow down the number of
documents that must be scored?

-Yonik
http://lucidworks.com


How to force a document to be indexed in a given shard at SolrCloud?

2013-05-13 Thread Furkan KAMACI
I want to run some test cases on SolrCloud at my pre-prototype system. How
can I force a document to be indexed in a given shard at SolrCloud (I use
Solr 4.2.1) ? Does something like shard.keys works for me?


Re: SOLR guidance required

2013-05-13 Thread Upayavira
Multiple fq params are ANDed. So if you have fq=clause1 AND clause2, you
should implement that as fq=clause1&fq=clause2. However, if you want
fq=clause1 OR clause2, you have no choice but to keep it as a single
filter query.

Upayavira

On Mon, May 13, 2013, at 06:55 PM, Furkan KAMACI wrote:
 Jason can you explain what you mean at here: Where OR operators apply,
 this does not matter. But your Solr cache will be much more savvy with
 the
 first construct.
 
 2013/5/13 Lance Norskog goks...@gmail.com
 
  If this is for the US, remove the age range feature before you get sued.
 
 
  On 05/09/2013 08:41 PM, Kamal Palei wrote:
 
  Dear SOLR experts
  I might be asking a very silly question. As I am new to SOLR kindly guide
  me.
 
 
  I have a job site. Using SOLR to search resumes. When a HR user enters
  some
  keywords say JAVA, MySQL etc, I search resume documents using SOLR,
  retrieve 100 records and show to user.
 
  The problem I face is say, I retrieved 100 records, then we do filtering
  for experience range, age range, salary range (using mysql query).
  Sometimes it so happens that the 100 records I fetch , I do not get a
  single record to show to user. When user clicks next link there might be
  few records, it looks odd really.
 
 
  I hope there must be some mechanism, by which I can associate salary,
  experience, age etc with resume document during indexing. And when
  I search for resumes I can give all filters accordingly and can retrieve
  100 records and strait way I can show 100 records to user without doing
  any
  mysql query. Please let me know if this is feasible. If so, kindly give me
  some pointer how do I do it.
 
  Best Regards
  Kamal
 
 
 


Re: SOLR guidance required

2013-05-13 Thread Shawn Heisey

On 5/13/2013 11:55 AM, Furkan KAMACI wrote:

Jason can you explain what you mean at here: Where OR operators apply,
this does not matter. But your Solr cache will be much more savvy with the
first construct.


If you need to OR different filters together, you have to have all those 
in the same filter query.  Multiple filter queries are ANDed together, 
that can't be changed.


If you need your filter clauses ANDed together, you can split them into 
multiple small filter queries.  Those filters will be cached 
individually.  If you put all of them in the same filter query, then you 
can't re-use pieces of the filter without a new cache entry, so caching 
isn't as efficient.


Thanks,
Shawn
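
As a sketch of the difference Shawn describes (hypothetical host, core, and field names), the two request styles can be assembled with the Python standard library -- separate fq params for ANDed clauses, one combined fq for an OR:

```python
from urllib.parse import urlencode

# Hypothetical host, core, and field names -- adjust to your schema.
base = "http://localhost:8983/solr/core0/select"

# ANDed clauses: split into separate fq params so each filter is
# cached (and reused) independently in Solr's filterCache.
and_params = [
    ("q", "java AND mysql"),
    ("fq", "experience:[3 TO *]"),
    ("fq", "salary:[50000 TO 80000]"),
]

# ORed clauses: these must stay together inside a single fq.
or_params = [
    ("q", "java AND mysql"),
    ("fq", "experience:[3 TO *] OR salary:[50000 TO 80000]"),
]

print(urlencode(and_params).count("fq="))  # -> 2
print(base + "?" + urlencode(and_params))
```

Each of the two fq entries in the first request gets its own filterCache slot, so a later query reusing only the experience filter still hits the cache.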



Faceting json response - odd format

2013-05-13 Thread Cord Thomas
Hello,

Relatively new to SOLR, I am quite happy with the API. 

I am a bit challenged by the faceting response in JSON though. 

This is what i am getting which mirrors what is in the documentation:

"facet_counts":{"facet_queries":{},

"facet_fields":{"metadata_meta_last_author":["Nick",330,"standarduser",153,"Mohan",52,"wwd",49,"gerald",45,"Riggins",36,"fallon",31,"blister",28,"",26,"morfitelli",24,"Administrator",22,"morrow",22,"richard",22,"egilhoi",18,"USer Group",16],
  

This is not trivial to parse - I've read the docs but can't seem to figure out 
how one might get a more structured response to this.

Assuming I am not missing anything,  I guess i have to write a custom parser to 
build a separate data structure that can be more easily presented in a UI.  

Thank you

Cord

Re: rename a core to same name of existing core

2013-05-13 Thread Shawn Heisey

On 5/13/2013 11:46 AM, Jie Sun wrote:

did anyone verify the following is true?

the description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

*quote*
If a core with the same name exists, while the newly created core is
initializing, the old one will continue to accept requests. Once it
has finished, all new requests will go to the new core, and the old
core will be unloaded.
*/quote*


step1 - I have a core 'abc' with 30 documents in it:
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

<str name="rows">10</str></lst></lst><result name="response" numFound="30" start="0"><doc

step2 - then I create a new core with same name 'abc':
http://myhost.com:8080/solr/admin/cores?action=create&name=abc&instanceDir=./

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">303</int></lst><str name="core">abc</str><str name="saved">/mxl/var/solr/solr.xml</str></response>

step3 - I cleared out my browser cache

step4 - I did the same query as in step1, and got the same results (30 documents):
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

<str name="rows">10</str></lst></lst><result name="response" numFound="30" start="0"><doc

I thought the old core should be unloaded?
did I misunderstand anything here?


If the instanceDir value that you are using is different than the 
existing core, then this might be a bug, either in the documentation or 
Solr.  If the instanceDir is the same as the existing core, then it's 
working as designed -- you've created a core with an index that already 
exists.


I personally would like to see core creation fail if the core already 
exists, but others with more authority may disagree.


A workaround would be to CREATE a new core with a different name and 
instanceDir, SWAP them, and then UNLOAD the one you don't need any more, 
optionally deleting it.


Thanks,
Shawn
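
Shawn's workaround maps to three CoreAdmin calls. A sketch of the URLs involved, with a hypothetical temporary core name (the deleteIndex option on UNLOAD is assumed available in your Solr version):

```python
from urllib.parse import urlencode

# CoreAdmin handler path from the thread; "abc_tmp" is a hypothetical name.
base = "http://myhost.com:8080/solr/admin/cores"

steps = [
    # 1. Build the replacement core under a different name/instanceDir.
    {"action": "CREATE", "name": "abc_tmp", "instanceDir": "abc_tmp"},
    # 2. Atomically exchange it with the live core.
    {"action": "SWAP", "core": "abc", "other": "abc_tmp"},
    # 3. Drop the old core (now registered as abc_tmp) and its index.
    {"action": "UNLOAD", "core": "abc_tmp", "deleteIndex": "true"},
]

urls = [base + "?" + urlencode(p) for p in steps]
for u in urls:
    print(u)
```

Issuing the three requests in order avoids ever re-creating a core under a name that is already registered.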



RE: Looking for Best Practice of Spellchecker

2013-05-13 Thread Dyer, James
The Word Break spellchecker will incorporate the broken & combined words in the 
collations.  It's designed to work seamlessly in conjunction with a regular 
spellchecker (IndexBased- or Direct-).  

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nicholas Ding [mailto:nicholas...@gmail.com] 
Sent: Monday, May 13, 2013 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Looking for Best Practice of Spellchecker

Thank you for your help, guys. I agree, "wall mart" should be a synonym;
it's not a good example.

I did an experiment using KeywordTokenizer + DirectSolrSpellChecker, and I
can get a suggestion even for "wall mart" to "walmart". But I don't know
whether it's a good practice or not. It's much like a workaround to me. And
for WordBreakSpellChecker, I haven't tried it yet. Does this spellchecker
break the word and concatenate them then give me collations?

Thanks


On Fri, May 10, 2013 at 11:34 AM, Dyer, James
james.d...@ingramcontent.comwrote:

 Good point, Jason.  In fact, even if you use WordBreakSpellChecker wall
 mart will not correct to walmart.  The reason is the spellchecker cannot
 both correct a token's spelling *and* fix the wordbreak issue involving
 that same token.  So in this case a synonym is the way to go.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jason Hellman [mailto:jhell...@innoventsolutions.com]
 Sent: Friday, May 10, 2013 9:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Looking for Best Practice of Spellchecker

 Nicholas,

 Also consider that some misspellings are better handled through Synonyms
 (or injected metadata).

 You can garner a great deal of value out of the spell checker by following
 the great advice James is giving here...but you'll find a well-placed
 helper synonym or metavalue can often save a lot of headache and time.

 Jason

 On May 10, 2013, at 7:32 AM, Dyer, James james.d...@ingramcontent.com
 wrote:

  Nicholas,
 
  It sounds like you might want to use WordBreakSolrSpellChecker, which
 gets obscure mention in the wiki.  Read through this section:
 http://wiki.apache.org/solr/SpellCheckComponent#Configuration and you
 will see some information.
 
  Also, the Solr Example shows how to configure this.  See
 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/solrconfig.xml
 
  Look for...
 
  <lst name="spellchecker">
   <str name="name">wordbreak</str>
   ...
  </lst>
 
  ...and...
 
  <requestHandler name="/spell" ...>
  ...
  </requestHandler>
 
  Also, I'd recommend you take a look at each parameter in the /spell
 request handler and read its section on the spellcheckcomponent wiki
 page.  You probably will want to set many of these parameters as well.
 
  You can get a query to return only spell results simply by specifying
 rows=0.  However, it's one less query to just have it return the results
 also.  If there are no results, your application can check for collations
 and re-issue a collation query.  If there are both results and collations
 returned, you can give the user results with did-you-mean suggestions.
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Nicholas Ding [mailto:nicholas...@gmail.com]
  Sent: Friday, May 10, 2013 8:47 AM
  To: solr-user@lucene.apache.org
  Subject: Looking for Best Practice of Spellchecker
 
  Hi guys,
 
  I'm working on a local search project, I wanna integrate spellchecker for
  the search.
 
  So basically, my search engines is used to search local businesses. For
  example, user could search for wall mart, here is a typo, I wanna
  spellchecker to give me Collation for walmart.
 
  My problems are:
  1. I use DirectSolrSpellChecker on my BusinessNameField and pass wall
  mart as phrase search, but I can't get collation from the spellchecker.
  2. I tried not to pass phrase search, but pass q=Wall AND Mart to force a
  100% match, but spellchecker can't give me collation also.
 
  I read the documents about spellchecker on Solr wiki, but it's very
 brief.
  I'm wondering is there any best practice of spellchecker, I believe it's
  widely used in the search, right?
 
  And I have another idea, I don't know whether it's valid or not. I want
 to
  apply the spellchecker to everything before doing the search, so that I could
 rely
  on the spellchecker to tell me whether my search could get result or not.
 
  Thanks
  Nicholas
 







Re: SOLR guidance required

2013-05-13 Thread Michael Della Bitta
Best advice in this thread. :)


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

The science of influence marketing.


On Mon, May 13, 2013 at 1:29 PM, Lance Norskog goks...@gmail.com wrote:

 If this is for the US, remove the age range feature before you get sued.


 On 05/09/2013 08:41 PM, Kamal Palei wrote:

 Dear SOLR experts
 I might be asking a very silly question. As I am new to SOLR kindly guide
 me.


 I have a job site. Using SOLR to search resumes. When a HR user enters
 some
 keywords say JAVA, MySQL etc, I search resume documents using SOLR,
 retrieve 100 records and show to user.

 The problem I face is say, I retrieved 100 records, then we do filtering
 for experience range, age range, salary range (using mysql query).
 Sometimes it so happens that the 100 records I fetch , I do not get a
 single record to show to user. When user clicks next link there might be
 few records, it looks odd really.


 I hope there must be some mechanism, by which I can associate salary,
 experience, age etc with resume document during indexing. And when
 I search for resumes I can give all filters accordingly and can retrieve
 100 records and strait way I can show 100 records to user without doing
 any
 mysql query. Please let me know if this is feasible. If so, kindly give me
 some pointer how do I do it.

 Best Regards
 Kamal





Re: How to improve performance of geodist()

2013-05-13 Thread Nicholas Ding
Yes, I did. But instead of sorting by geodist(), I use function query to
boost by distance. That's why I noticed the heavy calculation happened in
the processing.

Example:
bf=recip(geodist(), 50, 5)

Basically, I think the boost function will iterate over all the results, and
calculate the distance.



On Mon, May 13, 2013 at 1:27 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Mon, May 13, 2013 at 1:12 PM, Nicholas Ding nicholas...@gmail.com
 wrote:
  I'm using geodist() in a recip boost function. I noticed a performance
  impact to the response time. I did a profiling session, the geodist()
  calculation took 30% of CPU time.

 Are you also using an fq with geofilt to narrow down the number of
 documents that must be scored?

 -Yonik
 http://lucidworks.com
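
Following Yonik's suggestion, one way to cut down how many documents geodist() must score is to pair the boost with a {!geofilt} filter over the same point. A sketch assembling such a request with the Python standard library; the location field name ("store"), point, and radius are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical location field, point, and radius (d is in km).
params = [
    ("q", "pizza"),
    ("fq", "{!geofilt}"),             # restrict candidates before boosting
    ("sfield", "store"),              # location field holding lat,lon
    ("pt", "45.15,-93.85"),           # query point
    ("d", "50"),                      # only docs within 50 km pass the filter
    ("bf", "recip(geodist(),50,5)"),  # the boost function from the thread
]

url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With the filter in place, the recip(geodist(),...) boost only runs over documents inside the radius instead of the whole result set, which is usually where the 30% CPU goes.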



Anybody knows what IBM FileNet search looks like?

2013-05-13 Thread Alexandre Rafalovitch
And how does it compare to Solr.

I am not buying (or selling), just trying to get some technical
details and my GoogleFoo is failing me. I thought they were one of the
purchased companies, but Autonomy/Verity seems to be referred to as
'old' search engine with FileNet's as new.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: Making protwords.txt changes effective

2013-05-13 Thread Upayavira
I think you can put it in your data dir and it'll get reloaded on
commit. Try it and report back.

Upayavira

On Mon, May 13, 2013, at 06:01 PM, Jack Krupansky wrote:
 Yes, restart Solr. Not to reindex, but simply to reload the file.
 
 Well... depending on where you use the protected words, you may need to 
 reindex as well. For a query-time filter you don't need to reindex, but
 for 
 index-time filters, you must reindex.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Shane Magee
 Sent: Saturday, May 11, 2013 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Making protwords.txt changes effective
 
 Hi
 
 I added some words to protwords.txt, but there doesnt seem to be any
 effect
 in the resulting search. Do I need to restart Apache or Solr or rebuild
 the
 index? 
 


Re: rename a core to same name of existing core

2013-05-13 Thread Jie Sun
thanks for the information, you are right, I was using the same instance dir.

I agree with you, I would like to see an error if I am creating a core with
the name of an existing core.

right now I have to do ping first, and analyze if the returned code is 404
or not.

Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/rename-a-core-to-same-name-of-existing-core-tp3090960p4063047.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Anybody knows what IBM FileNet search looks like?

2013-05-13 Thread Oleg Tikhonov
:-) Alex, it seems to be a copyright ... Think about Lucene + ManifoldCF.
FileNet is a file repository saved in DB2. ManifoldCF has a connector that
helps retrieve files/directories from the DB, and using Lucene it may index the
content of the files.

I am not sure if Solr has such a handler like Tika; however, you can write one
yourself.

Hope it helps,

Oleg


On Mon, May 13, 2013 at 9:39 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 And how does it compare to Solr.

 I am not buying (or selling), just trying to get some technical
 details and my GoogleFoo is failing me. I thought they were one of the
 purchased companies, but Autonomy/Verity seems to be referred to as
 'old' search engine with FileNet's as new.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)



Re: How to force a document to be indexed in a given shard at SolrCloud?

2013-05-13 Thread gpssolr2020
Hi,

Yes, shard.keys should work for this case. Please check this link.

http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-force-a-document-to-be-indexed-in-a-given-shard-at-SolrCloud-tp4063017p4063052.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting json response - odd format

2013-05-13 Thread Chris Hostetter

: This is what i am getting which mirrors what is in the documentation:
: 
: "facet_counts":{"facet_queries":{},
: 
"facet_fields":{"metadata_meta_last_author":["Nick",330,"standarduser",153,"Mohan",52,"wwd",49,"gerald",45,"Riggins",36,"fallon",31,"blister",28,"",26,"morfitelli",24,"Administrator",22,"morrow",22,"richard",22,"egilhoi",18,"USer Group",16],
:   
: 
: This is not trivial to parse - I've read the docs but can't seem to 
: figure out how one might get a more structured response to this.

You didn't provide any specifics about what you felt was problematic, but 
i'm guessing what you want to do is pick the value you think is best for 
the json.nl param...

http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters


-Hoss


Re: Faceting json response - odd format

2013-05-13 Thread Cord Thomas
thank you Hoss,

What I would prefer to see, as we do with all other parameters, is a normal 
key/value pairing.  This might look like:

{"metadata_meta_last_author":[{"value": "Nick", "count": 330},{"value": 
"standard user","count": 153},{"value": "Mohan","count": 
52},{"value":"wwd","count": 49}…

Cord

On May 13, 2013, at 12:34 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 : This is what i am getting which mirrors what is in the documentation:
 : 
  : "facet_counts":{"facet_queries":{},
  : 
  "facet_fields":{"metadata_meta_last_author":["Nick",330,"standarduser",153,"Mohan",52,"wwd",49,"gerald",45,"Riggins",36,"fallon",31,"blister",28,"",26,"morfitelli",24,"Administrator",22,"morrow",22,"richard",22,"egilhoi",18,"USer Group",16],
 :   
 : 
 : This is not trivial to parse - I've read the docs but can't seem to 
  : figure out how one might get a more structured response to this.
 
 You didn't provide any specifics about what you felt was problematic, but 
 i'm guessing what you want to do is pick the value you think is best for 
 the json.nl param...
 
 http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters
 
 
 -Hoss



RE: spellcheker and exact match

2013-05-13 Thread hacene
I tried those parameters and it does suggest keywords but not the ones I'm
interested in



--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheker-and-exact-match-tp4061672p4063060.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting json response - odd format

2013-05-13 Thread Chris Hostetter

: What i would prefer to see as we do with all other parameters is a 
: normal key/value pairing.  this might look like:

a true key value pairing via a map type structure is what you get with 
json.nl=map -- but in most client languages that would lose whatever 
sorting you might have specified with facet.sort.

: {"metadata_meta_last_author":[{"value": "Nick", "count": 330},{"value": 
: "standard user","count": 153},{"value": "Mohan","count": 
: 52},{"value":"wwd","count": 49}…

that structure is essentially what you get with json.nl=arrarr -- ie: the
values are still in the specified facet.sort order; but instead of an
array of maps, it's an array of array pairs.

This is the closest equivalent to how the facet counts are internally 
modeled -- you should be able to inject those keys you choose (value and 
count) in your client layer fairly easily.


-Hoss
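
If you stay with the default flat [term, count, term, count, ...] format, pairing it up client-side is also a one-liner. A minimal Python sketch over a hand-built response fragment:

```python
# Hand-built fragment mimicking the flat facet_fields format from the thread.
resp = {
    "facet_counts": {
        "facet_fields": {
            "metadata_meta_last_author": ["Nick", 330, "standarduser", 153, "Mohan", 52]
        }
    }
}

flat = resp["facet_counts"]["facet_fields"]["metadata_meta_last_author"]
# Terms sit at even indexes, counts at odd indexes; zip them back together.
pairs = [{"value": v, "count": c} for v, c in zip(flat[::2], flat[1::2])]

print(pairs[0])  # -> {'value': 'Nick', 'count': 330}
```

This keeps the facet.sort order intact without needing a different json.nl setting.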

Solr 4.3 core swap

2013-05-13 Thread richardg
Since upgrading to solr 4.3 we get the following errors on our slaves when we
swap cores on our master:

Solr index directory
'/usr/local/solr_aggregate/solr_aggregate/data/index.20130513152644966' is
locked.  Throwing exception

SEVERE: Unable to reload core: production
org.apache.solr.common.SolrException: Index locked for write for core
production

SEVERE: Could not reload core 
org.apache.solr.common.SolrException: Unable to reload core: production

On older solr versions it would create a new index.* directory and use it,
it hasn't been the case w/ 4.3.  The new core seems to replicate fine and
the new index files are in the original index.* directory so I'm not sure
what is happening.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-core-swap-tp4063065.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Boolean query help

2013-05-13 Thread Arun Rangarajan
I am trying to form a Solr query. Our documents have a multi-valued field
named tag_id. I want to get documents that either do not have tag_id 1 or
have both tag_id 1 and 2 i.e.
q=(tag_id:(1 AND 2) OR tag_id:(NOT 1))

This is not giving the desired results. The result is the same as that of
q=tag_id:(1 AND 2)
and the OR condition is ignored.
How would one do this query?


Re: Solr Boolean query help

2013-05-13 Thread Erik Hatcher
Inner purely negative clauses aren't allowed by Lucene.  (Solr supports 
top-level negative clauses, though, so q=NOT foo works as expected)

To get a nested negative clause to work, try this:

q=tag_id:(1 AND 2) OR (*:* AND -tag_id:1)



On May 13, 2013, at 16:11 , Arun Rangarajan wrote:

 I am trying to form a Solr query. Our documents have a multi-valued field
 named tag_id. I want to get documents that either do not have tag_id 1 or
 have both tag_id 1 and 2 i.e.
 q=(tag_id:(1 AND 2) OR tag_id:(NOT 1))
 
 This is not giving the desired results. The result is the same as that of
 q=tag_id:(1 AND 2)
 and the OR condition is ignored.
 How would one do this query?



Re: Solr Boolean query help

2013-05-13 Thread Jack Krupansky

Pure negative queries only work at the top level.

So, try:

q=(tag_id:(1 AND 2) OR tag_id:(*:* NOT 1))


-- Jack Krupansky
-Original Message- 
From: Arun Rangarajan 
Sent: Monday, May 13, 2013 4:11 PM 
To: solr-user@lucene.apache.org 
Subject: Solr Boolean query help 


I am trying to form a Solr query. Our documents have a multi-valued field
named tag_id. I want to get documents that either do not have tag_id 1 or
have both tag_id 1 and 2 i.e.
q=(tag_id:(1 AND 2) OR tag_id:(NOT 1))

This is not giving the desired results. The result is the same as that of
q=tag_id:(1 AND 2)
and the OR condition is ignored.
How would one do this query?
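
The intended semantics of the query -- documents with both tags 1 and 2, or documents without tag 1 at all -- can be sanity-checked in plain Python over hypothetical sample documents:

```python
# Sample docs with a multi-valued tag_id field (hypothetical data).
docs = [
    {"id": "a", "tag_id": [1, 2]},   # has both 1 and 2 -> should match
    {"id": "b", "tag_id": [1]},      # has 1 but not 2  -> should not match
    {"id": "c", "tag_id": [3]},      # no tag 1 at all  -> should match
]

def matches(doc):
    tags = set(doc["tag_id"])
    # Mirrors: tag_id:(1 AND 2) OR (*:* AND -tag_id:1)
    return {1, 2} <= tags or 1 not in tags

print([d["id"] for d in docs if matches(d)])  # -> ['a', 'c']
```

The *:* in the corrected Solr query plays the role of the "all documents" set that the negative clause subtracts from, which is exactly what `1 not in tags` expresses here.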


Re: Solr Boolean query help

2013-05-13 Thread Arun Rangarajan
Erik, Jack,
Thanks for your quick replies! That works.


On Mon, May 13, 2013 at 1:18 PM, Jack Krupansky j...@basetechnology.comwrote:

 Pure negative queries only work at the top level.

 So, try:

 q=(tag_id:(1 AND 2) OR tag_id:(*:* NOT 1))


 -- Jack Krupansky
 -----Original Message-----
 From: Arun Rangarajan
 Sent: Monday, May 13, 2013 4:11 PM
 To: solr-user@lucene.apache.org
 Subject: Solr Boolean query help
  I am trying to form a Solr query. Our documents have a multi-valued field
 named tag_id. I want to get documents that either do not have tag_id 1 or
 have both tag_id 1 and 2 i.e.
 q=(tag_id:(1 AND 2) OR tag_id:(NOT 1))

 This is not giving the desired results. The result is the same as that of
 q=tag_id:(1 AND 2)
 and the OR condition is ignored.
 How would one do this query?



writing a custom Filter plugin?

2013-05-13 Thread Jonathan Rochkind
Does anyone know of any tutorials, basic examples, and/or documentation 
on writing your own Filter plugin for Solr? For Solr 4.x/4.3?


I would like a Solr 4.3 version of the normalization filters found here 
for Solr 1.4: https://github.com/billdueber/lib.umich.edu-solr-stuff


But those are old, for Solr 1.4.

Does anyone have any hints for writing a simple substitution Filter for 
Solr 4.x?  Or, does a simple sourcecode example exist anywhere?


.skip.autorecovery=Y + restart solr after crash + losing many documents

2013-05-13 Thread Gilles Comeau
Hi all,

We write to two same-named cores in the same collection for redundancy, and are 
not taking advantage of the full benefits of solr cloud replication.

We use solrcloud.skip.autorecovery=true so that Solr doesn't try to sync the 
indexes when it starts up.

However, we find that if the core is not optimized prior to shutting it down 
(in a crash situation), we can lose all of the data after starting up.   The 
files are written to disk, but we can lose a full 24 hours worth of data as 
they are all removed when we start SOLR.  (I don't think it is a commit issue)

If we optimize before shutting down, we never lose any data.   Sadly, sometimes 
SOLR is in a state where optimizing is not an option.

Can anyone think of why that might be?   Is there any special configuration you 
need if you want to write directly to two cores rather than use replication?   
Version 4.0, this used to work in our 4.0 nightly build, but broke when we 
migrated to 4.0 production.(until we test and migrate to the replication 
setup - it won't be too long and I'm a bit embarrassed to be asking this 
question!)

Regards,

Gilles



Re: How to get/set customized Solr data source properties?

2013-05-13 Thread Chris Hostetter

: learned it should work. And this is my actual code. I create this
: DataSource for testing my ideas. I am blocked at the very beginning...sucks
: :(

but you only showed us one line of code w/o any context.  nothing in your 
email was reproducible for other people to try to compile/run themselves 
to see if they can figure out why your code isn't working.

:  : I am working on a DataSource implementation. I want to get some
:  customized
:  : properties when the *DataSource.init* method is called. I tried to add
:  the
:  ...
:  : <dataConfig>
:  :   <dataSource type="com.my.company.datasource"
:  : my="value" />
: 
:  My understanding from looking at other DataSources is that should work.
: 
:  : But initProps.getProperty(my) == null.
: 
:  can you show us some actual code that fails with that dataConfig you mentioned?


-Hoss


Solritas truncates content

2013-05-13 Thread Michael Schmitz
Hi, I'm playing around with the example that comes with SOLR 4.  I've
indexed some documents using the Tika extractor.  I'm looking at the
velocity templates and trying to figure out how the /browse (solritas)
functionality works because I would like to add functionality to view the
complete document content.  Presently, the content field is truncated in
the results to around 730 characters.  How is this done?  How can I access
the full text?  I've poked around quite a bit but have not found anything.

The content field is added to the result set in richtext-doc.vm:

<div class="result-body">#field('content')</div>

Any help is greatly appreciated!
Peace.  Michael


Re: How to deal with cache for facet search when index is always increment?

2013-05-13 Thread Chris Hostetter

:  For real time search, the docs would be imported to the index at any time. In this
:  case, the cache nearly always needs to be created again, which causes the facet
:  search to be very slow.
:  Do you have any idea how to deal with such a problem?

: We're in a similar situation and have had better performance using
: facet.method=fcs.
: 
: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

DocValues is another very new option that may help improve the performance 
of faceting in NRT situations, because it eliminates the need to build the 
Field Cache...

http://wiki.apache.org/solr/DocValues

...but there are caveats to using it (see wiki).


-Hoss
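
For reference, enabling DocValues is a per-field schema.xml change (available from Solr 4.2 on), and it requires reindexing. A sketch for a hypothetical string field used in faceting:

```xml
<field name="cat" type="string" indexed="true" stored="true" docValues="true"/>
```

See the wiki page above for the field types that support docValues and the remaining caveats.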


Request to be added to ContributorsGroup

2013-05-13 Thread Shreejay Nair
Hello Wiki Admins,

Request you to please add me to the ContributorsGroup.

I have been using Solr for a few years now and I would like to contribute
back by adding more information to the wiki Pages.

Wiki User Name : Shreejay

--Shreejay


Re: Quick question about indexing with SolrJ.

2013-05-13 Thread Chris Hostetter

: I don't want to use POJOs, that's the main problem. I know that you can
: send AJAX POST HTTP Requests with JSON data to index new documents and I
: would like to do that with SolrJ, that's all, but I don't find the way to
: do that, :-/ . What I would like to do is simply retrieve a String with an
: embedded JSON and add() it via an HttpSolrServer object instance. If the

Use ContentStreamUpdateRequest -- provide your pre-generated JSON as the 
ContentStream (you can back it by a String using 
ContentStreamBase.StringStream or whatever you have to work with) then 
process it against your HttpSolrServer object.


-Hoss
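
In SolrJ that is a few lines of Java around ContentStreamUpdateRequest; the raw HTTP it ultimately performs can be sketched (hypothetical URL and handler path) with the Python standard library, building the request without sending it:

```python
import json
import urllib.request

# Pre-generated JSON string, standing in for whatever your app produces.
doc_json = json.dumps([{"id": "doc1", "title": "hello"}])

# Hypothetical Solr 4.x JSON update handler URL.
req = urllib.request.Request(
    "http://localhost:8983/solr/update/json?commit=true",
    data=doc_json.encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment to send against a live Solr

print(req.get_header("Content-type"))  # -> application/json
```

The ContentStreamBase.StringStream in Hoss's answer fills the same role as the encoded `data` body here: the JSON string is posted as-is, with no POJO binding in between.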


Re: Request to be added to ContributorsGroup

2013-05-13 Thread Steve Rowe
On May 13, 2013, at 6:54 PM, Shreejay Nair shreej...@gmail.com wrote:
 Hello Wiki Admins,
 
 Request you to please add me to the ContributorsGroup.
 
 I have been using Solr for a few years now and I would like to contribute
 back by adding more information to the wiki Pages.
 
 Wiki User Name : Shreejay
 
 --Shreejay

Added to the solr wiki ContributorsGroup. - Steve



Re: How to get/set customized Solr data source properties?

2013-05-13 Thread Alexandre Rafalovitch
If the property has a full stop, it is probably going through the
scoped resolver which may be causing issues. I would start with very
basic property name format and see what happens.

Otherwise, it is probably a breakpoint and debug time.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, May 13, 2013 at 6:08 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : learned it should work. And this is my actual code. I create this
 : DataSource for testing my ideas. I am blocked at the very beginning...sucks
 : :(

 but you only showed us one line of code w/o any context.  nothing in your
 email was reproducible for other people to try to compile/run themselves
 to see if they can figure out why your code isn't working.
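For instance, a minimal self-contained DataSource that does nothing but read the attribute might look like this (package and class names are made up; it needs the DataImportHandler jar on the classpath):

```java
import java.util.Properties;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class MyDataSource extends DataSource<String> {
    @Override
    public void init(Context context, Properties initProps) {
        // With <dataSource type="com.my.company.MyDataSource" my="value"/>
        // in data-config.xml, this is expected to print "value"
        System.out.println(initProps.getProperty("my"));
    }

    @Override
    public String getData(String query) { return null; }

    @Override
    public void close() { }
}
```

Posting something like this (plus the exact dataConfig) makes the problem reproducible for others.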

 :  : I am working on a DataSource implementation. I want to get some
 :  customized
 :  : properties when the *DataSource.init* method is called. I tried to add
 :  the
 :  ...
 :  : <dataConfig>
 :  :   <dataSource type="com.my.company.datasource"
 :  : my="value" />
 : 
 :  My understanding from looking at other DataSources is that should work.
 : 
 :  : But initProps.getProperty(my) == null.
 : 
 :  can you show us some actual that fails with that dataConfig you mentioned?


 -Hoss


Re: Solritas truncates content

2013-05-13 Thread Erik Hatcher
#field is defined in conf/velocity/VM_global_library.vm as:

#macro(field $f)
  #if($response.response.highlighting.get($docId).get($f).get(0))
    #set($pad = "")
#foreach($v in $response.response.highlighting.get($docId).get($f))
$pad$v##
  #set($pad = " ... ")
#end
  #else
#foreach($v in $doc.getFieldValues($f))
$v##
#end
  #end
#end 

It's a little ugly because it supports highlighting if a field has any values 
for that document in the highlighting section of the response.

But if there is no highlighting, then it outputs each value of a field as-is 
from the response.  Are you sure you're getting it truncated?  Try adding 
wt=xml to the /browse requests you're making and see if perhaps the actual 
value coming back from Solr is the same as what you're seeing rendered.  Unless 
it's from highlighting, it should be the same.

Erik


On May 13, 2013, at 18:14 , Michael Schmitz wrote:

 Hi, I'm playing around with the example that comes with SOLR 4.  I've
 indexed some documents using the Tika extractor.  I'm looking at the
 velocity templates and trying to figure out how the /browse (solritas)
 functionality works because I would like to add functionality to view the
 complete document content.  Presently, the content field is truncated in
 the results to around 730 characters.  How is this done?  How can I access
 the full text?  I've poked around quite a bit but have not found anything.
 
 The content field is added to the result set in richtext-doc.vm:
 
 <div class="result-body">#field('content')</div>
 
 Any help is greatly appreciated!
 Peace.  Michael



Re: How to improve performance of geodist()

2013-05-13 Thread David Smiley (@MITRE.org)
Hi Nicholas,

Given that boosting is generally an inherently fuzzy / inexact thing, you can
likely get away with using simpler calculations.  dist() can do the
Euclidean distance (i.e. the Pythagorean theorem).  If your data is in just
one region of the world, you can project your data into a 2-D plane (a
so-called projection) and use the Euclidean distance.  If your data is
everywhere, you may need to use multiple projections, putting them in
separate fields for each projection and then choose the best projected set
of coordinates based on your starting point.
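To illustrate why the cheaper calculation is usually fine for boosting, here is a self-contained sketch (coordinates are made-up points roughly 60 km apart) comparing the great-circle distance with plain Pythagoras on equirectangular-projected coordinates:

```java
// Great-circle distance (what geodist() effectively computes) vs. a cheap
// planar approximation (what dist() on projected fields would compute).
public class DistSketch {
    static final double R = 6371.0; // mean Earth radius, km

    // Haversine great-circle distance
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLmb = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dPhi / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLmb / 2), 2);
        return 2 * R * Math.asin(Math.sqrt(a));
    }

    // Equirectangular projection + Pythagorean theorem
    static double planarKm(double lat1, double lon1, double lat2, double lon2) {
        double x = Math.toRadians(lon2 - lon1) * Math.cos(Math.toRadians((lat1 + lat2) / 2));
        double y = Math.toRadians(lat2 - lat1);
        return R * Math.hypot(x, y);
    }

    public static void main(String[] args) {
        double h = haversineKm(43.65, -79.38, 43.26, -79.87);
        double p = planarKm(43.65, -79.38, 43.26, -79.87);
        // For points this close together, the two agree to well under a km
        System.out.printf("haversine=%.2f km planar=%.2f km%n", h, p);
    }
}
```

In the request itself, the boost could then read something like bf=recip(dist(2,x_en,y_en,482000,5420000),50,5,5), where x_en/y_en are made-up fields holding the projected coordinates of each document.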

~ David


Nicholas Ding wrote
 Yes, I did. But instead of sorting by geodist(), I use function query to
 boost by distance. That's why I noticed the heavy calculation happened in
 the processing.
 
 Example:
 bf=recip(geodist(), 50, 5)
 
 Basically, I think the boost function will iterate all the results, and
 calculate the distance.
 
 
 
 On Mon, May 13, 2013 at 1:27 PM, Yonik Seeley wrote:
 
 On Mon, May 13, 2013 at 1:12 PM, Nicholas Ding wrote:
  I'm using geodist() in a recip boost function. I noticed a performance
  impact to the response time. I did a profiling session, the geodist()
  calculation took 30% of CPU time.

 Are you also using an fq with geofilt to narrow down the number of
 documents that must be scored?

 -Yonik
 http://lucidworks.com






-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-improve-performance-of-geodist-tp4063004p4063136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Request to be added to Contributor Group

2013-05-13 Thread Dao Xuan, Hoang
Hi Admins,

My name is Eric. I got an account at http://wiki.apache.org/solr/ with the
user name Eric D. Please add me to the Contributor Group. We currently have
JobSearcher.com.au up and running, which uses Solr. I am sure we can add
comments and share some experience with Solr up there.

Thank you very much.

Best regards,