Solr Data Routing

2014-09-02 Thread Ankit Jain
Hi All,

I want to route data into shards depending on the value of an input column. For
example:

I am getting user data and want to store the data of user1 on shard1, user2
on shard2, and so on.

Can you please let me know how we can achieve this in Solr?

-- 
Thanks,
Ankit Jain


Re: Solr Data Routing

2014-09-02 Thread Himanshu Mehrotra
Hi,

You can use multi-level compositeId routing in SolrCloud. Read through
the following link; it should help: http://searchhub.org/2014/01/06/10590/

Thanks,
Himanshu
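
To illustrate the compositeId routing mentioned above: the shard key is a prefix of the document ID, separated by "!". The sketch below only builds such IDs; the class and method names are hypothetical and not part of SolrJ. Note that, as far as I understand it, this co-locates documents that share a prefix but does not let you name the target shard (user1 on "shard1") explicitly.

```java
// Illustrative helper (hypothetical, not a SolrJ class): builds compositeId-style IDs.
final class CompositeIds {
    private CompositeIds() {}

    /** One-level key, e.g. "user1!42": the part before "!" is the shard key. */
    static String of(String shardKey, String docId) {
        return shardKey + "!" + docId;
    }

    /** Two-level key for multi-level routing, e.g. "tenantA!user1!42". */
    static String of(String first, String second, String docId) {
        return first + "!" + second + "!" + docId;
    }

    public static void main(String[] args) {
        System.out.println(of("user1", "42"));            // user1!42
        System.out.println(of("tenantA", "user1", "42")); // tenantA!user1!42
    }
}
```

The resulting string would be used as the value of the unique key field when indexing.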





Re: Help with StopFilterFactory

2014-09-02 Thread heaven
Jira issue: https://issues.apache.org/jira/browse/SOLR-6468





Date field related query

2014-09-02 Thread Aman Tandon
Hi,

I am working with a date field and I want to find all the records that were
indexed today.

With Regards
Aman Tandon


Re: Date field related query

2014-09-02 Thread Aman Tandon
Hi,

I did it using fq=datefield:[2014-09-01T23:59:59Z TO 2014-09-02T23:59:59Z].
Correct me if I am wrong.

Is there any way to do this using NOW?


With Regards
Aman Tandon



Re: Date field related query

2014-09-02 Thread François Schiettecatte
How about :

datefield:[NOW-1DAY/DAY TO *]

François



HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
Solr 4.8.1
Java 1.7
Tomcat 7.0.50
Zookeeper 3.4.6

Trying to get a SolrCloud running with https only.  I found this:
https://issues.apache.org/jira/browse/SOLR-3854

I don't have a clusterprops.json file, and running the zkCli command
doesn't add one either.
The command is along the lines of:
./zkCli.sh -zkhost host:port -cmd put /clusterprops.json '{"urlScheme":"https"}'
(run from the zookeeper/bin directory).

I've done some googling, but I can't seem to figure out what I'm doing
wrong.  I'm not getting an error message when doing the command.

Any ideas?  Thanks.

-- Chris


Search on specific shard

2014-09-02 Thread Ankit Jain
Hi All,

I am using the code below to route data on the basis of the user field.
The data of user1 goes to one shard and the data of user2 goes to
another shard.

try {
    String zkHostString = "127.0.0.1:2181";
    CloudSolrServer cloudSolrServer = new CloudSolrServer(zkHostString);
    // Create "collection5" with "user" as the router field.
    CollectionAdminRequest.createCollection("collection5", 2, 2, 2,
            null, null, "user", cloudSolrServer);
    cloudSolrServer.setDefaultCollection("collection5");

    // Index documents 0..100; the user value alternates between user0 and user1.
    for (int i = 0; i <= 100; i++) {
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", i);
        document.addField("user", "user" + (i % 2));
        cloudSolrServer.add(document);
    }
    cloudSolrServer.commit();
    cloudSolrServer.shutdown();
} catch (SolrException e) {
    e.printStackTrace();
} catch (SolrServerException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

Now I want to use routing at search time. If a user searches the documents
for user1, the query should be executed only on shard1 (the shard that contains
the data of user1).

Please let me know how we can route the query to a specific shard at search
time.

-- 
Thanks,
Ankit Jain


Solr source code

2014-09-02 Thread Shay Sofer
Hi,

What is the process for modifying the Solr source code (the legal part)?

In addition, whom should I notify about this bug and fix so that the Solr team
will consider using it?

Thanks,
Shay.


Re: Solr source code

2014-09-02 Thread Shawn Heisey
On 9/2/2014 8:27 AM, Shay Sofer wrote:
 What is the process regarding modify Solr source code (legal part)?

 In addition, who should I update for this bug and fix so Solr team will 
 consider using it.

The Lucene/Solr project is licensed under the Apache License, version 2.0.

http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN

The generally accepted way to contribute a bugfix to the project is to
find (or create) the appropriate issue in Jira, then attach your patch
to it.  Ideally you will check out the trunk branch from SVN and create
your patch against that with the svn diff tool, but the stable branch
(branch_4x currently) will do just as well.  Just be sure there's enough
info accompanying the patch for us to identify the exact branch/revision
used to build it.

http://wiki.apache.org/solr/HowToContribute

There are other methods, like the mailing list, a pull request for the
github mirror, etc... but Jira and a patch from SVN are the best way.

Thanks,
Shawn



Re: Solr Data Routing

2014-09-02 Thread Erick Erickson
Here's another link:
http://searchhub.org/2013/06/13/solr-cloud-document-routing/


I have to ask why you want to do this? If you want to put docs in a particular
shard yourself, you have to be very careful that you're not shooting yourself
in the foot. Not saying it's a bad idea, but this may be an XY problem. What is
the use-case you're supporting by doing this? Would separate collections serve
as well?

Best,
Erick




Re: Date field related query

2014-09-02 Thread Erick Erickson
Hmmm, not quite, I think you meant:

datefield:[NOW/DAY TO NOW/DAY+1DAY]

You're particularly interested in using date math
if you use these in filter query clauses, see:
http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

Best,
Erick
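
As a sketch of what the date-math range above selects, the snippet below builds the filter both ways: the NOW/DAY form that Solr resolves itself, and the same bounds computed on the client for a given instant. The class and method names are hypothetical, and UTC day boundaries are assumed (Solr's default when no time zone is supplied). It also shows why a hand-written range starting at 23:59:59 of the previous day is slightly off.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: builds a "today" filter for a date field.
final class TodayFilter {
    private TodayFilter() {}

    /** Date-math form: Solr rounds NOW down to the start of the day. */
    static String dateMath(String field) {
        return field + ":[NOW/DAY TO NOW/DAY+1DAY]";
    }

    /** Explicit form: the same bounds computed client-side for a given instant (UTC). */
    static String explicit(String field, Instant now) {
        Instant startOfDay = now.truncatedTo(ChronoUnit.DAYS);        // midnight UTC
        Instant startOfNextDay = startOfDay.plus(1, ChronoUnit.DAYS); // next midnight UTC
        return field + ":[" + startOfDay + " TO " + startOfNextDay + "]";
    }

    public static void main(String[] args) {
        System.out.println("fq=" + dateMath("datefield"));
        // fq=datefield:[2014-09-02T00:00:00Z TO 2014-09-03T00:00:00Z]
        System.out.println("fq=" + explicit("datefield", Instant.parse("2014-09-02T10:54:00Z")));
    }
}
```

The date-math form has the advantage that the filter string stays identical for every request made on the same day.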



Re: Date field related query

2014-09-02 Thread Aman Tandon
Thanks Erick :)

With Regards
Aman Tandon



Re: HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
Getting closer.

I can at least get the file to be there, but I can't figure out what to put
into it.
I made a clusterprops.json file, and it has had:
{ "urlScheme": "https" }
{ \"urlScheme\": \"https\" }
{ \\"urlScheme\\": \\"https\\" }

Which gets loaded in like this:
./zkCli.sh -zkhost localhost:2181 -cmd put /cluserprops.json `cat
./clusterprops.json`
(and I've also tried just pushing those above values within the zkCli app
to no avail)

I always get a message like this:
Caused by: org.noggit.JSONParser$ParseException: Expected string:
char=\,position=1 BEFORE='{\' AFTER='urlScheme\:\https\}'

I'm not getting a whole lot on searches for clusterprops.json -- any
advice would be appreciated.


-- Chris




Re: HTTPS for SolrCloud

2014-09-02 Thread Chris Hostetter

First question: ignoring the original jira (which may be out of date
due to later improvements) have you seen the instructions?

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud

: I always get a message like this:
: Caused by: org.noggit.JSONParser$ParseException: Expected string:
: char=\,position=1 BEFORE='{\' AFTER='urlScheme\:\https\}'

looks like you have literal backslash characters in your JSON (evidently
from your attempts to escape the quote characters)

If you're having trouble with putting the JSON directly in the command
line (your examples looked really contrived - which shell are you
using?) you can always use putfile directly and bypass any concerns about
the shell...

https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities



-Hoss
http://www.lucidworks.com/
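
One way to take the shell out of the picture, as suggested above, is to write the JSON to a file from a program and then upload the file. The following is a minimal sketch; the class name and target path are hypothetical, and the upload command in the comment assumes the putfile command of Solr's zkcli script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes the cluster property JSON without any shell quoting.
final class ClusterPropsFile {
    static final String JSON = "{\"urlScheme\":\"https\"}";

    private ClusterPropsFile() {}

    /** Writes the JSON as UTF-8 bytes to the given path and returns that path. */
    static Path write(Path target) throws IOException {
        return Files.write(target, JSON.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path p = write(Paths.get("clusterprops.json"));
        System.out.println("wrote " + Files.size(p) + " bytes to " + p);
        // Then upload it, for example:
        //   ./zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterprops.json ./clusterprops.json
    }
}
```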


Re: HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
Hi Hoss.

I did finally stumble onto that document (just after I posted my last
message, of course).
Using bash shell.

I've now tried those steps:

Tomcat is stopped.

First I run:
./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json '{"urlScheme":"https"}'

I confirm via the zookeeper-provided client:
[zk: localhost:2181(CONNECTED) 0] get /clusterprops.json
{"urlScheme":"https"}
cZxid = 0x1053a
ctime = Tue Sep 02 16:11:09 GMT-00:00 2014
mZxid = 0x1053a
mtime = Tue Sep 02 16:11:09 GMT-00:00 2014
pZxid = 0x1053a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 21
numChildren = 0
[zk: localhost:2181(CONNECTED) 1]

Next I start Tomcat, I get this:
482  [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore  –
null:org.noggit.JSONParser$ParseException: JSON Parse Error:
char=',position=0 BEFORE=''' AFTER='{urlScheme:https}''

I've done it with and without the quotes, based on commentary here:
http://qnalist.com/questions/4770318/solrcloud-and-https

I get the same error with loading in the props this way:
./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json {\"urlScheme\":\"https\"}

Error:
533  [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore  –
null:org.noggit.JSONParser$ParseException: JSON Parse Error:
char=',position=0 BEFORE=''' AFTER='{urlScheme:https}''

putfile also nets the same error.

I'm not sure where I'm supposed to go from here.

Thanks!

-- Chris
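
A quick sanity check on the znode read back above is to compare the reported dataLength with the byte length of the candidate payloads. The sketch below (class name hypothetical) does that arithmetic; a length of 21 matches the JSON with double quotes and no surrounding single quotes, which would suggest that the node inspected with the ZooKeeper client holds well-formed JSON.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: byte lengths of candidate clusterprops.json payloads.
final class ZnodeLengthCheck {
    private ZnodeLengthCheck() {}

    /** Number of bytes the string occupies when stored as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("{\"urlScheme\":\"https\"}"));   // 21: valid JSON
        System.out.println(utf8Length("{urlScheme:https}"));             // 17: quotes lost
        System.out.println(utf8Length("'{\"urlScheme\":\"https\"}'")); // 23: stray single quotes
    }
}
```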




Re: HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
Side note -- I've also tried adding the clusterprops.json file via
zookeeper's shell client on the command line, and within that client, all
with no luck.

-- Chris




WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Hello, I'm running into a case where a query is not returning the 
results I expect, and I'm hoping someone can offer some explanation that 
might help me fine tune things or understand what's up.


I am running Solr 4.3.

My filter chain includes a WordDelimiterFilter and, later a filter that 
downcases everything for case-insensitive searching. It includes many 
other things too, but I think these are the pertinent facts.


For query "dELALAIN", the WordDelimiterFilter splits into:

text: d
start: 0
position: 1

text: ELALAIN
start: 1
position: 2

text: dELALAIN
start: 0
position: 2

Note the duplication/overlap of the tokens -- one version with "d" and
"ELALAIN" split into two tokens, and another with just one token.


Later, all the tokens are lowercased by another filter in the chain
(actually an ICU filter which is doing something more complicated than
just lowercasing, but I think we can consider it lowercasing for the
purposes of this discussion).


If I understand right what the WordDelimiterFilter is trying to do here,
it's probably doing something special because of the lowercase "d"
followed by an uppercase letter, a special case for that. (I don't get
this behavior with other mixed case queries not beginning with 'd').


And, what I think it's trying to do, is match text indexed as "d
elalain" as well as text indexed as "delalain".


The problem is, it's not accomplishing that -- it is NOT matching text
that was indexed as "delalain" (one token).


I don't entirely understand what the "position" attribute is for -- but
I wonder if in this case, the position on "dELALAIN" is really supposed
to be 1, not 2?  Could that be responsible for the bug?  Or is position
irrelevant in this case?


If that's not it, then I'm at a loss as to what may be causing this bug
-- or even if it's a bug at all, or I'm just not understanding intended
behavior. I expect a query for "dELALAIN" to match text indexed as
"delalain" (because of the forced lowercasing in the filter chain). But
it's not doing so. Are my expectations wrong? Bug? Something else?


Thanks for any advice,

Jonathan
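
On the question of what "position" means: a token's position is the running sum of position increments, and an increment of 0 stacks a token on the same slot as the previous one. Positions matter for phrase and proximity matching. The sketch below is a simplified model with a hypothetical class name, not Lucene code; the increments are an assumption chosen to reproduce the positions 1, 2, 2 reported above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of token positions (hypothetical, not Lucene's implementation).
final class TokenPositions {
    private TokenPositions() {}

    /** Position of each token is the running sum of the position increments. */
    static List<Integer> positions(int[] increments) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        for (int inc : increments) {
            pos += inc;
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // d (+1), ELALAIN (+1), dELALAIN (+0: stacked on the previous token)
        System.out.println(positions(new int[] {1, 1, 0})); // [1, 2, 2]
    }
}
```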


Re: Search on specific shard

2014-09-02 Thread Anshum Gupta
Hi Ankit,

The following blog posts should help you understand composite-id routing in
SolrCloud better.

http://searchhub.org/2013/06/13/solr-cloud-document-routing/

A more complicated use case (multi-level routing) :
http://searchhub.org/2014/01/06/10590/




-- 

Anshum Gupta
http://www.anshumgupta.net
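
For the search-time half of the question, the posts linked above describe passing a route key with the query so that only the matching shard is consulted. The sketch below builds such request parameters; it assumes a Solr release that supports the _route_ request parameter (to my understanding, older 4.x releases used shard.keys instead), and the class name is hypothetical. With ID-prefix routing the key ends in "!"; whether the bare field value is the right key when a router field is used is an assumption worth verifying.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds URL-encoded parameters for a routed /select request.
final class RoutedQuery {
    private RoutedQuery() {}

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Returns "q=...&_route_=..." with both values URL-encoded. */
    static String params(String query, String routeKey) {
        return "q=" + enc(query) + "&_route_=" + enc(routeKey);
    }

    public static void main(String[] args) {
        System.out.println(params("user:user1", "user1!")); // q=user%3Auser1&_route_=user1%21
    }
}
```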


Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
Hi Jonathan,

Little confused by this line:

 And, what I think it's trying to do, is match text indexed as "d elalain"
as well as text indexed as "delalain".

In this case, I don't know how WordDelimiterFilter will help, as you're
likely tokenizing on spaces somewhere, and that input text has a space. I
could be wrong. It's probably best if you post your field definition from
your schema.

Also, is this a free-text field, or something that's more like a short
string?

Thanks,


Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/



Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind

Thanks for the response.

I understand the problem a little bit better after investigating more.

Posting my full field definitions is, I think, going to be confusing, as 
they are long and complicated. I can narrow it down to an isolation case 
if I need to. My indexed field in question is relatively short strings.


What it has to do with is the WordDelimiterFilter's default
splitOnCaseChange=1 and generateWordParts=1, and the effects of such.


Let's take a less confusing example, query "MacBook", with a
WordDelimiterFilter followed by something that downcases everything.


I think what the WDF (followed by case folding) is trying to do is make
query "MacBook" match both indexed text "mac book" as well as "macbook"
-- either one should be a match. Is my understanding right of what
WordDelimiterFilter with splitOnCaseChange=1 and generateWordParts=1 is
intending to do?


In my actual index, query "MacBook" is matching ONLY "mac book", and not
"macbook", which is unexpected. I indeed want it to match both. (I
realize I could make it match only 'macbook' by setting
splitOnCaseChange=0 and/or generateWordParts=0).


It's possible this is happening as a side effect of other parts of my
complex field definition, and I really do need to post the whole thing
and/or isolate it. But I wonder if there are known general problem cases
that cause this kind of failure, or any known bugs in
WordDelimiterFilter (in Solr 4.3?) that cause this kind of failure.


And I wonder if WordDelimiterFilter spitting out the token "MacBook"
with position 2 rather than 1 is expected, irrelevant, or possibly a
relevant problem.


Thanks again,

Jonathan


Re: HTTPS for SolrCloud

2014-09-02 Thread Chris Hostetter

: ./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json
: '{"urlScheme":"https"}'
...
: Next I start Tomcat, I get this:
: 482  [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore  –
: null:org.noggit.JSONParser$ParseException: JSON Parse Error:
: char=',position=0 BEFORE=''' AFTER='{urlScheme:https}''

I can't reproduce the error you are describing when i follow all the
steps on the SSL doc page (using bash, and the outer single quotes, just
like you)...

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud


Are you certain that you and your solr nodes are talking to the same
zookeeper instance?

(Because according to that error, there is a stray single-quote at the
beginning of the clusterprops.json file in the ZK server solr is
talking to, and as you already confirmed there are no single quotes in the
string you read back from the zk server you are talking to ... perhaps
there are 2 zk instances setup somewhere and the one solr is using still
has crufty data from before you got the quoting issue straightened out?)


do you see log messages early on in Solr's startup from ZkContainer that 
say...

1359 [main] INFO  org.apache.solr.core.ZkContainer  – Zookeeper 
client=localhost:2181

?
-Hoss
http://www.lucidworks.com/

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
If that's your problem, I bet all you have to do is twiddle on one of the
catenate options, either catenateWords or catenateAll.

Michael Della Bitta
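
To make the effect of the catenate options concrete, here is a simplified simulation of splitting on case change, with and without a catenated token, followed by lowercasing. It is only a sketch with hypothetical names and is not Lucene's WordDelimiterFilter; it ignores positions, digits, and the filter's other options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified simulation (hypothetical, not Lucene's WordDelimiterFilter).
final class CaseSplitSketch {
    private CaseSplitSketch() {}

    /** Splits on lower-to-upper transitions, e.g. "MacBook" -> [Mac, Book]. */
    static List<String> splitOnCaseChange(String term) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < term.length(); i++) {
            if (Character.isLowerCase(term.charAt(i - 1)) && Character.isUpperCase(term.charAt(i))) {
                parts.add(term.substring(start, i));
                start = i;
            }
        }
        parts.add(term.substring(start));
        return parts;
    }

    /** Lowercased word parts, plus the parts joined back together when catenateWords is set. */
    static List<String> analyze(String term, boolean catenateWords) {
        List<String> parts = splitOnCaseChange(term);
        List<String> tokens = new ArrayList<>();
        for (String p : parts) {
            tokens.add(p.toLowerCase(Locale.ROOT));
        }
        if (catenateWords && parts.size() > 1) {
            tokens.add(String.join("", parts).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("MacBook", false)); // [mac, book]
        System.out.println(analyze("MacBook", true));  // [mac, book, macbook]
    }
}
```

In this model, only the second variant emits a token that can match text indexed as the single word "macbook".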



On Tue, Sep 2, 2014 at 1:07 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Thanks for the response.

 I understand the problem a little bit better after investigating more.

 Posting my full field definitions is, I think, going to be confusing, as
 they are long and complicated. I can narrow it down to an isolation case if
 I need to. My indexed field in question is relatively short strings.

 But what it has to do with is the WordDelimiterFilter's default
 splitOnCaseChange=1 and generateWordParts=1, and the effects of such.

 Let's take a less confusing example, query "MacBook", with a
 WordDelimiterFilter followed by something that downcases everything.

 I think what the WDF (followed by case folding) is trying to do is make
 query "MacBook" match both indexed text "mac book" as well as "macbook" --
 either one should be a match. Is my understanding right of what
 WordDelimiterFilter with splitOnCaseChange=1 and generateWordParts=1 is
 intending to do?

 In my actual index, query "MacBook" is matching ONLY "mac book", and not
 "macbook", which is unexpected. I want it to match both. (I realize
 I could make it match only "macbook" by setting splitOnCaseChange=0 and/or
 generateWordParts=0).

 It's possible this is happening as a side effect of other parts of my
 complex field definition, and I really do need to post the whole thing
 and/or isolate it. But I wonder if there are known general problem cases
 that cause this kind of failure, or any known bugs in WordDelimiterFilter
 (in Solr 4.3?) that cause this kind of failure.

 And I wonder if WordDelimiterFilter emitting the token "MacBook" with
 position 2 rather than 1 is expected, irrelevant, or possibly a
 relevant problem.

 Thanks again,

 Jonathan


 On 9/2/14 12:59 PM, Michael Della Bitta wrote:

 Hi Jonathan,

 Little confused by this line:

  And, what I think it's trying to do, is match text indexed as d elalain

 as well as text indexed by delalain.

 In this case, I don't know how WordDelimiterFilter will help, as you're
 likely tokenizing on spaces somewhere, and that input text has a space. I
 could be wrong. It's probably best if you post your field definition from
 your schema.

 Also, is this a free-text field, or something that's more like a short
 string?

 Thanks,


 Michael Della Bitta




 On Tue, Sep 2, 2014 at 12:41 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:

 Hello, I'm running into a case where a query is not returning the results
 I expect, and I'm hoping someone can offer some explanation that might help
 me fine tune things or understand what's up.

 I am running Solr 4.3.

 My filter chain includes a WordDelimiterFilter and, later a filter that
 downcases everything for case-insensitive searching. It includes many other
 things too, but I think these are the pertinent facts.

 For query "dELALAIN", the WordDelimiterFilter splits into:

 text: d
 start: 0
 position: 1

 text: ELALAIN
 start: 1
 position: 2

 text: dELALAIN
 start: 0
 position: 2

 Note the duplication/overlap of the tokens -- one version with "d" and
 "ELALAIN" split into two tokens, and another with just one token.

 Later, all the tokens are lowercased by another filter in the chain.
 (actually an ICU filter which is doing something more complicated than just
 lowercasing, but I think we can consider it lowercasing for the purposes of
 this discussion).

 If I understand right what the WordDelimiterFilter is trying to do here,
 it's probably doing something special because of the lowercase "d" followed
 by an uppercase letter, a special case for that. (I don't get this behavior
 with other mixed case queries not beginning with 'd').

 And, what I think it's trying to do, is match text indexed as "d elalain"
 as well as text indexed by "delalain".

 The problem is, it's not accomplishing that -- it is NOT matching text
 that was indexed as "delalain" (one token).

 I don't entirely understand what the position attribute is for -- but I
 wonder if in this case, the position on "dELALAIN" is really supposed to be
 1, not 2?  Could that be responsible for the bug?  Or is position
 irrelevant in this case?

 If that's not it, then I'm 

Re: HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
OK -- so I think my previous attempts were causing the problem.
Since this is a dev environment (and is still empty), I just went ahead and
wiped out the version-2 directories for the zookeeper nodes, reloaded my
solr collections, then ran that command (zkcli.sh in the solr distro).
That did work.  What is a reliable way to remove a file from Zookeeper?

Now I just get this error when trying to create a collection:
org.apache.solr.client.solrj.SolrServerException:IOException occured when
talking to server at: https://server:8444

This brings up another problem that I have -- if there's an error creating
a collection, if I fix the issue and try to re-create the collection, I get
something like this:

<str name="Operation createcollection caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: collection already exists: testcollection</str>

How do I go about cleaning those up?  The only reliable thing that I've
found is to wipe out the zookeepers and start over.

Thanks Hoss!




-- Chris


On Tue, Sep 2, 2014 at 1:08 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : ./zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json
 : '{"urlScheme":"https"}'
 ...
 : Next I start Tomcat, I get this:
 : 482  [localhost-startStop-1] ERROR org.apache.solr.core.SolrCore  –
 : null:org.noggit.JSONParser$ParseException: JSON Parse Error:
 : char=',position=0 BEFORE=''' AFTER='{"urlScheme":"https"}''

 I can't reproduce the error you are describing when I follow all the
 steps on the SSL doc page (using bash, and the outer single quotes, just
 like you)...


 https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-SolrCloud


 Are you certain that you and your solr nodes are talking to the same
 zookeeper instance?

 (Because according to that error, there is a stray single quote at the
 beginning of the clusterprops.json file in the ZK server solr is
 talking to, and as you already confirmed there are no single quotes in the
 string you read back from the zk server you are talking to ... perhaps
 there are 2 zk instances set up somewhere and the one solr is using still
 has crufty data from before you got the quoting issue straightened out?)


 do you see log messages early on in Solr's startup from ZkContainer that
 say...

 1359 [main] INFO  org.apache.solr.core.ZkContainer  – Zookeeper
 client=localhost:2181

 ?
 -Hoss
 http://www.lucidworks.com/


Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Yes, thanks, I realize I can twiddle those parameters, but it will
probably result in "MacBook" no longer matching "mac book" at all, but
ONLY matching "macbook".


My understanding of the default settings of WordDelimiterFilterFactory is
that they are intended to make "MacBook" match both "mac book" AND "macbook".


I will try to create an isolated reproduction that demonstrates this,
ruling out interference from other filters (or identifying the other
filters), to make my question clearer.
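
To make the expected token stream concrete, here is a toy model of the behavior described in this thread. It is only a sketch, not Lucene's WordDelimiterFilter: it splits on lower-to-upper case changes, lowercases the parts (standing in for the later folding filter), and optionally adds the catenated form at the position of the last part, which is what the analysis output quoted earlier shows for "dELALAIN". The function name word_delimiter is hypothetical.

```python
import re


def word_delimiter(token, catenate_words=True):
    """Toy model: return (text, position) pairs, positions starting at 1."""
    # Insert a break wherever a lowercase letter is followed by an uppercase one
    parts = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()
    tokens = [(part.lower(), pos) for pos, part in enumerate(parts, start=1)]
    if catenate_words and len(parts) > 1:
        # The joined form shares the position of the last part
        tokens.append((token.lower(), len(parts)))
    return tokens


print(word_delimiter("MacBook"))   # [('mac', 1), ('book', 2), ('macbook', 2)]
print(word_delimiter("dELALAIN"))  # [('d', 1), ('elalain', 2), ('delalain', 2)]
print(word_delimiter("macbook"))   # [('macbook', 1)]
```

Under this model the joined token "macbook" exists on the query side, but at position 2 rather than 1, which is the detail asked about in the earlier message.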


Jonathan

On 9/2/14 1:34 PM, Michael Della Bitta wrote:

If that's your problem, I bet all you have to do is twiddle on one of the
catenate options, either catenateWords or catenateAll.


Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
bq: In my actual index, query MacBook is matching ONLY mac book, and
not macbook

I suspect the query-side parameters for your WordDelimiterFilterFactory don't
have catenateWords set.

What do you see when you enter these in both the index and query portions
of the admin/analysis page?

Best,
Erick


On Tue, Sep 2, 2014 at 10:34 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 If that's your problem, I bet all you have to do is twiddle on one of the
 catenate options, either catenateWords or catenateAll.


RE: Solr spellcheck returns more than 1 word for a 1 word spellcheck

2014-09-02 Thread Dyer, James
This is the WordBreakSolrSpellChecker, which is there to correct spelling
errors involving misplaced whitespace (or is it white space ??)  To disable it,
remove this or a similar line from your requestHandler in solrconfig.xml:

<str name="spellcheck.dictionary">wordbreak</str>

Keep in mind, if you want the best of both worlds, you can keep this in place
and use the collation feature, which will try to pick the combination of
spelling corrections that best fixes your user's query. See
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and
following sections.
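
As a rough illustration of why a word-break checker produces multi-word suggestions, the sketch below splits a misspelled word into pieces that all occur in the index. It is a toy model, not the WordBreakSolrSpellChecker algorithm. The term set is an assumption based on the question: "coo", "co" and "o" are mentioned there, and "le" is assumed to be another indexed fragment.

```python
def word_breaks(word, dictionary, max_parts=3):
    """Return every way to split word into 2..max_parts dictionary terms."""
    results = []

    def recurse(rest, parts):
        if not rest:
            if len(parts) > 1:
                results.append(" ".join(parts))
            return
        if len(parts) == max_parts:
            return
        for i in range(1, len(rest) + 1):
            if rest[:i] in dictionary:
                recurse(rest[i:], parts + [rest[:i]])

    recurse(word, [])
    return results


# Hypothetical indexed terms, including truncated fragments such as "coo"
terms = {"cooler", "cable", "coo", "co", "o", "le"}
print(word_breaks("coole", terms))  # ['co o le', 'coo le']
```

Truncated fragments in the source data act like valid dictionary words here, so cleaning them out of the spell field would remove these suggestions even with the word-break dictionary left enabled.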

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Monday, September 01, 2014 6:44 AM
To: Solr user
Subject: Solr spellcheck returns more than 1 word for a 1 word spellcheck

I'm in the process of incorporating Solr spellchecking in our product.
For that, I've created a new field:

<field name="spell" type="spell" indexed="true" stored="true"
       required="false" multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3" />

And in the fieldType definitions:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

Then I feed the names of products into the corresponding core. They can
have a lot of words (examples):

door lock rear left
Door brake, door in front + rear fitting.

However, the names get pretty long, and in the source data, they have been
truncated. This sometimes leaves parts of words at the end:

The water pump can evacuate some coo

I have created a spellcheck component, feeding off the `spell` field
defined earlier. Now for the problem.

Sometimes, when I look up a slightly misspelled word, I get results I do
not expect. Example request:

http://solr.url:8983/solr/en/spell?q=coole

This is (part of) the response:

<str name="word">cooler</str><int name="freq">21</int>
<str name="word">coo le</str><int name="freq">2</int>
<str name="word">cable</str><int name="freq">334</int>
<str name="word">co o le</str><int name="freq">4</int>
[...]

Now, as you can see, the misspelled `coole` should have been `cooler`, and
it's the first suggestion. However, the second and fourth suggestions baffle
me. After a bit of research, I found these to be multiple words clunked
together. As I described above, `coo` was a part of a name that was
truncated. I found `co` the same way, and the source data contains a small
number of `o` characters on their own (product number names).

Now, my question is: Why is Solr suggesting `multiple words` pasted together
for a spellcheck for a single word? Is there a way to prevent Solr from
pasting together word parts to forge suggestions?
 


Re: HTTPS for SolrCloud

2014-09-02 Thread Christopher Gross
Is the solr.ssl.checkPeerName option available in 4.8.1?  I have my Tomcat
starting up with that as a -D option, but I'm getting an exception on
validating the hostname w/ the cert...

-- Chris







Solr 4.1.0 Compatibility with zookeeper 3.4.5

2014-09-02 Thread Shivam Bajpai

Hello,
  I'm using Solr 4.1.0 with ZooKeeper 3.3.6 and need to update to
ZooKeeper 3.4.5. I would like to make sure that Solr 4.1.0 is compatible
with ZooKeeper 3.4.5, and to know whether there are any precautions I should
take before upgrading.


--
Best Regards,
Shivam Bajpai
DevOps Engineer
StackExpress



Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind

On 9/2/14 1:51 PM, Erick Erickson wrote:

bq: In my actual index, query MacBook is matching ONLY mac book, and
not macbook

I suspect your query parameters for WordDelimiterFilterFactory doesn't have
catenate words set.

What do you see when you enter these in both the index and query portions
of the admin/analysis page?


Thanks Erick!

Our WordDelimiterFilterFactory does have catenate words set, in both 
index and query phases (is that right?):


<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>


It's hard to cut and paste the results of the analysis page into email 
(or anywhere!), I'll give you screenshots, sorry -- and I'll give them 
for our whole real world app complex field definition. I'll also paste 
in our entire field definition below. But I realize my next step is 
probably creating a simpler isolation/reproduction case (unless you have 
a magic answer from this!).


Again, the problem is that MacBook seems to be only matching on 
indexed macbook and not indexed mac book.



MacBook query analysis:
https://www.dropbox.com/s/b8y11usjdlc88un/mixedcasequery.png

MacBook index analysis:
https://www.dropbox.com/s/fwae3nz4tdtjhjv/mixedcaseindex.png

mac book index analysis:
https://www.dropbox.com/s/mihd58f6zs3rfu8/twowordindex.png


Our entire actual field definition:

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- the rulefiles thing is to keep ICUTokenizerFactory from
         stripping punctuation,
         so our synonym filter involving C++ etc can still work.
         From:
         https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E
         the rbbi file is in our local ./conf, copied from lucene
         source tree -->
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>

    <filter class="solr.SynonymFilterFactory"
            synonyms="punctuation-whitelist.txt" ignoreCase="true"/>

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

    <!-- folding needs to be after WordDelimiter, so WordDelimiter
         can do its thing with full cases and such -->
    <filter class="solr.ICUFoldingFilterFactory" />

    <!-- ICUFolding already includes lowercasing, no
         need for separate lowercasing step
    <filter class="solr.LowerCaseFilterFactory"/>
    -->

    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>

    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>






Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
What happens if you append debug=query to your query? IOW, what does the
_parsed_ query look like?

Also note that the defaults for WDFF are _not_ identical. catenateWords and
catenateNumbers are 1 in the
index portion and 0 in the query section. Still, this shouldn't be a
problem all other things being equal.
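
For reference, a request with debug=query appended could be built as in the sketch below. The base URL, core name and field name are assumptions for illustration; only the q, debug and wt parameters are standard request parameters. The function just builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query, field=None):
    """Build a /select URL that asks Solr to include the parsed query."""
    params = {
        "q": f"{field}:{query}" if field else query,
        "debug": "query",  # adds the parsed query to the response
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name
print(debug_query_url("http://localhost:8983/solr/collection1", "MacBook"))
# http://localhost:8983/solr/collection1/select?q=MacBook&debug=query&wt=json
```

The parsedquery entry in the debug section of the response shows whether the query became a phrase query or a set of OR'ed terms.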

Best,
Erick



Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Diego Fernandez
Although not a solution, this may help in trying to find the problem.
In http://solr.pl/en/2010/08/16/what-is-schema-xml/ it says:

It is worth noting that there is an additional attribute for the text field 
type:

autoGeneratePhraseQueries

This attribute is responsible for telling filters how to behave when dividing
tokens. Some filters (such as WordDelimiterFilter) can divide tokens into a set
of tokens. Setting the attribute to true (default value) will automatically
generate phrase queries. This means that WordDelimiterFilter will divide the
word “wi-fi” into two tokens “wi” and “fi”. With autoGeneratePhraseQueries set
to true, the query sent to Lucene will look like field:"wi fi", while with it
set to false the Lucene query will look like field:wi OR field:fi. However,
please note that this attribute only behaves well with tokenizers based on
white spaces.

Since phrases are made by looking at the position, it is possible that the
positions set for the other generated tokens have something to do with it.  Have
you tried setting autoGeneratePhraseQueries=false to see if it'll match both?
(I know that might have other unintended behaviors but it might give some
insight into the problem)
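
The position idea can be sketched with a toy phrase matcher. This is not Lucene code, and it rests on an assumption: that with autoGeneratePhraseQueries=true the query tokens mac (position 1), book and macbook (both position 2) become a phrase with alternatives at the second position. The debug=query output would confirm or refute that. The function name phrase_matches is hypothetical.

```python
def phrase_matches(query_positions, doc_tokens):
    """query_positions: one set of alternative terms per consecutive position.
    doc_tokens: (text, position) pairs from index-time analysis."""
    by_pos = {}
    for text, pos in doc_tokens:
        by_pos.setdefault(pos, set()).add(text)
    # A match needs some start position where every query slot finds a term
    return any(
        all(alts & by_pos.get(start + i, set())
            for i, alts in enumerate(query_positions))
        for start in by_pos
    )


query = [{"mac"}, {"book", "macbook"}]  # assumed shape of the parsed query

print(phrase_matches(query, [("mac", 1), ("book", 2)]))  # True:  "mac book"
print(phrase_matches(query, [("macbook", 1)]))           # False: "macbook"
```

If the parsed query really has this shape, a document indexed as the single token "macbook" can never match, because nothing supplies the required "mac" at the preceding position. That would be consistent with the earlier report that only "mac book" matches.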

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics





How can I set shard members?

2014-09-02 Thread Lee Chunki
Hi,

I am trying to test SolrCloud with version 4.1.0
(http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble).

Is there any way to set which servers are members of which shard?

For example:
server1, server2 for shard1
server3, server4 for shard2

When I tested the example, shard membership depended on the order in which
the Solr instances were started, i.e. starting server1 -> server2 -> server3
-> server4 puts server1 and server3 in shard1 and server2 and server4 in
shard2. Of course, from the second start onward the assignment no longer
depends on the start order.

I also tried "-DshardId=shard1", but it did not work.

Thanks,
Chunki.

Re: How can I set shard members?

2014-09-02 Thread Jürgen Wagner (DVT)
Hello,
  have you tried the createNodeSet option of collection/shard creation
and the node option of replica creation in Solr 4.9.0+?
As you're just testing, I would strongly recommend going to the latest
version.

https://cwiki.apache.org/confluence/display/solr/Collections+API

This is useful for reflecting the underlying topology. We use this in
customer scenarios to partition the set of servers into at least two
groups, so that for every shard of a SolrCloud cluster, replica X is
located in server group X (usually with two groups). The two server
groups then correspond to two separate physical ESX clusters, so if one
VM cluster goes down, at least one replica of each shard will still be
available.
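
As a minimal sketch of how the two options could be combined for the four-server layout asked about, the Python below only builds the Collections API request URLs (it sends nothing). The host names, port, collection name "test" and the node names in the `host:port_solr` form are assumptions for illustration, and it presumes a config set is already available in ZooKeeper. Note that createNodeSet only restricts which nodes are used at creation time; it does not by itself decide which of those nodes gets shard1.

```python
from urllib.parse import urlencode

# Hypothetical admin endpoint; any live node of the cluster could be used instead.
SOLR_ADMIN = "http://server1:8983/solr/admin/collections"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return SOLR_ADMIN + "?" + urlencode(query)


# Step 1: create a two-shard collection, restricted to two of the four nodes.
create_url = collections_api_url(
    "CREATE",
    name="test",
    numShards=2,
    replicationFactor=1,
    createNodeSet="server1:8983_solr,server3:8983_solr",
)

# Step 2: add one more replica per shard on an explicitly chosen node.
add_replica_urls = [
    collections_api_url("ADDREPLICA", collection="test",
                        shard="shard1", node="server2:8983_solr"),
    collections_api_url("ADDREPLICA", collection="test",
                        shard="shard2", node="server4:8983_solr"),
]

for url in [create_url] + add_replica_urls:
    print(url)
```

The printed URLs can be requested with a browser or any HTTP client once the node names have been checked against the live nodes the cluster actually reports.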

Cheers,
--Jürgen


On 03.09.2014 06:00, Lee Chunki wrote:




Re: How can I set shard members?

2014-09-02 Thread Erick Erickson
Take a look here:
http://heliosearch.org/solrcloud-assigning-nodes-machines/

If you really, really, really require that shard1 be on server1 and
_not_ server 3 I'm not quite sure how you'd do it. But if you want
your leaders on servers 1 and 3, just use the nodeset. (Jürgen beat me
to it!).

Best
Erick

On Tue, Sep 2, 2014 at 9:00 PM, Lee Chunki lck7...@coupang.com wrote: