Re: External File Field eating memory

2014-07-16 Thread Kamal Kishore Aggarwal
Hi Apoorva,

This was my master server replication configuration:

core/conf/solrconfig.xml

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>


Only configuration files can be replicated. So, when I wrote the
above config, the external file was getting replicated to
core/conf/data/external_eff_views.
But for Solr to read the external file, it looks for it at the
core/data/external_eff_views
location. So firstly the file was not getting replicated to the right
place. Therefore, I did not opt for replicating the eff file.

And the second thing is that whenever there is a change in configuration
files, the core gets reloaded by itself to reflect the changes. I am not
sure if you can disable this reloading.

Finally, I thought of creating files on slaves in a different way.

Thanks
Kamal
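
A minimal sketch of one such approach, assuming the file can be written
straight into each slave core's data directory (paths and host name are
hypothetical; a shared mount, scp or rsync works equally well) and that the
ExternalFileFieldReloader newSearcher listener is configured in
solrconfig.xml so a commit picks the new file up:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class PushExternalFile {
    public static void main(String[] args) throws Exception {
        // Copy the generated file straight into the slave core's data dir.
        Files.copy(Paths.get("/build/external_eff_views"),
                Paths.get("/var/solr/core/data/external_eff_views"),
                StandardCopyOption.REPLACE_EXISTING);

        // Commit so a new searcher opens; with the ExternalFileFieldReloader
        // listener configured, the new file is read without a core reload.
        InputStream in = new URL(
                "http://slave-host:8983/solr/core/update?commit=true").openStream();
        in.close();
    }
}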


On Tue, Jul 15, 2014 at 11:00 AM, Apoorva Gaurav apoorva.gau...@myntra.com
wrote:

 Hey Kamal,
 What all config changes have you done to establish replication of external
 files and how have you disabled core reloading?


 On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal 
 kkroyal@gmail.com wrote:

  Hi All,
 
  It was found that the external file, which was getting replicated every
  10 minutes, was also reloading the core. This was increasing the query
  time.
 
  Thanks
  Kamal Kishore
 
 
 
  On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal 
  kkroyal@gmail.com wrote:
 
   With the above replication configuration, the eff file is getting
   replicated at core/conf/data/external_eff_views (a new dir "data" is
   being created in the conf dir), but it is not getting replicated at
   core/data/external_eff_views
   on the slave.
  
   Please help.
  
  
   On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal 
   kkroyal@gmail.com wrote:
  
   Thanks for your guidance Alexandre Rafalovitch.
  
   I am looking into this seriously.
  
    Another question is that I am facing an error in replication of the eff file.
  
   This is master replication configuration:
  
   core/conf/solrconfig.xml
  
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="confFiles">../data/external_eff_views</str>
      </lst>
    </requestHandler>
  
  
   The eff file is present at core/data/external_eff_views location.
  
  
   On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar 
   shalinman...@gmail.com wrote:
  
   This might be related:
  
   https://issues.apache.org/jira/browse/SOLR-3514
  
  
   On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
   kkroyal@gmail.com wrote:
  
Hi Team,
   
 I have recently implemented EFF in Solr. There are about 1.5 lakh
 (150,000) unsorted values in the external file. After this implementation,
 the server has become slow. The Solr query time has also increased.

 Can anybody confirm whether these issues are because of this
 implementation? Is it memory that the EFF eats up?
   
Regards
Kamal Kishore
   
  
  
  
   --
   Regards,
   Shalin Shekhar Mangar.
  
  
  
  
 



 --
 Thanks & Regards,
 Apoorva



Re: External File Field eating memory

2014-07-16 Thread Apoorva Gaurav
Thanks Kamal.


On Wed, Jul 16, 2014 at 11:43 AM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi Apoorva,

 This was my master server replication configuration:

 core/conf/solrconfig.xml

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">../data/external_eff_views</str>
   </lst>
 </requestHandler>


 Only configuration files can be replicated. So, when I wrote the
 above config, the external file was getting replicated to
 core/conf/data/external_eff_views.
 But for Solr to read the external file, it looks for it at the
 core/data/external_eff_views
 location. So firstly the file was not getting replicated to the right
 place. Therefore, I did not opt for replicating the eff file.

 And the second thing is that whenever there is a change in configuration
 files, the core gets reloaded by itself to reflect the changes. I am not
 sure if you can disable this reloading.

 Finally, I thought of creating files on slaves in a different way.

 Thanks
 Kamal


 On Tue, Jul 15, 2014 at 11:00 AM, Apoorva Gaurav 
 apoorva.gau...@myntra.com
 wrote:

  Hey Kamal,
  What all config changes have you done to establish replication of
 external
  files and how have you disabled core reloading?
 
 
  On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal 
  kkroyal@gmail.com wrote:
 
   Hi All,
  
    It was found that the external file, which was getting replicated
    every 10 minutes, was also reloading the core. This was increasing
    the query time.
  
   Thanks
   Kamal Kishore
  
  
  
   On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal 
   kkroyal@gmail.com wrote:
  
 With the above replication configuration, the eff file is getting
 replicated at core/conf/data/external_eff_views (a new dir "data" is
 being created in the conf dir), but it is not getting replicated at
 core/data/external_eff_views
 on the slave.
   
Please help.
   
   
On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:
   
Thanks for your guidance Alexandre Rafalovitch.
   
I am looking into this seriously.
   
 Another question is that I am facing an error in replication of the eff file.
   
This is master replication configuration:
   
core/conf/solrconfig.xml
   
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">../data/external_eff_views</str>
   </lst>
 </requestHandler>
   
   
The eff file is present at core/data/external_eff_views location.
   
   
On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:
   
This might be related:
   
https://issues.apache.org/jira/browse/SOLR-3514
   
   
On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:
   
 Hi Team,

  I have recently implemented EFF in Solr. There are about 1.5 lakh
  (150,000) unsorted values in the external file. After this
  implementation, the server has become slow. The Solr query time has
  also increased.

  Can anybody confirm whether these issues are because of this
  implementation? Is it memory that the EFF eats up?

 Regards
 Kamal Kishore

   
   
   
--
Regards,
Shalin Shekhar Mangar.
   
   
   
   
  
 
 
 
  --
  Thanks & Regards,
  Apoorva
 




-- 
Thanks & Regards,
Apoorva


weird drastic query latency during performance testing and DIH import delay after performance testing

2014-07-16 Thread YouPeng Yang
Hi
  I built my SolrCloud with Solr 4.6.0 (Java version 1.7.0_45). In my
cloud, I have a collection with 30 shards, and each shard has one replica.
Each core of a shard contains nearly 50 million docs (about 15GB in size),
as does its replica.
  Before applying my cloud in the real world, I am doing a performance test
with JMeter 2.11.
  The scenario of my test is simple: 100 threads sending requests for 20
seconds, and the requests are only sent to a specific core of a specific
shard. The request is similar to the following:
 http://IP:port/solr/tv_201407/select?q=*:*&fq=BEGINTIME:[2014-06-01
00:00:00+TO+*]+AND+(CONTACT:${user})+AND+(TV_STATE:00)&shards=tv_201407&rows=2000&sort=BEGINTIME+desc

  I encountered drastic query latency during the performance testing and a
DIH import delay after the testing. Please help me. I have tested several
times, getting the same problem, and cannot handle it by myself. Any
suggestion will be appreciated.

 The following steps describe what I have done.

Step 1: Before the test, the DIH import job is very fast. As the statistics
[1] show, the DIH import takes only 1s for 10 docs.
[1]---
Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
(Duration: 01s)
Requests: 1 (1/s), Fetched: 10 (10/s), Skipped: 0, Processed: 10 (10/s)
Started: less than a minute ago
---

Step 2: Then I run the test after the caches are cleaned. The summary
statistics are shown in [2]. Although I had cleaned the caches, I never
expected the query latency to become so drastic that it would be
unacceptable in my real application.
  The red font describes the latency of the query performance test on the
core tv_201407 of the shard tv_201407.

  So could you experts give some hints about the drastic query latency?

[2]---
[solr@solr2 test]$ ../bin/jmeter.sh  -n -t solrCoudKala20140401.jmx  -l
logfile_solrCloud_20.jtl
Creating summariser aggregate
Created the tree successfully using solrCoudKala20140401.jmx
Starting the test @ Wed Jul 16 15:59:28 CST 2014 (1405497568104)
Waiting for possible shutdown message on port 4445
aggregate +  1 in   8.1s =0.1/s Avg:  8070 Min:  8070 Max:  8070 Err:
0 (0.00%) Active: 100 Started: 100 Finished: 0
aggregate +103 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434 Err:
0 (0.00%) Active: 97 Started: 100 Finished: 3
aggregate =104 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434 Err:
0 (0.00%)
aggregate + 96 in 7s =   14.5/s Avg:  6160 Min:  5295 Max:  6625 Err:
0 (0.00%) Active: 0 Started: 100 Finished: 100
aggregate =200 in15s =   13.6/s Avg:  7131 Min:  4191 Max:  8434 Err:
0 (0.00%)
Tidying up ...@ Wed Jul 16 15:59:43 CST 2014 (1405497583461)
... end of run
[solr@solr2 test]$
---
Step 3: Continuing, after the test I run the DIH import job again using
the same import expression. However, the performance of the DIH becomes
unacceptable: importing the 10 docs takes 2m 15s [3]!
  Note that Solr fetches the 10 docs quickly; it is the processing that is
slow.

[3]---
*Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
(Duration: 2m 15s)*
Requests: 1 (0/s), Fetched: 10 (0/s), Skipped: 0, Processed: 10 (0/s)
Started: about an hour ago
---

 By the way, JVM GC looks normal, and there are no long full GCs during the
test. The load on my system (RHEL 6.5) is also normal.

Regards


TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Kuehn, Dennis
Hello,

I'd like to sort on a TrieDateField which currently has a precisionStep value 
of 6.
From what I got so far, the precisionStep value only affects range query 
performance and index size.

However, the documentation for TrieDateField says:
'precisionStep=0 enables efficient date sorting and minimizes index size; 
precisionStep=8 (the default) enables efficient range queries.'

Does this mean sorting performance will suffer for precisionStep values other 
than 0?

Cheers,
Dennis


Solr score manager

2014-07-16 Thread Shay Sofer
Hi All,

I need a specific score mechanism.

I would like to sort my results based on a customized scoring field.
Scoring, for example:



1.   If this is a new object - 100

2.   Edited - 80

3.   Recent search - 50

4.   Opened - 40
and some more actions...

And then, when a new search is executed, the results are sorted based on the score field.

Example:
Object 1 : opened  = 40.
Object 2: New = 100
Object 3: edited X 2 + recent search X 1 = 210.

Result:

Object 3
Object 2
Object 1

Any good article for this? Examples?
I'm using Solr with Java.

Thanks in advance,
Shay.
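
Assuming such a field (say my_score, the name Shay himself suggests later in
the thread) is maintained on each document, sorting by it from Java is
straightforward with SolrJ. A minimal sketch; the core URL and query text are
hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CustomScoreSort {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrQuery query = new SolrQuery("laptop");
        // Order by the maintained score field first, text relevance second.
        query.addSort("my_score", SolrQuery.ORDER.desc);
        query.addSort("score", SolrQuery.ORDER.desc);

        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getResults());
    }
}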







Re: Solr score manager

2014-07-16 Thread Alexandre Rafalovitch
How are you storing this information in your documents?

Regards,
Alex
On 16/07/2014 5:03 pm, Shay Sofer sha...@checkpoint.com wrote:

 Hi All,

 I need a specific score mechanism.

 I would like to sort my results based on a customized scoring field.
 Scoring, for example:



 1.   If this is a new object - 100

 2.   Edited - 80

 3.   Recent search - 50

 4.   Opened - 40
 and some more actions...

 And then, when a new search is executed, the results are sorted based on the score field.

 Example:
 Object 1 : opened  = 40.
 Object 2: New = 100
 Object 3: edited X 2 + recent search X 1 = 210.

 Result:

 Object 3
 Object 2
 Object 1

 Any good article for this? Examples?
 I'm using Solr with Java.

 Thanks in advance,
 Shay.








Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Ahmet Arslan
Hi Jia,

What happens when you use 

 <arr name="last-components">

instead of 

 <arr name="components">

Ahmet


On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca wrote:



Hello everyone :)

I have a product called "xbox" indexed, and when the user searches for
either "x-box" or "x box" I want the xbox product to be
returned.  I'm new to Solr, and from reading online, I thought I need
to use WordDelimiterFilterFactory for the "x-box" case, and
WordBreakSolrSpellChecker for the "x box" case. Is this correct?

(1) In my schema file, this is what I changed:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is
"x-box", so I must have missed something...

(2) I tried to use  WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">wc_textSpell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellCheck</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.3</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">3</int>
      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">0.004</float>
    </lst>
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spellCheck</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

  <requestHandler name="/spellcheck"
      class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="df">SpellCheck</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck.build">true</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">false</str>
    </lst>
    <arr name="components">
      <str>wc_spellcheck</str>
    </arr>
  </requestHandler>

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
but the response returned is this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="spellcheck.build">true</str>
      <str name="spellcheck">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary?
Even though my requestHandler's name is "/spellcheck", I wasn't able to
use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
.. is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:
<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">wc_textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spellCheck</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

  <requestHandler name="/spellcheck"
      class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="df">SpellCheck</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">default</str>
      <!-- <str name="spellcheck.dictionary">wordbreak</str> -->
      <str name="spellcheck.build">true</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">false</str>
    </lst>
    <arr name="components">
      <str>wc_spellcheck</str>
    </arr>
  </requestHandler>

And I am still unable to see WordBreakSolrSpellChecker being called anywhere.

Would someone kindly help me?

Many thanks,
Jia


Re: TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Yonik Seeley
On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis
dennis.ku...@brands4friends.de wrote:
 I'd like to sort on a TrieDateField which currently has a precisionStep value 
 of 6.
 From what I got so far, the precisionStep value only affects range query 
 performance and index size.

 However, the documentation for TrieDateField says:
 'precisionStep=0 enables efficient date sorting and minimizes index size; 
 precisionStep=8 (the default) enables efficient range queries.'

 Does this mean sorting performance will suffer for precisionStep values other 
 than 0?

No, sorting speed is unaffected by precisionStep.  That comment looks
slightly misleading.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Slow inserts when using Solr Cloud

2014-07-16 Thread ian
That's useful to know, thanks very much.   I'll look into using
CloudSolrServer, although I'm using solrnet at present.

That would reduce some of the overhead - but not the extra 200ms I'm getting
for forwarding to the replica when the replica is switched on.  

It does seem a very high overhead.  When I consider that it takes 20ms to
insert a new document to Solr with replicas disabled (if I route to the
correct shard), you might expect it to take two to three times longer if it
has to forward to one replica and then wait for a response, but an increase
of 200ms seems really high doesn't it?  Is there a forum where I should
raise that? 

Thanks again for your help
Ian
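
For reference, a minimal SolrJ sketch of the CloudSolrServer approach Shalin
describes below (ZooKeeper addresses, collection name, and fields are
hypothetical); SolrNet users would need the equivalent client-side routing:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudInsert {
    public static void main(String[] args) throws Exception {
        // Connecting through ZooKeeper lets SolrJ read the cluster state and
        // send each document directly to the leader of its target shard.
        CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("mycollection");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title_s", "routed straight to the correct shard leader");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}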


Shalin Shekhar Mangar wrote
 You can use CloudSolrServer (if you're using Java) which will route
 documents correctly to the leader of the appropriate shard.
 
 
  On Tue, Jul 15, 2014 at 3:04 PM, ian <Ian.Williams@.nhs> wrote:
 
 Hi Mark

 Thanks for replying to my post.  Would you know whether my findings are
 consistent with what other people see when using SolrCloud?

 One thing I want to investigate is whether I can route my updates to the
 correct shard in the first place, by having my client using the same
 hashing
 logic as Solr, and working out in advance which shard my inserts should
 be
 sent to.  Do you know whether that's an approach that others have used?

 Thanks again
 Ian




 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.







Fwd: Solr score manager

2014-07-16 Thread Alexandre Rafalovitch
-- Forwarded message --
From: Shay Sofer sha...@checkpoint.com
Date: Wed, Jul 16, 2014 at 6:55 PM

That’s my question :-)

How should I manage this scoring system?

I guess that I need to add a new field (my_score) and update it as I want.
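
A minimal sketch of maintaining such a field with a SolrJ atomic update (the
core URL and document id are hypothetical; all other fields must be stored
for atomic updates to preserve them):

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ScoreFieldUpdate {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/core1");

        // Atomic update: only my_score is changed; the {"set": value} map is
        // what tells Solr this is an update, not a whole-document replace.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "object-3");
        doc.addField("my_score", Collections.singletonMap("set", 210));

        server.add(doc);
        server.commit();
    }
}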



-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, July 16, 2014 1:53 PM
To: solr-user
Subject: Re: Solr score manager

How are you storing this information in your documents?

Regards,
Alex
On 16/07/2014 5:03 pm, Shay Sofer sha...@checkpoint.com wrote:

 Hi All,

 I need a specific score mechanism.

  I would like to sort my results based on a customized scoring field.
  Scoring, for example:



 1.   If this is a new object - 100

 2.   Edited - 80

 3.   Recent search - 50

 4.   Opened - 40
 and some more actions...

  And then, when a new search is executed, the results are sorted based on the score field.

 Example:
 Object 1 : opened  = 40.
 Object 2: New = 100
 Object 3: edited X 2 + recent search X 1 = 210.

 Result:

 Object 3
 Object 2
 Object 1

 Any good article for this? Examples?
 I'm using Solr with Java.

 Thanks in advance,
 Shay.










Re: TrieDateField, precisionStep impact on sorting performance

2014-07-16 Thread Kuehn, Dennis
Thanks for clarifying!

Dennis



On 7/16/14 3:19 PM, Yonik Seeley yo...@heliosearch.com wrote:

On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis
dennis.ku...@brands4friends.de wrote:
 I'd like to sort on a TrieDateField which currently has a precisionStep
value of 6.
 From what I got so far, the precisionStep value only affects range
query performance and index size.

 However, the documentation for TrieDateField says:
 'precisionStep=0 enables efficient date sorting and minimizes index
size; precisionStep=8 (the default) enables efficient range queries.'

 Does this mean sorting performance will suffer for precisionStep values
other than 0?

No, sorting speed is unaffected by precisionStep.  That comment looks
slightly misleading.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data



RE: Solr score manager

2014-07-16 Thread Doug Turnbull
Shay, this presentation I gave at ApacheCon and the DC Solr Exchange might
be useful to you:

http://www.slideshare.net/mobile/o19s/hacking-lucene-for-custom-search-results

Sent from my Windows Phone

From: Shay Sofer
Sent: 7/16/2014 6:03 AM
To: solr-user@lucene.apache.org
Subject: Solr score manager
Hi All,

I need a specific score mechanism.

I would like to sort my results based on a customized scoring field.
Scoring, for example:



1.   If this is a new object - 100

2.   Edited - 80

3.   Recent search - 50

4.   Opened - 40
and some more actions...

And then, when a new search is executed, the results are sorted based on the score field.

Example:
Object 1 : opened  = 40.
Object 2: New = 100
Object 3: edited X 2 + recent search X 1 = 210.

Result:

Object 3
Object 2
Object 1

Any good article for this? Examples?
I'm using Solr with Java.

Thanks in advance,
Shay.


Mixing ordinary and nested documents

2014-07-16 Thread Bjørn Axelsen
Hi Solr users

I would appreciate your input on how to handle a *mix* of *simple* and
*nested* documents in the most easy and flexible way.

I need to handle:

   - simple documents: webpages, short articles etc. (approx. 90% of the
   content)
   - nested documents: books containing chapters etc. (approx. 10% of the
   content)

For simple documents I just want to present straightforward search results
without any grouping etc.

For the nested documents I want to group by book and show the book title,
book price etc. AND the individual results within the book. Let's say there
is a hit on Chapter 1 and Chapter 7 within "Book 1" and a hit on "Article
1"; I would like to present this:

*Book 1 title*
Book 1 published date
Book 1 description
- *Chapter 1 title*
  Chapter 1 snippet
- *Chapter 7 title*
  Chapter 7 snippet

*Article 1 title*
Article 1 published date
Article 1 description
Article 1 snippet

It looks like it is pretty straightforward to use the CollapsingQParser to
collapse the book results into one result and not collapse the other
results. But what about showing the information about the book (the parent
document of the chapters)?

1) Is there a way to do an *optional block join* to a *parent* document and
return it together *with* the *child* document, but not to require a
parent document?

- or -

2) Do I need to require parent-child documents for everything? This is
really not my preferred strategy as only a small part of the documents is
in a real parent-child relationship. This would mean a lot of dummy child
documents.

- or -

3) Should I just denormalize data and include the book information within
each chapter document?

- or -

4) ... or is there a smarter way?

Your help is very much appreciated.

Cheers,

Bjørn Axelsen
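
A minimal sketch of the query side of option 1, assuming hypothetical
doc_type and text fields and that books and chapters are indexed as
parent/child blocks; plain documents have no parent to join to, so they are
queried separately (or OR'ed in):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BookSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/core1");

        // Block join: match on chapter text, return the enclosing book.
        // 'which' must identify all parent (book) documents.
        SolrQuery books =
                new SolrQuery("{!parent which='doc_type:book'}text:whales");

        // Simple documents are queried as-is.
        SolrQuery simple =
                new SolrQuery("text:whales AND doc_type:simple");

        System.out.println(server.query(books).getResults());
        System.out.println(server.query(simple).getResults());
    }
}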


RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Dyer, James
Jia,

I agree that for the spellcheckers to work, you need <arr
name="last-components"> instead of <arr name="components">.

But the "x-box" => "xbox" example ought to be solved by analyzing using
WordDelimiterFilterFactory and catenateWords="1" at query-time.  Did you
re-index after changing your analysis chain (you need to)?  Perhaps you can
show your full analyzer configuration, and someone here can help you find the
problem. Also, the Analysis page on the Solr Admin UI is invaluable for
debugging text-field analyzer problems.

Getting "x box" to analyze to "xbox" is trickier (but possible).  The
WordBreakSpellChecker is probably your best option if you have cases like this
in your data & users' queries.

Of course, if you have a finite number of products that have spelling variants 
like this, SynonymFilterFactory might be all you need.  I would recommend using 
index-time synonyms for your case rather than query-time synonyms.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Wednesday, July 16, 2014 7:42 AM
To: solr-user@lucene.apache.org; j...@ece.ubc.ca
Subject: Re: questions on Solr WordBreakSolrSpellChecker and 
WordDelimiterFilterFactory

Hi Jia,

What happens when you use 

 <arr name="last-components">

instead of 

 <arr name="components">

Ahmet


On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca wrote:



Hello everyone :)

I have a product called "xbox" indexed, and when the user searches for
either "x-box" or "x box" I want the xbox product to be
returned.  I'm new to Solr, and from reading online, I thought I need
to use WordDelimiterFilterFactory for the "x-box" case, and
WordBreakSolrSpellChecker for the "x box" case. Is this correct?

(1) In my schema file, this is what I changed:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is
"x-box", so I must have missed something...

(2) I tried to use  WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">wc_textSpell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellCheck</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.3</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">3</int>
      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">0.004</float>
    </lst>
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spellCheck</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

  <requestHandler name="/spellcheck"
      class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="df">SpellCheck</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck.build">true</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">false</str>
    </lst>
    <arr name="components">
      <str>wc_spellcheck</str>
    </arr>
  </requestHandler>

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
but the response returned is this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="spellcheck.build">true</str>
      <str name="spellcheck">true</str>
    </lst>
  </lst>
  <str name="command">build</str>
  <result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary?
Even though my requestHandler's name is "/spellcheck", I wasn't able to
use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
.. is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:
<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">wc_textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spellCheck</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

  <requestHandler name="/spellcheck"
      class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
       

Strange Scoring Results

2014-07-16 Thread Michael Carlson
Hey All - 

I’m a Solr newbie in need of some help.

I’m using Apache Nutch to crawl a site and populate a Solr core, which we then 
use to query search results. I’ve got it all up and running, but the Solr 
scoring results I get don’t seem to make any sense. Let’s take the following 
query as an example:

content:devlearn 2014 registration information

I have a page with a title of "DevLearn 2014 Conference & Expo - Registration
Information" and a url of
"www.mydomain.com/DevLearn/content/3426/devlearn-2014-conference--expo--registration-information/"
which has multiple instances of all terms in the content field. I would expect
this document to be returned at the top of the list, since in addition to being
in the content field, all terms are in both the title and the url, which I'm
boosting for. Instead, it returns as number 3320 in the results with a score of
0. Meanwhile, 3319 other pages return with higher scores, and all of these have
fewer instances of the terms in the content field, and one or fewer of the
terms in the title or url.

Below is the select requestHandler section from my solrconfig.xml which shows 
the query select defaults. Let me know if I should include more of this file or 
any other information:

<requestHandler name="/select" class="solr.SearchHandler">

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>

    <str name="hl">on</str>
    <str name="hl.fl">content</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre">&lt;strong&gt;</str>
    <str name="hl.simple.post">&lt;/strong&gt;</str>
    <str name="f.content.hl.snippets">1</str>
    <str name="f.content.hl.fragsize">200</str>
    <str name="f.content.hl.alternateField">content</str>
    <str name="f.content.hl.maxAlternateFieldLength">750</str>

    <str name="defType">edismax</str>
    <str name="qf">
      content^0.5 url^10.0 title^10.0
    </str>
    <str name="df">content</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="pf">
      content^0.5 url^10.0 title^10.0
    </str>
    <str name="ps">100</str>

  </lst>
</requestHandler>
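
One way to see why the document scores 0 is to ask Solr for its scoring
explanation. A minimal SolrJ sketch (the core URL is hypothetical):

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ExplainScore {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrQuery query = new SolrQuery("devlearn 2014 registration information");
        query.set("debugQuery", "true");

        QueryResponse rsp = server.query(query);
        // Per-document scoring explanations, keyed by uniqueKey value.
        for (Map.Entry<String, String> e : rsp.getExplainMap().entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}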







Re: Slow inserts when using Solr Cloud

2014-07-16 Thread Timothy Potter
Hi Ian,

What's the CPU doing on the leader? Have you tried attaching a
profiler to the leader while running and then seeing if there are any
hotspots showing. Not sure if this is related but we recently fixed an
issue in the area of leader forwarding to replica that used too many
CPU cycles inefficiently - see SOLR-6136.

Tim

On Wed, Jul 16, 2014 at 7:49 AM, ian ian.willi...@wales.nhs.uk wrote:
 That's useful to know, thanks very much.   I'll look into using
 CloudSolrServer, although I'm using solrnet at present.

 That would reduce some of the overhead - but not the extra 200ms I'm getting
 for forwarding to the replica when the replica is switched on.

 It does seem a very high overhead.  When I consider that it takes 20ms to
 insert a new document to Solr with replicas disabled (if I route to the
 correct shard), you might expect it to take two to three times longer if it
 has to forward to one replica and then wait for a response, but an increase
 of 200ms seems really high doesn't it?  Is there a forum where I should
 raise that?

 Thanks again for your help
 Ian


 Shalin Shekhar Mangar wrote
 You can use CloudSolrServer (if you're using Java) which will route
 documents correctly to the leader of the appropriate shard.


  On Tue, Jul 15, 2014 at 3:04 PM, ian <Ian.Williams@.nhs> wrote:

 Hi Mark

 Thanks for replying to my post.  Would you know whether my findings are
 consistent with what other people see when using SolrCloud?

 One thing I want to investigate is whether I can route my updates to the
 correct shard in the first place, by having my client using the same
 hashing
 logic as Solr, and working out in advance which shard my inserts should
 be
 sent to.  Do you know whether that's an approach that others have used?

 Thanks again
 Ian







 --
 Regards,
 Shalin Shekhar Mangar.







solr-4.9.0 : [OverseerExitThread] but has failed to stop it. This is very likely to create a memory leak

2014-07-16 Thread Vijayakumar Ramdoss
Hi,

When I am starting the SolrCloud (4.9) version on top of Tomcat, it is
throwing the below error message from the Java runtime about a memory
leak.

 

Summary of error message,

[OverseerExitThread] but has failed to stop it. This is very likely to
create a memory leak

 

 

Detailed error message here,

 

16-Jul-2014 15:14:01.044 INFO [Thread-5]
com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentS
tate ComponentState to off

16-Jul-2014 15:14:01.049 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:14:01.049 INFO [Thread-5]
org.apache.catalina.core.StandardService.stopInternal Stopping service
Catalina

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[localhost-startStop-1-SendThread(cpsslrsbx01:2181)] but has failed to stop
it. This is very likely to create a memory leak.

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[localhost-startStop-1-EventThread] but has failed to stop it. This is very
likely to create a memory leak.

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[OverseerExitThread] but has failed to stop it. This is very likely to
create a memory leak.

16-Jul-2014 15:14:01.093 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:14:01.094 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:31:40.834 INFO [main]
com.springsource.tcserver.security.PropertyDecoder.init tc Runtime
property decoder using memory-based key

16-Jul-2014 15:31:41.131 INFO [main]
com.springsource.tcserver.security.PropertyDecoder.init tcServer Runtime
property decoder has been initialized in 301 ms

16-Jul-2014 15:31:43.978 INFO [main] org.apache.coyote.AbstractProtocol.init
Initializing ProtocolHandler [http-bio-8080]

16-Jul-2014 15:31:45.141 INFO [main]
com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentS
tate ComponentState to on

16-Jul-2014 15:31:45.345 INFO [main]
com.springsource.tcserver.serviceability.rmi.JmxSocketListener.init Started
up JMX registry on 127.0.0.1:6969 in 187 ms

16-Jul-2014 15:31:45.370 INFO [main]
org.apache.catalina.core.StandardService.startInternal Starting service
Catalina

16-Jul-2014 15:31:45.370 INFO [main]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet
Engine: VMware vFabric tc Runtime 2.9.2.RELEASE/7.0.39.B.RELEASE

16-Jul-2014 15:31:45.384 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployWAR Deploying web application
archive
/apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/solr-4.
9.0.war

16-Jul-2014 15:31:48.204 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory
/apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/ROOT

16-Jul-2014 15:31:48.349 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[http-bio-8080]



Re: Using hundreds of dynamic fields

2014-07-16 Thread Andy Crossen
Thanks, Jack and Jared, for your input on this.  I'm looking into whether
parent-child relationships via block or query time join will meet my
requirements.

Jack, I noticed in a bunch of other posts around the web that you've
suggested to use dynamic fields in moderation.  Is this suggestion based on
negative performance implications of having to read and rewrite all
previous fields for a document when doing atomic updates?  Or are there
additional inherent negatives to using lots of dynamic fields?

Andy


On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo jared.whi...@umanitoba.ca
wrote:

 This is probably not the best answer, but my gut says that even if you
 changed your document to a simple 2 fields and have one as your metric and
 the other as a TrieDateField you would speed up and simplify your date
 range queries.


 --
 Jared Whiklo



 On 2014-06-27 10:10 AM, Andy Crossen acros...@gmail.com wrote:

 Hi folks,
 
 My application requires tracking a daily performance metric for all
 documents. I start tracking for an 18 month window from the time a doc is
 indexed, so each doc will have ~548 of these fields.  I have in my schema
 a
 dynamic field to capture this requirement:
 
  <dynamicField name="metric_*" type="int" …/>
 
 Example:
 metric_2014_06_24 : 15
 metric_2014_06_25 : 21
 …
 
 My application then issues a query that:
 a) sorts documents by the sum of the metrics within a date range that is
 variable for each query;
 b) gathers stats on the metrics using the Statistics component.
 
 With this design, the app must unfortunately:
 a) construct the sort as a long list of fields within the spec’d date
 range
 to accomplish the sum; e.g. sort=sum(metric_2014_06_24,metric_2014_06_25…)
 desc
 b) specify each field in the range independently to the Stats component;
 e.g. stats.field=metric_2014_06_24stats.field=metric_2014_06_25…
 
 Am I missing a cleaner way to accomplish this given the requirements
 above?
 
 Thanks for any suggestions you may have.
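
A minimal SolrJ sketch of generating those long parameter lists
programmatically, so at least the client code stays clean (field names
follow the metric_* convention above; the date window is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;

public class MetricSumSort {
    public static void main(String[] args) {
        // Dates in the requested window, in the metric_* field-name format.
        String[] days = {"2014_06_24", "2014_06_25", "2014_06_26"};

        SolrQuery query = new SolrQuery("*:*");
        query.set("stats", "true");

        // Build sort=sum(metric_...,metric_...) desc and one stats.field
        // parameter per day.
        StringBuilder sum = new StringBuilder("sum(");
        for (int i = 0; i < days.length; i++) {
            if (i > 0) sum.append(',');
            sum.append("metric_").append(days[i]);
            query.add("stats.field", "metric_" + days[i]);
        }
        sum.append(')');
        query.set("sort", sum + " desc");

        System.out.println(query);  // prints the generated parameter string
    }
}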




clearing fieldValueCache in solr 4.6

2014-07-16 Thread Matthew LeMay
Hello.  We're just starting to use solr in production.  We've indexed 
18,000 documents or so.  We've just implemented faceted search results.  
We mistakenly stored integer ids in what was meant to be a string 
field.  So, our facet results are showing numbers instead of the textual 
values.


After fixing this oversight, reindexing the documents yields the correct 
results, but the faceted results still return the integer ids in 
addition to the enumerated values (the counts with the integer ids are 
zero).


It looks like fieldValueCache is doing this.  Is there any way to empty 
the cache?  I've tried reloading the core through the admin, which 
didn't work, and haven't been able to find (REST-like) API documentation 
on fieldValueCache.  We want to avoid emptying the index, if possible 
(or if that would even work).


thanks!

-Matt LeMay



Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-16 Thread IJ
I know you mentioned you have a single machine at play - but do you have
multiple nodes on the machine that talk to one another?

Does your problem recur when the load on the system is low?

I also faced a similar problem wherein the 5 second delay (described in
detail in my other post) kept happening after a 1.5 minute inactivity
interval. This was explained as Solr keeping the HTTP connection used for
inter-node communication alive for around 1.5 minutes before disconnecting;
if a new request arrives after 1.5 minutes, a new connection is created -
which probably suffers a latency due to a DNS name lookup delay.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4147512.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: clearing fieldValueCache in solr 4.6

2014-07-16 Thread IJ
One thing you could do is:
1. If your current index is called A1, create a new index called A2 with the
correct schema.xml / solrconfig.xml.
2. Index your 18,000 documents into A2 afresh.
3. Then delete A1 (the bad index).
4. Then quickly create an alias with the name A1 pointing to A2 (see the
sketch below) - this way your consumers will still think they are talking
to A1, but in fact they will be querying the new index.
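
A minimal sketch of step 4 using the SolrCloud Collections API over plain
HTTP (the host is hypothetical; this assumes you are running SolrCloud,
since aliases are a Collections API feature):

import java.io.InputStream;
import java.net.URL;

public class CreateAlias {
    public static void main(String[] args) throws Exception {
        // Collections API call: point alias A1 at the rebuilt collection A2.
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CREATEALIAS&name=A1&collections=A2";
        InputStream in = new URL(url).openStream();
        in.close();
    }
}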




--
View this message in context: 
http://lucene.472066.n3.nabble.com/clearing-fieldValueCache-in-solr-4-6-tp4147509p4147514.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strategies for effective prefix queries?

2014-07-16 Thread Hayden Muhl
A copy field does not address my problem, and this has nothing to do with
stored fields. This is a query parsing problem, not an indexing problem.

Here's the use case.

If someone has a username like "bob-smith", I would like it to match the
prefixes "bo" and "sm". I tokenize the username into the tokens "bob"
and "smith". Everything is fine so far.

If someone enters "bo sm" as a search string, I would like "bob-smith" to
be one of the results. The query to do this is straightforward:
username:bo* username:sm*. Here's the problem: in order to construct that
query, I have to tokenize the search string "bo sm" **on the client**. I
don't want to reimplement tokenization on the client. Is there any way to
give Solr the string "bo sm", have Solr do the tokenization, then treat
each token like a prefix?
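
A minimal sketch of one way to do this with SolrJ's FieldAnalysisRequest,
which sends the raw string to Solr's /analysis/field handler and reads back
the tokens the field's own analyzer produces (the core URL is hypothetical,
and the response-walking shown is a best-effort reading of the SolrJ API):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class PrefixQueryBuilder {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/core1");

        // Run "bo sm" through the username field's own analysis chain.
        FieldAnalysisRequest req = new FieldAnalysisRequest("/analysis/field");
        req.addFieldName("username");
        req.setFieldValue("bo sm");

        FieldAnalysisResponse rsp = req.process(server);

        // Keep the tokens emitted by the last stage of the index analyzer.
        List<String> tokens = new ArrayList<String>();
        for (AnalysisPhase phase :
                rsp.getFieldNameAnalysis("username").getIndexPhases()) {
            tokens.clear();
            for (TokenInfo t : phase.getTokens()) {
                tokens.add(t.getText());
            }
        }

        // Turn each final token into a prefix clause: username:bo* username:sm*
        StringBuilder q = new StringBuilder();
        for (String t : tokens) {
            q.append("username:").append(t).append("* ");
        }
        System.out.println(q.toString().trim());
    }
}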


On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 So copyField it to another and apply alternative processing there. Use
 eDismax to search both. No need to store the copied field, just index it.

 Regards,
  Alex
 On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:

  Both fields? There is only one field here: username.
 
 
  On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  
  wrote:
 
   Search against both fields (one split, one not split)? Keep original
   and tokenized form? I am doing something similar with class name
   autocompletes here:
  
  
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
  
   Regards,
  Alex.
   Personal: http://www.outerthoughts.com/ and @arafalov
   Solr resources: http://www.solr-start.com/ and @solrstart
   Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
  
  
   On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
  wrote:
I'm working on using Solr for autocompleting usernames. I'm running
  into
   a
problem with the wildcard queries (e.g. username:al*).
   
    We are tokenizing usernames so that a username like "solr-user" will be
    tokenized into "solr" and "user", and will match both "sol" and "use"
    prefixes. The problem is when we get "solr-u" as a prefix, I'm having
    to split that up on the client side before I construct a query
    username:solr* username:u*. I'm basically using a regex as a poor
    man's tokenizer.
   
Is there a better way to approach this? Is there a way to tell Solr
 to
tokenize a string and use the parts as prefixes?
   
- Hayden
  
 



Re: Using hundreds of dynamic fields

2014-07-16 Thread Jack Krupansky
I guess I'm just a big fan of simpler and cleaner data models! Especially if 
I were to have to look at somebody's data model and try to make sense out of 
it, such as how to keep all the fields straight for constructing queries.


But atomic update and the need to read and rewrite all the fields is a 
concern as well.


-- Jack Krupansky

-Original Message- 
From: Andy Crossen

Sent: Wednesday, July 16, 2014 1:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Using hundreds of dynamic fields

Thanks, Jack and Jared, for your input on this.  I'm looking into whether
parent-child relationships via block or query time join will meet my
requirements.

Jack, I noticed in a bunch of other posts around the web that you've
suggested to use dynamic fields in moderation.  Is this suggestion based on
negative performance implications of having to read and rewrite all
previous fields for a document when doing atomic updates?  Or are there
additional inherent negatives to using lots of dynamic fields?

Andy


On Fri, Jun 27, 2014 at 11:46 AM, Jared Whiklo jared.whi...@umanitoba.ca
wrote:


This is probably not the best answer, but my gut says that even if you
changed your document to a simple 2 fields and have one as your metric and
the other as a TrieDateField you would speed up and simplify your date
range queries.


--
Jared Whiklo



On 2014-06-27 10:10 AM, Andy Crossen acros...@gmail.com wrote:

Hi folks,

My application requires tracking a daily performance metric for all
documents. I start tracking for an 18 month window from the time a doc is
indexed, so each doc will have ~548 of these fields.  I have in my schema
a
dynamic field to capture this requirement:

<dynamicField name="metric_*" type="int" …/>

Example:
metric_2014_06_24 : 15
metric_2014_06_25 : 21
…

My application then issues a query that:
a) sorts documents by the sum of the metrics within a date range that is
variable for each query;
b) gathers stats on the metrics using the Statistics component.

With this design, the app must unfortunately:
a) construct the sort as a long list of fields within the spec’d date
range
to accomplish the sum; e.g. 
sort=sum(metric_2014_06_24,metric_2014_06_25…)

desc
b) specify each field in the range independently to the Stats component;
e.g. stats.field=metric_2014_06_24stats.field=metric_2014_06_25…

Am I missing a cleaner way to accomplish this given the requirements
above?

Thanks for any suggestions you may have.






Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Diego Fernandez
Which tokenizer are you using?  StandardTokenizer will split "x-box" into
"x" and "box", the same as "x box".

If there are not too many of these, you could also use the
PatternReplaceCharFilterFactory to map "x box" and "x-box" to "xbox" before
the tokenizer.

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


- Original Message -
 Jia,
 
 I agree that for the spellcheckers to work, you need <arr
 name="last-components"> instead of <arr name="components">.
 
 But the "x-box" => "xbox" example ought to be solved by analyzing using
 WordDelimiterFilterFactory and catenateWords="1" at query-time.  Did you
 re-index after changing your analysis chain (you need to)?  Perhaps you can
 show your full analyzer configuration, and someone here can help you find
 the problem. Also, the Analysis page on the Solr Admin UI is invaluable for
 debugging text-field analyzer problems.
 
 Getting "x box" to analyze to "xbox" is trickier (but possible).  The
 WordBreakSpellChecker is probably your best option if you have cases like
 this in your data & users' queries.
 
 Of course, if you have a finite number of products that have spelling
 variants like this, SynonymFilterFactory might be all you need.  I would
 recommend using index-time synonyms for your case rather than query-time
 synonyms.
 
 James Dyer
 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
 Sent: Wednesday, July 16, 2014 7:42 AM
 To: solr-user@lucene.apache.org; j...@ece.ubc.ca
 Subject: Re: questions on Solr WordBreakSolrSpellChecker and
 WordDelimiterFilterFactory
 
 Hi Jia,
 
 What happens when you use
 
  arr name=last-components
 
 instead of
 
  arr name=components
 
 Ahmet
 
 
 On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca
 wrote:
 
 
 
 Hello everyone :)
 
 I have a product called "xbox" indexed, and when the user searches for
 either "x-box" or "x box" I want the xbox product to be
 returned.  I'm new to Solr, and from reading online, I thought I need
 to use WordDelimiterFilterFactory for the "x-box" case, and
 WordBreakSolrSpellChecker for the "x box" case. Is this correct?
 
 (1) In my schema file, this is what I changed:
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="1" catenateNumbers="1"
 catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
 
 But I don't see the xbox product returned when the search term is
 "x-box", so I must have missed something...
 
 (2) I tried to use  WordBreakSolrSpellChecker together with
 DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
 never got used:
 
 <searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
     <str name="queryAnalyzerFieldType">wc_textSpell</str>

     <lst name="spellchecker">
       <str name="name">default</str>
       <str name="field">spellCheck</str>
       <str name="classname">solr.DirectSolrSpellChecker</str>
       <str name="distanceMeasure">internal</str>
       <float name="accuracy">0.3</float>
       <int name="maxEdits">2</int>
       <int name="minPrefix">1</int>
       <int name="maxInspections">5</int>
       <int name="minQueryLength">3</int>
       <float name="maxQueryFrequency">0.01</float>
       <float name="thresholdTokenFrequency">0.004</float>
     </lst>
     <lst name="spellchecker">
       <str name="name">wordbreak</str>
       <str name="classname">solr.WordBreakSolrSpellChecker</str>
       <str name="field">spellCheck</str>
       <str name="combineWords">true</str>
       <str name="breakWords">true</str>
       <int name="maxChanges">10</int>
     </lst>
   </searchComponent>

   <requestHandler name="/spellcheck"
       class="org.apache.solr.handler.component.SearchHandler">
     <lst name="defaults">
       <str name="df">SpellCheck</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.dictionary">default</str>
       <str name="spellcheck.dictionary">wordbreak</str>
       <str name="spellcheck.build">true</str>
       <str name="spellcheck.onlyMorePopular">false</str>
       <str name="spellcheck.count">10</str>
       <str name="spellcheck.collate">true</str>
       <str name="spellcheck.collateExtendedResults">false</str>
     </lst>
     <arr name="components">
       <str>wc_spellcheck</str>
     </arr>
   </requestHandler>
 
 I tried to build the dictionary this way:
 http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
 but the response returned is this:
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="spellcheck.build">true</str>
       <str name="spellcheck">true</str>
     </lst>
   </lst>
   <str name="command">build</str>
   <result name="response" numFound="0" start="0"/>
 </response>
 
 What's the correct way to build the dictionary?
 Even though my requestHandler's name is "/spellcheck", I wasn't able to
 use
 http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
 .. is there something wrong with my definition above?
 
 (3) I also tried to use WordBreakSolrSpellChecker without the
 DirectSolrSpellChecker as shown below:
 searchComponent 

Updating Oracle

2014-07-16 Thread Jason Bourne
Hi,

I am new to Solr, so I just want to know if something is possible. I might
need some help coding later on after taking the tutorials.

I am taking over a program that uses HTML and JavaScript to display metadata
from Solr. They now would like to update one field. The Solr db gets
refreshed weekly from an Oracle db. So in order to save the changes, the
Oracle db needs to be updated. But to keep the updated field visible to the
users, I would like to update Solr with the changes and then update Oracle.
Can Solr fire a database trigger on the updated field to update Oracle?
What is this called in Solr, and can you point me to an example?

The other option is to add/modify the code to update Oracle and Solr from
the application, but this would be a lot of work. If this is the only
option, can you point to an example of updating a field in Solr?

What are my options? Thanks.





Shard Replicas not getting replicated data from leader

2014-07-16 Thread Marc Campeau
Hi,

I have set up 4 Solr (4.9.0) nodes as a single shard for a given
collection, meaning I should have 4 replicated nodes. I have 3 ZooKeepers
in an ensemble managing the configs for this collection. I have a load
balancer in front of the 4 nodes to split traffic between them.

I start this collection with an empty data/index directory.

When I send /update requests to the load balancer I see these going to all
4 nodes. Also, I can see that all FOLLOWERs forward the requests they
receive to the LEADER as expected. But for some reason the FOLLOWERs are
not getting /replication requests from the LEADER. So the collection on
the leader contains many thousands of documents and is on the 8th
generation. I see that it's replicable in the admin interface, yet all
FOLLOWER nodes have an empty index.

Hence, I need your insights please.

Thanks,

Marc

To Note:

When I start up my nodes I see the following in solr.log:
1) When ZooKeeper does a clusterstate update, all nodes have their state
DOWN. Why? This means that in the Solr Admin interface they show up as
down. This never updates to active.

2) I have a warning: org.apache.solr.rest.ManagedResource; No registered
observers for /rest/managed, which I need to update solrconfig.xml to fix.

3) I have the following error:
ERROR - 2014-07-16 19:49:25.336; org.apache.solr.cloud.SyncStrategy; No
UpdateLog found - cannot sync

SOLR.LOG
-
[]
INFO  - 2014-07-16 19:47:30.870;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=null message={
  "operation":"state",
  "state":"down",
  "base_url":"http://192.168.150.90:8983/solr",
  "core":"collection_name",
  "roles":null,
  "node_name":"192.168.150.90:8983_solr",
  "shard":null,
  "collection":"collection_name",
  "numShards":null,
  "core_node_name":null}
INFO  - 2014-07-16 19:47:30.871;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node1 is
already registered
[]
WARN  - 2014-07-16 19:47:34.535; org.apache.solr.rest.ManagedResource; No
registered observers for /rest/managed
[]
INFO  - 2014-07-16 19:48:25.135;
org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (2)
INFO  - 2014-07-16 19:48:25.287;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2014-07-16 19:48:25.291;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...
INFO  - 2014-07-16 19:48:25.293;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=null message={
  "operation":"state",
  "state":"down",
  "base_url":"http://192.168.200.90:8983/solr",
  "core":"collection_name",
  "roles":null,
  "node_name":"192.168.200.90:8983_solr",
  "shard":null,
  "collection":"collection_name",
  "numShards":null,
  "core_node_name":null}
INFO  - 2014-07-16 19:48:25.293;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node2 is
already registered
INFO  - 2014-07-16 19:48:25.293;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; shard=shard1 is already
registered
[]
INFO  - 2014-07-16 19:49:00.188;
org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (3)
INFO  - 2014-07-16 19:49:00.322;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2014-07-16 19:49:00.335;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...
INFO  - 2014-07-16 19:49:00.337;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=null message={
  "operation":"state",
  "state":"down",
  "base_url":"http://192.168.200.91:8983/solr",
  "core":"collection_name",
  "roles":null,
  "node_name":"192.168.200.91:8983_solr",
  "shard":null,
  "collection":"collection_name",
  "numShards":null,
  "core_node_name":null}
INFO  - 2014-07-16 19:49:00.337;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node3 is
already registered
INFO  - 2014-07-16 19:49:00.337;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; shard=shard1 is already
registered
[]
INFO  - 2014-07-16 19:49:21.220;
org.apache.solr.common.cloud.ZkStateReader$3; Updating live nodes... (4)
INFO  - 2014-07-16 19:49:21.350;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2014-07-16 19:49:21.357;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...
INFO  - 2014-07-16 19:49:21.359;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=null message={
  "operation":"state",
  "state":"down",
  "base_url":"http://192.168.150.91:8983/solr",
  "core":"collection_name",
  "roles":null,
  "node_name":"192.168.150.91:8983_solr",
  "shard":null,
  "collection":"collection_name",
  "numShards":null,
  "core_node_name":null}
INFO  - 2014-07-16 19:49:21.359;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; node=core_node4 is
already registered
INFO  

Re: Updating Oracle

2014-07-16 Thread Shawn Heisey
On 7/16/2014 1:45 PM, Jason Bourne wrote:
 I am new to Solr so I just want to know if something is possible.  I might
 need some help coding later on after taking the tutorials.

 I am taking over a program that uses HTML and JavaScript to display metadata
 from Solr.  They now would like to update one field.  The Solr db gets
 refreshed weekly from an Oracle db.  So in order to save the changes, the
 Oracle db needs to be updated.  But to keep the updated field visible to the
 users, I would like to update Solr with the changes and then update Oracle.
 Can Solr fire a database trigger on the field update to update Oracle?
 What is this called in Solr, and can you point me to an example?

 The other option is to add/modify the code to update Oracle and Solr from
 the application, but this would be a lot of work.  If this is the only option,
 can you point to an example of updating a field in Solr?

 What are my options?  Thanks.

It would be better to have systems outside of Solr manage this, making the
change in Oracle and Solr at the same time.

If you wanted to manage it from within Solr, you could write a custom
Update Processor that looks at all of the updates that come into Solr,
decides which of them require changes in Oracle, and makes those
changes.  Most likely that would be Java code written against the solr-core
API and condensed down into a jar file.  You would then include that jar in
your classpath and reference the class in an update chain configuration.

https://wiki.apache.org/solr/UpdateRequestProcessor
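For illustration only (the factory class name below is hypothetical, standing
in for whatever custom code you write), registering such a processor in
solrconfig.xml would look roughly like this:

<updateRequestProcessorChain name="oracle-sync">
  <processor class="com.example.OracleSyncUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The /update request handler would then select that chain via its update.chain
parameter.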

Thanks,
Shawn



Upper or Lower Case

2014-07-16 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi,

If I search for 'Transmission Flush' I get good match results, but when I use
'transmission flush' I get a different order of results. I looked at the Name
column in the schema and it has the below config for the field type. Any clue what
is wrong, or are there any config changes needed to get the same results?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="0"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->

  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>


Re: Memory leak for debugQuery?

2014-07-16 Thread Erik Hatcher
Tom -

You could maybe isolate it a little further by using the debug
parameter with values of timing|query|results.
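For example (request strings reconstructed here for illustration, reusing the
query from the original mail):

q=Abraham+Lincoln&fl=id,score&rows=1000&debug=timing
q=Abraham+Lincoln&fl=id,score&rows=1000&debug=results

Each value limits the debug output to just that section, which should help
narrow down which part of the debug machinery is holding on to the memory.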

Erik

On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:

 Hello all,
 
 I'm trying to get relevance scoring information for each of 1,000 docs 
 returned for each of 250 queries. If I run the query (appended below)
 without debugQuery=on, I have no problem with getting all the results with 
 under 4GB of memory use.  If I add the parameter debugQuery=on, memory use 
 goes up continuously and after about 20 queries (with 1,000 results each), 
 memory use reaches about 29.1 GB and the garbage collector gives up:
 
  org.apache.solr.common.SolrException; null:java.lang.RuntimeException: 
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 I've attached a jmap -histo, excerpt below.
 
 Is this a known issue with debugQuery?
 
 Tom
 
 query: 
 
 q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
 
 without debugQuery=on:
 
 q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2
 
 num    #instances    #bytes    Class description
 --
 1:  585,559 10,292,067,456  byte[]
 2:  743,639 18,874,349,592  char[]
 3:  53,821  91,936,328  long[]
 4:  70,430  69,234,400  int[]
 5:  51,348  27,111,744  org.apache.lucene.util.fst.FST$Arc[]
 6:  286,357 20,617,704  org.apache.lucene.util.fst.FST$Arc
 7:  715,364 17,168,736  java.lang.String
 8:  79,561  12,547,792  * ConstMethodKlass
 9:  18,909  11,404,696  short[]
 10: 345,854 11,067,328  java.util.HashMap$Entry
 11: 8,823   10,351,024  * ConstantPoolKlass
 12: 79,561  10,193,328  * MethodKlass
 13: 228,587 9,143,480   org.apache.lucene.document.FieldType
 14: 228,584 9,143,360   org.apache.lucene.document.Field
 15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
 16: 210,342 8,413,680   java.util.TreeMap$Entry
 17: 81,576  8,204,648   java.util.HashMap$Entry[]
 18: 107,921 7,770,312   org.apache.lucene.util.fst.FST$Arc
 19: 13,020  6,874,560   org.apache.lucene.util.fst.FST$Arc[]
 
 debugQuery_jmap.txt



Re: Upper or Lower Case

2014-07-16 Thread Ahmet Arslan
Hi,

You need to put the lowercase filter before the kstem filter.
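For the query analyzer, that would mean something along these lines (a trimmed
sketch of the field type from the original mail, with only the filter order
changed):

<analyzer type="query">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>

With lowercasing applied only after stemming, 'Transmission' and 'transmission'
reach the stemmer in different forms and can come out as different terms, which
is why the two queries rank results differently.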

Ahmet 



On Wednesday, July 16, 2014 11:55 PM, EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:
Hi,

If I search for 'Transmission Flush' I get good match results, but when I use
'transmission flush' I get a different order of results. I looked at the Name
column in the schema and it has the below config for the field type. Any clue what
is wrong, or are there any config changes needed to get the same results?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="0"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->

  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>


Re: Memory leak for debugQuery?

2014-07-16 Thread Tomás Fernández Löbbe
Also, is this trunk? Solr 4.x? Single shard, right?


On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com
wrote:

 Tom -

 You could maybe isolate it a little further by using the debug
 parameter with values of timing|query|results.

 Erik

 On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:

  Hello all,
 
  I'm trying to get relevance scoring information for each of 1,000 docs
 returned for each of 250 queries. If I run the query (appended below)
 without debugQuery=on, I have no problem with getting all the results with
 under 4GB of memory use.  If I add the parameter debugQuery=on, memory use
 goes up continuously and after about 20 queries (with 1,000 results each),
 memory use reaches about 29.1 GB and the garbage collector gives up:
 
   org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
  I've attached a jmap -histo, excerpt below.
 
  Is this a known issue with debugQuery?
 
  Tom
  
  query:
 
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
 
  without debugQuery=on:
 
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2
 
  num    #instances    #bytes    Class description
 
 --
  1:  585,559 10,292,067,456  byte[]
  2:  743,639 18,874,349,592  char[]
  3:  53,821  91,936,328  long[]
  4:  70,430  69,234,400  int[]
  5:  51,348  27,111,744
  org.apache.lucene.util.fst.FST$Arc[]
  6:  286,357 20,617,704
  org.apache.lucene.util.fst.FST$Arc
  7:  715,364 17,168,736  java.lang.String
  8:  79,561  12,547,792  * ConstMethodKlass
  9:  18,909  11,404,696  short[]
  10: 345,854 11,067,328  java.util.HashMap$Entry
  11: 8,823   10,351,024  * ConstantPoolKlass
  12: 79,561  10,193,328  * MethodKlass
  13: 228,587 9,143,480
 org.apache.lucene.document.FieldType
  14: 228,584 9,143,360   org.apache.lucene.document.Field
  15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
  16: 210,342 8,413,680   java.util.TreeMap$Entry
  17: 81,576  8,204,648   java.util.HashMap$Entry[]
  18: 107,921 7,770,312
 org.apache.lucene.util.fst.FST$Arc
  19: 13,020  6,874,560
 org.apache.lucene.util.fst.FST$Arc[]
 
  debugQuery_jmap.txt




Re: Strategies for effective prefix queries?

2014-07-16 Thread Alexandre Rafalovitch
Your first and last emails seem to contradict each other. You said initially
you wanted to search for solr-u and match that. Now you are saying
you want to search for bo sm and match that.

Either way, I do have a very similar scenario working in the project I
sent you a link to. I am breaking on full stops and case changes for
Javadoc names. You can try it live for yourself here:
http://www.solr-start.com/javadoc/solr-lucene/index.html (search for
To Fi to match TokenFilter).
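One way to get that kind of splitting (a sketch only; the schema actually used
is in the project linked above) is a word-delimiter filter that splits on case
changes:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

With that, TokenFilter is indexed as the parts token and filter (plus the
original), so prefix queries like to* and fi* can both match.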

Regards,
Alex
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Thu, Jul 17, 2014 at 1:00 AM, Hayden Muhl haydenm...@gmail.com wrote:
 A copy field does not address my problem, and this has nothing to do with
 stored fields. This is a query parsing problem, not an indexing problem.

 Here's the use case.

 If someone has a username like bob-smith, I would like it to match
 prefixes of bo and sm. I tokenize the username into the tokens bob
 and smith. Everything is fine so far.

 If someone enters bo sm as a search string, I would like bob-smith to
 be one of the results. The query to do this is straightforward,
 username:bo* username:sm*. Here's the problem. In order to construct that
 query, I have to tokenize the search string bo sm **on the client**. I
 don't want to reimplement tokenization on the client. Is there any way to
 give Solr the string bo sm, have Solr do the tokenization, then treat
 each token like a prefix?


 On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 So copyField it to another and apply alternative processing there. Use
 eDismax to search both. No need to store the copied field, just index it.

 Regards,
  Alex
 On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:

  Both fields? There is only one field here: username.
 
 
  On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  
  wrote:
 
   Search against both fields (one split, one not split)? Keep original
   and tokenized form? I am doing something similar with class name
   autocompletes here:
  
  
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
  
   Regards,
  Alex.
   Personal: http://www.outerthoughts.com/ and @arafalov
   Solr resources: http://www.solr-start.com/ and @solrstart
   Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
  
  
   On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
  wrote:
I'm working on using Solr for autocompleting usernames. I'm running
  into
   a
problem with the wildcard queries (e.g. username:al*).
   
We are tokenizing usernames so that a username like solr-user will
 be
tokenized into solr and user, and will match both sol and use
prefixes. The problem is when we get solr-u as a prefix, I'm having
  to
split that up on the client side before I construct a query
   username:solr*
username:u*. I'm basically using a regex as a poor man's tokenizer.
   
Is there a better way to approach this? Is there a way to tell Solr
 to
tokenize a string and use the parts as prefixes?
   
- Hayden
  
 



Re: Strategies for effective prefix queries?

2014-07-16 Thread Jorge Luis Betancourt Gonzalez
Perhaps what you’re trying to do could be addressed by using the
EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar
approach; this is an extract of the configuration I’m using:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

Basically this allows you to get partial matches from any part of the string.
Let’s say the field gets this content at index time: “A brown fox”; this
document will be matched by the query (“bro”), for instance. My personal
recommendation is to use this in a separate field that gets populated through
a copyField; this way you can apply different boosts.
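A sketch of how that might look in schema.xml (the field and type names here
are made up for the example; text_prefix stands for a field type built around
the analyzer chain above):

<field name="username" type="text_general" indexed="true" stored="true"/>
<field name="username_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="username" dest="username_prefix"/>

The query-time analyzer of such a type would typically omit the
EdgeNGramFilterFactory, so that query terms match the indexed grams directly.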

Greetings,

On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote:

 A copy field does not address my problem, and this has nothing to do with
 stored fields. This is a query parsing problem, not an indexing problem.
 
 Here's the use case.
 
 If someone has a username like bob-smith, I would like it to match
 prefixes of bo and sm. I tokenize the username into the tokens bob
 and smith. Everything is fine so far.
 
 If someone enters bo sm as a search string, I would like bob-smith to
 be one of the results. The query to do this is straightforward,
 username:bo* username:sm*. Here's the problem. In order to construct that
 query, I have to tokenize the search string bo sm **on the client**. I
 don't want to reimplement tokenization on the client. Is there any way to
 give Solr the string bo sm, have Solr do the tokenization, then treat
 each token like a prefix?
 
 
 On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
 So copyField it to another and apply alternative processing there. Use
 eDismax to search both. No need to store the copied field, just index it.
 
 Regards,
 Alex
 On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:
 
 Both fields? There is only one field here: username.
 
 
 On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
 
 wrote:
 
 Search against both fields (one split, one not split)? Keep original
 and tokenized form? I am doing something similar with class name
 autocompletes here:
 
 
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
 
 Regards,
   Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
 wrote:
 I'm working on using Solr for autocompleting usernames. I'm running
 into
 a
 problem with the wildcard queries (e.g. username:al*).
 
 We are tokenizing usernames so that a username like solr-user will
 be
 tokenized into solr and user, and will match both sol and use
 prefixes. The problem is when we get solr-u as a prefix, I'm having
 to
 split that up on the client side before I construct a query
 username:solr*
 username:u*. I'm basically using a regex as a poor man's tokenizer.
 
 Is there a better way to approach this? Is there a way to tell Solr
 to
 tokenize a string and use the parts as prefixes?
 
 - Hayden
 
 
 



Re: problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)

2014-07-16 Thread Nathan Neulinger
FYI. We finally tracked down the problem - at least 99.9% sure at this point -
and it was staring me in the face the whole time; I just never noticed:


[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"},"channel":
{"add": "adam"}}]

Look at the JSON... It's trying to add two channel array elements... Should 
have been:

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"}},
 {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]

I half wonder how it chose to interpret that particular chunk of JSON, but
either way, I think the origin of our issue is resolved.



From what I'm reading on JSON, this isn't valid syntax at all. I'm guessing
that Solr doesn't actually validate the JSON, and its parser is just creating
something weird in that situation, like a new request for a whole new document.


-- Nathan


On 07/15/2014 07:19 PM, Nathan Neulinger wrote:

Issue was closed in Jira requesting it be discussed here first. Looking for any 
diagnostic assistance on this issue with
4.8.0 since it is intermittent and occurs without warning.

Setup is two nodes, with external zk ensemble. Nodes are accessed round-robin 
on EC2 behind an ELB.

Schema has:

<schema name="hive" version="1.5">
...
<field name="timestamp" type="long" indexed="false" stored="true" required="true"
       multiValued="false" omitNorms="true"/>
...


Most of the updates are working without issue, but randomly we'll get the above
failure, even though searches before and after the update clearly indicate that
the document had the timestamp field in it. The error occurs when the second
node does its distrib operation against the first node.

Diagnostic details are all in the jira issue. Can provide more as needed, but 
would appreciate any suggestions on what
to try or to help diagnose this other than just trying to throw thousands of 
requests at it in round-robin between the
two instances to see if it's possible to reproduce the issue.

-- Nathan


Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


--

Nathan Neulinger   nn...@neulinger.org
Neulinger Consulting   (573) 612-1412


Inconsistent results on Solr Cloud 4.8

2014-07-16 Thread Cool Techi
Hi,
We are using Solr Cloud with Solr version 4.8; we have 2 shard / 2 replica
servers in Solr Cloud. Across two consecutive requests to the Solr Cloud, the
total result count varies.
1) As per my understanding, this can happen when the leader and the replica
have inconsistent numbers of documents.
2) This inconsistent number of docs between leader and replica can happen only
when the replica is recovering. Should a request be sent to a node which is
recovering?
Since this is happening on our live setup, we tend to question how much we can
rely on Solr. What could be causing this, and what's the fix?
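One way to check (1) directly is to query each replica core with
distrib=false and compare numFound; the host and core names below are
placeholders, not our actual setup:

http://host1:8983/solr/collection1_shard1_replica1/select?q=*:*&distrib=false&rows=0
http://host2:8983/solr/collection1_shard1_replica2/select?q=*:*&distrib=false&rows=0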
Regards   

Script Transformer Help

2014-07-16 Thread pavan patharde
Hi All,

I have data-config.xml as below (the real script transformer logic is omitted and simplified here):
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>

  <script><![CDATA[
    function f1(row) {
      row.put('message', 'Hello World!');
      return row;
    }
  ]]></script>

  <document name="products">
    <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
      <field column="NAME" name="id"/>
      <field column="BSIN" name="bsin"/>

      <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
        <field name="brand" column="BRAND"/>
        <field name="cname" column="namedesc"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I am able to access NAME and BSIN in the function f1, but I am not able to
access brand and cname. Is there any way I can access brand and cname
from the child entity in the script transformer?
Thanks in advance.

Regards,
Pavan .P.Patharde


Re: Script Transformer Help

2014-07-16 Thread Alexandre Rafalovitch
Have you tried putting the transformer on the inner entity definition?
It's like a nested loop, and right now you have only put it on the outer loop.
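In other words, roughly this (the same child entity from your config, with the
transformer attribute added to it):

<entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'"
        transformer="script:f1">
  <field name="brand" column="BRAND"/>
  <field name="cname" column="namedesc"/>
</entity>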

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Thu, Jul 17, 2014 at 11:29 AM, pavan patharde
pathardepa...@gmail.com wrote:
 Hi All,

 I have data-config.xml as below (the real script transformer logic is omitted and simplified here):
 <dataConfig>
   <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>

   <script><![CDATA[
     function f1(row) {
       row.put('message', 'Hello World!');
       return row;
     }
   ]]></script>

   <document name="products">
     <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
       <field column="NAME" name="id"/>
       <field column="BSIN" name="bsin"/>

       <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
         <field name="brand" column="BRAND"/>
         <field name="cname" column="namedesc"/>
       </entity>
     </entity>
   </document>
 </dataConfig>

 I am able to access NAME and BSIN in the function f1, but I am not able to
 access brand and cname. Is there any way I can access brand and cname
 from the child entity in the script transformer?
 Thanks in advance.

 Regards,
 Pavan .P.Patharde


Re: problem with replication/solrcloud - getting 'missing required field' during update intermittently (SOLR-6251)

2014-07-16 Thread Shalin Shekhar Mangar
Phew, thanks for tracking it down.


On Thu, Jul 17, 2014 at 7:50 AM, Nathan Neulinger nn...@neulinger.org
wrote:

 FYI. We finally tracked down the problem - at least 99.9% sure at this
 point - and it was staring me in the face the whole time; I just never
 noticed:

 [{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add":
 "preet"},"channel": {"add": "adam"}}]

 Look at the JSON... It's trying to add two channel array elements...
 Should have been:

 [{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add":
 "preet"}},
  {"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "adam"}}]

 I half wonder how it chose to interpret that particular chunk of json, but
 either way, I think the origin of our issue is resolved.


 From what I'm reading on JSON, this isn't valid syntax at all. I'm
 guessing that Solr doesn't actually validate the JSON, and its parser is
 just creating something weird in that situation, like a new request for a
 whole new document.

 -- Nathan



 On 07/15/2014 07:19 PM, Nathan Neulinger wrote:

 Issue was closed in Jira requesting it be discussed here first. Looking
 for any diagnostic assistance on this issue with
 4.8.0 since it is intermittent and occurs without warning.

 Setup is two nodes, with external zk ensemble. Nodes are accessed
 round-robin on EC2 behind an ELB.

 Schema has:

 <schema name="hive" version="1.5">
 ...
 <field name="timestamp" type="long" indexed="false" stored="true"
        required="true" multiValued="false" omitNorms="true"/>
 ...


 Most of the updates are working without issue, but randomly we'll get the
 above failure, even though searches before and
 after the update clearly indicate that the document had the timestamp
 field in it. The error occurs when the second node
 does its distrib operation against the first node.

 Diagnostic details are all in the jira issue. Can provide more as needed,
 but would appreciate any suggestions on what
 to try or to help diagnose this other than just trying to throw thousands
 of requests at it in round-robin between the
 two instances to see if it's possible to reproduce the issue.

 -- Nathan

 
 Nathan Neulinger   nn...@neulinger.org
 Neulinger Consulting   (573) 612-1412


 --
 
 Nathan Neulinger   nn...@neulinger.org
 Neulinger Consulting   (573) 612-1412




-- 
Regards,
Shalin Shekhar Mangar.


Re: Script Transformer Help

2014-07-16 Thread pavan patharde
That's a good idea, Alexandre. I will try it and update with the results. Thanks.

Pavan .P.Patharde
Phone:9844626450


On Thu, Jul 17, 2014 at 10:08 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Have you tried putting the transformer on the inner entity definition?
 It's like a nested loop and you just put it in the outer loop.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Thu, Jul 17, 2014 at 11:29 AM, pavan patharde
 pathardepa...@gmail.com wrote:
  Hi All,
 
  I have data-config.xml as below (the real script transformer logic is omitted and simplified here):
  <dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa"/>

    <script><![CDATA[
      function f1(row) {
        row.put('message', 'Hello World!');
        return row;
      }
    ]]></script>

    <document name="products">
      <entity name="item" query="select NAME,BSIN from items" transformer="script:f1">
        <field column="NAME" name="id"/>
        <field column="BSIN" name="bsin"/>

        <entity name="brands" query="select brandname from brand where bsin='${item.BSIN}'">
          <field name="brand" column="BRAND"/>
          <field name="cname" column="namedesc"/>
        </entity>
      </entity>
    </document>
  </dataConfig>
 
  I am able to access NAME and BSIN in the function f1, but I am not able to
  access brand and cname. Is there any way I can access brand and cname
  from the child entity in the script transformer?
  Thanks in advance.
 
  Regards,
  Pavan .P.Patharde



Re: Strategies for effective prefix queries?

2014-07-16 Thread Hayden Muhl
Thank you Jorge. I didn't know about that filter. It's just what I was
looking for.

- Hayden


On Wed, Jul 16, 2014 at 4:35 PM, Jorge Luis Betancourt Gonzalez 
jlbetanco...@uci.cu wrote:

 Perhaps what you’re trying to do could be addressed by using the
 EdgeNGramFilterFactory filter? For query suggestions I’m using a very
 similar approach, this is an extract of the configuration I’m using:

 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
         catenateAll="0" splitOnCaseChange="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

 Basically this allows you to get partial matches from any part of the
 string. Let’s say the field gets this content at index time: “A brown
 fox”; this document will be matched by the query (“bro”), for instance. My
 personal recommendation is to use this in a separate field that gets
 populated through a copyField; this way you can apply different boosts.

 Greetings,

 On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote:

  A copy field does not address my problem, and this has nothing to do with
  stored fields. This is a query parsing problem, not an indexing problem.
 
  Here's the use case.
 
  If someone has a username like bob-smith, I would like it to match
  prefixes of bo and sm. I tokenize the username into the tokens bob
  and smith. Everything is fine so far.
 
  If someone enters bo sm as a search string, I would like bob-smith to
  be one of the results. The query to do this is straightforward,
  username:bo* username:sm*. Here's the problem. In order to construct
 that
  query, I have to tokenize the search string bo sm **on the client**. I
  don't want to reimplement tokenization on the client. Is there any way to
  give Solr the string bo sm, have Solr do the tokenization, then treat
  each token like a prefix?
 
 
  On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  wrote:
 
  So copyField it to another and apply alternative processing there. Use
  eDismax to search both. No need to store the copied field, just index
 it.
 
  Regards,
  Alex
  On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:
 
  Both fields? There is only one field here: username.
 
 
  On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
 
  wrote:
 
  Search against both fields (one split, one not split)? Keep original
  and tokenized form? I am doing something similar with class name
  autocompletes here:
 
 
 
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
 
  Regards,
Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources: http://www.solr-start.com/ and @solrstart
  Solr popularizers community:
  https://www.linkedin.com/groups?gid=6713853
 
 
  On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
  wrote:
  I'm working on using Solr for autocompleting usernames. I'm running
  into
  a
  problem with the wildcard queries (e.g. username:al*).
 
  We are tokenizing usernames so that a username like solr-user will
  be
  tokenized into solr and user, and will match both sol and use
  prefixes. The problem is when we get solr-u as a prefix, I'm having
  to
  split that up on the client side before I construct a query
  username:solr*
  username:u*. I'm basically using a regex as a poor man's tokenizer.
 
  Is there a better way to approach this? Is there a way to tell Solr
  to
  tokenize a string and use the parts as prefixes?
 
  - Hayden
 
 
 
