CSV entry as multiple documents

2015-02-17 Thread Henrique Oliveira
Hi all,

I was wondering if there is a way to tell Solr to treat a CSV entry as multiple 
documents instead of one document. For instance, suppose that a CSV file has 4 
fields and a single entry:
t1,v1,v2,v3
2015-01-01T01:00:59Z,0.3,0.5,0.7

I want Solr to update its index as if these were 3 different documents:
t1,v
2015-01-01T01:00:59Z,0.3
2015-01-01T01:00:59Z,0.5
2015-01-01T01:00:59Z,0.7

Is that possible, or do I have to create a different CSV for it?

Many thanks,
Henrique.

Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
What's the field definition for your title field? Is it just string
or are you doing some tokenizing?

It should be a string or a single token cleaned up (e.g. lower-cased)
using KeywordTokenizer. In the example schema, you will normally see
the original field tokenized and the sort field separately with
copyField connection. In latest Solr, docValues are also recommended
for sort fields.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 19:52, Simon Cheng simonwhch...@gmail.com wrote:
 I don't know whether it is my setup or any other reasons. But the fact is
 that a very simple sort is not working in my Solr 4.7 environment.

 The query is very simple :
 http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

 And the output is NOT sorted according to title :


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Mike Drob
The SVN source is under tags, not branches.

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

On Tue, Feb 17, 2015 at 4:39 PM, O. Olson olson_...@yahoo.it wrote:

 Thank you Hrishikesh. Funny how GitHub is not mentioned  on
 http://lucene.apache.org/solr/resources.html

 I think common-build.xml is what I was looking for. Thank you



 Hrishikesh Gadre-3 wrote
  Also the version number is encoded (at least) in the build file
 
 
 https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
 
  Hope this helps.
 
  Thanks
  Hrishikesh


 Hrishikesh Gadre-3 wrote
  Hi,
 
  You can get the released code base here
 
  https://github.com/apache/lucene-solr/releases
 
  Thanks
  Hrishikesh





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187048.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: CSV entry as multiple documents

2015-02-17 Thread Anshum Gupta
Hi Henrique,

Solr supports posting a csv with multiple rows. Have a look at the
documentation in the ref. guide here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates



On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira hensan...@gmail.com
wrote:

 Hi all,

 I was wondering if there is a way to tell Solr to treat a CSV entry as
 multiple documents instead of one document. For instance, suppose that a
 CSV file has 4 fields and a single entry:
 t1,v1,v2,v3
 2015-01-01T01:00:59Z,0.3,0.5,0.7

 I want Solr to update its index like it were 3 different documents:
 t1,v
 2015-01-01T01:00:59Z,0.3
 2015-01-01T01:00:59Z,0.5
 2015-01-01T01:00:59Z,0.7

 Is that possible, or do I have to create a different CSV for it?

 Many thanks,
 Henrique.




-- 
Anshum Gupta
http://about.me/anshumgupta


Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Shu-Wai Chow
 if your goal is that *every* doc will get a last_modified, regardless of 
 how it is indexed, then you don't need to set the update.chain default 
 on every requestHandler -- instead just mark your 
 updateRequestProcessorChain as the default...
 
   <updateRequestProcessorChain name="last_modified" default="true">
     <processor class="solr.TimestampUpdateProcessorFactory">
       <str name="fieldName">last_modified</str>
     </processor>
…

Thanks for this.  There was some confusion between me and my coworker on which 
requestHandler to set it, but setting it as a default should solve the problem. 
 Unfortunately, I’m still not getting it back.  I’m now wondering if it’s the 
schema that I’m screwing up or how I’m sending the index command.


Schema.xml:

 : <field name="last_modified" type="date" indexed="true" stored="true" />
 : <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

And the update command:

 : curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F sc=@1234.txt

--


 On Feb 17, 2015, at 10:26 AM, Chris Hostetter hossman_luc...@fucit.org 
 wrote:
 
 : Hi,
 : 
 : You are using /update when registering, but using /update/extract when 
 invoking.
 : 
 : Ahmet
 
 if your goal is that *every* doc will get a last_modified, regardless of 
 how it is indexed, then you don't need to set the update.chain default 
 on every requestHandler -- instead just mark your 
 updateRequestProcessorChain as the default...
 
   <updateRequestProcessorChain name="last_modified" default="true">
     <processor class="solr.TimestampUpdateProcessorFactory">
       <str name="fieldName">last_modified</str>
     </processor>
...
 
 : 
 : On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow 
 sc...@alumni.rutgers.edu wrote:
 : Hi, all.  I’m trying to insert a field into Solr called last_modified, 
 which holds a timestamp of the update. Since this is a cloud setup, I'm using 
 the TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.
 : 
 : solrconfig.xml:
 : 
 : <requestHandler name="/update" class="solr.UpdateRequestHandler">
 :   <lst name="defaults">
 :     <str name="update.chain">last_modified</str>
 :   </lst>
 : </requestHandler>
 : 
 : <updateRequestProcessorChain name="last_modified">
 :   <processor class="solr.TimestampUpdateProcessorFactory">
 :     <str name="fieldName">last_modified</str>
 :   </processor>
 :   <processor class="solr.LogUpdateProcessorFactory" />
 :   <processor class="solr.RunUpdateProcessorFactory" />
 : </updateRequestProcessorChain>
 : 
 : 
 : In schema.xml, I have:
 : 
 : <field name="last_modified" type="date" indexed="true" stored="true" />
 : <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
 : This is the command I'm using to index:
 : 
 : curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F sc=@1234.txt
 : However, after indexing, the last_modified field is still not showing up on 
 queries. Is there something else I should be doing?  Thanks.
 : 
 
 -Hoss
 http://www.lucidworks.com/
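Pulling Hoss's fix together with the original config gives roughly the following minimal solrconfig.xml sketch (Solr 4.x syntax; an illustration, not a tested config). With default="true" on the chain, every update handler, including /update/extract, runs it without any update.chain parameter:

```xml
<!-- Runs for every update request because default="true";
     no per-requestHandler update.chain setting is needed. -->
<updateRequestProcessorChain name="last_modified" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_modified</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

This also covers Ahmet's observation: the chain was registered only on /update, while the documents were being posted to /update/extract, so the chain never ran.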



Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Shawn Heisey
On 2/17/2015 3:20 PM, O. Olson wrote:
 At this time the latest released version of Solr is 4.10.3. Is there anyway
 we can get the source code for this release version?

 I tried to checkout the Solr code from
 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the
 commit log, I see a number of revisions but nothing mention which is the
 release version. The latest revision being 1657441 on Feb 4. Does this
 correspond to 4.10.3? If no, then how do I go about getting the source code
 of 4.10.3.

That is the current development branch for 4.10.x.  There are some
changes in that branch that are not in any released version yet.  If a
4.10.4 is ever released, it will come from that branch.  There is no
guarantee that a 4.10.4 will ever be released.

It is likely that the 5.0.0 release will be announced in the next few
days.  A problem could still be found, but the current release candidate
is looking good so far.

 I'm also curious where the version number is embedded i.e. is it in a file
 somewhere?

Yes.  You can find it in lucene/version.properties in a typical checkout.

 I want to ensure I am using the released version, and not some bug fixes
 after the version got released. 

For that exact version, you want to use this URL for your svn checkout:

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

I don't see lucene/version.properties in that tag, but the 4.10.3
version does show up in lucene/common-build.xml.

Thanks,
Shawn



Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Mike. This is what I was looking for. I apparently did not
understand what tags were.


Mike Drob wrote
 The SVN source is under tags, not branches.
 
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CSV entry as multiple documents

2015-02-17 Thread Henrique Oliveira
Yes, Alexandre is right about my question. To make it clear, a CSV that looks 
like:
t1,v1,v2,v3
2015-01-01T01:59:00Z,0.3,0.5,0.7
2015-01-01T02:00:00Z,0.4,0.5,0.8

would be the same as indexing
t1,v
2015-01-01T01:59:00Z,0.3
2015-01-01T01:59:00Z,0.5
2015-01-01T01:59:00Z,0.7
2015-01-01T02:00:00Z,0.4
2015-01-01T02:00:00Z,0.5
2015-01-01T02:00:00Z,0.8

I don’t know if multiValued field would do the trick. Do you have more info on 
that split command?

Henrique

 On Feb 17, 2015, at 7:57 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 I think the question asked was a bit different. It was about having
 one row/document split into multiple with some fields replicated and
 some mapped.
 
 JSON (single-document format) has a split command which might be
 similar to what's being asked. CSV has a split command as well, but I
 think it is more about creating a multiValued field.
 
 Or did I miss a different parameter?
 
 Regards,
   Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
 On 17 February 2015 at 19:41, Anshum Gupta ans...@anshumgupta.net wrote:
 Hi Henrique,
 
 Solr supports posting a csv with multiple rows. Have a look at the
 documentation in the ref. guide here:
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates
 
 
 
 On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira hensan...@gmail.com
 wrote:
 
 Hi all,
 
 I was wondering if there is a way to tell Solr to treat a CSV entry as
 multiple documents instead of one document. For instance, suppose that a
 CSV file has 4 fields and a single entry:
 t1,v1,v2,v3
 2015-01-01T01:00:59Z,0.3,0.5,0.7
 
 I want Solr to update its index like it were 3 different documents:
 t1,v
 2015-01-01T01:00:59Z,0.3
 2015-01-01T01:00:59Z,0.5
 2015-01-01T01:00:59Z,0.7
 
 Is that possible, or do I have to create a different CSV for it?
 
 Many thanks,
 Henrique.
 
 
 
 
 --
 Anshum Gupta
 http://about.me/anshumgupta



Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Hrishikesh. Funny how GitHub is not mentioned  on
http://lucene.apache.org/solr/resources.html  

I think common-build.xml is what I was looking for. Thank you



Hrishikesh Gadre-3 wrote
 Also the version number is encoded (at least) in the build file
 
 https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
 
 Hope this helps.
 
 Thanks
 Hrishikesh


Hrishikesh Gadre-3 wrote
 Hi,
 
 You can get the released code base here
 
 https://github.com/apache/lucene-solr/releases
 
 Thanks
 Hrishikesh





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
Thank you Shawn. I have not updated my version in a while, so I prefer to do
it to 4.10 first, rather than go directly to 5.0. I'd be working on it
towards the end of this week.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041p4187055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solrcloud sizing

2015-02-17 Thread Dominique Bejean
One of our customers needs to index 15 billion documents in a collection.
As this volume is not usual for me, I need some advice about solrcloud
sizing (how many servers, nodes, shards, replicas, how much memory, ...)

Some inputs :

   - Collection size : 15 billion documents
   - Collection updates : 8 million new documents / day + 8 million
   deleted documents / day
   - Updates occur during the night without queries
   - Queries occur during the day without updates
   - Document size is nearly 300 bytes
   - Document fields are mainly strings, including one date field
   - The same terms will occur several times for a given field (from 10 to
   100,000)
   - Queries will use a date period and a filter query on one or more fields
   - 10,000 queries / minute
   - expected response time < 500ms
   - 1 billion documents indexed = 5 GB index size
   - no SSD drives

So, what is your advice about :

# of shards : 15 billion documents -> 16 shards ?
# of replicas ?
# of nodes = # of shards ?
heap memory per node ?
direct memory per node ?

Thanks in advance for your advice.

Dominique
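A quick back-of-the-envelope pass over the numbers in the post (a sketch only; the 5 GB per billion documents figure is the poster's own estimate, and real sizing needs load testing):

```python
# Figures taken from the post above
total_docs = 15_000_000_000          # collection size
gb_per_billion_docs = 5              # stated: 1 billion docs ~= 5 GB of index
proposed_shards = 16

total_index_gb = total_docs / 1_000_000_000 * gb_per_billion_docs
per_shard_gb = total_index_gb / proposed_shards

# Hard constraint: a single Lucene index holds at most 2^31 - 1 documents,
# so the document count alone forces a minimum shard count.
lucene_max_docs = 2**31 - 1
min_shards = -(-total_docs // lucene_max_docs)   # ceiling division

queries_per_second = 10_000 / 60     # 10,000 queries / minute

print(total_index_gb)     # total index size in GB
print(per_shard_gb)       # per-shard index size at 16 shards
print(min_shards)         # minimum shards imposed by the Lucene doc limit
print(queries_per_second)
```

So the index itself is small (about 75 GB total, under 5 GB per shard at 16 shards), and at least 7 shards are required regardless of size; the real pressure comes from the 10,000 queries/minute target and the sub-500ms latency requirement, not from raw index volume.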


Re: CSV entry as multiple documents

2015-02-17 Thread Alexandre Rafalovitch
I think the question asked was a bit different. It was about having
one row/document split into multiple with some fields replicated and
some mapped.

JSON (single-document format) has a split command which might be
similar to what's being asked. CSV has a split command as well, but I
think it is more about creating a multiValued field.

Or did I miss a different parameter?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 19:41, Anshum Gupta ans...@anshumgupta.net wrote:
 Hi Henrique,

 Solr supports posting a csv with multiple rows. Have a look at the
 documentation in the ref. guide here:
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates



 On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira hensan...@gmail.com
 wrote:

 Hi all,

 I was wondering if there is a way to tell Solr to treat a CSV entry as
 multiple documents instead of one document. For instance, suppose that a
 CSV file has 4 fields and a single entry:
 t1,v1,v2,v3
 2015-01-01T01:00:59Z,0.3,0.5,0.7

 I want Solr to update its index like it were 3 different documents:
 t1,v
 2015-01-01T01:00:59Z,0.3
 2015-01-01T01:00:59Z,0.5
 2015-01-01T01:00:59Z,0.7

 Is that possible, or do I have to create a different CSV for it?

 Many thanks,
 Henrique.




 --
 Anshum Gupta
 http://about.me/anshumgupta


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It's okay after I added in a new field s_title in the schema and
re-indexed.

   <field name="s_title" type="string" indexed="true" stored="false"
multiValued="false"/>
   <copyField source="title" dest="s_title"/>

But how can I ignore the articles (A, An, The) in the sorting? As you
can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="q">singapore</str>
    <str name="indent">true</str>
    <str name="fl">id,title</str>
    <str name="start">0</str>
    <str name="sort">s_title asc</str>
    <str name="rows">20</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="18" start="0">
  <doc>
    <str name="id">36</str>
    <str name="title">5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore</str>
  </doc>
  <doc>
    <str name="id">70</str>
    <str name="title">Anti-money laundering &amp; counter-terrorism financing / Commercial Affairs Dept</str>
  </doc>
  <doc>
    <str name="id">15</str>
    <str name="title">China's anti-secession law : a legal perspective / Zou, Keyuan</str>
  </doc>
  <doc>
    <str name="id">12</str>
    <str name="title">China's currency peg : firm in the eye of the storm / Calla Wiemer</str>
  </doc>
  <doc>
    <str name="id">22</str>
    <str name="title">China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian &amp; Lye Liang Fook</str>
  </doc>
  <doc>
    <str name="id">92</str>
    <str name="title">Goods and Services Tax Act [2005 ed.] (Chapter 117A)</str>
  </doc>
  <doc>
    <str name="id">13</str>
    <str name="title">Governing capacity in China : creating a contingent of qualified personnel / Kjeld Erik Brodsgaard</str>
  </doc>
  <doc>
    <str name="id">21</str>
    <str name="title">Health care marketization in urban China / Gu Xin</str>
  </doc>
  <doc>
    <str name="id">85</str>
    <str name="title">Lianhe Zaobao, Sunday</str>
  </doc>
  <doc>
    <str name="id">84</str>
    <str name="title">Singapore : vision of a global city / Jones Lang LaSalle</str>
  </doc>
  <doc>
    <str name="id">7</str>
    <str name="title">Singapore real estate investment trusts : leveraged value / Tony Darwell</str>
  </doc>
  <doc>
    <str name="id">96</str>
    <str name="title">Singapore's success : engineering economic growth / Henri Ghesquiere</str>
  </doc>
  <doc>
    <str name="id">23</str>
    <str name="title">The Chen-Soong meeting : the beginning of inter-party rapprochement in Taiwan? / Raymond R. Wu</str>
  </doc>
  <doc>
    <str name="id">17</str>
    <str name="title">The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader, Sandy Ho; team members, Audrey Low ... et al</str>
  </doc>
  <doc>
    <str name="id">78</str>
    <str name="title">The New paper on Sunday</str>
  </doc>
  <doc>
    <str name="id">95</str>
    <str name="title">The little Red Dot : reflections by Singapore's diplomats / editors, Tommy Koh, Chang Li Lin</str>
  </doc>
  <doc>
    <str name="id">52</str>
    <str name="title">[Press releases and articles on policy changes affecting the Singapore property market] / compiled by the Information Resource Centre, Monetary Authority of Singapore</str>
  </doc>
  <doc>
    <str name="id">dataq</str>
    <str name="title">Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 , БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois</str>
  </doc>
</result>
</response>


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
As I mentioned before, you could use the string type if you just want
the title as it is. Or you can use a custom type to normalize the indexed
value, as long as you end up with a single token.

So, if you want to strip leading A/An/The, you can use
KeywordTokenizer, combined with whatever post-processing you need. I
would suggest LowerCase filter and perhaps Regex filter to strip off
those leading articles. You may need to iterate a couple of times on
that specific chain.

The good news is that you can just make a couple of type definitions
with different values/order, reload the core (from the Cores screen of
the Web Admin UI) and run some of your sample titles through those
different definitions in the Analysis screen, without having to
reindex.

Regards,
  Alex.


Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 22:36, Simon Cheng simonwhch...@gmail.com wrote:
 Hi Alex,

 It's okay after I added in a new field s_title in the schema and
 re-indexed.

 <field name="s_title" type="string" indexed="true" stored="false"
 multiValued="false"/>
 <copyField source="title" dest="s_title"/>

 But how can I ignore the articles (A, An, The) in the sorting? As you
 can see from the example below:
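A possible shape for such a sort type, following the recipe above (KeywordTokenizer, lower-casing, then a regex to strip a leading article). The type name, pattern, and field wiring here are illustrative, not tested, and the regex will likely need the couple of iterations Alexandre mentions:

```xml
<fieldType name="title_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- the whole title stays one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop a leading article: "a ", "an ", "the " -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a|an|the)\s+" replacement="" replace="first"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="s_title" type="title_sort" indexed="true" stored="false" multiValued="false"/>
<copyField source="title" dest="s_title"/>
```

With a type like this, "The little Red Dot" would sort under "little" rather than under "The".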


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi Alex,

It's simply defined like this in the schema.xml :

   <field name="title" type="text_general" indexed="true" stored="true"
multiValued="false"/>

and it is cloned to the other multi-valued field o_title :

   <copyField source="title" dest="o_title"/>

Should I simply change the type to be string instead?

Thanks again,
Simon.


On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 What's the field definition for your title field? Is it just string
 or are you doing some tokenizing?

 It should be a string or a single token cleaned up (e.g. lower-cased)
 using KeywordTokenizer. In the example schema, you will normally see
 the original field tokenized and the sort field separately with
 copyField connection. In latest Solr, docValues are also recommended
 for sort fields.

 Regards,
Alex.



Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Hi,

I don't know whether it is my setup or any other reasons. But the fact is
that a very simple sort is not working in my Solr 4.7 environment.

The query is very simple :
http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted according to title :

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="sort">title asc</str>
    <str name="fl">id,author,title</str>
    <str name="indent">true</str>
    <str name="start">0</str>
    <str name="q">author:soros</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="13" start="0">
  <doc>
    <str name="id">9018</str>
    <arr name="author"><str>Soros, George, 1930-</str></arr>
    <str name="title">The alchemy of finance : reading the mind of the market / George Soros</str>
  </doc>
  <doc>
    <str name="id">15785</str>
    <arr name="author"><str>Soros, George, 1930-</str><str>Soros Foundations</str></arr>
    <str name="title">Bosnia / by George Soros</str>
  </doc>
  <doc>
    <str name="id">16281</str>
    <arr name="author"><str>Soros, George, 1930-</str><str>Soros Foundations</str></arr>
    <str name="title">Prospect for European disintegration / by George Soros</str>
  </doc>
  <doc>
    <str name="id">25807</str>
    <arr name="author"><str>Soros, George</str></arr>
    <str name="title">Open society : reforming global capitalism / George Soros</str>
  </doc>
  <doc>
    <str name="id">27440</str>
    <str name="title">George Soros on globalization</str>
    <arr name="author"><str>Soros, George, 1930-</str></arr>
  </doc>
  <doc>
    <str name="id">22254</str>
    <arr name="author"><str>Soros, George, 1930-</str></arr>
    <str name="title">The crisis of global capitalism : open society endangered / George Soros</str>
  </doc>
  <doc>
    <str name="id">16914</str>
    <arr name="author"><str>Soros, George, 1930-</str><str>Soros Fund Management</str></arr>
    <str name="title">The theory of reflexivity / by George Soros</str>
  </doc>
  <doc>
    <str name="id">17343</str>
    <str name="title">Financial turmoil in Europe and the United States : essays / George Soros</str>
    <arr name="author"><str>Soros, George, 1930-</str></arr>
  </doc>
  <doc>
    <str name="id">15542</str>
    <arr name="author"><str>Soros, George, 1930-</str><str>Harvard Club of New York City</str></arr>
    <str name="title">Nationalist dictatorships versus open society / by George Soros</str>
  </doc>
  <doc>
    <str name="id">15891</str>
    <arr name="author"><str>Soros, George</str></arr>
    <str name="title">The new paradigm for financial markets : the credit crisis of 2008 and what it means / George Soros</str>
  </doc>
</result>
</response>

Thank you for the help in advance,
Simon.


Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Alexandre Rafalovitch
If you are not searching against the title field directly, you can
change it to string. If you do, create a separate one, specifically
for sorting. You should be able to use docValues with that field even
in Solr 4.7.

Remember to re-index.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 20:16, Simon Cheng simonwhch...@gmail.com wrote:
 Hi Alex,

 It's simply defined like this in the schema.xml :

 <field name="title" type="text_general" indexed="true" stored="true"
 multiValued="false"/>

 and it is cloned to the other multi-valued field o_title :

 <copyField source="title" dest="o_title"/>

 Should I simply change the type to be string instead?

 Thanks again,
 Simon.


 On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 What's the field definition for your title field? Is it just string
 or are you doing some tokenizing?

 It should be a string or a single token cleaned up (e.g. lower-cased)
 using KeywordTokenizer. In the example schema, you will normally see
 the original field tokenized and the sort field separately with
 copyField connection. In latest Solr, docValues are also recommended
 for sort fields.

 Regards,
Alex.



Re: CSV entry as multiple documents

2015-02-17 Thread Alexandre Rafalovitch
What's your business use case? You don't need the split command, as
you already have those values in separate fields. You could copyField
them to a single multiValued field, but you would still have one
document per original CSV line.

Why do you need multiple documents out of one big CSV entry?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 17 February 2015 at 20:37, Henrique Oliveira hensan...@gmail.com wrote:
 Yes, Alexandre is right about my question. To make it clear, a CSV that looks 
 like:
 t1,v1,v2,v3
 2015-01-01T01:59:00Z,0.3,0.5,0.7
 2015-01-01T02:00:00Z,0.4,0.5,0.8

 would be the same as indexing
 t1,v
 2015-01-01T01:59:00Z,0.3
 2015-01-01T01:59:00Z,0.5
 2015-01-01T01:59:00Z,0.7
 2015-01-01T02:00:00Z,0.4
 2015-01-01T02:00:00Z,0.5
 2015-01-01T02:00:00Z,0.8

 I don’t know if multiValued field would do the trick. Do you have more info 
 on that split command?

 Henrique

 On Feb 17, 2015, at 7:57 PM, Alexandre Rafalovitch arafa...@gmail.com 
 wrote:

 I think the question asked was a bit different. It was about having
 one row/document split into multiple with some fields replicated and
 some mapped.

 JSON (single-document format) has a split command which might be
 similar to what's being asked. CSV has a split command as well, but I
 think it is more about creating a multiValued field.

 Or did I miss a different parameter?

 Regards,
   Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 17 February 2015 at 19:41, Anshum Gupta ans...@anshumgupta.net wrote:
 Hi Henrique,

 Solr supports posting a csv with multiple rows. Have a look at the
 documentation in the ref. guide here:
 https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-CSVFormattedIndexUpdates



 On Tue, Feb 17, 2015 at 2:44 PM, Henrique Oliveira hensan...@gmail.com
 wrote:

 Hi all,

 I was wondering if there is a way to tell Solr to treat a CSV entry as
 multiple documents instead of one document. For instance, suppose that a
 CSV file has 4 fields and a single entry:
 t1,v1,v2,v3
 2015-01-01T01:00:59Z,0.3,0.5,0.7

 I want Solr to update its index like it were 3 different documents:
 t1,v
 2015-01-01T01:00:59Z,0.3
 2015-01-01T01:00:59Z,0.5
 2015-01-01T01:00:59Z,0.7

 Is that possible, or do I have to create a different CSV for it?

 Many thanks,
 Henrique.




 --
 Anshum Gupta
 http://about.me/anshumgupta
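Since the CSV update handler indexes one document per row (and its split option only produces multiValued fields within a single document, as discussed above), the usual workaround is to explode the rows before posting. A minimal illustrative sketch in Python, using the column names from the example; the t1,v output header simply mirrors the layout Henrique asked for:

```python
import csv
import io

def explode_rows(src):
    """Rewrite rows of (t1, v1, v2, v3) as three rows of (t1, v),
    one per value column, matching the layout in the question."""
    reader = csv.DictReader(src)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["t1", "v"])
    for row in reader:
        for col in ("v1", "v2", "v3"):
            writer.writerow([row["t1"], row[col]])
    return out.getvalue()

data = "t1,v1,v2,v3\n2015-01-01T01:00:59Z,0.3,0.5,0.7\n"
print(explode_rows(io.StringIO(data)))
```

The exploded CSV can then be posted to /update/csv as usual. Note that each exploded row would still need its own uniqueKey value, since three documents now share the same timestamp.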



Confirm Solr index corruption

2015-02-17 Thread Thomas Mathew
Hi All,

I use Solr 4.4.0 in a master-slave configuration. Last week, the master
server ran out of disk space (logs got too big too quickly due to a bug in
our system). Because of this, we weren't able to add new docs to an index.
The first thing I did was to delete a few old log files to free up disk
space (later I moved the other logs to free up disk). The index is working
fine even after this fiasco.

The next day, a colleague of mine pointed out that we may be missing a few
documents in the index. I suspect the above scenario may have broken the
index. I ran CheckIndex against this index. It didn't report any
corruption, though.

Right now, the index has about 25k docs. I haven't optimized this index in
a while, and there are about 4000 deleted-docs. How can I confirm if we
lost anything? If we've lost docs, is there a way to recover it?

Thanks in advance!!

Regards
Thomas


Re: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Thanks James,
  I tried the same thing:
spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5
suggestions for both "life" and "hope", not the behavior you described
(up to 10 suggestions for "hope", but only up to 5 suggestions for
"life").


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com
wrote:

 Here is an example to illustrate what I mean...

 - query q=text:(life AND
 hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
 - suppose at least one document in your dictionary field has life in it
 - also suppose zero documents in your dictionary field have hope in them
 - The spellchecker will try to return you up to 10 suggestions for hope,
 but only up to 5 suggestions for life

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 11:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

 Hi James,
 How can you say that count doesn't use the
 index/dictionary? Then where do the suggestions come from?

 On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
 james.d...@ingramcontent.com
 wrote:

  See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
  the following section, for details.
 
  Briefly, count is the # of suggestions it will return for terms that
 are
  *not* in your index/dictionary.  alternativeTermCount are the # of
  alternatives you want returned for terms that *are* in your dictionary.
  You can set them to the same value, unless you want fewer suggestions
 when
  the terms is in the dictionary.
 
  James Dyer
  Ingram Content Group
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Tuesday, February 17, 2015 5:27 AM
  To: solr-user@lucene.apache.org
  Subject: spellcheck.count v/s spellcheck.alternativeTermCount
 
  Hello Everyone,
   I am confused about the difference between spellcheck.count and
 spellcheck.alternativeTermCount in Solr. Any help with the details?
 



Re: Boosting by calculated distance buckets

2015-02-17 Thread David Smiley
Raav,

You may need to actually subscribe to the solr-user list.  Nabble seems to
not be working too well.
p.s. I’m on vacation this week so I can’t be very responsive

First of all... it's not clear you actually want to *boost* (since you seem
to not care about the relevancy score), it seems you want to *sort* based on
a function query.  So simply sort by the function query instead of using the
'bq' param.

Have you read about geodist() in the Solr Reference Guide?  It returns the
spatial distance.  With that and other function queries like map() you could
do something like sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) and
you could put that into your main function query.  I purposefully overlapped
the map ranges so that I didn't have to deal with double-counting an edge. 
The only thing I don't like about this is that the distance is going to be
calculated as many times as you reference the function, and it's slow.  So
you may want to write your own function query (internally called a
ValueSource), which is relatively easy to do in Solr.

~ David
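The overlapping-range trick is easy to sanity-check outside Solr. Below is a small Python model of the expression sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)), where Solr's map(x,min,max,target,default) returns target when x lies in [min,max] inclusive, else default:

```python
def solr_map(x, lo, hi, target, default):
    """Models Solr's map(x, min, max, target, default) function."""
    return target if lo <= x <= hi else default

def bucket_boost(dist_km):
    # sum(map(geodist(),0,40,40,0), map(geodist(),0,20,10,0))
    return solr_map(dist_km, 0, 40, 40, 0) + solr_map(dist_km, 0, 20, 10, 0)

for d in (10, 30, 100):
    print(d, bucket_boost(d))
```

A document 10 km away scores 40 + 10 = 50, one at 30 km scores 40, and anything past 40 km scores 0, so the overlap yields descending bucket values without any edge being counted twice.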


sraav wrote
 David,
 
 Thank you for your prompt response. I truly appreciate it. Also, My post
 was not accepted the first two times so I am posting it again one final
 time. 
 
 In my case I want to turn off the dependency on scoring and let Solr use
 just the boost values that I pass to each function to sort on. Here is a
 quick example of how I got that to work with non-geo fields which are
 present in the document and are not dynamically calculated. Using edismax,
 of course.
 
 I was able to turn off the scoring (I mean remove the dependency on score)
 on the result set and drive the sort by the boost that I mentioned in the
 below query. In the below function, for example: if document1
 matches the date listed, it gets a boost = 5. If the same document matches
 the owner AND product, it gets an additional boost of 5 more. The
 total boost of this document1 is 10. From whatever I have seen, it
 seems like I was able to turn off or negate the effects of the Solr score.
 (There was a queryNorm param that was affecting the boost, but it seemed to
 be a constant around 0.70345... most of the time, for any fq mentioned.)
 
 bq = {!func}sum(if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO
 *]'}),5,0),if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'})),5,0))
 
 What I am trying to do is to add additional boosting function to the
 custom boost that will eventually tie into the above function and boost
 value.
 
 For example, if document1 falls in the 0-20 KM range I would like to add a
 boost of 50, making the final boost value 60. If it falls in the
 20-40 KM range, I would like to add a boost of 40, and so on.
 
 Is there a way we can do this?  Please let me know if I can provide better
 clarity on the use case that I am trying to solve. Thank you David.
 
 Thanks,
 Raav





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-by-calculated-distance-buckets-tp4186504p4187112.html
Sent from the Solr - User mailing list archive at Nabble.com.
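As an aside on the sum(map(geodist(),...)) trick David describes above: the bucket arithmetic can be sanity-checked outside Solr. Below is a small Python sketch (my own illustration, not Solr code) that mimics map(x,min,max,target,default) and the overlapping buckets from the example.

```python
def solr_map(x, mn, mx, target, default):
    # Mimics Solr's map(x, min, max, target, default) function query:
    # returns target when mn <= x <= mx, otherwise default.
    return target if mn <= x <= mx else default

def bucket_boost(dist_km):
    # sum(map(geodist(),0,40,40,0), map(geodist(),0,20,10,0))
    # The ranges overlap on purpose, so a point inside 0-20 km
    # collects both bucket values.
    return solr_map(dist_km, 0, 40, 40, 0) + solr_map(dist_km, 0, 20, 10, 0)

print(bucket_boost(15))  # inside both buckets -> 50
print(bucket_boost(30))  # only the 0-40 bucket -> 40
print(bucket_boost(55))  # outside both buckets -> 0
```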


Collations problem even term is available in documents.

2015-02-17 Thread Nitin Solanki
Hi,
I am misspelling the query "hota hai" as "hota hain". Inside
collations, "hota hai" is not coming back; instead "hot main", "home have",
etc. are coming. I have 37 documents where "hota hai" is present.

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"hota hain"&wt=json&indent=true&shards.qt=/spell

*Configuration:*
*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">15</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.alternativeTermCount">15</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">1000</str>
    <str name="spellcheck.maxCollationTries">3000</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml: *

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
       multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>


Re: Solrcloud sizing

2015-02-17 Thread Erick Erickson
Well, it's really impossible to say, you have to prototype. Here's something
explaining this a bit:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

This is a major undertaking. Your question is simply impossible to
answer without prototyping as in
the link above, anything else is guesswork. And at this scale being
wrong is expensive.

So my advice would be to test on a small cluster, say a 2 shard
system and see what kind of
performance you can get and extrapolate from there, with your data,
your queries etc. Perhaps
work with your client on a limited-scope proof-of-concept. Plan on
spending some time tuning
even the small cluster to get enough answers to form a go/no-go decision.

Best,
Erick


On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
dominique.bej...@eolya.fr wrote:
 One of our customers needs to index 15 billion documents in a collection.
 As this volume is not usual for me, I need some advice about SolrCloud
 sizing (how many servers, nodes, shards, replicas, how much memory, ...)

 Some inputs :

- Collection size: 15 billion documents
- Collection update: 8 million new documents / day + 8 million deleted documents / day
- Updates occur during the night, without queries
- Queries occur during the day, without updates
- Document size is nearly 300 bytes
- Document fields are mainly strings, including one date field
- The same term will occur several times for a given field (from 10 to 100,000)
- Queries will use a date period and a filter query on one or more fields
- 10,000 queries / minute
- expected response time < 500 ms
- 1 billion documents indexed = 5 GB index size
- no SSD drives

 So, what is your advice about:

 # of shards: 15 billion documents -> 16 shards?
 # of replicas?
 # of nodes = # of shards?
 heap memory per node?
 direct memory per node?

 Thanks for your advice!

 Dominique
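For what it's worth, the inputs above already allow a rough back-of-envelope check (a sketch only — it says nothing about RAM, replicas or latency, which is exactly why the prototyping Erick recommends is needed):

```python
# Back-of-envelope numbers taken from the thread's inputs.
TOTAL_DOCS = 15_000_000_000          # collection size
GB_PER_BILLION_DOCS = 5              # "1 billion documents indexed = 5 GB index"
SHARDS = 16                          # the proposed shard count
QUERIES_PER_MINUTE = 10_000

total_index_gb = TOTAL_DOCS / 1_000_000_000 * GB_PER_BILLION_DOCS
per_shard_gb = total_index_gb / SHARDS
docs_per_shard = TOTAL_DOCS // SHARDS
qps = QUERIES_PER_MINUTE / 60

print(total_index_gb)   # 75.0 GB total index
print(per_shard_gb)     # ~4.7 GB per shard
print(docs_per_shard)   # 937,500,000 docs per shard
print(qps)              # ~167 queries/second sustained
```

Even under these optimistic numbers each shard holds nearly a billion documents, which is why only a real prototype can tell whether the latency target is reachable.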


Re: Solrcloud sizing

2015-02-17 Thread Dominique Bejean
Thank you Erick.

This was also my own opinion.

2015-02-18 7:12 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 Well, it's really impossible to say, you have to prototype. Here's
 something
 explaining this a bit:

 https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

 This is a major undertaking. Your question is simply impossible to
 answer without prototyping as in
 the link above, anything else is guesswork. And at this scale being
 wrong is expensive.

 So my advice would be to test on a small cluster, say a 2 shard
 system and see what kind of
 performance you can get and extrapolate from there, with your data,
 your queries etc. Perhaps
 work with your client on a limited-scope proof-of-concept. Plan on
 spending some time tuning
 even the small cluster to get enough answers to form a go/no-go decision.

 Best,
 Erick


 On Tue, Feb 17, 2015 at 4:40 PM, Dominique Bejean
 dominique.bej...@eolya.fr wrote:
  One of our customers needs to index 15 billions document in a collection.
  As this volume is not usual for me, I need some advices about solrcloud
  sizing (how much servers, nodes, shards, replicas, memory, ...)
 
  Some inputs :
 
 - Collection size : 15 billions document
 - Collection update : 8 millions new documents / days + 8 millions
 deleted documents / days
 - Updates occur during the night without queries
 - Queries occur during the day without updates
 - Document size is nearly 300 bytes
 - Document fields are mainly string including one date field
 - The same terms will occurs several time for a given field (from 10
 to
 100.000)
 - Query will use a date period and a filter query on one or more
 fields
 - 10.000 queries / minutes
 - expected response time < 500ms
 - 1 billion documents indexed = 5Gb index size
 - no ssd drives
 
  So, what is you advice about :
 
  # of shards : 15 billions documents - 16 shards ?
  # of replicas ?
  # of nodes = # of shards ?
  heap memory per node ?
  direct memory per node ?
 
  Thank your advices ?
 
  Dominique



Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Sankalp Gupta
Hi Mikhail,

It won't solve my problem.
For ex:
Suppose my docs are like this:
<doc>
  <field name="userid">1</field>
  <doc>
    <field name="address">city1</field>
  </doc>
  <doc>
    <field name="address">city2</field>
  </doc>
</doc>

<doc>
  <field name="userid">2</field>
  <doc>
    <field name="address">city2</field>
  </doc>
  <doc>
    <field name="address">city3</field>
  </doc>
</doc>

Now, if I want a query to return all the users not having any address
related to city1 (i.e. only userid=2 should be in the result), and I query:
q={!parent which=userid:*}*:* -address:city1
this will return two results, userid=2 AND userid=1 (as userid=1
also has a child whose address is city2); the desired output was
userid=2 only.

On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 try to search all children and remove those that have value1 with a dash
 (negation), then join the remaining:
 q={!parent which=contentType:parent}contentType:child -contentType:value1
 if the space in the underlying query causes a problem, try escaping it or
 wrapping it in v=$subq



 On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta sankalp.gu...@snapdeal.com
 
 wrote:

  Hi
 
  I need a query that chooses only those parent docs
  none of whose children's field has the specified value,
  i.e. I need something like this:
  http://localhost:8983/solr/core1/select?q={!parent
  which=contentType:parent}childField:NOT value1

  The problem is that the NOT operator is not supported in the Block Join
  Query Parsers. Could anyone please suggest a way to work around this
  problem?
  I have also added the problem on Stack Overflow:
 
 
 http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
 
  Regards
  Sankalp Gupta
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Ahmet Arslan
Hi,

You are using /update when registering, but using /update/extract when 
invoking.

Ahmet



On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow sc...@alumni.rutgers.edu 
wrote:
Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.

solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">last_modified</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="last_modified">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_modified</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


In schema.xml, I have:

<field name="last_modified" type="date" indexed="true" stored="true" />
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
           positionIncrementGap="0"/>
This is the command I'm using to index:

curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"
However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.
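Given Ahmet's observation, one possible fix — a sketch only, not verified against this setup — is to attach the chain to the handler actually being invoked, /update/extract, instead of /update:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="update.chain">last_modified</str>
  </lst>
</requestHandler>
```

With this in place the curl command above would route through the last_modified chain without any other changes.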


Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Shu-Wai Chow
Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.

solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">last_modified</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="last_modified">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">last_modified</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


In schema.xml, I have:

<field name="last_modified" type="date" indexed="true" stored="true" />
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
           positionIncrementGap="0"/>
This is the command I'm using to index:

curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"
However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.

RE: spellcheck.count vs. spellcheck.alternativeTermCount
2015-02-17 Thread Dyer, James
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the 
following section, for details.

Briefly, count is the # of suggestions it will return for terms that are 
*not* in your index/dictionary.  alternativeTermCount are the # of 
alternatives you want returned for terms that *are* in your dictionary.  You 
can set them to the same value, unless you want fewer suggestions when the 
terms is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello Everyone,
  I got confusion between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Any help in details?
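To make the distinction concrete, a sketch of the two parameters side by side inside a /spell handler's defaults (the values are illustrative):

```xml
<!-- up to 10 suggestions for a query term that is NOT in the index -->
<str name="spellcheck.count">10</str>
<!-- up to 5 alternatives for a query term that IS in the index -->
<str name="spellcheck.alternativeTermCount">5</str>
```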


Discrepancy between Full import and Delta import query

2015-02-17 Thread Aniket Bhoi
Hi Folks,

I am running Solr 3.4 and using DIH for importing data from a SQL server
backend.

The query for the full import and the delta import is the same, i.e. both
pull the same data.

Full and Delta import query:

SELECT KB_ENTRY.ADDITIONAL_INFO ,KB_ENTRY.KNOWLEDGE_REF
ID,SU_ENTITY_TYPE.REF ENTRY_TYPE_REF,KB_ENTRY.PROFILE_REF,
KB_ENTRY.ITEM_REF, KB_ENTRY.TITLE, KB_ENTRY.ABSTRACT, KB_ENTRY.SOLUTION,
KB_ENTRY.SOLUTION_HTML, KB_ENTRY.FREE_TEXT, KB_ENTRY.DATE_UPDATED,
KB_ENTRY.STATUS_REF, KB_ENTRY.CALL_NUMBER, SU_ENTITY_TYPE.DISPLAY
ENTRY_TYPE, KB_PROFILE.NAME PROFILE_TYPE, AR_PRIMARY_ASSET.ASSET_REF
SERVICE_TYPE, AR_PERSON.FULL_NAME CONTRIBUTOR, IN_SYS_SOURCE.NAME SOURCE,
KB_ENTRY_STATUS.NAME STATUS,(SELECT COUNT (CL_KB_REFER.CALL_NUMBER) FROM
CL_KB_REFER WHERE CL_KB_REFER.ARTICLE_REF = KB_ENTRY.KNOWLEDGE_REF)
LINK_RATE FROM KB_ENTRY, SU_ENTITY_TYPE, KB_PROFILE, AR_PRIMARY_ASSET,
AR_PERSON, IN_SYS_SOURCE, KB_ENTRY_STATUS WHERE KB_ENTRY.PARTITION = 1 AND
KB_ENTRY.STATUS = 'A' AND AR_PERSON.OFFICER_IND = 1 AND
KB_ENTRY.CREATED_BY_REF = AR_PERSON.REF AND KB_ENTRY.SOURCE =
IN_SYS_SOURCE.REF AND KB_ENTRY.STATUS_REF = KB_ENTRY_STATUS.REF AND
KB_ENTRY_STATUS.STATUS = 'A' AND KB_ENTRY.PROFILE_REF = KB_PROFILE.REF AND
KB_ENTRY.ITEM_REF = AR_PRIMARY_ASSET.ITEM_REF AND KB_ENTRY.ENTITY_TYPE =
SU_ENTITY_TYPE.REF AND KB_ENTRY.KNOWLEDGE_REF='${dataimporter.delta.ID}'


Delta query: select KNOWLEDGE_REF as ID from KB_ENTRY where (DATE_UPDATED
> '${dataimporter.last_index_time}' OR DATE_CREATED >
'${dataimporter.last_index_time}')


The problem here is that when I run the full import, everything works fine
and all the fields/data are displayed fine in the search.

However When I run Delta import,for some records the ENTRY_TYPE field is
not returned from the database,

Let me illustrate it with an example:

Search result After running Full Import:

Record Name:John Doe
Entry ID:500
Entry Type:Worker

Search result after running Delta import:

Record Name:John Doe
Entry ID:500
Entry Type:


FYI: I have run the full and delta import queries (though both are the same)
in the SQL Server IDE, and both return the Entry Type field correctly.

Not sure why the Entry Type field vanishes from Solr when the delta import is
run.

Any idea why this would happen.

Thanks,

Aniket


Collations are not using suggestions to build collations

2015-02-17 Thread Nitin Solanki
Hi,
  I want to build collations using the suggestions for the query. But
collations are being built without using those suggestions; they use their
own corrections (misspellingsAndCorrections), and I don't know where those
corrections are coming from.

You can see the result by seeing below response for the query
*URL :*
http://localhost:8983/solr/wikingram/spell?q=gram_ci:%22kuchi%20kucch%20hota%22&wt=json&indent=true&shards.qt=/spell&shards.tolerant=true&rows=1

You can see that the term "kuch" does not appear among the suggestions for
either "kuchi" or "kucch". But "kuch" is coming into

"misspellingsAndCorrections",[
  "kuchi","kuch",
  "kucch","kuch",
  "hota","hota"]]]}}

How is this happening?


*Response:*

{
  "responseHeader":{
    "status":0,
    "QTime":3440},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "kuchi",{
        "numFound":5,
        "startOffset":9,
        "endOffset":14,
        "origFreq":40,
        "suggestion":[{
            "word":"kochi",
            "freq":976},
          {
            "word":"k chi",
            "freq":442},
          {
            "word":"yuchi",
            "freq":71},
          {
            "word":"kucha",
            "freq":32},
          {
            "word":"kichi",
            "freq":17}]},
      "kucch",{
        "numFound":2,
        "startOffset":15,
        "endOffset":20,
        "origFreq":9,
        "suggestion":[{
            "word":"kutch",
            "freq":231},
          {
            "word":"kusch",
            "freq":67}]},
      "correctlySpelled",false,
      "collation",[
        "collationQuery","gram_ci:\"kuch kuch hota\"",
        "hits",22,
        "misspellingsAndCorrections",[
          "kuchi","kuch",
          "kucch","kuch",
          "hota","hota"]]]}}


Re: spellcheck.count vs. spellcheck.alternativeTermCount
2015-02-17 Thread Nitin Solanki
Any help please?

On Tue, Feb 17, 2015 at 4:57 PM, Nitin Solanki nitinml...@gmail.com wrote:

 Hello Everyone,
   I got confusion between spellcheck.count and
 spellcheck.alternativeTermCount in Solr. Any help in details?



Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Mikhail Khludnev
try to search all children and remove those that have value1 with a dash
(negation), then join the remaining:
q={!parent which=contentType:parent}contentType:child -contentType:value1
if the space in the underlying query causes a problem, try escaping it or
wrapping it in v=$subq



On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta sankalp.gu...@snapdeal.com
wrote:

 Hi

 I need a query that chooses only those parent docs
 none of whose children's field has the specified value,
 i.e. I need something like this:
 http://localhost:8983/solr/core1/select?q={!parent
 which=contentType:parent}childField:NOT value1

 The problem is that the NOT operator is not supported in the Block Join
 Query Parsers. Could anyone please suggest a way to work around this
 problem?
 I have also added the problem on Stack Overflow:

 http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature

 Regards
 Sankalp Gupta




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Having a spot of trouble setting up /browse

2015-02-17 Thread Erik Hatcher
And FYI, out of the box with Solr 5.0, using the data driven config (the 
default when creating a collection with `bin/solr create -c …`), /browse is 
wired in by default with no templates explicit in the configuration as they are 
baked into the VrW library itself.

But yeah, what Alexandre said - you need the velocity libs included, like in Solr's
4.10.3 example collection1 configuration, as well as the conf/velocity files.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com/




 On Feb 16, 2015, at 8:44 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 
 Velocity libraries and .vm templates as a first step! Did you get those setup?
 
 Regards,
   Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
 On 16 February 2015 at 19:33, Benson Margulies ben...@basistech.com wrote:
 So, I had set up a solr core modelled on the 'multicore' example in 4.10.3,
 which has no /browse.
 
 Upon request, I went to set up /browse.
 
 I copied in a minimal version. When I go there, I just get some XML back:
 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params"/>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  </response>
 
 What else does /browse depend upon?
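For reference, the 4.10.3 example collection1 wires /browse roughly like the sketch below (the lib paths are assumptions — check them against your own directory layout and the shipped solrconfig.xml):

```xml
<!-- VelocityResponseWriter jars -->
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
  </lst>
</requestHandler>
```

The conf/velocity directory with the .vm templates must also be copied over, otherwise the handler falls back to plain XML output like the response above.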



Re: Too many merges, stalling...

2015-02-17 Thread Shawn Heisey
On 2/16/2015 8:12 PM, ralph tice wrote:
 Recently I turned on INFO level logging in order to get better insight
 as to what our Solr cluster is doing.  Sometimes as frequently as
 almost 3 times a second we get messages like:
 [CMS][qtp896644936-33133]: too many merges; stalling...
 
 Less frequently we get:
 [TMP][commitScheduler-8-thread-1]:
 seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip:
 too large]
 
 where size is 2500-4900MB.

I've trimmed most of your original message, but I will refer to things
you have mentioned in the unquoted portion.

The first message simply indicates that you have reached more
simultaneous merges than CMS is configured to allow (3 by default), so
it will stall all of them except one.  The javadocs say that the one
allowed to run will be the smallest, but I have observed the opposite --
the one that is allowed to run is always the largest.

The second message indicates that the merge under consideration would
have exceeded the maximum size, which defaults to 5GB, so it refused to
do that merge.

The mergeFactor setting is deprecated, but still works for now in 4.x
releases.  The reason your merges are happening so frequently is that
you have set this to a low value - 5.  Setting it to a larger value will
make merges less frequent.

The mergeFactor value is used to set maxMergeAtOnce and segmentsPerTier.
 A proper TieredMergePolicy config will have those two settings
(normally set to the same value) as well as maxMergeAtOnceExplicit,
which should be set to three times the value of the other two.  My
config uses 35, 35, and 105 for each of those values, respectively.

You can also allow more simultaneous merges in the CMS config.  I use a
value of 6 here, to avoid lengthy indexing stalls that will kill the DIH
connection to MySQL.  If the disks are standard spinning magnetic disks,
the number of CMS threads should be one.  If it's SSD, you can use more
threads.

Thanks,
Shawn
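Put together in 4.x solrconfig.xml syntax, the settings Shawn describes look roughly like this (a sketch using his example values — verify the element names against your Solr version):

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <!-- three times the value of the other two, per the advice above -->
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <!-- keep 1 for spinning disks; raise only on SSD -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```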



Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Kydryavtsev Andrey
How about: find all parents which have at least one child with address:city1,
and then negate that set.
Something like (not sure about the syntax at all):
q=-{!parent which=userid:*}address:city1

17.02.2015, 20:21, Sankalp Gupta sankalp.gu...@snapdeal.com:
 Hi Mikhail,

 It won't solve my problem.
 For ex:
 Suppose my docs are like this:
 <doc>
 <field name="userid">1</field>
 <doc>
    <field name="address">city1</field>
 </doc>
 <doc>
    <field name="address">city2</field>
 </doc>
 </doc>

 <doc>
 <field name="userid">2</field>
 <doc>
    <field name="address">city2</field>
 </doc>
 <doc>
    <field name="address">city3</field>
 </doc>
 </doc>

 Now, if I want a query to return all the users not having any address
 related to city1 (i.e. only userid=2 should be in the result), and I query:
 q={!parent which=userid:*}*:* -address:city1
 this will return two results, userid=2 AND userid=1 (as userid=1
 also has a child whose address is city2); the desired output was
 userid=2 only.

 On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:
  try to search all children remove those who has a value1 by dash, then join
  remaining
  q={!parent which=contentType:parent}contentType:child -contentType:value1
  if the space in underneath query causes the problem try to escape it or
  wrap to v=$subq

  On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta sankalp.gu...@snapdeal.com
  wrote:
  Hi

  I need to have a query in which I need to choose only those parent docs
  none of whose children's field is having the specified value.
  i.e. I need something like this:
  http://localhost:8983/solr/core1/select?q={!parent
  which=contentType:parent}childField:NOT value1

  The problem is that the NOT operator is not supported in the Block Join
  Query Parsers. Could anyone please suggest a way to work around this
  problem?
  Have also added the problem on *stackoverflow*:
  
 http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
  Regards
  Sankalp Gupta
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics

  http://www.griddynamics.com
  mkhlud...@griddynamics.com
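Combining the two messages above, the full request would look something like the sketch below (untested syntax — the _query_ hook embeds the block-join query as a prohibited clause, so parents matching city1 via any child are excluded):

```text
q=userid:* AND NOT _query_:"{!parent which=userid:*}address:city1"
```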


Re: Too many merges, stalling...

2015-02-17 Thread Shawn Heisey
On 2/17/2015 7:47 AM, Shawn Heisey wrote:
 The first message simply indicates that you have reached more
 simultaneous merges than CMS is configured to allow (3 by default), so
 it will stall all of them except one.  The javadocs say that the one
 allowed to run will be the smallest, but I have observed the opposite --
 the one that is allowed to run is always the largest.

I have stated some things incorrectly here.  The gist of what I wrote is
correct, but the details are not.  These details are important,
especially for those who read this in the archives later.

As long as you are below maxMergeCount (default 3) for the number of
merges that have been scheduled, the system will simultaneously run up
to maxThreads (default 1) merges from that list, and it will ALSO allow
the incoming thread (indexing new data) to run.

Once you reach maxMergeCount, the incoming thread is stalled until you
are back below maxMergeCount, and up to maxThreads merges will be
running while the incoming thread is stalled.

Thanks,
Shawn
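The rule Shawn states can be written down as a tiny model (my own illustration of the defaults, not Solr code):

```python
MAX_MERGE_COUNT = 3   # CMS default: merges that may be scheduled at once
MAX_THREADS = 1       # CMS default: merges that actually run at once

def running_merges(scheduled):
    # Of the merges scheduled so far, at most maxThreads execute;
    # the rest are stalled.
    return min(scheduled, MAX_THREADS)

def indexing_stalled(scheduled):
    # The incoming (indexing) thread stalls once the backlog of
    # scheduled merges reaches maxMergeCount.
    return scheduled >= MAX_MERGE_COUNT

print(running_merges(2), indexing_stalled(2))  # 1 False
print(running_merges(3), indexing_stalled(3))  # 1 True
```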



Re: spellcheck.count vs. spellcheck.alternativeTermCount
2015-02-17 Thread Nitin Solanki
Hi James,
Hi James,
If count doesn't use the index/dictionary, then where do its
suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
 the following section, for details.

 Briefly, count is the # of suggestions it will return for terms that are
 *not* in your index/dictionary.  alternativeTermCount are the # of
 alternatives you want returned for terms that *are* in your dictionary.
 You can set them to the same value, unless you want fewer suggestions when
 the terms is in the dictionary.

 James Dyer
 Ingram Content Group

 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 5:27 AM
 To: solr-user@lucene.apache.org
 Subject: spellcheck.count v/s spellcheck.alternativeTermCount

 Hello Everyone,
   I got confusion between spellcheck.count and
 spellcheck.alternativeTermCount in Solr. Any help in details?



Re: Using TimestampUpdateProcessorFactory and updateRequestProcessorChain

2015-02-17 Thread Chris Hostetter
: Hi,
: 
: You are using /update when registering, but using /update/extract when 
invoking.
: 
: Ahmet

if your goal is that *every* doc will get a last_modified, regardless of
how it is indexed, then you don't need to set the update.chain default
on every requestHandler -- instead just mark your
updateRequestProcessorChain as the default...

   <updateRequestProcessorChain name="last_modified" default="true">
     <processor class="solr.TimestampUpdateProcessorFactory">
       <str name="fieldName">last_modified</str>
     </processor>
     ...

: 
: On Tuesday, February 17, 2015 6:28 PM, Shu-Wai Chow 
sc...@alumni.rutgers.edu wrote:
: Hi, all.  I’m trying to insert a field into Solr called last_modified, which 
holds a timestamp of the update. Since this is a cloud setup, I'm using the 
TimestampUpdateProcessorFactory to update the updateRequestProcessorChain.
: 
: solrconfig.xml:
: 
: <requestHandler name="/update" class="solr.UpdateRequestHandler">
: <lst name="defaults">
: <str name="update.chain">last_modified</str>
: </lst>
: </requestHandler>
: 
: <updateRequestProcessorChain name="last_modified">
: <processor class="solr.TimestampUpdateProcessorFactory">
: <str name="fieldName">last_modified</str>
: </processor>
: <processor class="solr.LogUpdateProcessorFactory" />
: <processor class="solr.RunUpdateProcessorFactory" />
: </updateRequestProcessorChain>
: 
: 
: In schema.xml, I have:
: 
: <field name="last_modified" type="date" indexed="true" stored="true" />
: <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0"/>
: This is the command I'm using to index:
: 
: curl "http://localhost:8983/solr/update/extract?uprefix=attr_&fmap.content=body&literal.id=1234.id&last_modified=NOW" -F "sc=@1234.txt"
: However, after indexing, the last_modified field is still not showing up on 
queries. Is there something else I should be doing?  Thanks.
: 

-Hoss
http://www.lucidworks.com/

Re: Solr 4.8.1 : Response Code 500 when creating the new request handler

2015-02-17 Thread Chris Hostetter

: 1. Look further down in the stack trace for the caused by that details
:  the specific cause of the exception.

: I am still not able to find the cause of this.

Jack is referring to the log file from your server ... sometimes there
are more details there.

: Sorry i but don't know it is non-standard approach. please guide me here.

I'm not sure what Jack was referring to -- I don't see anything
non-standard about how you have your handler configured.

: We are trying to find all the results so we are using q.alt=*:*.
: There are some products in our company who wants of find all the results 
*whose
: type is garments* and i forgot to mention we are trying to find only 6
: rows. So using this request handler we are providing the 6 rows.

Jack's point here is that you have specified a q.alt in your invariants
but you have also specified it in the query params -- where it will be
totally ignored.  What specifically is your goal in having that query
param in the sample query you tried?

As a general debugging tip: did you try bypassing your custom
requestHandler and just running a simple /select query with all of those
params specified in the URL?  ... it can help to narrow down the
problem -- in this case, I'm pretty sure you would have gotten the same
error, and then the distraction of the invariants question would have
been irrelevant.


Looking at the source code for 4.8.1, it appears that the error you are
seeing is edismax doing a really bad job of trying to report an error
in parsing the qf param -- which you haven't specified at all in
your params...

  try {
queryFields = DisMaxQParser.parseQueryFields(req.getSchema(), 
solrParams);  // req.getSearcher() here causes searcher refcount imbalance
  } catch (SyntaxError e) {
throw new RuntimeException();
  }

..if you add a qf param with the list of fields you want to search (or
a 'df' param to specify a default field), I suspect this error will go away.


I filed a bug to fix this terrible code to give a useful error msg in the 
future...

https://issues.apache.org/jira/browse/SOLR-7120




:  3. You have q.alt in invariants, but also in the actual request, which is a
:  contradiction in terms - what is your actual intent? This isn't the cause
:  of the exception, but does raise questions of what you are trying to do.
:  4. Why don't you have a q parameter for the actual query?
: 
: 
:  -- Jack Krupansky
: 
:  On Sat, Feb 14, 2015 at 1:57 AM, Aman Tandon amantandon...@gmail.com
:  wrote:
: 
:   Hi,
:  
:   I am using Solr 4.8.1 and when i am creating the new request handler i am
:   getting the following error:
:  
:   *Request Handler config:*
:  
:   <requestHandler name="my_clothes_data" class="solr.SearchHandler">
:   <lst name="invariants">
:   <str name="defType">edismax</str>
:   <str name="indent">on</str>
:   <str name="q.alt">*:*</str>
:  
:   <float name="tie">0.01</float>
:   </lst>
:  
:   <lst name="appends">
:   <str name="fq">type:garments</str>
:   </lst>
:   </requestHandler>
:  
:   *Error:*
:  
:   java.lang.RuntimeException at
:   
:  
:  
org.apache.solr.search.ExtendedDismaxQParser$ExtendedDismaxConfiguration.init(ExtendedDismaxQParser.java:1455)
:at
:   
:  
:  
org.apache.solr.search.ExtendedDismaxQParser.createConfiguration(ExtendedDismaxQParser.java:239)
:at
:   
:  
:  
org.apache.solr.search.ExtendedDismaxQParser.init(ExtendedDismaxQParser.java:108)
:at
:   
:  
:  
org.apache.solr.search.ExtendedDismaxQParserPlugin.createParser(ExtendedDismaxQParserPlugin.java:37)
:at org.apache.solr.search.QParser.getParser(QParser.java:315) at
:   
:  
:  
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:144)
:at
:   
:  
:  
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:197)
:at
:   
:  
:  
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
:at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at
:   
:  
:  
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
:at
:   
:  
:  
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
:at
:   
:  
:  
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
:at
:   
:  
:  
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
:at
:   
:  
:  org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
:at
:   
:  
:  
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
:at
:   
:  
:  org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
:at
:   
:  
:  
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
:at
:   
:  
:  

Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Toke Eskildsen
On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
 Solr: 4.10.2 (high load, mass indexing)
 Java: 1.7.0_76 (Oracle)
 -Xmx25600m
 
 
 Solr: 4.3.1 (normal load, no mass indexing)
 Java: 1.7.0_11 (Oracle)
 -Xmx25600m
 
 The RAM consumption remained the same after the load has stopped on the
 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
 jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
 seen by top remained at 9G level.

As the JVM does not free OS memory once allocated, top just shows
whatever peak it reached at some point. When you tell the JVM that it is
free to use 25GB, it makes a lot of sense to allocate a fair chunk of
that instead of garbage collecting if there is a period of high usage
(mass indexing for example). 

 What else could be the artifact of such a difference -- Solr or JVM? Can it
 only be explained by the mass indexing? What is worrisome is that the
 4.10.2 shard reserves 8x times it uses.

If you set your Xmx to a lot less, the JVM will probably favour more
frequent garbage collections over extra heap allocation.

- Toke Eskildsen, State and University Library, Denmark




RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
We have seen an increase between 4.8.1 and 4.10. 
 
-Original message-
 From:Dmitry Kan solrexp...@gmail.com
 Sent: Tuesday 17th February 2015 11:06
 To: solr-user@lucene.apache.org
 Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
 
 Hi,
 
 We are currently comparing the RAM consumption of two parallel Solr
 clusters with different solr versions: 4.10.2 and 4.3.1.
 
 For comparable index sizes of a shard (20G and 26G), we observed 9G vs 5.6G
 RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
 
 We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
 reindexed data from scratch. The commits are all controlled on the client,
 i.e. not auto-commits.
 
 Solr: 4.10.2 (high load, mass indexing)
 Java: 1.7.0_76 (Oracle)
 -Xmx25600m
 
 
 Solr: 4.3.1 (normal load, no mass indexing)
 Java: 1.7.0_11 (Oracle)
 -Xmx25600m
 
 The RAM consumption remained the same after the load has stopped on the
 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
 jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
 seen by top remained at 9G level.
 
 This unusual spike happened during mass data indexing.
 
 What else could be the artifact of such a difference -- Solr or JVM? Can it
 only be explained by the mass indexing? What is worrisome is that the
 4.10.2 shard reserves 8x times it uses.
 
 What can be done about this?
 
 -- 
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info
 


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
;) ok. Currently I'm trying parallel GC options, mentioned here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/101377

At least the saw-tooth RAM chart is starting to shape up.
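For reference, a launch line with the parallel collector enabled might look like the sketch below. The flag names are standard HotSpot options, but the heap sizes and the jar name are placeholders, not values taken from this thread:

```shell
# Sketch only: parallel young + old generation collectors on HotSpot.
# Adjust -Xms/-Xmx to your box; start.jar is the stock Jetty launcher.
java -Xms8g -Xmx8g \
     -XX:+UseParallelGC -XX:+UseParallelOldGC \
     -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar start.jar
```

The GC log makes the saw-tooth pattern (allocation up, collection down) directly visible.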

On Tue, Feb 17, 2015 at 12:55 PM, Markus Jelsma markus.jel...@openindex.io
wrote:

 I would have shared it if i had one :)

 -Original message-
  From:Dmitry Kan solrexp...@gmail.com
  Sent: Tuesday 17th February 2015 11:40
  To: solr-user@lucene.apache.org
  Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
 
  Have you found an explanation to that?
 
  On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma 
 markus.jel...@openindex.io
  wrote:
 
   We have seen an increase between 4.8.1 and 4.10.
  
   -Original message-
From:Dmitry Kan solrexp...@gmail.com
Sent: Tuesday 17th February 2015 11:06
To: solr-user@lucene.apache.org
Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
   
Hi,
   
We are currently comparing the RAM consumption of two parallel Solr
clusters with different solr versions: 4.10.2 and 4.3.1.
   
For comparable index sizes of a shard (20G and 26G), we observed 9G
 vs
   5.6G
RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
   
We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
reindexed data from scratch. The commits are all controlled on the
   client,
i.e. not auto-commits.
   
Solr: 4.10.2 (high load, mass indexing)
Java: 1.7.0_76 (Oracle)
-Xmx25600m
   
   
Solr: 4.3.1 (normal load, no mass indexing)
Java: 1.7.0_11 (Oracle)
-Xmx25600m
   
The RAM consumption remained the same after the load has stopped on
 the
4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved
 RAM as
seen by top remained at 9G level.
   
This unusual spike happened during mass data indexing.
   
What else could be the artifact of such a difference -- Solr or JVM?
 Can
   it
only be explained by the mass indexing? What is worrisome is that the
4.10.2 shard reserves 8x times it uses.
   
What can be done about this?
   
--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
   
  
 
 
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info
 




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
Here is an example to illustrate what I mean...

- query q=text:(life AND 
hope)spellcheck.count=10spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has life in it
- also suppose zero documents in your dictionary field have hope in them
- The spellchecker will try to return you up to 10 suggestions for hope, but 
only up to 5 suggestions for life
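That per-term cap can be sketched like this (a toy model, not Solr code; the dictionary contents and parameter defaults are made up for illustration):

```python
def max_suggestions(term, dictionary, count=10, alternative_term_count=5):
    """Upper bound on suggestions returned for one query term.

    Terms absent from the index/dictionary are capped by spellcheck.count;
    terms already present are capped by spellcheck.alternativeTermCount.
    """
    return alternative_term_count if term in dictionary else count

dictionary = {"life"}  # "life" occurs in the dictionary field; "hope" does not

print(max_suggestions("hope", dictionary))  # capped at 10 suggestions
print(max_suggestions("life", dictionary))  # capped at 5 suggestions
```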

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James,
How can you say that count doesn't use the
index/dictionary? If it doesn't, then where do the suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
 the following section, for details.

 Briefly, count is the # of suggestions it will return for terms that are
 *not* in your index/dictionary.  alternativeTermCount are the # of
 alternatives you want returned for terms that *are* in your dictionary.
 You can set them to the same value, unless you want fewer suggestions when
 the terms is in the dictionary.

 James Dyer
 Ingram Content Group

 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 5:27 AM
 To: solr-user@lucene.apache.org
 Subject: spellcheck.count v/s spellcheck.alternativeTermCount

 Hello Everyone,
   I got confusion between spellcheck.count and
 spellcheck.alternativeTermCount in Solr. Any help in details?



Re: Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Mikhail Khludnev
Sankalp,
would you mind posting the debugQuery=on output? Without it, it's hard to
tell what the problem is.

However, it's worth mentioning that Andrey's suggestion seems really
promising.
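The mismatch described below can be modelled outside Solr. This is a toy sketch (not Solr code) of child-level versus parent-level negation; the alternative query form in the comment is an untested sketch, not something verified in this thread:

```python
# Toy model of why `{!parent which=userid:*}*:* -address:city1` still
# returns userid=1: the negation applies per *child*, so any non-city1
# child qualifies its parent. The wanted result needs the negation at the
# parent level, e.g. something like
#   q=*:* -{!parent which="userid:*" v="address:city1"}
# (untested sketch; the syntax may need escaping/local-params tweaks).
users = {"1": ["city1", "city2"], "2": ["city2", "city3"]}

def child_level_negation(users, city):
    # parents having at least one child whose address is NOT `city`
    return sorted(u for u, addrs in users.items()
                  if any(a != city for a in addrs))

def parent_level_negation(users, city):
    # parents having NO child whose address is `city`
    return sorted(u for u, addrs in users.items()
                  if all(a != city for a in addrs))

print(child_level_negation(users, "city1"))   # ['1', '2']  (too many)
print(parent_level_negation(users, "city1"))  # ['2']       (desired)
```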


On Tue, Feb 17, 2015 at 8:19 PM, Sankalp Gupta sankalp.gu...@snapdeal.com
wrote:

 Hi Mikhail,

 It won't solve my problem.
 For ex:
 Suppose my docs are like this:
 doc
 field name=userid1/field
 doc
field name=addresscity1/field
 /doc
 doc
field name=addresscity2/field
 /doc
 /doc

 doc
 field name=userid2/field
 doc
field name=addresscity2/field
 /doc
 doc
field name=addresscity3/field
 /doc
 /doc

 Now if I want *a query to return all the users not having any address*
 related to *city1* (i.e. only userid=2 should be in the result), and I
 query:
 *q={!parent which=userid:*}*:* -address:city1*
 this will return *two results, i.e. userid=2 and userid=1* (as userid=1
 also has a child whose address is city2), while the *desired output was
 userid=2 only.*

 On Tue, Feb 17, 2015 at 8:12 PM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

  try to search all children remove those who has a value1 by dash, then
 join
  remaining
  q={!parent which=contentType:parent}contentType:child -contentType:value1
  if the space in underneath query causes the problem try to escape it or
  wrap to v=$subq
 
 
 
  On Tue, Feb 17, 2015 at 4:13 PM, Sankalp Gupta 
 sankalp.gu...@snapdeal.com
  
  wrote:
 
   Hi
  
   I need to have a query in which I need to choose only those parent docs
   none of whose children's field is having the specified value.
   i.e. I need something like this:
   http://localhost:8983/solr/core1/select?*q={!parent
   which=contentType:parent}childField:NOT value1*
  
   The problem is* NOT operator is not being supported* in the Block Join
   Query Parsers. Could anyone please suggest a way to workaround this
   problem?
   Have also added the problem on *stackoverflow*:
  
  
 
 http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature
  
   Regards
   Sankalp Gupta
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
  mkhlud...@griddynamics.com
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hey James Dyer,
             Sorry for responding late; I was out for a couple of days. I
have tried the configuration Rajesh Hazari pasted in his mail, and it seems
to be working. I believe it works because reducing *str
name=spellcheck.count25/str* to *str name=spellcheck.count5/str*
produces fewer collations, so spellcheck.maxCollationTries is able to
evaluate the collation gone with the wind.
But the problem now is that the hit count for gone with the wind is low
(only 53) *{Look collations.png}*, while there are 394 hits for gone with
the wind if I put the correct phrase in the q param; I get numFound=394 in
the response. *{Look response.png}*
Any idea?


On Fri, Feb 13, 2015 at 11:31 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 Nitin,

 Can you post the full spellcheck response when you query:

 q=gram_ci:gone wthh thes wintwt=jsonindent=trueshards.qt=/spell

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Friday, February 13, 2015 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi James Dyer,
   I did the same as you told me. Used
 WordBreakSolrSpellChecker instead of shingles. But still collations are not
 coming or working.
 For instance, I tried to get collation of gone with the wind by searching
 gone wthh thes wint on field=gram_ci but didn't succeed. Even, I am
 getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*.
 Also I have documents which contains gone with the wind having 167 times
 in the documents. I don't know that I am missing something or not.
 Please check my below solr configuration:

 *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes
 wintwt=jsonindent=trueshards.qt=/spell

 *solrconfig.xml:*

 searchComponent name=spellcheck class=solr.SpellCheckComponent
 str name=queryAnalyzerFieldTypetextSpellCi/str
 lst name=spellchecker
   str name=namedefault/str
   str name=fieldgram_ci/str
   str name=classnamesolr.DirectSolrSpellChecker/str
   str name=distanceMeasureinternal/str
   float name=accuracy0.5/float
   int name=maxEdits2/int
   int name=minPrefix0/int
   int name=maxInspections5/int
   int name=minQueryLength2/int
   float name=maxQueryFrequency0.9/float
   str name=comparatorClassfreq/str
 /lst
 lst name=spellchecker
   str name=namewordbreak/str
   str name=classnamesolr.WordBreakSolrSpellChecker/str
   str name=fieldgram/str
   str name=combineWordstrue/str
   str name=breakWordstrue/str
   int name=maxChanges5/int
 /lst
 /searchComponent

 requestHandler name=/spell class=solr.SearchHandler startup=lazy
 lst name=defaults
   str name=dfgram_ci/str
   str name=spellcheck.dictionarydefault/str
   str name=spellcheckon/str
   str name=spellcheck.extendedResultstrue/str
   str name=spellcheck.count25/str
   str name=spellcheck.onlyMorePopulartrue/str
   str name=spellcheck.maxResultsForSuggest1/str
   str name=spellcheck.alternativeTermCount25/str
   str name=spellcheck.collatetrue/str
   str name=spellcheck.maxCollations50/str
   str name=spellcheck.maxCollationTries50/str
   str name=spellcheck.collateExtendedResultstrue/str
 /lst
 arr name=last-components
   strspellcheck/str
 /arr
   /requestHandler

 *Schema.xml: *

 field name=gram_ci type=textSpellCi indexed=true stored=true
 multiValued=false/

 /fieldTypefieldType name=textSpellCi class=solr.TextField
 positionIncrementGap=100
analyzer type=index
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 /analyzer
 analyzer type=query
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 /analyzer
 /fieldType



Re: Release date for Solr 5

2015-02-17 Thread CKReddy Bhimavarapu
Hi,
 Can I get any developer version to test and run for now?

On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 There's a vote going on for the 3rd release candidate of Solr / Lucene 5.0.
 If everything goes smooth and the vote passes, the release should happen in
 about 4-5 days.

 On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu chaitu...@gmail.com
 
 wrote:

  What is the anticipated release date for Solr 5?
 
  --
  ckreddybh. chaitu...@gmail.com
 



 --
 Anshum Gupta
 http://about.me/anshumgupta




-- 
ckreddybh. chaitu...@gmail.com


Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hey Rajesh,
             Sorry for responding late; I was out for a couple of days. I
have tried the configuration you sent me; thanks a lot, it seems to be
working. I believe it works because reducing *str
name=spellcheck.count25/str* to *str name=spellcheck.count5/str*
produces fewer collations, so spellcheck.maxCollationTries is able to
evaluate the collation gone with the wind.
But the problem now is that the hit count for gone with the wind is low
(only 53) *{Look collations.png}*, while there are 394 hits for gone with
the wind if I put the correct phrase in the q param; I get numFound=394 in
the response. *{Look response.png}*
Any idea?

One more thing: you used
str name=spellcheck.collateParam.mm100%/str
str name=spellcheck.collateParam.q.opAND/str
but they don't seem to have any effect. I tried removing those two lines
and the results were unchanged; changing spellcheck.collateParam.mm to 0%
and spellcheck.collateParam.q.op to OR made no difference either. Even
after googling, I can't understand what spellcheck.collateParam.mm and
spellcheck.collateParam.q.op do. Will you please assist me?
Thanks.
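For what it's worth, the spellcheck.collateParam.* values are substituted into the query only while Solr test-runs each collation to count its hits. A toy model (not Solr code) of how collateParam.q.op=AND versus OR changes a collation's hit count:

```python
# Toy model: with collateParam.q.op=AND (or mm=100%) a collation counts a
# document as a hit only when every corrected term matches; with OR, any
# single matching term suffices, which inflates the hit count.
docs = [
    {"gone", "with", "the", "wind"},   # the phrase the collation targets
    {"gone", "fishing"},               # shares only one term
]

def collation_hits(terms, docs, op="AND"):
    match = all if op == "AND" else any
    return sum(1 for d in docs if match(t in d for t in terms))

terms = ["gone", "with", "the", "wind"]
print(collation_hits(terms, docs, op="AND"))  # 1
print(collation_hits(terms, docs, op="OR"))   # 2
```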



On Sat, Feb 14, 2015 at 2:18 AM, Rajesh Hazari rajeshhaz...@gmail.com
wrote:

 Hi Nitin,

 Can u try with the below config, we have these config seems to be working
 for us.

 searchComponent name=spellcheck class=solr.SpellCheckComponent

  str name=queryAnalyzerFieldTypetext_general/str


   lst name=spellchecker
 str name=namewordbreak/str
 str name=classnamesolr.WordBreakSolrSpellChecker/str
 str name=fieldtextSpell/str
 str name=combineWordstrue/str
 str name=breakWordsfalse/str
 int name=maxChanges5/int
   /lst

lst name=spellchecker
 str name=namedefault/str
 str name=fieldtextSpell/str
 str name=classnamesolr.IndexBasedSpellChecker/str
 str name=spellcheckIndexDir./spellchecker/str
 str name=accuracy0.75/str
 float name=thresholdTokenFrequency0.01/float
 str name=buildOnCommittrue/str
 str name=spellcheck.maxResultsForSuggest5/str
  /lst


   /searchComponent



 str name=spellchecktrue/str
 str name=spellcheck.dictionarydefault/str
 str name=spellcheck.dictionarywordbreak/str
 int name=spellcheck.count5/int
 str name=spellcheck.alternativeTermCount15/str
 str name=spellcheck.collatetrue/str
 str name=spellcheck.onlyMorePopularfalse/str
 str name=spellcheck.extendedResultstrue/str
 str name =spellcheck.maxCollations100/str
 str name=spellcheck.collateParam.mm100%/str
 str name=spellcheck.collateParam.q.opAND/str
 str name=spellcheck.maxCollationTries1000/str


 *Rajesh.*

 On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com
 
 wrote:

  Nitin,
 
  Can you post the full spellcheck response when you query:
 
  q=gram_ci:gone wthh thes wintwt=jsonindent=trueshards.qt=/spell
 
  James Dyer
  Ingram Content Group
 
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Friday, February 13, 2015 1:05 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Collations are not working fine.
 
  Hi James Dyer,
I did the same as you told me. Used
  WordBreakSolrSpellChecker instead of shingles. But still collations are
 not
  coming or working.
  For instance, I tried to get collation of gone with the wind by
 searching
  gone wthh thes wint on field=gram_ci but didn't succeed. Even, I am
  getting the suggestions of wtth as *with*, thes as *the*, wint as *wind*.
  Also I have documents which contains gone with the wind having 167
 times
  in the documents. I don't know that I am missing something or not.
  Please check my below solr configuration:
 
  *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes
  wintwt=jsonindent=trueshards.qt=/spell
 
  *solrconfig.xml:*
 
  searchComponent name=spellcheck class=solr.SpellCheckComponent
  str name=queryAnalyzerFieldTypetextSpellCi/str
  lst name=spellchecker
str name=namedefault/str
str name=fieldgram_ci/str
str name=classnamesolr.DirectSolrSpellChecker/str
str name=distanceMeasureinternal/str
float name=accuracy0.5/float
int name=maxEdits2/int
int name=minPrefix0/int
int name=maxInspections5/int
int name=minQueryLength2/int
float name=maxQueryFrequency0.9/float
str name=comparatorClassfreq/str
  /lst
  lst name=spellchecker
str name=namewordbreak/str
str name=classnamesolr.WordBreakSolrSpellChecker/str
str name=fieldgram/str
str name=combineWordstrue/str
str name=breakWordstrue/str
int name=maxChanges5/int
  /lst
  /searchComponent
 
  requestHandler name=/spell class=solr.SearchHandler startup=lazy
  lst name=defaults
str name=dfgram_ci/str
str name=spellcheck.dictionarydefault/str
str name=spellcheckon/str
str 

Re: Release date for Solr 5

2015-02-17 Thread Anshum Gupta
You can either checkout the release branch and build it yourself from:
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_0

or download it from the RC here:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987

You should remember that this is a release candidate and not a release at
this point.


On Tue, Feb 17, 2015 at 12:13 AM, CKReddy Bhimavarapu chaitu...@gmail.com
wrote:

 Hi,
  Can i get any developer version to test and run for now.

 On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net
 wrote:

  There's a vote going on for the 3rd release candidate of Solr / Lucene
 5.0.
  If everything goes smooth and the vote passes, the release should happen
 in
  about 4-5 days.
 
  On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu 
 chaitu...@gmail.com
  
  wrote:
 
   What is the anticipated release date for Solr 5?
  
   --
   ckreddybh. chaitu...@gmail.com
  
 
 
 
  --
  Anshum Gupta
  http://about.me/anshumgupta
 



 --
 ckreddybh. chaitu...@gmail.com




-- 
Anshum Gupta
http://about.me/anshumgupta


Sort collation on hits.

2015-02-17 Thread Nitin Solanki
Hi,
All I want is to sort the collations on hits in descending order. How do I
do that?


Re: Solr suggest is related to second letter, not to initial letter

2015-02-17 Thread Volkan Altan
First of all thank you for your answer.

Example Url:
doc 1 suggest_field: galaxy samsung s5 phone
doc 2 suggest_field: shoe adidas 2 hiking 


http://localhost:8983/solr/solr/suggest?q=galaxy+s

The result I am expecting is the one indicated below. The galaxy shoe
suggestion isn't supposed to appear, but unfortunately it does now.


lst name=collation
str name=collationQuerygalaxy samsung/str
int name=hits0/int
lst name=misspellingsAndCorrections
str name=galaxygalaxy/str
str name=samsungsamsung/str
/lst
/lst
lst name=collation
str name=collationQuerygalaxy s5/str
int name=hits0/int
lst name=misspellingsAndCorrections
str name=galaxygalaxy/str
str name=s5s5/str
/lst
/lst


I don't want to use KeywordTokenizer, because as long as the compound words
written by the user are available in any document, I am able to get a result.
I just don't want q=galaxy + samsung to appear, because it is an
inappropriate suggestion and it doesn't work.

Many Thanks Ahead of Time!


My settings;

searchComponent class=solr.SpellCheckComponent name=suggest
lst name=spellchecker
str name=namedefault/str
str 
name=classnameorg.apache.solr.spelling.suggest.Suggester/str
str 
name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookup/str  
  
str name=fieldsuggestions/str 
float name=threshold0.1/float
str name=buildOnCommittrue/str
/lst
str name=queryAnalyzerFieldTypesuggest_term/str
/searchComponent
!-- auto-complete --
requestHandler name=/suggest class=solr.SearchHandler
lst name=defaults
str name=spellchecktrue/str
str name=spellcheck.buildfalse/str
str name=spellcheck.dictionarydefault/str
str name=spellcheck.onlyMorePopulartrue/str
str name=spellcheck.count10/str
str name=spellcheck.collatetrue/str
str name=spellcheck.collateExtendedResultstrue/str
str name=spellcheck.maxCollations10/str
str name=spellcheck.maxCollationTries100/str
/lst
arr name=components
strsuggest/str
/arr
 /requestHandler


fieldType name=suggest_term class=solr.TextField 
positionIncrementGap=100
analyzer type=index
charFilter class=solr.MappingCharFilterFactory 
mapping=mapping-PunctuationToSpace.txt/
tokenizer class=solr.KeywordTokenizerFactory/
filter class=solr.TrimFilterFactory/
filter class=solr.LowerCaseFilterFactory /
filter class=solr.TurkishLowerCaseFilterFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.RemoveDuplicatesTokenFilterFactory/
filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt enablePositionIncrements=true /
/analyzer
analyzer type=query
charFilter class=solr.MappingCharFilterFactory 
mapping=mapping-PunctuationToSpace.txt/
tokenizer class=solr.KeywordTokenizerFactory/
filter class=solr.TrimFilterFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.ApostropheFilterFactory/
filter class=solr.TurkishLowerCaseFilterFactory/

filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt enablePositionIncrements=true/
/analyzer
/fieldType
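A shingle-based variant of the suggestion field (the shingle approach mentioned elsewhere in this thread) could look roughly like the sketch below; the type name and shingle sizes are assumptions for illustration, not settings from this thread:

```xml
<!-- Sketch only: index-time shingles so multi-word suggestions come from
     adjacent words in the source text. Type name and shingle sizes are
     illustrative. -->
<fieldType name="suggest_shingle" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```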


 On 16 Feb 2015, at 03:52, Michael Sokolov msoko...@safaribooksonline.com 
 wrote:
 
 StandardTokenizer splits your text into tokens, and the suggester suggests 
 tokens independently.  It sounds as if you want the suggestions to be based 
 on the entire text (not just the current word), and that only adjacent words 
 in the original should appear as suggestions.  Assuming that's what you are 
 after (it's a little hard to tell from your e-mail -- you might want to 
 clarify by providing a few example of how you *do* want it to work instead of 
 just examples of how you *don't* want it to work), you have a couple of 
 choices:
 
 1) don't use StandardTokenizer, use KeywordTokenizer instead - this will 
 preserve the entire original text and suggest complete texts, rather than 
 words
 2) maybe consider using a shingle filter along with standard tokenizer, so 
 that your tokens include multi-word shingles
 3) Use a suggester with better support for a statistical language model, like 
 this one: 
 http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html,
  but to do this you will probably need to do some java programming since it 
 isn't well integrated into solr
 
 -Mike
 
 On 2/14/2015 3:44 AM, Volkan Altan wrote:
 Any idea?
 
 
 On 12 Feb 2015, at 11:12, Volkan Altan volkanal...@gmail.com wrote:
 
 Hello Everyone,
 
 All I want to do with Solr suggester is obtaining the 

Re: Collations are not working fine.

2015-02-17 Thread Nitin Solanki
Hi Charles,
 Will you please send the configuration you tried? It would help solve my
problem. Have you sorted the collations on hits or on suggestion
frequencies? If you did, please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles 
charles.reit...@tiaa-cref.org wrote:

 I have been working with collations the last couple days and I kept adding
 the collation-related parameters until it started working for me.   It
 seems I needed str name=spellcheck.collateMaxCollectDocs50/str.

 But, I am using the Suggester with the WFSTLookupFactory.

 Also, I needed to patch the suggester to get frequency information in the
 spellcheck response.

 -Original Message-
 From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
 Sent: Friday, February 13, 2015 3:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi Nitin,

 Can u try with the below config, we have these config seems to be working
 for us.

 searchComponent name=spellcheck class=solr.SpellCheckComponent

  str name=queryAnalyzerFieldTypetext_general/str


   lst name=spellchecker
 str name=namewordbreak/str
 str name=classnamesolr.WordBreakSolrSpellChecker/str
 str name=fieldtextSpell/str
 str name=combineWordstrue/str
 str name=breakWordsfalse/str
 int name=maxChanges5/int
   /lst

lst name=spellchecker
 str name=namedefault/str
 str name=fieldtextSpell/str
 str name=classnamesolr.IndexBasedSpellChecker/str
 str name=spellcheckIndexDir./spellchecker/str
 str name=accuracy0.75/str
 float name=thresholdTokenFrequency0.01/float
 str name=buildOnCommittrue/str
 str name=spellcheck.maxResultsForSuggest5/str
  /lst


   /searchComponent



 str name=spellchecktrue/str
 str name=spellcheck.dictionarydefault/str
 str name=spellcheck.dictionarywordbreak/str
 int name=spellcheck.count5/int
 str name=spellcheck.alternativeTermCount15/str
 str name=spellcheck.collatetrue/str
 str name=spellcheck.onlyMorePopularfalse/str
 str name=spellcheck.extendedResultstrue/str
 str name =spellcheck.maxCollations100/str
 str name=spellcheck.collateParam.mm100%/str
 str name=spellcheck.collateParam.q.opAND/str
 str name=spellcheck.maxCollationTries1000/str


 *Rajesh.*

 On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com
 
 wrote:

  Nitin,
 
  Can you post the full spellcheck response when you query:
 
  q=gram_ci:gone wthh thes wintwt=jsonindent=trueshards.qt=/spell
 
  James Dyer
  Ingram Content Group
 
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Friday, February 13, 2015 1:05 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Collations are not working fine.
 
  Hi James Dyer,
I did the same as you told me. Used
  WordBreakSolrSpellChecker instead of shingles. But still collations
  are not coming or working.
  For instance, I tried to get collation of gone with the wind by
  searching gone wthh thes wint on field=gram_ci but didn't succeed.
  Even, I am getting the suggestions of wtth as *with*, thes as *the*,
 wint as *wind*.
  Also I have documents which contains gone with the wind having 167
  times in the documents. I don't know that I am missing something or not.
  Please check my below solr configuration:
 
  *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes
  wintwt=jsonindent=trueshards.qt=/spell
 
  *solrconfig.xml:*
 
  searchComponent name=spellcheck class=solr.SpellCheckComponent
  str name=queryAnalyzerFieldTypetextSpellCi/str
  lst name=spellchecker
str name=namedefault/str
str name=fieldgram_ci/str
str name=classnamesolr.DirectSolrSpellChecker/str
str name=distanceMeasureinternal/str
float name=accuracy0.5/float
int name=maxEdits2/int
int name=minPrefix0/int
int name=maxInspections5/int
int name=minQueryLength2/int
float name=maxQueryFrequency0.9/float
str name=comparatorClassfreq/str
  /lst
  lst name=spellchecker
str name=namewordbreak/str
str name=classnamesolr.WordBreakSolrSpellChecker/str
str name=fieldgram/str
str name=combineWordstrue/str
str name=breakWordstrue/str
int name=maxChanges5/int
  /lst
  /searchComponent
 
  requestHandler name=/spell class=solr.SearchHandler startup=lazy
  lst name=defaults
str name=dfgram_ci/str
str name=spellcheck.dictionarydefault/str
str name=spellcheckon/str
str name=spellcheck.extendedResultstrue/str
str name=spellcheck.count25/str
str name=spellcheck.onlyMorePopulartrue/str
str name=spellcheck.maxResultsForSuggest1/str
str name=spellcheck.alternativeTermCount25/str
str name=spellcheck.collatetrue/str
str name=spellcheck.maxCollations50/str
str name=spellcheck.maxCollationTries50/str
str name=spellcheck.collateExtendedResultstrue/str
  /lst
  arr 

Re: Release date for Solr 5

2015-02-17 Thread Shalin Shekhar Mangar
You can help by testing out the release candidate available from:
http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC3-rev1659987

Note that this is *NOT* an official release.

On Tue, Feb 17, 2015 at 1:43 PM, CKReddy Bhimavarapu chaitu...@gmail.com
wrote:

 Hi,
  Can i get any developer version to test and run for now.

 On Tue, Feb 17, 2015 at 12:45 PM, Anshum Gupta ans...@anshumgupta.net
 wrote:

  There's a vote going on for the 3rd release candidate of Solr / Lucene
 5.0.
  If everything goes smooth and the vote passes, the release should happen
 in
  about 4-5 days.
 
  On Mon, Feb 16, 2015 at 10:09 PM, CKReddy Bhimavarapu 
 chaitu...@gmail.com
  
  wrote:
 
   What is the anticipated release date for Solr 5?
  
   --
   ckreddybh. chaitu...@gmail.com
  
 
 
 
  --
  Anshum Gupta
  http://about.me/anshumgupta
 



 --
 ckreddybh. chaitu...@gmail.com




-- 
Regards,
Shalin Shekhar Mangar.


Re: Weird Solr Replication Slave out of sync

2015-02-17 Thread Dmitry Kan
Hi,
This sounds quite strange. Do you see any error messages either in the solr
admin's replication page or in the master's OR slave's logs?

When we had issues with a slave replicating from the master, they were
related to the slave running out of disk space. I'm sure there could be a
bunch of other reasons for failed replication, but those should generally
be evident in the logs.
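As a first check, the replication handler can report both sides' index state directly. A sketch, where the host names, core name, and handler path are assumptions based on a stock setup:

```shell
# Compare versions/generations and replication details on both nodes.
curl "http://master-host:8983/solr/core1/replication?command=indexversion&wt=json"
curl "http://slave-host:8983/solr/core1/replication?command=details&wt=json"
```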

On Tue, Feb 17, 2015 at 7:46 AM, Summer Shire shiresum...@gmail.com wrote:

 Hi All,

 My master and slave index version and generation is the same
 yet the index is not in sync because when I execute the same query
 on both master and slave I see old docs on slave which should not be there.

 I also tried to fetch a specific indexversion on slave using
 command=fetchindexindexversion=latestMasterVersion

 This is very spooky because I do not get any errors on master or slave.
 Also, I see in the logs that the slave is polling the master every 15 mins.
 I was able to find this issue only because I was looking at the specific
 old document.

 Now I can manually delete the index folder on the slave and restart it.
 But I really want to find out what could be going on, because these types
 of issues are going to be hard to find, especially when there are no errors.

 What could be happening, and how can I avoid it from happening again?


 Thanks,
 Summer




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
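As a first diagnostic, both nodes can be asked what they actually report via
the ReplicationHandler's command=indexversion; the sketch below only builds
the request URLs (the host and core names are example values, not taken from
this thread). Matching versions combined with diverging query results would
point at a stale searcher rather than a failed replication.

```python
from urllib.parse import urlencode

def indexversion_url(host, core):
    # ReplicationHandler reports the version/generation of the
    # index that is currently searchable on that node.
    qs = urlencode({"command": "indexversion", "wt": "json"})
    return "http://%s:8983/solr/%s/replication?%s" % (host, core, qs)

# Compare the JSON responses of these two calls on master and slave.
print(indexversion_url("master-host", "collection1"))
print(indexversion_url("slave-host", "collection1"))
```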


Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread O. Olson
At this time the latest released version of Solr is 4.10.3. Is there any way
we can get the source code for this release version?

I tried to check out the Solr code from
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In the
commit log, I see a number of revisions, but nothing mentions which one is the
release version. The latest revision is 1657441, from Feb 4. Does this
correspond to 4.10.3? If not, how do I go about getting the source code
of 4.10.3?

I'm also curious where the version number is embedded, i.e. is it in a file
somewhere?

I want to ensure I am using the released version, and not one with bug fixes
made after the version was released.

Thank you in anticipation.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Checkout-the-source-Code-to-the-Release-Version-of-Solr-tp4187041.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Collations are not working fine.

2015-02-17 Thread Reitzel, Charles
Hi Nitin,

I was trying many different options for a couple of different queries. In fact,
I have collations working OK now with the Suggester and WFSTLookup. The
problem may have been due to a different dictionary and/or lookup
implementation and the specific options I was sending.

In general, we're using spellcheck for search suggestions. The Suggester
component (vs. the Suggester spellcheck implementation) doesn't handle all of
our cases, but we can get things working using the spellcheck interface. What
gives us particular trouble are the cases where a term may be valid by itself
but also be the start of longer words.

The specific terms are acronyms specific to our business.   But I'll attempt to 
show generic examples.

E.g. a partial term like fo can expand to fox, fog, etc. and a full term like 
brown can also expand to something like brownstone.   And, yes, the collation 
brownstone fox is nonsense.  But assume, for the sake of argument, it appears 
in our documents somewhere.

For a multiple-term query with a spelling error (or partially typed term): brown
fo

We get collations in order of hits, descending like ...
brown fox,
brown fog,
brownstone fox.

So far, so good.  

For a single-term query, brown, we get a single suggestion, brownstone, and no
collations.

So, we don't know whether to keep the term brown!

At this point, we need spellcheck.extendedResults=true and to look at the
origFreq value in the suggested corrections. Unfortunately, the Suggester
(spellcheck dictionary) does not populate the original frequency information.
And, without this information, the SpellCheckComponent cannot format the
extended results.

However, with a simple change to Suggester.java, it was easy to get the needed
frequency information and use it to make a sound decision to keep or drop the
input term. But I'd be much obliged if there is a better way to go about it.

Configs below.

Thanks,
Charlie

<!-- SpellCheck component -->
  <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
    <lst name="spellchecker">
      <str name="name">suggestDictionary</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
      <str name="field">text_all</str>
      <float name="threshold">0.0001</float>
      <str name="exactMatchFirst">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search Suggestions (spellcheck)</str>
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="rows">0</str>
    <str name="defType">edismax</str>
    <str name="df">text_all</str>
    <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>suggestSC</str>
  </arr>
</requestHandler>

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,
             Will you please send the configuration that you tried? It
will help solve my problem. Have you sorted the collations on hits or
frequencies of suggestions? If you did, then please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles  
charles.reit...@tiaa-cref.org wrote:

 I have been working with collations the last couple of days and I kept adding
 the collation-related parameters until it started working for me. It seems I
 needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

 But, I am using the Suggester with the WFSTLookupFactory.

 Also, I needed to patch the suggester to get frequency information in 
 the spellcheck response.

 -Original Message-
 From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
 Sent: Friday, February 13, 2015 3:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Collations are not working fine.

 Hi Nitin,

 Can you try the config below? We have this config and it seems to be
 working for us.

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">text_general</str>

   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">textSpell</str>
     <str name="combineWords">true</str>
     <str name="breakWords">false</str>
     <int name="maxChanges">5</int>
   </lst>

   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">textSpell</str>
     <str name="classname">solr.IndexBasedSpellChecker</str>
     <str name="spellcheckIndexDir">./spellchecker</str>
     <str name="accuracy">0.75</str>
 float 

Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Hrishikesh Gadre
Hi,

You can get the released code base here

https://github.com/apache/lucene-solr/releases

Thanks
Hrishikesh

On Tue, Feb 17, 2015 at 2:20 PM, O. Olson olson_...@yahoo.it wrote:

 At this time the latest released version of Solr is 4.10.3. Is there anyway
 we can get the source code for this release version?

 I tried to checkout the Solr code from
 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In
 the
 commit log, I see a number of revisions but nothing mention which is the
 release version. The latest revision being 1657441 on Feb 4. Does this
 correspond to 4.10.3? If no, then how do I go about getting the source code
 of 4.10.3.

 I'm also curious where the version number is embedded i.e. is it in a file
 somewhere?

 I want to ensure I am using the released version, and not some bug fixes
 after the version got released.

 Thank you in anticipation.







Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Hrishikesh Gadre
Also the version number is encoded (at least) in the build file

https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32

Hope this helps.

Thanks
Hrishikesh

On Tue, Feb 17, 2015 at 2:25 PM, Hrishikesh Gadre gadre.s...@gmail.com
wrote:

 Hi,

 You can get the released code base here

 https://github.com/apache/lucene-solr/releases

 Thanks
 Hrishikesh

 On Tue, Feb 17, 2015 at 2:20 PM, O. Olson olson_...@yahoo.it wrote:

 At this time the latest released version of Solr is 4.10.3. Is there
 anyway
 we can get the source code for this release version?

 I tried to checkout the Solr code from
 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_10/ In
 the
 commit log, I see a number of revisions but nothing mention which is the
 release version. The latest revision being 1657441 on Feb 4. Does this
 correspond to 4.10.3? If no, then how do I go about getting the source
 code
 of 4.10.3.

 I'm also curious where the version number is embedded i.e. is it in a file
 somewhere?

 I want to ensure I am using the released version, and not some bug fixes
 after the version got released.

 Thank you in anticipation.









Better way of copying/backup of index in Solr 4.10.2

2015-02-17 Thread dinesh naik
What is the best way to copy/back up the index in Solr 4.10.2?
-- 
Best Regards,
Dinesh Naik


unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Hi,

We are currently comparing the RAM consumption of two parallel Solr
clusters with different solr versions: 4.10.2 and 4.3.1.

For comparable index sizes of a shard (20G and 26G), we observed 9G vs 5.6G
RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.

We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
reindexed data from scratch. The commits are all controlled on the client,
i.e. not auto-commits.

Solr: 4.10.2 (high load, mass indexing)
Java: 1.7.0_76 (Oracle)
-Xmx25600m


Solr: 4.3.1 (normal load, no mass indexing)
Java: 1.7.0_11 (Oracle)
-Xmx25600m

The RAM consumption remained the same after the load had stopped on the
4.10.2 cluster. Manually triggering a garbage collection on a 4.10.2 shard via
jvisualvm dropped the used heap from 8.5G to 0.5G, but the reserved RAM as
seen by top remained at the 9G level.

This unusual spike happened during mass data indexing.

What else could explain such a difference -- Solr or the JVM? Can it
only be explained by the mass indexing? What is worrisome is that the
4.10.2 shard reserves 8x what it uses.

What can be done about this?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Have you found an explanation to that?

On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma markus.jel...@openindex.io
wrote:

 We have seen an increase between 4.8.1 and 4.10.

 -Original message-
  From:Dmitry Kan solrexp...@gmail.com
  Sent: Tuesday 17th February 2015 11:06
  To: solr-user@lucene.apache.org
  Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
 
  Hi,
 
  We are currently comparing the RAM consumption of two parallel Solr
  clusters with different solr versions: 4.10.2 and 4.3.1.
 
  For comparable index sizes of a shard (20G and 26G), we observed 9G vs
 5.6G
  RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
 
  We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
  reindexed data from scratch. The commits are all controlled on the
 client,
  i.e. not auto-commits.
 
  Solr: 4.10.2 (high load, mass indexing)
  Java: 1.7.0_76 (Oracle)
  -Xmx25600m
 
 
  Solr: 4.3.1 (normal load, no mass indexing)
  Java: 1.7.0_11 (Oracle)
  -Xmx25600m
 
  The RAM consumption remained the same after the load has stopped on the
  4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
  jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
  seen by top remained at 9G level.
 
  This unusual spike happened during mass data indexing.
 
  What else could be the artifact of such a difference -- Solr or JVM? Can
 it
  only be explained by the mass indexing? What is worrisome is that the
  4.10.2 shard reserves 8x times it uses.
 
  What can be done about this?
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info
 




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Possibility of Indexing without feeding again in Solr 4.10.2

2015-02-17 Thread Gora Mohanty
On 17 February 2015 at 15:18, dinesh naik dineshkumarn...@gmail.com wrote:

 Hi all,
 How to can do re-indexing in Solr without importing the data again?
 Is there a way to do re-indexing only for few documents ?


If you have a unique ID for your documents, updating the index with that ID
will update just that document. Other than that, you need to import all
your data again if you want to change the Solr index.

Regards,
Gora
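For example, re-sending a single full document whose unique ID matches an
existing one replaces just that document in the index. A sketch, assuming the
schema's uniqueKey is "id"; the document ID and field names here are
hypothetical:

```python
import json

# Re-posting a full document whose uniqueKey matches an existing one
# overwrites (re-indexes) only that document.
doc = {"id": "doc-123", "title": "Corrected title", "category": "books"}
body = json.dumps([doc])

# POST this body to /solr/<core>/update?commit=true
# with Content-Type: application/json
print(body)
```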


Possibility of Indexing without feeding again in Solr 4.10.2

2015-02-17 Thread dinesh naik
Hi all,
How can I re-index in Solr without importing the data again?
Is there a way to re-index only a few documents?
-- 
Best Regards,
Dinesh Naik


Re: Better way of copying/backup of index in Solr 4.10.2

2015-02-17 Thread Gora Mohanty
On 17 February 2015 at 15:19, dinesh naik dineshkumarn...@gmail.com wrote:

 What is the best way for copying/backup of index in Solr 4.10.2?

Please take a look at
https://cwiki.apache.org/confluence/display/solr/Backing+Up

Regards,
Gora
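For reference, the page above covers the ReplicationHandler's backup command.
A sketch of building that call; the core name, location, and numberToKeep
values below are example assumptions, not taken from this thread:

```python
from urllib.parse import urlencode

def backup_url(core, location, number_to_keep=2):
    # command=backup snapshots the index into `location`;
    # numberToKeep prunes older snapshots.
    qs = urlencode({
        "command": "backup",
        "location": location,
        "numberToKeep": number_to_keep,
    })
    return "http://localhost:8983/solr/%s/replication?%s" % (core, qs)

print(backup_url("collection1", "/var/backups/solr"))
```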


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Dmitry Kan
Thanks Toke!

Now I consistently see the saw-tooth pattern on two shards with new GC
parameters, next I will try your suggestion.

The current params are:

-Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
-XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
-XX:CMSInitiatingOccupancyFraction=40

Dmitry

On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
  Solr: 4.10.2 (high load, mass indexing)
  Java: 1.7.0_76 (Oracle)
  -Xmx25600m
 
 
  Solr: 4.3.1 (normal load, no mass indexing)
  Java: 1.7.0_11 (Oracle)
  -Xmx25600m
 
  The RAM consumption remained the same after the load has stopped on the
  4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
  jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
  seen by top remained at 9G level.

 As the JVM does not free OS memory once allocated, top just shows
 whatever peak it reached at some point. When you tell the JVM that it is
 free to use 25GB, it makes a lot of sense to allocate a fair chunk of
 that instead of garbage collecting if there is a period of high usage
 (mass indexing for example).

  What else could be the artifact of such a difference -- Solr or JVM? Can
 it
  only be explained by the mass indexing? What is worrisome is that the
  4.10.2 shard reserves 8x times it uses.

 If you set your Xmx to a lot less, the JVM will probably favour more
 frequent garbage collections over extra heap allocation.

 - Toke Eskildsen, State and University Library, Denmark





-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
I would have shared it if I had one :)
 
-Original message-
 From:Dmitry Kan solrexp...@gmail.com
 Sent: Tuesday 17th February 2015 11:40
 To: solr-user@lucene.apache.org
 Subject: Re: unusually high 4.10.2 vs 4.3.1 RAM consumption
 
 Have you found an explanation to that?
 
 On Tue, Feb 17, 2015 at 12:12 PM, Markus Jelsma markus.jel...@openindex.io
 wrote:
 
  We have seen an increase between 4.8.1 and 4.10.
 
  -Original message-
   From:Dmitry Kan solrexp...@gmail.com
   Sent: Tuesday 17th February 2015 11:06
   To: solr-user@lucene.apache.org
   Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption
  
   Hi,
  
   We are currently comparing the RAM consumption of two parallel Solr
   clusters with different solr versions: 4.10.2 and 4.3.1.
  
   For comparable index sizes of a shard (20G and 26G), we observed 9G vs
  5.6G
   RAM footprint (reserved RAM as seen by top), 4.3.1 being the winner.
  
   We have not changed the solrconfig.xml to upgrade to 4.10.2 and have
   reindexed data from scratch. The commits are all controlled on the
  client,
   i.e. not auto-commits.
  
   Solr: 4.10.2 (high load, mass indexing)
   Java: 1.7.0_76 (Oracle)
   -Xmx25600m
  
  
   Solr: 4.3.1 (normal load, no mass indexing)
   Java: 1.7.0_11 (Oracle)
   -Xmx25600m
  
   The RAM consumption remained the same after the load has stopped on the
   4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
   jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM as
   seen by top remained at 9G level.
  
   This unusual spike happened during mass data indexing.
  
   What else could be the artifact of such a difference -- Solr or JVM? Can
  it
   only be explained by the mass indexing? What is worrisome is that the
   4.10.2 shard reserves 8x times it uses.
  
   What can be done about this?
  
   --
   Dmitry Kan
   Luke Toolbox: http://github.com/DmitryKey/luke
   Blog: http://dmitrykan.blogspot.com
   Twitter: http://twitter.com/dmitrykan
   SemanticAnalyzer: www.semanticanalyzer.info
  
 
 
 
 
 -- 
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info
 


spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Nitin Solanki
Hello Everyone,
          I am confused about the difference between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Could anyone explain in detail?


Block Join Query Parsers regular expression feature workaround req

2015-02-17 Thread Sankalp Gupta
Hi

I need a query that selects only those parent docs
none of whose children's field has the specified value,
i.e. I need something like this:
http://localhost:8983/solr/core1/select?*q={!parent
which=contentType:parent}childField:NOT value1*

The problem is that the *NOT operator is not supported* in the Block Join
Query Parsers. Could anyone please suggest a way to work around this problem?
I have also added the problem on *stackoverflow*:
http://stackoverflow.com/questions/28562355/in-solr-does-block-join-query-parsers-lack-regular-expression-feature

Regards
Sankalp Gupta
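One commonly suggested shape for this (a sketch only, untested against this
schema): select all parent docs, then subtract the parents that DO have a
child matching value1, nesting the block-join clause via the _query_ hook,
since Lucene cannot execute a purely negative clause on its own.

```python
from urllib.parse import urlencode

# Sketch: parents minus parents-with-a-matching-child leaves exactly
# the parents none of whose children have childField:value1.
q = ('+contentType:parent '
     '-_query_:"{!parent which=contentType:parent}childField:value1"')
url = ("http://localhost:8983/solr/core1/select?"
       + urlencode({"q": q, "wt": "json"}))
print(url)
```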