solr cloud index corruption

2013-07-09 Thread Cool Techi
Hi,

We are frequently getting index corruption on the cloud; this never happened in 
our master/slave setup with Solr 3.6. I have tried to check the logs, but don't 
see an exact reason.

I have run the index checker and it recovers, but I am not able to understand 
as to why this is happening. Any pointers would help.

regards,
rohit
  

Re: solr cloud index corruption

2013-07-09 Thread Otis Gospodnetic
Hi,

Maybe you can describe how you are using Solr?  Which version exactly?
 Can you share the errors you are seeing? etc.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jul 9, 2013 at 2:07 AM, Cool Techi cooltec...@outlook.com wrote:
 Hi,

 We are frequently getting index corruption on the cloud; this never happened 
 in our master/slave setup with Solr 3.6. I have tried to check the logs, but 
 don't see an exact reason.

 I have run the index checker and it recovers, but I am not able to understand 
 as to why this is happening. Any pointers would help.

 regards,
 rohit



two types of answers in my query

2013-07-09 Thread Mysurf Mail
Hi,
A general question:


Let's say I have a Car and CarParts 1:n relation.

And I have discovered that the user had entered in the search field, instead
of a car name, a part serial number (SKU).
(I discovered it using a regex.)

Is there a way to fetch different types of answers in Solr?
Is there a way to fetch mixed types in the answers?
Is there something similar to that, and what is that feature called?

Thank you.


Re: two types of answers in my query

2013-07-09 Thread Gora Mohanty
On 9 July 2013 12:08, Mysurf Mail stammail...@gmail.com wrote:
 Hi,
 A general question:


 Let's say I have a Car and CarParts 1:n relation.

 And I have discovered that the user had entered in the search field, instead
 of a car name, a part serial number (SKU).
 (I discovered it using a regex.)

 Is there a way to fetch different types of answers in Solr?
 Is there a way to fetch mixed types in the answers?
 Is there something similar to that, and what is that feature called?

Your description is not clear enough. What do you mean
by "different types of answers", and "mixed types"?

Assuming that you want to have a different query, or multiple
different queries, when you deduce on the front-end that the
user might have entered a part number instead of a name,
you will need to change the query/queries going to Solr, and
collate the results.

Regards,
Gora


dataDir not being stored in solr.xml

2013-07-09 Thread Chris Collins
I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
something like:


http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo

I am able to add data to the index it creates within the /home/solrdata/foo 
directory and search it.  The persisted solr.xml, however, does not contain the 
dataDir path.  When the process is restarted the dataDir is set to /home/solrdata and 
not /home/solrdata/foo.

Now if I create the index, index some docs, stop the process, and manually edit the 
solr.xml to include dataDir, search works.


I am not sure, but it seems that dataDir is not persisted in the following 
method, in a code path that looks like work in progress for Solr 5.0:

CoreContainer.addPersistOneCore


I also played with passing properties in the create args of the form:

property.dataDir=/home/solrdata/foo

That didn't seem to help, but I may not be understanding the exact property 
syntax.

Any clues?

Cheers

C

Re: Solr limitations

2013-07-09 Thread Ramkumar R. Aiyengar
 5. No more than 32 nodes in your SolrCloud cluster.

I hope this isn't too OT, but what tradeoffs is this based on? I would have
thought it easy to hit this number for a big index and high load (given
that both the number of shards and the number of replicas scale
horizontally..)

 6. Don't return more than 250 results on a query.

 None of those is a hard limit, but don't go beyond them unless your Proof
of Concept testing proves that performance is acceptable for your situation.

 Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
tests and then scale as needed.

 Dynamic and multivalued fields? Try to stay away from them - except for
the simplest cases, they are usually an indicator of a weak data model.
Sure, it's fine to store a relatively small number of values in a
multivalued field (say, dozens of values), but be aware that you can't
directly access individual values, you can't tell which was matched on a
query, and you can't coordinate values between multiple multivalued fields.
Except for very simple cases, multivalued fields should be flattened into
multiple documents with a parent ID.
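
(For illustration, a sketch of that flattening in Solr's XML update format;
the id, parent_id, and part field names are hypothetical:)

<add>
  <doc>
    <field name="id">car42-part1</field>
    <field name="parent_id">car42</field>
    <field name="part">spark plug</field>
  </doc>
  <doc>
    <field name="id">car42-part2</field>
    <field name="parent_id">car42</field>
    <field name="part">oil filter</field>
  </doc>
</add>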

 Since you brought up the topic of dynamic fields, I am curious how you
got the impression that they were a good technique to use as a starting
point. They're fine for prototyping and hacking, and fine when used in
moderation, but not when used to excess. The whole point of Solr is
searching and searching is optimized within fields, not across fields, so
having lots of dynamic fields is counter to the primary strengths of Lucene
and Solr. And... schemas with lots  of dynamic fields tend to be difficult
to maintain. For example, if you wanted to ask a support question here, one
of the first things we want to know is what your schema looks like, but
with lots of dynamic fields it is not possible to have a simple discussion
of what your schema looks like.

 Sure, there is something called schemaless design (and Solr supports
that in 4.4), but that's very different from heavy reliance on dynamic
fields in the traditional sense. Schemaless design is A-OK, but using
dynamic fields for arrays of data in a single document is a poor match
for the search features of Solr (e.g., Edismax searching across multiple
fields.)

 One other tidbit: Although Solr does not enforce naming conventions for
field names, and you can put special characters in them, there are plenty
of features in Solr, such as the common fl parameter, where field names
are expected to adhere to Java naming rules. When people start going wild
with dynamic fields, it is common that they start going wild with their
names as well, using spaces, colons, slashes, etc. that cannot be parsed in
the fl and qf parameters, for example. Please don't go there!

 In short, put up a small cluster and start doing a Proof of Concept
cluster. Stay within my suggested guidelines and you should do okay.

 -- Jack Krupansky

 -Original Message- From: Marcelo Elias Del Valle
 Sent: Monday, July 08, 2013 9:46 AM
 To: solr-user@lucene.apache.org
 Subject: Solr limitations


 Hello everyone,

    I am trying to search for information about possible Solr limitations I
 should consider in my architecture. Things like max number of dynamic
 fields, max number of documents in SolrCloud, etc.
    Does anyone know where I can find this info?

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr


Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread imran khan
Greetings,

I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
its own boost field to my Solr schema

<field name="boost" type="float" stored="true" indexed="false"/>

Now, due to some reason, I always get boost = 0.0, and due to this my Solr
document score is also always 0.0.

Is there any way to make Solr ignore the boost field's value when calculating
a document's score?

Regards,
Khan


Field not available on Edismax query

2013-07-09 Thread It-forum

Hello to all,

I load Solr via data-import.

I added, in db_data_config.xml, a nested entity inside the product entity, as 
follows:



<entity name="product_tags"
        query="select t.name as tags, id_product
               FROM ps_product_tag as pt
               JOIN ps_tag as t ON pt.id_tag = t.id_tag
               AND t.id_lang = 2
               WHERE id_product = '${product.id_product}'"
        parentDeltaQuery="select id_product as id from
               ps_product where id_product = ${product_features.id_product}">

    <field column="tags" name="tag" />
</entity>
</entity> <!-- main product entity close -->

schema.xml:
<field name="tag" type="text_fr" indexed="true" stored="true"
       multiValued="true" />



When I use a common select query I get the field tag and its values.

However, when I use an edismax query with the following details, I'm not able 
to retrieve the field tag. And it seems that it is not taken into account in 
the match score either.


The edismax qf parameters are:
qf=id^1.0 ref^9.0 name^6.0 descriptif^1.0 cat^7.0 brand^5.0 
fphonetic^5.0 tag^7.0 features^3.0

q.alt=*:*


Could you help me understand why?

Regards

David


Re: Restrict/change numFound solr result

2013-07-09 Thread aniljayanti
Hi Erick,

thanks for the reply, I am doing the same thing already. But for the paging
calculation I am depending on the numFound value. I want the result to be
(<result name="response" numFound="120" start="0">).

thanks

aniljayanti





Re: Solr 4.3 Pivot Performance Issue

2013-07-09 Thread solrUserJM
Hi Jack,

Thanks for your answer.

I upgraded Solr from 4.0.0 (LUCENE_40) to 4.3.0 (LUCENE_43), and later to
Solr 4.3.1. As a result, the pivot queries I already had running against Solr
4.0.0, which were taking a few milliseconds (100ms, 150ms), are now, with Solr
4.3.1, taking around 13 seconds.

An index optimization reduced the index size and brought the time down to 9
seconds, but that is still far from the time we had before.

I would like to avoid a full reindex and, as far as I read in the
documentation, it isn't really needed if the major version doesn't change.

Is there something I missed? Is somebody facing the same problem?

Thanks
Francisco


On Tue, Jul 2, 2013 at 2:35 PM, Jack Krupansky-2 [via Lucene] 
ml-node+s472066n407467...@n3.nabble.com wrote:

 What is the nature of your degradation?

 -- Jack Krupansky

 -Original Message-
 From: solrUserJM
 Sent: Tuesday, July 02, 2013 4:22 AM
  To: [hidden email]
 Subject: Solr 4.3 Pivot Performance Issue

 Hi There,

  I noticed with the upgrade from Solr 4.0 to Solr 4.3 that we had a
  degradation of queries that are using pivot fields. Has someone else
  noticed
  it too?

 Thanks










-- 
Francisco Späth





Phrase search without stopwords

2013-07-09 Thread Parul Gupta(Knimbus)
Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>


<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />


With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.







Re: Phrase search without stopwords

2013-07-09 Thread Ahmet Arslan
Hi Parul,

You might find this useful : https://github.com/cominvent/exactmatch/



 From: Parul Gupta(Knimbus) parulgp...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Tuesday, July 9, 2013 12:03 PM
Subject: Phrase search without stopwords
 

Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
                    OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>

<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />

With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.






Re: dataDir not being stored in solr.xml

2013-07-09 Thread Erick Erickson
There's been a lot of action around this recently, this is
a known issue in 4.3.1.

The short form is it should all be better in Solr 4.4 which
may be out in the next couple of weeks, assuming we
can get agreement.

But look at SOLR-4862, SOLR-4910, SOLR-4982 and related if you want
to see the ugly details.

Best
Erick

On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote:
 I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
 something like:

 
 http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo

 I am able to add data to the index it creates within the /home/solrdata/foo 
 directory and search it.  The persisted solr.xml, however, does not contain 
 the dataDir path.  When the process is restarted the dataDir is set to 
 /home/solrdata and not /home/solrdata/foo.

 Now if I create the index, index some docs, stop the process, and manually 
 edit the solr.xml to include dataDir, search works.


 I am not sure, but it seems that dataDir is not persisted in the following 
 method, in a code path that looks like work in progress for Solr 5.0.

 CoreContainer.addPersistOneCore


 I also played with passing properties in the create args of the form:

 property.dataDir=/home/solrdata/foo

 That didn't seem to help, but I may not be understanding the exact property 
 syntax.

 Any clues?

 Cheers

 C


Re: [Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-09 Thread Lyuba Romanchuk
According to the code, at least in Solr 4.2, getParams() of CoreAdminRequest.Unload
returns a locally created ModifiableSolrParams.
This means that parameters set in this way won't be received by
CoreAdminHandler.

I'm going to open an issue in Jira and provide a patch for this.

Best regards,
Lyuba



On Fri, Jul 5, 2013 at 6:12 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 SolrJ doesn't have explicit support for that param but you can always
 add it yourself.

 For example:
 CoreAdminRequest.Unload req = new CoreAdminRequest.Unload(false);
  ((ModifiableSolrParams) req.getParams()).set("deleteInstanceDir", true);
 req.process(server);

 On Thu, Jul 4, 2013 at 12:50 PM, Lyuba Romanchuk
 lyuba.romanc...@gmail.com wrote:
  Hi,
 
  I need to unload a core and delete the core's instance directory.
  According to the code of Solr 4.2 I don't see support for this parameter
 in
  solrj.
  Is there the fix or open issue for this?
 
  Best regards,
  Lyuba



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Solr limitations

2013-07-09 Thread Erick Erickson
I think Jack was mostly thinking in slam-dunk terms. I know of
SolrCloud demo clusters with 500+ nodes, and at that point
people said "it's going to work for our situation, we don't need
to push more."

As you start getting into that kind of scale, though, you really
have a bunch of ops considerations etc. Mostly when I get into
larger scales I pretty much want to examine my assumptions
and see if they're correct, perhaps start to trim my requirements
etc.

FWIW,
Erick

On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar
andyetitmo...@gmail.com wrote:
 5. No more than 32 nodes in your SolrCloud cluster.

 I hope this isn't too OT, but what tradeoffs is this based on? I would have
 thought it easy to hit this number for a big index and high load (given
 that both the number of shards and the number of replicas scale
 horizontally..)

 6. Don't return more than 250 results on a query.

 None of those is a hard limit, but don't go beyond them unless your Proof
 of Concept testing proves that performance is acceptable for your situation.

 Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
 tests and then scale as needed.

 Dynamic and multivalued fields? Try to stay away from them - except for
 the simplest cases, they are usually an indicator of a weak data model.
 Sure, it's fine to store a relatively small number of values in a
 multivalued field (say, dozens of values), but be aware that you can't
 directly access individual values, you can't tell which was matched on a
 query, and you can't coordinate values between multiple multivalued fields.
 Except for very simple cases, multivalued fields should be flattened into
 multiple documents with a parent ID.

 Since you brought up the topic of dynamic fields, I am curious how you
 got the impression that they were a good technique to use as a starting
 point. They're fine for prototyping and hacking, and fine when used in
 moderation, but not when used to excess. The whole point of Solr is
 searching and searching is optimized within fields, not across fields, so
 having lots of dynamic fields is counter to the primary strengths of Lucene
 and Solr. And... schemas with lots  of dynamic fields tend to be difficult
 to maintain. For example, if you wanted to ask a support question here, one
 of the first things we want to know is what your schema looks like, but
 with lots of dynamic fields it is not possible to have a simple discussion
 of what your schema looks like.

 Sure, there is something called schemaless design (and Solr supports
 that in 4.4), but that's very different from heavy reliance on dynamic
 fields in the traditional sense. Schemaless design is A-OK, but using
 dynamic fields for arrays of data in a single document is a poor match
 for the search features of Solr (e.g., Edismax searching across multiple
 fields.)

 One other tidbit: Although Solr does not enforce naming conventions for
 field names, and you can put special characters in them, there are plenty
 of features in Solr, such as the common fl parameter, where field names
 are expected to adhere to Java naming rules. When people start going wild
 with dynamic fields, it is common that they start going wild with their
 names as well, using spaces, colons, slashes, etc. that cannot be parsed in
 the fl and qf parameters, for example. Please don't go there!

 In short, put up a small cluster and start doing a Proof of Concept
 cluster. Stay within my suggested guidelines and you should do okay.

 -- Jack Krupansky

 -Original Message- From: Marcelo Elias Del Valle
 Sent: Monday, July 08, 2013 9:46 AM
 To: solr-user@lucene.apache.org
 Subject: Solr limitations


 Hello everyone,

    I am trying to search for information about possible Solr limitations I
  should consider in my architecture. Things like max number of dynamic
  fields, max number of documents in SolrCloud, etc.
    Does anyone know where I can find this info?

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr


Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Erick Erickson
My guess is that you're not really passing on the boost field's value
and are getting the default. Don't quite know how I'd track that down, though.

Best
Erick

On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote:
 Greetings,

 I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
 its own boost field to my Solr schema

 <field name="boost" type="float" stored="true" indexed="false"/>

 Now, due to some reason, I always get boost = 0.0, and due to this my Solr
 document score is also always 0.0.

 Is there any way to make Solr ignore the boost field's value when calculating
 a document's score?

 Regards,
 Khan


Re: Restrict/change numFound solr result

2013-07-09 Thread Erick Erickson
No, there's no good way to make Solr return
numFound=120 when there are 540 (or
whatever) records. Why do you care?
If you need to stop at 120, just stop at 120 and ignore
the numFound.

If you need to display the 120 to the end user even if there
are more docs, just do that.
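
(As a sketch of the client-side math, capping what the pager sees at 120
no matter what numFound says; names are illustrative:)

static long cappedPageCount(long numFound, int pageSize) {
    long effectiveTotal = Math.min(numFound, 120);      // ignore anything past 120
    return (effectiveTotal + pageSize - 1) / pageSize;  // ceiling division
}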

Best
Erick

On Tue, Jul 9, 2013 at 2:33 AM, aniljayanti aniljaya...@yahoo.co.in wrote:
 Hi Erick,

 thanks for the reply, I am doing the same thing already. But for the paging
 calculation I am depending on the numFound value. I want the result to be
 (<result name="response" numFound="120" start="0">).

 thanks

 aniljayanti





Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
Hi,

I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
cluster on 3 machines. If I kill a node running on any of the machines using
kill -9, the status of the killed node is not updated immediately in the web
console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 

My questions are:

1. Why does it take so much time to update the status of the inactive node?

2. And if the leader node itself is killed, I can't use the
service till the status of the node gets updated.


Thanks in advance


Ranjith Venkatesan





Document count mismatch

2013-07-09 Thread Furkan KAMACI
I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements
in such situations?


Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-09 Thread Michael Bakonyi
Am 05.07.2013 um 16:36 schrieb Shalin Shekhar Mangar:

 Okay so just for the rest of the people who dig up this thread. You
 had to put all the extra jar files required by typo3 into WEB-INF/lib
 to make this work. Is that right?

Maybe this works as well, but I'd put it in a directory called lib within the 
core's folder. That way it is loaded automatically, too, says the example 
solrconfig.xml:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml
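
For illustration, a layout like this (core and jar names are hypothetical)
should be picked up without any explicit lib directives:

solr/
  solr.xml
  collection1/
    conf/solrconfig.xml
    conf/schema.xml
    lib/typo3-solr-extras.jar    (extra jars dropped here are loaded with the core)
    data/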

Cheers,
Michael

Am 05.07.2013 um 16:36 schrieb Shalin Shekhar Mangar:

 Okay so just for the rest of the people who dig up this thread. You
 had to put all the extra jar files required by typo3 into WEB-INF/lib
 to make this work. Is that right?
 
 On Fri, Jul 5, 2013 at 8:03 PM, Michael Bakonyi
 kont...@mb-neuemedien.de wrote:
 Hi Shalin,
 
 Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar:
 There are plenty of use-cases for having multiple cores. You may have
 two different schemas for two different kind of documents. Perhaps you
 are indexing content in multiple languages and you may want a core per
 language. In SolrCloud, a node can have multiple cores to support more
 than one shard on the same box.
 
 alright, so it depends on the use case. I guess for me the different use 
 cases will be combinations of domain.tld and language. But for me this is 
 far in the future, I think.
 
 The Solr war file has all the classes it needs to startup and run
 (well except for some optional components like DataImportHandler etc)
 and the SolrInfoMBean is most definitely present in the war file.
 Enabling or disabling jmx has nothing to do with loading that class.
 
 This is what I guessed, too. But I know neither Java nor Tomcat nor Solr, so 
 I tried everything I could.
 
 It is very difficult to guess what's wrong with your setup this way.
 Why don't you try using the example jetty? It works and is well
 supported and optimized for Solr.
 
 Giovanni's guess was right, so luckily this error disappeared.
 
 Cheers,
 Michael
 
 
 
 
 
 
 Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar:
 
 On Thu, Jul 4, 2013 at 4:32 PM, Michael Bakonyi
 kont...@mb-neuemedien.de wrote:
 Hi everyone,
 
 I'm trying to get the CMS TYPO3 connected with Solr 3.6.2.
 
 By now I followed the installation at 
 http://wiki.apache.org/solr/SolrTomcat except that I didn't copy the 
 .war-file into the $SOLR_HOME but referencing to it at a different 
 location via Tomcat Context fragment file.
 
 Until then the Solr-Server works – I can reach the GUI via URL.
 
 To get Solr connected with the CMS I then created a new core-folder (btw. 
 can anybody give me kind of a live example, when to use different cores? 
 Until now I still don't really understand the concept of cores ..) by 
 duplicating the example-folder in which I overwrote some files (especially 
 solrconfig.xml) with files offered by the TYPO3-community. I also moved 
 the file solr.xml one level up and edited it (added core-fragment and 
 especially adjusted instanceDir)  to get a correct multicore-setup like 
 in the example multicore-setup within the downloaded solr-tgz-package.
 
 There are plenty of use-cases for having multiple cores. You may have
 two different schemas for two different kind of documents. Perhaps you
 are indexing content in multiple languages and you may want a core per
 language. In SolrCloud, a node can have multiple cores to support more
 than one shard on the same box.
 
 
 But now I get the Java-exception
 
 java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at 
 java.lang.ClassLoader.defineClass1(Native Method)
 
 In the Tomcat-log file it is said additionally: Caused by: 
 java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean.
 
 My guess is that, within the new solrconfig.xml, there are calls to classes 
 which aren't included correctly. There are some libs which are included 
 at the top of this file, but the paths of the references should be ok, as I 
 checked them via Bash: at http://wiki.apache.org/solr/SolrConfigXml it is 
 said that the <lib dir="..."/> directory is relative to the instanceDir, so this 
 is what I've checked. I also inserted absolute paths but this wasn't 
 successful either.
 
 Can anybody give me a hint how to solve this problem? Would be great :)
 
 The Solr war file has all the classes it needs to startup and run
 (well except for some optional components like DataImportHandler etc)
 and the SolrInfoMBean is most definitely present in the war file.
 Enabling or disabling jmx has nothing to do with loading that class.
 It is very difficult to guess what's wrong with your setup this way.
 Why don't you try using the example jetty? It works and is well
 supported and optimized for Solr.
 
 
 --
 Regards,
 Shalin Shekhar Mangar.
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: Phrase search without stopwords

2013-07-09 Thread It-forum

Hi

I solved it by copying the field into a string field type.

And query on this field only.

Regards

David

On 09/07/2013 11:03, Parul Gupta (Knimbus) wrote:

Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
 OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>

<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />

With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.









Re: Field not available on Edismax query

2013-07-09 Thread It-forum

Any suggestions?


On 09/07/2013 12:29, It-forum wrote:

Hello to all,

I load Solr via data-import.

I added, in db_data_config.xml, a nested entity inside the product entity, 
as follows:



<entity name="product_tags"
        query="select t.name as tags, id_product
               FROM ps_product_tag as pt
               JOIN ps_tag as t ON pt.id_tag = t.id_tag
               AND t.id_lang = 2
               WHERE id_product = '${product.id_product}'"
        parentDeltaQuery="select id_product as id from
               ps_product where id_product = ${product_features.id_product}">

    <field column="tags" name="tag" />
</entity>
</entity> <!-- main product entity close -->

schema.xml:
<field name="tag" type="text_fr" indexed="true" stored="true"
       multiValued="true" />



When I use a common select query I get the field tag and its values.

However, when I use an edismax query with the following details, I'm not 
able to retrieve the field tag. And it seems that it is not taken into 
account in the match score either.


The edismax qf parameters are:
qf=id^1.0 ref^9.0 name^6.0 descriptif^1.0 cat^7.0 brand^5.0 
fphonetic^5.0 tag^7.0 features^3.0

q.alt=*:*


Could you help me understand why?

Regards

David





Re: Document count mismatch

2013-07-09 Thread Jack Krupansky
1. Try facet.missing=true to count the number of documents that do not have 
a value for that field.


2. Try facet.limit=n to set the number of returned facet values to a larger 
or smaller value than the default of 100.


3. Try reading the Faceting chapter of my book!
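
Combining suggestions 1 and 2, a sketch of the request (facet.limit=-1 removes 
the limit entirely):

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&facet.limit=-1&facet.missing=true&wt=xml&indent=on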

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Tuesday, July 09, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Document count mismatch

I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements
in such situations? 



Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Tony Mullins
I am passing the boost value (via nutch), i.e. boost = 0.0.
But my question is: why is Solr showing me score = 0.0 when my boost
(index-time boost) = 0.0?
Should Solr not calculate its document scores on the basis of TF-IDF? And
if not, how can I make Solr consider only TF-IDF while calculating a
document's score?

Regards,
Khan


On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.com wrote:

 My guess is that you're not really passing on the boost field's value
 and are getting the default. Don't quite know how I'd track that down,
 though.

 Best
 Erick

 On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
 wrote:
  Greetings,
 
  I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
  its own boost field to my Solr schema
 
  <field name="boost" type="float" stored="true" indexed="false"/>
 
   Now, due to some reason, I always get boost = 0.0, and due to this my
   Solr document score is also always 0.0.
  
   Is there any way to make Solr ignore the boost field's value when
   calculating a document's score?
 
  Regards,
  Khan



Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Jack Krupansky

Simple math: x times zero equals zero.

That's why the default document boost is 1.0 - score times 1.0 equals the score.

Any particular reason you wanted to zero out the document score from the 
document level?
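
For context: with norms enabled, Lucene's DefaultSimilarity folds the
index-time document boost into the field norm, roughly (a sketch, not the
full scoring formula):

    norm(doc, field) = docBoost * fieldBoost * lengthNorm(field)

so a docBoost of 0.0 zeroes every norm and therefore every score. Passing
boost = 1.0 (or indexing the searched fields with omitNorms="true", which
discards index-time boosts and length normalization) leaves the plain tf-idf
score intact.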


-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Tuesday, July 09, 2013 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Calculating Solr document score by ignoring the boost field.

I am passing the boost value (via nutch), i.e. boost = 0.0.
But my question is: why is Solr showing me score = 0.0 when my boost
(index-time boost) = 0.0?
Should Solr not calculate its document scores on the basis of TF-IDF? And
if not, how can I make Solr consider only TF-IDF while calculating a
document's score?

Regards,
Khan


On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson 
erickerick...@gmail.com wrote:



My guess is that you're not really passing on the boost field's value
and are getting the default. Don't quite know how I'd track that down,
though.

Best
Erick

On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
wrote:
 Greetings,

 I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes 
 on
 its own boost field to my Solr schema

 <field name="boost" type="float" stored="true" indexed="false"/>

 Now, due to some reason, I always get boost = 0.0, and due to this my
 Solr document score is also always 0.0.

 Is there any way to make Solr ignore the boost field's value when
 calculating a document's score?

 Regards,
 Khan





Re: Document count mismatch

2013-07-09 Thread Furkan KAMACI
Ok, one more question. I have another field in my schema: url. How can I
get the urls in each facet bucket?

2013/7/9 Jack Krupansky j...@basetechnology.com

 1. Try facet.missing=true to count the number of documents that do not
 have a value for that field.

 2. Try facet.limit=n to set the number of returned facet values to a
 larger or smaller value than the default of 100.

 3. Try reading the Faceting chapter of my book!

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, July 09, 2013 8:09 AM
 To: solr-user@lucene.apache.org
 Subject: Document count mismatch


 I've run a command to find term counts at my index:

 solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

 it gives me a result like that:

 ...
 <result name="response" numFound="3245092" start="0"
 maxScore="1.0"></result>
 ...
 <lst name="teno">
 <int name="lev">3107206</int>
 <int name="tenu">59821</int>
 ...

 When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
 numFound=3245092. How does that come about?

 PS: The returned list has 100 elements. Does Solr return at most 100 elements

 in such situations?



Re: two types of answers in my query

2013-07-09 Thread Jack Krupansky
Usually a car term and a car part term will look radically different. So, 
simply use the edismax query parser and set qf to be both the car and car 
part fields. If either matches, the document will be selected. And if you 
have a type field, you can check that to see if a car or part was matched 
in the results.
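
For illustration, a sketch of such a request (the car_name, part_sku, and type 
field names are hypothetical; the space in qf must be URL-encoded in a real 
request):

select?defType=edismax&q=USER_INPUT&qf=car_name^2.0 part_sku&fl=id,type,score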


-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Tuesday, July 09, 2013 2:38 AM
To: solr-user@lucene.apache.org
Subject: two types of answers in my query

Hi,
A general question:


Let's say I have a Car and CarParts 1:n relation.

And I have discovered that the user had entered in the search field, instead
of a car name, a part serial number (SKU).
(I discovered it using a regex.)

Is there a way to fetch different types of answers in Solr?
Is there a way to fetch mixed types in the answers?
Is there something similar to that, and what is that feature called?

Thank you. 



Re: Document count mismatch

2013-07-09 Thread Jack Krupansky

I don't quite follow the question. Give us an example.

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI 
Sent: Tuesday, July 09, 2013 9:37 AM 
To: solr-user@lucene.apache.org 
Subject: Re: Document count mismatch 


Ok, one more question. I have another field in my schema: url. How can I
get the urls in each facet bucket?

2013/7/9 Jack Krupansky j...@basetechnology.com


1. Try facet.missing=true to count the number of documents that do not
have a value for that field.

2. Try facet.limit=n to set the number of returned facet values to a
larger or smaller value than the default of 100.

3. Try reading the Faceting chapter of my book!

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Tuesday, July 09, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Document count mismatch


I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements

in such situations?



Re: Phrase search without stopwords

2013-07-09 Thread Parul Gupta(Knimbus)
Hey thanks.


It somewhat works for me.







Re: Document count mismatch

2013-07-09 Thread Furkan KAMACI
I have another field in my schema: url. When I get results as facets,
I see that there are 3107206 documents for lev (<int
name="lev">3107206</int>). But what are the urls of those 3107206
documents? I tried grouping instead of faceting:

/solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url

and I get only one result for each group. I want to get all of them. On the
other hand, if I change my query to:

/solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url&group.query=teno:lev

I get this error:

<str name="msg">shard 0 did not set sort field values (FieldDoc.fields is
null); you must pass fillFields=true to IndexSearcher.search on each
shard</str><str name="trace">java.lang.IllegalArgumentException: shard 0
did not set sort field values (FieldDoc.fields is null); you must pass
fillFields=true to IndexSearcher.search on each shard
at org.apache.lucene.search.TopDocs$MergeSortQueue.<init>(TopDocs.java:143)
at org.apache.lucene.search.TopDocs.merge(TopDocs.java:214)
...
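
(For reference: the usual way to list the documents behind a single facet 
bucket is a filter query on that facet value rather than grouping, paging 
through with start/rows:

/solr/select/?q=*:*&fq=teno:lev&fl=url&rows=100&start=0 )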





2013/7/9 Jack Krupansky j...@basetechnology.com

 I don't quite follow the question. Give us an example.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09,
 2013 9:37 AM To: solr-user@lucene.apache.org Subject: Re: Document count
 mismatch
 Ok, one more question. I have another field in my schema: url. How can I

 get the urls in each facet bucket?

 2013/7/9 Jack Krupansky j...@basetechnology.com

  1. Try facet.missing=true to count the number of documents that do not
 have a value for that field.

 2. Try facet.limit=n to set the number of returned facet values to a
 larger or smaller value than the default of 100.

 3. Try reading the Faceting chapter of my book!

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, July 09, 2013 8:09 AM
 To: solr-user@lucene.apache.org
 Subject: Document count mismatch


 I've run a command to find term counts at my index:

 solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

 it gives me a result like that:

 ...
 <result name="response" numFound="3245092" start="0"
 maxScore="1.0"></result>
 ...
 <lst name="teno">
 <int name="lev">3107206</int>
 <int name="tenu">59821</int>
 ...

 When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
 numFound=3245092. How does that come about?

 PS: The returned list has 100 elements. Does Solr return at most 100 elements

 in such situations?




Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Shawn Heisey
On 7/8/2013 11:10 PM, Learner wrote:
 
 I wrote a custom data import handler to import data from files. I am trying
 to figure out a way to make an asynchronous call instead of waiting for the
 data import response. Is there an easy way to invoke it asynchronously (other
 than using futures and callables)?
 
 public class CustomFileImportHandler extends RequestHandlerBase
     implements SolrCoreAware {
   public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1) {
     indexer a = new indexer();  // constructor
     String status = a.Index();  // method that does the indexing; this is the
                                 // call I am trying to make asynchronous
   }
 }

Generally speaking, it's easier to write a separate program than write a
Solr plugin, unless you just want to add a tiny tweak to an existing
class and not make fundamental changes in how it works.  The dataimport
handler is designed around a model of starting and frequently checking
the status to know whether it's done.

For what you want to do, I'd write a subroutine, module, or a separate
program using a Solr API for your language that obtains the data from
the source and indexes it to Solr directly.  This is definitely the
preferred method if your code is written in Java, but it's generally the
right way to go no matter what language you're using.
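
For illustration, a minimal sketch of that approach with SolrJ 4.x (the URL, 
the id/body field names, and the one-file-one-document mapping are 
placeholders, not your real schema):

import java.io.IOException;
import java.nio.file.*;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FileIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Buffers adds and streams them from background threads, so the
        // caller never blocks waiting on Solr.
        ConcurrentUpdateSolrServer solr =
            new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path file : files) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", file.getFileName().toString());          // hypothetical field
                doc.addField("body", new String(Files.readAllBytes(file))); // hypothetical field
                solr.add(doc);
            }
        }
        solr.blockUntilFinished(); // wait for the queued adds to drain
        solr.commit();
        solr.shutdown();
    }
}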

Thanks,
Shawn



Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Mark Miller
Something is wrong if it actually takes 20 minutes.


- Mark

On Jul 9, 2013, at 7:43 AM, Ranjith Venkatesan ranjit...@zohocorp.com wrote:

 Hi,
 
 I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
 cluster on 3 machines. If I kill a node running on any of the machines using
 kill -9, the status of the killed node is not updated immediately in the web
 console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 
 
 My questions are:
 
 1. Why does it take so much time to update the status of the inactive node?
 
 2. And if the leader node itself is killed, I can't use the
 service till the status of the node gets updated.
 
 
 Thanks in advance
 
 
 Ranjith Venkatesan
 
 
 



Is there an easy way to know if a Solr cloud node is a shard leader?

2013-07-09 Thread Robert Stewart
I would like to be able to do it without consulting Zookeeper. Is there some 
variable or API I can call on a specific Solr cloud node to know if it is 
currently a shard leader?  The reason I want to know is I want to perform index 
backup on the shard leader from a cron job *only* if that node is a shard 
leader.

Bob


Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Shawn Heisey
On 7/9/2013 5:43 AM, Ranjith Venkatesan wrote:
 I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
 cluster on 3 machines. If I kill a node running on any of the machines using
 kill -9, the status of the killed node is not updated immediately in the web
 console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 
 
 My questions are:
 
 1. Why does it take so much time to update the status of the inactive node?
 
 2. And if the leader node itself is killed, I can't use the
 service till the status of the node gets updated.

As Mark said, something is very wrong if it takes 20 minutes for the
cloud state to update.

I'm wondering why you have done a kill -9 to stop Solr?  If running a
stop command (or a standard SIGTERM) doesn't properly shut the process
down, then you may have some other underlying operating system issue
that needs to be solved, and could be causing the node status problem.

Thanks,
Shawn



Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
The same scenario happens if the network to any one of the machines is
unavailable (i.e. if we manually disconnect the network cable, the status of
the node is also not updated immediately).

Please help me with this issue.





Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
We are going to use Solr in production. There are chances that a machine
might shut down due to power failure, or that the network is disconnected
due to manual intervention. We need to address those cases as well to build
a robust system.





Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Shawn Heisey
 We are going to use Solr in production. There are chances that a machine
 might shut down due to power failure, or that the network is disconnected
 due to manual intervention. We need to address those cases as well to
 build
 a robust system.

The latest version of Solr is 4.3.1, and 4.4 is right around the corner.
Any chance you can test a nightly 4.4 build or a checkout of the
lucene_solr_4_4 branch, so we can know whether you are running into the
same problems with what will be released soon? No sense in fixing a
problem that no longer exists.

Thanks,
Shawn




Re: Field not available on Edismax query

2013-07-09 Thread Alexandre Rafalovitch
On Tue, Jul 9, 2013 at 6:29 AM, It-forum it-fo...@meseo.fr wrote:

 However when I use an edismax query with the following details, I'm not able
 to retrieve the field tag. And it seems that it is not taken into account in
 the match score either.


You seem to have two problems here: one of not matching (use the debug flags
for that) and one of not retrieving. But what do you mean by not retrieving?
By default all stored fields are returned regardless of the query. So if you
are getting the field in one case but not in another, you might be either
getting different documents without that field populated, or you have
explicitly mis-defined which fields to return (with the 'fl' parameter).
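
For the matching half of the problem, a sketch: appending debugQuery=true to 
the request adds an explain section showing, per document, which clauses and 
fields actually matched and contributed to the score, e.g.

select?defType=edismax&q=...&qf=...&debugQuery=true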

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Jed Glazner
I'll give you the high level before delving deep into setup etc. I have been 
struggeling at work with a seemingly random problem when solr will hang for 
10-15 minutes during updates.  This outage always seems to immediately be 
proceeded by an EOF exception on  the replica.  Then 10-15 minutes later we see 
an exception on the leader for a socket timeout to the replica.  The leader 
will then tell the replica to recover which in most cases it does and then the 
outage is over.

Here are the setup details:

We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. 
We have 2 active collections each with only 1 shard (we have in total about 15 
collections but most are empty or have less than 100 docs). The first index 
(collection1) is 6.5GB and has ~18M documents.  The 2nd index (collection2) is 
9GB and has about 13M documents. In all cases the leader resides on 1 server 
and the replica resides on the other.  Both servers are AWS XL High Mem 
instances. (8 CPUs @ 2.67Ghz, 70GB Ram) with the index residing on a 1TB raid 
10 using ephemeral storage disks.  We are starting solr using the embedded 
jetty with the following java memory and GC options:

-Xmx16382m -Xms4092m -XX:MaxPermSize=256m -Xss256k -XX:NewSize=1536m 
-XX:SurvivorRatio=16 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC 
-XX:ParallelCMSThreads=2 -XX:+CMSClassUnloadingEnabled 
-XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80 
-XX:+CMSParallelRemarkEnabled

Both collections receive a constant stream of updates ~10k per hour (both 
adds/deletes).  Approximately once per day the following events transpire:


 1.  We see a log entry for a distributed update that takes just over 5 ms 
followed by an EOF write exception on the replica. In all cases this exception 
is triggered by an update to the 9GB collection.
 2.  Occasionally we'll see a 503 shard update error on the leader but usually 
not.
 3.  Approximately 15 minutes after this exception we see a timeout error for 
this distributed update request on the leader.
 4.  The leader then creates a new connection and tells the replica to recover, 
which it does and everything is OK again.
 5.  During the 15 minute window from when the replica throws the EOF until the 
SocketTimeout by the leader no other updates are processed:

ERROR ON REPLICA:

Jul 8, 2013 6:38:16 PM org.apache.solr.core.SolrCore execute
INFO: [collection2_0] webapp=/solr path=/update 
params={distrib.from=http://Solr4-1-1.domain.com:8983/solr/collection2_0/&update.distrib=FROMLEADER&wt=javabin&version=2}
 status=0 QTime=50012

Jul 8, 2013 6:38:16 PM org.apache.solr.common.SolrException log
SEVERE: null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:154)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:101)
at 
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:203)
at 
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:196)
at 
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:94)
at 
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:49)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:404)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at 

Re: Is there an easy way to know if a Solr cloud node is a shard leader?

2013-07-09 Thread Mark Miller
If you call /solr/zookeeper on a specific node, that servlet will tell you - 
the output is a bit verbose for what you want, though.

- Mark

On Jul 9, 2013, at 10:36 AM, Robert Stewart robert_stew...@epam.com wrote:

 I would like to be able to do it without consulting Zookeeper. Is there some 
 variable or API I can call on a specific Solr cloud node to know if it is 
 currently a shard leader?  The reason I want to know is I want to perform 
 index backup on the shard leader from a cron job *only* if that node is a 
 shard leader.
 
 Bob
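
For a cron job, a rough sketch of that check follows. It fetches
clusterstate.json through the node's own /solr/zookeeper servlet and looks for
the leader flag next to this node's entry. The URL, the node_name value, and
the naive string matching are assumptions -- parse the JSON properly in a real
script:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class LeaderCheck {
    public static void main(String[] args) throws IOException {
        // Ask this node's zookeeper servlet for the cluster state.
        URL url = new URL(
            "http://localhost:8983/solr/zookeeper?detail=true&path=/clusterstate.json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) body.append(line);
        }
        String me = "localhost:8983_solr";       // this node's node_name (assumption)
        int at = body.indexOf(me);
        int end = at < 0 ? -1 : body.indexOf("}", at);
        // The znode data comes back JSON-escaped, so match loosely on the flag.
        boolean leader = end > at && body.substring(at, end).contains("leader");
        System.exit(leader ? 0 : 1);             // cron-friendly: 0 = leader
    }
}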



Re: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Shawn Heisey

On 7/9/2013 9:50 AM, Jed Glazner wrote:

I'll give you the high level before delving deep into the setup. I have been 
struggling at work with a seemingly random problem where Solr will hang for 
10-15 minutes during updates.  This outage always seems to be immediately 
preceded by an EOF exception on the replica.  Then 10-15 minutes later we see 
an exception on the leader for a socket timeout to the replica.  The leader 
will then tell the replica to recover, which in most cases it does, and then the 
outage is over.

Here are the setup details:

We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.


After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced 
and have since been fixed.  You're five releases and about nine months 
behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your 
configuration is up to date with changes to the example config between 
4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0 
testbed, duplicate your current problem, and upgrade the testbed to see 
if the problem goes away.  A testbed will also give you practice for a 
smooth upgrade of your production system.


Thanks,
Shawn



Staggered Replication In Solr?

2013-07-09 Thread adityab
Hi, 
Is staggered replication possible in Solr through configuration?

We are concerned about the CPU spike (80%) and GC pauses on all the slaves when
they try to replicate the updated index from repeaters. We haven't observed this
behavior in v3.5 (max spikes were 50% during replication).
In our case we have 8 slaves serving the traffic, and all start replicating
the new index at the same time. When the switch of the Reader happens after
warm-up, we see a spike in CPU and at the same time a GC pause, which causes
requests to our application to queue up and eventually fail. 

It would be good to have a throttle on the master/repeater for the max number of
replication requests to serve at a given time.

I am planning to write and schedule a script which will trigger replication in
a staggered fashion, so not all slaves are busy replicating at once. 

thanks
Aditya 





Re: Staggered Replication In Solr?

2013-07-09 Thread Shawn Heisey

On 7/9/2013 10:37 AM, adityab wrote:

Is staggered replication possible in Solr through configuration?


You wouldn't be able to do this directly without switching to completely 
manually triggered replication, but the concept of a repeater may 
interest you.


http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

You set up a limited number of slaves replicating from your master. 
Those slaves get also set up as masters, and the rest of your slaves 
replicate from those, instead of the true master.  When the index gets 
updated, the repeaters do their replication, then the other slaves 
replicate from the repeaters.
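
A minimal sketch of the repeater's solrconfig.xml (the master URL, conf files, and poll interval below are placeholders; a repeater is simply a node configured as both master and slave):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://truemaster:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>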


Thanks,
Shawn



Re: dataDir not being stored in solr.xml

2013-07-09 Thread Chris Collins
Thanks Erick  I made a private patch to the CoreContainer until the real deal.

C
On Jul 9, 2013, at 4:35 AM, Erick Erickson erickerick...@gmail.com wrote:

 There's been a lot of action around this recently, this is
 a known issue in 4.3.1.
 
 The short form is it should all be better in Solr 4.4 which
 may be out in the next couple of weeks, assuming we
 can get agreement.
 
 But look at SOLR-4862, SOLR-4910, SOLR-4982 and related issues if you want
 to see the ugly details.
 
 Best
 Erick
 
 On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote:
 I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
 something like:
 

 http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo
 
 I am able to add data to the index it creates within the /home/solrdata/foo 
 directory and search it.  The solr config however does not contain the 
 dataDir path.  When the process is restarted the dataDir is set to 
 /home/solrdata and not /home/solrdata/foo.
 
 Now if I create the index, index some docs, stop the process, and manually edit 
 the solr.xml to include dataDir, search works.
 
 
 I am not sure, but it seems that in the following class dataDir is not 
 persisted, in a case that looks like work in progress for Solr 5.0.
 
CoreContainer.addPersistOneCore
 
 
 I also played with passing properties in the create args of the form:
 
property.dataDir=/home/solrdata/foo
 
 That didn't seem to help, but I may not be understanding the exact property 
 syntax.
 
 Any clues?
 
 Cheers
 
 C
 



Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Roman Chyla
Other than using futures and callables? Runnables ;-) Beyond that you would
need an asynchronous request (i.e., an async client).

But in case somebody else is looking for an easy recipe for server-side async:


public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
  if (isBusy()) {
    rsp.add("message", "Batch processing is already running...");
    rsp.add("status", "busy");
    return;
  }
  setBusy(true); // mark busy before handing off to the worker thread
  runAsynchronously(new LocalSolrQueryRequest(req.getCore(), req.getParams()));
}

private void runAsynchronously(SolrQueryRequest req) {
  final SolrQueryRequest request = req;
  // isBusy()/setBusy(), queue, thread, runSynchronously() and log are
  // members of the enclosing handler (not shown here)
  thread = new Thread(new Runnable() {
    public void run() {
      try {
        while (queue.hasMore()) {
          runSynchronously(queue, request);
        }
      } catch (Exception e) {
        log.error(e.getLocalizedMessage());
      } finally {
        request.close();
        setBusy(false); // allow the next batch request in
      }
    }
  });

  thread.start();
}
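
The busy flag plus the single worker thread mean at most one batch import runs at a time; the handler returns to the client immediately while the import continues in the background.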


On Tue, Jul 9, 2013 at 1:10 AM, Learner bbar...@gmail.com wrote:


 I wrote a custom data import handler to import data from files. I am trying
 to figure out a way to make an asynchronous call instead of waiting for the
 data import response. Is there an easy way to invoke it asynchronously (other
 than using futures and callables)?

 public class CustomFileImportHandler extends RequestHandlerBase implements SolrCoreAware {
 public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1) {
 indexer a = new indexer(); // constructor
 String status = a.Index(); // method that does the indexing; trying to make it async
 }
 }







Perl Solr help - doing *:* query

2013-07-09 Thread Shawn Heisey

This is primarily to Andy Lester, who wrote the WebService::Solr module
on CPAN, but I'll take a response from anyone who knows what I can do.

If I use the following Perl code, I get an error.  If I try to build
some other query besides *:* to request all documents, the script runs,
but the query doesn't do what I asked it to do.

http://apaste.info/3j3Q

How can I use a perl script with a proper Solr API to count the number
of documents in my Solr index?

I already have a version of my script that parses a JSON response as
plain text, but as I have just learned, it's possible to get invalid
information out of it. Specifically, the shards.info output has
multiple numFound instances in it, which broke my script. The
shards.info parameter is in the request handler defaults. I'd like to
future-proof it by using actual objects.

Thanks,
Shawn


Re: Perl Solr help - doing *:* query

2013-07-09 Thread Andy Lester

On Jul 9, 2013, at 2:48 PM, Shawn Heisey s...@elyograg.org wrote:

 This is primarily to Andy Lester, who wrote the WebService::Solr module
 on CPAN, but I'll take a response from anyone who knows what I can do.
 
 If I use the following Perl code, I get an error.

What error do you get?  Never say "I get an error."  Always say "I get this 
error: ..."

  If I try to build
 some other query besides *:* to request all documents, the script runs,
 but the query doesn't do what I asked it to do.

What DOES it do?


 http://apaste.info/3j3Q

For the sake of future readers, please put your code in the message.  This 
message will get archived, and future people reading the lists will not be able 
to read the code at some arbitrary paste site.

Shawn's code is:

use strict;
use WebService::Solr;
use WebService::Solr::Query;
use WebService::Solr::Response;

my $url = "http://idx.REDACTED.com:8984/solr/ncmain";
my $solr = WebService::Solr->new($url);
my $query = WebService::Solr::Query->new("*:*");
my $response = $solr->search($query, {'rows' => '0'});
my $numFound = $response->content->{response}->{numFound};

print "nf: $numFound\n";


xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



replication getting stuck on a file

2013-07-09 Thread Petersen, Robert
Hi 

My Solr 3.6.1 slave farm is suddenly getting stuck during replication.  It 
seems to stop on a random file on various slaves (not all) and not continue.  
I've tried stopping and restarting Tomcat, etc., but some slaves just can't get the 
index pulled down.  Note there is plenty of space on the hard drive.  I don't 
get it.  Everything else seems fine.  Does this ring a bell for anyone?  I have 
the slaves set for five-minute polling intervals.

Here is what I see in the admin page; it just stays on that one file and won't get 
past it while the speed steadily averages down to 0 KB/s:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670084, Generation: 127202
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.06 GB
Times Replicated Since Startup: 48903
Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status: Start Time: Tue Jul 09 12:55:00 EDT 2013
Files Downloaded: 59 / 486
Downloaded: 88.73 MB / 23.06 GB [0.0%]
Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s


Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


  




AW: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Jed Glazner
Hi Shawn,

I have been trying to duplicate this problem without success for the last 2 
weeks, which is one reason I'm getting flustered.  It seems reasonable to be 
able to duplicate it, but I can't.

We do have a story to upgrade, but that is still weeks if not months from 
being rolled out to production.

We have another cluster running the same version but with 8 shards and 8 
replicas, each shard at 100 GB, with more load and more indexing requests, 
without this problem. But there we send docs in batches and all fields are 
stored, whereas the troubled index has only 1 or 2 stored fields and we only 
send docs 1 at a time.

Could that have anything to do with it?

Jed


Sent from Samsung Mobile



 Original message 
From: Shawn Heisey s...@elyograg.org
Date: 07.09.2013 18:33 (GMT+01:00)
To: solr-user@lucene.apache.org
Subject: Re: Solr Hangs During Updates for over 10 minutes


On 7/9/2013 9:50 AM, Jed Glazner wrote:
 I'll give you the high level before delving deep into setup, etc. I have been 
 struggling at work with a seemingly random problem where Solr will hang for 
 10-15 minutes during updates.  This outage always seems to be immediately 
 preceded by an EOF exception on the replica.  Then 10-15 minutes later we 
 see an exception on the leader for a socket timeout to the replica.  The 
 leader will then tell the replica to recover, which in most cases it does, and 
 then the outage is over.

 Here are the setup details:

 We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.

After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
and have since been fixed.  You're five releases and about nine months
behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your
configuration is up to date with changes to the example config between
4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0
testbed, duplicate your current problem, and upgrade the testbed to see
if the problem goes away.  A testbed will also give you practice for a
smooth upgrade of your production system.

Thanks,
Shawn



Re: Perl Solr help - doing *:* query

2013-07-09 Thread Shawn Heisey

On 7/9/2013 2:02 PM, Andy Lester wrote:
What error do you get? Never say "I get an error." Always say "I get 
this error: ..."


This is the actual error when trying *:* :

Can't locate object method "_struct_" via package 
"WebService::Solr::Query" at 
/usr/local/share/perl/5.14.2/WebService/Solr/Query.pm line 37.



  If I try to build
some other query besides *:* to request all documents, the script runs,
but the query doesn't do what I asked it to do.

What DOES it do?


If I change the query line to this:

my $query = WebService::Solr::Query->new({tag_id => '[* TO *]'});

With this, numFound is zero.  The tag_id field is my uniqueKey, and is a 
StrField.  When I use Dumper to print out the actual response from this 
query, it contains the following info:


'q' => '"(tag_id:\\[\\* TO \\*\\])"',

I didn't ask for a phrase search (the quotes) or for escaping on the 
special query characters.  By automatically doing this, it makes complex 
queries like ranges impossible.  Is there something else that should be 
done for more complex queries?


Thanks,
Shawn



Deleted Docs

2013-07-09 Thread Katie McCorkell
Hello,

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?

Thanks so much!! :)
Katie


Re: Deleted Docs

2013-07-09 Thread Jack Krupansky
Solr (Lucene, actually) will be doing segment merge operations in the 
background, continually, so generally you won't need to do optimize 
operations.


Generally, an explicit delete and a replace of an existing document are the 
only two ways that you would get a deleted document.
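
A minimal SolrJ sketch of the replace case (the core URL and field names are placeholders): re-adding a document with the same uniqueKey leaves the old version behind as a deleted doc until a merge purges it.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("title", "first version");
solr.add(doc);
solr.commit();

doc.setField("title", "second version");
solr.add(doc);   // same uniqueKey: replaces doc-1, the old copy counts as deleted
solr.commit();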


-- Jack Krupansky

-Original Message- 
From: Katie McCorkell

Sent: Tuesday, July 09, 2013 5:38 PM
To: solr-user@lucene.apache.org
Subject: Deleted Docs

Hello,

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?

Thanks so much!! :)
Katie 



Re: Deleted Docs

2013-07-09 Thread Shawn Heisey

On 7/9/2013 3:38 PM, Katie McCorkell wrote:

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?


Changes to deleted documents can happen through normal segment merging. 
 Optimizing is just an explicit and deliberate merge down to a single 
segment, but segment merging is a normal part of Solr/Lucene indexing. 
Any deleted documents in segments that get merged will be purged.


I believe the UUID generator will always generate a new value even if a 
document with the same information in the other fields is indexed again. 
 This option should only be used if you do not have an existing field 
with unique values on every document.


Thanks,
Shawn



RE: replication getting stuck on a file

2013-07-09 Thread Petersen, Robert
Look at the speed and time remaining on this one, pretty funny:


Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670202, Generation: 127213
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670108, Generation: 127204
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.13 GB
Times Replicated Since Startup: 48874
Previous Replication Done At: Tue Jul 09 13:12:05 PDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:17:04 PDT 2013
Current Replication Status: Start Time: Tue Jul 09 13:12:04 PDT 2013
Files Downloaded: 10 / 538
Downloaded: 1.67 MB / 23.13 GB [0.0%]
Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s


-Original Message-
From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] 
Sent: Tuesday, July 09, 2013 1:22 PM
To: solr-user@lucene.apache.org
Subject: replication getting stuck on a file

Hi 

My Solr 3.6.1 slave farm is suddenly getting stuck during replication.  It 
seems to stop on a random file on various slaves (not all) and not continue.  
I've tried stopping and restarting Tomcat, etc., but some slaves just can't get the 
index pulled down.  Note there is plenty of space on the hard drive.  I don't 
get it.  Everything else seems fine.  Does this ring a bell for anyone?  I have 
the slaves set for five-minute polling intervals.

Here is what I see in the admin page; it just stays on that one file and won't get 
past it while the speed steadily averages down to 0 KB/s:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670084, Generation: 127202
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.06 GB
Times Replicated Since Startup: 48903
Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status: Start Time: Tue Jul 09 12:55:00 EDT 2013
Files Downloaded: 59 / 486
Downloaded: 88.73 MB / 23.06 GB [0.0%]
Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s


Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


  






Re: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Otis Gospodnetic
Hi Jed,

This is really with Solr 4.0?  If so, it may be wiser to jump on 4.4
that is about to be released.  We did not have fun working with 4.0 in
SolrCloud mode a few months ago.  You will save time, hair, and money
if you convince your manager to let you use Solr 4.4. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner jglaz...@adobe.com wrote:
 Hi Shawn,

 I have been trying to duplicate this problem without success for the last 2 
 weeks, which is one reason I'm getting flustered.  It seems reasonable to be 
 able to duplicate it, but I can't.

  We do have a story to upgrade, but that is still weeks if not months from 
 being rolled out to production.

 We have another cluster running the same version but with 8 shards and 8 
 replicas, each shard at 100 GB, with more load and more indexing requests, 
 without this problem. But there we send docs in batches and all fields are 
 stored, whereas the troubled index has only 1 or 2 stored fields and we only 
 send docs 1 at a time.

 Could that have anything to do with it?

 Jed


 Sent from Samsung Mobile



  Original message 
 From: Shawn Heisey s...@elyograg.org
 Date: 07.09.2013 18:33 (GMT+01:00)
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Hangs During Updates for over 10 minutes


 On 7/9/2013 9:50 AM, Jed Glazner wrote:
 I'll give you the high level before delving deep into setup, etc. I have been 
 struggling at work with a seemingly random problem where Solr will hang for 
 10-15 minutes during updates.  This outage always seems to be immediately 
 preceded by an EOF exception on the replica.  Then 10-15 minutes later we 
 see an exception on the leader for a socket timeout to the replica.  The 
 leader will then tell the replica to recover, which in most cases it does, and 
 then the outage is over.

 Here are the setup details:

 We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.

 After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
 and have since been fixed.  You're five releases and about nine months
 behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your
 configuration is up to date with changes to the example config between
 4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0
 testbed, duplicate your current problem, and upgrade the testbed to see
 if the problem goes away.  A testbed will also give you practice for a
 smooth upgrade of your production system.

 Thanks,
 Shawn



join not working with UUIDs

2013-07-09 Thread Marcelo Elias Del Valle
Hello,

I am trying to create a POC to test query joins. However, I was
surprised to see my test work with some ids, but when my document ids
are UUIDs, it doesn't work.
An example follows, using SolrJ:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

 When I execute:

///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}

SolrQuery query = new SolrQuery();
query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

   it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

   Any idea why? What am I missing?

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: join not working with UUIDs

2013-07-09 Thread Jack Krupansky
Your join is requesting to use the join_id field (from) of documents 
matching the query of cor_parede:branca, but the join_id field of that 
document is empty.


Maybe you intended to search in the other direction, like 
acessorio1:Teclado.


-- Jack Krupansky

-Original Message- 
From: Marcelo Elias Del Valle

Sent: Tuesday, July 09, 2013 7:34 PM
To: solr-user@lucene.apache.org
Subject: join not working with UUIDs

Hello,

   I am trying to create a POC to test query joins. However, I was
surprised when I saw my test worked with some ids, but when my document ids
are UUIDs, it doesn't work.
   Follows an example, using solrj:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute:

   ///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}
   SolrQuery query = new SolrQuery();

query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

  it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

  Any idea why? What am I missing?

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr 



Re: join not working with UUIDs

2013-07-09 Thread Jack Krupansky

Oops... I misread and confused your q and fq params.

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Tuesday, July 09, 2013 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: join not working with UUIDs

Your join is requesting to use the join_id field (from) of documents
matching the query of cor_parede:branca, but the join_id field of that
document is empty.

Maybe you intended to search in the other direction, like
acessorio1:Teclado.

-- Jack Krupansky

-Original Message- 
From: Marcelo Elias Del Valle

Sent: Tuesday, July 09, 2013 7:34 PM
To: solr-user@lucene.apache.org
Subject: join not working with UUIDs

Hello,

   I am trying to create a POC to test query joins. However, I was
surprised when I saw my test worked with some ids, but when my document ids
are UUIDs, it doesn't work.
   Follows an example, using solrj:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute:

   ///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}
   SolrQuery query = new SolrQuery();

query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

  it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

  Any idea why? What am I missing?

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr 



Overseer queues confused me

2013-07-09 Thread Illu.Y.Ying (mis.sh04.Newegg) 41417
Hi there:
 In the Solr 4.3 source code, I found that the overseer uses 3 queues to handle 
all SolrCloud management requests:
 1: /overseer/queue
2: /overseer/queue-work
3: /overseer/collection-queue-work

 ClusterStateUpdater uses the 1st & 2nd queues to handle SolrCloud shard or 
state requests.
 It peeks a request from the 1st queue, then offers it to the 2nd queue and 
handles it.

 OverseerCollectionProcessor uses the 3rd queue to handle collection-related 
requests.

 My question is: why does ClusterStateUpdater use 2 queues, while 
OverseerCollectionProcessor can handle requests correctly with only 1?
 Is there some additional design intent behind ClusterStateUpdater?


 Thanks in advance :)


Best Regards,
Illu Ying
Assistant Supervisor, NESC-SH.MIS
+86-021-51530666*41417
Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)
ONCE YOU KNOW, YOU NEWEGG.
CONFIDENTIALITY NOTICE: This email and any files transmitted with it may 
contain privileged or otherwise confidential information. It is intended only 
for the person or persons to whom it is addressed. If you received this message 
in error, you are not authorized to read, print, retain, copy, disclose, 
disseminate, distribute, or use this message any part thereof or any 
information contained therein. Please notify the sender immediately and delete 
all copies of this message. Thank you in advance for your cooperation.



Norms

2013-07-09 Thread William Bell
I have a field that has omitNorms=true, but when I look at debugQuery I see
that the field is being normalized in the score.

What can I do to turn off normalization in the score?

I want a simple way to do 2 things:

boost geodist() highest at 1 mile and lowest at 100 miles,
plus add a boost for a query = edgefield^5.

I only want tf() and no queryNorm. I am not even sure I want idf(), but I
can probably live with rare names being boosted.



The results are being normalized. See below. I tried dismax and edismax -
bf, bq and boost.

<requestHandler name="autoproviderdist" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="fl">display_name,city_state,prov_url,pwid,city_state_alternative</str>
    <!--
    <str name="bq">_val_:"sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)"^10</str>
    -->
    <str name="boost">sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)</str>
    <int name="rows">5</int>
    <str name="q.alt">*:*</str>
    <str name="qf">name_edgy^.9 name_edge^.9 name_word</str>
    <str name="group">true</str>
    <str name="group.field">pwid</str>
    <str name="group.main">true</str>
    <!-- <str name="pf">name_edgy</str> do not turn on -->
    <str name="sort">score desc, last_name asc</str>
    <str name="d">100</str>
    <str name="pt">39.740112,-104.984856</str>
    <str name="sfield">store_geohash</str>
    <str name="hl">false</str>
    <str name="hl.fl">name_edgy</str>
    <str name="mm">2-1 4-2 6-3</str>
  </lst>
</requestHandler>

0.058555886 = queryNorm

product of:
  10.854807 = (MATCH) sum of:
    1.8391232 = (MATCH) max plus 0.01 times others of:
      1.8214592 = (MATCH) weight(name_edge:paul^0.9 in 231378), product of:
        0.30982485 = queryWeight(name_edge:paul^0.9), product of:
          0.9 = boost
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          *0.058555886 = queryNorm*
        5.8789964 = (MATCH) fieldWeight(name_edge:paul in 231378), product of:
          1.0 = tf(termFreq(name_edge:paul)=1)
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edge, doc=231378)
      1.7664119 = (MATCH) weight(name_edgy:paul^0.9 in 231378), product of:
        0.30510724 = queryWeight(name_edgy:paul^0.9), product of:
          0.9 = boost
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          *0.058555886 = queryNorm*
        5.789479 = (MATCH) fieldWeight(name_edgy:paul in 231378), product of:
          1.0 = tf(termFreq(name_edgy:paul)=1)
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
    9.015684 = (MATCH) max plus 0.01 times others of:
      8.9352665 = (MATCH) weight(name_word:nutting in 231378), product of:
        0.72333425 = queryWeight(name_word:nutting), product of:
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          0.058555886 = queryNorm
        12.352887 = (MATCH) fieldWeight(name_word:nutting in 231378), product of:
          1.0 = tf(termFreq(name_word:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_word, doc=231378)
      8.04174 = (MATCH) weight(name_edgy:nutting^0.9 in 231378), product of:
        0.65100086 = queryWeight(name_edgy:nutting^0.9), product of:
          0.9 = boost
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          *0.058555886 = queryNorm*
        12.352887 = (MATCH) fieldWeight(name_edgy:nutting in 231378), product of:
          1.0 = tf(termFreq(name_edgy:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
  1.0855998 = sum(6.0/(0.5*float(geodist(39.74168747663498,-104.9849385023117,39.740112,-104.984856))+6.0),const(0.1))



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Surround query parser not working?

2013-07-09 Thread William Bell
Can we get a sample fieldType and field definition?

Thanks.


On Mon, Jul 8, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.com wrote:

 Yes, you should be able to use nested query parsers to mix the queries.
 Solr 4.1(?) made it easier. A sketch follows below.

 -- Jack Krupansky
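
A hedged sketch of what that nesting could look like via the _query_ hook (the fields and terms below are placeholders, and note the surround clause still needs its own df, as discussed above):

q=+_query_:"{!edismax qf='title text'}apache lucene" +_query_:"{!surround df=text}3w(apache, lucene)"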

 -Original Message- From: Abeygunawardena, Niran
 Sent: Monday, July 08, 2013 7:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Surround query parser not working?


 Hi,

 Thanks. I found out that my issue was that the default field (df) was being
 ignored and I had to specify the parameter by adding df=text in the URL.

 Thank you for updating the wiki page on the surround parser:
 http://wiki.apache.org/solr/SurroundQueryParser

 Hopefully, ordered proximity searches will be supported in the edismax
 query parser itself as the surround query parser is not as good as the
 edismax parser: https://issues.apache.org/jira/browse/SOLR-3101
 Is there a way to AND the surround parser query with the edismax query so
 the ordered proximity search can be run through the surround query parser
 and the results combined/queried with the edismax query parser for other
 parts of the query? Can nested queries support this?

 Thanks,
 Niran


 Niran -

 Looks like you're being bitten by a known feature* of the surround query
 parser.  It does not analyze the text, as some of the other more commonly
 used query parsers do.  The dismax, edismax, and lucene query parsers
 all leverage field analysis on the query terms or phrases. The surround
 query parser just takes the terms as-is.  It's by design, but not
 necessarily something that can't at least be optionally available.  But as
 it is, you'll need to lowercase, at least.  Be careful with index-time
 stemming, as you'd have to account for that in the surround query parser
 syntax by wildcarding things a bit.  Instead of searching for "finding",
 one would use "find*" (and index without stemming) in the query to match
 "finds", "finding".  It was by design to not analyze in the surround query
 parser because it can be handy to use fewer analysis tricks at index time,
 and let the query itself be more sophisticated to allow more flexible and
 indeed more complex query-time constructs.

Erik

 * http://wiki.apache.org/solr/SurroundQueryParser#Limitations -
  though it'd be useful to have analysis at least optionally available.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Tony Mullins
Jack, due to 'some' reason my Nutch is returning an index-time boost = 0.0,
and just for a moment suppose that Nutch always will return boost = 0.

Now my simple question was: why is Solr showing me a document score = 0?
Why does it depend upon the index-time boost value? Why, or how, can I make
Solr calculate the score value on TF-IDF only?

Regards,
Khan


On Tue, Jul 9, 2013 at 6:31 PM, Jack Krupansky j...@basetechnology.com wrote:

 Simple math: x times zero equals zero.

 That's why the default document boost is 1.0 - score times 1.0 equals
 score.

 Any particular reason you wanted to zero out the document score from the
 document level?

 -- Jack Krupansky

 -Original Message- From: Tony Mullins
 Sent: Tuesday, July 09, 2013 9:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Calculating Solr document score by ignoring the boost field.


 I am passing the boost value (via Nutch), i.e. boost = 0.0.
 But my question is why Solr is showing me score = 0.0 when my boost (index-
 time boost) = 0.0?
 Should not Solr calculate its document scores on the basis of TF-IDF? And
 if not, how can I make Solr consider only TF-IDF while calculating a
 document's score?

 Regards,
 Khan


 On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  My guess is that you're not really passing on the boost field's value
 and getting the default. Don't quite know how I'd track that down
 though

 Best
 Erick

 On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
 wrote:
  Greetings,
 
  I am using Nutch 2.x as my data source for Solr 4.3.0, and Nutch passes on
  its own boost field to my Solr schema:
 
  <field name="boost" type="float" stored="true" indexed="false"/>
 
  Now due to some reason I always get boost = 0.0, and due to this my Solr
  document score is also always 0.0.
 
  Is there any way to have Solr ignore the boost field's value in its
  document score calculation?
 
  Regards,
  Khan