solr cloud index corruption

2013-07-09 Thread Cool Techi
Hi,

We are frequently getting index corruption on the cloud; this never happened in 
our master/slave setup with Solr 3.6. I have tried to check the logs, but don't 
see an exact reason.

I have run the index checker and it recovers, but I am not able to understand 
as to why this is happening. Any pointers would help.

regards,
rohit
  

Re: solr cloud index corruption

2013-07-09 Thread Otis Gospodnetic
Hi,

Maybe you can describe how you are using Solr?  Which version exactly?
 Can you share the errors you are seeing? etc.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jul 9, 2013 at 2:07 AM, Cool Techi cooltec...@outlook.com wrote:
 Hi,

 We are frequently getting index corruption on the cloud; this never happened 
 in our master/slave setup with Solr 3.6. I have tried to check the logs, but 
 don't see an exact reason.

 I have run the index checker and it recovers, but I am not able to understand 
 as to why this is happening. Any pointers would help.

 regards,
 rohit



two types of answers in my query

2013-07-09 Thread Mysurf Mail
Hi,
A general question:


Let's say I have a Car and CarParts 1:n relation.

And I have discovered that the user had entered in the search field, instead
of a car name, a part serial number (SKU).
(I discovered it using a regex.)

Is there a way to fetch different types of answers in Solr?
Is there a way to fetch mixed types in the answers?
Is there something similar to that, and what is that feature called?

Thank you.


Re: two types of answers in my query

2013-07-09 Thread Gora Mohanty
On 9 July 2013 12:08, Mysurf Mail stammail...@gmail.com wrote:
 Hi,
 A general question:


 Let's say I have a Car and CarParts 1:n relation.

 And I have discovered that the user had entered in the search field, instead
 of a car name, a part serial number (SKU).
 (I discovered it using a regex.)

 Is there a way to fetch different types of answers in Solr?
 Is there a way to fetch mixed types in the answers?
 Is there something similar to that, and what is that feature called?

Your description is not clear enough. What do you mean
by "different types of answers", and "mixed types"?

Assuming that you want to have a different query, or multiple
different queries, when you deduce on the front-end that the
user might have entered a part number instead of a name,
you will need to change the query/queries going to Solr, and
collate the results.

Regards,
Gora


dataDir not being stored in solr.xml

2013-07-09 Thread Chris Collins
I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
something like:


http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo

I am able to add data to the index it creates within the /home/solrdata/foo 
directory and search it.  The persisted solr.xml, however, does not contain the 
dataDir path.  When the process is restarted the dataDir is set to /home/solrdata and 
not /home/solrdata/foo.

Now if I create the index, index some docs, stop the process, and manually edit the 
solr.xml to include dataDir, search works.


I am not sure, but it seems that dataDir is not persisted in the following 
method, in a code path that looks like work in progress for Solr 5.0:

CoreContainer.addPersistOneCore


I also played with passing properties in the create args of the form:

property.dataDir=/home/solrdata/foo

That didn't seem to help, but I may not be understanding the exact property 
syntax.

Any clues?

Cheers

C

Re: Solr limitations

2013-07-09 Thread Ramkumar R. Aiyengar
 5. No more than 32 nodes in your SolrCloud cluster.

I hope this isn't too OT, but what tradeoffs is this based on? I would have
thought it easy to hit this number for a big index and high load (given
that both the number of shards and the number of replicas scale
horizontally..)

 6. Don't return more than 250 results on a query.

 None of those is a hard limit, but don't go beyond them unless your Proof
of Concept testing proves that performance is acceptable for your situation.

 Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
tests and then scale as needed.

 Dynamic and multivalued fields? Try to stay away from them - except for
the simplest cases, they are usually an indicator of a weak data model.
Sure, it's fine to store a relatively small number of values in a
multivalued field (say, dozens of values), but be aware that you can't
directly access individual values, you can't tell which was matched on a
query, and you can't coordinate values between multiple multivalued fields.
Except for very simple cases, multivalued fields should be flattened into
multiple documents with a parent ID.
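
(For illustration, a sketch of that flattening in Solr's XML update format;
the id, parent_id, and part field names are hypothetical:)

<add>
  <doc>
    <field name="id">car42-part1</field>
    <field name="parent_id">car42</field>
    <field name="part">spark plug</field>
  </doc>
  <doc>
    <field name="id">car42-part2</field>
    <field name="parent_id">car42</field>
    <field name="part">oil filter</field>
  </doc>
</add>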

 Since you brought up the topic of dynamic fields, I am curious how you
got the impression that they were a good technique to use as a starting
point. They're fine for prototyping and hacking, and fine when used in
moderation, but not when used to excess. The whole point of Solr is
searching and searching is optimized within fields, not across fields, so
having lots of dynamic fields is counter to the primary strengths of Lucene
and Solr. And... schemas with lots  of dynamic fields tend to be difficult
to maintain. For example, if you wanted to ask a support question here, one
of the first things we want to know is what your schema looks like, but
with lots of dynamic fields it is not possible to have a simple discussion
of what your schema looks like.

 Sure, there is something called schemaless design (and Solr supports
that in 4.4), but that's very different from heavy reliance on dynamic
fields in the traditional sense. Schemaless design is A-OK, but using
dynamic fields for arrays of data in a single document is a poor match
for the search features of Solr (e.g., Edismax searching across multiple
fields.)

 One other tidbit: Although Solr does not enforce naming conventions for
field names, and you can put special characters in them, there are plenty
of features in Solr, such as the common fl parameter, where field names
are expected to adhere to Java naming rules. When people start going wild
with dynamic fields, it is common that they start going wild with their
names as well, using spaces, colons, slashes, etc. that cannot be parsed in
the fl and qf parameters, for example. Please don't go there!

 In short, put up a small cluster and start doing a Proof of Concept
cluster. Stay within my suggested guidelines and you should do okay.

 -- Jack Krupansky

 -Original Message- From: Marcelo Elias Del Valle
 Sent: Monday, July 08, 2013 9:46 AM
 To: solr-user@lucene.apache.org
 Subject: Solr limitations


 Hello everyone,

    I am trying to search for information about possible Solr limitations I
 should consider in my architecture. Things like max number of dynamic
 fields, max number of documents in SolrCloud, etc.
    Does anyone know where I can find this info?

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr


Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread imran khan
Greetings,

I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
its own boost field to my Solr schema

<field name="boost" type="float" stored="true" indexed="false"/>

Now, due to some reason, I always get boost = 0.0, and due to this my Solr
document score is also always 0.0.

Is there any way to make Solr ignore the boost field's value when calculating
a document's score?

Regards,
Khan


Field not available on Edismax query

2013-07-09 Thread It-forum

Hello to all,

I load Solr via data-import.

I added, in db_data_config.xml, a nested entity inside the product entity, as 
follows:



<entity name="product_tags"
        query="select t.name as tags, id_product
               FROM ps_product_tag as pt
               JOIN ps_tag as t ON pt.id_tag = t.id_tag
               AND t.id_lang = 2
               WHERE id_product = '${product.id_product}'"
        parentDeltaQuery="select id_product as id from
               ps_product where id_product = ${product_features.id_product}">

    <field column="tags" name="tag" />
</entity>
</entity> <!-- main product entity close -->

schema.xml:
<field name="tag" type="text_fr" indexed="true" stored="true"
       multiValued="true" />



When I use a common select query I get the field tag and its values.

However, when I use an edismax query with the following details, I'm not able 
to retrieve the field tag. And it seems that it is not taken into account in 
the match score either.


The edismax qf parameters are:
qf=id^1.0 ref^9.0 name^6.0 descriptif^1.0 cat^7.0 brand^5.0 
fphonetic^5.0 tag^7.0 features^3.0

q.alt=*:*


Could you help me understand why?

Regards

David


Re: Restrict/change numFound solr result

2013-07-09 Thread aniljayanti
Hi Erick,

thanks for the reply, I am doing the same thing already. But for the paging
calculation I am depending on the numFound value. I want the result to be
(<result name="response" numFound="120" start="0">).

thanks

aniljayanti





Re: Solr 4.3 Pivot Performance Issue

2013-07-09 Thread solrUserJM
Hi Jack,

Thanks for your answer.

I upgraded Solr from 4.0.0 (LUCENE_40) to 4.3.0 (LUCENE_43), and later to
Solr 4.3.1. As a result, the pivot queries I already had running against Solr
4.0.0, which were taking a few milliseconds (100ms, 150ms), are now, with Solr
4.3.1, taking around 13 seconds.

An index optimization reduced the index size and brought the time down to 9
seconds, but that is still far from the time we had before.

I would like to avoid a full reindex and, as far as I read in the
documentation, it isn't really needed if the major version doesn't change.

Is there something I missed? Is somebody facing the same problem?

Thanks
Francisco


On Tue, Jul 2, 2013 at 2:35 PM, Jack Krupansky-2 [via Lucene] 
ml-node+s472066n407467...@n3.nabble.com wrote:

 What is the nature of your degradation?

 -- Jack Krupansky

 -Original Message-
 From: solrUserJM
 Sent: Tuesday, July 02, 2013 4:22 AM
  To: [hidden email]
 Subject: Solr 4.3 Pivot Performance Issue

 Hi There,

  I noticed with the upgrade from Solr 4.0 to Solr 4.3 that we had a
  degradation of queries that are using pivot fields. Has someone else
  noticed
  it too?

 Thanks










-- 
Francisco Späth





Phrase search without stopwords

2013-07-09 Thread Parul Gupta(Knimbus)
Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>


<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />


With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.







Re: Phrase search without stopwords

2013-07-09 Thread Ahmet Arslan
Hi Parul,

You might find this useful : https://github.com/cominvent/exactmatch/



 From: Parul Gupta(Knimbus) parulgp...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Tuesday, July 9, 2013 12:03 PM
Subject: Phrase search without stopwords
 

Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
                    OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>

<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />

With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.






Re: dataDir not being stored in solr.xml

2013-07-09 Thread Erick Erickson
There's been a lot of action around this recently, this is
a known issue in 4.3.1.

The short form is it should all be better in Solr 4.4 which
may be out in the next couple of weeks, assuming we
can get agreement.

But look at SOLR-4862, SOLR-4910, SOLR-4982 and related if you want
to see the ugly details.

Best
Erick

On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote:
 I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
 something like:

 
 http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo

 I am able to add data to the index it creates within the /home/solrdata/foo 
 directory and search it.  The persisted solr.xml, however, does not contain 
 the dataDir path.  When the process is restarted the dataDir is set to 
 /home/solrdata and not /home/solrdata/foo.

 Now if I create the index, index some docs, stop the process, and manually 
 edit the solr.xml to include dataDir, search works.


 I am not sure, but it seems that dataDir is not persisted in the following 
 method, in a code path that looks like work in progress for Solr 5.0.

 CoreContainer.addPersistOneCore


 I also played with passing properties in the create args of the form:

 property.dataDir=/home/solrdata/foo

 That didn't seem to help, but I may not be understanding the exact property 
 syntax.

 Any clues?

 Cheers

 C


Re: [Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-09 Thread Lyuba Romanchuk
According to the code, at least in Solr 4.2, getParams() of CoreAdminRequest.Unload
returns a locally created ModifiableSolrParams.
This means that parameters set in this way won't be received by
CoreAdminHandler.

I'm going to open an issue in Jira and provide a patch for this.

Best regards,
Lyuba



On Fri, Jul 5, 2013 at 6:12 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 SolrJ doesn't have explicit support for that param but you can always
 add it yourself.

 For example:
 CoreAdminRequest.Unload req = new CoreAdminRequest.Unload(false);
  ((ModifiableSolrParams) req.getParams()).set("deleteInstanceDir", true);
 req.process(server);

 On Thu, Jul 4, 2013 at 12:50 PM, Lyuba Romanchuk
 lyuba.romanc...@gmail.com wrote:
  Hi,
 
  I need to unload a core and delete the core's instance directory.
  According to the code of Solr 4.2 I don't see support for this parameter
 in
  solrj.
  Is there the fix or open issue for this?
 
  Best regards,
  Lyuba



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Solr limitations

2013-07-09 Thread Erick Erickson
I think Jack was mostly thinking in slam-dunk terms. I know of
SolrCloud demo clusters with 500+ nodes, and at that point
people said "it's going to work for our situation, we don't need
to push more."

As you start getting into that kind of scale, though, you really
have a bunch of ops considerations etc. Mostly when I get into
larger scales I pretty much want to examine my assumptions
and see if they're correct, perhaps start to trim my requirements
etc.

FWIW,
Erick

On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar
andyetitmo...@gmail.com wrote:
 5. No more than 32 nodes in your SolrCloud cluster.

 I hope this isn't too OT, but what tradeoffs is this based on? I would have
 thought it easy to hit this number for a big index and high load (given
 that both the number of shards and the number of replicas scale
 horizontally..)

 6. Don't return more than 250 results on a query.

 None of those is a hard limit, but don't go beyond them unless your Proof
 of Concept testing proves that performance is acceptable for your situation.

 Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
 tests and then scale as needed.

 Dynamic and multivalued fields? Try to stay away from them - except for
 the simplest cases, they are usually an indicator of a weak data model.
 Sure, it's fine to store a relatively small number of values in a
 multivalued field (say, dozens of values), but be aware that you can't
 directly access individual values, you can't tell which was matched on a
 query, and you can't coordinate values between multiple multivalued fields.
 Except for very simple cases, multivalued fields should be flattened into
 multiple documents with a parent ID.

 Since you brought up the topic of dynamic fields, I am curious how you
 got the impression that they were a good technique to use as a starting
 point. They're fine for prototyping and hacking, and fine when used in
 moderation, but not when used to excess. The whole point of Solr is
 searching and searching is optimized within fields, not across fields, so
 having lots of dynamic fields is counter to the primary strengths of Lucene
 and Solr. And... schemas with lots  of dynamic fields tend to be difficult
 to maintain. For example, if you wanted to ask a support question here, one
 of the first things we want to know is what your schema looks like, but
 with lots of dynamic fields it is not possible to have a simple discussion
 of what your schema looks like.

 Sure, there is something called schemaless design (and Solr supports
 that in 4.4), but that's very different from heavy reliance on dynamic
 fields in the traditional sense. Schemaless design is A-OK, but using
 dynamic fields for arrays of data in a single document is a poor match
 for the search features of Solr (e.g., Edismax searching across multiple
 fields.)

 One other tidbit: Although Solr does not enforce naming conventions for
 field names, and you can put special characters in them, there are plenty
 of features in Solr, such as the common fl parameter, where field names
 are expected to adhere to Java naming rules. When people start going wild
 with dynamic fields, it is common that they start going wild with their
 names as well, using spaces, colons, slashes, etc. that cannot be parsed in
 the fl and qf parameters, for example. Please don't go there!

 In short, put up a small cluster and start doing a Proof of Concept
 cluster. Stay within my suggested guidelines and you should do okay.

 -- Jack Krupansky

 -Original Message- From: Marcelo Elias Del Valle
 Sent: Monday, July 08, 2013 9:46 AM
 To: solr-user@lucene.apache.org
 Subject: Solr limitations


 Hello everyone,

    I am trying to search for information about possible Solr limitations I
  should consider in my architecture. Things like max number of dynamic
  fields, max number of documents in SolrCloud, etc.
    Does anyone know where I can find this info?

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr


Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Erick Erickson
My guess is that you're not really passing on the boost field's value
and are getting the default. Don't quite know how I'd track that down, though.

Best
Erick

On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com wrote:
 Greetings,

 I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
 its own boost field to my Solr schema

 <field name="boost" type="float" stored="true" indexed="false"/>

 Now, due to some reason, I always get boost = 0.0, and due to this my Solr
 document score is also always 0.0.

 Is there any way to make Solr ignore the boost field's value when calculating
 a document's score?

 Regards,
 Khan


Re: Restrict/change numFound solr result

2013-07-09 Thread Erick Erickson
No, there's no good way to make Solr return
numFound=120 when there are 540 (or
whatever) records. Why do you care?
If you need to stop at 120, just stop at 120 and ignore
the numFound.

If you need to display the 120 to the end user even if there
are more docs, just do that.
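
(As a sketch of the client-side math, capping what the pager sees at 120
no matter what numFound says; names are illustrative:)

static long cappedPageCount(long numFound, int pageSize) {
    long effectiveTotal = Math.min(numFound, 120);      // ignore anything past 120
    return (effectiveTotal + pageSize - 1) / pageSize;  // ceiling division
}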

Best
Erick

On Tue, Jul 9, 2013 at 2:33 AM, aniljayanti aniljaya...@yahoo.co.in wrote:
 Hi Erick,

 thanks for the reply, I am doing the same thing already. But for the paging
 calculation I am depending on the numFound value. I want the result to be
 (<result name="response" numFound="120" start="0">).

 thanks

 aniljayanti





Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
Hi,

I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
cluster on 3 machines. If I kill a node running on any of the machines using
kill -9, the status of the killed node is not updated immediately in the web
console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 

My questions are:

1. Why does it take so much time to update the status of the inactive node?

2. And if the leader node itself is killed, I can't use the
service till the status of the node gets updated.


Thanks in advance


Ranjith Venkatesan





Document count mismatch

2013-07-09 Thread Furkan KAMACI
I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements
in such situations?


Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-09 Thread Michael Bakonyi
Am 05.07.2013 um 16:36 schrieb Shalin Shekhar Mangar:

 Okay so just for the rest of the people who dig up this thread. You
 had to put all the extra jar files required by typo3 into WEB-INF/lib
 to make this work. Is that right?

Maybe this works as well, but I'd put it in a directory called lib within the 
core's folder. That way it is loaded automatically, too, says the example 
solrconfig.xml:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml
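
For illustration, a layout like this (core and jar names are hypothetical)
should be picked up without any explicit lib directives:

solr/
  solr.xml
  collection1/
    conf/solrconfig.xml
    conf/schema.xml
    lib/typo3-solr-extras.jar    (extra jars dropped here are loaded with the core)
    data/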

Cheers,
Michael

Am 05.07.2013 um 16:36 schrieb Shalin Shekhar Mangar:

 Okay so just for the rest of the people who dig up this thread. You
 had to put all the extra jar files required by typo3 into WEB-INF/lib
 to make this work. Is that right?
 
 On Fri, Jul 5, 2013 at 8:03 PM, Michael Bakonyi
 kont...@mb-neuemedien.de wrote:
 Hi Shalin,
 
 Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar:
 There are plenty of use-cases for having multiple cores. You may have
 two different schemas for two different kind of documents. Perhaps you
 are indexing content in multiple languages and you may want a core per
 language. In SolrCloud, a node can have multiple cores to support more
 than one shard on the same box.
 
 alright, so it depends on the use case. I guess for me the different use 
 cases will be combinations of domain.tld and language. But for me this is 
 far in the future, I think.
 
 The Solr war file has all the classes it needs to startup and run
 (well except for some optional components like DataImportHandler etc)
 and the SolrInfoMBean is most definitely present in the war file.
 Enabling or disabling jmx has nothing to do with loading that class.
 
 This is what I guessed, too. But I know neither Java nor Tomcat nor Solr, so 
 I tried everything I could.
 
 It is very difficult to guess what's wrong with your setup this way.
 Why don't you try using the example jetty? It works and is well
 supported and optimized for Solr.
 
 Giovanni's guess was right, so luckily this error disappeared.
 
 Cheers,
 Michael
 
 
 
 
 
 
 Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar:
 
 On Thu, Jul 4, 2013 at 4:32 PM, Michael Bakonyi
 kont...@mb-neuemedien.de wrote:
 Hi everyone,
 
 I'm trying to get the CMS TYPO3 connected with Solr 3.6.2.
 
 By now I followed the installation at 
 http://wiki.apache.org/solr/SolrTomcat except that I didn't copy the 
 .war-file into the $SOLR_HOME but referencing to it at a different 
 location via Tomcat Context fragment file.
 
 Until then the Solr-Server works – I can reach the GUI via URL.
 
 To get Solr connected with the CMS I then created a new core-folder (btw. 
 can anybody give me kind of a live example, when to use different cores? 
 Until now I still don't really understand the concept of cores ..) by 
 duplicating the example-folder in which I overwrote some files (especially 
 solrconfig.xml) with files offered by the TYPO3-community. I also moved 
 the file solr.xml one level up and edited it (added core-fragment and 
 especially adjusted instanceDir)  to get a correct multicore-setup like 
 in the example multicore-setup within the downloaded solr-tgz-package.
 
 There are plenty of use-cases for having multiple cores. You may have
 two different schemas for two different kind of documents. Perhaps you
 are indexing content in multiple languages and you may want a core per
 language. In SolrCloud, a node can have multiple cores to support more
 than one shard on the same box.
 
 
 But now I get the Java-exception
 
 java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at 
 java.lang.ClassLoader.defineClass1(Native Method)
 
 In the Tomcat-log file it is said additionally: Caused by: 
 java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean.
 
 My guess is that, within the new solrconfig.xml, there are calls to classes 
 which aren't included correctly. There are some libs which are included 
 at the top of this file, but the paths of the references should be ok, as I 
 checked them via Bash: at http://wiki.apache.org/solr/SolrConfigXml it is 
 said that the <lib dir="..."/> directory is relative to the instanceDir, so this 
 is what I've checked. I also inserted absolute paths but this wasn't 
 successful either.
 
 Can anybody give me a hint how to solve this problem? Would be great :)
 
 The Solr war file has all the classes it needs to startup and run
 (well except for some optional components like DataImportHandler etc)
 and the SolrInfoMBean is most definitely present in the war file.
 Enabling or disabling jmx has nothing to do with loading that class.
 It is very difficult to guess what's wrong with your setup this way.
 Why don't you try using the example jetty? It works and is well
 supported and optimized for Solr.
 
 
 --
 Regards,
 Shalin Shekhar Mangar.
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: Phrase search without stopwords

2013-07-09 Thread It-forum

Hi

I solved it by copying the field into a string field type.

And query on this field only.

Regards

David

On 09/07/2013 11:03, Parul Gupta (Knimbus) wrote:

Hi solr-user!!!
I have an issue:
I want to know whether it is possible to use StopFilterFactory with
KeywordTokenizer.
For example, I have multiple titles:
1) title: Canadian journal of information and library science
2) title: Canadian information of science
3) title: Southern information and library science

What I want is: if I search for
q=title:"Canadian information of science"
 OR
q=title:"Canadian information science"

my output should be only title no. 2, i.e. Canadian information of science.

my schema.xml is:
<fieldType name="itext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=" " replace="all" />
  </analyzer>
</fieldType>

<field name="title" type="itext" indexed="true" stored="true"
       required="false" multiValued="false" />

With this, exact search is working, but search without stopwords is not. If I
use WhitespaceTokenizer instead of KeywordTokenizer, then search without
stopwords works, but all 3 titles come back as output. Please reply
ASAP.









Re: Field not available on Edismax query

2013-07-09 Thread It-forum

Any suggestions?


On 09/07/2013 12:29, It-forum wrote:

Hello to all,

I load Solr via data-import.

I added, in db_data_config.xml, a nested entity inside the product entity, 
as follows:



<entity name="product_tags"
        query="select t.name as tags, id_product
               FROM ps_product_tag as pt
               JOIN ps_tag as t ON pt.id_tag = t.id_tag
               AND t.id_lang = 2
               WHERE id_product = '${product.id_product}'"
        parentDeltaQuery="select id_product as id from
               ps_product where id_product = ${product_features.id_product}">

    <field column="tags" name="tag" />
</entity>
</entity> <!-- main product entity close -->

schema.xml:
<field name="tag" type="text_fr" indexed="true" stored="true"
       multiValued="true" />



When I use a common select query I get the field tag and its values.

However, when I use an edismax query with the following details, I'm not 
able to retrieve the field tag. And it seems that it is not taken into 
account in the match score either.


The edismax qf parameters are:
qf=id^1.0 ref^9.0 name^6.0 descriptif^1.0 cat^7.0 brand^5.0 
fphonetic^5.0 tag^7.0 features^3.0

q.alt=*:*


Could you help me understand why?

Regards

David





Re: Document count mismatch

2013-07-09 Thread Jack Krupansky
1. Try facet.missing=true to count the number of documents that do not have 
a value for that field.


2. Try facet.limit=n to set the number of returned facet values to a larger 
or smaller value than the default of 100.


3. Try reading the Faceting chapter of my book!
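
Combining suggestions 1 and 2, a sketch of the request (facet.limit=-1 removes 
the limit entirely):

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&facet.limit=-1&facet.missing=true&wt=xml&indent=on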

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Tuesday, July 09, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Document count mismatch

I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements
in such situations? 



Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Tony Mullins
I am passing the boost value (via nutch), i.e. boost = 0.0.
But my question is: why is Solr showing me score = 0.0 when my boost
(index-time boost) = 0.0?
Should Solr not calculate its document scores on the basis of TF-IDF? And
if not, how can I make Solr consider only TF-IDF while calculating a
document's score?

Regards,
Khan


On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.com wrote:

 My guess is that you're not really passing on the boost field's value
 and are getting the default. Don't quite know how I'd track that down,
 though.

 Best
 Erick

 On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
 wrote:
  Greetings,
 
  I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes on
  its own boost field to my Solr schema
 
  <field name="boost" type="float" stored="true" indexed="false"/>
 
   Now, due to some reason, I always get boost = 0.0, and due to this my
   Solr document score is also always 0.0.
  
   Is there any way to make Solr ignore the boost field's value when
   calculating a document's score?
 
  Regards,
  Khan



Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Jack Krupansky

Simple math: x times zero equals zero.

That's why the default document boost is 1.0 - score times 1.0 equals the score.

Any particular reason you wanted to zero out the document score from the 
document level?
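
For context: with norms enabled, Lucene's DefaultSimilarity folds the
index-time document boost into the field norm, roughly (a sketch, not the
full scoring formula):

    norm(doc, field) = docBoost * fieldBoost * lengthNorm(field)

so a docBoost of 0.0 zeroes every norm and therefore every score. Passing
boost = 1.0 (or indexing the searched fields with omitNorms="true", which
discards index-time boosts and length normalization) leaves the plain tf-idf
score intact.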


-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Tuesday, July 09, 2013 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Calculating Solr document score by ignoring the boost field.

I am passing the boost value (via nutch), i.e. boost = 0.0.
But my question is: why is Solr showing me score = 0.0 when my boost
(index-time boost) = 0.0?
Should Solr not calculate its document scores on the basis of TF-IDF? And
if not, how can I make Solr consider only TF-IDF while calculating a
document's score?

Regards,
Khan


On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson 
erickerick...@gmail.com wrote:



My guess is that you're not really passing on the boost field's value
and are getting the default. Don't quite know how I'd track that down,
though.

Best
Erick

On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
wrote:
 Greetings,

 I am using nutch 2.x as my datasource for Solr 4.3.0. And nutch passes 
 on
 its own boost field to my Solr schema

 <field name="boost" type="float" stored="true" indexed="false"/>

 Now, due to some reason, I always get boost = 0.0, and due to this my
 Solr document score is also always 0.0.

 Is there any way to make Solr ignore the boost field's value when
 calculating a document's score?

 Regards,
 Khan





Re: Document count mismatch

2013-07-09 Thread Furkan KAMACI
Ok, one more question. I have another field in my schema: url. How can I
get the urls in each facet bucket?

2013/7/9 Jack Krupansky j...@basetechnology.com

 1. Try facet.missing=true to count the number of documents that do not
 have a value for that field.

 2. Try facet.limit=n to set the number of returned facet values to a
 larger or smaller value than the default of 100.

 3. Try reading the Faceting chapter of my book!

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, July 09, 2013 8:09 AM
 To: solr-user@lucene.apache.org
 Subject: Document count mismatch


 I've run a command to find term counts at my index:

 solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

 it gives me a result like that:

 ...
 <result name="response" numFound="3245092" start="0"
 maxScore="1.0"></result>
 ...
 <lst name="teno">
 <int name="lev">3107206</int>
 <int name="tenu">59821</int>
 ...

 When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
 numFound=3245092. How does that come about?

 PS: The returned list has 100 elements. Does Solr return at most 100 elements

 in such situations?



Re: two types of answers in my query

2013-07-09 Thread Jack Krupansky
Usually a car term and a car part term will look radically different. So, 
simply use the edismax query parser and set qf to be both the car and car 
part fields. If either matches, the document will be selected. And if you 
have a type field, you can check that to see if a car or part was matched 
in the results.
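
For illustration, a sketch of such a request (the car_name, part_sku, and type 
field names are hypothetical; the space in qf must be URL-encoded in a real 
request):

select?defType=edismax&q=USER_INPUT&qf=car_name^2.0 part_sku&fl=id,type,score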


-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Tuesday, July 09, 2013 2:38 AM
To: solr-user@lucene.apache.org
Subject: two types of answers in my query

Hi,
A general question:


Let's say I have a Car and CarParts 1:n relation.

And I have discovered that the user had entered in the search field, instead
of a car name, a part serial number (SKU).
(I discovered it using a regex.)

Is there a way to fetch different types of answers in Solr?
Is there a way to fetch mixed types in the answers?
Is there something similar to that, and what is that feature called?

Thank you. 



Re: Document count mismatch

2013-07-09 Thread Jack Krupansky

I don't quite follow the question. Give us an example.

-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI 
Sent: Tuesday, July 09, 2013 9:37 AM 
To: solr-user@lucene.apache.org 
Subject: Re: Document count mismatch 


Ok, one more question. I have another field in my schema: url. How can I
get the urls in each facet bucket?

2013/7/9 Jack Krupansky j...@basetechnology.com


1. Try facet.missing=true to count the number of documents that do not
have a value for that field.

2. Try facet.limit=n to set the number of returned facet values to a
larger or smaller value than the default of 100.

3. Try reading the Faceting chapter of my book!

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Tuesday, July 09, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Document count mismatch


I've run a command to find term counts at my index:

solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

it gives me a result like that:

...
<result name="response" numFound="3245092" start="0"
maxScore="1.0"></result>
...
<lst name="teno">
<int name="lev">3107206</int>
<int name="tenu">59821</int>
...

When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
numFound=3245092. How does that come about?

PS: The returned list has 100 elements. Does Solr return at most 100 elements

in such situations?



Re: Phrase search without stopwords

2013-07-09 Thread Parul Gupta(Knimbus)
Hey thanks.


It somewhat works for me.







Re: Document count mismatch

2013-07-09 Thread Furkan KAMACI
I have another field in my schema: url. When I get results as facets,
I see that there are 3107206 documents for lev (<int
name="lev">3107206</int>). But what are the urls of those 3107206
documents? I tried grouping instead of faceting:

/solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url

and I get only one result for each group. I want to get all of them. On the
other hand, if I change my query to:

/solr/select/?q=*:*&group=true&group.field=lang&wt=xml&fl=url&group.query=teno:lev

I get this error:

<str name="msg">shard 0 did not set sort field values (FieldDoc.fields is
null); you must pass fillFields=true to IndexSearcher.search on each
shard</str><str name="trace">java.lang.IllegalArgumentException: shard 0
did not set sort field values (FieldDoc.fields is null); you must pass
fillFields=true to IndexSearcher.search on each shard
at org.apache.lucene.search.TopDocs$MergeSortQueue.<init>(TopDocs.java:143)
at org.apache.lucene.search.TopDocs.merge(TopDocs.java:214)
...
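
(For reference: the usual way to list the documents behind a single facet 
bucket is a filter query on that facet value rather than grouping, paging 
through with start/rows:

/solr/select/?q=*:*&fq=teno:lev&fl=url&rows=100&start=0 )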





2013/7/9 Jack Krupansky j...@basetechnology.com

 I don't quite follow the question. Give us an example.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI Sent: Tuesday, July 09,
 2013 9:37 AM To: solr-user@lucene.apache.org Subject: Re: Document count
 mismatch
 Ok, one more question. I have another field in my schema: url. How can I

 get the urls in each facet bucket?

 2013/7/9 Jack Krupansky j...@basetechnology.com

  1. Try facet.missing=true to count the number of documents that do not
 have a value for that field.

 2. Try facet.limit=n to set the number of returned facet values to a
 larger or smaller value than the default of 100.

 3. Try reading the Faceting chapter of my book!

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, July 09, 2013 8:09 AM
 To: solr-user@lucene.apache.org
 Subject: Document count mismatch


 I've run a command to find term counts at my index:

 solr/select/?q=*:*&rows=0&facet=on&facet.field=teno&wt=xml&indent=on

 it gives me a result like that:

 ...
 <result name="response" numFound="3245092" start="0"
 maxScore="1.0"></result>
 ...
 <lst name="teno">
 <int name="lev">3107206</int>
 <int name="tenu">59821</int>
 ...

 When I sum those numbers (3107206 + 59821 + ...) I get 3245074; however,
 numFound=3245092. How does that come about?

 PS: The returned list has 100 elements. Does Solr return at most 100 elements

 in such situations?




Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Shawn Heisey
On 7/8/2013 11:10 PM, Learner wrote:
 
 I wrote a custom data import handler to import data from files. I am trying
 to figure out a way to make an asynchronous call instead of waiting for the
 data import response. Is there an easy way to invoke it asynchronously (other
 than using futures and callables)?
 
 public class CustomFileImportHandler extends RequestHandlerBase
     implements SolrCoreAware {
   public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1) {
     indexer a = new indexer();  // constructor
     String status = a.Index();  // method that does the indexing; this is the
                                 // call I am trying to make asynchronous
   }
 }

Generally speaking, it's easier to write a separate program than write a
Solr plugin, unless you just want to add a tiny tweak to an existing
class and not make fundamental changes in how it works.  The dataimport
handler is designed around a model of starting and frequently checking
the status to know whether it's done.

For what you want to do, I'd write a subroutine, module, or a separate
program using a Solr API for your language that obtains the data from
the source and indexes it to Solr directly.  This is definitely the
preferred method if your code is written in Java, but it's generally the
right way to go no matter what language you're using.
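
For illustration, a minimal sketch of that approach with SolrJ 4.x (the URL, 
the id/body field names, and the one-file-one-document mapping are 
placeholders, not your real schema):

import java.io.IOException;
import java.nio.file.*;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FileIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Buffers adds and streams them from background threads, so the
        // caller never blocks waiting on Solr.
        ConcurrentUpdateSolrServer solr =
            new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path file : files) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", file.getFileName().toString());          // hypothetical field
                doc.addField("body", new String(Files.readAllBytes(file))); // hypothetical field
                solr.add(doc);
            }
        }
        solr.blockUntilFinished(); // wait for the queued adds to drain
        solr.commit();
        solr.shutdown();
    }
}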

Thanks,
Shawn



Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Mark Miller
Something is wrong if it actually takes 20 minutes.


- Mark

On Jul 9, 2013, at 7:43 AM, Ranjith Venkatesan ranjit...@zohocorp.com wrote:

 Hi,
 
 I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
 cluster on 3 machines. If I kill a node running on any of the machines using
 kill -9, the status of the killed node is not updated immediately in the web
 console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 
 
 My questions are:
 
 1. Why does it take so much time to update the status of the inactive node?
 
 2. And if the leader node itself is killed, I can't use the
 service till the status of the node gets updated.
 
 
 Thanks in advance
 
 
 Ranjith Venkatesan
 
 
 



Is there an easy way to know if a Solr cloud node is a shard leader?

2013-07-09 Thread Robert Stewart
I would like to be able to do it without consulting Zookeeper. Is there some 
variable or API I can call on a specific Solr cloud node to know if it is 
currently a shard leader?  The reason I want to know is I want to perform index 
backup on the shard leader from a cron job *only* if that node is a shard 
leader.

Bob


Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Shawn Heisey
On 7/9/2013 5:43 AM, Ranjith Venkatesan wrote:
 I am new to Solr. Currently I'm using Solr 4.3.0. I have set up a SolrCloud
 cluster on 3 machines. If I kill a node running on any of the machines using
 kill -9, the status of the killed node is not updated immediately in the web
 console of Solr. It takes nearly 20+ minutes to mark it as a Gone node. 
 
 My questions are:
 
 1. Why does it take so much time to update the status of the inactive node?
 
 2. And if the leader node itself is killed, I can't use the
 service till the status of the node gets updated.

As Mark said, something is very wrong if it takes 20 minutes for the
cloud state to update.

I'm wondering why you have done a kill -9 to stop Solr?  If running a
stop command (or a standard SIGTERM) doesn't properly shut the process
down, then you may have some other underlying operating system issue
that needs to be solved, and could be causing the node status problem.

Thanks,
Shawn



Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
The same scenario happens if the network to any one of the machines is
unavailable (i.e. if we manually disconnect the network cable, the status of
the node is also not updated immediately).

Please help me with this issue.





Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Ranjith Venkatesan
We are going to use Solr in production. There are chances that a machine
might shut down due to power failure, or that the network is disconnected
due to manual intervention. We need to address those cases as well to build
a robust system.





Re: Solr Live Nodes not updating immediately

2013-07-09 Thread Shawn Heisey
 We are going to use Solr in production. There are chances that a machine
 might shut down due to power failure, or that the network is disconnected
 due to manual intervention. We need to address those cases as well to
 build
 a robust system.

The latest version of Solr is 4.3.1, and 4.4 is right around the corner.
Any chance you can test a nightly 4.4 build or a checkout of the
lucene_solr_4_4 branch, so we can know whether you are running into the
same problems with what will be released soon? No sense in fixing a
problem that no longer exists.

Thanks,
Shawn




Re: Field not available on Edismax query

2013-07-09 Thread Alexandre Rafalovitch
On Tue, Jul 9, 2013 at 6:29 AM, It-forum it-fo...@meseo.fr wrote:

 However when I use an edismax query with the following details, I'm not able
 to retrieve the field tag. And it seems that it is not taken into account in
 the match score either.


You seem to have two problems here: one of not matching (use the debug flags
for that) and one of not retrieving. But what do you mean by not retrieving?
By default all stored fields are returned regardless of the query. So if you
are getting the field in one case but not in another, you might be either
getting different documents without that field populated, or you have
explicitly mis-defined which fields to return (with the 'fl' parameter).
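
For the matching half of the problem, a sketch: appending debugQuery=true to 
the request adds an explain section showing, per document, which clauses and 
fields actually matched and contributed to the score, e.g.

select?defType=edismax&q=...&qf=...&debugQuery=true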

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Jed Glazner
I'll give you the high level before delving deep into setup etc. I have been 
struggeling at work with a seemingly random problem when solr will hang for 
10-15 minutes during updates.  This outage always seems to immediately be 
proceeded by an EOF exception on  the replica.  Then 10-15 minutes later we see 
an exception on the leader for a socket timeout to the replica.  The leader 
will then tell the replica to recover which in most cases it does and then the 
outage is over.

Here are the setup details:

We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines. 
We have 2 active collections each with only 1 shard (we have in total about 15 
collections but most are empty or have less than 100 docs). The first index 
(collection1) is 6.5GB and has ~18M documents.  The 2nd index (collection2) is 
9GB and has about 13M documents. In all cases the leader resides on 1 server 
and the replica resides on the other.  Both servers are AWS XL High Mem 
instances. (8 CPUs @ 2.67Ghz, 70GB Ram) with the index residing on a 1TB raid 
10 using ephemeral storage disks.  We are starting solr using the embedded 
jetty with the following java memory and GC options:

-Xmx16382m -Xms4092m -XX:MaxPermSize=256m -Xss256k -XX:NewSize=1536m 
-XX:SurvivorRatio=16 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC 
-XX:ParallelCMSThreads=2 -XX:+CMSClassUnloadingEnabled 
-XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80 
-XX:+CMSParallelRemarkEnabled

Both collections receive a constant stream of updates ~10k per hour (both 
adds/deletes).  Approximately once per day the following events transpire:


 1.  We see a log entry for a distributed update that takes just over 5 ms 
followed by an EOF write exception on the replica. In all cases this exception 
is triggered by an update to the 9GB collection.
 2.  Occasionally we'll see a 503 shard update error on the leader but usually 
not.
 3.  Approximately 15 minutes after this exception we see a timeout error for 
this distributed update request on the leader.
 4.  The leader then creates a new connection and tells the replica to recover, 
which it does and everything is OK again.
 5.  During the 15 minute window from when the replica throws the EOF until the 
SocketTimeout by the leader no other updates are processed:

ERROR ON REPLICA:

Jul 8, 2013 6:38:16 PM org.apache.solr.core.SolrCore execute
INFO: [collection2_0] webapp=/solr path=/update 
params={distrib.from=http://Solr4-1-1.domain.com:8983/solr/collection2_0/&update.distrib=FROMLEADER&wt=javabin&version=2}
 status=0 QTime=50012

Jul 8, 2013 6:38:16 PM org.apache.solr.common.SolrException log
SEVERE: null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:154)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:101)
at 
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:203)
at 
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:196)
at 
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:94)
at 
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:49)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:404)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:289)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at 

Re: Is there an easy way to know if a Solr cloud node is a shard leader?

2013-07-09 Thread Mark Miller
If you call /solr/zookeeper on a specific node, that servlet will tell you - 
the output is a bit verbose for what you want, though.

- Mark

On Jul 9, 2013, at 10:36 AM, Robert Stewart robert_stew...@epam.com wrote:

 I would like to be able to do it without consulting Zookeeper. Is there some 
 variable or API I can call on a specific Solr cloud node to know if it is 
 currently a shard leader?  The reason I want to know is I want to perform 
 index backup on the shard leader from a cron job *only* if that node is a 
 shard leader.
 
 Bob
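
For a cron job, a rough sketch of that check follows. It fetches
clusterstate.json through the node's own /solr/zookeeper servlet and looks for
the leader flag next to this node's entry. The URL, the node_name value, and
the naive string matching are assumptions -- parse the JSON properly in a real
script:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class LeaderCheck {
    public static void main(String[] args) throws IOException {
        // Ask this node's zookeeper servlet for the cluster state.
        URL url = new URL(
            "http://localhost:8983/solr/zookeeper?detail=true&path=/clusterstate.json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) body.append(line);
        }
        String me = "localhost:8983_solr";       // this node's node_name (assumption)
        int at = body.indexOf(me);
        int end = at < 0 ? -1 : body.indexOf("}", at);
        // The znode data comes back JSON-escaped, so match loosely on the flag.
        boolean leader = end > at && body.substring(at, end).contains("leader");
        System.exit(leader ? 0 : 1);             // cron-friendly: 0 = leader
    }
}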



Re: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Shawn Heisey

On 7/9/2013 9:50 AM, Jed Glazner wrote:

I'll give you the high level before delving deep into the setup. I have been 
struggling at work with a seemingly random problem where Solr will hang for 
10-15 minutes during updates.  This outage always seems to be immediately 
preceded by an EOF exception on the replica.  Then 10-15 minutes later we see 
an exception on the leader for a socket timeout to the replica.  The leader 
will then tell the replica to recover, which in most cases it does, and then the 
outage is over.

Here are the setup details:

We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.


After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced 
and have since been fixed.  You're five releases and about nine months 
behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your 
configuration is up to date with changes to the example config between 
4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0 
testbed, duplicate your current problem, and upgrade the testbed to see 
if the problem goes away.  A testbed will also give you practice for a 
smooth upgrade of your production system.


Thanks,
Shawn



Staggered Replication In Solr?

2013-07-09 Thread adityab
Hi, 
Is staggered replication possible in Solr through configuration?

We are concerned about the CPU spike (80%) and GC pauses on all the slaves when
they try to replicate the updated index from repeaters. We haven't observed this
behavior in v3.5 (max spikes were 50% during replication).
In our case we have 8 slaves serving the traffic, and all start replicating
the new index at the same time. When the switch of the Reader happens after
warm-up, we see a spike in CPU and at the same time a GC pause, which causes
requests to our application to queue up and eventually fail. 

It would be good to have a throttle on the master/repeater for the max number of
replication requests to serve at a given time.

I am planning to write and schedule a script which will trigger replication in
a staggered fashion, so not all slaves are busy replicating at once. 

thanks
Aditya 





Re: Staggered Replication In Solr?

2013-07-09 Thread Shawn Heisey

On 7/9/2013 10:37 AM, adityab wrote:

Is staggered replication possible in Solr through configuration?


You wouldn't be able to do this directly without switching to completely 
manually triggered replication, but the concept of a repeater may 
interest you.


http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

You set up a limited number of slaves replicating from your master. 
Those slaves get also set up as masters, and the rest of your slaves 
replicate from those, instead of the true master.  When the index gets 
updated, the repeaters do their replication, then the other slaves 
replicate from the repeaters.
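
A minimal sketch of the repeater's solrconfig.xml (the master URL, conf files, and poll interval below are placeholders; a repeater is simply a node configured as both master and slave):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://truemaster:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>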


Thanks,
Shawn



Re: dataDir not being stored in solr.xml

2013-07-09 Thread Chris Collins
Thanks Erick  I made a private patch to the CoreContainer until the real deal.

C
On Jul 9, 2013, at 4:35 AM, Erick Erickson erickerick...@gmail.com wrote:

 There's been a lot of action around this recently, this is
 a known issue in 4.3.1.
 
 The short form is it should all be better in Solr 4.4 which
 may be out in the next couple of weeks, assuming we
 can get agreement.
 
 But look at SOLR-4862, SOLR-4910, SOLR-4982 and related issues if you want
 to see the ugly details.
 
 Best
 Erick
 
 On Tue, Jul 9, 2013 at 3:50 AM, Chris Collins ch...@geekychris.com wrote:
 I am migrating from solr 3.6 to 4.3.1.  Using the core create rest call, 
 something like:
 

 http://10.1.10.150:8090/solr/admin/cores?action=CREATE&name=foo&instanceDir=/home/solrdata/foo&persist=true&wt=json&dataDir=/home/solrdata/foo
 
 I am able to add data to the index it creates within the /home/solrdata/foo 
 directory and search it.  The solr config however does not contain the 
 dataDir path.  When the process is restarted the dataDir is set to 
 /home/solrdata and not /home/solrdata/foo.
 
 Now if I create the index, index some docs, stop the process, and manually edit 
 the solr.xml to include dataDir, search works.
 
 
 I am not sure, but it seems that in the following class dataDir is not 
 persisted, in a case that looks like work in progress for Solr 5.0.
 
CoreContainer.addPersistOneCore
 
 
 I also played with passing properties in the create args of the form:
 
property.dataDir=/home/solrdata/foo
 
 That didn't seem to help, but I may not be understanding the exact property 
 syntax.
 
 Any clues?
 
 Cheers
 
 C
 



Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Roman Chyla
Other than using futures and callables? Runnables ;-) Beyond that you would
need an asynchronous request (i.e., an async client).

But in case somebody else is looking for an easy recipe for server-side async:


public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
  if (isBusy()) {
    rsp.add("message", "Batch processing is already running...");
    rsp.add("status", "busy");
    return;
  }
  setBusy(true); // mark busy before handing off to the worker thread
  runAsynchronously(new LocalSolrQueryRequest(req.getCore(), req.getParams()));
}

private void runAsynchronously(SolrQueryRequest req) {
  final SolrQueryRequest request = req;
  // isBusy()/setBusy(), queue, thread, runSynchronously() and log are
  // members of the enclosing handler (not shown here)
  thread = new Thread(new Runnable() {
    public void run() {
      try {
        while (queue.hasMore()) {
          runSynchronously(queue, request);
        }
      } catch (Exception e) {
        log.error(e.getLocalizedMessage());
      } finally {
        request.close();
        setBusy(false); // allow the next batch request in
      }
    }
  });

  thread.start();
}
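
The busy flag plus the single worker thread mean at most one batch import runs at a time; the handler returns to the client immediately while the import continues in the background.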


On Tue, Jul 9, 2013 at 1:10 AM, Learner bbar...@gmail.com wrote:


 I wrote a custom data import handler to import data from files. I am trying
 to figure out a way to make an asynchronous call instead of waiting for the
 data import response. Is there an easy way to invoke it asynchronously (other
 than using futures and callables)?

 public class CustomFileImportHandler extends RequestHandlerBase implements SolrCoreAware {
 public void handleRequestBody(SolrQueryRequest arg0, SolrQueryResponse arg1) {
 indexer a = new indexer(); // constructor
 String status = a.Index(); // method that does the indexing; trying to make it async
 }
 }







Perl Solr help - doing *:* query

2013-07-09 Thread Shawn Heisey

This is primarily to Andy Lester, who wrote the WebService::Solr module
on CPAN, but I'll take a response from anyone who knows what I can do.

If I use the following Perl code, I get an error.  If I try to build
some other query besides *:* to request all documents, the script runs,
but the query doesn't do what I asked it to do.

http://apaste.info/3j3Q

How can I use a perl script with a proper Solr API to count the number
of documents in my Solr index?

I already have a version of my script that parses a JSON response as
plain text, but as I have just learned, it's possible to get invalid
information out of it. Specifically, the shards.info output has
multiple numFound instances in it, which broke my script. The
shards.info parameter is in the request handler defaults. I'd like to
future-proof it by using actual objects.

Thanks,
Shawn


Re: Perl Solr help - doing *:* query

2013-07-09 Thread Andy Lester

On Jul 9, 2013, at 2:48 PM, Shawn Heisey s...@elyograg.org wrote:

 This is primarily to Andy Lester, who wrote the WebService::Solr module
 on CPAN, but I'll take a response from anyone who knows what I can do.
 
 If I use the following Perl code, I get an error.

What error do you get?  Never say "I get an error."  Always say "I get this 
error: ..."

  If I try to build
 some other query besides *:* to request all documents, the script runs,
 but the query doesn't do what I asked it to do.

What DOES it do?


 http://apaste.info/3j3Q

For the sake of future readers, please put your code in the message.  This 
message will get archived, and future people reading the lists will not be able 
to read the code at some arbitrary paste site.

Shawn's code is:

use strict;
use WebService::Solr;
use WebService::Solr::Query;
use WebService::Solr::Response;

my $url = "http://idx.REDACTED.com:8984/solr/ncmain";
my $solr = WebService::Solr->new($url);
my $query = WebService::Solr::Query->new("*:*");
my $response = $solr->search($query, {'rows' => '0'});
my $numFound = $response->content->{response}->{numFound};

print "nf: $numFound\n";


xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



replication getting stuck on a file

2013-07-09 Thread Petersen, Robert
Hi 

My Solr 3.6.1 slave farm is suddenly getting stuck during replication.  It 
seems to stop on a random file on various slaves (not all) and not continue.  
I've tried stopping and restarting Tomcat, etc., but some slaves just can't get the 
index pulled down.  Note there is plenty of space on the hard drive.  I don't 
get it.  Everything else seems fine.  Does this ring a bell for anyone?  I have 
the slaves set for five-minute polling intervals.

Here is what I see in the admin page; it just stays on that one file and won't get 
past it while the speed steadily averages down to 0 KB/s:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670084, Generation: 127202
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.06 GB
Times Replicated Since Startup: 48903
Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status: Start Time: Tue Jul 09 12:55:00 EDT 2013
Files Downloaded: 59 / 486
Downloaded: 88.73 MB / 23.06 GB [0.0%]
Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s


Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


  




AW: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Jed Glazner
Hi Shawn,

I have been trying to duplicate this problem without success for the last 2 
weeks, which is one reason I'm getting flustered.  It seems reasonable to be 
able to duplicate it, but I can't.

We do have a story to upgrade, but that is still weeks if not months from 
being rolled out to production.

We have another cluster running the same version but with 8 shards and 8 
replicas, each shard at 100 GB, with more load and more indexing requests, 
without this problem. But there we send docs in batches and all fields are 
stored, whereas the troubled index has only 1 or 2 stored fields and we only 
send docs 1 at a time.

Could that have anything to do with it?

Jed


Sent from Samsung Mobile



 Original message 
From: Shawn Heisey s...@elyograg.org
Date: 07.09.2013 18:33 (GMT+01:00)
To: solr-user@lucene.apache.org
Subject: Re: Solr Hangs During Updates for over 10 minutes


On 7/9/2013 9:50 AM, Jed Glazner wrote:
 I'll give you the high level before delving deep into setup, etc. I have been 
 struggling at work with a seemingly random problem where Solr will hang for 
 10-15 minutes during updates.  This outage always seems to be immediately 
 preceded by an EOF exception on the replica.  Then 10-15 minutes later we 
 see an exception on the leader for a socket timeout to the replica.  The 
 leader will then tell the replica to recover, which in most cases it does, and 
 then the outage is over.

 Here are the setup details:

 We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.

After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
and have since been fixed.  You're five releases and about nine months
behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your
configuration is up to date with changes to the example config between
4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0
testbed, duplicate your current problem, and upgrade the testbed to see
if the problem goes away.  A testbed will also give you practice for a
smooth upgrade of your production system.

Thanks,
Shawn



Re: Perl Solr help - doing *:* query

2013-07-09 Thread Shawn Heisey

On 7/9/2013 2:02 PM, Andy Lester wrote:
What error do you get? Never say "I get an error." Always say "I get 
this error: ..."


This is the actual error when trying *:* :

Can't locate object method "_struct_" via package 
"WebService::Solr::Query" at 
/usr/local/share/perl/5.14.2/WebService/Solr/Query.pm line 37.



  If I try to build
some other query besides *:* to request all documents, the script runs,
but the query doesn't do what I asked it to do.

What DOES it do?


If I change the query line to this:

my $query = WebService::Solr::Query->new({tag_id => '[* TO *]'});

With this, numFound is zero.  The tag_id field is my uniqueKey, and is a 
StrField.  When I use Dumper to print out the actual response from this 
query, it contains the following info:


'q' => '"(tag_id:\\[\\* TO \\*\\])"',

I didn't ask for a phrase search (the quotes) or for escaping on the 
special query characters.  By automatically doing this, it makes complex 
queries like ranges impossible.  Is there something else that should be 
done for more complex queries?


Thanks,
Shawn



Deleted Docs

2013-07-09 Thread Katie McCorkell
Hello,

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?

Thanks so much!! :)
Katie


Re: Deleted Docs

2013-07-09 Thread Jack Krupansky
Solr (Lucene, actually) will be doing segment merge operations in the 
background, continually, so generally you won't need to do optimize 
operations.


Generally, an explicit delete and a replace of an existing document are the 
only two ways that you would get a deleted document.
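
A minimal SolrJ sketch of the replace case (the core URL and field names are placeholders): re-adding a document with the same uniqueKey leaves the old version behind as a deleted doc until a merge purges it.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("title", "first version");
solr.add(doc);
solr.commit();

doc.setField("title", "second version");
solr.add(doc);   // same uniqueKey: replaces doc-1, the old copy counts as deleted
solr.commit();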


-- Jack Krupansky

-Original Message- 
From: Katie McCorkell

Sent: Tuesday, July 09, 2013 5:38 PM
To: solr-user@lucene.apache.org
Subject: Deleted Docs

Hello,

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?

Thanks so much!! :)
Katie 



Re: Deleted Docs

2013-07-09 Thread Shawn Heisey

On 7/9/2013 3:38 PM, Katie McCorkell wrote:

I am curious about the Deleted Docs: statistic on the solr/#/collection1
Overview page. Does Solr remove docs while indexing? I thought it only did
that when Optimizing, however my instance had 726 Deleted Docs, but then
after adding some documents that number decreased, eventually to 18 Deleted
Docs.

I understood these Deleted Docs are from situations where two docs have the
same UniqueKey. However my data had way more deleted docs than I expected.
I was using a data-generated uniquekey, when I changed to using the UUID
generator there were 0 deleted docs. But I just wanted to double check, are
there any other cases which would create a Deleted Doc?


Changes to deleted documents can happen through normal segment merging. 
 Optimizing is just an explicit and deliberate merge down to a single 
segment, but segment merging is a normal part of Solr/Lucene indexing. 
Any deleted documents in segments that get merged will be purged.


I believe the UUID generator will always generate a new value even if a 
document with the same information in the other fields is indexed again. 
 This option should only be used if you do not have an existing field 
with unique values on every document.


Thanks,
Shawn



RE: replication getting stuck on a file

2013-07-09 Thread Petersen, Robert
Look at the speed and time remaining on this one, pretty funny:


Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670202, Generation: 127213
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670108, Generation: 127204
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.13 GB
Times Replicated Since Startup: 48874
Previous Replication Done At: Tue Jul 09 13:12:05 PDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:17:04 PDT 2013
Current Replication Status: Start Time: Tue Jul 09 13:12:04 PDT 2013
Files Downloaded: 10 / 538
Downloaded: 1.67 MB / 23.13 GB [0.0%]
Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s


-Original Message-
From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] 
Sent: Tuesday, July 09, 2013 1:22 PM
To: solr-user@lucene.apache.org
Subject: replication getting stuck on a file

Hi 

My Solr 3.6.1 slave farm is suddenly getting stuck during replication.  It 
seems to stop on a random file on various slaves (not all) and not continue.  
I've tried stopping and restarting Tomcat, etc., but some slaves just can't get the 
index pulled down.  Note there is plenty of space on the hard drive.  I don't 
get it.  Everything else seems fine.  Does this ring a bell for anyone?  I have 
the slaves set for five-minute polling intervals.

Here is what I see in the admin page; it just stays on that one file and won't get 
past it while the speed steadily averages down to 0 KB/s:

Master: http://ssbuyma01:8983/solr/1/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index: Index Version: 1276893670084, Generation: 127202
Location: /var/LucidWorks/lucidworks/solr/1/data/index
Size: 23.06 GB
Times Replicated Since Startup: 48903
Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status: Start Time: Tue Jul 09 12:55:00 EDT 2013
Files Downloaded: 59 / 486
Downloaded: 88.73 MB / 23.06 GB [0.0%]
Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s


Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


  






Re: Solr Hangs During Updates for over 10 minutes

2013-07-09 Thread Otis Gospodnetic
Hi Jed,

This is really with Solr 4.0?  If so, it may be wiser to jump on 4.4
that is about to be released.  We did not have fun working with 4.0 in
SolrCloud mode a few months ago.  You will save time, hair, and money
if you convince your manager to let you use Solr 4.4. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner jglaz...@adobe.com wrote:
 Hi Shawn,

 I have been trying to duplicate this problem without success for the last 2 
 weeks, which is one reason I'm getting flustered.  It seems reasonable to be 
 able to duplicate it, but I can't.

  We do have a story to upgrade, but that is still weeks if not months from 
 being rolled out to production.

 We have another cluster running the same version but with 8 shards and 8 
 replicas, each shard at 100 GB, with more load and more indexing requests, 
 without this problem. But there we send docs in batches and all fields are 
 stored, whereas the troubled index has only 1 or 2 stored fields and we only 
 send docs 1 at a time.

 Could that have anything to do with it?

 Jed


 Sent from Samsung Mobile



  Original message 
 From: Shawn Heisey s...@elyograg.org
 Date: 07.09.2013 18:33 (GMT+01:00)
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Hangs During Updates for over 10 minutes


 On 7/9/2013 9:50 AM, Jed Glazner wrote:
 I'll give you the high level before delving deep into setup, etc. I have been 
 struggling at work with a seemingly random problem where Solr will hang for 
 10-15 minutes during updates.  This outage always seems to be immediately 
 preceded by an EOF exception on the replica.  Then 10-15 minutes later we 
 see an exception on the leader for a socket timeout to the replica.  The 
 leader will then tell the replica to recover, which in most cases it does, and 
 then the outage is over.

 Here are the setup details:

 We are currently using Solr 4.0.0 with an external ZK ensemble of 5 machines.

 After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
 and have since been fixed.  You're five releases and about nine months
 behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your
 configuration is up to date with changes to the example config between
 4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0
 testbed, duplicate your current problem, and upgrade the testbed to see
 if the problem goes away.  A testbed will also give you practice for a
 smooth upgrade of your production system.

 Thanks,
 Shawn



join not working with UUIDs

2013-07-09 Thread Marcelo Elias Del Valle
Hello,

I am trying to create a POC to test query joins. However, I was
surprised to see my test work with some ids, but when my document ids
are UUIDs, it doesn't work.
An example follows, using SolrJ:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

 When I execute:

///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}

SolrQuery query = new SolrQuery();
query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

   it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

   Any idea why? What am I missing?

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: join not working with UUIDs

2013-07-09 Thread Jack Krupansky
Your join is requesting to use the join_id field (from) of documents 
matching the query of cor_parede:branca, but the join_id field of that 
document is empty.


Maybe you intended to search in the other direction, like 
acessorio1:Teclado.


-- Jack Krupansky

-Original Message- 
From: Marcelo Elias Del Valle

Sent: Tuesday, July 09, 2013 7:34 PM
To: solr-user@lucene.apache.org
Subject: join not working with UUIDs

Hello,

   I am trying to create a POC to test query joins. However, I was
surprised when I saw my test worked with some ids, but when my document ids
are UUIDs, it doesn't work.
   Follows an example, using solrj:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute:

   ///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}
   SolrQuery query = new SolrQuery();

query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

  it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

  Any idea why? What am I missing?

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr 



Re: join not working with UUIDs

2013-07-09 Thread Jack Krupansky

Oops... I misread and confused your q and fq params.

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Tuesday, July 09, 2013 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: join not working with UUIDs

Your join is requesting to use the join_id field (from) of documents
matching the query of cor_parede:branca, but the join_id field of that
document is empty.

Maybe you intended to search in the other direction, like
acessorio1:Teclado.

-- Jack Krupansky

-Original Message- 
From: Marcelo Elias Del Valle

Sent: Tuesday, July 09, 2013 7:34 PM
To: solr-user@lucene.apache.org
Subject: join not working with UUIDs

Hello,

   I am trying to create a POC to test query joins. However, I was
surprised when I saw my test worked with some ids, but when my document ids
are UUIDs, it doesn't work.
   Follows an example, using solrj:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
doc.addField("cor_parede", "branca");
doc.addField("num_cadeiras", 34);
solr.add(doc);

// Add children
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "computador1");
doc2.addField("acessorio1", "Teclado");
doc2.addField("acessorio2", "Mouse");
doc2.addField("root_id", "bcbaf9eb-0da7-4225-be24-2b9472ad2c20");
solr.add(doc2);

When I execute:

   ///select
params={start=0&rows=10&q=cor_parede%3Abranca&fq=%7B%21join+from%3Droot_id+to%3Did%7Dacessorio1%3ATeclado}
   SolrQuery query = new SolrQuery();

query.setStart(0);
query.setRows(10);
query.set("q", "cor_parede:branca");
query.set("fq", "{!join from=root_id to=id}acessorio1:Teclado");

QueryResponse response = DGSolrServer.get().query(query);
long numFound = response.getResults().getNumFound();

  it returns zero results. However, if I use "room1" for the first
document's id and for the root_id field on the second document, it works.

  Any idea why? What am I missing?

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr 



Overseer queues confused me

2013-07-09 Thread Illu.Y.Ying (mis.sh04.Newegg) 41417
Hi there:
 In the Solr 4.3 source code, I found that the overseer uses 3 queues to handle 
all SolrCloud management requests:
 1: /overseer/queue
2: /overseer/queue-work
3: /overseer/collection-queue-work

 ClusterStateUpdater uses the 1st & 2nd queues to handle SolrCloud shard or 
state requests.
 It peeks a request from the 1st queue, then offers it to the 2nd queue and 
handles it.

 OverseerCollectionProcessor uses the 3rd queue to handle collection-related 
requests.

 My question is: why does ClusterStateUpdater use 2 queues, while 
OverseerCollectionProcessor can handle requests correctly with only 1?
 Is there some additional design intent behind ClusterStateUpdater?


 Thanks in advance :)


Best Regards,
Illu Ying
Assistant Supervisor, NESC-SH.MIS
+86-021-51530666*41417
Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)
ONCE YOU KNOW, YOU NEWEGG.
CONFIDENTIALITY NOTICE: This email and any files transmitted with it may 
contain privileged or otherwise confidential information. It is intended only 
for the person or persons to whom it is addressed. If you received this message 
in error, you are not authorized to read, print, retain, copy, disclose, 
disseminate, distribute, or use this message any part thereof or any 
information contained therein. Please notify the sender immediately and delete 
all copies of this message. Thank you in advance for your cooperation.



Norms

2013-07-09 Thread William Bell
I have a field that has omitNorms=true, but when I look at debugQuery I see
that the field is being normalized in the score.

What can I do to turn off normalization in the score?

I want a simple way to do 2 things:

boost geodist() highest at 1 mile and lowest at 100 miles,
plus add a boost for a query = edgefield^5.

I only want tf() and no queryNorm. I am not even sure I want idf(), but I
can probably live with rare names being boosted.



The results are being normalized. See below. I tried dismax and edismax -
bf, bq and boost.

<requestHandler name="autoproviderdist" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="fl">display_name,city_state,prov_url,pwid,city_state_alternative</str>
    <!--
    <str name="bq">_val_:"sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)"^10</str>
    -->
    <str name="boost">sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)</str>
    <int name="rows">5</int>
    <str name="q.alt">*:*</str>
    <str name="qf">name_edgy^.9 name_edge^.9 name_word</str>
    <str name="group">true</str>
    <str name="group.field">pwid</str>
    <str name="group.main">true</str>
    <!-- <str name="pf">name_edgy</str> do not turn on -->
    <str name="sort">score desc, last_name asc</str>
    <str name="d">100</str>
    <str name="pt">39.740112,-104.984856</str>
    <str name="sfield">store_geohash</str>
    <str name="hl">false</str>
    <str name="hl.fl">name_edgy</str>
    <str name="mm">2-1 4-2 6-3</str>
  </lst>
</requestHandler>

0.058555886 = queryNorm

product of:
  10.854807 = (MATCH) sum of:
    1.8391232 = (MATCH) max plus 0.01 times others of:
      1.8214592 = (MATCH) weight(name_edge:paul^0.9 in 231378), product of:
        0.30982485 = queryWeight(name_edge:paul^0.9), product of:
          0.9 = boost
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          *0.058555886 = queryNorm*
        5.8789964 = (MATCH) fieldWeight(name_edge:paul in 231378), product of:
          1.0 = tf(termFreq(name_edge:paul)=1)
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edge, doc=231378)
      1.7664119 = (MATCH) weight(name_edgy:paul^0.9 in 231378), product of:
        0.30510724 = queryWeight(name_edgy:paul^0.9), product of:
          0.9 = boost
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          *0.058555886 = queryNorm*
        5.789479 = (MATCH) fieldWeight(name_edgy:paul in 231378), product of:
          1.0 = tf(termFreq(name_edgy:paul)=1)
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
    9.015684 = (MATCH) max plus 0.01 times others of:
      8.9352665 = (MATCH) weight(name_word:nutting in 231378), product of:
        0.72333425 = queryWeight(name_word:nutting), product of:
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          0.058555886 = queryNorm
        12.352887 = (MATCH) fieldWeight(name_word:nutting in 231378), product of:
          1.0 = tf(termFreq(name_word:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_word, doc=231378)
      8.04174 = (MATCH) weight(name_edgy:nutting^0.9 in 231378), product of:
        0.65100086 = queryWeight(name_edgy:nutting^0.9), product of:
          0.9 = boost
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          *0.058555886 = queryNorm*
        12.352887 = (MATCH) fieldWeight(name_edgy:nutting in 231378), product of:
          1.0 = tf(termFreq(name_edgy:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
  1.0855998 = sum(6.0/(0.5*float(geodist(39.74168747663498,-104.9849385023117,39.740112,-104.984856))+6.0),const(0.1))



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Surround query parser not working?

2013-07-09 Thread William Bell
Can we get a sample fieldType and field definition?

Thanks.


On Mon, Jul 8, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.com wrote:

 Yes, you should be able to use nested query parsers to mix the queries.
 Solr 4.1(?) made it easier. A sketch follows below.

 -- Jack Krupansky
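
A hedged sketch of what that nesting could look like via the _query_ hook (the fields and terms below are placeholders, and note the surround clause still needs its own df, as discussed above):

q=+_query_:"{!edismax qf='title text'}apache lucene" +_query_:"{!surround df=text}3w(apache, lucene)"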

 -Original Message- From: Abeygunawardena, Niran
 Sent: Monday, July 08, 2013 7:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Surround query parser not working?


 Hi,

 Thanks. I found out that my issue was that the default field (df) was being
 ignored and I had to specify the parameter by adding df=text in the URL.

 Thank you for updating the wiki page on the surround parser:
 http://wiki.apache.org/solr/SurroundQueryParser

 Hopefully, ordered proximity searches will be supported in the edismax
 query parser itself as the surround query parser is not as good as the
 edismax parser: https://issues.apache.org/jira/browse/SOLR-3101
 Is there a way to AND the surround parser query with the edismax query so
 the ordered proximity search can be run through the surround query parser
 and the results combined/queried with the edismax query parser for other
 parts of the query? Can nested queries support this?

 Thanks,
 Niran


 Niran -

 Looks like you're being bitten by a known feature* of the surround query
 parser.  It does not analyze the text, as some of the other more commonly
 used query parsers do.  The dismax, edismax, and lucene query parsers
 all leverage field analysis on the query terms or phrases. The surround
 query parser just takes the terms as-is.  It's by design, but not
 necessarily something that can't at least be optionally available.  But as
 it is, you'll need to lowercase, at least.  Be careful with index-time
 stemming, as you'd have to account for that in the surround query parser
 syntax by wildcarding things a bit.  Instead of searching for "finding",
 one would use "find*" (and index without stemming) in the query to match
 "finds", "finding".  It was by design to not analyze in the surround query
 parser because it can be handy to use fewer analysis tricks at index time,
 and let the query itself be more sophisticated to allow more flexible and
 indeed more complex query-time constructs.

Erik

 * http://wiki.apache.org/solr/SurroundQueryParser#Limitations -
  though it'd be useful to have analysis at least optionally available.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Calculating Solr document score by ignoring the boost field.

2013-07-09 Thread Tony Mullins
Jack, due to 'some' reason my Nutch is returning an index-time boost = 0.0,
and just for a moment suppose that Nutch always will return boost = 0.

Now my simple question was: why is Solr showing me a document score = 0?
Why does it depend upon the index-time boost value? Why, or how, can I make
Solr calculate the score value on TF-IDF only?

Regards,
Khan


On Tue, Jul 9, 2013 at 6:31 PM, Jack Krupansky j...@basetechnology.com wrote:

 Simple math: x times zero equals zero.

 That's why the default document boost is 1.0 - score times 1.0 equals
 score.

 Any particular reason you wanted to zero out the document score from the
 document level?

 -- Jack Krupansky

 -Original Message- From: Tony Mullins
 Sent: Tuesday, July 09, 2013 9:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Calculating Solr document score by ignoring the boost field.


 I am passing the boost value (via Nutch), i.e. boost = 0.0.
 But my question is why Solr is showing me score = 0.0 when my boost (index-
 time boost) = 0.0?
 Should not Solr calculate its document scores on the basis of TF-IDF? And
 if not, how can I make Solr consider only TF-IDF while calculating a
 document's score?

 Regards,
 Khan


 On Tue, Jul 9, 2013 at 4:46 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  My guess is that you're not really passing on the boost field's value
 and getting the default. Don't quite know how I'd track that down
 though

 Best
 Erick

 On Tue, Jul 9, 2013 at 4:09 AM, imran khan imrankhan.x...@gmail.com
 wrote:
  Greetings,
 
  I am using Nutch 2.x as my data source for Solr 4.3.0, and Nutch passes on
  its own boost field to my Solr schema:
 
  <field name="boost" type="float" stored="true" indexed="false"/>
 
  Now due to some reason I always get boost = 0.0, and due to this my Solr
  document score is also always 0.0.
 
  Is there any way to have Solr ignore the boost field's value in its
  document score calculation?
 
  Regards,
  Khan