Re: SolrEntityProcessor gets slower and slower

2013-07-22 Thread Manuel Le Normand
 Mingfeng - This issue gets tougher as the number of shards you have rises; you
can read Erick Erickson's post:
http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works.
With 100M docs I guess you are running into this issue.

The common way to deal with this issue is to filter on a value that
returns fewer results per query, such as a creation_date field, and to
change that field's range on every query. For your data import use-case you
might want to generate your data-import.xml with several entities, one per
creation_date range, as sketched below. Thus there is no need for deep paging.
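
For illustration, a minimal data-config.xml along those lines (the host,
field name and date ranges are hypothetical, not taken from the original
setup):

<dataConfig>
  <document>
    <entity name="sep_2012" processor="SolrEntityProcessor"
            url="http://source-host:8983/solr/" rows="2000"
            query="creation_date:[2012-01-01T00:00:00Z TO 2013-01-01T00:00:00Z}"/>
    <entity name="sep_2013" processor="SolrEntityProcessor"
            url="http://source-host:8983/solr/" rows="2000"
            query="creation_date:[2013-01-01T00:00:00Z TO *]"/>
  </document>
</dataConfig>

Each entity then pages through a bounded result set instead of one
ever-deepening *:* query.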

Another option is using
http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore.
To my knowledge, implementing it in a multi-sharded environment is not
possible, since all your scores = 1.0 and results are therefore ranked per
shard (according to each shard's internal [docId]).
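
On a single shard the request would look roughly like this (the values are
placeholders; both parameters assume score-sorted results and carry over
the internal doc id and score of the last hit of the previous page):

q=*:*&rows=2000&pageDoc=<docid of last hit>&pageScore=<score of last hit>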

Caching all the query results in each shard (by raising
queryResultWindowSize in solrconfig.xml) should help, shouldn't it?


Best,

Manu


On Mon, Jun 10, 2013 at 8:56 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 SolrEntityProcessor is fine for small amounts of data but not useful for
 such a large index. The problem is that deep paging in search results is
 expensive. As the start value for a query increases so does the cost of
 the query. You are much better off just re-indexing the data.


 On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang mfy...@wisewindow.com
 wrote:

  I am trying to migrate 100M documents from a solr index (v3.6) to a
  solrcloud index (v4.1, 4 shards) by using SolrEntityProcessor. My
  data-config.xml is like

  <dataConfig>
    <document>
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
              fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url"/>
    </document>
  </dataConfig>
 
  Initially, the data import rate is about 1K docs/second, but it
  eventually decreases to 20 docs/second after running for tens of hours.

  Last time I tried data import with SolrEntityProcessor, the transfer
  rate was as high as 3K docs/second.
 
  Anyone have any clues about what could cause the slowdown?
 
  Thanks,
  Ming-
 



 --
 Regards,
 Shalin Shekhar Mangar.



DIH and tinyint(1) Field

2013-07-22 Thread deniz
Hello, 

I have exactly the same problem as here 

http://lucene.472066.n3.nabble.com/how-to-avoid-DataImportHandler-from-interpreting-quot-tinyint-1-unsigned-quot-value-as-quot-Boolean--td4035241.html#a4036967

however, the solution there is ruining my date type fields...

are there any other ways to deal with this problem? 



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and tinyint(1) Field

2013-07-22 Thread Shalin Shekhar Mangar
Your database's JDBC driver is interpreting the tinyint(1) as a boolean.

Solr 4.4 fixes the problem affecting date fields with convertType=true. It
should be released by the end of this week.


On Mon, Jul 22, 2013 at 12:18 PM, deniz denizdurmu...@gmail.com wrote:

 Hello,

 I have exactly the same problem as here


 http://lucene.472066.n3.nabble.com/how-to-avoid-DataImportHandler-from-interpreting-quot-tinyint-1-unsigned-quot-value-as-quot-Boolean--td4035241.html#a4036967

 however, the solution there is ruining my date type fields...

 are there any other ways to deal with this problem?



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
Very true. I was impatient (I think less than three minutes impatient so
hopefully 4.4 will save me from myself) but I didn't realise it was doing
something rather than just hanging. Next time I have to restart a node I'll
just leave and go get a cup of coffee or something.

My configuration is set to auto hard-commit every 5 minutes. No auto
soft-commit time is set.

Over the course of the weekend, while left unattended, the nodes have been
going up and down (I've got to solve the issue that is causing them to come
and go, but any suggestions on what is likely to be causing something like
that are welcome); at one point one of the nodes stopped taking updates.
After indexing properly for a few hours with that one shard not accepting
updates, the replica of that shard which contains all the correct documents
must have replicated from the broken node and dropped documents. Is there
any protection against this in Solr or should I be focusing on getting my
nodes to be more reliable? I've now got a situation where four of my five
shards have leaders who are marked as down and followers who are up.

I'm going to start grabbing information about the cluster state so I can
track which changes are happening and in what order. I can get hold of Solr
logs and garbage collection logs while these things are happening.

Is this all just down to my nodes being unreliable?


On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:

 Well, if I'm reading this right you had a node go out of circulation
 and then bounced nodes until that node became the leader. So of course
 it wouldn't have the documents (how could it?). Basically you shot
 yourself in the foot.

 Underlying here is why it took the machine you were re-starting so
 long to come up that you got impatient and started killing nodes.
 There has been quite a bit done to make that process better, so what
 version of Solr are you using? 4.4 is being voted on right now, so
 you might want to consider upgrading.

 There was, for instance, a situation where it would take 3 minutes for
 machines to start up. How impatient were you?

 Also, what are your hard commit parameters? All of the documents
 you're indexing will be in the transaction log between hard commits,
 and when a node comes up the leader will replay everything in the tlog
 to the new node, which might be a source of why it took so long for
 the new node to come back up. At the very least the new node you were
 bringing back online will need to do a full index replication (old
 style) to get caught up.

 Best
 Erick

 On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser neil.pros...@gmail.com
 wrote:
  While indexing some documents to a SolrCloud cluster (10 machines, 5
 shards
  and 2 replicas, so one replica on each machine) one of the replicas
 stopped
  receiving documents, while the other replica of the shard continued to
 grow.
 
  That was overnight so I was unable to track exactly what happened (I'm
  going off our Graphite graphs here). This morning when I was able to look
  at the cluster both replicas of that shard were marked as down (with one
  marked as leader). I attempted to restart the non-leader node but it
 took a
  long time to restart so I killed it and restarted the old leader, which
  also took a long time. I killed that one (I'm impatient) and left the
  non-leader node to restart, not realising it was missing approximately
 700k
  documents that the old leader had. Eventually it restarted and became
  leader. I restarted the old leader and it dropped the number of documents
  it had to match the previous non-leader.
 
  Is this expected behaviour when a replica with fewer documents is started
  before the other and elected leader? Should I have been paying more
  attention to the number of documents on the server before restarting
 nodes?
 
  I am still in the process of tuning the caches and warming for these
  servers but we are putting some load through the cluster so it is
 possible
  that the nodes are having to work quite hard when a new version of the
 core
  comes is made available. Is this likely to explain why I occasionally see
  nodes dropping out? Unfortunately in restarting the nodes I lost the GC
  logs to see whether that was likely to be the culprit. Is this the sort
 of
  situation where you raise the ZooKeeper timeout a bit? Currently the
  timeout for all nodes is 15 seconds.
 
  Are there any known issues which might explain what's happening? I'm just
  getting started with SolrCloud after using standard master/slave
  replication for an index which has got too big for one machine over the
  last few months.
 
  Also, is there any particular information that would be helpful to help
  with these issues if it should happen again?



highlighting required in document

2013-07-22 Thread Jamshaid Ashraf
Hi,

I'm using solr 4.3.0 and the following is the response to a hit highlighting
request:

Request: http://localhost:8080/solr/collection2/select?q=content:ps4&hl=true

Response:

<doc>
  <arr name="content"><str>This post is regarding ps4 accuracy and qulaity
  which is smooth and factastic</str></arr>
</doc>
<lst name="highlighting">
  <lst name="1">
    <arr name="content"><str>This post is regarding <b>ps4</b> accuracy and
    qulaity which is smooth and factastic</str></arr>
  </lst>
</lst>

I wanted the result to be like this:

<doc>
  <arr name="content"><str>This post is regarding <b>ps4</b> accuracy and
  qulaity which is smooth and factastic</str></arr>
</doc>
<lst name="highlighting">
  <lst name="1">
    <arr name="content"><str>This post is regarding <b>ps4</b> accuracy and
    qulaity which is smooth and factastic</str></arr>
  </lst>
</lst>

Thanks in advance!

Regards,
Jamshaid


Re: DIH and tinyint(1) Field

2013-07-22 Thread deniz
Shalin Shekhar Mangar wrote
 Your database's JDBC driver is interpreting the tinyint(1) as a boolean.
 
 Solr 4.4 fixes the problem affecting date fields with convertType=true. It
 should be released by the end of this week.
 
 
 On Mon, Jul 22, 2013 at 12:18 PM, deniz <denizdurmus87@...> wrote:
 
 Hello,

 I have exactly the same problem as here


 http://lucene.472066.n3.nabble.com/how-to-avoid-DataImportHandler-from-interpreting-quot-tinyint-1-unsigned-quot-value-as-quot-Boolean--td4035241.html#a4036967

 however for the solution there, it is ruining my date type fields...

 are there any other ways to deal with this problem?



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.


Thank you Shalin. As a quick solution I found that adding
&amp;tinyInt1isBit=false to the connection url also works fine.
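
For illustration, a minimal dataSource sketch with that property (the host,
database and credentials are made up); note the &amp; escaping required
inside data-config.xml when combining several URL parameters:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://db-host:3306/mydb?zeroDateTimeBehavior=convertToNull&amp;tinyInt1isBit=false"
    user="solr" password="***"/>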



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-tinyint-1-Field-tp4079392p4079398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Mikhail Khludnev
Short answer: no - it makes no sense.

But after some thinking, it could potentially make some sense.
DisjunctionSumScorer holds child scorers semi-ordered in a binary heap.
Hypothetically an ordering could be enforced on that heap, but the heap
might no longer work for such an alignment; hence, instead of a heap, a
TreeSet could be used for an experiment.
FWIW, this is a dev-list question.


On Mon, Jul 22, 2013 at 4:48 AM, Deepak Konidena deepakk...@gmail.com wrote:

 I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
 shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
 one can't treat them as boolean operators (adhering to boolean algebra).

 I have been trying to construct a simple OR expression, as follows

 q = +(field1:value1 OR field2:value2)

 with a match on either field1 or field2. But since OR is merely OPTIONAL,
 for documents where both field1:value1 and field2:value2 match, the query
 returns a score reflecting a match on both clauses.

 How do I enforce short-circuiting in this context? In other words, how to
 implement short-circuiting as in boolean algebra where an expression A || B
 || C returns true if A is true without even looking into whether B or C
 could be true.
 -Deepak




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Regex in Stopword.xml

2013-07-22 Thread Scatman
Hi, 

I was looking for a way to put some regular expressions in the
StopWord.xml, but it seems that we can only have plain words in the file.
I'm just wondering if there is a feature that works this way, or if
someone has a tip it would help me a lot :)

Best,
Scatman.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr - Multiple Facet Exclusion for the same Field

2013-07-22 Thread Ralf Heyde

Hello,

I need different (multiple) facet exclusions for the same field. This 
approach works:


http://server/core/select/?q=*:*
 &fq={!tag=b}brand:adidas
 &fq={!tag=c}color:red
 &facet.field={!ex=b}brand
 &facet.field={!ex=c}brand
 &facet.field={!ex=b,c}brand
 &facet.field=brand
 &facet=true&facet.mincount=1

then my result provides different facets for brand.
BUT: is there any possibility to get to know which exclusion fits
which facet? Is there something like "as" in SQL (e.g.
facet.field={!ex=b as BrandB}brand)?

We are using Solr 3.6.

Hopefully this is a feature, not a bug, which we are using.

Thanks in advance.
Ralf




Re: Solr - Multiple Facet Exclusion for the same Field

2013-07-22 Thread Ralf Heyde

Just found it.
Use {!ex=c key=ckey} ... - applied to the request above, for example:
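
&facet.field={!ex=b key=brandWithoutB}brand
&facet.field={!ex=c key=brandWithoutC}brand
&facet.field={!ex=b,c key=brandWithoutBC}brand

(The key names here are made up.) Each facet then comes back in the
response under its key instead of under "brand", so you can tell which
exclusion produced which facet list.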

On 07/22/2013 11:35 AM, Ralf Heyde wrote:

Hello,

I need different (multiple) facet exclusions for the same field. This 
approach works:


http://server/core/select/?q=*:*
 &fq={!tag=b}brand:adidas
 &fq={!tag=c}color:red
 &facet.field={!ex=b}brand
 &facet.field={!ex=c}brand
 &facet.field={!ex=b,c}brand
 &facet.field=brand
 &facet=true&facet.mincount=1

then my result provides different facets for brand.
BUT: is there any possibility to get to know which exclusion fits 
which facet? Is there something like "as" in SQL (e.g. 
facet.field={!ex=b as BrandB}brand)?

We are using Solr 3.6.

Hopefully this is a feature, not a bug, which we are using.

Thanks in advance.
Ralf






Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Robert Krüger
Hi,

I use solr embedded in a desktop app and I want to change it to no
longer require the configuration for the container and core to be in
the filesystem but rather be distributed as part of a jar file.

Could someone kindly point me to the right docs?

So far my impression is that I need to instantiate CoreContainer with a
custom SolrResourceLoader, with properties parsed via some other API,
but from the javadocs alone I feel a bit lost (why does it have to
have an instance directory at all?), and googling did not give me many
results. What would be ideal would be to have something like this
(pseudocode with partly imagined names, which hopefully illustrates
what I am trying to achieve):

ContainerConfig containerConfig =
ContainerConfigParser.parse(InputStream from Classloader);
CoreContainer  container = new CoreContainer(containerConfig);

CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
from Classloader);
container.register(name, coreConfig);

Ideally I would like to keep XML format to reuse my current solr.xml
and solrconfig.xml but that is just a nice-to-have.

Does such a way exist and if so, what are the real API classes and calls to use?

Thank you in advance,

Robert


Re: Regex in Stopword.xml

2013-07-22 Thread Manuel Le Normand
Use the pattern replace filter factory


<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
        replacement=""/>

This will do exactly what you asked for


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
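
If it helps, here is how that filter might sit in a schema.xml field type
(the field type name and the rest of the analysis chain are made up for
illustration):

<fieldType name="text_cleaned" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement=""/>
  </analyzer>
</fieldType>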




On Mon, Jul 22, 2013 at 12:22 PM, Scatman alan.aron...@sfr.com wrote:

 Hi,

 I was looking for an issue, in order to put some regular expression in the
 StopWord.xml, but it seems that we can only have words in the file.
 I'm just wondering if there is a feature which will be done in this way or
 if someone got a tip it will help me a lot :)

 Best,
 Scatman.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Robert,

The upcoming 4.4 release should make this a bit easier (you can check out the 
release branch now if you like, or wait a few days for the official version).  
CoreContainer now takes a SolrResourceLoader and a ConfigSolr object as 
constructor parameters, and you can create a ConfigSolr object from a string 
representation of solr.xml using the ConfigSolr.fromString() static method.
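
A rough sketch of what that could look like, based only on the description
above (exact signatures may differ on the release branch; the resource path
and instance dir are made-up examples):

import java.io.InputStream;
import org.apache.commons.io.IOUtils;  // assumption: commons-io on the classpath
import org.apache.solr.core.ConfigSolr;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrResourceLoader;

public class EmbeddedSolrBootstrap {
    public static CoreContainer start() throws Exception {
        // read solr.xml from the application jar instead of the filesystem
        InputStream in = EmbeddedSolrBootstrap.class.getResourceAsStream("/solr.xml");
        String solrXml = IOUtils.toString(in, "UTF-8");

        // the loader still wants an instance dir for resolving relative paths
        SolrResourceLoader loader = new SolrResourceLoader("solr");
        CoreContainer container = new CoreContainer(loader, ConfigSolr.fromString(solrXml));
        container.load();
        return container;
    }
}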

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 11:41, Robert Krüger wrote:

 Hi,
 
 I use solr embedded in a desktop app and I want to change it to no
 longer require the configuration for the container and core to be in
 the filesystem but rather be distributed as part of a jar file.
 
 Could someone kindly point me to the right docs?
 
 So far my impression is, I need to instantiate CoreContainer with a
 custom SolrResourceLoader with properties parsed via some other API
 but from the javadocs alone I feel a bit lost (why does it have to
 have an instance directory at all?) and googling did not give me many
 results. What would be ideal would be to have something like this
 (pseudocode with partly imagined names, which hopefully illustrates
 what I am trying to achieve):
 
 ContainerConfig containerConfig =
 ContainerConfigParser.parse(InputStream from Classloader);
 CoreContainer  container = new CoreContainer(containerConfig);
 
 CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
 from Classloader);
 container.register(name, coreConfig);
 
 Ideally I would like to keep XML format to reuse my current solr.xml
 and solrconfig.xml but that is just a nice-to-have.
 
 Does such a way exist and if so, what are the real API classes and calls to 
 use?
 
 Thank you in advance,
 
 Robert



Re: Regex in Stopword.xml

2013-07-22 Thread Scatman
Thanks for the reply, but it's not the solution I'm looking for, and I
should have explained myself better, because I have something like 100
regexes to put in the config. To make Solr easiest to manage, I think the
better way is to put the regexes in a file... I know that the GSA from
Google does it, so I'd just hoped that it would be the case for Solr :)

Best,
Scatman. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079438.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Erick Erickson
Wow, you really shouldn't be having nodes go up and down so
frequently, that's a big red flag. That said, SolrCloud should be
pretty robust so this is something to pursue...

But even a 5 minute hard commit can lead to a hefty transaction
log under load; you may want to reduce it substantially depending
on how fast you are sending docs to the index. I'm talking
15-30 seconds here. It's critical that openSearcher be set to false
or you'll invalidate your caches that often. All a hard commit
with openSearcher set to false does is close off the current segment
and open a new one. It does NOT open/warm new searchers etc.

The soft commits control visibility, so that's how you control
whether you can search the docs or not. Pardon me if I'm
repeating stuff you already know!
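
For reference, a minimal sketch of those two settings in solrconfig.xml
(the times here are illustrative, not a recommendation):

<autoCommit>
  <maxTime>15000</maxTime>            <!-- hard commit every 15s -->
  <openSearcher>false</openSearcher>  <!-- don't open/warm a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>            <!-- when new docs become searchable -->
</autoSoftCommit>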

As far as your nodes coming and going, I've seen some people have
good results by upping the ZooKeeper timeout limit. So I guess
my first question is whether the nodes are actually going out of service
or whether it's just a timeout issue

Good luck!
Erick

On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com wrote:
 Very true. I was impatient (I think less than three minutes impatient so
 hopefully 4.4 will save me from myself) but I didn't realise it was doing
 something rather than just hanging. Next time I have to restart a node I'll
 just leave and go get a cup of coffee or something.

 My configuration is set to auto hard-commit every 5 minutes. No auto
 soft-commit time is set.

 Over the course of the weekend, while left unattended the nodes have been
 going up and down (I've got to solve the issue that is causing them to come
 and go, but any suggestions on what is likely to be causing something like
 that are welcome), at one point one of the nodes stopped taking updates.
 After indexing properly for a few hours with that one shard not accepting
 updates, the replica of that shard which contains all the correct documents
 must have replicated from the broken node and dropped documents. Is there
 any protection against this in Solr or should I be focusing on getting my
 nodes to be more reliable? I've now got a situation where four of my five
 shards have leaders who are marked as down and followers who are up.

 I'm going to start grabbing information about the cluster state so I can
 track which changes are happening and in what order. I can get hold of Solr
 logs and garbage collection logs while these things are happening.

 Is this all just down to my nodes being unreliable?


 On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:

 Well, if I'm reading this right you had a node go out of circulation
 and then bounced nodes until that node became the leader. So of course
 it wouldn't have the documents (how could it?). Basically you shot
 yourself in the foot.

 Underlying here is why it took the machine you were re-starting so
 long to come up that you got impatient and started killing nodes.
 There has been quite a bit done to make that process better, so what
 version of Solr are you using? 4.4 is being voted on right now, so if
 you might want to consider upgrading.

 There was, for instance, a situation where it would take 3 minutes for
 machines to start up. How impatient were you?

 Also, what are your hard commit parameters? All of the documents
 you're indexing will be in the transaction log between hard commits,
 and when a node comes up the leader will replay everything in the tlog
 to the new node, which might be a source of why it took so long for
 the new node to come back up. At the very least the new node you were
 bringing back online will need to do a full index replication (old
 style) to get caught up.

 Best
 Erick

 On Fri, Jul 19, 2013 at 4:02 AM, Neil Prosser neil.pros...@gmail.com
 wrote:
  While indexing some documents to a SolrCloud cluster (10 machines, 5
 shards
  and 2 replicas, so one replica on each machine) one of the replicas
 stopped
  receiving documents, while the other replica of the shard continued to
 grow.
 
  That was overnight so I was unable to track exactly what happened (I'm
  going off our Graphite graphs here). This morning when I was able to look
  at the cluster both replicas of that shard were marked as down (with one
  marked as leader). I attempted to restart the non-leader node but it
 took a
  long time to restart so I killed it and restarted the old leader, which
  also took a long time. I killed that one (I'm impatient) and left the
  non-leader node to restart, not realising it was missing approximately
 700k
  documents that the old leader had. Eventually it restarted and became
  leader. I restarted the old leader and it dropped the number of documents
  it had to match the previous non-leader.
 
  Is this expected behaviour when a replica with fewer documents is started
  before the other and elected leader? Should I have been paying more
  attention to the number of documents on the server before restarting
 nodes?
 
  I am still 

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
No need to apologise. It's always good to have things like that reiterated
in case I've misunderstood along the way.

I have a feeling that it's related to garbage collection. I assume that if
the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
still alive and so gets marked as down. I've just taken a look at the GC
logs and can see a couple of full collections which took longer than my ZK
timeout of 15s. I'm still in the process of tuning the cache sizes and
have probably got it wrong (I'm coming from a Solr instance which runs on a
48G heap with ~40m documents and bringing it into five shards with 8G
heap). I thought I was being conservative with the cache sizes but I should
probably drop them right down and start again. The entire index is cached
by Linux so I should just need caches to help with things which eat CPU at
request time.

The indexing level is unusual because normally we wouldn't be indexing
everything sequentially, just making delta updates to the index as things
are changed in our MoR. However, it's handy to know how it reacts under the
most extreme load we could give it.

In the case that I set my hard commit time to 15-30 seconds with
openSearcher set to false, how do I control when I actually do invalidate
the caches and open a new searcher? Is this something that Solr can do
automatically, or will I need some sort of coordinator process to perform a
'proper' commit from outside Solr?

In our case the process of opening a new searcher is definitely a hefty
operation. We have a large number of boosts and filters which are used for
just about every query that is made against the index so we currently have
them warmed which can take upwards of a minute on our giant core.

Thanks for your help.


On 22 July 2013 13:00, Erick Erickson erickerick...@gmail.com wrote:

 Wow, you really shouldn't be having nodes go up and down so
 frequently, that's a big red flag. That said, SolrCloud should be
 pretty robust so this is something to pursue...

 But even a 5 minute hard commit can lead to a hefty transaction
 log under load, you may want to reduce it substantially depending
 on how fast you are sending docs to the index. I'm talking
 15-30 seconds here. It's critical that openSearcher be set to false
 or you'll invalidate your caches that often. All a hard commit
 with openSearcher set to false does is close off the current segment
 and open a new one. It does NOT open/warm new searchers etc.

 The soft commits control visibility, so that's how you control
 whether you can search the docs or not. Pardon me if I'm
 repeating stuff you already know!

 As far as your nodes coming and going, I've seen some people have
 good results by upping the ZooKeeper timeout limit. So I guess
 my first question is whether the nodes are actually going out of service
 or whether it's just a timeout issue

 Good luck!
 Erick

 On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com
 wrote:
  Very true. I was impatient (I think less than three minutes impatient so
  hopefully 4.4 will save me from myself) but I didn't realise it was doing
  something rather than just hanging. Next time I have to restart a node
 I'll
  just leave and go get a cup of coffee or something.
 
  My configuration is set to auto hard-commit every 5 minutes. No auto
  soft-commit time is set.
 
  Over the course of the weekend, while left unattended the nodes have been
  going up and down (I've got to solve the issue that is causing them to
 come
  and go, but any suggestions on what is likely to be causing something
 like
  that are welcome), at one point one of the nodes stopped taking updates.
  After indexing properly for a few hours with that one shard not accepting
  updates, the replica of that shard which contains all the correct
 documents
  must have replicated from the broken node and dropped documents. Is there
  any protection against this in Solr or should I be focusing on getting my
  nodes to be more reliable? I've now got a situation where four of my five
  shards have leaders who are marked as down and followers who are up.
 
  I'm going to start grabbing information about the cluster state so I can
  track which changes are happening and in what order. I can get hold of
 Solr
  logs and garbage collection logs while these things are happening.
 
  Is this all just down to my nodes being unreliable?
 
 
  On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:
 
  Well, if I'm reading this right you had a node go out of circulation
  and then bounced nodes until that node became the leader. So of course
  it wouldn't have the documents (how could it?). Basically you shot
  yourself in the foot.
 
  Underlying here is why it took the machine you were re-starting so
  long to come up that you got impatient and started killing nodes.
  There has been quite a bit done to make that process better, so what
  version of Solr are you using? 4.4 is being voted on right now, so 

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
Sorry, I should also mention that these leader nodes which are marked as
down can actually still be queried locally with distrib=false with no
problems. Is it possible that they've somehow got themselves out-of-sync?


On 22 July 2013 13:37, Neil Prosser neil.pros...@gmail.com wrote:

 No need to apologise. It's always good to have things like that reiterated
 in case I've misunderstood along the way.

 I have a feeling that it's related to garbage collection. I assume that if
 the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
 still alive and so gets marked as down. I've just taken a look at the GC
 logs and can see a couple of full collections which took longer than my ZK
 timeout of 15s). I'm still in the process of tuning the cache sizes and
 have probably got it wrong (I'm coming from a Solr instance which runs on a
 48G heap with ~40m documents and bringing it into five shards with 8G
 heap). I thought I was being conservative with the cache sizes but I should
 probably drop them right down and start again. The entire index is cached
 by Linux so I should just need caches to help with things which eat CPU at
 request time.

 The indexing level is unusual because normally we wouldn't be indexing
 everything sequentially, just making delta updates to the index as things
 are changed in our MoR. However, it's handy to know how it reacts under the
 most extreme load we could give it.

 In the case that I set my hard commit time to 15-30 seconds with
 openSearcher set to false, how do I control when I actually do invalidate
 the caches and open a new searcher? Is this something that Solr can do
 automatically, or will I need some sort of coordinator process to perform a
 'proper' commit from outside Solr?

 In our case the process of opening a new searcher is definitely a hefty
 operation. We have a large number of boosts and filters which are used for
 just about every query that is made against the index so we currently have
 them warmed which can take upwards of a minute on our giant core.

 Thanks for your help.


 On 22 July 2013 13:00, Erick Erickson erickerick...@gmail.com wrote:

 Wow, you really shouldn't be having nodes go up and down so
 frequently, that's a big red flag. That said, SolrCloud should be
 pretty robust so this is something to pursue...

 But even a 5 minute hard commit can lead to a hefty transaction
 log under load, you may want to reduce it substantially depending
 on how fast you are sending docs to the index. I'm talking
 15-30 seconds here. It's critical that openSearcher be set to false
 or you'll invalidate your caches that often. All a hard commit
 with openSearcher set to false does is close off the current segment
 and open a new one. It does NOT open/warm new searchers etc.

 The soft commits control visibility, so that's how you control
 whether you can search the docs or not. Pardon me if I'm
 repeating stuff you already know!

 As far as your nodes coming and going, I've seen some people have
 good results by upping the ZooKeeper timeout limit. So I guess
 my first question is whether the nodes are actually going out of service
 or whether it's just a timeout issue

 Good luck!
 Erick

 On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com
 wrote:
  Very true. I was impatient (I think less than three minutes impatient so
  hopefully 4.4 will save me from myself) but I didn't realise it was
 doing
  something rather than just hanging. Next time I have to restart a node
 I'll
  just leave and go get a cup of coffee or something.
 
  My configuration is set to auto hard-commit every 5 minutes. No auto
  soft-commit time is set.
 
  Over the course of the weekend, while left unattended the nodes have
 been
  going up and down (I've got to solve the issue that is causing them to
 come
  and go, but any suggestions on what is likely to be causing something
 like
  that are welcome), at one point one of the nodes stopped taking updates.
  After indexing properly for a few hours with that one shard not
 accepting
  updates, the replica of that shard which contains all the correct
 documents
  must have replicated from the broken node and dropped documents. Is
 there
  any protection against this in Solr or should I be focusing on getting
 my
  nodes to be more reliable? I've now got a situation where four of my
 five
  shards have leaders who are marked as down and followers who are up.
 
  I'm going to start grabbing information about the cluster state so I can
  track which changes are happening and in what order. I can get hold of
 Solr
  logs and garbage collection logs while these things are happening.
 
  Is this all just down to my nodes being unreliable?
 
 
  On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:
 
  Well, if I'm reading this right you had a node go out of circulation
  and then bounced nodes until that node became the leader. So of course
  it wouldn't have the documents (how could it?). 

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
It is possible: https://issues.apache.org/jira/browse/SOLR-4260
I rarely see it and I cannot reliably reproduce it, but it just sometimes 
happens. Nodes will not bring each other back in sync.

 
 
-Original message-
 From:Neil Prosser neil.pros...@gmail.com
 Sent: Monday 22nd July 2013 14:41
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
 
 Sorry, I should also mention that these leader nodes which are marked as
 down can actually still be queried locally with distrib=false with no
 problems. Is it possible that they've somehow got themselves out-of-sync?
 
 
 On 22 July 2013 13:37, Neil Prosser neil.pros...@gmail.com wrote:
 
  No need to apologise. It's always good to have things like that reiterated
  in case I've misunderstood along the way.
 
  I have a feeling that it's related to garbage collection. I assume that if
  the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
  still alive and so gets marked as down. I've just taken a look at the GC
  logs and can see a couple of full collections which took longer than my ZK
  timeout of 15s). I'm still in the process of tuning the cache sizes and
  have probably got it wrong (I'm coming from a Solr instance which runs on a
  48G heap with ~40m documents and bringing it into five shards with 8G
  heap). I thought I was being conservative with the cache sizes but I should
  probably drop them right down and start again. The entire index is cached
  by Linux so I should just need caches to help with things which eat CPU at
  request time.
 
  The indexing level is unusual because normally we wouldn't be indexing
  everything sequentially, just making delta updates to the index as things
  are changed in our MoR. However, it's handy to know how it reacts under the
  most extreme load we could give it.
 
  In the case that I set my hard commit time to 15-30 seconds with
  openSearcher set to false, how do I control when I actually do invalidate
  the caches and open a new searcher? Is this something that Solr can do
  automatically, or will I need some sort of coordinator process to perform a
  'proper' commit from outside Solr?
 
  In our case the process of opening a new searcher is definitely a hefty
  operation. We have a large number of boosts and filters which are used for
  just about every query that is made against the index so we currently have
  them warmed which can take upwards of a minute on our giant core.
 
  Thanks for your help.
 
 
  On 22 July 2013 13:00, Erick Erickson erickerick...@gmail.com wrote:
 
  Wow, you really shouldn't be having nodes go up and down so
  frequently, that's a big red flag. That said, SolrCloud should be
  pretty robust so this is something to pursue...
 
  But even a 5 minute hard commit can lead to a hefty transaction
  log under load, you may want to reduce it substantially depending
  on how fast you are sending docs to the index. I'm talking
  15-30 seconds here. It's critical that openSearcher be set to false
  or you'll invalidate your caches that often. All a hard commit
  with openSearcher set to false does is close off the current segment
  and open a new one. It does NOT open/warm new searchers etc.
 
  The soft commits control visibility, so that's how you control
  whether you can search the docs or not. Pardon me if I'm
  repeating stuff you already know!
 
  As far as your nodes coming and going, I've seen some people have
  good results by upping the ZooKeeper timeout limit. So I guess
  my first question is whether the nodes are actually going out of service
  or whether it's just a timeout issue
 
  Good luck!
  Erick
 
  On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com
  wrote:
   Very true. I was impatient (I think less than three minutes impatient so
   hopefully 4.4 will save me from myself) but I didn't realise it was
  doing
   something rather than just hanging. Next time I have to restart a node
  I'll
   just leave and go get a cup of coffee or something.
  
   My configuration is set to auto hard-commit every 5 minutes. No auto
   soft-commit time is set.
  
   Over the course of the weekend, while left unattended the nodes have
  been
   going up and down (I've got to solve the issue that is causing them to
  come
   and go, but any suggestions on what is likely to be causing something
  like
   that are welcome), at one point one of the nodes stopped taking updates.
   After indexing properly for a few hours with that one shard not
  accepting
   updates, the replica of that shard which contains all the correct
  documents
   must have replicated from the broken node and dropped documents. Is
  there
   any protection against this in Solr or should I be focusing on getting
  my
   nodes to be more reliable? I've now got a situation where four of my
  five
   shards have leaders who are marked as down and followers who are up.
  
   I'm going to start grabbing information about 

RE: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Markus Jelsma
You should increase your ZK timeout; this may be the issue in your case. You
may also want to try the G1 collector to keep stop-the-world pauses under the
ZK timeout.
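
For illustration, with the stock 4.x example solr.xml (which substitutes
${zkClientTimeout:15000}) the timeout can be raised either in the file or
on the command line; the 30s below is illustrative and must stay within
what the ZooKeeper server allows (maxSessionTimeout):

<cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:30000}" ...>

or: java -DzkClientTimeout=30000 -jar start.jar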
 
-Original message-
 From:Neil Prosser neil.pros...@gmail.com
 Sent: Monday 22nd July 2013 14:38
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4.3.1 - SolrCloud nodes down and lost documents
 
 No need to apologise. It's always good to have things like that reiterated
 in case I've misunderstood along the way.
 
 I have a feeling that it's related to garbage collection. I assume that if
 the JVM heads into a stop-the-world GC Solr can't let ZooKeeper know it's
 still alive and so gets marked as down. I've just taken a look at the GC
 logs and can see a couple of full collections which took longer than my ZK
 timeout of 15s). I'm still in the process of tuning the cache sizes and
 have probably got it wrong (I'm coming from a Solr instance which runs on a
 48G heap with ~40m documents and bringing it into five shards with 8G
 heap). I thought I was being conservative with the cache sizes but I should
 probably drop them right down and start again. The entire index is cached
 by Linux so I should just need caches to help with things which eat CPU at
 request time.
 
 The indexing level is unusual because normally we wouldn't be indexing
 everything sequentially, just making delta updates to the index as things
 are changed in our MoR. However, it's handy to know how it reacts under the
 most extreme load we could give it.
 
 In the case that I set my hard commit time to 15-30 seconds with
 openSearcher set to false, how do I control when I actually do invalidate
 the caches and open a new searcher? Is this something that Solr can do
 automatically, or will I need some sort of coordinator process to perform a
 'proper' commit from outside Solr?
 
 In our case the process of opening a new searcher is definitely a hefty
 operation. We have a large number of boosts and filters which are used for
 just about every query that is made against the index so we currently have
 them warmed which can take upwards of a minute on our giant core.
 
 Thanks for your help.
 
 
 On 22 July 2013 13:00, Erick Erickson erickerick...@gmail.com wrote:
 
  Wow, you really shouldn't be having nodes go up and down so
  frequently, that's a big red flag. That said, SolrCloud should be
  pretty robust so this is something to pursue...
 
  But even a 5 minute hard commit can lead to a hefty transaction
  log under load, you may want to reduce it substantially depending
  on how fast you are sending docs to the index. I'm talking
  15-30 seconds here. It's critical that openSearcher be set to false
  or you'll invalidate your caches that often. All a hard commit
  with openSearcher set to false does is close off the current segment
  and open a new one. It does NOT open/warm new searchers etc.
 
  The soft commits control visibility, so that's how you control
  whether you can search the docs or not. Pardon me if I'm
  repeating stuff you already know!
 
  As far as your nodes coming and going, I've seen some people have
  good results by upping the ZooKeeper timeout limit. So I guess
  my first question is whether the nodes are actually going out of service
  or whether it's just a timeout issue
 
  Good luck!
  Erick
 
  On Mon, Jul 22, 2013 at 3:29 AM, Neil Prosser neil.pros...@gmail.com
  wrote:
   Very true. I was impatient (I think less than three minutes impatient so
   hopefully 4.4 will save me from myself) but I didn't realise it was doing
   something rather than just hanging. Next time I have to restart a node
  I'll
   just leave and go get a cup of coffee or something.
  
   My configuration is set to auto hard-commit every 5 minutes. No auto
   soft-commit time is set.
  
   Over the course of the weekend, while left unattended the nodes have been
   going up and down (I've got to solve the issue that is causing them to
  come
   and go, but any suggestions on what is likely to be causing something
  like
   that are welcome), at one point one of the nodes stopped taking updates.
   After indexing properly for a few hours with that one shard not accepting
   updates, the replica of that shard which contains all the correct
  documents
   must have replicated from the broken node and dropped documents. Is there
   any protection against this in Solr or should I be focusing on getting my
   nodes to be more reliable? I've now got a situation where four of my five
   shards have leaders who are marked as down and followers who are up.
  
   I'm going to start grabbing information about the cluster state so I can
   track which changes are happening and in what order. I can get hold of
  Solr
   logs and garbage collection logs while these things are happening.
  
   Is this all just down to my nodes being unreliable?
  
  
   On 21 July 2013 13:52, Erick Erickson erickerick...@gmail.com wrote:
  
   Well, if I'm reading this right you had a node go out of 

Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Hi,

I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin which 
extends ValueSourceParser and it works under Solr 4.0.0 but does not work under 
Solr 4.3.1. I compiled the plugin using the latest solr-4.3.1*.jars and 
lucene-4.3.1*.jars but I get the following stacktrace error when starting up a 
core referencing this plugin...seen below. Does anyone know why it might be 
giving me a ClassCastException under 4.3.1?

Thanks,
Niran

2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
Unable to create core: example_core
org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser,
com.example.HitsValueSourceParser failed to instantiate
org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating
ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate
org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
at 
org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
... 13 more
Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
at java.lang.Class.asSubclass(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
... 19 more
2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer   
null:org.apache.solr.common.SolrException: Unable to create core: example_core
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating
ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate
org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
... 10 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating
ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate
org.apache.solr.search.ValueSourceParser
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
   

Re: Programmatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Robert Krüger
Great, thank you!

On Jul 22, 2013 1:35 PM, Alan Woodward a...@flax.co.uk wrote:

 Hi Robert,

 The upcoming 4.4 release should make this a bit easier (you can check out
the release branch now if you like, or wait a few days for the official
version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
object as constructor parameters, and you can create a ConfigSolr object
from a string representation of solr.xml using the ConfigSolr.fromString()
static method.

 Alan Woodward
 www.flax.co.uk


 On 22 Jul 2013, at 11:41, Robert Krüger wrote:

  Hi,
 
  I use solr embedded in a desktop app and I want to change it to no
  longer require the configuration for the container and core to be in
  the filesystem but rather be distributed as part of a jar file.
 
  Could someone kindly point me to the right docs?
 
  So far my impression is, I need to instantiate CoreContainer with a
  custom SolrResourceLoader with properties parsed via some other API
  but from the javadocs alone I feel a bit lost (why does it have to
  have an instance directory at all?) and googling did not give me many
  results. What would be ideal would be to have something like this
  (pseudocode with partly imagined names, which hopefully illustrates
  what I am trying to achieve):
 
  ContainerConfig containerConfig =
  ContainerConfigParser.parse(InputStream from Classloader);
  CoreContainer  container = new CoreContainer(containerConfig);
 
  CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
  from Classloader);
  container.register(name, coreConfig);
 
  Ideally I would like to keep XML format to reuse my current solr.xml
  and solrconfig.xml but that is just a nice-to-have.
 
  Does such a way exist and if so, what are the real API classes and
calls to use?
 
  Thank you in advance,
 
  Robert



how to improve (keyword) relevance?

2013-07-22 Thread eShard
Good morning,
I'm currently running Solr 4.0 final (multi-core) with ManifoldCF v1.3-dev
on Tomcat 7.
Early on, I used copyField to put the metadata into the text field to
simplify Solr queries (i.e. I only have to query one field now).
However, a lot of people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however, is there a way to modify it so it only uses one field (i.e. the
text field)?

(Note well: I have multiple cores and the schemas are all somewhat different;
if I can't get this to work with one field then I would have to build
complex queries for all the other cores, which would vastly overcomplicate
the UI. Is there another way?)
Here's the requestHandler in question:
<requestHandler name="/better" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <str name="indent">true</str>
    <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery
        bq=$boostQuery v=$mainQuery}"</str>
    <str name="qfQuery">name^10 description</str>
    <str name="mmQuery">1</str>
    <str name="pfQuery">name description</str>
    <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100%
        v=$mainQuery}"^10</str>
  </lst>
</requestHandler>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Shawn Heisey
On 7/22/2013 6:45 AM, Markus Jelsma wrote:
 You should increase your ZK time out, this may be the issue in your case. You 
 may also want to try the G1GC collector to keep STW under ZK time out.

When I tried G1, the occasional stop-the-world GC actually got worse.  I
tried G1 after trying CMS with no other tuning parameters.  The average
GC time went down, but when it got into a place where it had to do a
stop-the-world collection, it was worse.

Based on the GC statistics in jvisualvm and jstat, I didn't think I had
a problem.  The way I discovered that I had a problem was by looking at
my haproxy load balancer -- sometimes requests would be sent to a backup
server instead of my primary, because the ping request handler was
timing out on the LB health check.  The LB was set to time out after
five seconds.  When I went looking deeper with the GC log and some other
tools, I was seeing 8-10 second GC pauses.  G1 was showing me pauses of
12 seconds.

Now I use a heavily tuned CMS config, and there are no more LB switches
to a backup server.  I've put some of my own information about my GC
settings on my personal Solr wiki page:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
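
For reference, a heavily tuned CMS command line looks something like the
following (illustrative flags and values only; the authoritative list is on
the wiki page above):

  -Xms6g -Xmx6g
  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:+CMSParallelRemarkEnabled
  -XX:+ParallelRefProcEnabled
  -XX:NewRatio=3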

I've got an 8GB heap on my systems running 3.5.0 (one copy of the index)
and a 6GB heap on those running 4.2.1 (the other copy of the index).

Summary: Just switching to the G1 collector won't solve GC pause
problems.  There's not a lot of G1 tuning information out there yet.  If
someone can come up with a good set of G1 tuning parameters, G1 might
become better than CMS.

Thanks,
Shawn



Re: Regex in Stopword.xml

2013-07-22 Thread Jack Krupansky
How did you get the impression that GSA supports regex stop words? GSA seems 
to follow the same rules as Solr.


See the doc:
http://www.google.com/support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/admin_searchexp/ce_improving_search.html#1050255

As with GSA, the stop words are a simple .TXT file.

In any case, Solr and Lucene do not support stop words that are regular 
expressions, although a regex filter can simulate them to a limited degree.


-- Jack Krupansky

-Original Message- 
From: Scatman

Sent: Monday, July 22, 2013 7:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Regex in Stopword.xml

Thanks for the reply, but that's not the solution I'm looking for, and I
should have explained myself better: I have about a hundred regexes to put
in the config. To keep Solr easy to manage, I think the better way is to put
the regexes in a file... I know that Google's GSA does it, so I had just
hoped it would be the case for Solr :)

Best,
Scatman.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079438.html
Sent from the Solr - User mailing list archive at Nabble.com. 



queryResultCache should not be related to the order of the fq list

2013-07-22 Thread 黄飞鸿
 

Hello,

QueryResultCache should not be related to the order of the fq list.

The two queries below have the same meaning, but case 2 cannot use the
queryResultCache after case 1 has been executed.

case1: q=*:*&fq=field1:value1&fq=field2:value2
case2: q=*:*&fq=field2:value2&fq=field1:value1

I think the queryResultCache should not depend on the order of the fq list.

I am new to posting bugs, so I can't be sure whether this is one.

I created the issue: https://issues.apache.org/jira/browse/SOLR-5057

By the way, if the issue is accepted, how can I post my code?

Thanks.



Re: how to improve (keyword) relevance?

2013-07-22 Thread Jack Krupansky
Could you please be more specific about the relevancy problem you are trying 
to solve?


-- Jack Krupansky

-Original Message- 
From: eShard

Sent: Monday, July 22, 2013 9:57 AM
To: solr-user@lucene.apache.org
Subject: how to improve (keyword) relevance?

Good morning,
I'm currently running Solr 4.0 final (multi core) with manifoldcf v1.3 dev
on tomcat 7.
Early on, I used copyfield to put the meta data into the text field to
simplify solr queries (i.e. I only have to query one field now.)
However, a lot of people are concerned about improving relevance.
I found a relevancy solution on page 298 of the Apache Solr 4.0 Cookbook;
however is there a way to modify it so it only uses one field? (i.e. the
text field?)

(Note well: I have multi cores and the schemas are all somewhat different;
If I can't get this to work with one field then I would have to build
complex queries for all the other cores; this would vastly overcomplicate
the UI. Is there another way?)
here's the requesthandler in question:
<requestHandler name="/better" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <str name="indent">true</str>
    <str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
    <str name="qfQuery">name^10 description</str>
    <str name="mmQuery">1</str>
    <str name="pfQuery">name description</str>
    <str name="boostQuery">_query_:"{!edismax qf=$boostQuerQf mm=100% v=$mainQuery}"^10</str>
  </lst>
</requestHandler>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Yonik Seeley
function queries to the rescue!

q={!func}def(query($a),query($b),query($c))
a=field1:value1
b=field2:value2
c=field3:value3

def or default function returns the value of the first argument that
matches.  It's named default because it's more commonly used like
def(popularity,50)  (return the value of the popularity field, or 50
if the doc has no value for that field).
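
Assembled into a single request (hypothetical host and core, URL encoding
omitted for readability), that's:

http://localhost:8983/solr/select?q={!func}def(query($a),query($b),query($c))&a=field1:value1&b=field2:value2&c=field3:value3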

-Yonik
http://lucidworks.com


On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena deepakk...@gmail.com wrote:
 I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
 shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
 one can't treat them as boolean operators (adhering to boolean algebra).

 I have been trying to construct a simple OR expression, as follows

 q = +(field1:value1 OR field2:value2)

 with a match on either field1 or field2. But since the OR is merely an
 optional, documents where both field1:value1 and field2:value2 are matched,
 the query returns a score resulting in a match on both the clauses.

 How do I enforce short-circuiting in this context? In other words, how to
 implement short-circuiting as in boolean algebra where an expression A || B
 || C returns true if A is true without even looking into whether B or C
 could be true.
 -Deepak


adding date column to the index

2013-07-22 Thread Mysurf Mail
I have added a date field to my index.
I don't want the query to search on this field, but I want it to be returned
with each row.
So I have defined it in the schema.xml as follows:
  <field name="LastModificationTime" type="date" indexed="false"
stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the
profiler.
Now, when I query all fields (using the dashboard) I don't see it.
Even when I ask for it specifically I don't see it.
What am I doing wrong?

(In the db it is datetimeoffset(7).)


Re: Auto-sharding and numShard parameter

2013-07-22 Thread Michael Della Bitta
That would be great.

One step toward this goal is to stop treating the situation where there are
no collections or cores as an error condition. It took me a while to get
out of the mindset when bringing up a Solr install that I had to avoid that
scenario at all costs, because red text == bad.

There's no reason for the web interface to be deactivated when there are no
collections or cores, though. Imagine if mysql didn't let you connect to it
via phpmyadmin if you hadn't configured a database yet?


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Sat, Jul 20, 2013 at 10:33 PM, Mark Miller markrmil...@gmail.com wrote:

 A lot has changed since those examples were written - in general, we are
 moving away from that type of collection initialization and towards using
 the Collections API. Eventually, I'd personally like SolrCloud to ship with
 no predefined collections and have users simply start it and then start
 using the Collections API - preconfigured collections will be second class
 and possibly deprecated at some point.

 - Mark

 On Jul 20, 2013, at 10:13 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Flavio:
 
  One of the great things about having people continually using Solr
  (and SolrCloud) for the first time is the opportunity to improve the
  docs. Anyone can update/add to the docs, all it takes is a signon.
  Unfortunately we had a bunch of spam bots a while ago, so it's now a
  two step process
  1 create a login on the Solr wiki
  2 post a message on this list indicating that you'd like to help
  improve the Wiki and give us your Solr login. We'll add you to the
  list of people who can edit the wiki and you can help the community by
  improving the documentation.
 
  Best
  Erick
 
  On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier
  pomperma...@okkam.it wrote:
  Thank you for the reply Erick,
  I was facing exactly with that problem..from the documentation it seems
  that those parameter are required to run SolrCloud,
  instead they are just used to initialize a sample collection..
  I think that in the examples on the user doc it should be better to
  separate those 2 concepts: one is starting the server,
  another one is creating/managing collections.
 
  Best,
  Flavio
 
 
  On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
  First the numShards parameter is only relevant the very first time you
  create your collection. It's a little confusing because in the
 SolrCloud
  examples you're getting collection1 by default. Look further down the
  SolrCloud Wiki page, the section titled
  Managing Collections via the Collections API for creating collections
  with a different name.
 
  Either way, either when you run the bootstrap command or when you
  create a new collection, that's the only time numShards counts. It's
  ignored the rest of the time.
 
  As far as data growing, you need to either
  1 create enough shards to handle the eventual size things will be,
  sometimes called oversharding
  or
  2 use the splitShard capabilities in very recent Solrs to expand
  capacity.
 
  Best
  Erick
 
  On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier
  pomperma...@okkam.it wrote:
  Hi to all,
  Probably this question has a simple answer but I just want to be sure
 of
  the potential drawbacks..when I run SolrCloud I run the main solr
  instance
  with the -numShard option (e.g. 2).
  Then as data grows, shards could potentially become a huge number. If
 I
  had to restart all nodes and re-run the master with the
 numShard=2,
  what will happen? It will be just ignored or Solr will try to reduce
  shards...?
 
  Another question...in SolrCloud, how do I restart all the cloud at
 once?
  Is
  it possible?
 
  Best,
  Flavio
 




Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alexandre Rafalovitch
Does it mean that I can easily load Solr configuration as parsed by Solr
from an external program?

Because the last time I tried (4.3.1), the number of jars required was
quite long, including SolrJ jar due to some exception.

Regards.,
   Alex

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward a...@flax.co.uk wrote:

 Hi Robert,

 The upcoming 4.4 release should make this a bit easier (you can check out
 the release branch now if you like, or wait a few days for the official
 version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
 object as constructor parameters, and you can create a ConfigSolr object
 from a string representation of solr.xml using the ConfigSolr.fromString()
 static method.
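
Roughly, usage should look like this (untested sketch against the 4.4
branch; the exact fromString() arguments may differ, and loadResource() is a
hypothetical helper that reads a classpath resource into a String):

  // Build a CoreContainer from a solr.xml shipped inside a jar.
  SolrResourceLoader loader = new SolrResourceLoader("solr"); // an instance dir is still required
  String solrXml = loadResource("solr.xml");                  // read via the classloader
  ConfigSolr config = ConfigSolr.fromString(loader, solrXml);
  CoreContainer container = new CoreContainer(loader, config);
  container.load();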

 Alan Woodward
 www.flax.co.uk


 On 22 Jul 2013, at 11:41, Robert Krüger wrote:

  Hi,
 
  I use solr embedded in a desktop app and I want to change it to no
  longer require the configuration for the container and core to be in
  the filesystem but rather be distributed as part of a jar file.
 
  Could someone kindly point me to the right docs?
 
  So far my impression is, I need to instantiate CoreContainer with a
  custom SolrResourceLoader with properties parsed via some other API
  but from the javadocs alone I feel a bit lost (why does it have to
  have an instance directory at all?) and googling did not give me many
  results. What would be ideal would be to have something like this
  (pseudocode with partly imagined names, which hopefully illustrates
  what I am trying to achieve):
 
  ContainerConfig containerConfig =
  ContainerConfigParser.parse(InputStream from Classloader);
  CoreContainer  container = new CoreContainer(containerConfig);
 
  CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
  from Classloader);
  container.register(name, coreConfig);
 
  Ideally I would like to keep XML format to reuse my current solr.xml
  and solrconfig.xml but that is just a nice-to-have.
 
  Does such a way exist and if so, what are the real API classes and calls
 to use?
 
  Thank you in advance,
 
  Robert




Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Timothy Potter
I saw something similar and used an absolute path to my JAR file in
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran
niran.abeygunaward...@proquest.co.uk wrote:
 Hi,

 I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
 which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
 work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
 lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
 a core referencing this plugin...seen below. Does anyone know why it might be 
 giving me a ClassCastException under 4.3.1?

 Thanks,
 Niran

 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 Unable to create core: example_core
 org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
 at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
 at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
 ... 13 more
 Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
 at java.lang.Class.asSubclass(Unknown Source)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
 ... 19 more
 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 null:org.apache.solr.common.SolrException: Unable to create core: example_core
 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 ...

RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Thanks Tim. 

I copied my jar containing the plugin to the solr's lib directory as it wasn't 
finding my jar due to a bug in 4.3:
https://issues.apache.org/jira/browse/SOLR-4791
but the ClassCastException remains. I'll try solr 4.2 and see if the plugin 
works in that.

Cheers,
Niran

 
-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: 22 July 2013 15:39
To: solr-user@lucene.apache.org
Subject: Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

I saw something similar and used an absolute path to my JAR file in 
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran 
niran.abeygunaward...@proquest.co.uk wrote:
 Hi,

 I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
 which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
 work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
 lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
 a core referencing this plugin...seen below. Does anyone know why it might be 
 giving me a ClassCastException under 4.3.1?

 Thanks,
 Niran

 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 Unable to create core: example_core
 org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
 at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
 at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
 ... 13 more
 Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
 at java.lang.Class.asSubclass(Unknown Source)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
 ... 19 more
 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 null:org.apache.solr.common.SolrException: Unable to create core: example_core
 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 ...

Re: custom field type plugin

2013-07-22 Thread David Smiley (@MITRE.org)
Like Hoss said, you're going to have to solve this using
http://wiki.apache.org/solr/SpatialForTimeDurations
Using PointType is *not* going to work because your durations are
multi-valued per document.

It would be useful to create a custom field type that wraps the capability
outlined on the wiki to make it easier to use without requiring the user to
think spatially.
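
The gist of that wiki approach, for the archives: index each start:end range
as a point (x=start, y=end) in a non-geodetic RPT field, and "all ranges
containing T" becomes a rectangle intersection with x <= T and y >= T. An
untested sketch, with illustrative names and bounds:

  <fieldType name="numRange" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="false" worldBounds="0 0 50000000 50000000"
             maxDistErr="1" units="degrees"/>
  <field name="region" type="numRange" indexed="true" stored="true"
         multiValued="true"/>

Index the range 1:16090 as the field value "1 16090"; documents whose ranges
contain 10234 then match:

  fq=region:"Intersects(0 10234 10234 50000000)"

(The rectangle is "minX minY maxX maxY", i.e. start <= 10234 and end >= 10234.)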

You mentioned that these numeric ranges extend upwards of 10 billion or so. 
Unfortunately, the current prefix tree implementation under the hood for
non-geodetic spatial, the QuadTree, is unlikely to scale to numbers that
big.  I don't know where the boundary is, but I doubt 10B.  You could try
and see what happens.  I'm working (very slowly on very little spare time)
on improving the PrefixTree implementations to scale to such large numbers;
I hope something will be available this fall.

~ David Smiley


Kevin Stone wrote
 I have a particular use case that I think might require a custom field
 type, however I am having trouble getting the plugin to work.
 My use case has to do with genetics data, and we are running into several
 situations were we need to be able to query multiple regions of a
 chromosome (or gene, or other object types). All that really boils down to
 is being able to give a number, e.g. 10234, and return documents that have
 regions containing the number. So you'd have a document with a list like
 [1:16090,400:8000,40123:43564], and it should come back because
 10234 falls between 1:16090. If there is a better or easier way to
 do this please speak up. I'd rather not have to use a join on another
 index, because 1) it's more complex to set up, and 2) we might need to
 join against something else and you can only do one join at a time.
 
 Anyway… I tried creating a field type similar to a PointType just to see
 if I could get one working. I added the following jars to get it to
 compile:
 apache-solr-core-4.0.0,lucene-core-4.0.0,lucene-queries-4.0.0,apache-solr-solrj-4.0.0.
 I am running solr 4.0.0 on jetty, and put my jar file in a sharedLib
 folder, and specified it in my solr.xml (I have multiple cores).
 
 After starting up solr, I got the line that it picked up the jar:
 INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader
 
 But I get this error about it not being able to find the
 AbstractSubTypeFieldType class.
 Here is the first bit of the trace:
 
 SEVERE: null:java.lang.NoClassDefFoundError:
 org/apache/solr/schema/AbstractSubTypeFieldType
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 ...etc…
 
 
 Any hints as to what I did wrong? I can provide source code, or a fuller
 stack trace, config settings, etc.
 
 Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib, then
 repack. However, when I did that, I get a NoClassDefFoundError for my
 plugin itself.
 
 
 Thanks,
 Kevin
 
 The information in this email, including attachments, may be confidential
 and is intended solely for the addressee(s). If you believe you received
 this email by mistake, please notify the sender by return email as soon as
 possible.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079494.html
Sent from the Solr - User mailing list archive at Nabble.com.


Node down, but not out

2013-07-22 Thread jimtronic
I've run into a problem recently that's difficult to debug and search for:

I have three nodes in a cluster and this weekend one of the nodes went
partially down. It no longer responds to distributed updates and it is
marked as GONE in the Cloud view of the admin screen. That's not ideal, but
there's still two boxes up so not the end of the world.

The problem is that it is still responding to ping requests and returning
queries successfully. In my setup, I have the three servers on an haproxy
load balancer so that I can distribute requests and have clients stick to a
specific solr box. Because the bad node is still returning OK to the ping
requests and still returns results for simple queries, the load balancer
does not remove it from the group.

Is there a ping-like request handler that would tell me whether the given
box I'm hitting is still in the cloud?

Thanks!
Jim Musil



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regex in Stopword.xml

2013-07-22 Thread Scatman
I know it because I actually want to replace GSA with Solr, which is much
better in an enterprise setting :)

Thanks for the reply anyway!

Best,
Scatman.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regex-in-Stopword-xml-tp4079412p4079491.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Roman Chyla
Deepak,

I think your goal is to gain something in speed, but most likely the
function query will be slower than the query without score computation (the
filter query) - this stems from how the query is executed, but I
may, of course, be wrong. Would you mind sharing the measurements you make?

Thanks,

  roman


On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley yo...@lucidworks.com wrote:

 function queries to the rescue!

 q={!func}def(query($a),query($b),query($c))
 a=field1:value1
 b=field2:value2
 c=field3:value3

 def or default function returns the value of the first argument that
 matches.  It's named default because it's more commonly used like
 def(popularity,50)  (return the value of the popularity field, or 50
 if the doc has no value for that field).

 -Yonik
 http://lucidworks.com


 On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena deepakk...@gmail.com
 wrote:
  I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
  shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
  one can't treat them as boolean operators (adhering to boolean algebra).
 
  I have been trying to construct a simple OR expression, as follows
 
  q = +(field1:value1 OR field2:value2)
 
  with a match on either field1 or field2. But since the OR is merely an
  optional, documents where both field1:value1 and field2:value2 are
 matched,
  the query returns a score resulting in a match on both the clauses.
 
  How do I enforce short-circuiting in this context? In other words, how to
  implement short-circuiting as in boolean algebra where an expression A
 || B
  || C returns true if A is true without even looking into whether B or C
  could be true.
  -Deepak



Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Erick Erickson
Sweet!


On Mon, Jul 22, 2013 at 10:54 AM, Yonik Seeley yo...@lucidworks.com wrote:
 function queries to the rescue!

 q={!func}def(query($a),query($b),query($c))
 a=field1:value1
 b=field2:value2
 c=field3:value3

 def or default function returns the value of the first argument that
 matches.  It's named default because it's more commonly used like
 def(popularity,50)  (return the value of the popularity field, or 50
 if the doc has no value for that field).

 -Yonik
 http://lucidworks.com


 On Sun, Jul 21, 2013 at 8:48 PM, Deepak Konidena deepakk...@gmail.com wrote:
 I understand that lucene's AND (&&), OR (||) and NOT (!) operators are
 shorthands for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why
 one can't treat them as boolean operators (adhering to boolean algebra).

 I have been trying to construct a simple OR expression, as follows

 q = +(field1:value1 OR field2:value2)

 with a match on either field1 or field2. But since the OR is merely an
 optional, documents where both field1:value1 and field2:value2 are matched,
 the query returns a score resulting in a match on both the clauses.

 How do I enforce short-circuiting in this context? In other words, how to
 implement short-circuiting as in boolean algebra where an expression A || B
 || C returns true if A is true without even looking into whether B or C
 could be true.
 -Deepak


Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Timothy Potter
A couple of things I've learned along the way ...

I had a similar architecture where we used fairly low numbers for
auto-commits with openSearcher=false. This keeps the tlog to a
reasonable size. You'll need something on the client side to send in
the hard commit request to open a new searcher every N docs or M
minutes.
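
In solrconfig.xml that's the autoCommit block under updateHandler, something
along these lines (numbers illustrative; tune for your ingest rate):

  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>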

Be careful with raising the Zk timeout as that also determines how
quickly Zk can detect a node has crashed (afaik). In other words, it
takes the zk client timeout seconds for Zk to consider an ephemeral
znode as gone, so I caution you in increasing this value too much.

The other thing to be aware of is this leaderVoteWait safety mechanism
... might see log messages that look like:

2013-06-24 18:12:40,408 [coreLoadExecutor-4-thread-1] INFO
solr.cloud.ShardLeaderElectionContext  - Waiting until we see more
replicas up: total=2 found=1 timeoutin=139368

From Mark M: This is a safety mechanism - you can turn it off by
configuring leaderVoteWait to 0 in solr.xml. This is meant to protect
the case where you stop a shard or it fails and then the first node to
get started back up has stale data - you don't want it to just become
the leader. So we wait to see everyone we know about in the shard up
to 3 or 5 min by default. Then we know all the shards participate in
the leader election and the leader will end up with all updates it
should have. You can lower that wait or turn it off with 0.

NOTE: I tried setting it to 0 and my cluster went haywire, so consider
just lowering it but not making it zero ;-)
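
In the legacy solr.xml, leaderVoteWait is an attribute on the cores element,
e.g. (value in milliseconds, illustrative):

  <cores adminPath="/admin/cores" leaderVoteWait="60000">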

Max heap of 8GB seems overly large to me for 8M docs per shard esp.
since you're using MMapDirectory to cache the primary data structures
of your index in OS cache. I have run shards with 40M docs with 6GB
max heap and chose to have more aggressive cache eviction by using a
smallish LFU filter cache. This approach seems to spread the cost of
GC out over time vs. massive amounts of clean-up when a new searcher
is opened. With 8M docs, each cached filter will require about 1M of
memory, so it seems like you could run with a smaller heap. I'm not a
GC expert but found that having smaller heap and more aggressive cache
evictions reduced full GC's (and how long they run for) on my Solr
instances.
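
The smallish LFU filter cache mentioned above is just a cache entry in
solrconfig.xml, e.g. (sizes illustrative):

  <filterCache class="solr.LFUCache" size="64" initialSize="64"
               autowarmCount="16"/>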

On Mon, Jul 22, 2013 at 8:09 AM, Shawn Heisey s...@elyograg.org wrote:
 On 7/22/2013 6:45 AM, Markus Jelsma wrote:
 You should increase your ZK time out, this may be the issue in your case. 
 You may also want to try the G1GC collector to keep STW under ZK time out.

 When I tried G1, the occasional stop-the-world GC actually got worse.  I
 tried G1 after trying CMS with no other tuning parameters.  The average
 GC time went down, but when it got into a place where it had to do a
 stop-the-world collection, it was worse.

 Based on the GC statistics in jvisualvm and jstat, I didn't think I had
 a problem.  The way I discovered that I had a problem was by looking at
 my haproxy load balancer -- sometimes requests would be sent to a backup
 server instead of my primary, because the ping request handler was
 timing out on the LB health check.  The LB was set to time out after
 five seconds.  When I went looking deeper with the GC log and some other
 tools, I was seeing 8-10 second GC pauses.  G1 was showing me pauses of
 12 seconds.

 Now I use a heavily tuned CMS config, and there are no more LB switches
 to a backup server.  I've put some of my own information about my GC
 settings on my personal Solr wiki page:

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 I've got an 8GB heap on my systems running 3.5.0 (one copy of the index)
 and a 6GB heap on those running 4.2.1 (the other copy of the index).

 Summary: Just switching to the G1 collector won't solve GC pause
 problems.  There's not a lot of G1 tuning information out there yet.  If
 someone can come up with a good set of G1 tuning parameters, G1 might
 become better than CMS.

 Thanks,
 Shawn



RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

2013-07-22 Thread Abeygunawardena, Niran
Hi,

Upgrading to Solr 4.2.1 works for my plugin but 4.3.1 does not work. I believe 
the ClassCastException which I am getting in 4.3.1 is due to this bug in 4.3.1:
https://issues.apache.org/jira/browse/SOLR-4791

Thanks,
Niran

-Original Message-
From: Abeygunawardena, Niran [mailto:niran.abeygunaward...@proquest.co.uk] 
Sent: 22 July 2013 16:01
To: solr-user@lucene.apache.org
Subject: RE: Problem instantiating a ValueSourceParser plugin in 4.3.1

Thanks Tim. 

I copied my jar containing the plugin to the solr's lib directory as it wasn't 
finding my jar due to a bug in 4.3:
https://issues.apache.org/jira/browse/SOLR-4791
but the ClassCastException remains. I'll try solr 4.2 and see if the plugin 
works in that.

Cheers,
Niran

 
-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com]
Sent: 22 July 2013 15:39
To: solr-user@lucene.apache.org
Subject: Re: Problem instantiating a ValueSourceParser plugin in 4.3.1

I saw something similar and used an absolute path to my JAR file in 
solrconfig.xml vs. a relative path and it resolved the issue for me.
Not elegant but worth trying, at least to rule that out.


Tim

On Mon, Jul 22, 2013 at 7:51 AM, Abeygunawardena, Niran 
niran.abeygunaward...@proquest.co.uk wrote:
 Hi,

 I'm trying to migrate to Solr 4.3.1 from Solr 4.0.0. I have a Solr Plugin 
 which extends ValueSourceParser and it works under Solr 4.0.0 but it does not 
 work under Solr 4.3.1. I compiled the plugin using the solr-4.3.1*.jars and 
 lucene-4.3.1*.jars but I get the following stacktrace error when starting up 
 a core referencing this plugin...seen below. Does anyone know why it might be 
 giving me a ClassCastException under 4.3.1?

 Thanks,
 Niran

 2458 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 Unable to create core: example_core
 org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
 at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating ValueSourceParser, com.example.HitsValueSourceParser failed to instantiate org.apache.solr.search.ValueSourceParser
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:539)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:575)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2088)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2082)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2115)
 at org.apache.solr.core.SolrCore.initValueSourceParsers(SolrCore.java:2027)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:749)
 ... 13 more
 Caused by: java.lang.ClassCastException: class com.example.HitsValueSourceParser
 at java.lang.Class.asSubclass(Unknown Source)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:396)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:518)
 ... 19 more
 2466 [coreLoadExecutor-3-thread-2] ERROR org.apache.solr.core.CoreContainer
 null:org.apache.solr.common.SolrException: Unable to create core: example_core
 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at ...

Re: deserializing highlighting json result

2013-07-22 Thread Jack Krupansky

Exactly why is it difficult to deserialize? Seems simple enough.

-- Jack Krupansky

-Original Message- 
From: Mysurf Mail 
Sent: Monday, July 22, 2013 11:14 AM 
To: solr-user@lucene.apache.org 
Subject: deserializing highlighting json result 


When I request a JSON result I get the following structure in the
highlighting section:

{"highlighting":{
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
    "PackageName":["- <em>Testing</em> channel twenty."]}}}

It is difficult to deserialize this JSON because the GUID is in the
attribute name.
Is that solvable (using C#)?
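
For what it's worth, the usual trick is to deserialize into a dictionary
keyed by document id rather than into a fixed class. A minimal, untested
Java/Jackson sketch of the idea (the same nested-Dictionary mapping works in
C# with Json.NET):

  import com.fasterxml.jackson.core.type.TypeReference;
  import com.fasterxml.jackson.databind.ObjectMapper;
  import java.util.List;
  import java.util.Map;

  public class HighlightParser {
      // json is the {"highlighting":{...}} object shown above.
      // Returns docId -> field -> highlight snippets.
      static Map<String, Map<String, List<String>>> parse(String json) throws Exception {
          ObjectMapper mapper = new ObjectMapper();
          Map<String, Map<String, Map<String, List<String>>>> root = mapper.readValue(json,
                  new TypeReference<Map<String, Map<String, Map<String, List<String>>>>>() {});
          return root.get("highlighting");
      }
  }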


Re: Node down, but not out

2013-07-22 Thread Timothy Potter
Why was it down? e.g. did it OOM? If so, the recommended approach is
kill the process on OOM vs. leaving it in the cluster in a zombie
state. I had similar issues when my nodes OOM'd is why I ask. That
said, you can get the /clusterstate.json which contains Zk's status of
a node using a request like:
http://localhost:8983/solr/zookeeper?detail=true&path=%2Fclusterstate.json
Although that would require some basic JSON processing to dig into the
response to get the status of the node of interest, so you may want to
implement a custom request handler.
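
If a custom handler is overkill, the same check can be done client-side with
SolrJ's cloud classes; an untested sketch (4.x-era API, names from memory):

  import org.apache.solr.client.solrj.impl.CloudSolrServer;

  public class NodeCheck {
      // nodeName looks like "10.0.0.5:8983_solr" (host:port_context).
      static boolean isLive(String zkHosts, String nodeName) throws Exception {
          CloudSolrServer solr = new CloudSolrServer(zkHosts);
          try {
              solr.connect();
              return solr.getZkStateReader().getClusterState()
                         .getLiveNodes().contains(nodeName);
          } finally {
              solr.shutdown();
          }
      }
  }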

On Mon, Jul 22, 2013 at 9:55 AM, jimtronic jimtro...@gmail.com wrote:
 I've run into a problem recently that's difficult to debug and search for:

 I have three nodes in a cluster and this weekend one of the nodes went
 partially down. It no longer responds to distributed updates and it is
 marked as GONE in the Cloud view of the admin screen. That's not ideal, but
 there's still two boxes up so not the end of the world.

 The problem is that it is still responding to ping requests and returning
 queries successfully. In my setup, I have the three servers on an haproxy
 load balancer so that I can distribute requests and have clients stick to a
 specific solr box. Because the bad node is still returning OK to the ping
 requests and still returns results for simple queries, the load balancer
 does not remove it from the group.

 Is there a ping like request handler that would tell me whether the given
 box I'm hitting is still in the cloud?

 Thanks!
 Jim Musil



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: queryResultCache should not be related to the order of the fq list

2013-07-22 Thread Chris Hostetter

: By the way, if the issue is accepted, how can I post my code?

Take a look at this wiki page for information on submitting patches...

https://wiki.apache.org/solr/HowToContribute
https://wiki.apache.org/solr/HowToContribute#Generating_a_patch

...you can attach your patch directly to the Jira issue you created...

https://wiki.apache.org/solr/HowToContribute#Contributing_your_work


-Hoss


Re: how to improve (keyword) relevance?

2013-07-22 Thread eShard
Sure, let's say the user types in "test pdf";
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or second) term when they are both the same field
(i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question. 

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Node down, but not out

2013-07-22 Thread jimtronic
I'm not sure why it went down exactly -- I restarted the process and lost the
logs. (d'oh!) 

An OOM seems likely, however. Is there a setting for killing the processes
when solr encounters an OOM?

Thanks!

Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto-sharding and numShard parameter

2013-07-22 Thread Mark Miller
There is a reason of course, or else it wouldn't be like that.

We addressed it recently.

https://issues.apache.org/jira/browse/SOLR-3633
https://issues.apache.org/jira/browse/SOLR-3677
https://issues.apache.org/jira/browse/SOLR-4943

- Mark

On Jul 22, 2013, at 10:57 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 That would be great.
 
 One step toward this goal is to stop treating the situation where there are
 no collections or cores as an error condition. It took me a while to get
 out of the mindset when bringing up a Solr install that I had to avoid that
 scenario at all costs, because red text == bad.
 
 There's no reason for the web interface to be deactivated when there are no
 collections or cores, though. Imagine if mysql didn't let you connect to it
 via phpmyadmin if you hadn't configured a database yet?
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Sat, Jul 20, 2013 at 10:33 PM, Mark Miller markrmil...@gmail.com wrote:
 
  A lot has changed since those examples were written - in general, we are
 moving away from that type of collection initialization and towards using
 the Collections API. Eventually, I'd personally like SolrCloud to ship with
 no predefined collections and have users simply start it and then start
 using the Collections API - preconfigured collections will be second class
 and possibly deprecated at some point.
 
 - Mark
 
 On Jul 20, 2013, at 10:13 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 Flavio:
 
 One of the great things about having people continually using Solr
 (and SolrCloud) for the first time is the opportunity to improve the
 docs. Anyone can update/add to the docs, all it takes is a signon.
  Unfortunately we had a bunch of spam bots a while ago, so it's now a
 two step process
 1 create a login on the Solr wiki
 2 post a message on this list indicating that you'd like to help
 improve the Wiki and give us your Solr login. We'll add you to the
 list of people who can edit the wiki and you can help the community by
 improving the documentation.
 
 Best
 Erick
 
 On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier
 pomperma...@okkam.it wrote:
 Thank you for the reply Erick,
 I was facing exactly with that problem..from the documentation it seems
 that those parameter are required to run SolrCloud,
 instead they are just used to initialize a sample collection..
 I think that in the examples on the user doc it should be better to
 separate those 2 concepts: one is starting the server,
 another one is creating/managing collections.
 
 Best,
 Flavio
 
 
 On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
 First the numShards parameter is only relevant the very first time you
 create your collection. It's a little confusing because in the
 SolrCloud
 examples you're getting collection1 by default. Look further down the
 SolrCloud Wiki page, the section titled
 Managing Collections via the Collections API for creating collections
 with a different name.
 
 Either way, either when you run the bootstrap command or when you
 create a new collection, that's the only time numShards counts. It's
 ignored the rest of the time.
 
 As far as data growing, you need to either
 1 create enough shards to handle the eventual size things will be,
 sometimes called oversharding
 or
 2 use the splitShard capabilities in very recent Solrs to expand
 capacity.
 
 Best
 Erick
 
 On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier
 pomperma...@okkam.it wrote:
 Hi to all,
 Probably this question has a simple answer but I just want to be sure
 of
 the potential drawbacks..when I run SolrCloud I run the main solr
 instance
 with the -numShard option (e.g. 2).
 Then as data grows, shards could potentially become a huge number. If
 I
  had to restart all nodes and re-run the master with the
 numShard=2,
 what will happen? It will be just ignored or Solr will try to reduce
 shards...?
 
 Another question...in SolrCloud, how do I restart all the cloud at
 once?
 Is
 it possible?
 
 Best,
 Flavio
 
 
 



deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
When I request a JSON result I get the following structure in the
highlighting section:

{"highlighting":{
  "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "0a699062-cd09-4b2e-a817-330193a352c1":{
    "PackageName":["- <em>Testing</em> channel twenty."]},
  "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
    "PackageName":["- <em>Testing</em> channel twenty."]}}}

It is difficult to deserialize this JSON because the GUID is in the
attribute name.
Is that solvable (using C#)?


Re: adding date column to the index

2013-07-22 Thread Gora Mohanty
On 22 July 2013 20:01, Mysurf Mail stammail...@gmail.com wrote:

 I have added a date field to my index.
 I don't want the query to search on this field, but I want it to be
 returned
 with each row.
 So I have defined it in the schema.xml as follows:
   <field name="LastModificationTime" type="date" indexed="false"
 stored="true" required="true"/>

 I added it to the select in data-config.xml and I see it selected in the
 profiler.
 Now, when I query all fields (using the dashboard) I don't see it.
 Even when I ask for it specifically I don't see it.
 What am I doing wrong?

 (In the db it is datetimeoffset(7).)

Did you restart your Java container, and reindex?
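
Note that a stored-only field will only appear on documents (re)indexed
after the field was added to the schema; older documents have no stored
value for it. After reindexing, a quick check (hypothetical core name) is:

  http://localhost:8983/solr/collection1/select?q=*:*&fl=LastModificationTime,*&wt=xml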

Regards,
Gora


Re: XInclude and Document Entity not working on schema.xml

2013-07-22 Thread Chris Hostetter
: to use Document Entity in schema.xml, I get this exception :
: java.lang.RuntimeException: schema fieldtype
: string(org.apache.solr.schema.StrField) invalid
: arguments:{xml:base=solrres:/commonschema_types.xml}

Elodie can you please open a bug in jira for this with your specific 
example?  please note in the Jira your comment that it works in Solr 4.2.1 
but fails in later versions (if you could test with 4.3 and the newly 
voted 4.4 that would be helpful.)

: The same error appears in this bug (fixed ?):
: https://issues.apache.org/jira/browse/SOLR-3087

That issue was specific to xinclude, not document entities, so it's 
possible the fix applied there did not affect/fix document entities -- but 
since you mentioned that you see document entity includes of 
fieldTypes working in 4.2.1, it might be a slightly different 
problem; otherwise i would expect to see it fail as far back as 4.0 just 
like SOLR-3087...

: I also try to use use XML XInclude mechanism
: (http://en.wikipedia.org/wiki/XInclude) to include parts of schema.xml.
: 
: When I try to include a fieldType, I get this exception :
: org.apache.solr.common.SolrException: Unknown fieldType 'long' specified

...the issue you linked to before (SOLR-3087) included a specific test to 
ensure that fieldTypes could be included like this, and that test works -- 
so perhaps in your testing you have some other subtle bug?  what are the 
absolute paths of the various files you are trying to include in one 
another?


-Hoss


Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alan Woodward
Hi Alex,

I'm not sure I follow - are you trying to create a ConfigSolr object from data 
read in from elsewhere, or trying to export the ConfigSolr object to another 
process?  If you're dealing with solr core java objects, you'll need the solr 
jar and all its dependencies (including solrj).

Alan Woodward
www.flax.co.uk


On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote:

 Does it mean that I can easily load Solr configuration as parsed by Solr
 from an external program?
 
 Because the last time I tried (4.3.1), the number of jars required was
 quite long, including SolrJ jar due to some exception.
 
 Regards.,
   Alex
 
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward a...@flax.co.uk wrote:
 
 Hi Robert,
 
 The upcoming 4.4 release should make this a bit easier (you can check out
 the release branch now if you like, or wait a few days for the official
 version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
 object as constructor parameters, and you can create a ConfigSolr object
 from a string representation of solr.xml using the ConfigSolr.fromString()
 static method.
 
 Alan Woodward
 www.flax.co.uk
 
 
 On 22 Jul 2013, at 11:41, Robert Krüger wrote:
 
 Hi,
 
 I use solr embedded in a desktop app and I want to change it to no
 longer require the configuration for the container and core to be in
 the filesystem but rather be distributed as part of a jar file.
 
 Could someone kindly point me to the right docs?
 
 So far my impression is, I need to instantiate CoreContainer with a
 custom SolrResourceLoader with properties parsed via some other API
 but from the javadocs alone I feel a bit lost (why does it have to
 have an instance directory at all?) and googling did not give me many
 results. What would be ideal would be to have something like this
 (pseudocode with partly imagined names, which hopefully illustrates
 what I am trying to achieve):
 
 ContainerConfig containerConfig =
 ContainerConfigParser.parse(InputStream from Classloader);
 CoreContainer  container = new CoreContainer(containerConfig);
 
 CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
 from Classloader);
 container.register(name, coreConfig);
 
 Ideally I would like to keep XML format to reuse my current solr.xml
 and solrconfig.xml but that is just a nice-to-have.
 
 Does such a way exist and if so, what are the real API classes and calls
 to use?
 
 Thank you in advance,
 
 Robert
 
 



Re: Node down, but not out

2013-07-22 Thread Timothy Potter
There is but I couldn't get it to work in my environment on Jetty, see:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-bn...@mail.gmail.com%3E

Let me know if you have any better luck. I had to resort to something
hacky but was out of time I could devote to such unproductive
endeavors ;-)
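
For reference, the JVM hook in question is the standard HotSpot option,
passed on the Solr start line (%p expands to the pid; getting the quoting
right through service wrappers is the fiddly part):

  java -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar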

On Mon, Jul 22, 2013 at 10:49 AM, jimtronic jimtro...@gmail.com wrote:
 I'm not sure why it went down exactly -- I restarted the process and lost the
 logs. (d'oh!)

 An OOM seems likely, however. Is there a setting for killing the processes
 when solr encounters an OOM?

 Thanks!

 Jim



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: XInclude and Document Entity not working on schema.xml

2013-07-22 Thread Chris Hostetter

: Elodie can you please open a bug in jira for this with your specific 
...
: ...the issue you linked to before (SOLR-3087) included a specific test to 
: ensure that fieldTypes could be included like this, and that test works -- 
: so perhaps in your testing you have some other subtle bug?  What are the 
: absolute paths of the various files you are trying to include in one 
: another?

Hmm... actually, I had some time while I was on a conf call, so I just 
updated the test to also test entity includes, and I wasn't able to 
reproduce either of the problems you described.

can you please take a look at this test, and the configs it uses, and 
compare with how you are trying to do things...

http://svn.apache.org/r1505749

http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test/org/apache/solr/core/TestXIncludeConfig.java?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-xinclude.xml?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-types.incl?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/test-files/solr/collection1/conf/schema-snippet-type.xml?view=markup



-Hoss


Re: how to improve (keyword) relevance?

2013-07-22 Thread Jack Krupansky
Again, you haven't indicated what the problem is. I mean, have you actually 
confirmed that a problem exists? Add debugQuery=true to your query and 
examine the explain section if you believe that Solr has improperly 
computed any document scores.


If you simply want to boost a term in a query, use the ^ operator, which 
applies to the preceding term. A boost of 1.0 means no change, 2.0 means 
double, 0.5 means cut in half.
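
For example (hypothetical terms, same text field as below):

q=text:(test^2.0 pdf)

doubles the weight of test relative to pdf.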


But, you don't need to boost. Relevancy is based on the data in the 
documents themselves.


BTW, q=text%3Atest+pdf does not search for pdf in the text field - 
field qualification only applies to a single term - but you can use 
parentheses: q=text%3A(test+pdf)


-- Jack Krupansky

-Original Message- 
From: eShard

Sent: Monday, July 22, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: how to improve (keyword) relevance?

Sure, let's say the user types in test pdf;
we need the results with all the query words to be near the top of the
result set.
the query will look like this: /select?q=text%3Atest+pdf&wt=xml

How do I ensure that the top resultset contains all of the query words?
How can I boost the first (or second) term when they are both in the same
field (i.e. text)?

Does this make sense?

Please bear with me; I'm still new to the solr query syntax so I don't even
know if I'm asking the right question.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-improve-keyword-relevance-tp4079462p4079502.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Programatic instantiation of solr container and cores with config loaded from a jar

2013-07-22 Thread Alexandre Rafalovitch
I am trying to read Solr config files from outside of a running Solr
instance. It's one of the approaches for SolrLint (
https://github.com/arafalov/SolrLint ). I kind of expected to just need
core Solr classes for that, but I needed SolrJ, the Lucene analyzer jar,
and a bunch of other jars.

The goal was to avoid re-implementing the valid/invalid parsing of config
files and just use Solr's own definitions.

Anyway, I don't want to hijack the thread. In the end, I think Solr's parse
mechanism is probably not the best match for me, as I explicitly want to
detect things like field definitions in the wrong place or incorrect
spellings, and the current parser just ignores those, since it works by
doing select XPath queries instead.

Regards,
   Alex.



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 22, 2013 at 1:16 PM, Alan Woodward a...@flax.co.uk wrote:

 Hi Alex,

 I'm not sure I follow - are you trying to create a ConfigSolr object from
 data read in from elsewhere, or trying to export the ConfigSolr object to
 another process?  If you're dealing with solr core java objects, you'll
 need the solr jar and all its dependencies (including solrj).

 Alan Woodward
 www.flax.co.uk


 On 22 Jul 2013, at 15:53, Alexandre Rafalovitch wrote:

  Does it mean that I can easily load Solr configuration as parsed by Solr
  from an external program?
 
  Because the last time I tried (4.3.1), the number of jars required was
  quite long, including SolrJ jar due to some exception.
 
  Regards.,
Alex
 
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
  On Mon, Jul 22, 2013 at 7:32 AM, Alan Woodward a...@flax.co.uk wrote:
 
  Hi Robert,
 
  The upcoming 4.4 release should make this a bit easier (you can check
 out
  the release branch now if you like, or wait a few days for the official
  version).  CoreContainer now takes a SolrResourceLoader and a ConfigSolr
  object as constructor parameters, and you can create a ConfigSolr object
  from a string representation of solr.xml using the
 ConfigSolr.fromString()
  static method.
 
  Alan Woodward
  www.flax.co.uk
 
 
  On 22 Jul 2013, at 11:41, Robert Krüger wrote:
 
  Hi,
 
  I use solr embedded in a desktop app and I want to change it to no
  longer require the configuration for the container and core to be in
  the filesystem but rather be distributed as part of a jar file.
 
  Could someone kindly point me to the right docs?
 
  So far my impression is, I need to instantiate CoreContainer with a
  custom SolrResourceLoader with properties parsed via some other API
  but from the javadocs alone I feel a bit lost (why does it have to
  have an instance directory at all?) and googling did not give me many
  results. What would be ideal would be to have something like this
  (pseudocode with partly imagined names, which hopefully illustrates
  what I am trying to achieve):
 
  ContainerConfig containerConfig =
  ContainerConfigParser.parse(InputStream from Classloader);
  CoreContainer  container = new CoreContainer(containerConfig);
 
  CoreConfig coreConfig = CoreConfigParser.parse(container, InputStream
  from Classloader);
  container.register(name, coreConfig);
 
  Ideally I would like to keep XML format to reuse my current solr.xml
  and solrconfig.xml but that is just a nice-to-have.
 
  Does such a way exist and if so, what are the real API classes and
 calls
  to use?
 
  Thank you in advance,
 
  Robert
 
 




Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Lance Norskog

Are you feeding Graphite from Solr? If so, how?

On 07/19/2013 01:02 AM, Neil Prosser wrote:

That was overnight so I was unable to track exactly what happened (I'm
going off our Graphite graphs here).




Re: adding date column to the index

2013-07-22 Thread Lance Norskog
Solr/Lucene does not automatically add a new field to existing documents 
the way DBMS systems do. Instead, all data for a field is added at the 
same time. To get the new field populated, you have to reload all of your data.


This is also true for deleting fields. If you remove a field, that data 
does not go away until you re-index.
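
A typical sequence, assuming the stock example host/port and handler names
(adjust to your setup):

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'

followed by re-running your import, e.g. for DIH:

http://localhost:8983/solr/dataimport?command=full-import&clean=true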


On 07/22/2013 07:31 AM, Mysurf Mail wrote:

I have added a date field to my index.
I don't want the query to search on this field, but I want it to be returned
with each row.
So I have defined it in the schema.xml as follows:
   <field name="LastModificationTime" type="date" indexed="false"
stored="true" required="true"/>

I added it to the select in data-config.xml and I see it selected in the
profiler.
Now, when I query all fields (using the dashboard) I don't see it.
Even when I ask for it specifically I don't see it.
What am I doing wrong?

(In the db it is (datetimeoffset(7)))





IllegalStateException

2013-07-22 Thread Michael Long
I'm seeing random crashes in Solr 4.0, but I don't have anything to go on 
other than IllegalStateException. Other than checking for a corrupt 
index and out-of-memory errors, what other things should I check?



org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
        at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:483)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)



Re: Performance of cross join vs block join

2013-07-22 Thread Roman Chyla
Hello Mikhail,

ps: sending to the solr-user as well, i've realized i was writing just to
you, sorry...

On Mon, Jul 22, 2013 at 3:07 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello Roman,

 Please get me right: I have no idea what happened with that dependency.
 There are recent patches from Yonik; they should be more current, and I
 think he can help you with particular issues. As a matter of common
 (captain's) sense I propose specifying any close version of Jetty; I
 don't think there is much reason to rely on that particular one.

 I'm thinking about your problem from time to time. You are right, it's
 definitely not a case for block join. I'm still trying to figure out how to
 make it computationally easier. As far as I can tell, you have a recursive
 many-to-many relationship and need to traverse it during the search.

 doc(id, author, text, references:[docid,] )

 I'm not sure it's possible with Lucene now, but if it is, what do you think
 about writing a DocValues stripe that contains internal Lucene docnums
 instead of external docIds? It moves a few steps from query time to index
 time, and hence can gain some performance.


Our use case of many-to-many relations is probably a weird one, and we ought
to de-normalize the values. What I do (building a citation network in
memory, using Lucene caches) is just a work-around that happens to
out-perform the index seeking - no surprise there, but at the expense of
memory. I am aware the de-normalization may be necessary; DocValues would
probably be a step toward it. The joins give great flexibility, which is
really cool, but that comes with its own price...



 Also, I noticed you hesitate regarding cross-segment join. You actually
 shouldn't, for the following reasons:
  - Join is Solr code (which is a top-reader beast);
  - it obtains and works with SolrIndexSearcher, which is a top reader...
  - the join happens at Weight level without any awareness of leaf segments.

 https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L272


Thanks. I think I did not use it because there was a very small chance it
could have been fast enough. It reads terms/joins for docs that match the
query, so in that sense it is not different from pre-computing the citation
cache - but it happens for every query/request, and so for 0.5M edges it
must take some time. But I guess I should measure it. I haven't made notes,
so now I am having a hard time backtracking :)

roman


 It seems to me cross segment join works well.



 On Mon, Jul 22, 2013 at 3:08 AM, Roman Chyla roman.ch...@gmail.comwrote:

 ah, in case you know the solution, here is the ant output:

 resolve:
 [ivy:retrieve]
 [ivy:retrieve] :: problems summary ::
 [ivy:retrieve]  WARNINGS
 [ivy:retrieve] module not found:
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312
 [ivy:retrieve]  local: tried
 [ivy:retrieve]  
 /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
 [ivy:retrieve]   -- artifact
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
 [ivy:retrieve]  
 /home/rchyla/.ivy2/local/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
 [ivy:retrieve]  shared: tried
 [ivy:retrieve]  
 /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/ivys/ivy.xml
 [ivy:retrieve]   -- artifact
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
 [ivy:retrieve]  
 /home/rchyla/.ivy2/shared/org.eclipse.jetty/jetty-deploy/8.1.10.v20130312/jars/jetty-deploy.jar
 [ivy:retrieve]  public: tried
 [ivy:retrieve]
 http://repo1.maven.org/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
 [ivy:retrieve]  sonatype-releases: tried
 [ivy:retrieve]
 http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
 [ivy:retrieve]   -- artifact
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
 [ivy:retrieve]
 http://oss.sonatype.org/content/repositories/releases/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
 [ivy:retrieve]  maven.restlet.org: tried
 [ivy:retrieve]
 http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
 [ivy:retrieve]   -- artifact
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
 [ivy:retrieve]
 http://maven.restlet.org/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.jar
 [ivy:retrieve]  working-chinese-mirror: tried
 [ivy:retrieve]
 http://mirror.netcologne.de/maven2/org/eclipse/jetty/jetty-deploy/8.1.10.v20130312/jetty-deploy-8.1.10.v20130312.pom
 [ivy:retrieve]   -- artifact
 org.eclipse.jetty#jetty-deploy;8.1.10.v20130312!jetty-deploy.jar:
 [ivy:retrieve]
 

how number of indexed fields effect performance

2013-07-22 Thread Suryansh Purwar
Hi,

We have a two-shard SolrCloud cluster, with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64-bit
with 32 GB RAM. Our index size is around 10 GB, with around 800,000
documents. We have around 1000 indexed fields per document. 6 GB of memory
is allocated to Tomcat, under which Solr is running, on each of the six
machines. We have a ZooKeeper ensemble consisting of 3 ZooKeeper instances
running on 3 of the six machines, with 4 GB of memory allocated to each
ZooKeeper instance. First Solr starts taking too much time, with "Broken
pipe" exceptions (caused by client-side timeouts) occurring again and
again; then after some time a whole shard goes down, one machine at a
time, followed by the other machines. Could having 1000 fields indexed per
document be causing this problem? If so, what would be the ideal number of
indexed fields in such an environment?

Regards,
Suryansh


Bug with Group.Limit and Group.Main in Distributed Case

2013-07-22 Thread Monica Skidmore
We are using grouping in a distributed environment, and we have noticed a 
discrepancy:



On a single core with a group.limit > 1 and group.main=true, setting rows=10 
will return 10 documents.  A distributed setup with the same parameters will 
return 10 groups.



We plan to open a jira ticket and submit a fix, but there is the question of 
which way to fix it.  In the case where group.main is not set, the group.limit 
applies to the number of groups for both single and multi core cases, so that 
approach would be consistent.


However, it seems to us that a user requesting the group.main results format 
will likely expect the group.limit to apply to the number of documents.  A 
discussion held around an older fix a couple of years ago supports this view.  
(https://issues.apache.org/jira/browse/SOLR-2063)
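
(For concreteness, the kind of request in question looks like:

/select?q=...&group=true&group.field=<field>&group.main=true&group.limit=5&rows=10

where a single core returns 10 documents while a distributed setup returns
10 groups.)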


Unless there is a good case for the first approach, we plan to go with the 
second; I wanted to put this out to see if we're overlooking something - or if 
this was implemented this way for some reason. Feedback?

Monica Skidmore
Search Application Services Engineering Lead
CareerBuilder.com




Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-22 Thread Neil Prosser
I just have a little python script which I run with cron (luckily that's
the granularity we have in Graphite). It reads the same JSON the admin UI
displays and dumps numeric values into Graphite.

I can open source it if you like. I just need to make sure I remove any
hacks/shortcuts that I've taken because I'm working with our cluster!


On 22 July 2013 19:26, Lance Norskog goks...@gmail.com wrote:

 Are you feeding Graphite from Solr? If so, how?


 On 07/19/2013 01:02 AM, Neil Prosser wrote:

 That was overnight so I was unable to track exactly what happened (I'm
 going off our Graphite graphs here).





Re: how number of indexed fields effect performance

2013-07-22 Thread Jack Krupansky
Was all of this running fine previously and only started running slow 
recently, or is this your first measurement?


Are very simple queries (single keyword, no filters or facets or sorting or 
anything else, and returning only a few fields) working reasonably well?


-- Jack Krupansky

-Original Message- 
From: Suryansh Purwar

Sent: Monday, July 22, 2013 4:07 PM
To: solr-user@lucene.apache.org
Subject: how number of indexed fields effect performance

Hi,

We have a two shard solrcloud cluster with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64 bit
with 32 gb ram. Our index size is around 10gb with around 8,00,000
documents. We have around 1000 indexed fields per document. 6gb of memeory
is allocated to tomcat under which solr is running  on each of the six
machines. We have a zookeeper ensemble consisting of 3 zookeeper instances
running on 3 of the six machines with 4gb memory allocated to each of the
zookeeper instance. First solr start taking too much time with Broken pipe
exception because of timeout from client side coming again and again, then
after sometime a whole shard goes down with one machine at at time followed
by other machines.  Is having 1000 fields indexed with each document
resulting in this problem? If it is so, what would be the ideal number of
indexed fields in such environment.

Regards,
Suryansh 



/update/extract error

2013-07-22 Thread franagan
Hi all,

I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
It's all running OK: documents are indexed into the 2 different shards, and
select *:* gives me all documents.

Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.

the code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");

ContentStreamUpdateRequest up = new
    ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");

server.request(up);
server.commit();

When up.setParam("literal.id", "666"); is added, an exception is thrown:

*apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR:
[doc=666] unknown field 'ignored_dcterms:modified'*
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)


My schema looks like this:
 <fields>
   <field name="id" type="integer" indexed="true" stored="true"
required="true"/>
   <field name="title" type="string" indexed="true" stored="true"/>
   <field name="author" type="string" indexed="true" stored="true"/>
   <field name="text" type="text_ind" indexed="true" stored="true"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
 </fields>

my solrConfig.xml:

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">ignored_</str>
    </lst>
    <lst name="date.formats">
      <str>yyyy-MM-dd</str>
    </lst>
  </requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance. 









--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-tp4079555.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: /update/extract error

2013-07-22 Thread Jack Krupansky
You need a dynamic field pattern for ignored_* to ignore unmapped 
metadata.
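
The stock example schema does it roughly like this, if you want the
unmapped metadata silently discarded:

<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>

Use indexed="true" stored="true" with a concrete type instead if you want
to keep that metadata.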


-- Jack Krupansky

-Original Message- 
From: franagan

Sent: Monday, July 22, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: /update/extract error

Hi all,

I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
It's all running OK: documents are indexed into the 2 different shards, and
select *:* gives me all documents.

Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.

the code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");

ContentStreamUpdateRequest up = new
    ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");

server.request(up);
server.commit();

When up.setParam("literal.id", "666"); is added, an exception is thrown:

*apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR:
[doc=666] unknown field 'ignored_dcterms:modified'*
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)


My schema looks like this:
<fields>
  <field name="id" type="integer" indexed="true" stored="true"
required="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_ind" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

my solrConfig.xml:

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">ignored_</str>
    </lst>
    <lst name="date.formats">
      <str>yyyy-MM-dd</str>
    </lst>
  </requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-tp4079555.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: /update/extract error

2013-07-22 Thread franagan
I added <dynamicField name="ignored_*" type="string" indexed="true"
stored="true"/> to the schema.xml and now it's working.

*Thank you very much, Jack.*





--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-in-Solr-4-3-1-tp4079555p4079564.html
Sent from the Solr - User mailing list archive at Nabble.com.


Use same spell check dictionary across different collections

2013-07-22 Thread smanad
I have 2 collections, let's say coll1 and coll2.

I configured solr.DirectSolrSpellChecker in coll1's solrconfig.xml and it
works fine.

Now, I want to configure coll2's solrconfig.xml to use the SAME spellcheck
dictionary index created above. (I do not want coll2 to prepare its own
dictionary index, but just to spellcheck against coll1's dictionary
index.)

Is it possible to do this? I tried IndexBasedSpellChecker but could not
get it working.

Any suggestions?
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-same-spell-check-dictionary-across-different-collections-tp4079566.html
Sent from the Solr - User mailing list archive at Nabble.com.


spellcheck and search in a same solr request

2013-07-22 Thread smanad
Hey, 

Is there a way to do spellcheck and search (using suggestions returned from
spellcheck) in a single Solr request?

I am seeing that if my query is spelled correctly, I get results, but if
it is misspelled, I just get suggestions.

Any pointers will be very helpful.
Thanks, 
-Manasi
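
(One partial pointer, in case it helps: with collation enabled, the
response to a misspelled query at least includes the best corrected
re-query alongside the suggestions, e.g.

/select?q=misspeled&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5

but the client still has to issue the collated query itself; Solr does not
re-run it automatically.)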



--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-and-search-in-a-same-solr-request-tp4079571.html
Sent from the Solr - User mailing list archive at Nabble.com.


softCommit doesn't work - ?

2013-07-22 Thread tskom
Hi,

I use Solr 4.3.1.
I tried to index about 70 documents using softCommit as below:

SolrInputDocument doc = new SolrInputDocument();
result = fillMetaData(request, doc); // custom one
int softCommit = 1;
solrServer.add(doc, softCommit);
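
(Side note: in SolrJ, the int argument to add() is commitWithin in
milliseconds, not a soft-commit switch. An explicit soft commit would look
more like the following sketch, assuming SolrJ 4.x.)

solrServer.add(doc);
solrServer.commit(true, true, true);  // waitFlush, waitSearcher, softCommit=true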

The process ran very fast, but there is nothing in the index, neither after
10 sec nor after restarting the server application.
In the Solr log I got something like this:
2013-07-23 01:58:01,543 INFO 
[org.apache.solr.update.processor.LogUpdateProcessor]
(http-127.0.0.1-8090-5) [collection1] webapp=/solr path=/update
params={wt=javabinversion=2} {add=[Rep_CA_FairyCakes
(1441307014244335616)]} 0 3
2013-07-23 01:58:01,546 INFO  [org.apache.solr.update.UpdateHandler]
(http-127.0.0.1-8090-5) start rollback{}
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Creating new IndexWriter...
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Waiting until IndexWriter is unused...
core=collection1
2013-07-23 01:58:01,547 INFO  [org.apache.solr.update.DefaultSolrCoreState]
(http-127.0.0.1-8090-5) Rollback old IndexWriter... core=collection1
2013-07-23 01:58:01,617 INFO  [org.apache.solr.core.SolrCore]
(http-127.0.0.1-8090-5) SolrDeletionPolicy.onInit: commits:num=1

commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7ed1f882;
maxCacheMB=48.0
maxMergeSizeMB=4.0),segFN=segments_ew,generation=536,filenames=[_ah_Lucene41_0.tim,
_9d.fdt, _a5.fdx, _ag_Lucene41_0.pos, _9l.si, _a7.nvd, _a0_Lucene41_0.pos,
... (a long list of segment filenames follows; the message is truncated
here)]



how number of indexed fields effect performance

2013-07-22 Thread Suryansh Purwar
It was running fine initially, when we had only around 100 fields
indexed. In this case as well it runs fine at first, but after some time
"Broken pipe" exceptions start occurring, which results in the shard going
down.

Regards,
Suryansh



On Tuesday, July 23, 2013, Jack Krupansky wrote:

 Was all of this running fine previously and only started running slow
 recently, or is this your first measurement?

 Are very simple queries (single keyword, no filters or facets or sorting
 or anything else, and returning only a few fields) working reasonably well?

 -- Jack Krupansky

 -Original Message- From: Suryansh Purwar
 Sent: Monday, July 22, 2013 4:07 PM
 To: solr-user@lucene.apache.org
 Subject: how number of indexed fields effect performance

 Hi,

 We have a two shard solrcloud cluster with each shard allocated 3 separate
 machines. We do complex queries involving a number of filter queries
 coupled with group queries and faceting. All of our machines are 64 bit
 with 32 gb ram. Our index size is around 10gb with around 8,00,000
 documents. We have around 1000 indexed fields per document. 6gb of memeory
 is allocated to tomcat under which solr is running  on each of the six
 machines. We have a zookeeper ensemble consisting of 3 zookeeper instances
 running on 3 of the six machines with 4gb memory allocated to each of the
 zookeeper instance. First solr start taking too much time with Broken pipe
 exception because of timeout from client side coming again and again, then
 after sometime a whole shard goes down with one machine at at time followed
 by other machines.  Is having 1000 fields indexed with each document
 resulting in this problem? If it is so, what would be the ideal number of
 indexed fields in such environment.

 Regards,
 Suryansh



Question about field boost

2013-07-22 Thread Joe Zhang
Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist in
my repository) were not returned (or maybe ranked very low). The situation
does not change even when I use much larger boost factors.

What am I doing wrong?


Re: Question about field boost

2013-07-22 Thread Jack Krupansky
Maybe you're not doing anything wrong - other than having an artificial 
expectation of what the true relevance of your data actually is. Many 
factors go into relevance scoring. You need to look at all aspects of your 
data.


Maybe your terms don't occur in your titles the way you think they do.

Maybe you need a boost of 500 or more...

Lots of potential maybes.

Relevancy tuning is an art and craft, hardly a science.

Step one: Know your data, inside and out.

Use the debugQuery=true parameter on your queries and see how much of the 
score is dominated by your query terms in the non-title fields.


-- Jack Krupansky

-Original Message- 
From: Joe Zhang

Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org
Subject: Question about field boost

Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist in
my repository) were not returned (or maybe ranked very low). The situation
does not change even when I use much larger boost factors.

What am I doing wrong? 



Re: how number of indexed fields effect performance

2013-07-22 Thread Jack Krupansky
After restarting Solr and doing a couple of queries to warm the caches, are 
queries already slow/failing, or does it take some time and a number of 
queries before failures start occurring?


One possibility is that you just need a lot more memory for caches for this 
amount of data, in which case the failures may be caused by heavy garbage 
collection. After restarting Solr, check how much Java heap is 
available, then do some warming queries, then check the available Java heap 
again.
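
(A quick way to watch heap and GC activity from the shell, assuming a JDK
on the node: jstat -gcutil <solr-pid> 5000 prints heap utilization and GC
time every five seconds.)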


Add the debugQuery=true parameter to your queries and look at the timings to 
see what phases of query processing are taking the most time. Also check 
whether the reported QTime seems to match actual wall clock time; sometimes 
formatting of the results and network transfer time can dwarf actual query 
time.


How many fields are you returning on a typical query?

-- Jack Krupansky


-Original Message- 
From: Suryansh Purwar

Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org ; j...@basetechnology.com
Subject: how number of indexed fields effect performance

It was running fine initially when we just had around 100 fields
indexed. In this case as well it runs fine but after sometime broken pipe
exception starts coming which results in shard getting down.

Regards,
Suryansh



On Tuesday, July 23, 2013, Jack Krupansky wrote:


Was all of this running fine previously and only started running slow
recently, or is this your first measurement?

Are very simple queries (single keyword, no filters or facets or sorting
or anything else, and returning only a few fields) working reasonably 
well?


-- Jack Krupansky

-Original Message- From: Suryansh Purwar
Sent: Monday, July 22, 2013 4:07 PM
To: solr-user@lucene.apache.org
Subject: how number of indexed fields effect performance

Hi,

We have a two shard solrcloud cluster with each shard allocated 3 separate
machines. We do complex queries involving a number of filter queries
coupled with group queries and faceting. All of our machines are 64 bit
with 32 gb ram. Our index size is around 10gb with around 8,00,000
documents. We have around 1000 indexed fields per document. 6gb of memeory
is allocated to tomcat under which solr is running  on each of the six
machines. We have a zookeeper ensemble consisting of 3 zookeeper instances
running on 3 of the six machines with 4gb memory allocated to each of the
zookeeper instance. First solr start taking too much time with Broken 
pipe
exception because of timeout from client side coming again and again, 
then
after sometime a whole shard goes down with one machine at at time 
followed

by other machines.  Is having 1000 fields indexed with each document
resulting in this problem? If it is so, what would be the ideal number of
indexed fields in such environment.

Regards,
Suryansh





Re: Question about field boost

2013-07-22 Thread Joe Zhang
Thanks for your hint, Jack. Here are the debug results, which I'm having a
hard time deciphering (the two terms are china and snowden)...

0.26839527 = (MATCH) sum of:
  0.26839527 = (MATCH) sum of:
0.26757246 = (MATCH) max of:
  7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
0.019873314 = queryWeight(content:china), product of:
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.01193658 = queryNorm
0.039825942 = (MATCH) fieldWeight(content:china in 249), product of:
  4.8989797 = tf(termFreq(content:china)=24)
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)
  0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
0.5836803 = queryWeight(title:china^10.0), product of:
  10.0 = boost
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.01193658 = queryNorm
0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
  1.0 = tf(termFreq(title:china)=1)
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.09375 = fieldNorm(field=title, doc=249)
8.2282536E-4 = (MATCH) max of:
  8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
0.03407834 = queryWeight(content:snowden), product of:
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.01193658 = queryNorm
0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
of:
  1.7320508 = tf(termFreq(content:snowden)=3)
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)


On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky j...@basetechnology.comwrote:

 Maybe you're not doing anything wrong - other than having an artificial
 expectation of what the true relevance of your data actually is. Many
 factors go into relevance scoring. You need to look at all aspects of your
 data.

 Maybe your terms don't occur in your titles the way you think they do.

 Maybe you need a boost of 500 or more...

 Lots of potential maybes.

 Relevancy tuning is an art and craft, hardly a science.

 Step one: Know your data, inside and out.

 Use the debugQuery=true parameter on your queries and see how much of the
 score is dominated by your query terms in the non-title fields.

 -- Jack Krupansky

 -Original Message- From: Joe Zhang
 Sent: Monday, July 22, 2013 11:06 PM
 To: solr-user@lucene.apache.org
 Subject: Question about field boost


 Dear Solr experts:

 Here is my query:

 defType=dismax&q=term1+term2&qf=title^100 content

 Apparently (at least I thought) my intention is to boost the title field.
 While I'm getting some non-trivial results, I'm surprised that the
 documents with both term1 and term2 in title (I know such docs do exist in
 my repository) were not returned (or maybe ranked very low). The situation
 does not change even when I use much larger boost factors.

 What am I doing wrong?



Re: Question about field boost

2013-07-22 Thread Joe Zhang
Is my reading correct that the boost is only applied on china but not
snowden? How can that be?

My query is: q=china+snowden&qf=title^10 content


On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang smartag...@gmail.com wrote:

 Thanks for your hint, Jack. Here is the debug results, which I'm having a
 hard deciphering (the two terms are china and snowden)...

 0.26839527 = (MATCH) sum of:
   0.26839527 = (MATCH) sum of:
 0.26757246 = (MATCH) max of:
   7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
 0.019873314 = queryWeight(content:china), product of:
   1.6649085 = idf(docFreq=46832, maxDocs=91058)
   0.01193658 = queryNorm
 0.039825942 = (MATCH) fieldWeight(content:china in 249), product
 of:
   4.8989797 = tf(termFreq(content:china)=24)
   1.6649085 = idf(docFreq=46832, maxDocs=91058)
   0.0048828125 = fieldNorm(field=content, doc=249)
   0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
 0.5836803 = queryWeight(title:china^10.0), product of:
   10.0 = boost
   4.8898454 = idf(docFreq=1861, maxDocs=91058)
   0.01193658 = queryNorm
 0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
   1.0 = tf(termFreq(title:china)=1)
   4.8898454 = idf(docFreq=1861, maxDocs=91058)
   0.09375 = fieldNorm(field=title, doc=249)
 8.2282536E-4 = (MATCH) max of:
   8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
 0.03407834 = queryWeight(content:snowden), product of:
   2.8549502 = idf(docFreq=14246, maxDocs=91058)
   0.01193658 = queryNorm
 0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
 of:
   1.7320508 = tf(termFreq(content:snowden)=3)
   2.8549502 = idf(docFreq=14246, maxDocs=91058)
   0.0048828125 = fieldNorm(field=content, doc=249)


 On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

 Maybe you're not doing anything wrong - other than having an artificial
 expectation of what the true relevance of your data actually is. Many
 factors go into relevance scoring. You need to look at all aspects of your
 data.

 Maybe your terms don't occur in your titles the way you think they do.

 Maybe you need a boost of 500 or more...

 Lots of potential maybes.

 Relevancy tuning is an art and craft, hardly a science.

 Step one: Know your data, inside and out.

 Use the debugQuery=true parameter on your queries and see how much of the
 score is dominated by your query terms in the non-title fields.

 -- Jack Krupansky

 -Original Message- From: Joe Zhang
 Sent: Monday, July 22, 2013 11:06 PM
 To: solr-user@lucene.apache.org
 Subject: Question about field boost


 Dear Solr experts:

 Here is my query:

 defType=dismax&q=term1+term2&qf=title^100 content

 Apparently (at least I thought) my intention is to boost the title field.
 While I'm getting some non-trivial results, I'm surprised that the
 documents with both term1 and term2 in title (I know such docs do exist in
 my repository) were not returned (or maybe ranked very low). The situation
 does not change even when I use much larger boost factors.

 What am I doing wrong?





Re: Question about field boost

2013-07-22 Thread Jack Krupansky
That means that, for that document, china occurs in the title, whereas snowden 
is found in the document but not in the title.
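
If the goal is for documents containing both terms in the title to rank
first, look at the dismax mm (minimum match) and pf (phrase fields)
parameters; a sketch, assuming your same fields:

q=china+snowden&defType=dismax&qf=title^10+content&mm=2&pf=title^50

mm=2 requires both terms to match somewhere in the document, and pf adds an
extra boost when the terms appear together in the title.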


-- Jack Krupansky

-Original Message- 
From: Joe Zhang

Sent: Tuesday, July 23, 2013 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about field boost

Is my reading correct that the boost is only applied on china but not
snowden? How can that be?

My query is: q=china+snowden&qf=title^10 content


On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang smartag...@gmail.com wrote:


Thanks for your hint, Jack. Here is the debug results, which I'm having a
hard deciphering (the two terms are china and snowden)...

0.26839527 = (MATCH) sum of:
  0.26839527 = (MATCH) sum of:
0.26757246 = (MATCH) max of:
  7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
0.019873314 = queryWeight(content:china), product of:
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.01193658 = queryNorm
0.039825942 = (MATCH) fieldWeight(content:china in 249), product
of:
  4.8989797 = tf(termFreq(content:china)=24)
  1.6649085 = idf(docFreq=46832, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)
  0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
0.5836803 = queryWeight(title:china^10.0), product of:
  10.0 = boost
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.01193658 = queryNorm
0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
  1.0 = tf(termFreq(title:china)=1)
  4.8898454 = idf(docFreq=1861, maxDocs=91058)
  0.09375 = fieldNorm(field=title, doc=249)
8.2282536E-4 = (MATCH) max of:
  8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
0.03407834 = queryWeight(content:snowden), product of:
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.01193658 = queryNorm
0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product
of:
  1.7320508 = tf(termFreq(content:snowden)=3)
  2.8549502 = idf(docFreq=14246, maxDocs=91058)
  0.0048828125 = fieldNorm(field=content, doc=249)


On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky 
j...@basetechnology.comwrote:



Maybe you're not doing anything wrong - other than having an artificial
expectation of what the true relevance of your data actually is. Many
factors go into relevance scoring. You need to look at all aspects of 
your

data.

Maybe your terms don't occur in your titles the way you think they do.

Maybe you need a boost of 500 or more...

Lots of potential maybes.

Relevancy tuning is an art and craft, hardly a science.

Step one: Know your data, inside and out.

Use the debugQuery=true parameter on your queries and see how much of the
score is dominated by your query terms in the non-title fields.

-- Jack Krupansky

-Original Message- From: Joe Zhang
Sent: Monday, July 22, 2013 11:06 PM
To: solr-user@lucene.apache.org
Subject: Question about field boost


Dear Solr experts:

Here is my query:

defType=dismax&q=term1+term2&qf=title^100 content

Apparently (at least I thought) my intention is to boost the title field.
While I'm getting some non-trivial results, I'm surprised that the
documents with both term1 and term2 in title (I know such docs do exist 
in
my repository) were not returned (or maybe ranked very low). The 
situation

does not change even when I use much larger boost factors.

What am I doing wrong?








Re: adding date column to the index

2013-07-22 Thread Mysurf Mail
To clarify: I did delete the data in the index and reload it (+ commit).
(As I said, I have seen it loaded in the DB profiler.)
Thanks for your comment.


On Mon, Jul 22, 2013 at 9:25 PM, Lance Norskog goks...@gmail.com wrote:

 Solr/Lucene does not automatically add when asked, the way DBMS systems
 do. Instead, all data for a field is added at the same time. To get the new
 field, you have to reload all of your data.

 This is also true for deleting fields. If you remove a field, that data
 does not go away until you re-index.


 On 07/22/2013 07:31 AM, Mysurf Mail wrote:

 I have added a date field to my index.
 I dont want the query to search on this field, but I want it to be
 returned
 with each row.
 So I have defined it in the scema.xml as follows:
   <field name="LastModificationTime" type="date" indexed="false"
 stored="true" required="true"/>



 I added it to the select in data-config.xml and I see it selected in the
 profiler.
 now, when I query all fileds (using the dashboard) I dont see it.
 Even when I ask for it specifically I dont see it.
 What am I doing wrong?

 (In the db it is (datetimeoffset(7)))





Re: deserializing highlighting json result

2013-07-22 Thread Mysurf Mail
the guid appears as the attribute name itself, not as the value of an id
attribute, i.e. not as

"id":"baf8434a-99a4-4046-8a4d-2f7ec09eafc8"

Trying to create a typed object for this JSON would require an attribute
named baf8434a-99a4-4046-8a4d-2f7ec09eafc8.

On Mon, Jul 22, 2013 at 6:30 PM, Jack Krupansky j...@basetechnology.comwrote:

 Exactly why is it difficult to deserialize? Seems simple enough.

 -- Jack Krupansky

 -Original Message- 
 From: Mysurf Mail
 Sent: Monday, July 22, 2013 11:14 AM
 To: solr-user@lucene.apache.org
 Subject: deserializing highlighting json result
 When I request a JSON result I get the following structure in the
 highlighting:

 {"highlighting":{
   "394c65f1-dfb1-4b76-9b6c-2f14c9682cc9":{
     "PackageName":["- <em>Testing</em> channel twenty."]},
   "baf8434a-99a4-4046-8a4d-2f7ec09eafc8":{
     "PackageName":["- <em>Testing</em> channel twenty."]},
   "0a699062-cd09-4b2e-a817-330193a352c1":{
     "PackageName":["- <em>Testing</em> channel twenty."]},
   "0b9ec891-5ef8-4085-9de2-38bfa9ea327e":{
     "PackageName":["- <em>Testing</em> channel twenty."]}}}


 It is difficult to deserialize this JSON because the guid is in the
 attribute name.
 Is that solvable (using C#)?