Re: Facet with empty values displayed in output

2013-09-19 Thread Upayavira
q=country:[* TO *] will find all docs that have a value in a field.
However, it seems you have a space, which *is* a value. I think Eric is
right - track down that record and fix the data.
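
As a stop-gap, a filter query can hide those documents at query time (a
sketch, assuming the field is named Country and the stray value is a single
space as described above):

  fq=-Country:" "        exclude the single-space value
  fq=Country:[* TO *]    exclude documents with no value at all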

Upayavira

On Wed, Sep 18, 2013, at 09:23 AM, Prasi S wrote:
 How to filter them in the query itself?
 
 Thanks,
 Prasi
 
 
 On Wed, Sep 18, 2013 at 1:06 PM, Upayavira u...@odoko.co.uk wrote:
 
  Filter them out in your query, or in your display code.
 
  Upayavira
 
  On Wed, Sep 18, 2013, at 06:36 AM, Prasi S wrote:
   Hi ,
   Im using solr 4.4 for our search. When i query for a keyword, it returns
   empty valued facets in the response
  
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="Country">
          *<int name="">1</int>*
          <int name="USA">1</int>
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
    </lst>
  
    I have also tried using the facet.missing parameter, but no change. How
    can we handle this?
  
  
   Thanks,
   Prasi
 


Re: Memory Using In Faceted Search (UnInvertedField's)

2013-09-19 Thread Anton M
I ran some load tests and working memory usage was always about 10-11 GB
(very slowly rising - that is probably the query cache filling up, I think).
6 GB was always the heap size, while 4-5 GB was reported as shareable
memory.
At first I was afraid that Solr would keep taking memory until it consumed
everything available, but it looks like it stops somewhere after
fieldValueCache is filled in.

Shawn, I had the swap file growing (up to 50-60%) and active while the load
tests ran. Did you configure 'swappiness' on your Linux box (set it to 0
earlier, maybe)? If not, my Windows OS could be the cause of that difference.

I'm not sure whether this is entirely an issue of shareable memory, some
missing JVM configuration (I don't have anything special except -Xmx, -Xms
and -XX:MaxPermSize=512M), or some Solr memory leak.
I'd appreciate any thoughts on that.
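
To narrow this down, GC logging makes the heap and pool behaviour visible
over a test run (a sketch with assumed Java 7 flags; the sizes, log path,
and launcher are illustrative):

  java -Xms6g -Xmx6g -XX:MaxPermSize=512m \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
       -Xloggc:gc.log -jar start.jar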

Thanks.





Migrating from Endeca

2013-09-19 Thread Gareth Poulton
Hi,
A customer wants us to move their entire enterprise platform - of which one
of the many components is Oracle Endeca - to open source.
However, customers being the way they are, they don't want to give up any
of the features they currently use, the most prominent of which are
user-friendly web-based editors that let non-technical people edit things
like:
- Schema
- Dimensions (i.e. facets)
- Dimension groups (not sure what these are)
- Thesaurus
- Stopwords
- Report generation
- Boosting individual records (i.e. sponsored links)
- Relevance ranking settings
- Process pipeline editor for, e.g. adding new languages
- ...all without touching any XML.

My question is: are there any Solr features, plugins, modules, third-party
applications, or the like that will do this for us? Or will we have to
develop all of the above from scratch?

thanks,
Gareth


solr atomic updates stored=true, and copyField limitation

2013-09-19 Thread Tanguy Moal
Hello,

I'm using solr 4.4. I have a solr core with a schema defining a bunch of
different fields, and among them a date field:
- date: indexed and stored   // the date used at search time
In practice it's a TrieDateField, but I think that's not relevant to the
concern.

It also has a multi-valued, not required, string field named tags which
contains, well, a list of tags, for some of the documents.

So far, so good: everything works as expected and I'm glad.
I'm able to perform partial (or atomic) updates on the tags field whenever it 
gets modified, and I love it.

Now I have a new source that also pushes updates to the same solr core.
Unfortunately, that source's incoming documents have their date in another
field, of the same type, named created_time instead of date:
- created_time: stored only  // some documents come in with this field set
To be able to sort any document by time, I decided to ask solr to copy the
contents of the field created_time to the field named date:
 <copyField source="created_time" dest="date"/>

I updated my schema and reloaded my core and everything seemed fine. In fact, I 
did break something 8-)
But I figured it out later…
Quoting http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations :
 all fields in your SchemaXml must be configured as stored="true" except for
 fields which are <copyField/> destinations -- which must be configured as
 stored="false"


However, at that time I was not aware of the limitation, and I was able to
sort by time across all the documents in my solr core.
I then decided to make sure that partial (or atomic) updates could still be
performed, and then I was surprised:
* documents from the more recent source (having both a date and a
created_time field) are updated fine; the date field is kept (the copyField
directive is replayed, I guess)
* documents from the first source (having only the date field set) are
however a little bit less lucky: the date gets lost in the process (it looks
like the date field was overwritten by the execution of the copyField
directive with nothing in its source field)

I then became aware of the caveats and limitations of atomic updates, but now I 
want to understand why ;-)

So my question is: what differs in copyField behaviour between a normal
(classic) and a partial (atomic) update?
In practice, I don't understand why the target of every copyField directive
is *always* cleared during partial updates.
Could the clearing of the destination field be performed only if one of the
source fields of a copyField is present in the atomic update? Maybe we
didn't want to do that because it would have put some complexity where it
should not be (updates must be fast), but that's just an idea.

I have two ways to handle my problem:
1/ Create a stored="false" search_date field and have two copyField
directives, one for the original date field and another one for the newer
created_time field, and make the search application rely on the search_date
field (see the sketch below)
2/ Since I have some control over the second source pushing documents, I
can make sure that documents are pushed with the same date field, and work
around the limitation by removing the copyField directive entirely.
Since it simplifies my solr schema, I chose option #2.
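
For reference, option 1 would look roughly like this in schema.xml (the
field and type names are illustrative, not taken from the actual schema):

  <field name="search_date" type="tdate" indexed="true" stored="false"/>
  <copyField source="date" dest="search_date"/>
  <copyField source="created_time" dest="search_date"/>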

Thank you very much for your attention

Tanguy

Re: Migrating from Endeca

2013-09-19 Thread Jack Krupansky

Take a look at LucidWorks Enterprise. It has a graphical UI.

But if you must meet all of the listed requirements and Lucid doesn't meet 
all of them, then... you will have to develop everything on your own. Or, 
maybe Lucid might be interested in partnering with you to let you add 
extensions to their UI. If you really are committed to a deep replacement of 
Endeca's UI, then rolling your own is probably the way to go. Then the 
question is whether you should open source that UI.


You can also consider extending the Solr Admin UI. It does not do most of 
your listed features, but having better integration with the Solr Admin UI 
is a good idea.


-- Jack Krupansky

-Original Message- 
From: Gareth Poulton

Sent: Thursday, September 19, 2013 7:50 AM
To: solr-user@lucene.apache.org
Subject: Migrating from Endeca

Hi,
A customer wants us to move their entire enterprise platform - of which one
of the many components is Oracle Endeca - to open source.
[...]



How to highlight multiple words in document

2013-09-19 Thread bramha
Hi All,

I want to highlight multiple words in document.

e.g. if I search for "Rework AND Build", then after opening a document
returned in the search results, both words (Rework as well as Build) should
be highlighted in that document.

Currently I am adding the word to highlight in the highlight field.
In this example I am setting highlight = "Rework AND Build". But it is
treating this as a single word and highlighting only that exact occurrence
in the document.
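
For the archives: if the goal is to have Solr itself return highlighted
snippets, the standard highlighting parameters mark each query term
separately when the terms are passed as a query rather than as one literal
string (a sketch; the field name "content" is an assumption):

  http://localhost:8983/solr/select?q=content:(Rework AND Build)
      &hl=true&hl.fl=content&hl.simple.pre=<em>&hl.simple.post=</em>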

Thanks in advance.
- Bramha





solr4.4 admin page show loading

2013-09-19 Thread Micheal Chao
hi, I have installed solr4.4 on tomcat7.0. The problem is I can't see the
solr admin page; it always shows loading. I can't find any errors in the
tomcat logs, and I can send search requests and get results.

what can I do? please help me, thank you very much.





Re: Solrcloud - adding a node as a replica?

2013-09-19 Thread didier deshommes
Thanks Furkan,
That's exactly what I was looking for.


On Wed, Sep 18, 2013 at 4:21 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Are you looking for this:

 http://lucene.472066.n3.nabble.com/SOLR-Cloud-Collection-Management-quesiotn-td4063305.html

 On Wednesday, 18 September 2013, didier deshommes dfdes...@gmail.com
 wrote:
  Hi,
  How do I add a node as a replica to a solrcloud cluster? Here is my
  situation: some time ago, I created several collections
  with replicationFactor=2. Now I need to add a new replica. I thought just
  starting a new node and re-using the same zookeeper instance would make it
  automatically a replica, but that isn't the case. Do I need to delete and
  re-create my collections with the right replicationFactor (3 in this
 case)
  again? I am using solr 4.3.0.
 
  Thanks,
  didier
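
For the archives: in Solr 4.x a new replica can be created by adding a core
that names the target collection and shard through the Core Admin API (a
sketch; the host, core, collection, and shard names are illustrative):

  http://newnode:8983/solr/admin/cores?action=CREATE
      &name=mycollection_shard1_replica3&collection=mycollection&shard=shard1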
 



Re: Solrcloud - adding a node as a replica?

2013-09-19 Thread Furkan KAMACI
Do not hesitate to ask questions if you have any problems with it.


2013/9/19 didier deshommes dfdes...@gmail.com

 Thanks Furkan,
 That's exactly what I was looking for.
 [...]



I can't open the admin page, it's always loading.

2013-09-19 Thread Micheal Chao
Hi, I followed the tutorial to download solr4.4 and unzip it, and then I
started jetty. I can post data and search correctly, but when I try to open
the admin page, it always shows loading.

and then I set up solr on tomcat 7.0, but it's the same.

what's wrong? please help, thanks.





SolrCloud setup - any advice?

2013-09-19 Thread Neil Prosser
Apologies for the giant email. Hopefully it makes sense.

We've been trying out SolrCloud to solve some scalability issues with our
current setup and have run into problems. I'd like to describe our current
setup, our queries and the sort of load we see and am hoping someone might
be able to spot the massive flaw in the way I've been trying to set things
up.

We currently run Solr 4.0.0 in the old style Master/Slave replication. We
have five slaves, each running Centos with 96GB of RAM, 24 cores and with
48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but
aren't slow either. Our GC parameters aren't particularly exciting, just
-XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.

Our index size ranges between 144GB and 200GB (when we optimise it back
down, since we've had bad experiences with large cores). We've got just
over 37M documents; some are smallish but most range between 1000-6000
bytes. We regularly update documents, so large portions of the index will
be touched, leading to a maxDocs value of around 43M.

Query load ranges between 400req/s to 800req/s across the five slaves
throughout the day, increasing and decreasing gradually over a period of
hours, rather than bursting.

Most of our documents have upwards of twenty fields. We use different
fields to store territory variant (we have around 30 territories) values
and also boost based on the values in some of these fields (integer ones).

So an average query can range-filter by two of the territory-variant
fields, filter by a non-territory-variant field, facet by a field or two
(maybe territory-variant), bring back the values of 60 fields, boost-query
on field values of a non-territory-variant field, boost by values of two
territory-variant fields, and run a dismax query on up to 20 fields (with
boosts) plus phrase boosts on those fields too. They're pretty big queries.
We don't do any index-time boosting. We try to keep things dynamic so we
can alter our boosts on-the-fly.

Another common query is to list documents with a given set of IDs and
select documents with a common reference and order them by one of their
fields.

Auto-commit every 30 minutes. Replication polls every 30 minutes.

Document cache:
  * initialSize - 32768
  * size - 32768

Filter cache:
  * autowarmCount - 128
  * initialSize - 8192
  * size - 8192

Query result cache:
  * autowarmCount - 128
  * initialSize - 8192
  * size - 8192

After a replicated core has finished downloading (probably while it's
warming) we see requests which usually take around 100ms taking over 5s. GC
logs show concurrent mode failure.

I was wondering whether anyone can help with sizing the boxes required to
split this index down into shards for use with SolrCloud and roughly how
much memory we should be assigning to the JVM. Everything I've read
suggests that running with a 48GB heap is way too high but every attempt
I've made to reduce the cache sizes seems to wind up causing out-of-memory
problems. Even dropping all cache sizes by 50% and reducing the heap by 50%
caused problems.

I've already tried using SolrCloud 10 shards (around 3.7M documents per
shard, each with one replica) and kept the cache sizes low:

Document cache:
  * initialSize - 1024
  * size - 1024

Filter cache:
  * autowarmCount - 128
  * initialSize - 512
  * size - 512

Query result cache:
  * autowarmCount - 32
  * initialSize - 128
  * size - 128

Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB
memory) and four shards on two boxes and three on the rest I still see
concurrent mode failure. This looks like it's causing ZooKeeper to mark the
node as down and things begin to struggle.

Is concurrent mode failure just something that will inevitably happen or is
it avoidable by dropping the CMSInitiatingOccupancyFraction?
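
For reference, the CMS knobs usually suggested at this point look like the
following (a sketch; the values are starting points to experiment with, not
recommendations):

  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly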

If anyone has anything that might shove me in the right direction I'd be
very grateful. I'm wondering whether our set-up will just never work and
maybe we're expecting too much.

Many thanks,

Neil


Problem with stopword

2013-09-19 Thread mpcmarcos
Hello everybody, 

I have a problem with stopwords. I have an index with some stopwords, and
when I search for one of them alone, Solr doesn't select any document. How
can I fix this? I need all the documents.

Example:

*Stopwords*: hello, goodbye
*Query*: http://localhost:8893/solr/select?q=hello
*DebugQuery*: <str name="parsedquery_toString"/>
*Total Results*: 0

I tried to do this with edismax, but it only works if I call solr without
q, not when q becomes empty because of stopwords:

http://localhost:8983/solr/select?q=&defType=edismax&q.alt=*:*



Thank you.





Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated
with GATE (7.1).
I use Behemoth to be able to run my GATE application on a corpus of
documents on Hadoop, and then Behemoth allows me to directly send my
annotated documents to solr. But my question is not about the Behemoth or
Hadoop parts.

The annotations produced by my GATE application usually have several
features (for example, annotation type Person has the following features :
Person.title, Person.firstName, Person.lastName, Person.gender).
Each of my documents may contain more than one Person annotation, which is
why I would like to index all the features for one annotation in one field
in solr.
How do I do that ?

I thought I'd add the following lines in schema.xml :

<types>
...
<fieldType name="person" class="solr.StrField" subSuffix="_person"/>
...
</types>
...
<fields>
...
<field name="personinfo" type="person" indexed="true" stored="true"
multiValued="true"/>
<dynamicField name="*_person" type="text_general" indexed="true"
stored="false"/>
...
</fields>


But as soon as I start my solr instances and try to access solr from my
browser, I get an HTTP ERROR 500 :

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure:
Plugin Initializing failure for [schema.xml]
fieldType,trace=org.apache.solr.common.SolrException: SolrCore
'collection1' is not available due to init failure: Plugin Initializing
failure for [schema.xml] fieldType
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing
failure for [schema.xml] fieldType
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:164)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at

decimal numeric queries are too slow in solr

2013-09-19 Thread Karan jindal
Hi all,

I am using solr 3.4 and my index size is around 250GB.
The issue that I am facing is that queries containing a decimal number take
a long time to execute.
I am using the dismax query handler with *qf* (15 fields) and *pf* (4
fields) and a boost function on time.

Also I am using WordDelimiterFilterFactory with the following options (only
mentioning options related to numbers):
generateNumberParts="1"
preserveOriginal="1"
catenateNumbers="1"
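
For reference, that filter configuration would look roughly like this in
schema.xml (a sketch of just the options named above):

  <filter class="solr.WordDelimiterFilterFactory"
          generateNumberParts="1"
          preserveOriginal="1"
          catenateNumbers="1"/>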

Example query:
the query solr 3.4 takes about 20 seconds
the query solr 3 takes less than 1 second

I couldn't understand the reason for so much difference.
I can understand that internally "solr 3.4" will be translated into
something like (3.4 3) (4 34) because of WordDelimiterFilterFactory, but
still the difference is quite huge.

On what factors does query execution time depend?
Any help which helps me in knowing the reason will be appreciated.

Regards,
Karan


Question on ICUFoldingFilterFactory

2013-09-19 Thread Nemani, Raj
Hello,

I was wondering if anybody who has experience with ICUFoldingFilterFactory
can help out with the following issue. Thank you so much in advance.

Raj

--

Problem:
When a document is created/updated, the value's casing is indexed properly.
However, when it's queried, the value is returned in lowercase.
Example:
Document input: NBAE
Document value: NBAE
Query input: NBAE, nbae, Nbae... etc.
Query output: nbae

If I remove the ICUFoldingFilterFactory filter, the casing problem goes
away, but then searches for nbae (lowercase) or Nbae (mixed case) return no
values.


Field Type:
<fieldType name="text_phrase" class="solr.TextField"
    positionIncrementGap="20" autoGeneratePhraseQueries="true">
  <analyzer>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s&amp;\s"
        replacement="\sand\s"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[\p{Punct}\u00BF\u00A1]" replaceWith=" "/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[\p{Cntrl}]"
        replacement=""/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>


Let me know if that makes sense. I'm curious whether
solr.ICUFoldingFilterFactory has additional attributes that I can use to
control the casing behavior but retain its other filtering properties
(ASCIIFoldingFilter and ICUNormalizer2Filter).
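
A common pattern for cases like this (a sketch, not advice from this
thread): search against a folded copy of the field and return/facet on an
unanalyzed string copy that keeps the original casing. The field names are
illustrative:

  <field name="network" type="string" indexed="true" stored="true"/>
  <field name="network_folded" type="text_phrase" indexed="true"
      stored="false"/>
  <copyField source="network" dest="network_folded"/>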

Thanks!!!



RE: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer

2013-09-19 Thread Chris Hostetter

Ok, first off -- let's clear up some confusion...

1) except for needing to put the logging jars in your servlet container's 
top-level classpath, you should not be putting any jars that come from solr 
or lucene, or any jars for custom plugins you have written, in tomcat/lib

2) you should never manually add/remove any jars of any kind from solr's 
WEB-INF/lib/ directory.

3) if you have custom plugins you want to load, you should create a *new* 
lib directory, put your custom plugin jars (and their dependencies) in 
that directory, and configure it (either with sharedLib in solr.xml, or 
with a lib directive in your solrconfig.xml file)
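
For reference, the lib directive mentioned in point 3 looks roughly like
this in solrconfig.xml (the path is illustrative):

  <lib dir="/opt/solr/custom-lib" regex=".*\.jar"/>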


As for the situation you find yourself in...

here are the big, gigantic (the size of football fields even) red flags 
that jump out at me as being the sort of thing that could cause all sorts 
of classloader nightmares with your setup...

: * Following are the jars placed in tomcat/lib dir:
...
: lucene-core.jar 
: solr-core-1.3.0.jar 
: solr-dataimporthandler-4.4.0.jar 
: solr-dataimporthandler-extras-4.4.0.jar
: solr-solrj-4.4.0.jar
: lucene-analyzers-common-4.2.0.jar
...
: Jars in tomcat/ webapps/ROOT/WEB-INF/lib/
...
: lucene-core-4.4.0.jar 
: nps-solr-plugin-1.0-SNAPSHOT.jar
: solr-core-4.4.0.jar
: solr-dataimporthandler-4.4.0.jar
: lucene-analyzers-common-4.4.0.jar 
: solr-solrj-4.4.0.jar
...

You clearly have two radically different versions of solr-core and
lucene-core in your classpath, which could easily explain the
ClassCastExceptions related to the TokenizerFactory class -- because there
are going to be two radically different versions of that class in the
classpath, and who knows which one java is trying to cast your custom impl
to.

Separate from that: even if the multiple solr-dataimporthandler,
lucene-analyzers-common, and solr-solrj jars in each of those directories
are the exact same binary files, when loaded into the hierarchical
classloaders of a servlet container they produce different copies of the
same java classes -- so you can again have classloader problems where some
execution paths use a leaf classloader to access ClassX while another
thread uses a parent classloader to access ClassX -- these different class
instances will have different static fields, and instances of these classes
will (probably) not be .equals(), etc


-Hoss


Re: SolrCloud setup - any advice?

2013-09-19 Thread Shreejay Nair
Hi Neil,

Although you haven't mentioned it, just wanted to confirm - do you have
soft commits enabled?

Also what's the version of solr you are using for the solr cloud setup?
4.0.0 had lots of memory and zk related issues. What's the warmup time for
your caches? Have you tried disabling the caches?

Is this a static index or are documents added continuously?

The answers to these questions might help us pinpoint the issue...

On Thursday, September 19, 2013, Neil Prosser wrote:

 Apologies for the giant email. Hopefully it makes sense.

 We've been trying out SolrCloud to solve some scalability issues with our
 current setup and have run into problems.
 [...]



Will Solr work with a mapped drive?

2013-09-19 Thread johnmunir
Hi,


I'm having the same problem as described here:
http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-windows
Does anyone know if this is a limitation of Solr or not?


I searched the web, nothing came up.


Thanks!!!


-- MJ


Re: Indexing several sub-fields in one solr field

2013-09-19 Thread Jack Krupansky
There is no such fieldType attribute as subSuffix. Solr is just 
complaining about extraneous, junk attributes. Delete the crap.


-- Jack Krupansky

-Original Message- 
From: jimmy nguyen

Sent: Thursday, September 19, 2013 12:43 PM
To: solr-user@lucene.apache.org
Subject: Indexing several sub-fields in one solr field

Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated
with GATE (7.1).
[...]

Re: Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

thanks for the answer. Sorry, I actually meant the attribute subFieldSuffix.

So, in order to be able to index several features in one solr field, should
I program a new Java class inheriting AbstractSubTypeFieldType? Or is there
another way to do it?
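
(One conventional alternative, a sketch rather than advice from this
thread: keep one multiValued field per feature and rely on the values
staying in parallel order, so the n-th value of each field belongs to the
n-th Person annotation. The field names are illustrative:

  <dynamicField name="person_*" type="string" indexed="true" stored="true"
      multiValued="true"/>

i.e. person_title, person_firstName, person_lastName, person_gender each
receive one value per Person annotation, in the same order.)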

Thanks !
Jim


On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is no such fieldType attribute as subSuffix. Solr is just
 complaining about extraneous, junk attributes. Delete the crap.

 -- Jack Krupansky
 [...]

Re: SOLR-5250

2013-09-19 Thread Chris Hostetter

: widget, BUT while researching for this message, I've learned about the 
: important difference between a text field and a string field in solr and 
: it appears that by default, the Drupal apachesolr module indexes text 
: fields as text and not strings. Now I just need to figure out how to 
: alter this process to suit my own needs. I'll update that d.org ticket 
: with my findings so hopefully that will prevent some other future, 
: confused developer from reaching out to the Apache Foundation 
: prematurely.

John: glad to hear you were able to track down the root cause of your 
problem.

Thanks for closing the loop, and good luck on finding a solution that 
works nicely with the drupal bridge you are using.

Please feel free to follow up on this list with any additional questions 
you have on the solr side of things.

-Hoss


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Furkan KAMACI
Could you paste your jetty logs from when you try to open the admin page?

On Thursday, 19 September 2013, Micheal Chao fisher030...@hotmail.com
wrote:
 Hi, I followed the tutorial to download solr4.4 and unzip it, and then I
 started jetty. I can post data and search correctly, but when I try to
 open the admin page, it always shows loading.
 [...]



Re: Problem with stopword

2013-09-19 Thread Furkan KAMACI
Firstly, you should read here:

https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer

Secondly, when you write a query, stopwords are filtered out of it if your
analyzer uses a stopword filter, so there is nothing left to search on.

On Thursday, 19 September 2013, mpcmarcos mpcmar...@gmail.com wrote:
 Hello everybody,

 I have a problem with stopwords. I have an index with some stopwords, and
 when I search for one of them alone, Solr doesn't select any document.
 [...]



SOLR-5250

2013-09-19 Thread John Brandenburg
Greetings, 

This is a follow-up to https://issues.apache.org/jira/browse/SOLR-5250, where 
I reported a possible issue with sorting content which contains hyphens. Hoss 
Man suggested that I likely have a misconfiguration in my field settings and 
that I send a message to this list.

I am using the Drupal apachesolr module version 1.4 (Where I actually also 
posted an issue at https://drupal.org/node/2092363) with a hosted Acquia solr 
index. So the schema settings will reflect what is packaged with the apachesolr 
module in drupal-3.0-rc2-solr3.

I wasn't initially familiar with how Drupal field types are mapped to Solr 
field types, and the field in question uses the Text field widget. BUT, while 
researching for this message, I've learned about the important difference 
between a text field and a string field in solr, and it appears that by 
default the Drupal apachesolr module indexes text fields as text and not 
strings. Now I just need to figure out how to alter this process to suit my 
own needs. I'll update that d.org ticket with my findings, so hopefully that 
will prevent some other future, confused developer from reaching out to the 
Apache Foundation prematurely.
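
For the archives: the usual schema-level fix for sorting on a tokenized
text field is to sort on an untokenized string copy (a sketch; the field
names are illustrative):

  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="title_sort" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_sort"/>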

--
John P. Brandenburg
Developer

jbrandenb...@forumone.com
www.forumone.com
703-894-4362

Forum One Communications
Communicate • Collaborate • Change the World

Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Chris Hostetter

: Hi, I followed the tutoral to download solr4.4 and unzip it, and then i
: started jetty. i can post data and search correctly, but when i try to open
: admin page, it's always show loading. 

the admin UI is entirely rendered by client-side javascript in your 
browser -- so the most important question we need to know is what OS & 
browser you are using to access the web UI.

if your browser has a debug/error console available, it would also help to 
know if it mentions any errors/warnings.


-Hoss


Re: SolrCloud setup - any advice?

2013-09-19 Thread Otis Gospodnetic
Hi Neil,

Consider using G1 instead.  See http://blog.sematext.com/?s=g1

If that doesn't help, we can play with various JVM parameters.  The latest
version of SPM for Solr exposes information about sizes and utilization of
JVM memory pools, which may help you understand which JVM params you need
to change, how, and whether your changes are achieving the desired effect.

Otis
Solr & ElasticSearch Support
http://sematext.com/


On Sep 19, 2013 11:21 AM, Neil Prosser neil.pros...@gmail.com wrote:

 Apologies for the giant email. Hopefully it makes sense.

 We've been trying out SolrCloud to solve some scalability issues with our
 current setup and have run into problems.
 [...]



Re: Unknown attribute id in add:allowDups

2013-09-19 Thread Chris Hostetter

: I'm working with the Pecl package, with Solr 4.3.1. I have a doc defined in my
...
: $client = new SolrClient($options);
: $doc = new SolrInputDocument();
: $doc->addField('id', 12345);
: $doc->addField('description', 'This is the content of the doc');
: $updateResponse = $client->addDocument($doc);
: 
: When I do this, the doc is not added to the index, and I get the following
: error in the logs in admin
: 
:  Unknown attribute id in add:allowDups

id is a red herring here -- it's not referring to your id field, it's 
referring to the fact that an XML attribute node exists whose name it 
doesn't recognize.

or to put it another way: Pecl is generating an <add> xml element that 
contains an attribute like this: allowDups="false|true" ...and solr 
doesn't know what to do with that.

allowDups was an option that existed prior to 4.0, but is no longer 
supported (the overwrite attribute now takes its place)

So my best guess is that the Pecl code you are using was designed for 3.x, 
and doesn't entirely work correctly with 4.x.

the warning you are getting isn't fatal or anything -- it's just letting 
you know that the unknown attribute is being ignored -- but you may want to 
look into whether there is an updated Pecl library (for example: if you 
really wanted to use allowDups="true" you should now be using 
overwrite="false", and maybe a newer version of your client library will 
let you)
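
For reference, the update XML that Solr 4.x expects puts overwrite where
the 3.x allowDups attribute used to go (a sketch using the document from
above):

  <add overwrite="false">
    <doc>
      <field name="id">12345</field>
      <field name="description">This is the content of the doc</field>
    </doc>
  </add>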

I've updated some places in the ref guide and wiki where it wasn't obvious 
that allowDups is gone, gone, gone ... i'll also update that error message 
so it will be clearer starting in 4.6...

https://issues.apache.org/jira/browse/SOLR-5257

-Hoss


Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
Hi,

Sorry for the late followup on this. Let me put in more details here.

*The problem:*

Cannot successfully restore the index backed up with
'/replication?command=backup'. The backup was generated as
snapshot.yyyymmdd.
*My setup and steps:*
*
*
6 solrcloud instances
7 zookeepers instances

Steps:

1. Take a snapshot using http://host1:8893/solr/replication?command=backup,
on one host only. Move snapshot.yyyymmdd to some reliable storage.

2. Stop all 6 solr instances, all 7 zk instances.

3. Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete the
index data completely.

4. Delete zookeeper/data/version*/* on all zookeeper nodes.

5. Copy the index back from the backup to one of the nodes:
   $ cp snapshot.yyyymmdd/* ../collectionname/data/index/

6. Restart all zk instances. Restart all solrcloud instances.


*Outcome:*

All solr instances are up. However, num of docs = 0 for all nodes.
Looking at the node where the index was restored, there is a new
index.yymmddhhmmss directory being created and index.properties pointing to
it. That explains why no documents are reported.


How do I have solrcloud pick up data from the index directory on a restart?
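
(For the archives, one hedged line of investigation, an assumption rather
than a verified fix: the index.properties file names the live index
directory, so pointing it at the restored directory, or removing the file
so Solr falls back to data/index/, may avoid the freshly created empty
directory.)

  $ cat ../collectionname/data/index.properties
  index=index.yymmddhhmmss   <-- hypothetical contents; adjust before restart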

Thanks in advance,
Aditya



On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja aditya.sakh...@gmail.com wrote:

 Thanks Shalin and Mark for your responses. I am on the same page about the
 conventions for taking the backup. However, I am less sure about the
 restoration of the index. Lets say we have 3 shards across 3 solrcloud
 servers.

 1. I am assuming we should take a backup from each of the shard leaders
 to get a complete collection. Do you think that will get the complete index
 (not worrying about what is not hard committed at the time of backup)?

 2. How do we go about restoring the index in a fresh solrcloud cluster?
 From the structure of the snapshot I took, I did not see any
 replication.properties or index.properties, which I normally see on
 healthy solrcloud cluster nodes.
 If I have the snapshot named snapshot.20130905, does snapshot.20130905/*
 go into data/index?

 Thanks
 Aditya



 On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote:

 Phone typing. The end should not say don't hard commit - it should say
 do a hard commit and take a snapshot.

 Mark

 Sent from my iPhone

 On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote:

  I don't know that it's too bad though - it's always been the case that
 if you do a backup while indexing, it's just going to get up to the last
 hard commit. With SolrCloud that will still be the case. So just make sure
 you do a hard commit right before taking the backup - yes, it might miss a
 few docs in the tran log, but if you are taking a back up while indexing,
 you don't have great precision in any case - you will roughly get a
 snapshot for around that time - even without SolrCloud, if you are worried
 about precision and getting every update into that backup, you want to stop
 indexing and commit first. But if you just want a rough snapshot for around
 that time, in both cases you can still just don't hard commit and take a
 snapshot.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:
 
  The replication handler's backup command was built for pre-SolrCloud.
  It takes a snapshot of the index but it is unaware of the transaction
  log which is a key component in SolrCloud. Hence unless you stop
  updates, commit your changes and then take a backup, you will likely
  miss some updates.
 
  That being said, I'm curious to see how peer sync behaves when you try
  to restore from a snapshot. When you say that you haven't been
  successful in restoring, what exactly is the behaviour you observed?
 
  On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja 
 aditya.sakh...@gmail.com wrote:
  Hello,
 
  I was looking for a good backup / recovery solution for the solrcloud
  indexes. I am more looking for restoring the indexes from the index
  snapshot, which can be taken using the replicationHandler's backup
 command.
 
  I am looking for something that works with solrcloud 4.3 eventually,
 but
  still relevant if you tested with a previous version.
 
  I haven't been successful in having the restored index replicate across
 the
  new replicas, after I restart all the nodes, with one node having the
  restored index.
 
  Is restoring the indexes on all the nodes the best way to do it?
  --
  Regards,
  -Aditya Sakhuja
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.




 --
 Regards,
 -Aditya Sakhuja




-- 
Regards,
-Aditya Sakhuja


[ANN] Lux Release 0.10.5

2013-09-19 Thread Michael Sokolov
I'm pleased to announce the release of the XML search engine Lux, 
version 0.10.5.  There has been a lot of progress made since our last 
announced release, which was 0.9.1.  Some highlights:


The app server now provides full access to HTTP request data and control 
of HTTP responses.  We've implemented the excellent EXPath specification 
for this (http://expath.org/spec/webapp) with only a few gaps (e.g. no 
binary file upload yet).


Range comparisons (like [@title > 'median']) are now rewritten by the 
optimizer to use the lux:key() function when a suitable index is 
available, and comparisons involving lux:key() are optimized using the 
Lucene index.


and there have been numerous performance optimizations and bug fixes, 
detailed at http://issues.luxdb.org/ and in the release notes here: 
http://luxdb.org/RELEASE-0.10.html.


Lots more information, including downloads, documentation and setup 
instructions, is available at http://luxdb.org/, 
source code is at http://github.com/msokolov/lux, and there is an email 
list: lu...@luxdb.org, archived at 
https://groups.google.com/forum/?fromgroups#!topic/luxdb.


Finally, I'll be presenting Lux at Lucene/Solr Revolution in Dublin Nov. 
6-7, so if you're anywhere nearby, I encourage you to come, and I look 
forward to seeing you there!


-Mike Sokolov
soko...@falutin.net


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Alexandre Rafalovitch
You may have some over-eager Ad Blockers! Check the network panel of
Firebug/Chrome console/whatever you have. See if some resources are not
loaded.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Sep 19, 2013 at 9:21 PM, Micheal Chao fisher030...@hotmail.com wrote:

 Hi, I followed the tutorial to download solr4.4 and unzip it, and then I
 started jetty. I can post data and search correctly, but when I try to open
 the admin page, it always shows loading.

 and then i setup solr on tomcat 7.0, but it's the same.

 What's wrong? Please help, thanks.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question on ICUFoldingFilterFactory

2013-09-19 Thread Alexandre Rafalovitch
What do you mean by output? Are you looking at fields in returned
documents? In that case you should see the original stored field. Or are
you - for example - looking at facet/group values, which use the tokenized,
post-processed results?
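
A quick way to see the difference (a sketch - the core and field names here
are made up):

curl 'http://localhost:8983/solr/collection1/select?q=brand:nbae&fl=brand'
# the stored value comes back exactly as it was sent in, e.g. NBAE

curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=brand'
# the facet values are the analyzed tokens, e.g. nbae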

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 20, 2013 at 2:22 AM, Nemani, Raj raj.nem...@turner.com wrote:

 Hello,

 I was wondering if anybody who has experience with ICUFoldingFilterFactory
 can help out with the following issue.  Thank you so much in advance.

 Raj

 --

 Problem:
 When a document is created/updated, the value's casing is indexed
 properly. However, when it's queried, the value is returned in lowercase.
 Example:
 Document input: NBAE
 Document value: NBAE
 Query input: NBAE,nbae,Nbae...etc
 Query Output: nbae

 If I remove the ICUFoldingFilterFactory filter, the casing problem goes
 away, but then searches for nbae (lowercase) or Nbae (mixed case) return no
 values.


 Field Type:
 <fieldType name="text_phrase" class="solr.TextField"
     positionIncrementGap="20" autoGeneratePhraseQueries="true">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
         pattern="[\p{Punct}\u00BF\u00A1]" replaceWith=" "/>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="\s&amp;\s" replacement=" and "/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="[\p{Cntrl}]" replacement=""/>
     <filter class="solr.ICUFoldingFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords_en.txt" enablePositionIncrements="true"/>
   </analyzer>
 </fieldType>


 Let me know if that makes sense. I'm curious if the
 solr.ICUFoldingFilterFactory has additional attributes that I can use to
 control the casing behavior but retain its other filtering properties
 (ASCIIFoldingFilter and ICUNormalizer2Filter).
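
 One way to see which stage changes the casing is Solr's field analysis
 handler (the core name below is a placeholder):

 curl 'http://localhost:8983/solr/collection1/analysis/field?analysis.fieldtype=text_phrase&analysis.fieldvalue=NBAE&wt=json'

 That shows the token after each charFilter/tokenizer/filter in the chain,
 so you can confirm which stage is doing the lowercasing.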

 Thanks!!!




Re: Memory Using In Faceted Search (UnInvertedField's)

2013-09-19 Thread Shawn Heisey
On 9/19/2013 3:14 AM, Anton M wrote:
 Shawn, I had swap file growing (up to 50-60%) and working while load tests
 ran. Did you configure 'swapiness' on your Linux box (set it to 0 earlier,
 maybe)? If not, my Windows OS could be cause of that difference.

The vm.swappiness sysctl setting is 1.  I have used 0 as well.  I don't
want it to start swapping unless it *REALLY* needs to.  The default of
60 is pretty aggressive.
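
For reference, this is all it takes on most Linux distros (illustrative;
the config file location can vary):

sysctl vm.swappiness                         # check the current value
sysctl -w vm.swappiness=1                    # set it for the running system
echo 'vm.swappiness = 1' >> /etc/sysctl.conf # persist it across reboots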

 I'm not sure if that's completely an issue about shareable memory or some
 missing JVM configurations (I don't have anything special except -Xmx, -Xms
 and -XX:MaxPermSize=512M) or some Solr memory leak.
 I'd appreciate any thoughts on that.

As I said before, I think that the memory reported as shareable is not
actually allocated.  It probably should be listed under virtual memory.
 Our app rarely does facets, and it typically sorts on one field, so I
have absolutely no idea what's being measured in the 11g of shared
memory for the solr process.
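
If you want to see what those mappings actually are, pmap is one way to
poke at it (illustrative; output columns vary a bit by distro):

pmap -x <solr-pid> | sort -n -k3 | tail
# the largest RSS entries are usually the mmapped index files (.fdt, .tim, ...)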

I was present for a conversation between Lucene committers on IRC where
they seemed to be discussing this issue, and it sounded like it is a
side effect of using MMap in a particular way.  It sounded like they
didn't want to change the way it's used, because it was the correct way
of using it.  The conversation went way over my head for the most part.

Thanks,
Shawn



Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Micheal Chao
I'm using Windows 7 and IE8. I debugged the script, and it showed an error
(can't find object's method) at:

var d3_formatPrefixes =
["y","z","a","f","p","n","μ","m","","k","M","G","T","P","E","Z","Y"].map(d3_formatPrefix);

So I changed my browser, and it works.

thanks a lot. 
Is this a bug in Solr?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html
Sent from the Solr - User mailing list archive at Nabble.com.


JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Oak McIlwain
I have solr 4.4 running on tomcat 7 on my local development environment
which is ubuntu based and it works fine (Querying, Posting Documents, Data
Import etc.)

I am trying to move into a staging environment which is Centos based (still
using tomcat 7 and solr 4.4); however, when attempting to post documents and
do a data import from mysql through jdbc, after a few hundred documents,
the tomcat server crashes and it logs:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

I'm using Sun Java JDK 1.7.0

Anyone got any ideas I can pursue to resolve this?


Re: I can't open the admin page, it's always loading.

2013-09-19 Thread Alexandre Rafalovitch
I think IE8 itself might be the bug! :-) Many popular libraries have dropped
<=IE7 support completely and are phasing out IE8 as well. Looks like D3 - a
visualization library used for some of the Admin stuff - is doing that as well.

Though I thought Admin javascript loading was more robust than that.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 20, 2013 at 8:48 AM, Micheal Chao fisher030...@hotmail.com wrote:

 I'm using Windows 7 and IE8. I debugged the script, and it showed an error
 (can't find object's method) at:

 var d3_formatPrefixes =
 ["y","z","a","f","p","n","μ","m","","k","M","G","T","P","E","Z","Y"].map(d3_formatPrefix);

 So I changed my browser, and it works.

 thanks a lot.
 Is this a bug in Solr?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Michael Ryan
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 
(any build within the last two years or so should be fine). If that's not 
possible for you, you can add -XX:-UseLoopPredicate as a command line option to 
java to work around this.
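
With Tomcat, one common place to put that flag is a setenv.sh next to
catalina.sh (the path below is a guess for your install):

echo 'JAVA_OPTS="$JAVA_OPTS -XX:-UseLoopPredicate"' >> /usr/share/tomcat7/bin/setenv.sh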

-Michael

-Original Message-
From: Oak McIlwain [mailto:oak.mcilw...@gmail.com] 
Sent: Thursday, September 19, 2013 10:10 PM
To: solr-user@lucene.apache.org
Subject: JVM Crash using solr 4.4 on Centos

I have solr 4.4 running on tomcat 7 on my local development environment which 
is ubuntu based and it works fine (Querying, Posting Documents, Data Import 
etc.)

I am trying to move into a staging environment which is Centos based (still 
using tomcat 7 and solr 4.4); however, when attempting to post documents and 
do a data import from mysql through jdbc, after a few hundred documents, the 
tomcat server crashes and it logs:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

I'm using Sun Java JDK 1.7.0

Anyone got any ideas I can pursue to resolve this?


Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
How does one recover from an index corruption? That's what I am eventually
trying to tackle here.
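
For the single-index part of that, Lucene ships a CheckIndex tool that can
at least diagnose (and, with -fix, truncate) a corrupt index - a sketch,
with a guessed jar path for a 4.x install:

java -cp lucene-core-4.4.0.jar org.apache.lucene.index.CheckIndex \
    /var/solr/collectionname/data/index
# add -fix to drop unreadable segments; documents in them are lost

That only repairs one node's index, though, not the whole collection.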

Thanks
Aditya

On Thursday, September 19, 2013, Aditya Sakhuja wrote:

 Hi,

 Sorry for the late followup on this. Let me put in more details here.

 *The problem:*

 Cannot successfully restore the index backed up with
 '/replication?command=backup'. The backup was generated as
 *snapshot.mmdd*.

 *My setup and steps:*
 6 solrcloud instances
 7 zookeeper instances

 Steps:

 1. Take snapshot using *http://host1:8893/solr/replication?command=backup*,
 on one host only. Move *snapshot.mmdd* to some reliable storage.

 2. Stop all 6 solr instances, all 7 zk instances.

 3. Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete
 the index data completely.

 4. Delete zookeeper/data/version*/* on all zookeeper nodes.

 5. Copy back index from backup to one of the nodes.
   cp snapshot.mmdd/* ../collectionname/data/index/

 6. Restart all zk instances. Restart all solrcloud instances.


 *Outcome:*
 All solr instances are up. However, *num of docs = 0* for all nodes.
 Looking at the node where the index was restored, a new index.yymmddhhmmss
 directory has been created, with index.properties pointing to it. That
 explains why no documents are reported.


 How do I have solrcloud pick up data from the index directory on a
 restart?

 Thanks in advance,
 Aditya



 On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja 
 aditya.sakh...@gmail.com wrote:

 Thanks Shalin and Mark for your responses. I am on the same page about the
 conventions for taking the backup. However, I am less sure about the
 restoration of the index. Let's say we have 3 shards across 3 solrcloud
 servers.

 1. I am assuming we should take a backup from each of the shard leaders
 to get a complete collection. Do you think that will get the complete index
 (not worrying about what is not hard committed at the time of backup)?

 2. How do we go about restoring the index in a fresh solrcloud cluster ?
 From the structure of the snapshot I took, I did not see any
 replication.properties or index.properties, which I normally see on a
 healthy solrcloud cluster's nodes.
 If I have the snapshot named snapshot.20130905, does
 snapshot.20130905/* go into data/index?

 Thanks
 Aditya



 On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote:

 Phone typing. The end should not say don't hard commit - it should say
 do a hard commit and take a snapshot.

 Mark

 Sent from my iPhone

 On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote:

  I don't know that it's too bad though - it's always been the case that if
 you do a backup while indexing, it's just going to get up to the last hard
 commit. With SolrCloud that will still be the case. So just make sure you
 do a hard commit right before taking the backup - yes, it might miss a few
 docs in the tran log, but if you are taking a back up while indexing, you
 don't have great precision in any case - you will roughly get a snapshot
 for around that time - even without SolrCloud, if you are worried about
 precision and getting every update into that backup, you want to stop
 indexing and commit first. But if you just want a rough snapshot for around
 that time, in both cases you can still just don't hard commit and take a
 snapshot.
 
  Mark
 
  Sent from my iPhone
 
  On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:
 
  The replication handler's backup command was built for pre-SolrCloud.
  It takes a snapshot of the index but it is unaware of the transaction
  log which is a key component in SolrCloud. Hence unless you stop
  updates, commit your changes and then take a backup, you will likely
  miss some updates.
 
  That being said, I'm curious to see how peer sync behaves when you try
  to restore from a snapshot. When you say that you haven't been
  successful in restoring, what exactly is the behaviour you observed?
 
  On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja 
 aditya.sakh...@gmail.com wrote:
  Hello,
 
  I was looking for a good backup / recovery solution for the solrcloud
  indexes. I am more looking for restoring the indexes from the index
  snapshot, which can be taken using the replicationHandler's backup
 command.
 
  I am looking for something that works with solrcloud 4.3 eventually,
 but
  still relevant if you tested with a previous version.
 
  I haven't been successful in having the restored index replicate across
 the
  new replicas, after I restart all the nodes, with one node having the
  restored index.
 
  Is restoring the indexes on all the nodes the best way to do it?
  --
  Regards,
  -Aditya Sakhuja
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.




 --
 Regards,
 -Aditya Sakhuja

 --
 Regards,
 -Aditya Sakhuja



-- 
Regards,
-Aditya Sakhuja


Re: Migrating from Endeca

2013-09-19 Thread Alexandre Rafalovitch
I think Hue ( http://cloudera.github.io/hue/ ), which Cloudera uses for Solr
search among other things, has some of the UI customization. And it is
open source, so it would make a much better base.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Sep 19, 2013 at 8:21 PM, Jack Krupansky j...@basetechnology.com wrote:

 Take a look at LucidWorks Enterprise. It has a graphical UI.

 But if you must meet all of the listed requirements and Lucid doesn't meet
 all of them, then... you will have to develop everything on your own. Or,
 maybe Lucid might be interested in partnering with you to allow you to add
 extensions to their UI. If you really are committed to a deep replacement
 of Endeca's UI, then rolling your own is probably the way to go. Then the
 question is whether you should open source that UI.

 You can also consider extending the Solr Admin UI. It does not do most of
 your listed features, but having better integration with the Solr Admin UI
 is a good idea.

 -- Jack Krupansky

 -Original Message- From: Gareth Poulton
 Sent: Thursday, September 19, 2013 7:50 AM
 To: solr-user@lucene.apache.org
 Subject: Migrating from Endeca


 Hi,
 A customer wants us to move their entire enterprise platform - of which one
 of the many components is Oracle Endeca - to open source.
 However, customers being the way they are, they don't want to have to give
 up any of the features they currently use, the most prominent of which are
 user friendly web-based editors for non-technical people to be able to edit
 things like:
 - Schema
 - Dimensions (i.e. facets)
 - Dimension groups (not sure what these are)
 - Thesaurus
 - Stopwords
 - Report generation
 - Boosting individual records (i.e. sponsored links)
 - Relevance ranking settings
 - Process pipeline editor for, e.g. adding new languages
 -...all without touching any xml.

 My question is, are there any solr features, plugins, modules, third party
 applications, or the like that will do this for us? Or will we have to
 develop all the above from scratch?

 thanks,
 Gareth