Re: Solr CPU Usage

2014-08-28 Thread hendra_budiawan
Yes, we do complex queries with a lot of clauses and facets, and the data is
growing bigger every day. I agree with you that it might not be a hardware issue;
maybe I need to tune the Solr/OS/Jetty configuration to optimize the Solr
process. Thank you so much for the help.

Best regards,
Hendra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p4155528.html
Sent from the Solr - User mailing list archive at Nabble.com.


Null pointer on multi-core search

2014-08-28 Thread Shay Sofer
Hi,

I'm using Solr 4.8.1, and with the following scenario I got a NullPointerException:


1.   I'm trying to run a grouped search over multiple cores.

2.   SearchHandler is called, and it executes:
for(SearchComponent c : components) {
 c.finishStage(rb);
}


3.   With QueryComponent.finishStage, this code is called:

if (rb.grouping()) {
  groupedFinishStage(rb);
} else {
  regularFinishStage(rb);
}

4.   And then -

  private void groupedFinishStage(final ResponseBuilder rb) {
    // To have same response as non-distributed request.
    GroupingSpecification groupSpec = rb.getGroupingSpec();
    if (rb.mergedTopGroups.isEmpty()) {
      for (String field : groupSpec.getFields()) {
        rb.mergedTopGroups.put(field,
            new TopGroups(null, null, 0, 0, new GroupDocs[]{}, Float.NaN));
      }
      rb.resultIds = new HashMap();
    }


5.   As you can see, the line rb.resultIds = new HashMap(); initializes resultIds as an empty map.

6.   And then, when we get to HighlightComponent.finishStage and it tries to
execute:

ShardDoc sdoc = rb.resultIds.get(id);

int idx = sdoc.positionInResponse;

arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));

7.   resultIds is empty, so sdoc is null and sdoc.positionInResponse throws a
NullPointerException.

Hope that my description is clear.

Thanks,
Shay.


RE: Solr CPU Usage

2014-08-28 Thread Jacques du Rand
Do you index and search from this box?
How many documents do you have?


From: Shawn Heisey [s...@elyograg.org]
Sent: Thursday, August 28, 2014 7:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr CPU Usage

On 8/27/2014 8:42 PM, hendra_budiawan wrote:
 Yes, I'm just worried about the load average reported by the OS, because last week
 the server suddenly couldn't be accessed, so we had to hard reboot. I'm still
 investigating what the problem is. Because this server is dedicated to Solr
 only, we suspect the problem came from the Solr process, but I'm still
 looking at other possibilities for what makes this problem arise. Can you give me
 a suggestion about what I need to check further?

What kind of query volume is your Solr server supporting?  Are you doing
complex queries with a lot of clauses, facets, or something else that's
CPU intensive?  Is your update volume high?

The numbers that you've shown, assuming that the htop info is accurate
and you really do have 16 or 32 CPU cores, do not look like any major
problem.  Solr is working hard, but there's a lot more CPU capacity
left.  The top output shows that iowait percentage is not a problem, so
it's not stuck in disk I/O.  Memory usage indicates that OS disk caching
is working well.

It looks like you were running jetty, but that the jetty might not be
the one included in the Solr example.  If it's not the one included in
the example, then its configuration is not well-tuned for Solr.  If you
have a high request volume, you may need to increase the maxThreads
parameter in the jetty config.  The only possible thing that I can think
of which might cause a complete inability to access the server via ssh
or other means is that you are hitting the open file limit in the
operating system.  Most linux distros use /etc/security/limits.conf to
configure the open file limit for each user.

Thanks,
Shawn

This email and its contents are subject to an email legal notice that can be 
viewed at http://www.naspers.com/disclaimer.php Should you be unable to access 
the link provided, please email us for a copy at c...@optinet.net
Hierdie e-pos en sy inhoud is onderhewig aan 'n regskennisgewing oor 
elektroniese pos wat gelees kan word by 
http://www.naspers.com/afrikaans/voorbehoud.php 'n Afskrif kan aangevra word by 
c...@optinet.net


RE: Solr CPU Usage

2014-08-28 Thread hendra_budiawan
Hi Jacques,

Yes, we index and search from this box. We have 6 cores with almost 4,000K
(roughly 4 million) documents each, and each core is getting bigger every day.

Regards,
Hendra Budiawan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p412.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr CPU Usage

2014-08-28 Thread Jacques du Rand
Hi Hendra,
That doesn't seem overly huge...

I agree with the other person: from the top/htop graph it doesn't look
too bad.
I would maybe try to split the searching/indexing, and also try to schedule the
delta index for the cores at different times.


PS. We had a nice little bump in efficiency by going with Tomcat 7 and Java 8.

Jacques


From: hendra_budiawan [hendra.budiawan...@gmail.com]
Sent: Thursday, August 28, 2014 9:46 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr CPU Usage

HI Jacques,

Yes we index and search from this box, we have 6 core with almost 4000K
document each core getting and bigger each day.

Regards,
Hendra Budiawan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p412.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr CPU Usage

2014-08-28 Thread hendra_budiawan
Hi Jacques,

I will try your advice to schedule the indexing at different times, and I will
also start researching Tomcat 7 and Java 8.

Thank you so much,
Hendra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p4155562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing documents with ContentStreamUpdateRequest (SolrJ) asynchronously

2014-08-28 Thread Jorge Moreira
I am using the SolrJ API 4.8 to index rich documents to Solr, but I want
to index these documents asynchronously. The function that I wrote sends
documents synchronously, and I don't know how to change it to make it
asynchronous. Any ideas?

Function:

public Boolean indexDocument(HttpSolrServer server, String pathFile,
        InputReader external) {

    ContentStreamUpdateRequest up =
            new ContentStreamUpdateRequest("/update/extract");

    try {
        up.addFile(new File(pathFile), "text");
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }

    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    try {
        server.request(up);
    } catch (SolrServerException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    return true;
}

Solr server: version 4.8.
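One way to make the indexDocument method above asynchronous, without changing its
body, is to submit it to an executor and hand callers a Future instead of blocking.
This is only a sketch built around the code above; the pool size is arbitrary and
InputReader is the poster's own type, not something from SolrJ.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class AsyncIndexer {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public Future<Boolean> indexDocumentAsync(final HttpSolrServer server,
                                              final String pathFile,
                                              final InputReader external) {
        return pool.submit(new Callable<Boolean>() {
            @Override
            public Boolean call() {
                // Delegates to the synchronous indexDocument() shown above.
                return indexDocument(server, pathFile, external);
            }
        });
    }

    public void shutdown() {
        pool.shutdown();
    }
}

The caller can then fire off many documents and only wait on the Futures when it
actually needs the results.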


redo log for solr

2014-08-28 Thread Dmitry Kan
Hello solr users!

We have a case where any action a user performs on the Solr shard should be
recorded for a possible later replay. This way we are looking at a per-user
replay feature, such that if the user did something wrong accidentally or
because of a system-level bug, we could restore a previous state.

Two actions are available:

1. INSERT new solr document
2. DELETE existing solr document

If a user wants to perform an update on an existing document, we first
delete it and insert a new one with modified fields.

Are there any existing components / solutions in the Solr universe that
could help implement this?

Dmitry

-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Solr issue

2014-08-28 Thread Shay Sofer
Hi,

Version - 4.8.1

While executing this solr query (from solr web UI):

http://localhost:8983/solr/Global_A/select?q=%2Btext%3A%28shay*%29+rows=100fl=id%2CobjId%2Cnullshards=http%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2F0_A%2Chttp%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2FGlobal_Agroup=truegroup.query=name__s%3Ashaysort=name__s_sort+aschl=truehttp://localhost:8983/solr/cpm_Global_A/select?q=%2Btext%3A%28shay*%29+rows=100fl=id%2CobjId%2Cnullshards=http%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2Fcpm_0_A%2Chttp%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2Fcpm_Global_Agroup=truegroup.query=name__s%3Ashaysort=name__s_sort+aschl=true

We got NullPointerException:

java.lang.NullPointerException at 
org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:189)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:330)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) 
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) 
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
at org.eclipse.jetty.server.Server.handle(Server.java:368) at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
at java.lang.Thread.run(Thread.java:722)

Seems like the combination of grouping + shards + highlighting causes this
NullPointerException.

Anyone familiar with this issue?

Thanks,
Shay.


Re: Query regarding URL Analysers

2014-08-28 Thread Sathyam
Gentle Reminder


On 21 August 2014 18:05, Sathyam sathyam.dorasw...@gmail.com wrote:

 Hi,

 I needed to generate tokens out of a URL such that I am able to get
 hierarchical units of the URL as well as each individual entity as tokens.
 For example:
 *Given a URL : *

 http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

 The tokens that I need are :

 *Hierarchical subsets of the URL*

 1 http://

 2 http://www.google.com/

 3 http://www.google.com/abcd/

  4 http://www.google.com/abcd/efgh/

 5 http://www.google.com/abcd/efgh/ijkl/

  6 http://www.google.com/abcd/efgh/ijkl/mnop.php

 *Individual elements in the path to the resource*

 7 abcd

 8 efgh

 9 ijkl

 10 mnop.php

 *Query Terms*

 11 a=10

 12 b=20

 13 c=30

 *Fragment*
 14 xyz

 This comes to a total of 14 tokens for the given URL.
 Basically a URL analyzer that creates tokens based on the categories
 mentioned in bold. Also a separate token for port(if mentioned).

 I would like to know how this can be achieved by using a single analyzer
 that uses a combination of the tokenizers and filters provided by solr.
 Also curious to know why there is a restriction of only *one  *tokenizer
 to be used in an analyzer.
 Looking forward to a response from your side telling the best possible way
 to achieve the closest to what I need.

 Thanks.
 --
 Sathyam Doraswamy






-- 
Sathyam Doraswamy


Re: Help with StopFilterFactory

2014-08-28 Thread heaven
Hello,

Any thoughts on this? Should I open a JIRA ticket? Or how can we get at
least one of the Solr devs engaged on this issue?

Best,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839p4155582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query regarding URL Analysers

2014-08-28 Thread Jack Krupansky
Sorry for the delay... take a look at the URL Classify update processor, 
which parses a URL and distributes the components to various fields:

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

The official doc is... pitiful, but I have doc and examples in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- 
From: Sathyam

Sent: Thursday, August 28, 2014 6:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding URL Analysers

Gentle Reminder


On 21 August 2014 18:05, Sathyam sathyam.dorasw...@gmail.com wrote:


Hi,

I needed to generate tokens out of a URL such that I am able to get
hierarchical units of the URL as well as each individual entity as tokens.
For example:
*Given a URL : *

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10b=20c=30#xyz

The tokens that I need are :

*Hierarchical subsets of the URL*

1 http://

2 http://www.google.com/

3 http://www.google.com/abcd/

 4 http://www.google.com/abcd/efgh/

5 http://www.google.com/abcd/efgh/ijkl/

 6 h ttp://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

*Fragment*
14 xyz

This comes to a total of 14 tokens for the given URL.
Basically a URL analyzer that creates tokens based on the categories
mentioned in bold. Also a separate token for port(if mentioned).

I would like to know how this can be achieved by using a single analyzer
that uses a combination of the tokenizers and filters provided by solr.
Also curious to know why there is a restriction of only *one  *tokenizer
to be used in an analyzer.
Looking forward to a response from your side telling the best possible way
to achieve the closest to what I need.

Thanks.
--
Sathyam Doraswamy







--
Sathyam Doraswamy 



Re: redo log for solr

2014-08-28 Thread Shawn Heisey
On 8/28/2014 3:10 AM, Dmitry Kan wrote:
 We have a case when any actions a user did to the solr shard should be
 recorded for a possible later replay. This way we are looking at per user
 replay feature such that if the user did something wrong accidentally or
 because of a system level bug, we could restore a previous state.
 
 Two actions are available:
 
 1. INSERT new solr document
 2. DELETE existing solr document
 
 If user wants to perform an update on the existing document, we first
 delete it and insert a new one with modified fields.
 
 Are there any existing components / solutions in the Solr universe that
 could help implement this?

I'm wondering what functionality you need beyond what Solr already
provides ... because it sounds like Solr already does a lot of what you
are implementing.

Solr already includes a transaction log that records all changes to the
index.  Each individual log is closed when you do a hard commit.  Enough
transaction logs are kept so that Solr can replay at least the last 100
transactions.  The entire transaction log is replayed when Solr is
restarted or a core is reloaded.

What you describe where you delete an existing document before inserting
a new one ... Solr already has that functionality built in, using the
uniqueKey.  That capability is further extended by the Atomic Update
functionality.

You're not new around here, so I don't think I'm telling you anything
you don't already know ... which may mean that I'm missing something. :)

Thanks,
Shawn



Re: redo log for solr

2014-08-28 Thread Dmitry Kan
It may mean that I wasn't clear enough :)

The idea is to build a paper-trail system (without negative connotation!),
such that, for instance, if a user deleted some data _by mistake_ and we have
hard-committed to Solr (upon which the tlog has been truncated), we have
paper-trailed the document before the delete, to provide the restore
functionality.

So while the tlog is meant to make soft commits durable, this feature would serve
more like undo functionality and persist the _history_ of modifications.

I'm currently investigating what you suggested over IRC -- the
UpdateProcessor. Looks like a way to go.

Thanks,

Dmitry
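A minimal sketch of the UpdateProcessor approach mentioned above is an update
processor that appends every add and delete to an external trail before passing
the command on. The class names and the trail destination below are placeholders,
not anything from this thread.

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class PaperTrailProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new PaperTrailProcessor(next);
  }

  static class PaperTrailProcessor extends UpdateRequestProcessor {
    PaperTrailProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      // Persist the full document to durable storage before it is indexed,
      // so it can be replayed or restored later.
      appendToTrail("ADD", doc.toString());
      super.processAdd(cmd);
    }

    @Override
    public void processDelete(DeleteUpdateCommand cmd) throws IOException {
      // Record which id (or delete-by-query) was removed; the pre-delete copy
      // of the document is already in the trail from the earlier ADD.
      appendToTrail("DELETE", cmd.getId() != null ? cmd.getId() : cmd.getQuery());
      super.processDelete(cmd);
    }

    private void appendToTrail(String action, String payload) {
      // Placeholder: write to whatever append-only log you choose.
      System.out.println(action + " " + payload);
    }
  }
}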


On Thu, Aug 28, 2014 at 4:16 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/28/2014 3:10 AM, Dmitry Kan wrote:
  We have a case when any actions a user did to the solr shard should be
  recorded for a possible later replay. This way we are looking at per user
  replay feature such that if the user did something wrong accidentally or
  because of a system level bug, we could restore a previous state.
 
  Two actions are available:
 
  1. INSERT new solr document
  2. DELETE existing solr document
 
  If user wants to perform an update on the existing document, we first
  delete it and insert a new one with modified fields.
 
  Are there any existing components / solutions in the Solr universe that
  could help implement this?

 I'm wondering what functionality you need beyond what Solr already
 provides ... because it sounds like Solr already does a lot of what you
 are implementing.

 Solr already includes a transaction log that records all changes to the
 index.  Each individual log is closed when you do a hard commit.  Enough
 transaction logs are kept so that Solr can replay at least the last 100
 transactions.  The entire transaction log is replayed when Solr is
 restarted or a core is reloaded.

 What you describe where you delete an existing document before inserting
 a new one ... Solr already has that functionality built in, using the
 uniqueKey.  That capability is further extended by the Atomic Update
 functionality.

 You're not new around here, so I don't think I'm telling you anything
 you don't already know ... which may mean that I'm missing something. :)

 Thanks,
 Shawn




-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Using a RequestHandler to expand query parameter

2014-08-28 Thread jimtronic
I would like to send only one query to my custom request handler and have the
request handler expand that query into a more complicated query.

Example:

*/myHandler?q=kids+books*

... would turn into a more complicated EDismax query of:

*kids books kids books*

Is this achievable via a Request Handler definition in solrconfig.xml?

Thanks!
Jim
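If the expansion is just static edismax tuning (pf/pf2/pf3-style defaults), a plain
requestHandler definition in solrconfig.xml may already be enough. If the rewrite
needs real logic, one option is a thin SearchHandler subclass registered as
/myHandler. A rough sketch, with the class name and the expansion rule purely
illustrative:

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// Rewrites the incoming q parameter before delegating to the normal
// SearchHandler/edismax machinery.
public class ExpandingSearchHandler extends SearchHandler {

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
    String q = params.get(CommonParams.Q);
    if (q != null) {
      // Illustrative expansion: boost the whole phrase and keep the raw terms.
      params.set(CommonParams.Q, "\"" + q + "\"^10 " + q);
      params.set("defType", "edismax");
    }
    req.setParams(params);
    super.handleRequestBody(req, rsp);
  }
}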



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-a-RequestHandler-to-expand-query-parameter-tp4155596.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with SOLR Collection creation

2014-08-28 Thread Kaushik
Hello,

We have deployed a solr.war file to a weblogic server. The web.xml has been
modified to have the path to the SOLR home as follows:
<env-entry><env-entry-name>solr/home</env-entry-name><env-entry-type>java.lang.String</env-entry-type><env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value></env-entry>

The deployment of Solr comes up fine. In the
D:\SOLR\4.7.0\RegulatoryReview directory we have an RR folder, under which the
conf directory with the required config files is present (solrconfig.xml,
schema.xml, etc.). But when I try to add the collection to Solr through the
admin console, I get the following error.

Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
org.apache.solr.common.SolrException: Error CREATEing SolrCore
'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
class org.apache.solr.search.LRUCache



org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RR': Unable
to create core: RRCaused by: class org.apache.solr.search.LRUCache

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:546)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)

at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3730)

at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3696)

at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)

at
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)

at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2273)

at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2179)

at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1490)

at
weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)

at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

Caused by: org.apache.solr.common.SolrException: Unable to create core: RR

at
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989)

at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:606)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:732)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)

... 9 more

Caused by: org.apache.solr.common.SolrException: Could not load config file
D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml

at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)

at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)

... 9 more

Caused by: java.lang.ClassCastException: class
org.apache.solr.search.LRUCache

at java.lang.Class.asSubclass(Class.java:3027)

at

incomplete proximity boost for fielded searches

2014-08-28 Thread Burgmans, Tom
Consider query:
http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan
Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax

The intention is to perform a search in field title and to apply a proximity 
boost within a window of 255 words. If I look at the debug information, I see:

<str name="parsedquery">
BoostedQuery(boost(+((title:michigan title:corporate title:income title:tax)~4)
(title:"corporate income tax"~255)~1.0))
</str>

Note that the first search term (michigan) is missing in the proximity boost 
clause. I can't believe this is intended behavior. 

Why is edismax splitting  (title:Michigan) and (Corporate Income Tax) while 
determining what to use for proximity boost?

Thanks, Tom


Issue with multivalued fields in UIMA

2014-08-28 Thread mkhordad
Hi all,
 I am trying to integrate Dictionary Annotator with Solr to find genotypes
in a multivalued field. It seems that it only works on the first row of
multivalued fields. I tried using SentenceAnnotation as well and the same
problem  occurs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-multivalued-fields-in-UIMA-tp4155609.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with SOLR Collection creation

2014-08-28 Thread Shawn Heisey
On 8/28/2014 8:28 AM, Kaushik wrote:
 Hello,

 We have deployed a solr.war file to a weblogic server. The web.xml has been
 modified to have the path to the SOLR home as follows:
 env-entryenv-entry-namesolr/home/env-entry-nameenv-entry-typejava.lang.String/env-entry-typeenv-entry-valueD:\SOLR\4.7.0\RegulatoryReview/env-entry-value/env-entry

 The deployment of the Solr comes up fine. In the
 D:\SOLR\4.7.0\RegulatoryReview directory we have RR folder under which the
 conf directory with the required config files are present (solrconfig.xml,
 schema.xml, etc). But when I try to add the collection to SOLR through the
 admin console, I get the following error.

 Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
 org.apache.solr.common.SolrException: Error CREATEing SolrCore
 'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
 class org.apache.solr.search.LRUCache

It would seem there's a problem with the cache config in your
solrconfig.xml, or that there's some kind of problem with the Solr jars
contained within the war.  No testing is done with weblogic, so it's
always possible it's a class conflict with weblogic itself, but I would
bet on a config problem first.

 The issue I believe is that it is trying to find
 D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml by ignoring the conf
 directory in which it should be finding it. What am I doing wrong?

This is SOLR-5814, a bug in the log messages, not the program logic.  I
thought it had been fixed by 4.8, but the issue is still unresolved.

https://issues.apache.org/jira/browse/SOLR-5814

Thanks,
Shawn



RE: solr query gives different numFound upon refreshing

2014-08-28 Thread Joshi, Shital
Hi Shawn,

Thanks for your reply. 

We did some tests enabling shards.info=true and confirmed that there is no
duplicate copy of our index.

We have one replica, but many times we see three versions on the Admin GUI/Overview
tab. All three have different version and gen values. Is that a problem?
Master (Searching)  
Master (Replicable) 
Slave (Searching)   

We constantly see the max searchers open exception. The warmup time is 1.5 minutes,
but the difference between the openedAt date and the registeredAt date is at times
more than 4-5 minutes. Is the true searcher warmup time the difference between the
two dates rather than the warmupTime?

openedAt:   2014-08-28T16:17:24.829Z
registeredAt:   2014-08-28T16:21:02.278Z
warmupTime: 65727

Thanks for all help. 


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, August 27, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query gives different numFound upon refreshing

On 8/27/2014 10:44 AM, Bryan Bende wrote:
 Theoretically this shouldn't happen, but is it possible that the two
 replicas for a given shard are not fully in sync?

 Say shard1 replica1 is missing a document that is in shard1 replica2... if
 you run a query that would hit on that document and run it a bunch of
 times, sometimes replica 1 will handle the request and sometimes replica 2
 will handle it, and it would change your number of results if one of them
 is missing a document. You could write a program that compares each
 replica's documents by querying them with distrib=false.

 If there was a replica out of sync, I would think it would detect that on a
 restart when comparing itself against the leader for that shard, but I'm
 not sure.

A replica out of sync is a possibility, but the most common reason for a
changing numFound is because the overall distributed index has more than
one document with the same uniqueKey value -- different versions of the
same document in more than one shard.

SolrCloud tries really hard to never end up with replicas out of sync,
but either due to highly unusual circumstances or bugs, it could still
happen.

Thanks,
Shawn



FileListEntityProcessor still ignores onError-Attribute!? (SOLR-2897?)

2014-08-28 Thread Heiko Ahrens
Hello,

it looks like I ran into an old problem: I configured an entity for data
import with FileListEntityProcessor in data-config.xml. If the
baseDir attribute points to a non-existing directory, the whole import
process gets aborted, no matter which value I provide in the
onError attribute.

I did some searching today. Several threads describe the same scenario, but
I could not find a solution.

It looks like the issue is already acknowledged: SOLR-2897. But I'm not
sure, since it's an almost 3-year-old ticket. Any suggestions?

Btw.: What is the intended behaviour/difference between the onError options
skip and continue?

H.

Re: Problem with SOLR Collection creation

2014-08-28 Thread Kaushik
The issue I was facing was that there were additional libraries on the
classpath that were conflicting and not required. Removed those and the
problem disappeared.

Thank you,
Kaushik


On Thu, Aug 28, 2014 at 11:50 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/28/2014 8:28 AM, Kaushik wrote:
  Hello,
 
  We have deployed a solr.war file to a weblogic server. The web.xml has
 been
  modified to have the path to the SOLR home as follows:
 
 env-entryenv-entry-namesolr/home/env-entry-nameenv-entry-typejava.lang.String/env-entry-typeenv-entry-valueD:\SOLR\4.7.0\RegulatoryReview/env-entry-value/env-entry
 
  The deployment of the Solr comes up fine. In the
  D:\SOLR\4.7.0\RegulatoryReview directory we have RR folder under which
 the
  conf directory with the required config files are present
 (solrconfig.xml,
  schema.xml, etc). But when I try to add the collection to SOLR through
 the
  admin console, I get the following error.
 
  Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
  org.apache.solr.common.SolrException: Error CREATEing SolrCore
  'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
  class org.apache.solr.search.LRUCache

 It would seem there's a problem with the cache config in your
 solrconfig.xml, or that there's some kind of problem with the Solr jars
 contained within the war.  No testing is done with weblogic, so it's
 always possible it's a class conflict with weblogic itself, but I would
 bet on a config problem first.

  The issue I believe is that it is trying to find
  D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml by ignoring the conf
  directory in which it should be finding it. What am I doing wrong?

 This is SOLR-5814, a bug in the log messages, not the program logic.  I
 thought it had been fixed by 4.8, but the issue is still unresolved.

 https://issues.apache.org/jira/browse/SOLR-5814

 Thanks,
 Shawn




How to accomadate huge data

2014-08-28 Thread Ethan
Our index size is 110GB and growing; it has crossed our RAM capacity of 96GB, and
we are seeing a lot of disk and network IO, resulting in huge latencies and
instability (one of the servers used to shut down and stay in recovery mode
when restarted). Our admin added swap space and that seemed to have
mitigated the issue.

But what is the usual practice in such a scenario? The index size eventually
outgrows RAM and is pushed onto disk. Is it advisable to shard (the Solr forum
says no)? Or is there a different mechanism?

System config:
We have a 3-node cluster with RAID1 SSDs. Two nodes are running Solr and the
other is there to maintain quorum.

-E


Re: How to accomadate huge data

2014-08-28 Thread Shawn Heisey
On 8/28/2014 11:57 AM, Ethan wrote:
 Our index size is 110GB and growing, crossed RAM capacity of 96GB, and we
 are seeing a lot of disk and network IO resulting in huge latencies and
 instability(one of the server used to shutdown and stay in recovery mode
 when restarted).  Our admin added swap space and that seemed to have
 mitigated the issue.

Adding swap space doesn't seem like it would actually fix anything.  If
the system is actively swapping, performance will be terrible.

Assuming your heap size and query volume are not enormous, 96GB of RAM
for an index size of 110GB seems like it would actually be pretty good. 
Remember that you have to subtract all heap requirements (java and
otherwise) from the total RAM in order to determine how much RAM is left
for caching the index.  The ideal setup has enough extra RAM (beyond
what's required for the software itself) to cache the entire index, but
that ideal is usually not required.  In most cases, getting between half
and two thirds of the index into RAM is enough.  One thing to note: If
you don't have the entire index fitting into RAM, the server will
probably not be able to handle an extreme query volume.

 But what is the usual practice in such scenario?  Index size eventually
 outgrows RAM and is pushed on to disk.  Is it advisable to shard(solr forum
 says no)? Or is there a different mechanism?

 System config:
 We have 3 node cluster with RAID1 SSD.  Two nodes are running solr and the
 other is to maintain Quorum.

Whether or not to shard depends on several factors, not the least of
which is whether or not the features that you are using will work on a
distributed index.  My index is slightly larger than yours, and it's
sharded.  I don't run SolrCloud, the sharding is completely manual.

Thanks,
Shawn



re: How to accomadate huge data

2014-08-28 Thread Chris Morley
Look into SolrCloud.
  
  
  


 From: Ethan eh198...@gmail.com
Sent: Thursday, August 28, 2014 1:59 PM
To: solr-user solr-user@lucene.apache.org
Subject: How to accomadate huge data   
Our index size is 110GB and growing, crossed RAM capacity of 96GB, and we
are seeing a lot of disk and network IO resulting in huge latencies and
instability(one of the server used to shutdown and stay in recovery mode
when restarted). Our admin added swap space and that seemed to have
mitigated the issue.

But what is the usual practice in such scenario? Index size eventually
outgrows RAM and is pushed on to disk. Is it advisable to shard(solr forum
says no)? Or is there a different mechanism?

System config:
We have 3 node cluster with RAID1 SSD. Two nodes are running solr and the
other is to maintain Quorum.

-E
 



RE: How to accomadate huge data

2014-08-28 Thread Toke Eskildsen
kokatnur.vi...@gmail.com [kokatnur.vi...@gmail.com] On Behalf Of Ethan 
[eh198...@gmail.com] wrote:
 Our index size is 110GB and growing, crossed RAM capacity of 96GB, and we
 are seeing a lot of disk and network IO resulting in huge latencies and
 instability(one of the server used to shutdown and stay in recovery mode
 when restarted).  Our admin added swap space and that seemed to have
 mitigated the issue.

Something is off here. I can understand disk IO going up when the index size 
increases, but why would it cause more network IO? Are you using networked 
storage or performing aggressive synchronization? Can you describe how hard you 
are hitting your indexes, both for updates and queries? What is huge 
latencies? Have you tried profiling the running Solrs to see if the heap size 
is large enough?

 We have 3 node cluster with RAID1 SSD.  Two nodes are running solr and the
 other is to maintain Quorum.

Is that on the same physical hardware or on three separate ones? 

- Toke Eskildsen


Re: How to accomadate huge data

2014-08-28 Thread Ethan
On Thu, Aug 28, 2014 at 11:12 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/28/2014 11:57 AM, Ethan wrote:
  Our index size is 110GB and growing, crossed RAM capacity of 96GB, and we
  are seeing a lot of disk and network IO resulting in huge latencies and
  instability(one of the server used to shutdown and stay in recovery mode
  when restarted).  Our admin added swap space and that seemed to have
  mitigated the issue.

 Adding swap space doesn't seem like it would actually fix anything.  If
 the system is actively swapping, performance will be terrible.


Assuming your heap size and query volume are not enormous, 96GB of RAM
 for an index size of 110GB seems like it would actually be pretty good.


*E  Before adding swap space, nodes used to shut down due to OOM or crash
after 2-5 minutes of uptime. By bumping swap space the server came up
cleanly.  ** We have 7GB of heap.  I'll need to ask the admin more questions to
know how it was solved.*



 Remember that you have to subtract all heap requirements (java and
 otherwise) from the total RAM in order to determine how much RAM is left
 for caching the index.  The ideal setup has enough extra RAM (beyond
 what's required for the software itself) to cache the entire index, but
 that ideal is usually not required.  In most cases, getting between half
 and two thirds of the index into RAM is enough.  One thing to note: If
 you don't have the entire index fitting into RAM, the server will
 probably not be able to handle an extreme query volume.


*E  Our query volume is low right now, about 30 TPS for /select, but /update
is 80 and /get around 100 TPS.  In our SolrCloud setup we don't have a
separate replication node that handles select traffic. The server currently
has a 12-40ms TP99 as we don't have any facets or complex queries.*


  But what is the usual practice in such scenario?  Index size eventually
  outgrows RAM and is pushed on to disk.  Is it advisable to shard(solr
 forum
  says no)? Or is there a different mechanism?
 
  System config:
  We have 3 node cluster with RAID1 SSD.  Two nodes are running solr and
 the
  other is to maintain Quorum.

 Whether or not to shard depends on several factors, not the least of
 which is whether or not the features that you are using will work on a
 distributed index.  My index is slightly larger than yours, and it's
 sharded.  I don't run SolrCloud, the sharding is completely manual.

 *E  Interesting. What's your select and update TPS/TP99?  We index around
6-8GB of data every month.  I think we will need more than one server to
handle our index in the long run without degrading performance.*

Thanks,
 Shawn




RE: How to accomadate huge data

2014-08-28 Thread Toke Eskildsen
kokatnur.vi...@gmail.com [kokatnur.vi...@gmail.com] On Behalf Of Ethan 
[eh198...@gmail.com] wrote:
 Before adding swap space nodes used to shutdown due to OOM or crash
 after 2-5 minutes of uptime. By bumping swap space the server came up
 cleanly.  ** We have 7GB of heap.  I'll need to ask admin more questions to
 know how it was solved.*

Yes, please. What you are describing is not solved by adding swap, unless the 
system has very little free RAM.

- Toke Eskildsen


Re: Solr CPU Usage

2014-08-28 Thread Chris Hostetter

: Yes i'm just worried about load average reported by OS, because last week
: suddenly server can't accessed  so we have to hard reboot. I'm still
: investigating what is the problem, because this server is dedicated to solr

ok - so here is the key bit.

basically, nothing else you've mentioned in this thread indicates any sort
of problem -- your load (now, the one you've observed) is fine.  the
question is what happened last week?

do you have any metrics/monitoring information from the server when you 
actually had a problem?

do you have any logs (from Solr, or from jetty, or from the OS, or any
OS/hardware monitoring tools) from last week when the problem happened?

define "server can't be accessed"?  do you mean solr wasn't responding to
queries, or do you mean "i couldn't even ping the machine, let alone ssh
to it"? ... because there is a big difference.

if you can ssh to a machine, but solr is not responding, then generating 
thread dumps would help see what solr is doing.


-Hoss
http://www.lucidworks.com/


Re: Solr issue

2014-08-28 Thread Patanachai Tangchaisin

Hi Shay,

I'm not quite sure about this.
But I think it got fixed by these:

https://issues.apache.org/jira/browse/SOLR-6223
https://issues.apache.org/jira/browse/SOLR-4186
https://issues.apache.org/jira/browse/SOLR-4049

Could you try 4.10 from an svn branch and see if your problem is fixed?

Thanks,
Patanachai



On 08/28/2014 03:23 AM, Shay Sofer wrote:

Hi,

Version - 4.8.1

While executing this solr query (from solr web UI):

http://localhost:8983/solr/Global_A/select?q=%2Btext%3A%28shay*%29+rows=100fl=id%2CobjId%2Cnullshards=http%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2F0_A%2Chttp%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2FGlobal_Agroup=truegroup.query=name__s%3Ashaysort=name__s_sort+aschl=truehttp://localhost:8983/solr/cpm_Global_A/select?q=%2Btext%3A%28shay*%29+rows=100fl=id%2CobjId%2Cnullshards=http%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2Fcpm_0_A%2Chttp%3A%2F%2F127.0.0.1%3A8983%2Fsolr%2Fcpm_Global_Agroup=truegroup.query=name__s%3Ashaysort=name__s_sort+aschl=true

We got NullPointerException:

java.lang.NullPointerException at 
org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:189)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:330)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) 
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) 
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) 
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) 
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) 
at org.eclipse.jetty.server.Server.handle(Server.java:368) at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) 
at java.lang.Thread.run(Thread.java:722)

Seems like integration of Grouping + shards + highlighting cause this 
NullPointerException.

Anyone familiar with this issue?

Thanks,
Shay.





CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.


CopyField Wildcard Exception possible?

2014-08-28 Thread O. Olson
I have hundreds of fields of the form in my schema.xml: 

 <field name="F10434" type="string" indexed="true" stored="true" multiValued="true"/>
 <field name="B20215" type="string" indexed="true" stored="true" multiValued="true"/>
  .

I also have a field 'text' that is set as the Default Search Field

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

I populate this 'text' field using CopyField as: 

<copyField source="*" dest="text"/>

This '*' worked so far. However, I now want to exclude some of the fields
from this i.e. I would like 'text' to contain everything (hundreds of
fields) except a few. Is there any way to do this?

One of the ways would be to specify the '*' explicitly e.g. 

<copyField source="F10434" dest="text"/>
<copyField source="B20215" dest="text"/>
 

and in this list I would exclude the ones I do not want. Is there an
alternative to this? (I would like an alternative because putting in all these
copyFields would be long and too difficult.)


Thank you
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/CopyField-Wildcard-Exception-possible-tp4155686.html
Sent from the Solr - User mailing list archive at Nabble.com.


Two (or more) uniqueKey fields?

2014-08-28 Thread Shawn Heisey
An odd requirement has come my way.  One of our indexes has uniqueness
on two different fields, but because Solr only allows one uniqueKey
field, we cannot have automatic document replacement on both of the
fields.  This means that the indexing code must handle it, which (for
reasons I don't fully understand) currently results in some *massive*
delete requests being sent frequently -- one such request was over 130KB
in size.  They look like field:(x || y || z) -- but with a LOT of
different values.

How much pain would it take to implement multiple uniqueKeys?  I have
not searched Jira for an existing issue.

Thanks,
Shawn



Re: Two (or more) uniqueKey fields?

2014-08-28 Thread Alexandre Rafalovitch
Can't you do a composite unique key? Combine them during indexing in URP stage.

Regards,
  Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
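A rough sketch of that URP stage, concatenating the two keys into whatever the
schema declares as uniqueKey during processAdd, could look like this (the class
name and the field names "keyA", "keyB", and "id" are placeholders):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class CompositeKeyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Build the composite uniqueKey from the two independently maintained fields.
        doc.setField("id",
            doc.getFieldValue("keyA") + "|" + doc.getFieldValue("keyB"));
        super.processAdd(cmd);
      }
    };
  }
}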


On Thu, Aug 28, 2014 at 4:41 PM, Shawn Heisey s...@elyograg.org wrote:
 An odd requirement has come my way.  One of our indexes has uniqueness
 on two different fields, but because Solr only allows one uniqueKey
 field, we cannot have automatic document replacement on both of the
 fields.  This means that the indexing code must handle it, which (for
 reasons I don't fully understand) currently results in some *massive*
 delete requests being sent frequently -- one such request was over 130KB
 in size.  They look like field:(x || y || z) -- but with a LOT of
 different values.

 How much pain would it take to implement multiple uniqueKeys?  I have
 not searched Jira for an existing issue.

 Thanks,
 Shawn



Re: Two (or more) uniqueKey fields?

2014-08-28 Thread Shawn Heisey
On 8/28/2014 2:46 PM, Alexandre Rafalovitch wrote:
 Can't you do a composite unique key? Combine them during indexing in URP 
 stage.

That's an interesting idea.  If they aren't *independently* unique
(which would make it impossible to treat them as a single unit
together), that might work.  Thanks for the idea!  I'll chase it down on
this end.

Shawn



Re: Two (or more) uniqueKey fields?

2014-08-28 Thread Chris Hostetter

: That's an interesting idea.  If they aren't *independently* unique
: (which would make it impossible to treat them as a single unit
: together), that might work.  Thanks for the idea!  I'll chase it down on

if they are independently unique, check out the 
SignatureUpdateProcessorFactory, but be aware of SOLR-3473 

https://cwiki.apache.org/confluence/display/solr/De-Duplication

: 

-Hoss
http://www.lucidworks.com/


Using Update Handler to Combine Data from 2 Cores

2014-08-28 Thread Carlos Maroto
Hi,

Say I have an index of Product Types and a different index of Products
that belong to one of the types in the other index.  Users will do their
searches for attributes of types and products combined so the two distinct,
but related indices must be combined into a single, flattened index so that
the searches and relevancy ranking can be done appropriately.  Let's call
this 3rd index type+product index.

I've been asked by a customer to implement a custom update processor chain
for the 3rd index that will get as input two values that define a
relationship between a product and its corresponding type.  In other words,
the documents posted to the type+product index would simply be a value that
corresponds with the uniqueId of a product type doc and another value that
represents the uniqueId of the specific product of that type.  An update
processor would then read all fields stored in the product type index and
append them to the document, then another update processor would take the
other key and read the stored fields in the products index to also append
them to the doc that will then be ready to be indexed into the 3rd core for
merged content.

I explained to the customer already that this would be custom development,
for which we would need to extend various classes and implement ourselves
the desired logic (not modifying anything in trunk, preferably).

Has anyone implemented something similar? Is there anything that would
prevent this from being possible in Solr?

Here is an example scenario to illustrate what I've been asked to implement.
Product Types:
*
T1  car
T2  truck
T3  motorcycle

Products:
**
1   white  $14500
2   red $  5600
3   white  $  3300
4   blue   $ 88000

Possible searches:
*
white car
red motorcycle
white truck

Notice that with the two independent data sets above it is not possible to
implement this solution.  Therefore the idea to create a 3rd index (core)
which will take the relationships:

typeId = T1, prodId = 1
typeId = T3, prodId = 2
typeId = T3, prodId = 3
typeId = T2, prodId = 4

To generate through a custom update processing chain an index consisting of:
Type+Product

T1+1   car   white  $14500
T3+2   motorcycle   red $  5600
T3+3   motorcycle   white  $  3300
T2+4   truckblue   $ 88000

Thanks,
Carlos
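Nothing in Solr obviously prevents this; the enrichment described above boils down
to a stored-field lookup against each source core from inside a custom update
processor chain. A sketch of just the lookup step, using SolrJ against a
hypothetical product-types core (the URL, core name, and field names are
illustrative, not from this message):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class TypeLookup {

  private final HttpSolrServer typeCore =
      new HttpSolrServer("http://localhost:8983/solr/product_types");

  // Copies the stored fields of the given type doc onto the merged document.
  public void appendTypeFields(String typeId, SolrInputDocument mergedDoc)
      throws SolrServerException {
    SolrQuery q = new SolrQuery("uniqueId:" + typeId);
    q.setRows(1);
    QueryResponse rsp = typeCore.query(q);
    SolrDocumentList hits = rsp.getResults();
    if (!hits.isEmpty()) {
      SolrDocument typeDoc = hits.get(0);
      for (String field : typeDoc.getFieldNames()) {
        // In a real chain you would probably skip or rename the source id field
        // so it does not collide with the merged document's own uniqueKey.
        mergedDoc.addField(field, typeDoc.getFieldValue(field));
      }
    }
  }
}

Wrapped inside an UpdateRequestProcessor's processAdd, this kind of lookup would
run once per incoming relationship document before it is indexed into the third core.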


Re: CopyField Wildcard Exception possible?

2014-08-28 Thread Joe Gresock
We would enjoy this feature as well, if you'd like to create a JIRA ticket.


On Thu, Aug 28, 2014 at 4:21 PM, O. Olson olson_...@yahoo.it wrote:

 I have hundreds of fields of the form in my schema.xml:

  field name=F10434 type=string indexed=true stored=true
 multiValued=true/
  field name=B20215 type=string indexed=true stored=true
 multiValued=true/
   .

 I also have a field 'text' that is set as the Default Search Field

 field name=text type=text indexed=true stored=false
 multiValued=true/

 I populate this 'text' field using CopyField as:

 copyField source=* dest=text/

 This '*' worked so far. However, I now want to exclude some of the fields
 from this i.e. I would like 'text' to contain everything (hundreds of
 fields) except a few. Is there any way to do this?

 One of the ways would be to specify the '*' explicitly e.g.

 copyField source=F10434 dest=text/
 copyField source=B20215 dest=text/
  

 and in this list I would exclude the ones I do not want. Is there an
 alternative to this? (I would like an alternative because putting these
 copyFields would be long and too difficult.


 Thank you
 O. O.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/CopyField-Wildcard-Exception-possible-tp4155686.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Ugo Matrangolo
Hi,

just after we finished restarting our zk cluster, SOLR started to fail with
tons of zk events.

We shut down all the nodes and restarted them one by one, but it looks like the
clusterstate.json does not get updated properly.

Example:

"core_node11": {
  "state": "active",
  "base_url": "http://10.140.4.161:9765",
  "core": "sku_shard1_replica11",

SOLR on the above node is actually down :/ and correctly does not appear in
the live_nodes.

any clue ?


Ugo


Re: After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Ugo Matrangolo
Just adding some info:

when I do:

curl -v 'http://10.140.3.25:9765/zookeeper?wt=json'

it takes ages to come back and on the Admin UI I can't see the Cloud Graph.

Ugo


On Fri, Aug 29, 2014 at 12:52 AM, Ugo Matrangolo ugo.matrang...@gmail.com
wrote:

 Hi,

 just after we finished to restart our zk cluster SOLR started to fail with
 tons of zk events.

 We shut down all the nodes and restarted them one by one but looks like
 the clusterstate.json does not get updated properly.

 Example:

 core_node11 {

  state:active,

 base_url:http://10.140.4.161:9765
 http://t.signauxdix.com/link?url=http%3A%2F%2F10.140.4.161%3A9765%2Fukey=agxzfnNpZ25hbHNjcnhyGAsSC1VzZXJQcm9maWxlGICAwKnt6LQIDAk=70f82f78-368e-46bc-c0e5-2c271f002d3c
 ,

 core:sku_shard1_replica11,

 SOLR on the above node is actually down :/ and correctly does not appear
 in the live_nodes.

 any clue ?


 Ugo







Re: Solr CPU Usage

2014-08-28 Thread rulinma
I think that is a matter of the configs not being tuned well.
Can you use JMX to monitor what it is doing?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p4155747.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Shawn Heisey
On 8/28/2014 5:52 PM, Ugo Matrangolo wrote:
 just after we finished to restart our zk cluster SOLR started to fail with
 tons of zk events.
 
 We shut down all the nodes and restarted them one by one but looks like the
 clusterstate.json does not get updated properly.

On IRC, you mentioned you were on 4.7.2.

I wonder if maybe the overseer queue is not being processed?  Can you
look in that section of zookeeper?
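A quick way to check, assuming you have ZooKeeper's own CLI handy and Solr is
using the default paths (adjust the host/port and any chroot to your setup):

  zkCli.sh -server localhost:2181
  ls /overseer/queue

If that node has a huge and growing number of children, the overseer is not
keeping up and cluster state changes will not be applied.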

The big overseer queue bug (SOLR-5811) was fixed in 4.7.1, but I know
there was at least one more bug fixed in 4.8 or later.

Thanks,
Shawn



Re: Solr CPU Usage

2014-08-28 Thread Greg Harris
Here is a quick way you can identify which thread is taking up all your CPU.

1) Look at top (or htop) sorted by CPU Usage and with threads toggled on -
hit capital 'H'
2) Get the native process ids of the threads taking up a lot of CPU
3) Convert that number to hex using a converter:
http://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html
4) Use the hex number to identify the problematic threads on the thread
dump via the nid= value. So for example:
nid=0x549 would equate to the native thread id of 1353 on top.

Take a thread dump and identify any problematic threads so you can see the
stack trace.
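Putting those steps together, a minimal sketch on Linux (12345 is a placeholder
pid; substitute your Solr process id and the thread id you see in top):

  top -H -p 12345                        # per-thread CPU view for the Solr process
  printf '%x\n' 1353                     # convert the hot thread id to hex -> 549
  jstack 12345 | grep -A 20 'nid=0x549'  # locate that thread's stack in the dump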
However, Chris has pointed out that there is as of yet no evidence your
outage is related to CPU overload.

Greg

On Thu, Aug 28, 2014 at 6:45 PM, rulinma ruli...@gmail.com wrote:

 I think that is configs not tuned well.
 Can use jmx to monitor what is doing?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-CPU-Usage-tp4155370p4155747.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using wild characters in query doesn't work with my configuraiton

2014-08-28 Thread Erick Erickson
OK. Do not, repeat NOT, use different tokenizers
at index and query time unless you are _very_ sure
that you know exactly what the consequences are.

Take a look at the admin/analyzer page for the
field in question and put your values in. You'll see
that what's in your index is very different than what's
being looked for at query time.

Nothing is worth trying until you straighten this out.

The other great resource is adding debug=query
to your URL and examining the parsed query.
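For example, a minimal sketch (substitute your own host, core, field and terms;
collection1 here is just a placeholder):

  curl 'http://localhost:8983/solr/collection1/select?q=lastName:*HK*&debug=query&wt=json&indent=true'

The debug section of the response shows the parsedquery, which you can compare
against what the analysis page produces for the indexed terms.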

Best,
Erick


On Wed, Aug 27, 2014 at 12:08 PM, Romain Pigeyre rpige...@gmail.com wrote:

 Hi,

 I am running into a small problem with Solr:

 I can run this query: lastName:HK+IE
 The result contains the following record:
 { "customerId": "0003500226598", "countryLibelle": "HONG KONG",
 "firstName1": "lC /o", "countryCode": "HK", "address1": " 1F0/", "address2": "11-35",
 "storeId": "100", "lastName1": "HK IE", "city": "HONG KONG",
 "_version_": 1477612965227135000 }
 NB: lastName contains the lastName1 field.

 When I add * to the same query: lastName:*HK*+*IE*, there is no
 result. I expected the * character to match 0 to n characters.

 Here is my configuration:
 <field name="lastName" type="text_general" indexed="true" stored="false"
        multiValued="true"/>

 <copyField source="lastName1" dest="lastName"/>
 <copyField source="lastName2" dest="lastName"/>

 <fieldType name="text_general" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" />
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" />
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 I'm using a WhitespaceTokenizerFactory at indexing time in order to keep
 special characters: /?...
 After changing this configuration, I restarted Solr and re-indexed the data.

 Does somebody have any idea how to resolve this issue?

 Thanks a lot

 --

 *-Romain PIGEYRE*



Re: incomplete proximity boost for fielded searches

2014-08-28 Thread Erick Erickson
feels like a JIRA to me.

This _does_ seem weird.

if I omit the field qualification, i.e. my query is:
q=Michigan Corporate Income Tax&debugQuery=true&pf=title&ps=255&defType=edismax
it works fine.

I can get the results I think you expect by omitting the field qualifier
and defining my default search field as:
q=Michigan Corporate Income Tax&debugQuery=true&pf=title&ps=255&defType=edismax&df=title

But the fact that you get those results feels like a bug. Or at least
something that I don't understand.

It feels like a bug to me; do others agree?

Can you raise a JIRA on this?

Best,
Erick


On Thu, Aug 28, 2014 at 7:41 AM, Burgmans, Tom 
tom.burgm...@wolterskluwer.com wrote:

 Consider query:
 http://10.208.152.231:8080/solr/wkustaldocsphc_A/search?q=title:(Michigan
 Corporate Income Tax)&debugQuery=true&pf=title&ps=255&defType=edismax

 The intention is to perform a search in field title and to apply a
 proximity boost within a window of 255 words. If I look at the debug
 information, I see:

 <str name="parsedquery">
 BoostedQuery(boost(+((title:michigan title:corporate title:income
 title:tax)~4) (title:"corporate income tax"~255)~1.0))
 </str>

 Note that the first search term (michigan) is missing in the proximity
 boost clause. I can't believe this is intended behavior.

 Why is edismax splitting (title:Michigan) and (Corporate Income Tax)
 while determining what to use for the proximity boost?

 Thanks, Tom



Re: solr query gives different numFound upon refreshing

2014-08-28 Thread Erick Erickson
First, I want to be sure you're not mixing old-style
replication and SolrCloud. Your mention of Master/Slave
is what prompts this question.

Second, your maxWarmingSearchers error indicates that
your commit interval is too short relative to your autowarm
times. Try lengthening your autocommit settings (probably
soft commit) until you no longer see that error message
and see if the problem goes away. If it doesn't, let us know.
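A minimal sketch of the relevant solrconfig.xml section (the element names are
standard, but the interval values below are placeholders; pick numbers
comfortably longer than your observed warmup times):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>           <!-- new searcher at most every 5 minutes -->
  </autoSoftCommit>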

Best,
Erick



On Thu, Aug 28, 2014 at 9:39 AM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi Shawn,

 Thanks for your reply.

 We did some tests enabling shards.info=true and confirmed that there is
 no duplicate copy of our index.

 We have one replica, but many times we see three versions on the Admin
 GUI/Overview tab. All three have different versions and generations. Is that a
 problem?
 Master (Searching)
 Master (Replicable)
 Slave (Searching)

 We constantly see the max searchers open exception. The warmup time is 1.5
 minutes, but the difference between the openedAt date and the registeredAt date
 is at times more than 4-5 minutes. Is the true searcher warm time the
 difference between the two dates rather than the warmupTime?

 openedAt:   2014-08-28T16:17:24.829Z
 registeredAt:   2014-08-28T16:21:02.278Z
 warmupTime: 65727

 Thanks for all help.


 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org]
 Sent: Wednesday, August 27, 2014 2:37 PM
 To: solr-user@lucene.apache.org
 Subject: Re: solr query gives different numFound upon refreshing

 On 8/27/2014 10:44 AM, Bryan Bende wrote:
  Theoretically this shouldn't happen, but is it possible that the two
  replicas for a given shard are not fully in sync?
 
  Say shard1 replica1 is missing a document that is in shard1 replica2...
 if
  you run a query that would hit on that document and run it a bunch of
  times, sometimes replica 1 will handle the request and sometimes replica
 2
  will handle it, and it would change your number of results if one of them
  is missing a document. You could write a program that compares each
  replica's documents by querying them with distrib=false.
 
  If there was a replica out of sync, I would think it would detect that
 on a
  restart when comparing itself against the leader for that shard, but I'm
  not sure.

 A replica out of sync is a possibility, but the most common reason for a
 changing numFound is because the overall distributed index has more than
 one document with the same uniqueKey value -- different versions of the
 same document in more than one shard.

 SolrCloud tries really hard to never end up with replicas out of sync,
 but either due to highly unusual circumstances or bugs, it could still
 happen.

 Thanks,
 Shawn




Re: Problem with SOLR Collection creation

2014-08-28 Thread Erick Erickson
Ahhh, thanks for bringing closure to this! Whew!

Erick


On Thu, Aug 28, 2014 at 10:47 AM, Kaushik kaushika...@gmail.com wrote:

 The issue I was facing was that there were additional libraries on the
 classpath that were conflicting and not required. Removing those made the
 problem disappear.

 Thank you,
 Kaushik


 On Thu, Aug 28, 2014 at 11:50 AM, Shawn Heisey s...@elyograg.org wrote:

  On 8/28/2014 8:28 AM, Kaushik wrote:
   Hello,
  
   We have deployed a solr.war file to a weblogic server. The web.xml has
  been
   modified to have the path to the SOLR home as follows:
  
 
  <env-entry><env-entry-name>solr/home</env-entry-name><env-entry-type>java.lang.String</env-entry-type><env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value></env-entry>
  
   The deployment of the Solr comes up fine. In the
   D:\SOLR\4.7.0\RegulatoryReview directory we have RR folder under which
  the
   conf directory with the required config files are present
  (solrconfig.xml,
   schema.xml, etc). But when I try to add the collection to SOLR through
  the
   admin console, I get the following error.
  
   Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
   org.apache.solr.common.SolrException: Error CREATEing SolrCore
   'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
   class org.apache.solr.search.LRUCache
 
  It would seem there's a problem with the cache config in your
  solrconfig.xml, or that there's some kind of problem with the Solr jars
  contained within the war.  No testing is done with weblogic, so it's
  always possible it's a class conflict with weblogic itself, but I would
  bet on a config problem first.
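  For reference, the stock example solrconfig.xml declares the caches roughly
  like this (a sketch; compare it against yours for a typo or a bad class name):

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>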
 
   The issue I believe is that it is trying to find
   D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml by ignoring the conf
   directory in which it should be finding it. What am I doing wrong?
 
  This is SOLR-5814, a bug in the log messages, not the program logic.  I
  thought it had been fixed by 4.8, but the issue is still unresolved.
 
  https://issues.apache.org/jira/browse/SOLR-5814
 
  Thanks,
  Shawn