Full garbage collection after upgrading the Solr version to 5.3.0

2015-10-27 Thread 殷亚云
HI

 Recently, I upgraded the Solr version on our server from 4.7.2 to 5.3.0.

 

 Through load testing, I found that full garbage collections have become
very frequent in version 5.3. I noticed frequent memory allocation in
BooleanScorer.<init> and the TermWeight scorer, and I don't know whether
this could be causing the frequent full garbage collections.

 

 So, I want to confirm whether this is a known problem, and whether there is a
solution.

 

Thanks.



Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-27 Thread Zheng Lin Edwin Yeo
Hi Scott,

Thank you for providing the links and references. Will look through them,
and let you know if I find any solutions or workaround.

Regards,
Edwin
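
In case it helps anyone searching the archives later: the "3 term* attributes"
Scott refers to below are the term vector flags on the field definition. A
schema.xml sketch of what that means (the field name and type here are only
placeholders, not my actual schema):

    <field name="content" type="text_general" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

FastVectorHighlighter-based highlighters need all three of these enabled on
the highlighted field.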


On 27 October 2015 at 11:13, Scott Chu  wrote:

>
> Take a look at Michael's two articles; they might help you clarify the idea
> of highlighting in Solr:
>
> Changing Bits: Lucene's TokenStreams are actually graphs!
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>
> Also take a look at the 4th paragraph in another of his articles:
>
> Changing Bits: A new Lucene highlighter is born
>
> http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
>
> Currently, I can't figure out the possible cause of your problem unless I
> get spare time to test it on my own, which I don't have these days (I've got
> some projects to close)!
>
> If you find a solution or workaround, please let us know. Good luck again!
>
> Scott Chu,scott@udngroup.com
> 2015/10/27
>
> - Original Message -
> *From: *Scott Chu 
> *To: *solr-user 
> *Date: *2015-10-27, 10:27:45
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Edward,
>
> I took a lot of time to see if there's anything that can help you pin down
> the cause of your problem. Maybe this might help you a bit:
>
> [SOLR-4722] Highlighter which generates a list of query term position(s)
> for each item in a list of documents, or returns null if highlighting is
> disabled. - AS...
> https://issues.apache.org/jira/browse/SOLR-4722
>
> This one is modified from FastVectorHighLighter, so ensure those 3 term*
> attributes are on.
>
> Scott Chu,scott@udngroup.com
> 2015/10/27
>
> - Original Message -
> *From: *Zheng Lin Edwin Yeo 
> *To: *solr-user 
> *Date: *2015-10-23, 10:42:32
> *Subject: *Re: Highlighting content field problem when using
> JiebaTokenizerFactory
>
> Hi Scott,
>
> Thank you for your response.
>
> 1. You said the problem only happens on the "contents" field, so maybe there's
> something wrong with the contents of that field. Does it contain anything
> special, e.g. HTML tags or symbols? I recall SOLR-42 mentions
> something about HTML stripping causing highlight problems. Maybe you can
> try purifying that field so it is close to pure text and see if the highlighting
> comes out OK.
> *A) I checked, and SOLR-42 is about the
> HTMLStripWhiteSpaceTokenizerFactory, which I'm not using. I believe that
> tokenizer is already deprecated too. I've tried all kinds of content
> for rich-text documents, and all of them have the same problem.*
>
> 2. Maybe something is incompatible between JiebaTokenizer and the Solr
> highlighter. You could switch to other tokenizers, e.g. Standard, CJK,
> SmartChinese (I don't use this since I am dealing with Traditional Chinese
> but I see you are dealing with Simplified Chinese), or the 3rd-party MMSeg, and
> see if the problem goes away. However, when I was googling for similar problems, I
> saw you asked the same question in August at Huaban/Jieba-analysis and somebody
> said he also uses JiebaTokenizer but he doesn't have your problem. So I see
> this as a less likely suspect.
> *A) I was thinking about the incompatibility issue too, as I previously
> thought that JiebaTokenizer was optimised for Solr 4.x, so it might have issues
> in 5.x. But the person from Huaban/Jieba-analysis said that he doesn't have
> this problem in Solr 5.1. I also faced the same problem in Solr 5.1, and
> although I'm using Solr 5.3.0 now, the same problem persists.*
>
> I'm looking at the indexing process too, to see if there's any problem
> there. But I just can't figure out why it only happens with JiebaTokenizer, and
> only for the content field.
>
>
> Regards,
> Edwin
>
>
> On 23 October 2015 at 09:41, Scott Chu  <+scott@udngroup.com>> wrote:
>
> > Hi Edwin,
> >
> > Since you've tested all my suggestions and the problem is still there, I
>
> > can't think of anything wrong with your configuration. Now I can only
> > suspect two things:
> >
> > 1. You said the problem only happens on the "contents" field, so maybe
> > there's something wrong with the contents of that field. Does it contain
> > anything special, e.g. HTML tags or symbols? I recall SOLR-42
> > mentions something about HTML stripping causing highlight problems. Maybe
> > you can try purifying that field so it is close to pure text and see if
> > the highlighting comes out OK.
> >
> > 2. Maybe something is incompatible between JiebaTokenizer and the Solr
> > highlighter. You could switch to other tokenizers, e.g. Standard, CJK,
> > SmartChinese (I don't use this since I am dealing with Traditional
> > Chinese but I see you are dealing with Simplified Chinese), or the
> > 3rd-party MMSeg and see if the problem goes away. However, when I was
> > googling for similar problems, I saw you asked the same question in
> > August at 

Getting error on importing JSON file

2015-10-27 Thread Prathmesh Gat
Hi,

Using Solr Ver 4.10, when we try to import the attached JSON we get an
error saying:
{ "responseHeader": { "status": 400, "QTime": 1 }, "error": { "msg": "Error
parsing JSON field value. Unexpected OBJECT_START", "code": 400 } }

And when we try to send the file via curl:

curl "http://:8983/solr/update/extract?=true" -F
"myfile=@189907.json" -H "Content-type:application/json" , i get a 500
error:





java.lang.NoClassDefFoundError:
org/apache/commons/compress/PasswordRequiredException

java.lang.RuntimeException: java.lang.NoClassDefFoundError:
org/apache/commons/compress/PasswordRequiredException

at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)

at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)

at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)

at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)

at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)

at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)

at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)

at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)

at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)

at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:368)

at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)

at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)

at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)

at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)

at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)

at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)

at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)

at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NoClassDefFoundError:
org/apache/commons/compress/PasswordRequiredException

at
org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)

at
org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)

at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)

at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)

at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)


500

Any pointers on what I am doing incorrectly?


4.json
Description: application/json


Re: Getting error on importing JSON file

2015-10-27 Thread Upayavira
Solr won't index arbitrary JSON.

Please research the format that Solr expects.
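
For what it's worth, the plain /update handler expects flat Solr documents,
either a single JSON object or an array of them, e.g. (field names below just
follow the bundled books.json example, and the core name is a placeholder):

    curl 'http://localhost:8983/solr/collection1/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"978-1423103349","name":"The Sea of Monsters","genre_s":"fantasy"}]'

Arbitrary nested JSON has to be flattened first, or (on recent releases) sent
through the /update/json/docs handler with its split/field mapping parameters.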

Upayavira

On Tue, Oct 27, 2015, at 07:23 AM, Prathmesh Gat wrote:
> Hi,
> 
> Using Solr Ver 4.10, when we try to import the attached JSON we get an
> error saying:
> { "responseHeader": { "status": 400, "QTime": 1 }, "error": { "msg":
> "Error
> parsing JSON field value. Unexpected OBJECT_START", "code": 400 } }
> 
> And when we try to send the file via curl:
> 
> curl "http://:8983/solr/update/extract?=true" -F
> "myfile=@189907.json" -H "Content-type:application/json" , i get a 500
> error:
> 
> 
> 
> 
> 
> java.lang.NoClassDefFoundError:
> org/apache/commons/compress/PasswordRequiredException
> 
> java.lang.RuntimeException: java.lang.NoClassDefFoundError:
> org/apache/commons/compress/PasswordRequiredException
> 
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
> 
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
> 
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> 
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> 
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> 
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> 
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> 
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> 
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> 
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> 
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> 
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> 
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> 
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> 
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> 
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> 
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> 
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> 
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> 
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> 
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> 
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
> 
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> 
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> 
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> 
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> 
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> 
> at java.lang.Thread.run(Thread.java:745)
> 
> Caused by: java.lang.NoClassDefFoundError:
> org/apache/commons/compress/PasswordRequiredException
> 
> at
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
> 
> at
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
> 
> at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
> 
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
> 
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> 
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> 
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
> 
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
> 
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
> 
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> 
> 
> 500
> 
> Any pointers on what I am doing incorrectly?
> Email had 1 attachment:
> + 4.json
>   5k (application/json)


Many mapping files

2015-10-27 Thread fabigol
Hi,
I already posted about my topic before and got a lot of help. I am taking over
a Solr project, but the person who built it is gone.
There are several configuration / mapping files (database / Solr).
What is the point of doing it that way?
Thanks for your help



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-mapping-files-tp4236658.html
Sent from the Solr - User mailing list archive at Nabble.com.


Nested document limitation

2015-10-27 Thread Midas A
Hi all,

Is there any limitation of nested documents in Solr?

Can we have any DIH (data import handler) plugin for Solr?


Regards,
Abhishek


Re: Split shard onto new physical volumes

2015-10-27 Thread Nikolay Shuyskiy

On 2015-10-22 17:54:44, Shawn Heisey wrote:


On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:

I imagined that I could, say, add two new nodes to SolrCloud, and split
shard so that two new shards ("halves" of the one being split) will be
created on those new nodes.

Right now the only way to split shard in my situation I see is to create
two directories (shard_1_0 and shard_1_1) and mount new volumes onto
them *before* calling SPLITSHARD. Then I would be able to split shards,
and after adding two new nodes, these new shards will be replicated, and
I'll be able to clean up all the data on the first node.


The reason that they must be on the same node is because index splitting
is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
the one index on the one machine.

Depending on the overall cloud distribution, one option *might* be to
add a replica of the shard you want to split to one or more new nodes
with plenty of disk space, and after it is replicated, delete it from
any nodes where the disk is nearly full.  Then do the split operation,
and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
the way you want it.
Thank you, that makes sense and is a usable alternative for us for the  
time being.
Probably we have to consider using implicit routing for the future so that  
we could add new nodes without dealing with splitting.
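
For my own notes, the sequence Shawn describes seems to map onto Collections
API calls roughly like this (collection, shard, node and replica names below
are placeholders):

    # 1. put a copy of the full shard on a node with free disk
    /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=newhost:8983_solr
    # 2. once it has caught up, drop the replica on the nearly-full node
    /admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node1
    # 3. split on the node that now hosts shard1
    /admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1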


--
Yrs sincerely,
 Nikolay Shuyskiy


Zookeeper ACL issue when using Solr 5.3.1 to connect it

2015-10-27 Thread diyun2008
Following the guide:
https://cwiki.apache.org/confluence/display/solr/ZooKeeper+Access+Control
Setting solr.xml and solr.in.sh with : 
...

...
org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider
org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider

and 

SOLR_ZK_CREDS_AND_ACLS="-DzkDigestUsername=admin-user
-DzkDigestPassword=admin-password \
-DzkDigestReadonlyUsername=readonly-user
-DzkDigestReadonlyPassword=readonly-password"
 
SOLR_OPTS="$SOLR_OPTS $SOLR_ZK_CREDS_AND_ACLS"

Solr still cannot connect to ZooKeeper; it fails with this exception:
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
NoAuth for /live_nodes/9.112.232.200:8983_solr
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
at
org.apache.solr.servlet.ZookeeperInfoServlet$ZKPrinter.printTree(ZookeeperInfoServlet.java:598)

Here's my Zookeeper chroot znode like:
[zk: localhost:2181(CONNECTED) 1] getAcl /testvm
'digest,'admin-user:MZZofqHt1zazEWJFPeUbL8d2l0k=
: cdrwa

Any advice on this issue will be appreciated.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Zookeeper-ACL-issue-when-using-Solr-5-3-1-to-connect-it-tp4236675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrClient query /admin/info/system

2015-10-27 Thread Alan Woodward
Hi Kevin,

This looks like a bug in CSC - could you raise an issue?
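
In the meantime, one way to read /admin/info/system without going through
CloudSolrClient's ADMIN_PATHS check is to ask a single node directly over
HttpSolrClient - an untested sketch, with the node URL as a placeholder:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.util.NamedList;

    public class SolrVersionCheck {
      public static void main(String[] args) throws Exception {
        // Point at one node's base URL rather than at the cluster via ZK.
        try (HttpSolrClient node = new HttpSolrClient("http://localhost:8983/solr")) {
          SolrQuery q = new SolrQuery();
          // A handler path starting with '/' is used as the request path by SolrJ.
          q.setRequestHandler("/admin/info/system");
          QueryResponse rsp = node.query(q);
          NamedList<Object> info = rsp.getResponse();
          System.out.println(info.get("lucene"));  // includes solr-spec-version
        }
      }
    }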

Alan Woodward
www.flax.co.uk


On 26 Oct 2015, at 22:21, Kevin Risden wrote:

> I am trying to use CloudSolrClient to query information about the Solr
> server including version information. I found /admin/info/system and it
> seems to provide the information I am looking for. However, it looks like
> CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is not
> part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
> missed as part of SOLR-4943 [3]?
> 
> Is this an issue or is there a better way to query this information?
> 
> As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure what
> issues that could cause. Is there a reason that ADMIN_PATHS in
> CloudSolrClient would be different than the paths in CommonParams [1]?
> 
> [1]
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
> [2]
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
> [3] https://issues.apache.org/jira/browse/SOLR-4943
> 
> Kevin Risden
> Hadoop Tech Lead | Avalon Consulting, LLC 
> M: 732 213 8417
> LinkedIn  | Google+
>  | Twitter
> 



Re: Split shard onto new physical volumes

2015-10-27 Thread Upayavira


On Tue, Oct 27, 2015, at 10:50 AM, Nikolay Shuyskiy wrote:
> On 2015-10-22 17:54:44, Shawn Heisey wrote:
> 
> > On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
> >> I imagined that I could, say, add two new nodes to SolrCloud, and split
> >> shard so that two new shards ("halves" of the one being split) will be
> >> created on those new nodes.
> >>
> >> Right now the only way to split shard in my situation I see is to create
> >> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
> >> them *before* calling SPLITSHARD. Then I would be able to split shards,
> >> and after adding two new nodes, these new shards will be replicated, and
> >> I'll be able to clean up all the data on the first node.
> >
> > The reason that they must be on the same node is because index splitting
> > is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
> > the one index on the one machine.
> >
> > Depending on the overall cloud distribution, one option *might* be to
> > add a replica of the shard you want to split to one or more new nodes
> > with plenty of disk space, and after it is replicated, delete it from
> > any nodes where the disk is nearly full.  Then do the split operation,
> > and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
> > the way you want it.
> Thank you, that makes sense and is a usable alternative for us for the  
> time being.
> Probably we have to consider using implicit routing for the future so
> that  
> we could add new nodes without dealing with splitting.

Depends upon the use-case. For things like log files, use time based
collections, then create/destroy collection aliases to point to them.

I've had a "today" alias that points to logs_20151027 and logs_20151026,
meaning all content for the last 24hrs is available via
http://localhost:8983/solr/today. I had "week" and "month" also.

Dunno if that works for you.
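
A concrete sketch of the alias juggling, using the collection names from my
example (this is the sort of thing a nightly cron job can issue):

    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=today&collections=logs_20151027,logs_20151026'

Re-issuing CREATEALIAS with a new collection list simply repoints the alias,
so "today", "week" and "month" roll forward without touching the underlying
collections.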

Upayavira


Re: Many mapping files

2015-10-27 Thread Gora Mohanty
On 27 October 2015 at 13:22, fabigol  wrote:
> Hi,
> I already posted about my topic before and got a lot of help. I am taking over
> a Solr project, but the person who built it is gone.
> There are several configuration / mapping files (database / Solr).
> What is the point of doing it that way?
> Thanks for your help

Unfortunately, this question is too vague for any kind of reasonable
help. Your best bet probably is to get a basic familiarity with Solr.
There are books available, and the Wiki is a helpful resource. You
might want to start at https://wiki.apache.org/solr/ and go through
the tutorial linked therein. Many of the files will be explained that,
and once you have at least a basic understanding, please ask more
specific questions about specific configuration files on this list.

Regards,
Gora


Re: Split shard onto new physical volumes

2015-10-27 Thread Nikolay Shuyskiy

On Tue, Oct 27, 2015, at 10:50 AM, Nikolay Shuyskiy wrote:

On 2015-10-22 17:54:44, Shawn Heisey wrote:


On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
>> I imagined that I could, say, add two new nodes to SolrCloud, and split
>> shard so that two new shards ("halves" of the one being split) will be
>> created on those new nodes.
>>
>> Right now the only way to split shard in my situation I see is to create
>> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
>> them *before* calling SPLITSHARD. Then I would be able to split shards,
>> and after adding two new nodes, these new shards will be replicated, and
>> I'll be able to clean up all the data on the first node.
>
> The reason that they must be on the same node is because index splitting
> is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
> the one index on the one machine.
>
> Depending on the overall cloud distribution, one option *might* be to
> add a replica of the shard you want to split to one or more new nodes
> with plenty of disk space, and after it is replicated, delete it from
> any nodes where the disk is nearly full.  Then do the split operation,
> and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
> the way you want it.
Thank you, that makes sense and is a usable alternative for us for the
time being.
Probably we have to consider using implicit routing for the future so
that we could add new nodes without dealing with splitting.


Depends upon the use-case. For things like log files, use time based
collections, then create/destroy collection aliases to point to them.

I've had a "today" alias that points to logs_20151027 and logs_20151026,
meaning all content for the last 24hrs is available via
http://localhost:8983/solr/today. I had "week" and "month" also.

Dunno if that works for you.
Thanks for sharing your experience, but in our case any kind of time-based
splitting is irrelevant. If worst comes to worst, we can impose some kind
of pre-grouping on our documents (thank you for the idea!), but it'd
complicate application logic (and Solr maintenance, I'm afraid) too much
for our taste.


--
Yrs sincerely,
 Nikolay Shuyskiy


Re: Two separate instances of Solr on the same machine

2015-10-27 Thread Steven White
How do I specify a different log directory by editing "log4j.properties"?
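
From poking at the stock server/resources/log4j.properties (assuming the
default 5.x layout), the file appender path seems to be driven by the solr.log
property, so presumably something per-instance along these lines would do it
(untested, and the paths below are just examples):

    # instance_one copy of log4j.properties
    solr.log=C:/solr/logs/instance_one
    log4j.appender.file.File=${solr.log}/solr.log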

Steve

On Mon, Oct 26, 2015 at 9:08 PM, Pushkar Raste 
wrote:

> It depends on your case. If you don't mind logs from 3 different instances
> inter-mingled with each other you should be fine.
> You add "-Dsolr.log=" to make logs go to different
> directories. If you want logs to go to the same directory but different files
> try updating log4j.properties.
>
> On 26 October 2015 at 13:33, Steven White  wrote:
>
> > Hi,
> >
> > For reasons I have no control over, I'm required to run 2 (maybe more)
> > instances of Solr on the same server (Windows and Linux).  To be more
> > specific, I will need to start each instance like so:
> >
> >   > solr\bin start -p 8983 -s ..\instance_one
> >   > solr\bin start -p 8984 -s ..\instance_two
> >   > solr\bin start -p 8985 -s ..\instance_three
> >
> > Each of those instances is a stand alone Solr (no ZK here at all).
> >
> > I have tested this over and over and did not see any issue.  However, I
> did
> > notice that each instance is writing to the same solr\server\logs\ files
> > (will this be an issue?!!)
> >
> > Is the above something I should avoid?  If so, why?
> >
> > Thanks in advance!!
> >
> > Steve
> >
>


Re: Two separate instances of Solr on the same machine

2015-10-27 Thread Steven White
That's what I'm doing, using "-s" to instruct each instance of Solr where
the data is.

Steve

On Tue, Oct 27, 2015 at 12:52 AM, Jack Krupansky 
wrote:

> Each instance should be installed in a separate directory. IOW, don't try
> running multiple Solr processes for the same data.
>
> -- Jack Krupansky
>
> On Mon, Oct 26, 2015 at 1:33 PM, Steven White 
> wrote:
>
> > Hi,
> >
> > For reasons I have no control over, I'm required to run 2 (maybe more)
> > instances of Solr on the same server (Windows and Linux).  To be more
> > specific, I will need to start each instance like so:
> >
> >   > solr\bin start -p 8983 -s ..\instance_one
> >   > solr\bin start -p 8984 -s ..\instance_two
> >   > solr\bin start -p 8985 -s ..\instance_three
> >
> > Each of those instances is a stand alone Solr (no ZK here at all).
> >
> > I have tested this over and over and did not see any issue.  However, I
> did
> > notice that each instance is writing to the same solr\server\logs\ files
> > (will this be an issue?!!)
> >
> > Is the above something I should avoid?  If so, why?
> >
> > Thanks in advance!!
> >
> > Steve
> >
>


Re: Nested document limitation

2015-10-27 Thread Erick Erickson
You really have to ask a specific question. What do
you want to _do_ with your nested documents? A
DIH plugin for Solr to do what?

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Tue, Oct 27, 2015 at 1:20 AM, Midas A  wrote:
> Hi all,
>
> Is there any limitation of nested document in solr .
>
> Can we have any DIH(data  import handler ) plugin for solr .
>
>
> Regards,
> Abhishek


Re: Solr hard commit

2015-10-27 Thread Erick Erickson
bq: So, the updated file(s) on the disk automatically read into memory
as they are Memory mapped?

Not quite sure why you care, curiosity or is there something you're
trying to accomplish?

The contents of the index's segment files are read into virtual memory
by MMapDirectory as needed to satisfy queries. Which is the point of
autowarming BTW.

commit in the following is either hard commit with openSearcher=true
or soft commit.

Segments that have been created (closed actually) after the last
commit  are _not_ read at all until the next searcher is opened via
another commit. Nothing is done with these new segments before the new
searcher is opened which you control with your commit strategy.

Best,
Erick

On Mon, Oct 26, 2015 at 9:07 PM, Rallavagu  wrote:
> Erick, Thanks for clarification. I was under impression that MMapDirectory
> is being used for both read/write operations. Now, I see how it is being
> used. Essentially, it only reads from MMapDirectory and writes directly to
> disk. So, the updated file(s) on the disk automatically read into memory as
> they are Memory mapped?
>
> On 10/26/15 8:43 PM, Erick Erickson wrote:
>>
>> You're really looking at this backwards. The MMapDirectory stuff is
>> for Solr (Lucene, really) _reading_ data from closed segment files.
>>
>> When indexing, there are internal memory structures that are flushed
>> to disk on commit, but these have nothing to do with MMapDirectory.
>>
>> So the question is really moot ;)
>>
>> Best,
>> Erick
>>
>> On Mon, Oct 26, 2015 at 5:47 PM, Rallavagu  wrote:
>>>
>>> All,
>>>
>>> Are memory mapped files (mmap) flushed to disk during "hard commit"? If
>>> yes,
>>> should we disable OS level (Linux for example) memory mapped flush?
>>>
>>> I am referring to following for mmap files for Lucene/Solr
>>>
>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>>
>>> Linux level flush
>>>
>>> http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/
>>>
>>> Solr's hard and soft commit
>>>
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Thanks in advance.


Re: Can not resolve the webdav path

2015-10-27 Thread Erick Erickson
Probably Alfresco support. There's nothing in that stack
trace that looks like a Solr or Lucene issue, and I
wouldn't know where to start understanding the
Alfresco code.

Best,
Erick

On Tue, Oct 27, 2015 at 7:12 AM,   wrote:
>
> When we try to access content from our 5.0.2 Alfresco install we get
> a Status 500 - No bean named 'webClientConfigService' is defined
>
> The stack trace from the catalina.out log looks like this.
>
> 2015-10-27 09:00:29,911  WARN  [app.servlet.BaseServlet]
> [http-nio-8080-exec-5] Failed to resolve webdav path
>  org.alfresco.service.cmr.model.FileNotFoundException: Folder not found:
> at
> org.alfresco.repo.model.filefolder.FileFolderServiceImpl.resolveNamePath
> (FileFolderServiceImpl.java:1548)
> at
> org.alfresco.repo.model.filefolder.FileFolderServiceImpl.resolveNamePath
> (FileFolderServiceImpl.java:1527)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke
> (NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection
> (AopUtils.java:317)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint
> (ReflectiveMethodInvocation.java:183)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:150)
> at org.alfresco.repo.model.ml.MLContentInterceptor.invoke
> (MLContentInterceptor.java:129)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at
> org.alfresco.repo.model.filefolder.MLTranslationInterceptor.invoke
> (MLTranslationInterceptor.java:268)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at
> net.sf.acegisecurity.intercept.method.aopalliance.MethodSecurityInterceptor.invoke
> (MethodSecurityInterceptor.java:80)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at
> org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke
> (ExceptionTranslatorMethodInterceptor.java:46)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at org.alfresco.repo.audit.AuditMethodInterceptor.invoke
> (AuditMethodInterceptor.java:159)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at
> org.alfresco.repo.model.filefolder.FilenameFilteringInterceptor.invoke
> (FilenameFilteringInterceptor.java:382)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at
> org.springframework.transaction.interceptor.TransactionInterceptor
> $1.proceedWithInvocation(TransactionInterceptor.java:96)
> at
> org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction
> (TransactionAspectSupport.java:260)
> at
> org.springframework.transaction.interceptor.TransactionInterceptor.invoke
> (TransactionInterceptor.java:94)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:172)
> at org.springframework.aop.framework.JdkDynamicAopProxy.invoke
> (JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy65.resolveNamePath(Unknown Source)
> at org.alfresco.web.app.servlet.BaseServlet$1.doWork
> (BaseServlet.java:435)
> at org.alfresco.web.app.servlet.BaseServlet$1.doWork
> (BaseServlet.java:392)
> at
> org.alfresco.repo.security.authentication.AuthenticationUtil.runAs
> (AuthenticationUtil.java:548)
> at org.alfresco.web.app.servlet.BaseServlet.resolveWebDAVPath
> (BaseServlet.java:391)
> at org.alfresco.web.app.servlet.BaseServlet.resolveWebDAVPath
> (BaseServlet.java:379)
> at org.alfresco.web.app.servlet.BaseServlet.resolveNamePath
> (BaseServlet.java:475)
> at
> org.alfresco.web.app.servlet.BaseDownloadContentServlet.processDownloadRequest
> (BaseDownloadContentServlet.java:155)
> at org.alfresco.web.app.servlet.DownloadContentServlet$2.execute
> (DownloadContentServlet.java:141)
> at org.alfresco.web.app.servlet.DownloadContentServlet$2.execute
> (DownloadContentServlet.java:138)
> at
> org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction
> (RetryingTransactionHelper.java:454)
> at
> org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction
> (RetryingTransactionHelper.java:342)
> at 

Re: Solr hard commit

2015-10-27 Thread Rallavagu



On 10/27/15 8:43 AM, Erick Erickson wrote:

bq: So, the updated file(s) on the disk automatically read into memory
as they are Memory mapped?

Yes.


Not quite sure why you care, curiosity or is there something you're
trying to accomplish?
This is out of curiosity, so I can get a better understanding of Solr's 
memory usage (heap & mmap).




The contents of the index's segment files are read into virtual memory
by MMapDirectory as needed to satisfy queries. Which is the point of
autowarming BTW.


Ok. But I have noticed that even "tlog" files are memory mapped (output 
from "lsof"), in addition to all the other files under the "data" directory.




commit in the following is either hard commit with openSearcher=true
or soft commit.


Hard commit is set up with openSearcher=false and soft commit is set up for 
every 2 min.
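
(For concreteness, that corresponds to roughly this in solrconfig.xml; the
soft-commit interval matches the 2 min I mentioned, while the hard-commit
interval below is only an example value:)

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- example: hard commit every 60s -->
      <openSearcher>false</openSearcher>  <!-- flush segments, but no new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>120000</maxTime>           <!-- soft commit every 2 min: opens a new searcher -->
    </autoSoftCommit>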




Segments that have been created (closed actually) after the last
commit  are _not_ read at all until the next searcher is opened via
another commit. Nothing is done with these new segments before the new
searcher is opened which you control with your commit strategy.


I see. Thanks for the insight.



Best,
Erick

On Mon, Oct 26, 2015 at 9:07 PM, Rallavagu  wrote:

Erick, Thanks for clarification. I was under impression that MMapDirectory
is being used for both read/write operations. Now, I see how it is being
used. Essentially, it only reads from MMapDirectory and writes directly to
disk. So, the updated file(s) on the disk automatically read into memory as
they are Memory mapped?

On 10/26/15 8:43 PM, Erick Erickson wrote:


You're really looking at this backwards. The MMapDirectory stuff is
for Solr (Lucene, really) _reading_ data from closed segment files.

When indexing, there are internal memory structures that are flushed
to disk on commit, but these have nothing to do with MMapDirectory.

So the question is really moot ;)

Best,
Erick

On Mon, Oct 26, 2015 at 5:47 PM, Rallavagu  wrote:


All,

Are memory mapped files (mmap) flushed to disk during "hard commit"? If
yes,
should we disable OS level (Linux for example) memory mapped flush?

I am referring to following for mmap files for Lucene/Solr

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Linux level flush

http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/

Solr's hard and soft commit


https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks in advance.


Using books.json in solr

2015-10-27 Thread Salonee Rege
Hello,
  We are trying to query the books.json that we have posted to Solr. But
when we try to specifically query it on genre it does not return a complete
JSON with valid key-value pairs. Kindly help.

*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*


books.json
Description: application/json


Re: Using books.json in solr

2015-10-27 Thread Erik Hatcher
Salonee - attachments generally do not pass through the solr-user list.

If you added a field, did you then *reindex* the data?   And what was the query 
you gave?   Just q=fantasy may not be good enough (see debug=true output and 
what the query was parsed to, in particular what fields were queried).  Try 
q=genre_s:fantasy (after reindexing)
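
For example (the core name below assumes the stock example setup, so adjust it to yours):

    curl 'http://localhost:8983/solr/gettingstarted/select?q=genre_s:fantasy&wt=json&indent=true&debug=true'

The "debug" section of that response shows the parsed query and therefore which
fields were actually searched.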

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Oct 27, 2015, at 2:35 PM, Salonee Rege  wrote:
> 
>  {
> "id" : "978-1423103349",
> "cat" : ["book","paperback"],
> "name" : "The Sea of Monsters",
> "author" : "Rick Riordan",
> "series_t" : "Percy Jackson and the Olympians",
> "sequence_i" : 2,
> "genre_s" : "fantasy",
> "inStock" : true,
> "price" : 6.49,
> "pages_i" : 304
>   }
> 
> We have uploaded the books.json from exampledocs in the example folder of 
> solr. In schema.xml we have included the following field:
> 
> 
> But when we try to query the JSON on genre field by querying "fantasy" we get 
> the following result. PFA the screenshot of our result.
> 
> Salonee Rege
> USC Viterbi School of Engineering
> University of Southern California
> Master of Computer Science - Student
> Computer Science - B.E
> salon...@usc.edu   || 619-709-6756
> 
> 
> 
> On Tue, Oct 27, 2015 at 11:22 AM, Salonee Rege  > wrote:
> Hello
> 
> Salonee Rege
> USC Viterbi School of Engineering
> University of Southern California
> Master of Computer Science - Student
> Computer Science - B.E
> salon...@usc.edu   || 619-709-6756 
> 
> 
> 
> On Tue, Oct 27, 2015 at 11:06 AM, Sameer Maggon  > wrote:
> Hi Salonee,
> 
> I believe you missed adding the query screenshot?
> 
> Sameer.
> 
> On Tue, Oct 27, 2015 at 10:57 AM, Salonee Rege  > wrote:
> 
> > Please find attached the following books.json which is in the example-docs
> > file for your reference. And a screenshot of querying it on the field
> > fantasy for genre key.
> > Thanks for the help.
> >
> >
> > *Salonee Rege*
> > USC Viterbi School of Engineering
> > University of Southern California
> > Master of Computer Science - Student
> > Computer Science - B.E
> > salon...@usc.edu   *||* *619-709-6756 
> >  <619-709-6756 >*
> >
> >
> >
> > On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu  > > wrote:
> >
> >> Could you please share your query? You could use "wt=json" query
> >> parameter to receive JSON formatted results if that is what you are looking
> >> for.
> >>
> >> On 10/27/15 10:44 AM, Salonee Rege wrote:
> >>
> >>> Hello,
> >>>We are trying to query the books.json that we have posted to solr.
> >>> But when we try to specfically query it on genre it does not return a
> >>> complete json with valid key-value pairs. Kindly help.
> >>>
> >>> /Salonee Rege/
> >>> USC Viterbi School of Engineering
> >>> University of Southern California
> >>> Master of Computer Science - Student
> >>> Computer Science - B.E
> >>> salon...@usc.edu   >>> > _||_ _619-709-6756 _
> >>> _
> >>> _
> >>> _
> >>> _
> >>>
> >>
> >
> 
> 
> --
> *Sameer Maggon*
> Measured Search
> c: 310.344.7266 
> www.measuredsearch.com  
> >
> 
> 



Re: Using books.json in solr

2015-10-27 Thread Sameer Maggon
Hi Salonee, can you post the query and your schema file too?

Thanks,
-- 
*Sameer Maggon*
www.measuredsearch.com 
Solr Cloud Hosting | Managed Services | Solr Consulting


On Tue, Oct 27, 2015 at 10:44 AM, Salonee Rege  wrote:

> Hello,
>   We are trying to query the books.json that we have posted to solr. But
> when we try to specfically query it on genre it does not return a complete
> json with valid key-value pairs. Kindly help.
>
> *Salonee Rege*
> USC Viterbi School of Engineering
> University of Southern California
> Master of Computer Science - Student
> Computer Science - B.E
> salon...@usc.edu  *||* *619-709-6756 <619-709-6756>*
>
>
>


Re: Using books.json in solr

2015-10-27 Thread Rallavagu
Could you please share your query? You could use "wt=json" query 
parameter to receive JSON formatted results if that is what you are 
looking for.


On 10/27/15 10:44 AM, Salonee Rege wrote:

Hello,
   We are trying to query the books.json that we have posted to solr.
But when we try to specfically query it on genre it does not return a
complete json with valid key-value pairs. Kindly help.

/Salonee Rege/
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  _||_ _619-709-6756_
_
_
_
_


Re: Nested document limitation

2015-10-27 Thread Mikhail Khludnev
On Tue, Oct 27, 2015 at 11:20 AM, Midas A  wrote:

> Hi all,
>
> Is there any limitation of nested document in solr .
>

there are plenty of them

>
> Can we have any DIH(data  import handler ) plugin for solr .
>

note the child="true" entity option described at
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
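
A bare-bones data-config.xml sketch of that option (table and column names are
invented purely for illustration):

    <document>
      <entity name="parent" query="SELECT id, name FROM parents">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
        <!-- child="true" turns these rows into nested child documents -->
        <entity name="note" child="true"
                query="SELECT note_id, text FROM notes WHERE parent_id='${parent.id}'">
          <field column="note_id" name="id"/>
          <field column="text" name="note_t"/>
        </entity>
      </entity>
    </document>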

>
>
> Regards,
> Abhishek
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Using books.json in solr

2015-10-27 Thread Salonee Rege
 {
"id" : "978-1423103349",
"cat" : ["book","paperback"],
"name" : "The Sea of Monsters",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 2,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 6.49,
"pages_i" : 304
  }

We have uploaded the books.json from exampledocs in the example folder of
solr. In schema.xml we have included the following field:


But when we try to query the JSON on genre field by querying "fantasy" we
get the following result. PFA the screenshot of our result.

*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*



On Tue, Oct 27, 2015 at 11:22 AM, Salonee Rege  wrote:

> Hello
>
> *Salonee Rege*
> USC Viterbi School of Engineering
> University of Southern California
> Master of Computer Science - Student
> Computer Science - B.E
> salon...@usc.edu  *||* *619-709-6756 <619-709-6756>*
>
>
>
> On Tue, Oct 27, 2015 at 11:06 AM, Sameer Maggon  > wrote:
>
>> Hi Salonee,
>>
>> I believe you missed adding the query screenshot?
>>
>> Sameer.
>>
>> On Tue, Oct 27, 2015 at 10:57 AM, Salonee Rege  wrote:
>>
>> > Please find attached the following books.json which is in the
>> example-docs
>> > file for your reference. And a screenshot of querying it on the field
>> > fantasy for genre key.
>> > Thanks for the help.
>> >
>> >
>> > *Salonee Rege*
>> > USC Viterbi School of Engineering
>> > University of Southern California
>> > Master of Computer Science - Student
>> > Computer Science - B.E
>> > salon...@usc.edu  *||* *619-709-6756 <619-709-6756>*
>> >
>> >
>> >
>> > On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu 
>> wrote:
>> >
>> >> Could you please share your query? You could use "wt=json" query
>> >> parameter to receive JSON formatted results if that is what you are
>> looking
>> >> for.
>> >>
>> >> On 10/27/15 10:44 AM, Salonee Rege wrote:
>> >>
>> >>> Hello,
>> >>>We are trying to query the books.json that we have posted to solr.
>> >>> But when we try to specfically query it on genre it does not return a
>> >>> complete json with valid key-value pairs. Kindly help.
>> >>>
>> >>> /Salonee Rege/
>> >>> USC Viterbi School of Engineering
>> >>> University of Southern California
>> >>> Master of Computer Science - Student
>> >>> Computer Science - B.E
>> >>> salon...@usc.edu  _||_ _619-709-6756_
>> >>> _
>> >>> _
>> >>> _
>> >>> _
>> >>>
>> >>
>> >
>>
>>
>> --
>> *Sameer Maggon*
>> Measured Search
>> c: 310.344.7266
>> www.measuredsearch.com 
>>
>
>


Re: Using books.json in solr

2015-10-27 Thread Sameer Maggon
Hi Salonee,

I believe you missed adding the query screenshot?

Sameer.

On Tue, Oct 27, 2015 at 10:57 AM, Salonee Rege  wrote:

> Please find attached the following books.json which is in the example-docs
> file for your reference. And a screenshot of querying it on the field
> fantasy for genre key.
> Thanks for the help.
>
>
> *Salonee Rege*
> USC Viterbi School of Engineering
> University of Southern California
> Master of Computer Science - Student
> Computer Science - B.E
> salon...@usc.edu  *||* *619-709-6756 <619-709-6756>*
>
>
>
> On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu  wrote:
>
>> Could you please share your query? You could use "wt=json" query
>> parameter to receive JSON formatted results if that is what you are looking
>> for.
>>
>> On 10/27/15 10:44 AM, Salonee Rege wrote:
>>
>>> Hello,
>>>We are trying to query the books.json that we have posted to solr.
>>> But when we try to specfically query it on genre it does not return a
>>> complete json with valid key-value pairs. Kindly help.
>>>
>>> /Salonee Rege/
>>> USC Viterbi School of Engineering
>>> University of Southern California
>>> Master of Computer Science - Student
>>> Computer Science - B.E
>>> salon...@usc.edu  _||_ _619-709-6756_
>>> _
>>> _
>>> _
>>> _
>>>
>>
>


-- 
*Sameer Maggon*
Measured Search
c: 310.344.7266
www.measuredsearch.com 


Re: Using books.json in solr

2015-10-27 Thread Salonee Rege
Please find attached the following books.json which is in the example-docs
file for your reference. And a screenshot of querying it on the field
fantasy for genre key.
Thanks for the help.


*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*



On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu  wrote:

> Could you please share your query? You could use "wt=json" query parameter
> to receive JSON formatted results if that is what you are looking for.
>
> On 10/27/15 10:44 AM, Salonee Rege wrote:
>
>> Hello,
>>We are trying to query the books.json that we have posted to solr.
>> But when we try to specfically query it on genre it does not return a
>> complete json with valid key-value pairs. Kindly help.
>>
>> /Salonee Rege/
>> USC Viterbi School of Engineering
>> University of Southern California
>> Master of Computer Science - Student
>> Computer Science - B.E
>> salon...@usc.edu  _||_ _619-709-6756_
>> _
>> _
>> _
>> _
>>
>


books.json
Description: application/json


Re: Using books.json in solr

2015-10-27 Thread Salonee Rege
Hello

*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*



On Tue, Oct 27, 2015 at 11:06 AM, Sameer Maggon 
wrote:

> Hi Salonee,
>
> I believe you missed adding the query screenshot?
>
> Sameer.
>
> On Tue, Oct 27, 2015 at 10:57 AM, Salonee Rege  wrote:
>
> > Please find attached the following books.json which is in the
> example-docs
> > file for your reference. And a screenshot of querying it on the field
> > fantasy for genre key.
> > Thanks for the help.
> >
> >
> > *Salonee Rege*
> > USC Viterbi School of Engineering
> > University of Southern California
> > Master of Computer Science - Student
> > Computer Science - B.E
> > salon...@usc.edu  *||* *619-709-6756 <619-709-6756>*
> >
> >
> >
> > On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu  wrote:
> >
> >> Could you please share your query? You could use "wt=json" query
> >> parameter to receive JSON formatted results if that is what you are
> looking
> >> for.
> >>
> >> On 10/27/15 10:44 AM, Salonee Rege wrote:
> >>
> >>> Hello,
> >>>We are trying to query the books.json that we have posted to solr.
> >>> But when we try to specfically query it on genre it does not return a
> >>> complete json with valid key-value pairs. Kindly help.
> >>>
> >>> /Salonee Rege/
> >>> USC Viterbi School of Engineering
> >>> University of Southern California
> >>> Master of Computer Science - Student
> >>> Computer Science - B.E
> >>> salon...@usc.edu  _||_ _619-709-6756_
> >>> _
> >>> _
> >>> _
> >>> _
> >>>
> >>
> >
>
>
> --
> *Sameer Maggon*
> Measured Search
> c: 310.344.7266
> www.measuredsearch.com 
>


Re: Using books.json in solr

2015-10-27 Thread Salonee Rege
Thanks. We tried genre_s:fantasy and it worked. Thank you for the help. We
were stuck on this for a long time. Another question: when we query the
value "Rick" for the author, it returns the JSON without our having to specify
author:Rick. Is there any specific reason for this? Will we have to specify
the field like this for all field queries?
 {
 "id" : "978-1423103349",
> "cat" : ["book","paperback"],
> "name" : "The Sea of Monsters",
> "author" : "Rick Riordan",
> "series_t" : "Percy Jackson and the Olympians",
> "sequence_i" : 2,
> "genre_s" : "fantasy",
> "inStock" : true,
> "price" : 6.49,
> "pages_i" : 304
  }

Thanks and Regards ,
Salonee Rege

*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*



On Tue, Oct 27, 2015 at 11:40 AM, Erik Hatcher 
wrote:

> Salonee - attachments generally do not pass through the solr-user list.
>
> If you added a field, did you then *reindex* the data?   And what was the
> query you gave?   Just q=fantasy may not be good enough (see debug=true
> output and what the query was parsed to, in particular what fields were
> queried).  Try q=genre_s:fantasy (after reindexing)
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com 
>
>
>
>
> > On Oct 27, 2015, at 2:35 PM, Salonee Rege  wrote:
> >
> >  {
> > "id" : "978-1423103349",
> > "cat" : ["book","paperback"],
> > "name" : "The Sea of Monsters",
> > "author" : "Rick Riordan",
> > "series_t" : "Percy Jackson and the Olympians",
> > "sequence_i" : 2,
> > "genre_s" : "fantasy",
> > "inStock" : true,
> > "price" : 6.49,
> > "pages_i" : 304
> >   }
> >
> > We have uploaded the books.json from exampledocs in the example folder
> of solr. In schema.xml we have included the following field:
> > 
> >
> > But when we try to query the JSON on genre field by querying "fantasy"
> we get the following result. PFA the screenshot of our result.
> >
> > Salonee Rege
> > USC Viterbi School of Engineering
> > University of Southern California
> > Master of Computer Science - Student
> > Computer Science - B.E
> > salon...@usc.edu   || 619-709-6756
> >
> >
> >
> > On Tue, Oct 27, 2015 at 11:22 AM, Salonee Rege  > wrote:
> > Hello
> >
> > Salonee Rege
> > USC Viterbi School of Engineering
> > University of Southern California
> > Master of Computer Science - Student
> > Computer Science - B.E
> > salon...@usc.edu   || 619-709-6756  619-709-6756>
> >
> >
> >
> > On Tue, Oct 27, 2015 at 11:06 AM, Sameer Maggon <
> sam...@measuredsearch.com > wrote:
> > Hi Salonee,
> >
> > I believe you missed adding the query screenshot?
> >
> > Sameer.
> >
> > On Tue, Oct 27, 2015 at 10:57 AM, Salonee Rege  > wrote:
> >
> > > Please find attached the following books.json which is in the
> example-docs
> > > file for your reference. And a screenshot of querying it on the field
> > > fantasy for genre key.
> > > Thanks for the help.
> > >
> > >
> > > *Salonee Rege*
> > > USC Viterbi School of Engineering
> > > University of Southern California
> > > Master of Computer Science - Student
> > > Computer Science - B.E
> > > salon...@usc.edu   *||* *619-709-6756  619-709-6756> <619-709-6756 >*
> > >
> > >
> > >
> > > On Tue, Oct 27, 2015 at 10:47 AM, Rallavagu  > wrote:
> > >
> > >> Could you please share your query? You could use "wt=json" query
> > >> parameter to receive JSON formatted results if that is what you are
> looking
> > >> for.
> > >>
> > >> On 10/27/15 10:44 AM, Salonee Rege wrote:
> > >>
> > >>> Hello,
> > >>>We are trying to query the books.json that we have posted to solr.
> > >>> But when we try to specfically query it on genre it does not return a
> > >>> complete json with valid key-value pairs. Kindly help.
> > >>>
> > >>> /Salonee Rege/
> > >>> USC Viterbi School of Engineering
> > >>> University of Southern California
> > >>> Master of Computer Science - Student
> > >>> Computer Science - B.E
> > >>> salon...@usc.edu   > _||_ _619-709-6756 _
> > >>> _
> > >>> _
> > >>> _
> > >>> _
> > >>>
> > >>
> > >
> >
> >
> > --
> > *Sameer Maggon*
> > Measured Search
> > c: 310.344.7266 
> > www.measuredsearch.com  <
> http://measuredsearch.com >
> >
> >
>
>


Performance degradation with two collections on the same Solr instance

2015-10-27 Thread SolrUser1543
We have a large SolrCloud cluster, with one collection and no replicas.
Each machine has one Solr core.
Recently we decided to add a new collection, based on the same schema, so now
each Solr instance has two cores.
The first collection has a very big index, but the new one has only several
hundred documents.

The day after we did it, we experienced very severe performance degradation,
with long query times and server unavailability.

The JVM was configured with a 20GB heap, and we did not change it when adding
the new collection.

The question is, how does Solr manage its resources when it has more than one
core? Does it need twice the memory? Or might this degradation be a
coincidence?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-degradation-with-two-collection-on-same-sole-instance-tp4236774.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr collection alias - how rank is affected

2015-10-27 Thread SolrUser1543
How is document ranking affected when using a collection alias for
searching on two collections with the same schema? Is it affected at all?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4236776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr hard commit

2015-10-27 Thread Erick Erickson
Hmm, the tlog. AFAIK, that's because of
"real time get" (Yonik/Mark/Whoever, please
correct if wrong).

This is a feature where, when fetching a document as
the result of a search, it ensures that you get the most
recent version whether it's been committed or not.

The tlog isn't relevant for searching however.

Best,
Erick

On Tue, Oct 27, 2015 at 8:56 AM, Rallavagu  wrote:
>
>
> On 10/27/15 8:43 AM, Erick Erickson wrote:
>>
>> bq: So, the updated file(s) on the disk automatically read into memory
>> as they are Memory mapped?
>
> Yes.
>>
>>
>> Not quite sure why you care, curiosity or is there something you're
>> trying to accomplish?
>
> This is out of curiosity. So, I can get better understanding of Solr's
> memory usage (heap & mmap).
>
>>
>> The contents of the index's segment files are read into virtual memory
>> by MMapDirectory as needed to satisfy queries. Which is the point of
>> autowarming BTW.
>
>
> Ok. But, I have noticed that even "tlog" files are memory mapped (output
> from "lsof") in addition to all other files under "data" directory.
>
>>
>> commit in the following is either hard commit with openSearcher=true
>> or soft commit.
>
>
> Hard commit is setup with openSearcher=false and softCommit is setup for
> every 2 min.
>
>>
>> Segments that have been created (closed actually) after the last
>> commit  are _not_ read at all until the next searcher is opened via
>> another commit. Nothing is done with these new segments before the new
>> searcher is opened which you control with your commit strategy.
>
>
> I see. Thanks for the insight.
>
>>
>> Best,
>> Erick
>>
>> On Mon, Oct 26, 2015 at 9:07 PM, Rallavagu  wrote:
>>>
>>> Erick, Thanks for clarification. I was under impression that
>>> MMapDirectory
>>> is being used for both read/write operations. Now, I see how it is being
>>> used. Essentially, it only reads from MMapDirectory and writes directly
>>> to
>>> disk. So, the updated file(s) on the disk automatically read into memory
>>> as
>>> they are Memory mapped?
>>>
>>> On 10/26/15 8:43 PM, Erick Erickson wrote:


 You're really looking at this backwards. The MMapDirectory stuff is
 for Solr (Lucene, really) _reading_ data from closed segment files.

 When indexing, there are internal memory structures that are flushed
 to disk on commit, but these have nothing to do with MMapDirectory.

 So the question is really moot ;)

 Best,
 Erick

 On Mon, Oct 26, 2015 at 5:47 PM, Rallavagu  wrote:
>
>
> All,
>
> Are memory mapped files (mmap) flushed to disk during "hard commit"? If
> yes,
> should we disable OS level (Linux for example) memory mapped flush?
>
> I am referring to following for mmap files for Lucene/Solr
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Linux level flush
>
>
> http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/
>
> Solr's hard and soft commit
>
>
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks in advance.


Re: Solr collection alias - how rank is affected

2015-10-27 Thread Scott Stults
Collection statistics aren't shared between collections, so there's going
to be a difference. However, if the distribution is fairly random you won't
notice.

On Tue, Oct 27, 2015 at 3:21 PM, SolrUser1543  wrote:

> How is document ranking is affected when using a collection alias for
> searching on two collections with same schema ? is it affected at all  ?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4236776.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Solr MiniSolrCloudCluster Issue

2015-10-27 Thread Madhire, Naveen
Hi,

I am using the MiniSolrCloudCluster class to write unit test cases for testing 
the Solr application.

It looks like there is an HttpClient library mismatch with the Solr version, and I 
am getting the error below:

java.lang.VerifyError: Bad return type
Exception Details:
  Location:

org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
 @57: areturn
  Reason:
Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, 
stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttp

I am using Solr 5.3.1
I see a similar issue here, https://issues.apache.org/jira/browse/SOLR-7948, but 
there doesn't seem to be a workaround for this.

Can anyone please tell me how to fix this issue?


Below is code snippet,


// tempFolder is a JUnit TemporaryFolder @Rule
File workingDir = tempFolder.newFolder();
File solrXml = new File("src/test/resources/solr.xml");

MiniSolrCloudCluster cluster =
    new MiniSolrCloudCluster(1, null, workingDir, solrXml, null, null);


Thanks.


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


Using Nutch Segments

2015-10-27 Thread Salonee Rege
 We have crawled data using Nutch and now we want to post the Nutch
segments to Solr to index them. We are following this link:
https://wiki.apache.org/nutch/bin/nutch%20solrindex. But how do we check what
to query? We are posting the JSON of the Nutch segments directly to Solr.
Kindly help.

Thanks and Regards,
*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu  *||* *619-709-6756*


Re: Solr hard commit

2015-10-27 Thread Erick Erickson
yep. Although I won't guarantee that the tlog won't be MMapped
even if this is turned off.

On Tue, Oct 27, 2015 at 4:21 PM, Rallavagu  wrote:
> Is it related to this config?
>
> 
> 

Re: Two seperate intance of Solr on the same machine

2015-10-27 Thread Pushkar Raste
add "-Dsolr.log=" to your command line

On 27 October 2015 at 08:13, Steven White  wrote:

> How do I specify a different log directory by editing "log4j.properties"?
>
> Steve
>
> On Mon, Oct 26, 2015 at 9:08 PM, Pushkar Raste 
> wrote:
>
> > It depends on your case. If you don't mind logs from 3 different
> instances
> > inter-mingled with each other you should be fine.
> > You add "-Dsolr.log=" to make logs to go different
> > directories. If you want logs to go to same directory but different files
> > try updating log4j.properties.
> >
> > On 26 October 2015 at 13:33, Steven White  wrote:
> >
> > > Hi,
> > >
> > > For reasons I have no control over, I'm required to run 2 (maybe more)
> > > instances of Solr on the same server (Windows and Linux).  To be more
> > > specific, I will need to start each instance like so:
> > >
> > >   > solr\bin start -p 8983 -s ..\instance_one
> > >   > solr\bin start -p 8984 -s ..\instance_two
> > >   > solr\bin start -p 8985 -s ..\instance_three
> > >
> > > Each of those instances is a stand alone Solr (no ZK here at all).
> > >
> > > I have tested this over and over and did not see any issue.  However, I
> > did
> > > notice that each instance is writing to the same solr\server\logs\
> files
> > > (will this be an issue?!!)
> > >
> > > Is the above something I should avoid?  If so, why?
> > >
> > > Thanks in advanced !!
> > >
> > > Steve
> > >
> >
>


Re: Solr hard commit

2015-10-27 Thread Rallavagu

Is it related to this config?



Re: replica recovery

2015-10-27 Thread Brian Scholl
Both are excellent points and I will look to implement them. In particular, I 
wonder whether a sizable increase to the numRecordsToKeep param could solve this 
problem entirely.

Thanks!



> On Oct 27, 2015, at 20:50, Jeff Wartes  wrote:
> 
> 
> On the face of it, your scenario seems plausible. I can offer two pieces
> of info that may or may not help you:
> 
> 1. A write request to Solr will not be acknowledged until an attempt has
> been made to write to all relevant replicas. So, B won’t ever be missing
> updates that were applied to A, unless communication with B was disrupted
> somehow at the time of the update request. You can add a min_rf param to
> your write request, in which case the response will tell you how many
> replicas received the update, but it’s still up to your indexer client to
> decide what to do if that’s less than your replication factor.
> 
> See 
> https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+
> Tolerance for more info.
> 
> 2. There are two forms of replication. The usual thing is for the leader
> for each shard to write an update to all replicas before acknowledging the
> write itself, as above. If a replica is less than N docs behind the
> leader, the leader can replay those docs to the replica from its
> transaction log. If a replica is more than N docs behind though, it falls
> back to the replication handler recovery mode you mention, and attempts to
> re-sync the whole shard from the leader.
> The default N for this is 100, which is pretty low for a high-update-rate
> index. It can be changed by increasing the size of the transaction log,
> (via numRecordsToKeep) but be aware that a large transaction log size can
> delay node restart.
> 
> See 
> https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConf
> ig#UpdateHandlersinSolrConfig-TransactionLog for more info.
> 
> 
> Hope some of that helps, I don’t know a way to say
> delete-first-on-recovery.
> 
> 
> 
> On 10/27/15, 5:21 PM, "Brian Scholl"  wrote:
> 
>> Whoops, in the description of my setup that should say 2 replicas per
>> shard.  Every server has a replica.
>> 
>> 
>>> On Oct 27, 2015, at 20:16, Brian Scholl  wrote:
>>> 
>>> Hello,
>>> 
>>> I am experiencing a failure mode where a replica is unable to recover
>>> and it will try to do so forever.  In writing this email I want to make
>>> sure that I haven't missed anything obvious or missed a configurable
>>> option that could help.  If something about this looks funny, I would
>>> really like to hear from you.
>>> 
>>> Relevant details:
>>> - solr 5.3.1
>>> - java 1.8
>>> - ubuntu linux 14.04 lts
>>> - the cluster is composed of 1 SolrCloud collection with 100 shards
>>> backed by a 3 node zookeeper ensemble
>>> - there are 200 solr servers in the cluster, 1 replica per shard
>>> - a shard replica is larger than 50% of the available disk
>>> - ~40M docs added per day, total indexing time is 8-10 hours spread
>>> over the day
>>> - autoCommit is set to 15s
>>> - softCommit is not defined
>>> 
>>> I think I have traced the failure to the following set of events but
>>> would appreciate feedback:
>>> 
>>> 1. new documents are being indexed
>>> 2. the leader of a shard, server A, fails for any reason (java crashes,
>>> times out with zookeeper, etc)
>>> 3. zookeeper promotes the other replica of the shard, server B, to the
>>> leader position and indexing resumes
>>> 4. server A comes back online (typically 10s of seconds later) and
>>> reports to zookeeper
>>> 5. zookeeper tells server A that it is no longer the leader and to sync
>>> with server B
>>> 6. server A checks with server B but finds that server B's index
>>> version is different from its own
>>> 7. server A begins replicating a new copy of the index from server B
>>> using the (legacy?) replication handler
>>> 8. the original index on server A was not deleted so it runs out of
>>> disk space mid-replication
>>> 9. server A throws an error, deletes the partially replicated index,
>>> and then tries to replicate again
>>> 
>>> At this point I think steps 6  => 9 will loop forever
>>> 
>>> If the actual errors from solr.log are useful let me know, not doing
>>> that now for brevity since this email is already pretty long.  In a
>>> nutshell and in order, on server A I can find the error that took it
>>> down, the post-recovery instruction from ZK to unregister itself as a
>>> leader, the corrupt index error message, and then the (start - whoops,
>>> out of disk- stop) loop of the replication messages.
>>> 
>>> I first want to ask if what I described is possible or did I get lost
>>> somewhere along the way reading the docs?  Is there any reason to think
>>> that solr should not do this?
>>> 
>>> If my version of events is feasible I have a few other questions:
>>> 
>>> 1. What happens to the docs that were indexed on server A but never
>>> replicated to server B before the failure?  

replica recovery

2015-10-27 Thread Brian Scholl
Hello,

I am experiencing a failure mode where a replica is unable to recover and it 
will try to do so forever.  In writing this email I want to make sure that I 
haven't missed anything obvious or missed a configurable option that could 
help.  If something about this looks funny, I would really like to hear from 
you.

Relevant details:
- solr 5.3.1
- java 1.8
- ubuntu linux 14.04 lts
- the cluster is composed of 1 SolrCloud collection with 100 shards backed by a 
3 node zookeeper ensemble
- there are 200 solr servers in the cluster, 1 replica per shard
- a shard replica is larger than 50% of the available disk
- ~40M docs added per day, total indexing time is 8-10 hours spread over the day
- autoCommit is set to 15s
- softCommit is not defined

I think I have traced the failure to the following set of events but would 
appreciate feedback:

1. new documents are being indexed
2. the leader of a shard, server A, fails for any reason (java crashes, times 
out with zookeeper, etc)
3. zookeeper promotes the other replica of the shard, server B, to the leader 
position and indexing resumes
4. server A comes back online (typically 10s of seconds later) and reports to 
zookeeper
5. zookeeper tells server A that it is no longer the leader and to sync with 
server B
6. server A checks with server B but finds that server B's index version is 
different from its own
7. server A begins replicating a new copy of the index from server B using the 
(legacy?) replication handler
8. the original index on server A was not deleted so it runs out of disk space 
mid-replication
9. server A throws an error, deletes the partially replicated index, and then 
tries to replicate again

At this point I think steps 6  => 9 will loop forever

If the actual errors from solr.log are useful let me know, not doing that now 
for brevity since this email is already pretty long.  In a nutshell and in 
order, on server A I can find the error that took it down, the post-recovery 
instruction from ZK to unregister itself as a leader, the corrupt index error 
message, and then the (start - whoops, out of disk- stop) loop of the 
replication messages.

I first want to ask if what I described is possible or did I get lost somewhere 
along the way reading the docs?  Is there any reason to think that solr should 
not do this?

If my version of events is feasible I have a few other questions:

1. What happens to the docs that were indexed on server A but never replicated 
to server B before the failure?  Assuming that the replica on server A were to 
complete the recovery process would those docs appear in the index or are they 
gone for good?

2. I am guessing that the corrupt replica on server A is not deleted because it 
is still viable, if server B had a catastrophic failure you could pick up the 
pieces from server A.  If so is this a configurable option somewhere?  I'd 
rather take my chances on server B going down before replication finishes than 
be stuck in this state and have to manually intervene.  Besides, I have 
disaster recovery backups for exactly this situation.

3. Is there anything I can do to prevent this type of failure?  It seems to me 
that if server B gets even 1 new document as a leader the shard will enter this 
state.  My only thought right now is to try to stop sending documents for 
indexing the instant a leader goes down but on the surface this solution sounds 
tough to implement perfectly (and it would have to be perfect).

If you got this far thanks for sticking with me.

Cheers,
Brian



Re: replica recovery

2015-10-27 Thread Erick Erickson
Brian:

Two things come to mind here:

1> Even a partial index is better than none. Let's say we have a
leader and follower. Follower goes offline and thus out of date.
Follower comes back up and sees it needs to replicate and deletes the
index as the first step. At this very instant someone throws a glass
of water into the guts of the leader and the disk head dives into the
disk and totally destroys it. Now you don't even have a partial index
on the follower you can use to limp along until you can re-index
anything that might have been missed.

2> Yeah, you say, that's really artificial... but deleting the index
first in order to not run out of disk space still leaves you
vulnerable to the situation where background merging merges to a
single segment, thus requiring at least as much free space on the disk
as your index occupies.

So I'm not sure this does anyone any favors when there is not enough
disk space to replicate in the scenario you describe.

I'm trending negative on this since <1> is not actually that
far-fetched and <2> will bite a person in this situation sooner or
later anyway. Having the old index still around until a _successful_
replication isn't a bad thing...

FWIW,
Erick

On Tue, Oct 27, 2015 at 6:02 PM, Brian Scholl  wrote:
> Both are excellent points and I will look to implement them.  Particularly I 
> wonder if a respectable increase to the numRecordsToKeep param could solve 
> this problem entirely.
>
> Thanks!
>
>
>
>> On Oct 27, 2015, at 20:50, Jeff Wartes  wrote:
>>
>>
>> On the face of it, your scenario seems plausible. I can offer two pieces
>> of info that may or may not help you:
>>
>> 1. A write request to Solr will not be acknowledged until an attempt has
>> been made to write to all relevant replicas. So, B won’t ever be missing
>> updates that were applied to A, unless communication with B was disrupted
>> somehow at the time of the update request. You can add a min_rf param to
>> your write request, in which case the response will tell you how many
>> replicas received the update, but it’s still up to your indexer client to
>> decide what to do if that’s less than your replication factor.
>>
>> See
>> https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+
>> Tolerance for more info.
>>
>> 2. There are two forms of replication. The usual thing is for the leader
>> for each shard to write an update to all replicas before acknowledging the
>> write itself, as above. If a replica is less than N docs behind the
>> leader, the leader can replay those docs to the replica from its
>> transaction log. If a replica is more than N docs behind though, it falls
>> back to the replication handler recovery mode you mention, and attempts to
>> re-sync the whole shard from the leader.
>> The default N for this is 100, which is pretty low for a high-update-rate
>> index. It can be changed by increasing the size of the transaction log,
>> (via numRecordsToKeep) but be aware that a large transaction log size can
>> delay node restart.
>>
>> See
>> https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConf
>> ig#UpdateHandlersinSolrConfig-TransactionLog for more info.
>>
>>
>> Hope some of that helps, I don’t know a way to say
>> delete-first-on-recovery.
>>
>>
>>
>> On 10/27/15, 5:21 PM, "Brian Scholl"  wrote:
>>
>>> Whoops, in the description of my setup that should say 2 replicas per
>>> shard.  Every server has a replica.
>>>
>>>
 On Oct 27, 2015, at 20:16, Brian Scholl  wrote:

 Hello,

 I am experiencing a failure mode where a replica is unable to recover
 and it will try to do so forever.  In writing this email I want to make
 sure that I haven't missed anything obvious or missed a configurable
 option that could help.  If something about this looks funny, I would
 really like to hear from you.

 Relevant details:
 - solr 5.3.1
 - java 1.8
 - ubuntu linux 14.04 lts
 - the cluster is composed of 1 SolrCloud collection with 100 shards
 backed by a 3 node zookeeper ensemble
 - there are 200 solr servers in the cluster, 1 replica per shard
 - a shard replica is larger than 50% of the available disk
 - ~40M docs added per day, total indexing time is 8-10 hours spread
 over the day
 - autoCommit is set to 15s
 - softCommit is not defined

 I think I have traced the failure to the following set of events but
 would appreciate feedback:

 1. new documents are being indexed
 2. the leader of a shard, server A, fails for any reason (java crashes,
 times out with zookeeper, etc)
 3. zookeeper promotes the other replica of the shard, server B, to the
 leader position and indexing resumes
 4. server A comes back online (typically 10s of seconds later) and
 reports to zookeeper
 5. zookeeper tells server A that it is no longer the 

RE: Solr collection alias - how rank is affected

2015-10-27 Thread Markus Jelsma
Hello - regarding a fairly random/smooth distribution, you will notice it for 
sure. A solution there is to use distributed collection statistics. On top of 
that, you might want to rely on docCount, not maxDoc, inside your similarity 
implementation, because docCount should be identical in both collections. 
maxDoc is not really deterministic, it seems, since identical replicas do not 
merge segments at the same time.
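
For reference, distributed collection statistics are switched on with a
statsCache entry in solrconfig.xml. As a sketch (the class lives in the
org.apache.solr.search.stats package; check the docs for your version for the
available implementations and their trade-offs):

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>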

Markus
 
 
-Original message-
> From:Scott Stults 
> Sent: Tuesday 27th October 2015 21:18
> To: solr-user@lucene.apache.org
> Subject: Re: Solr collection alias - how rank is affected
> 
> Collection statistics aren't shared between collections, so there's going
> to be a difference. However, if the distribution is fairly random you won't
> notice.
> 
> On Tue, Oct 27, 2015 at 3:21 PM, SolrUser1543  wrote:
> 
> > How is document ranking is affected when using a collection alias for
> > searching on two collections with same schema ? is it affected at all  ?
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4236776.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 
> 
> 
> -- 
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
> 


Re: replica recovery

2015-10-27 Thread Brian Scholl
Whoops, in the description of my setup that should say 2 replicas per shard.  
Every server has a replica.


> On Oct 27, 2015, at 20:16, Brian Scholl  wrote:
> 
> Hello,
> 
> I am experiencing a failure mode where a replica is unable to recover and it 
> will try to do so forever.  In writing this email I want to make sure that I 
> haven't missed anything obvious or missed a configurable option that could 
> help.  If something about this looks funny, I would really like to hear from 
> you.
> 
> Relevant details:
> - solr 5.3.1
> - java 1.8
> - ubuntu linux 14.04 lts
> - the cluster is composed of 1 SolrCloud collection with 100 shards backed by 
> a 3 node zookeeper ensemble
> - there are 200 solr servers in the cluster, 1 replica per shard
> - a shard replica is larger than 50% of the available disk
> - ~40M docs added per day, total indexing time is 8-10 hours spread over the 
> day
> - autoCommit is set to 15s
> - softCommit is not defined
>   
> I think I have traced the failure to the following set of events but would 
> appreciate feedback:
> 
> 1. new documents are being indexed
> 2. the leader of a shard, server A, fails for any reason (java crashes, times 
> out with zookeeper, etc)
> 3. zookeeper promotes the other replica of the shard, server B, to the leader 
> position and indexing resumes
> 4. server A comes back online (typically 10s of seconds later) and reports to 
> zookeeper
> 5. zookeeper tells server A that it is no longer the leader and to sync with 
> server B
> 6. server A checks with server B but finds that server B's index version is 
> different from its own
> 7. server A begins replicating a new copy of the index from server B using 
> the (legacy?) replication handler
> 8. the original index on server A was not deleted so it runs out of disk 
> space mid-replication
> 9. server A throws an error, deletes the partially replicated index, and then 
> tries to replicate again
> 
> At this point I think steps 6  => 9 will loop forever
> 
> If the actual errors from solr.log are useful let me know, not doing that now 
> for brevity since this email is already pretty long.  In a nutshell and in 
> order, on server A I can find the error that took it down, the post-recovery 
> instruction from ZK to unregister itself as a leader, the corrupt index error 
> message, and then the (start - whoops, out of disk- stop) loop of the 
> replication messages.
> 
> I first want to ask if what I described is possible or did I get lost 
> somewhere along the way reading the docs?  Is there any reason to think that 
> solr should not do this?
> 
> If my version of events is feasible I have a few other questions:
> 
> 1. What happens to the docs that were indexed on server A but never 
> replicated to server B before the failure?  Assuming that the replica on 
> server A were to complete the recovery process would those docs appear in the 
> index or are they gone for good?
> 
> 2. I am guessing that the corrupt replica on server A is not deleted because 
> it is still viable, if server B had a catastrophic failure you could pick up 
> the pieces from server A.  If so is this a configurable option somewhere?  
> I'd rather take my chances on server B going down before replication finishes 
> than be stuck in this state and have to manually intervene.  Besides, I have 
> disaster recovery backups for exactly this situation.
> 
> 3. Is there anything I can do to prevent this type of failure?  It seems to 
> me that if server B gets even 1 new document as a leader the shard will enter 
> this state.  My only thought right now is to try to stop sending documents 
> for indexing the instant a leader goes down but on the surface this solution 
> sounds tough to implement perfectly (and it would have to be perfect).
> 
> If you got this far thanks for sticking with me.
> 
> Cheers,
> Brian
> 



Re: replica recovery

2015-10-27 Thread Jeff Wartes

On the face of it, your scenario seems plausible. I can offer two pieces
of info that may or may not help you:

1. A write request to Solr will not be acknowledged until an attempt has
been made to write to all relevant replicas. So, B won’t ever be missing
updates that were applied to A, unless communication with B was disrupted
somehow at the time of the update request. You can add a min_rf param to
your write request, in which case the response will tell you how many
replicas received the update, but it’s still up to your indexer client to
decide what to do if that’s less than your replication factor.

See https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
for more info.
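
As a rough SolrJ sketch of that first point (the zk hosts, collection name,
document and rf threshold below are made up, and if I recall correctly the
achieved replication factor comes back as "rf" in the response header):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class MinRfExample {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setParam("min_rf", "2");  // ask Solr to report the achieved replication factor

    UpdateResponse rsp = req.process(client);
    Object rf = rsp.getResponseHeader().get("rf");
    if (rf != null && Integer.parseInt(rf.toString()) < 2) {
      // the indexer decides what to do here: retry, log, queue the doc for replay, etc.
      System.err.println("update only reached rf=" + rf);
    }
    client.close();
  }
}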

2. There are two forms of replication. The usual thing is for the leader
for each shard to write an update to all replicas before acknowledging the
write itself, as above. If a replica is less than N docs behind the
leader, the leader can replay those docs to the replica from its
transaction log. If a replica is more than N docs behind though, it falls
back to the replication handler recovery mode you mention, and attempts to
re-sync the whole shard from the leader.
The default N for this is 100, which is pretty low for a high-update-rate
index. It can be changed by increasing the size of the transaction log,
(via numRecordsToKeep) but be aware that a large transaction log size can
delay node restart.

See https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-TransactionLog
for more info.
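
For reference, numRecordsToKeep sits in the updateLog section of
solrconfig.xml; a sketch (the values below are only an example, not a
recommendation):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">10000</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>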


Hope some of that helps, I don’t know a way to say
delete-first-on-recovery.



On 10/27/15, 5:21 PM, "Brian Scholl"  wrote:

>Whoops, in the description of my setup that should say 2 replicas per
>shard.  Every server has a replica.
>
>
>> On Oct 27, 2015, at 20:16, Brian Scholl  wrote:
>> 
>> Hello,
>> 
>> I am experiencing a failure mode where a replica is unable to recover
>>and it will try to do so forever.  In writing this email I want to make
>>sure that I haven't missed anything obvious or missed a configurable
>>option that could help.  If something about this looks funny, I would
>>really like to hear from you.
>> 
>> Relevant details:
>> - solr 5.3.1
>> - java 1.8
>> - ubuntu linux 14.04 lts
>> - the cluster is composed of 1 SolrCloud collection with 100 shards
>>backed by a 3 node zookeeper ensemble
>> - there are 200 solr servers in the cluster, 1 replica per shard
>> - a shard replica is larger than 50% of the available disk
>> - ~40M docs added per day, total indexing time is 8-10 hours spread
>>over the day
>> - autoCommit is set to 15s
>> - softCommit is not defined
>>  
>> I think I have traced the failure to the following set of events but
>>would appreciate feedback:
>> 
>> 1. new documents are being indexed
>> 2. the leader of a shard, server A, fails for any reason (java crashes,
>>times out with zookeeper, etc)
>> 3. zookeeper promotes the other replica of the shard, server B, to the
>>leader position and indexing resumes
>> 4. server A comes back online (typically 10s of seconds later) and
>>reports to zookeeper
>> 5. zookeeper tells server A that it is no longer the leader and to sync
>>with server B
>> 6. server A checks with server B but finds that server B's index
>>version is different from its own
>> 7. server A begins replicating a new copy of the index from server B
>>using the (legacy?) replication handler
>> 8. the original index on server A was not deleted so it runs out of
>>disk space mid-replication
>> 9. server A throws an error, deletes the partially replicated index,
>>and then tries to replicate again
>> 
>> At this point I think steps 6  => 9 will loop forever
>> 
>> If the actual errors from solr.log are useful let me know, not doing
>>that now for brevity since this email is already pretty long.  In a
>>nutshell and in order, on server A I can find the error that took it
>>down, the post-recovery instruction from ZK to unregister itself as a
>>leader, the corrupt index error message, and then the (start - whoops,
>>out of disk- stop) loop of the replication messages.
>> 
>> I first want to ask if what I described is possible or did I get lost
>>somewhere along the way reading the docs?  Is there any reason to think
>>that solr should not do this?
>> 
>> If my version of events is feasible I have a few other questions:
>> 
>> 1. What happens to the docs that were indexed on server A but never
>>replicated to server B before the failure?  Assuming that the replica on
>>server A were to complete the recovery process would those docs appear
>>in the index or are they gone for good?
>> 
>> 2. I am guessing that the corrupt replica on server A is not deleted
>>because it is still viable, if server B had a catastrophic failure you
>>could pick up the pieces from server A.  If so is this a configurable
>>option somewhere?  I'd rather take my chances on server B going down

Re: CloudSolrClient query /admin/info/system

2015-10-27 Thread Kevin Risden
Created https://issues.apache.org/jira/browse/SOLR-8216
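
Until that is addressed, one possible interim workaround is to query a single
node directly with an HttpSolrClient instead of going through CloudSolrClient.
A rough sketch (the node URL is made up, and the response keys may differ a
bit by version):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class InfoSystemWorkaround {
  public static void main(String[] args) throws Exception {
    // talk to one node directly rather than routing through CloudSolrClient
    HttpSolrClient node = new HttpSolrClient("http://localhost:8983/solr");
    QueryRequest req = new QueryRequest(new ModifiableSolrParams());
    req.setPath("/admin/info/system");
    NamedList<Object> info = node.request(req);
    System.out.println(info.get("lucene"));  // holds solr-spec-version and friends
    node.close();
  }
}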

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC 
M: 732 213 8417
LinkedIn  | Google+
 | Twitter


-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.

On Tue, Oct 27, 2015 at 5:11 AM, Alan Woodward  wrote:

> Hi Kevin,
>
> This looks like a bug in CSC - could you raise an issue?
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 26 Oct 2015, at 22:21, Kevin Risden wrote:
>
> > I am trying to use CloudSolrClient to query information about the Solr
> > server including version information. I found /admin/info/system and it
> > seems to provide the information I am looking for. However, it looks like
> > CloudSolrClient cannot query /admin/info since INFO_HANDLER_PATH [1] is
> not
> > part of the ADMIN_PATHS in CloudSolrClient.java [2]. Was this possibly
> > missed as part of SOLR-4943 [3]?
> >
> > Is this an issue or is there a better way to query this information?
> >
> > As a side note, ZK_PATH also isn't listed in ADMIN_PATHS. I'm not sure
> what
> > issues that could cause. Is there a reason that ADMIN_PATHS in
> > CloudSolrClient would be different than the paths in CommonParams [1]?
> >
> > [1]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/params/CommonParams.java#L168
> > [2]
> >
> https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L808
> > [3] https://issues.apache.org/jira/browse/SOLR-4943
> >
> > Kevin Risden
> > Hadoop Tech Lead | Avalon Consulting, LLC  >
> > M: 732 213 8417
> > LinkedIn  |
> Google+
> >  | Twitter
> > 
>
>


Can not resolve the webdav path

2015-10-27 Thread espeake

When we try to access content from our 5.0.2 Alfresco install, we get
a Status 500 - No bean named 'webClientConfigService' is defined.

The stack trace from the catalina.out log looks like this.

2015-10-27 09:00:29,911  WARN  [app.servlet.BaseServlet]
[http-nio-8080-exec-5] Failed to resolve webdav path
 org.alfresco.service.cmr.model.FileNotFoundException: Folder not found:
at
org.alfresco.repo.model.filefolder.FileFolderServiceImpl.resolveNamePath
(FileFolderServiceImpl.java:1548)
at
org.alfresco.repo.model.filefolder.FileFolderServiceImpl.resolveNamePath
(FileFolderServiceImpl.java:1527)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke
(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection
(AopUtils.java:317)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint
(ReflectiveMethodInvocation.java:183)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:150)
at org.alfresco.repo.model.ml.MLContentInterceptor.invoke
(MLContentInterceptor.java:129)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at
org.alfresco.repo.model.filefolder.MLTranslationInterceptor.invoke
(MLTranslationInterceptor.java:268)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at
net.sf.acegisecurity.intercept.method.aopalliance.MethodSecurityInterceptor.invoke
(MethodSecurityInterceptor.java:80)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at
org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke
(ExceptionTranslatorMethodInterceptor.java:46)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at org.alfresco.repo.audit.AuditMethodInterceptor.invoke
(AuditMethodInterceptor.java:159)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at
org.alfresco.repo.model.filefolder.FilenameFilteringInterceptor.invoke
(FilenameFilteringInterceptor.java:382)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at
org.springframework.transaction.interceptor.TransactionInterceptor
$1.proceedWithInvocation(TransactionInterceptor.java:96)
at
org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction
(TransactionAspectSupport.java:260)
at
org.springframework.transaction.interceptor.TransactionInterceptor.invoke
(TransactionInterceptor.java:94)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke
(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy65.resolveNamePath(Unknown Source)
at org.alfresco.web.app.servlet.BaseServlet$1.doWork
(BaseServlet.java:435)
at org.alfresco.web.app.servlet.BaseServlet$1.doWork
(BaseServlet.java:392)
at
org.alfresco.repo.security.authentication.AuthenticationUtil.runAs
(AuthenticationUtil.java:548)
at org.alfresco.web.app.servlet.BaseServlet.resolveWebDAVPath
(BaseServlet.java:391)
at org.alfresco.web.app.servlet.BaseServlet.resolveWebDAVPath
(BaseServlet.java:379)
at org.alfresco.web.app.servlet.BaseServlet.resolveNamePath
(BaseServlet.java:475)
at
org.alfresco.web.app.servlet.BaseDownloadContentServlet.processDownloadRequest
(BaseDownloadContentServlet.java:155)
at org.alfresco.web.app.servlet.DownloadContentServlet$2.execute
(DownloadContentServlet.java:141)
at org.alfresco.web.app.servlet.DownloadContentServlet$2.execute
(DownloadContentServlet.java:138)
at
org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction
(RetryingTransactionHelper.java:454)
at
org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction
(RetryingTransactionHelper.java:342)
at org.alfresco.web.app.servlet.DownloadContentServlet.doGet
(DownloadContentServlet.java:145)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter
(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter
(ApplicationFilterChain.java:208)
at