Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.

2015-12-14 Thread Pushkar Raste
Hi Philippa,
Try taking a heap dump (when heap usage is high) and then using a profiler
look at which objects are taking up most of the memory. I have seen that if
you are using faceting/sorting on a large number of documents then the fieldCache
grows very big and dominates most of the heap. Enabling docValues on the
fields you are sorting/faceting on helps.
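
For example, assuming a HotSpot JDK is on the machine (the output path and
pid are placeholders), a dump for the profiler can be taken with jmap:

  # "live" triggers a full GC first, so only reachable objects are dumped
  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

Tools like Eclipse MAT or jvisualvm can then show which objects dominate.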

On 8 December 2015 at 07:17, philippa griggs 
wrote:

> Hello Emir,
>
> The query load is around 35 requests per min on each shard, we don't
> document route so we query the entire index.
>
> We do have some heavy queries like faceting, and it's possible that a heavy
> query is causing the nodes to go down - we are looking into this.  I'm new
> to solr so this could be a slightly stupid question, but would a heavy query
> cause most of the nodes to go down? This didn't happen with the previous
> solr version we were using (Solr 4.10.0); we did have nodes/shards which went
> down but there wasn't a wipe out effect where most of the nodes go.
>
> Many thanks
>
> Philippa
>
> 
> From: Emir Arnautovic 
> Sent: 08 December 2015 10:38
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>
> Hi Phillippa,
> My guess would be that you are running some heavy queries (faceting/deep
> paging/large pages) or have a high query load (can you give a bit more
> detail about the load) or have misconfigured caches. Do you query the
> entire index or do you have query routing?
>
> You have big machines and might consider running two Solr instances on each
> node (with smaller heaps) and splitting shards so queries can be more
> parallelized, resources better utilized, and the heaps smaller to GC.
>
> Regards,
> Emir
>
> On 08.12.2015 10:49, philippa griggs wrote:
> > Hello Erick,
> >
> > Thanks for your reply.
> >
> > We have one collection and are writing documents to that collection all
> the time- it peaks at around 2,500 per minute and dips to 250 per minute,
> the size of the document varies. On each node we have around 55,000,000
> documents with a data size of 43G located on a drive of 200G.
> >
> > Each node has 122G memory, the heap size is currently set at 45G
> although we have plans to increase this to 50G.
> >
> > The heap settings we are using are:
> >
> >   -XX:+UseG1GC,
> > -XX:+ParallelRefProcEnabled.
> >
> > Please let me know if you need any more information.
> >
> > Philippa
> > 
> > From: Erick Erickson 
> > Sent: 07 December 2015 16:53
> > To: solr-user
> > Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
> >
> > Tell us a bit more.
> >
> > Are you adding documents to your collections or adding more
> > collections? Solr is a balancing act between the number of docs you
> > have on each node and the memory you have allocated. If you're
> > continually adding docs to Solr, you'll eventually run out of memory
> > and/or hit big GC pauses.
> >
> > How much memory are you allocating to Solr? How much physical memory
> > do you have? etc.
> >
> > Best,
> > Erick
> >
> >
> > On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
> >  wrote:
> >> Hello,
> >>
> >>
> >> I'm using:
> >>
> >>
> >> Solr 5.2.1, 10 shards each with a replica (20 nodes in total)
> >>
> >>
> >> Zookeeper 3.4.6.
> >>
> >>
> >> About half a year ago we upgraded to Solr 5.2.1 and since then have
> been experiencing a 'wipe out' effect where all of a sudden most if not all
> nodes will go down. Sometimes they will recover by themselves but more
> often than not we have to step in to restart nodes.
> >>
> >>
> >> Nothing in the logs jumps out as being the problem. With the latest
> wipe out we noticed that 10 out of the 20 nodes had garbage collections
> over 1min all at the same time, with the heap usage spiking up in some
> cases to 80%. We also noticed the amount of selects run on the solr cluster
> increased just before the wipe out.
> >>
> >>
> >> Increasing the heap size seems to help for a while but then it starts
> happening again - so it's more like a delay than a fix. Our GC settings are
> set to -XX:+UseG1GC, -XX:+ParallelRefProcEnabled.
> >>
> >>
> >> With our previous version of solr (4.10.0) this didn't happen. We had
> nodes/shards go down but it was contained, with the new version they all
> seem to go at around the same time. We can't really continue just
> increasing the heap size and would like to solve this issue rather than
> delay it.
> >>
> >>
> >> Has anyone experienced something similar?
> >>
> >> Is there a difference between the two versions around the recovery
> process?
> >>
> >> Does anyone have any suggestions on a fix?
> >>
> >>
> >> Many thanks
> >>
> >>
> >> Philippa
> > >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Jeff Wartes

Don’t set solr.data.dir. Instead, set the install dir. Something like:
-Dsolr.solr.home=/data/solr
-Dsolr.install.dir=/opt/solr

I have many solrcloud collections, and separate data/install dirs, and
I’ve never had to do anything with manual per-collection or per-replica
data dirs.

That said, it’s been a while since I set this up, and I may not remember
all the pieces. 
You might need something like this too, for example:

-Djetty.home=/opt/solr/server


On 12/14/15, 3:11 PM, "Erick Erickson"  wrote:

>Currently, it'll be a little tedious but here's what you can do (going
>partly from memory)...
>
>When you create the collection, specify the special value EMPTY for
>createNodeSet (Solr 5.3+).
>Use ADDREPLICA to add each individual replica. When you do this, you
>can add a dataDir for
>each individual replica and thus keep them separate, i.e. for a
>particular box the first
>replica would get /data/solr/collection1_shard1_replica1, the second
>/data/solr/collection1_shard2_replica1 and so forth.
>
>If you don't have Solr 5.3+, you can still do the same thing, except
>you create your collection letting
>the replicas fall where they will. Then do the ADDREPLICA as above.
>When that's all done,
>DELETEREPLICA for the original replicas.
>
>Best,
>Erick
>
>On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans 
>wrote:
>> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey 
>>wrote:
>>> On 12/14/2015 10:49 AM, Tom Evans wrote:
 When I tried this in SolrCloud mode, specifying
 "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
 for the first collection, but then the second collection tried to use
 the same directory to store its index, which obviously failed. I fixed
 this by changing solrconfig.xml in each collection to specify a
 specific directory, like so:

  <dataDir>${solr.data.dir:}products</dataDir>

 Looking back after the weekend, I'm not a big fan of this. Is there a
 way to add a core.properties to ZK, or a way to specify
 core.baseDatadir on the command line, or just a better way of handling
 this that I'm not aware of?
>>>
>>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
>>> try to override it.  It will default to "data" relative to the
>>> instanceDir.  Each instanceDir is likely to be in the solr home.
>>>
>>> With SolrCloud, your cores will not contain a "conf" directory (unless
>>> you create it manually), therefore the on-disk locations will be *only*
>>> data, there's not really any need to have separate locations for
>>> instanceDir and dataDir.  All active configuration information for
>>> SolrCloud is in zookeeper.
>>>
>>
>> That makes sense, but I guess I was asking the wrong question :)
>>
>> We have our SSDs mounted on /data/solr, which is where our indexes
>> should go, but our solr install is on /opt/solr, with the default solr
>> home in /opt/solr/server/solr. How do we change where the indexes get
>> put so they end up on the fast storage?
>>
>> Cheers
>>
>> Tom
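
For reference, a rough sketch of Erick's createNodeSet/ADDREPLICA approach
quoted above, assuming Solr 5.3+ (host, collection and config names below are
made up):

  # create the collection with no replicas anywhere
  curl 'http://host1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1&createNodeSet=EMPTY&collection.configName=myconf'
  # then place each replica by hand, with its own dataDir
  curl 'http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=host1:8983_solr&dataDir=/data/solr/collection1_shard1_replica1'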



Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Rahul Ramesh
We recently moved data from a magnetic drive to SSD. We run Solr in cloud
mode. Only the data is stored on the drive; configuration is stored in ZK. We
start solr using the -s option, specifying the data dir.
Command to start solr:
./bin/solr start -c -h <host> -p <port> -z <zkHosts> -s <data dir>

We followed the following steps to migrate data

1. Stop all new insertions .
2. Copy the solr data to the new location
3. restart the server with -s option pointing to new solr directory name.
4. We have a 3 node solr cluster. The restarted server will get in sync
with the other two servers.
5. Repeat this procedure for other two servers.

We used a similar procedure to upgrade from 5.2.1 to 5.3.1.
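
A rough shell sketch of those steps (paths, ports and hosts below are made
up, not from the original setup):

  bin/solr stop -p 8983                     # after pausing indexing
  rsync -a /var/solr/data/ /ssd/solr/data/  # copy the index to the SSD
  bin/solr start -c -h host1 -p 8983 \
    -z zk1:2181,zk2:2181,zk3:2181 -s /ssd/solr/data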





On Tue, Dec 15, 2015 at 5:07 AM, Jeff Wartes  wrote:

>
> Don’t set solr.data.dir. Instead, set the install dir. Something like:
> -Dsolr.solr.home=/data/solr
> -Dsolr.install.dir=/opt/solr
>
> I have many solrcloud collections, and separate data/install dirs, and
> I’ve never had to do anything with manual per-collection or per-replica
> data dirs.
>
> That said, it’s been a while since I set this up, and I may not remember
> all the pieces.
> You might need something like this too, for example:
>
> -Djetty.home=/opt/solr/server
>
>
> On 12/14/15, 3:11 PM, "Erick Erickson"  wrote:
>
> >Currently, it'll be a little tedious but here's what you can do (going
> >partly from memory)...
> >
> >When you create the collection, specify the special value EMPTY for
> >createNodeSet (Solr 5.3+).
> >Use ADDREPLICA to add each individual replica. When you do this, you
> >can add a dataDir for
> >each individual replica and thus keep them separate, i.e. for a
> >particular box the first
> >replica would get /data/solr/collection1_shard1_replica1, the second
> >/data/solr/collection1_shard2_replica1 and so forth.
> >
> >If you don't have Solr 5.3+, you can still do the same thing, except
> >you create your collection letting
> >the replicas fall where they will. Then do the ADDREPLICA as above.
> >When that's all done,
> >DELETEREPLICA for the original replicas.
> >
> >Best,
> >Erick
> >
> >On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans 
> >wrote:
> >> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey 
> >>wrote:
> >>> On 12/14/2015 10:49 AM, Tom Evans wrote:
>  When I tried this in SolrCloud mode, specifying
>  "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>  for the first collection, but then the second collection tried to use
>  the same directory to store its index, which obviously failed. I fixed
>  this by changing solrconfig.xml in each collection to specify a
>  specific directory, like so:
> 
>   <dataDir>${solr.data.dir:}products</dataDir>
> 
>  Looking back after the weekend, I'm not a big fan of this. Is there a
>  way to add a core.properties to ZK, or a way to specify
>  core.baseDatadir on the command line, or just a better way of handling
>  this that I'm not aware of?
> >>>
> >>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
> >>> try to override it.  It will default to "data" relative to the
> >>> instanceDir.  Each instanceDir is likely to be in the solr home.
> >>>
> >>> With SolrCloud, your cores will not contain a "conf" directory (unless
> >>> you create it manually), therefore the on-disk locations will be *only*
> >>> data, there's not really any need to have separate locations for
> >>> instanceDir and dataDir.  All active configuration information for
> >>> SolrCloud is in zookeeper.
> >>>
> >>
> >> That makes sense, but I guess I was asking the wrong question :)
> >>
> >> We have our SSDs mounted on /data/solr, which is where our indexes
> >> should go, but our solr install is on /opt/solr, with the default solr
> >> home in /opt/solr/server/solr. How do we change where the indexes get
> >> put so they end up on the fast storage?
> >>
> >> Cheers
> >>
> >> Tom
>
>


Re: Defining SOLR nested fields

2015-12-14 Thread Alessandro Benedetti
Exactly, in Solr there is no concept of "nested fields".
But there is the concept of nested documents (via query-time join and index-
time (block) join).
You can have a "flat" schema which will actually be used to model nested
documents at index and query time.
There is plenty of documentation about that (block join is available in Solr
starting from 4.5).
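
For instance, with a flat schema where a content_type field marks the parent
documents (field names here are illustrative, not prescribed):

  curl http://localhost:8983/solr/mycoll/select --data-urlencode \
    'q={!parent which="content_type:parentDocument"}text:apple'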

Cheers

On 14 December 2015 at 03:18, Binoy Dalal  wrote:

> From what I've seen, you can't nest fields in the schema.xml, since those
> are just declarations. If you want nested documents, you need to do so at
> index time with the  _childDocuments_ json key in your doc.
> Take a look here: http://yonik.com/solr-nested-objects/
>
> On Mon, Dec 14, 2015 at 6:10 AM santosh sidnal 
> wrote:
>
> > Hi All,
> >
> > I want to define nested fields in SOLR using schema.xml. We are using
> > Apache
> > Solr 4.7.0.
> >
> > i see some links which say how to do it, but not sure how i can do it in
> > schema.xml
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
> >
> >
> > any help over here is appreciable.
> >
> > --
> > Regards,
> > Santosh Sidnal
> >
> --
> Regards,
> Binoy Dalal
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Getting a document version back after updating

2015-12-14 Thread Debraj Manna
Is there any separate api available in solrj 5.2.1 for setting versions=true
while adding or updating a solr doc?
On Dec 13, 2015 8:03 AM, "Debraj Manna"  wrote:

> Thanks Alex. This is what I was looking for. One more query how to set
> this from solrj while calling add() ? Do I have to make a curl call with
> this param set?
> On Dec 13, 2015 12:53 AM, "Shalin Shekhar Mangar" 
> wrote:
>
>> Oh yes, I had forgotten about that! Thanks Alexandre!
>>
>> On Sat, Dec 12, 2015 at 11:57 PM, Alexandre Rafalovitch
>>  wrote:
>> > Does "versions=true" flag match what you are looking for? It is
>> > described towards the end of:
>> >
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
>> >
>> > Regards,
>> >Alex.
>> > 
>> > Newsletter and resources for Solr beginners and intermediates:
>> > http://www.solr-start.com/
>> >
>> >
>> > On 12 December 2015 at 11:35, Debraj Manna 
>> wrote:
>> >> I was thinking if it is possible to get the version without making one
>> more
>> >> extra call getById. Can I get that as part of the update response when
>> I am
>> >> updating or adding a new document?
>> >> On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" <
>> shalinman...@gmail.com>
>> >> wrote:
>> >>
>> >>> You will have to request a real-time-get with the unique key of the
>> >>> document you added/updated. In Solr 5.1+ you can go
>> >>> client.getById(String id) to get this information.
>> >>>
>> >>> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna <
>> subharaj.ma...@gmail.com>
>> >>> wrote:
>> >>> > Is there a way I can get the version of a document back in response
>> after
>> >>> > adding or updating the document via Solrj 5.2.1?
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Regards,
>> >>> Shalin Shekhar Mangar.
>> >>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>


Re: Highlighting large documents

2015-12-14 Thread Jens Brandt
Hi Edwin,

you are limiting the portion of the document analyzed for highlighting in your 
solrconfig.xml by

 <int name="hl.maxAnalyzedChars">100</int>

Thus, snippets are only produced correctly if the query was found in the first 
100 characters of the document.

If you set this parameter to

 <int name="hl.maxAnalyzedChars">-1</int>

the original highlighter uses the whole document to find the snippet.
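
The same parameter can also be passed per request instead of being fixed in
solrconfig.xml, e.g. (core and field names are placeholders):

  curl 'http://localhost:8983/solr/mycore/select?q=content:foo&hl=true&hl.fl=content&hl.maxAnalyzedChars=-1'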

I hope that helps
  Jens


> On 04.12.2015 at 16:51, Zheng Lin Edwin Yeo wrote:
> 
> Hi,
> 
> I'm using Solr 5.3.0
> 
> I found that in large documents, sometimes I face a situation where, when I
> do a highlight query, the resultset that is returned does not contain the
> highlighted query. There are actually matches in the documents, but just
> that they are located further back in the documents.
> 
> I have tried to increase the value of the hl.maxAnalyzedChars, as the
> default value is 51200, and I have documents that are much larger than
> 51200 characters. Although this method works, when I increase this
> value, the performance of the search and highlight drops. It can drop from
> less than 0.5 seconds to more than 10 seconds.
> 
> Would like to check, is this method of increasing the value of the
> hl.maxAnalyzedChars the best method to use, or is there other ways which
> can solve the same purpose, but without affecting the performance much?
> 
> Regards,
> Edwin





Re: Getting a document version back after updating

2015-12-14 Thread Mikhail Khludnev
what about new UpdateRequest().setParam("versions", "true") ?
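
Equivalently over plain HTTP (core name is hypothetical), the response then
carries the assigned _version_ for each id under "adds":

  curl 'http://localhost:8983/solr/mycore/update?versions=true&commit=true' \
    -H 'Content-Type: application/json' -d '[{"id":"doc1"}]'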

On Mon, Dec 14, 2015 at 1:15 PM, Debraj Manna 
wrote:

> Is there any separate api available in solrj 5.2.1 for setting versions=true
> while adding or updating a solr doc?
> On Dec 13, 2015 8:03 AM, "Debraj Manna"  wrote:
>
> > Thanks Alex. This is what I was looking for. One more query how to set
> > this from solrj while calling add() ? Do I have to make a curl call with
> > this param set?
> > On Dec 13, 2015 12:53 AM, "Shalin Shekhar Mangar" <
> shalinman...@gmail.com>
> > wrote:
> >
> >> Oh yes, I had forgotten about that! Thanks Alexandre!
> >>
> >> On Sat, Dec 12, 2015 at 11:57 PM, Alexandre Rafalovitch
> >>  wrote:
> >> > Does "versions=true" flag match what you are looking for? It is
> >> > described towards the end of:
> >> >
> >>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
> >> >
> >> > Regards,
> >> >Alex.
> >> > 
> >> > Newsletter and resources for Solr beginners and intermediates:
> >> > http://www.solr-start.com/
> >> >
> >> >
> >> > On 12 December 2015 at 11:35, Debraj Manna 
> >> wrote:
> >> >> I was thinking if it is possible to get the version without making
> one
> >> more
> >> >> extra call getById. Can I get that as part of the update response
> when
> >> I am
> >> >> updating or adding a new document?
> >> >> On Dec 12, 2015 3:28 PM, "Shalin Shekhar Mangar" <
> >> shalinman...@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> You will have to request a real-time-get with the unique key of the
> >> >>> document you added/updated. In Solr 5.1+ you can go
> >> >>> client.getById(String id) to get this information.
> >> >>>
> >> >>> On Sat, Dec 12, 2015 at 10:19 AM, Debraj Manna <
> >> subharaj.ma...@gmail.com>
> >> >>> wrote:
> >> >>> > Is there a way I can get the version of a document back in
> response
> >> after
> >> >>> > adding or updating the document via Solrj 5.2.1?
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Regards,
> >> >>> Shalin Shekhar Mangar.
> >> >>>
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Block Join query

2015-12-14 Thread Mikhail Khludnev
In addition to the link in the previous response,
http://blog.griddynamics.com/2013/09/solr-block-join-support.html provides
an example of such a combination. From my experience fq doesn't participate in
highlighting or scoring.
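
For example, the parent-level restriction rides along in fq while the block
join stays in q (the filter shown is illustrative):

  curl http://localhost:8983/solr/mycoll/select \
    --data-urlencode 'q={!parent which="doctype:200"}flow:[624 TO 700]' \
    --data-urlencode 'fq=doctype:200'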

On Mon, Dec 14, 2015 at 2:45 PM, Novin Novin  wrote:

> Hi Mikhail,
>
> I'm having a little bit of a problem constructing the query for solr when I
> have been trying to use a block join query. As you said, I can't use "+" or a
> space in front of a block join query, so I have to put {!parent
> which="doctype:200"} in front. And after this, all fields are child
> document fields, so I can't add any parent document field; if I add a parent
> doc field it would give me nothing because the field does not exist in the
> child document.
>
> But I can still add the parent doc in "fq". Is it going to cause any
> trouble related to highlighting or scoring, because I was using the
> parent doc field in q, not in fq?
>
> Thanks,
> Novin
>
> On 12 December 2015 at 00:01, Novin  wrote:
>
> > No Worries, I was just wondering what did I miss.  And thanks for blog
> > link.
> >
> >
> > On 11/12/2015 18:52, Mikhail Khludnev wrote:
> >
> >> Novin,
> >>
> >> I regret so much. It's my pet peeve in Solr query parsing. Handling a
> >> space
> >> is dependent on the first symbol of the query string.
> >> This will work (starts from '{!' ):
> >> q={!parent which="doctype:200"}flow:[624 TO 700]
> >> These won't due to " ", "+":
> >> q= {!parent which="doctype:200"}flow:[624 TO 700]
> >> q=+{!parent which="doctype:200"}flow:[624 TO 700]
> >> Subordinate clauses with spaces are better handled with "Nested Queries"
> >> or
> >> so, check the post
> >> <
> >>
> http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
> >> >
> >>
> >>
> >> On Fri, Dec 11, 2015 at 6:31 PM, Novin  wrote:
> >>
> >> Hi Guys,
> >>>
> >>> I'm trying  block join query, so I have tried   +{!parent
> >>> which="doctype:200"}flow:624 worked fine. But when i tried
> +{!parent
> >>> which="doctype:200"}flow:[624 TO 700]
> >>>
> >>> Got the below error
> >>>
> >>> org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624':
> >>> Encountered \"\" at line 1, column 11.\nWas expecting one of:\n
> >>> \"TO\" ...\n ...\n  ...\n
> >>>
> >>> Just wondering too, can we do a range in a block join query?
> >>>
> >>> Thanks,
> >>> Novin
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





solr cloud invalid shard/collection configuration

2015-12-14 Thread ig01
I have an existing solrcloud 4.4 configured with zookeeper.
 The current setting is 3 shards, each shard has a leader and replica. All
are mapped to the same collection1. 

{"collection1":{
"shards":{
  "shard1":{
"range":"8000-d554",
"state":"active",
"replicas":{
  "core_node4":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.132:9983_solr",
"base_url":"http://10.200.101.132:9983/solr;,
"leader":"true"},
  "core_node7":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.131:8983_solr",
"base_url":"http://10.200.101.131:8983/solr"}}},
  "shard2":{
"range":"d555-2aa9",
"state":"active",
"replicas":{
  "core_node2":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.131:9983_solr",
"base_url":"http://10.200.101.131:9983/solr"},
  "core_node5":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.133:8983_solr",
"base_url":"http://10.200.101.133:8983/solr;,
"leader":"true"}}},
  "shard3":{
"range":"2aaa-7fff",
"state":"active",
"replicas":{
  "core_node3":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.132:8983_solr",
"base_url":"http://10.200.101.132:8983/solr"},
  "core_node6":{
"state":"active",
"core":"collection1",
"node_name":"10.200.101.133:9983_solr",
"base_url":"http://10.200.101.133:9983/solr;,
"leader":"true",
"router":"compositeId"}}



I have downloaded solrcloud 5.2.1 and ran solr.cmd. Created almost the same
setting with 2 shards each shard has replica and leader.
 
{"collection1":{
"replicationFactor":"1",
"shards":{
  "shard1_0":{
"range":"8000-",
"state":"active",
"replicas":{
  "core_node3":{
"core":"collection1_shard1_0_replica1",
"base_url":"http://10.1.20.31:8983/solr;,
"node_name":"10.1.20.31:8983_solr",
"state":"active",
"leader":"true"},
  "core_node5":{
"core":"collection1_shard1_0_replica2",
"base_url":"http://10.1.20.31:7574/solr;,
"node_name":"10.1.20.31:7574_solr",
"state":"active"}}},
  "shard1_1":{
"range":"0-7fff",
"state":"active",
"replicas":{
  "core_node4":{
"core":"collection1_shard1_1_replica1",
"base_url":"http://10.1.20.31:8983/solr;,
"node_name":"10.1.20.31:8983_solr",
"state":"active",
"leader":"true"},
  "core_node6":{
"core":"collection1_shard1_1_replica2",
"base_url":"http://10.1.20.31:7574/solr;,
"node_name":"10.1.20.31:7574_solr",
"state":"active",
"router":{"name":"compositeId"},
"maxShardsPerNode":"1",
"autoAddReplicas":"false",
"autoCreated":"true"}}

The problem is when i am indexing a document to http://10.1.20.31:8983/solr,
it is only indexed to collection1_shard1_0_replica1; it is not spreading the
documents to the other shard. Why is that? Is the solr configured well?
In the existing environment, i see only 1 core for all shards; in the new one
there are 2 cores for each shard?

Please advise.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-cloud-invalid-shard-collection-configuration-tp4245151.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Block Join query

2015-12-14 Thread Novin Novin
Hi Mikhail,

I'm having a little bit of a problem constructing the query for solr when I
have been trying to use a block join query. As you said, I can't use "+" or a
space in front of a block join query, so I have to put {!parent
which="doctype:200"} in front. And after this, all fields are child document
fields, so I can't add any parent document field; if I add a parent doc field
it would give me nothing because the field does not exist in the child
document.

But I can still add the parent doc in "fq". Is it going to cause any
trouble related to highlighting or scoring, because I was using the
parent doc field in q, not in fq?

Thanks,
Novin

On 12 December 2015 at 00:01, Novin  wrote:

> No Worries, I was just wondering what did I miss.  And thanks for blog
> link.
>
>
> On 11/12/2015 18:52, Mikhail Khludnev wrote:
>
>> Novin,
>>
>> I regret so much. It's my pet peeve in Solr query parsing. Handling a
>> space
>> is dependent on the first symbol of the query string.
>> This will work (starts from '{!' ):
>> q={!parent which="doctype:200"}flow:[624 TO 700]
>> These won't due to " ", "+":
>> q= {!parent which="doctype:200"}flow:[624 TO 700]
>> q=+{!parent which="doctype:200"}flow:[624 TO 700]
>> Subordinate clauses with spaces are better handled with "Nested Queries"
>> or
>> so, check the post
>> <
>> http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
>> >
>>
>>
>> On Fri, Dec 11, 2015 at 6:31 PM, Novin  wrote:
>>
>> Hi Guys,
>>>
>>> I'm trying  block join query, so I have tried   +{!parent
>>> which="doctype:200"}flow:624 worked fine. But when i tried +{!parent
>>> which="doctype:200"}flow:[624 TO 700]
>>>
>>> Got the below error
>>>
>>> org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624':
>>> Encountered \"\" at line 1, column 11.\nWas expecting one of:\n
>>> \"TO\" ...\n ...\n  ...\n
>>>
>>> Just wondering too, can we do a range in a block join query?
>>>
>>> Thanks,
>>> Novin
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Security Problems

2015-12-14 Thread Jan Høydahl
> 1) "read" should cover all the paths

This is very fragile. If all paths were closed by default, forgetting to 
configure a path would not result in a security breach like today.

/Jan

Re: pf2 pf3 and stopwords

2015-12-14 Thread Binoy Dalal
Moreover, the stopword de will work on your queries and not on your
documents, meaning if you query 'Gare de Saint Lazare', the terms actually
searched for will be Gare, Saint and Lazare; 'de' will be filtered out.

On Mon, Dec 14, 2015 at 8:49 PM Binoy Dalal  wrote:

> This isn't a bug. During pf3 matching, since your query has only three
> tokens, the entire query will be treated as a single phrase, and with slop
> = 0, any word that comes in the middle of your query  - 'de' in this case
> will cause the phrase to not be matched. If you want to get around this,
> try setting your slop = 1 in which case it should match Gare Saint Lazare
> even with the de in it.
>
> On Mon, Dec 14, 2015 at 7:22 PM elisabeth benoit <
> elisaelisael...@gmail.com> wrote:
>
>> Hello,
>>
>> I am using solr 4.10.1. I have a field with stopwords
>>
>>
>> > words="stopwords.txt"
>> enablePositionIncrements="true"/>
>>
>> And I use pf2 pf3 on that field with a slop of 0.
>>
>> If the request is "Gare Saint Lazare", and I have a document "Gare de
>> Saint
>> Lazare", "de" being a stopword, this document doesn't get the pf3 boost,
>> because of "de".
>>
>> I was wondering, is this normal? is this a bug? is something wrong with my
>> configuration?
>>
>> Best regards,
>> Elisabeth
>>
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Re: pf2 pf3 and stopwords

2015-12-14 Thread Binoy Dalal
This isn't a bug. During pf3 matching, since your query has only three
tokens, the entire query will be treated as a single phrase, and with slop
= 0, any word that comes in the middle of your query  - 'de' in this case
will cause the phrase to not be matched. If you want to get around this,
try setting your slop = 1 in which case it should match Gare Saint Lazare
even with the de in it.
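
With edismax the pf2/pf3 slops can be set independently via ps2/ps3, so a
sketch of that suggestion looks like this (field name is a placeholder):

  curl http://localhost:8983/solr/mycore/select \
    --data-urlencode 'q=Gare Saint Lazare' \
    -d 'defType=edismax&qf=name&pf2=name&pf3=name&ps2=1&ps3=1'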

On Mon, Dec 14, 2015 at 7:22 PM elisabeth benoit 
wrote:

> Hello,
>
> I am using solr 4.10.1. I have a field with stopwords
>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> enablePositionIncrements="true"/>
>
> And I use pf2 pf3 on that field with a slop of 0.
>
> If the request is "Gare Saint Lazare", and I have a document "Gare de Saint
> Lazare", "de" being a stopword, this document doesn't get the pf3 boost,
> because of "de".
>
> I was wondering, is this normal? is this a bug? is something wrong with my
> configuration?
>
> Best regards,
> Elisabeth
>
-- 
Regards,
Binoy Dalal


re: nested fields

2015-12-14 Thread Rick Leir
On Sun, Dec 13, 2015 at 8:26 PM, 
wrote:

>
> I want to define nested fileds in SOLR using schema.xml.


Us too (using Solr 5.3.1). And doco is not jumping out at me. My approach
is (please suggest a better way)
1/ create a blank core
2/ add a few nested docs using bin/post
3/ use the schema browser to see what fields got added
  http://localhost:8983/solr/#/dorsetdata/schema-browser
4/ add those fields to a schema.xml

Here is an experimental nested doc, which results in 3 'flat' Solr docs:
[
{
"id"  : "552",
"key" : "552",
"pkey": "552",
"type_s": "book",
"lang": "Eng",
"media" : "text",
"canonicalMaster": "tdr/552",
"title": "The object, benefits and history of",
"publisher": "[Halifax, N.S.? : s.n.], 1855 (Halifax, N.S. : J.
)",
"content_type" : "parentDocument",
"_childDocuments_" : [
{
"id"  : "552.8",
"key" : "oocihm.552.8",
"pkey": "oocihm.552",
"type_s": "page",
"label": "Image 8",
"seq" : "8",
"canonicalMaster": "tdr/552/data/files/0008.jpg",
"canonicalMasterMime": "image/jpeg",
"text": "DBJE 11 OF NORMAL SCHOOLS. 3 illustration as a
whole of what such"
},
{
"id"  : "552.9",
"key" : "oocihm.552.9",
"pkey": "oocihm.552",
"type_s": "page",
"label": "Image 9",
"seq" : "9",
"canonicalMaster": "tdr/552/data/files/0009.jpg",
"canonicalMasterMime": "image/jpeg",
"text": "BESEFITS OF NORMAL SCHOOLS. with a natural
titncss. both  "
}
]
}
]

Then a search for normal:
$ curl http://localhost:8983/solr/dorsetdata2/query -d '
q={!parent which="content_type:parentDocument" score=total}
type_s:page AND normal&
wt=json&
omitHeader=true&
indent=false&
fl=score,[child parentFilter=type_s:book childFilter=normal ],title,publisher'

Look for ChildDocTransformerFactory in

https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
cheers - Rick


Re: Block Join query

2015-12-14 Thread Novin Novin
Thanks Man.

On Mon, 14 Dec 2015 at 12:19 Mikhail Khludnev 
wrote:

> In addition to the link in the previous response,
> http://blog.griddynamics.com/2013/09/solr-block-join-support.html provides
> an example of such combination. From my experience fq doen't participate in
> highlighting nor scoring.
>
> On Mon, Dec 14, 2015 at 2:45 PM, Novin Novin  wrote:
>
> > Hi Mikhail,
> >
> > I'm having a little bit of a problem constructing the query for solr when I
> > have been trying to use a block join query. As you said, I can't use "+" or
> > a space in front of a block join query, so I have to put {!parent
> > which="doctype:200"} in front. And after this, all fields are child
> > document fields, so I can't add any parent document field; if I add a
> > parent doc field it would give me nothing because the field does not exist
> > in the child document.
> >
> > But I can still add the parent doc in "fq". Is it going to cause any
> > trouble related to highlighting or scoring, because I was using the
> > parent doc field in q, not in fq?
> >
> > Thanks,
> > Novin
> >
> > On 12 December 2015 at 00:01, Novin  wrote:
> >
> > > No Worries, I was just wondering what did I miss.  And thanks for blog
> > > link.
> > >
> > >
> > > On 11/12/2015 18:52, Mikhail Khludnev wrote:
> > >
> > >> Novin,
> > >>
> > >> I regret so much. It's my pet peeve in Solr query parsing. Handling a
> > >> space
> > >> is dependent on the first symbol of the query string.
> > >> This will work (starts from '{!' ):
> > >> q={!parent which="doctype:200"}flow:[624 TO 700]
> > >> These won't due to " ", "+":
> > >> q= {!parent which="doctype:200"}flow:[624 TO 700]
> > >> q=+{!parent which="doctype:200"}flow:[624 TO 700]
> > >> Subordinate clauses with spaces are better handled with "Nested
> Queries"
> > >> or
> > >> so, check the post
> > >> <
> > >>
> >
> http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
> > >> >
> > >>
> > >>
> > >> On Fri, Dec 11, 2015 at 6:31 PM, Novin  wrote:
> > >>
> > >> Hi Guys,
> > >>>
> > >>> I'm trying  block join query, so I have tried   +{!parent
> > >>> which="doctype:200"}flow:624 worked fine. But when i tried
> > +{!parent
> > >>> which="doctype:200"}flow:[624 TO 700]
> > >>>
> > >>> Got the below error
> > >>>
> > >>> org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624':
> > >>> Encountered \"\" at line 1, column 11.\nWas expecting one of:\n
> > >>> \"TO\" ...\n ...\n  ...\n
> > >>>
> > >>> Just wondering too, can we able to do range in block join query.
> > >>>
> > >>> Thanks,
> > >>> Novin
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


RE: how to secure standalone solr

2015-12-14 Thread Davis, Daniel (NIH/NLM) [C]
Wait a second. There are other ways to secure Solr that don't involve
any sort of role-based security control.   What I do is place a reverse-proxy
in front of Apache Solr on port 80, and have that reverse proxy use CAS 
authentication.  I also have a list of "valid-users" who may use the Solr admin 
UI.

Then, I have port 8983 open in my port-based host firewall (iptables), but it 
only allows the hosts that need to talk directly to Solr.  Firewalls prevent 
the other accesses.  Many security wrappers such as mod_auth_cas, which works 
with Apache httpd, can set a request header such as REMOTE_USER to the username 
of the individual who has authenticated with the wrapper.   In fact, I'm hoping 
security.json can eventually be made to work with such a header.
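
A minimal sketch of such an iptables rule set (the addresses are made up):

  # allow the app host and localhost to reach Solr directly, drop the rest
  iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.5  -j ACCEPT
  iptables -A INPUT -p tcp --dport 8983 -s 127.0.0.1 -j ACCEPT
  iptables -A INPUT -p tcp --dport 8983 -j DROP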

-Original Message-
From: Noble Paul [mailto:noble.p...@gmail.com] 
Sent: Friday, December 11, 2015 8:12 PM
To: solr-user@lucene.apache.org
Subject: Re: how to secure standalone solr

For standalone Solr , Kerberos is the only option for authentication.
If you have  a SolrCloud setup, you have other options

https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin

On Fri, Dec 11, 2015 at 11:02 PM, Don Bosco Durai  wrote:
>>Can anyone tell me how to secure standalone solr?
> Recently there were a few discussions on this. In short, it is not tested and
> there doesn't seem to be a plan to test it.
>
>>1.) Is using the Kerberos plugin good practice, or is there something else?
> The answer depends on how you are using it: where you are deploying it, who is
> accessing it, whether you want to restrict by access type (read/write), what 
> authentication environment (LDAP/AD, Kerberos, etc) you already have.
>
> Depending upon your use cases and environment, you may have one or more 
> options.
>
> Bosco
>
>
>
>
>
>
> On 12/11/15, 4:27 AM, "Mugeesh Husain"  wrote:
>
>>Hello,
>>
>>Can anyone tell me how to secure standalone solr?
>>
>>1.) Is using the Kerberos plugin good practice, or is there something else?
>>
>>
>>
>>--
>>View this message in context: 
>>http://lucene.472066.n3.nabble.com/how-to-secure-standalone-solr-tp4244866.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
>



--
-
Noble Paul


Memory leak in SolrCloud 4.6

2015-12-14 Thread Mark Houts
I am running a SolrCloud 4.6 cluster with three solr nodes and three
external zookeeper nodes. Each Solr node has 12GB RAM. 8GB RAM dedicated to
the JVM.

When solr is started it consumes barely 1GB but over the course of 36 to 48
hours physical memory will be consumed and swap will be used. The i/o
latency of using swap will soon make the machine so slow that it will
become unresponsive.

Has anyone had experience with memory leaks in this version?

Regards,

M Houts


Re: Providing own _version field in solr doc

2015-12-14 Thread Debraj Manna
Can I somehow get "documentVersion" for each doc  back in the Update
Response like the way we get _version back in Optimistic Concurrency when
we set "version=true" in the update request?
On Dec 14, 2015 10:58 PM, "Chris Hostetter" 
wrote:

>
> The _version_ field used for optimistic concurrency can't be user supplied
> -- it's not just a record of the *document's* version, but actually a
> record of the *update command* version -- so even deleteByQuery commands
> have one -- and the order must (internally) increase across all types of
> updates to a shard to ensure that they are executed in the correct order
> across all replicas.
>
> However...
>
> Your use case sounds like an exact match for the
> DocBasedVersionConstraintsProcessorFactory described here...
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
>
> http://lucene.apache.org/solr/5_4_0/solr-core/org/apache/solr/update/processor/DocBasedVersionConstraintsProcessorFactory.html
>
>
>
> : Date: Mon, 14 Dec 2015 22:47:43 +0530
> : From: Debraj Manna 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Providing own _version field in solr doc
> :
> : We have a use case in which there are multiple clients writing
> concurrently
> : to solr. Each of the doc is having an 'timestamp' field which indicates
> : when these docs were generated.
> :
> : We also have to ensure that any old doc doesn't overwrite any new doc in
> : solr. So to achieve this we were thinking if we can make use of the
> : _version field in solr doc and set the _version field equal to the
> : 'timestamp' field that is present in each doc.
> :
> : Can someone let me know if the approach that we thought can be done? If
> not
> : can someone suggest some other approach of achieving the same with
> minimum
> : calls to solr?
> :
>
> -Hoss
> http://www.lucidworks.com/
>


Partial sentence match with block join

2015-12-14 Thread Yangrui Guo
Hello

I've been using 5.3.1. I would like to enable this feature: when a user
enters a query, the results should include documents that also partially
match the query. For example, the document is "Apple Company"
and the user query is "apple computer company", though the document is missing
the term "computer". I've tried phrase slop but it doesn't seem to be
working with block join. How can I do this in solr?

Thanks

Yangrui


Is DIH going to be removed from Solr future versions?

2015-12-14 Thread Anil Cherian
Dear Team,

I use DIH extensively and even wrote my own custom transformers in some
situations.
Recently during an architecture discussion one of my team members told me that
Solr is going to take away DIH from its future versions.

Is that true?

Also, is using DIH for, say, 2 or 3 million docs a good option for indexing an
XML content data set? I am planning to use it either by calling separate
entities in parallel or via multiple /dataimport handlers in solrconfig.xml.

Could you please reply at your earliest convenience as it is an important
decision for us to continue on DIH or not!

Thanks and Rgds,
Anil.


Re: solr cloud invalid shard/collection configuration

2015-12-14 Thread ig01
Hi, thanks for the answer.


We installed solr with solr.cmd -e cloud utility that comes with the
installation.
The names of the shards are odd because in this case, after the installation,
we migrated an old index from our other environment (which is a solr single
node) and split it with the Collection API SPLITSHARD command.
The splitting completed successfully, documents were spread almost equally
between the two shards and I was able to retrieve our old documents. After that
I deleted the old shard that was split (with the Collection API delete
command).
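
For reference, the split was issued with a Collection API call of this form
(the host is illustrative):

  curl 'http://10.200.101.132:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'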

Anyway this behavior is the same also for a regular solr cloud installation
with solr.cmd -e cloud, without any index migration...

We are indexing our documents by using the
url="http://10.1.20.31/8983/solr/collection1/;.
After the installation we indexed 4 documents and they all indexed on
the same shard. 

Thanks in advance,
Inna.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-cloud-invalid-shard-collection-configuration-tp4245151p4245419.html
Sent from the Solr - User mailing list archive at Nabble.com.


pf2 pf3 and stopwords

2015-12-14 Thread elisabeth benoit
Hello,

I am using solr 4.10.1. I have a field with stopwords

<filter class="solr.StopFilterFactory" words="stopwords.txt"
enablePositionIncrements="true"/>

And I use pf2 pf3 on that field with a slop of 0.

If the request is "Gare Saint Lazare", and I have a document "Gare de Saint
Lazare", "de" being a stopword, this document doesn't get the pf3 boost,
because of "de".

I was wondering, is this normal? is this a bug? is something wrong with my
configuration?

Best regards,
Elisabeth


[ANNOUNCE] Apache Solr 5.4.0 released

2015-12-14 Thread Upayavira
14 December 2015, Apache Solr™ 5.4 available

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 5.4 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:
  https://lucene.apache.org/solr/5_4_0/changes/Changes.html

Solr 5.4 Release Highlights:

New Features


UI Changes
 * The re-architected Admin UI is now prominently linked to from the 
   existing UI, and includes support for managing collections as well 
   as creating and removing fields via the schema tab. Expect it to 
   be default in the next release.

API Features
 * New Collections APIs for migrating from clusterstate.json to 
   per-collection state.json and forcing the election of a leader 
   when all replicas in a shard are down.
 * A new configset management API has been added.

Querying Features
 * Filter cache is now accessible via a solr query syntax.
 * ScoreJoins can now refer to a single-sharded collection that is 
   replicated on all nodes.
 * Add boost support, and 'exclude the queried document' in MoreLikeThis 
   QParser.
 * Add a 'sort' local param to the collapse QParser to support using 
   complex sort options to select the representative doc for each 
   collapsed group.

Other Features
 * SolrJ now has support for connecting to Solr using basic
 authentication.
 * Analyzing suggesters can now filter suggestions by a context field.
 * JSON Facet API: add "method" param to terms/field facets to give an 
   execution hint for what method should be used to facet.
 * CloneFieldUpdateProcessorFactory now supports choosing a "dest" field 
   name based on a regex pattern and replacement init options.
 * Provide pluggable context tool support for VelocityResponseWriter.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using
may not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.


Re: Security Problems

2015-12-14 Thread Noble Paul
". If all paths were closed by default, forgetting to configure a path
would not result in a security breach like today."

But it will still mean that unauthorized users are able to access,
like guest being able to post to "/update". Just authenticating is not
enough without proper authorization
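
For example, with the Rule-Based Authorization plugin the "/update" path can
be closed explicitly (role and user names below are illustrative):

  curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization \
    -H 'Content-type: application/json' \
    -d '{"set-permission": {"name":"update", "role":"indexer"},
         "set-user-role": {"indexuser": ["indexer"]}}'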

On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
>> 1) "read" should cover all the paths
>
> This is very fragile. If all paths were closed by default, forgetting to 
> configure a path would not result in a security breach like today.
>
> /Jan



-- 
-
Noble Paul


Re: Solr5.3.1 solrcloud Enabling Basic AUthentication

2015-12-14 Thread Noble Paul
You don't need to submit a sha256 hash; Solr will do it itself. Just use the
provided commands

please refer this
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
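
For example, once the plugin is enabled, a user is added (or a password
changed) with the edit API, and Solr computes the salted sha256 itself; the
solr:SolrRocks credentials below are the defaults from the ref guide example:

  curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication \
    -H 'Content-type: application/json' \
    -d '{"set-user": {"newuser": "newpassword"}}'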

On Mon, Dec 14, 2015 at 6:56 AM, soledede_w...@ehsy.com <
soledede_w...@ehsy.com> wrote:

> I want to restrict the Admin UI. I know I can configure security.json,
>
> but how?
>
>
> If I use a sha256(password+salt) hash, how do I submit the salt to the solr
> server?
>
> Who can give me a simple example?
> Thanks
>
> --
> soledede_w...@ehsy.com
>



-- 
-
Noble Paul


Re: Help Indexing Large File

2015-12-14 Thread Erick Erickson
Well, this usually means the maximum packet size has been exceeded,
there are several possibilities here that I'm going to skip over
because I have to ask the purpose of indexing a 5G file.

Indexing such a huge file has several problems from a user's perspective:
1> assuming the bulk of it is text, it'll be hit on many, many searches.
2> because it is so large, it'll probably rank quite a ways down the
list, users may rarely see it.
3> even if it is found and a user clicks on the doc, what then? You
can't reasonably fetch it from a server and display it.

In short, before diving into the mechanics of why you get this error
and correcting that, I'd be sure it made any sense to even try to
index this doc. It may, don't get me wrong. Just askin'.

Also, if this is some kind of binary file (say a movie or something),
and what you're trying to index is actually the metadata, consider
extracting that on the client side with SolrJ and/or Tika and just
sending the data you expect to index to Solr. This scales much better
than sending huge docs to Solr and letting that poor little server
extract _and_ index _and_ serve queries ;).

Best,
Erick

On Mon, Dec 14, 2015 at 9:04 AM, Antelmo Aguilar
 wrote:
> Hello,
>
> I am trying to index a very large file in Solr (around 5GB).  However, I
> get out of memory errors using Curl.  I tried using the post script and I
> had some success with it.  After indexing several hundred thousand records
> though, I got the following error message:
>
> *SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written*
>
> Would it be possible to get some help on where I can start looking to solve
> this issue?  I tried finding some type of log that would give me more
> information.  I have not had any luck.  The only logs I was able to find
> related to this error were the logs from Solr, but I assume these are from
> the "server" perspective and not "cient's" perspective of the error.  I
> would really appreciate the help.
>
> Thanks,
> Antelmo


Providing own _version field in solr doc

2015-12-14 Thread Debraj Manna
We have a use case in which there are multiple clients writing concurrently
to solr. Each of the docs has a 'timestamp' field which indicates
when these docs were generated.

We also have to ensure that any old doc doesn't overwrite any new doc in
solr. So to achieve this we were thinking if we can make use of the
_version field in solr doc and set the _version field equal to the
'timestamp' field that is present in each doc.

Can someone let me know if the approach that we thought can be done? If not
can someone suggest some other approach of achieving the same with minimum
calls to solr?


Best practice for incremental Data Import Handler

2015-12-14 Thread Gian Maria Ricci - aka Alkampfer
Hi,

 

I just want some feedback on best practice for running incremental DIH. In
past years I always preferred to have a dedicated application that pushes data
into ElasticSearch / Solr, but now I have a situation where we are forced
to use DIH.

 

I have several SQL Server databases with a column of type timestamp (I'm
trying to understand if it is possible to have a standard DateTime column).

 

In the past I've written a super simple C# routine that executes these macro
steps

 

1)  Query solr to understand if the DIH is running (to avoid problems if
multiple instances fire together)

2)  Query solr to get the document with the highest timestamp value

3)  Launch DIH passing the highest timestamp value to do incremental
population (Greater than or equal)

4)  Monitor DIH and wait for it to finish.

 

I never had problems with this approach, but I'm wondering if there
is some better approach instead of having a custom routine that manages
running DIH. Also, I'm in a situation where we are not allowed to run C#
code, so we would have to rewrite that simple program in Node.js or plain bash
shell; a rough sketch of that is below. My aim is not to reimplement the wheel :).
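
A minimal bash sketch of the four steps, assuming (these names are made up) a
core called "mycore", a DateTime column indexed as "last_modified", jq for
JSON parsing, and a DIH query that reads ${dataimporter.request.lastTs}:

  #!/usr/bin/env bash
  set -e
  SOLR="http://localhost:8983/solr/mycore"

  # 1) bail out if an import is already running
  state=$(curl -s "$SOLR/dataimport?command=status&wt=json" | jq -r '.status')
  [ "$state" = "busy" ] && exit 0

  # 2) highest timestamp currently in the index
  last=$(curl -s "$SOLR/select?q=*:*&sort=last_modified%20desc&rows=1&fl=last_modified&wt=json" \
    | jq -r '.response.docs[0].last_modified // "1970-01-01T00:00:00Z"')

  # 3) incremental import; the value arrives as ${dataimporter.request.lastTs}
  curl -s "$SOLR/dataimport?command=full-import&clean=false&commit=true&lastTs=$last" > /dev/null

  # 4) wait until DIH reports idle again
  until [ "$(curl -s "$SOLR/dataimport?command=status&wt=json" | jq -r '.status')" = "idle" ]; do
    sleep 10
  done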

 

Thanks for any suggestion you can give me.

--
Gian Maria Ricci
Cell: +39 320 0136949




Re: Help Indexing Large File

2015-12-14 Thread Toke Eskildsen
Antelmo Aguilar  wrote:
> I am trying to index a very large file in Solr (around 5GB).  However, I
>get out of memory errors using Curl.  I tried using the post script and I
> had some success with it.  After indexing several hundred thousand records
> though, I got the following error message:

This indicates that your file contains a lot of documents. The solution is to 
create smaller files and send more of them. Maybe a few hundred MB, to keep it 
manageable?
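
For a line-oriented format such as CSV this can be a simple shell loop (core
name is a placeholder; XML cannot be split blindly like this):

  head -1 big.csv > header.csv
  tail -n +2 big.csv | split -l 500000 - part_
  for f in part_*; do
    cat header.csv "$f" | curl -s 'http://localhost:8983/solr/mycore/update' \
      -H 'Content-Type: text/csv' --data-binary @-
  done
  curl 'http://localhost:8983/solr/mycore/update?commit=true'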

> *SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written*

A look in the postData method in SimplePostTool (at least for Solr 4.10, which 
is what my editor had open) reveals that it takes the length of the file as an 
Integer, which overflows when the file is more than 2GB. This means the 
HttpURLConnection that is used for posting gets the wrong expected size and 
throws the exception when that is exceeded.

A real fix (if it is not already in Solr 5) would be to fail fast if the file 
is larger than Integer.MAX_VALUE.

- Toke Eskildsen


Re: Providing own _version field in solr doc

2015-12-14 Thread Andrea Gazzarini
Hi Debraj,
I think this nice article [1] from Yonik could be helpful.

Andrea

[1] http://yonik.com/solr/optimistic-concurrency/

2015-12-14 18:17 GMT+01:00 Debraj Manna :

> We have a use case in which there are multiple clients writing concurrently
> to solr. Each of the doc is having an 'timestamp' field which indicates
> when these docs were generated.
>
> We also have to ensure that any old doc doesn't overwrite any new doc in
> solr. So to achieve this we were thinking if we can make use of the
> _version field in solr doc and set the _version field equal to the
> 'timestamp' field that is present in each doc.
>
> Can someone let me know if the approach that we thought can be done? If not
> can someone suggest some other approach of achieving the same with minimum
> calls to solr?
>


Re: Defining SOLR nested fields

2015-12-14 Thread Tom Evans
On Sun, Dec 13, 2015 at 6:40 PM, santosh sidnal
 wrote:
> Hi All,
>
> I want to define nested fields in SOLR using schema.xml. We are using Apache
> Solr 4.7.0.
>
> i see some links which say how to do it, but not sure how i can do it in
> schema.xml
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
>
>
> any help over here is appreciable.
>

With nested documents, it is better to not think of them as
"children", but as related documents. All the documents in your index
will follow exactly the same schema, whether they are "children" or
"parents", and the nested aspect of a a document simply allows you to
restrict your queries based upon that relationship.

Solr is extremely efficient dealing with sparse documents (docs with
only a few fields defined), so one way is to define all your fields
for "parent" and "child" in the schema, and only use the appropriate
ones in the right document. Another way is to use a schema-less
structure, although I'm not a fan of that for error checking reasons.
You can also define a suffix or prefix for fields that you use as part
of your methodology, so that you know what domain it belongs in, but
that would just be for your benefit, Solr would not complain if you
put a "child" field in a parent or vice-versa.

Cheers

Tom

PS:

I would not use Solr 4.7 for this. Nested docs are a new-ish feature,
you may encounter bugs that have been fixed in later versions, and
performance has certainly been improved in later versions. Faceting on
a specific domain (eg, on children or parents) is only supported by
the JSON facet API, which was added in 5.2, and the current stable
version of Solr is 5.4.


Re: Providing own _version field in solr doc

2015-12-14 Thread Alexandre Rafalovitch
At the first glance, this sounds like a perfect match to
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints

Just make sure your "timestamps" are truly atomic and not local
clock-based. The drift could cause interesting problems.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 14 December 2015 at 12:17, Debraj Manna  wrote:
> We have a use case in which there are multiple clients writing concurrently
> to solr. Each of the doc is having an 'timestamp' field which indicates
> when these docs were generated.
>
> We also have to ensure that any old doc doesn't overwrite any new doc in
> solr. So to achieve this we were thinking if we can make use of the
> _version field in solr doc and set the _version field equal to the
> 'timestamp' field that is present in each doc.
>
> Can someone let me know if the approach that we thought can be done? If not
> can someone suggest some other approach of achieving the same with minimum
> calls to solr?


Help Indexing Large File

2015-12-14 Thread Antelmo Aguilar
Hello,

I am trying to index a very large file in Solr (around 5GB).  However, I
get out of memory errors using Curl.  I tried using the post script and I
had some success with it.  After indexing several hundred thousand records
though, I got the following error message:

*SimplePostTool: FATAL: IOException while posting data:
java.io.IOException: too many bytes written*

Would it be possible to get some help on where I can start looking to solve
this issue?  I tried finding some type of log that would give me more
information.  I have not had any luck.  The only logs I was able to find
related to this error were the logs from Solr, but I assume these are from
the "server" perspective and not "cient's" perspective of the error.  I
would really appreciate the help.

Thanks,
Antelmo


Re: Providing own _version field in solr doc

2015-12-14 Thread Chris Hostetter

The _version_ field used for optimistic concurrency can't be user supplied 
-- it's not just a record of the *document's* version, but actually a 
record of the *update command* version -- so even deleteByQuery commands 
have one -- and the order must (internally) increase across all types of 
updates to a shard to ensure that they are executed in the correct order 
across all replicas.

However...

Your use case sounds like an exact match for the 
DocBasedVersionConstraintsProcessorFactory described here...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints
http://lucene.apache.org/solr/5_4_0/solr-core/org/apache/solr/update/processor/DocBasedVersionConstraintsProcessorFactory.html



: Date: Mon, 14 Dec 2015 22:47:43 +0530
: From: Debraj Manna 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Providing own _version field in solr doc
: 
: We have a use case in which there are multiple clients writing concurrently
: to Solr. Each doc has a 'timestamp' field which indicates when it was
: generated.
: 
: We also have to ensure that an old doc doesn't overwrite a newer doc in
: Solr. To achieve this, we were thinking we could use the _version_ field
: in the Solr doc and set it equal to the 'timestamp' field present in each
: doc.
: 
: Can someone let me know whether this approach will work? If not, can
: someone suggest another approach that achieves the same with a minimum of
: calls to Solr?
: 

-Hoss
http://www.lucidworks.com/


Re: solr cloud invalid shard/collection configuration

2015-12-14 Thread Erick Erickson
On a quick glance those look OK. What commands did you use _exactly_
to create your new collection? The names are a bit odd and it's not
clear how they could have gotten that way. How many documents have you
tried to index to your new collection? Any errors in the logs?

And how many documents are indexed to the single shard in your tests?
It's unlikely but possible that if you're using just a few docs they
all happen to hash into one shard.
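
If you want to sanity-check where a given id lands, here's a rough sketch
using the same MurmurHash3 function the compositeId router applies to the
uniqueKey (assuming a plain id with no "!" shard-key separator):

import org.apache.solr.common.util.Hash;

public class WhichShard {
  public static void main(String[] args) {
    String id = "doc-1";
    // The compositeId router hashes the whole id when it contains no "!".
    int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
    System.out.printf("id=%s hash=%08x%n", id, hash);
    // Compare against the hex ranges in clusterstate.json: in the two-shard
    // layout quoted below, 80000000-ffffffff is the negative half of the
    // signed 32-bit space and 0-7fffffff the non-negative half.
    System.out.println(hash < 0
        ? "lands in the shard covering 80000000-ffffffff"
        : "lands in the shard covering 0-7fffffff");
  }
}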

bq: In the existing environment, I see only 1 core for all shards
Also not sure what's going on here. Did you perhaps use the _core_
admin API on the original collection? The Collections API imposes a
naming convention.

We need to know the exact steps you used in each case, exactly how
you're indexing documents, etc.

Best,
Erick

On Mon, Dec 14, 2015 at 1:41 AM, ig01  wrote:
> I have an existing SolrCloud 4.4 cluster configured with ZooKeeper.
> The current setup is 3 shards; each shard has a leader and a replica. All
> are mapped to the same collection1.
>
> {"collection1":{
> "shards":{
>   "shard1":{
> "range":"8000-d554",
> "state":"active",
> "replicas":{
>   "core_node4":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.132:9983_solr",
> "base_url":"http://10.200.101.132:9983/solr;,
> "leader":"true"},
>   "core_node7":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.131:8983_solr",
> "base_url":"http://10.200.101.131:8983/solr"}}},
>   "shard2":{
> "range":"d555-2aa9",
> "state":"active",
> "replicas":{
>   "core_node2":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.131:9983_solr",
> "base_url":"http://10.200.101.131:9983/solr"},
>   "core_node5":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.133:8983_solr",
> "base_url":"http://10.200.101.133:8983/solr;,
> "leader":"true"}}},
>   "shard3":{
> "range":"2aaa-7fff",
> "state":"active",
> "replicas":{
>   "core_node3":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.132:8983_solr",
> "base_url":"http://10.200.101.132:8983/solr"},
>   "core_node6":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.200.101.133:9983_solr",
> "base_url":"http://10.200.101.133:9983/solr;,
> "leader":"true",
> "router":"compositeId"}}
>
>
>
> I have downloaded SolrCloud 5.2.1 and ran solr.cmd, creating almost the
> same setup with 2 shards; each shard has a leader and a replica.
>
> {"collection1":{
> "replicationFactor":"1",
> "shards":{
>   "shard1_0":{
> "range":"8000-",
> "state":"active",
> "replicas":{
>   "core_node3":{
> "core":"collection1_shard1_0_replica1",
> "base_url":"http://10.1.20.31:8983/solr;,
> "node_name":"10.1.20.31:8983_solr",
> "state":"active",
> "leader":"true"},
>   "core_node5":{
> "core":"collection1_shard1_0_replica2",
> "base_url":"http://10.1.20.31:7574/solr;,
> "node_name":"10.1.20.31:7574_solr",
> "state":"active"}}},
>   "shard1_1":{
> "range":"0-7fff",
> "state":"active",
> "replicas":{
>   "core_node4":{
> "core":"collection1_shard1_1_replica1",
> "base_url":"http://10.1.20.31:8983/solr;,
> "node_name":"10.1.20.31:8983_solr",
> "state":"active",
> "leader":"true"},
>   "core_node6":{
> "core":"collection1_shard1_1_replica2",
> "base_url":"http://10.1.20.31:7574/solr;,
> "node_name":"10.1.20.31:7574_solr",
> "state":"active",
> "router":{"name":"compositeId"},
> "maxShardsPerNode":"1",
> "autoAddReplicas":"false",
> "autoCreated":"true"}}
>
> The problem is that when I index a document to http://10.1.20.31:8983/solr,
> it is only indexed to collection1_shard1_0_replica1; the documents are not
> spread to the other shard. Why is that? Is Solr configured correctly?
> Also, in the existing environment, I see only 1 core for all shards, while
> in the new one there are 2 cores for each shard?
>
> Please advise.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-cloud-invalid-shard-collection-configuration-tp4245151.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR-7996

2015-12-14 Thread Upayavira


On Mon, Dec 14, 2015, at 06:20 PM, Jamie Johnson wrote:
> Has anyone looked at this issue? I'd be willing to take a stab at it if
> someone could provide some high level design guidance. This would be a
> critical piece preventing us from moving to version 5.

Just start working on it, Jamie.

Make a proposal for what you think could work. Post a patch that
demonstrates it. I suspect you'll be more likely to get traction
that way. Also, copy the content of that link into the description of
the ticket so people don't need to leave the ticket to get its full
context.

Upayavira


How to log only the collective response for a distributed search

2015-12-14 Thread Koorosh Vakhshoori
  In my use case, I have a number of shards across which a query runs as a
distributed search. I am not using SolrCloud, just a standalone Solr server.
When the search runs, I see one log entry for each shard sub-query as well
as the final collective search response. As a result, I end up with a very
noisy log. I don't care about the individual shard queries, just the
aggregate result. Is there a way to configure Solr so it only logs the final
collective response? I believe this use case also applies to SolrCloud.

  Looking at the Solr code, in class SolrCore, I see the following lines
performing the logging:

if (rsp.getToLog().size() > 0) {
  if (requestLog.isInfoEnabled()) {
    requestLog.info(rsp.getToLogAsString(logid));
  }
}

  I was thinking of adding a flag that filters out the per-shard log entries
by looking at the request 'params', checking for 'isShard=true', and
skipping the log line when it is present.
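
Something along these lines, as an untested sketch (ShardParams.IS_SHARD is
the existing constant for the 'isShard' parameter):

import org.apache.solr.common.params.ShardParams;

// Inside SolrCore, where the request logging currently happens:
boolean isShardRequest = req.getParams().getBool(ShardParams.IS_SHARD, false);
if (rsp.getToLog().size() > 0 && !isShardRequest) {
  if (requestLog.isInfoEnabled()) {
    requestLog.info(rsp.getToLogAsString(logid));
  }
}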

  Any suggestion or comment? Is this something people would be interested in?

Regards,

Koorosh



Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Shawn Heisey
On 12/14/2015 10:49 AM, Tom Evans wrote:
> When I tried this in SolrCloud mode, specifying
> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
> for the first collection, but then the second collection tried to use
> the same directory to store its index, which obviously failed. I fixed
> this by changing solrconfig.xml in each collection to specify a
> specific directory, like so:
>
>   <dataDir>${solr.data.dir:}products</dataDir>
>
> Looking back after the weekend, I'm not a big fan of this. Is there a
> way to add a core.properties to ZK, or a way to specify
> core.baseDatadir on the command line, or just a better way of handling
> this that I'm not aware of?

Since you're running SolrCloud, just let Solr handle the dataDir, don't
try to override it.  It will default to "data" relative to the
instanceDir.  Each instanceDir is likely to be in the solr home.

With SolrCloud, your cores will not contain a "conf" directory (unless
you create it manually), therefore the on-disk locations will be *only*
data, there's not really any need to have separate locations for
instanceDir and dataDir.  All active configuration information for
SolrCloud is in zookeeper.

Thanks,
Shawn



Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Tom Evans
Hi all

We're currently in the process of migrating our distributed search
running on 5.0 to SolrCloud running on 5.4, and setting up a test
cluster for performance testing etc.

We have several cores/collections, and in each core's solrconfig.xml,
we were specifying an empty <dataDir/>, and specifying the same
core.baseDataDir in core.properties.

When I tried this in SolrCloud mode, specifying
"-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
for the first collection, but then the second collection tried to use
the same directory to store its index, which obviously failed. I fixed
this by changing solrconfig.xml in each collection to specify a
specific directory, like so:

  <dataDir>${solr.data.dir:}products</dataDir>

Looking back after the weekend, I'm not a big fan of this. Is there a
way to add a core.properties to ZK, or a way to specify
core.baseDatadir on the command line, or just a better way of handling
this that I'm not aware of?

Cheers

Tom


SOLR-7996

2015-12-14 Thread Jamie Johnson
Has anyone looked at this issue? I'd be willing to take a stab at it if
someone could provide some high level design guidance. This would be a
critical piece preventing us from moving to version 5.

Jamie


Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Tom Evans
On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey  wrote:
> On 12/14/2015 10:49 AM, Tom Evans wrote:
>> When I tried this in SolrCloud mode, specifying
>> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>> for the first collection, but then the second collection tried to use
>> the same directory to store its index, which obviously failed. I fixed
>> this by changing solrconfig.xml in each collection to specify a
>> specific directory, like so:
>>
>>   <dataDir>${solr.data.dir:}products</dataDir>
>>
>> Looking back after the weekend, I'm not a big fan of this. Is there a
>> way to add a core.properties to ZK, or a way to specify
>> core.baseDatadir on the command line, or just a better way of handling
>> this that I'm not aware of?
>
> Since you're running SolrCloud, just let Solr handle the dataDir, don't
> try to override it.  It will default to "data" relative to the
> instanceDir.  Each instanceDir is likely to be in the solr home.
>
> With SolrCloud, your cores will not contain a "conf" directory (unless
> you create it manually), therefore the on-disk locations will be *only*
> data, there's not really any need to have separate locations for
> instanceDir and dataDir.  All active configuration information for
> SolrCloud is in zookeeper.
>

That makes sense, but I guess I was asking the wrong question :)

We have our SSDs mounted on /data/solr, which is where our indexes
should go, but our solr install is on /opt/solr, with the default solr
home in /opt/solr/server/solr. How do we change where the indexes get
put so they end up on the fast storage?

Cheers

Tom


Re: Help Indexing Large File

2015-12-14 Thread Jack Krupansky
What is the nature of the file? Is it Solr XML, CSV, PDF (via Solr Cell),
or... what? If it's a PDF, maybe it has lots of high-resolution images. If
so, you may need to strip out the images and just send the text, which would
be a lot smaller. For example, you could run Tika locally to extract the
text and then index the raw text.
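
For example, something along these lines (a rough sketch; it assumes the
tika-app jar is on the classpath, and the file name, core URL, and field
names are made up):

import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class ExtractThenIndex {
  public static void main(String[] args) throws Exception {
    Tika tika = new Tika();
    // parseToString() truncates at ~100k chars by default; -1 removes the cap.
    tika.setMaxStringLength(-1);
    String text = tika.parseToString(new File("big-file.pdf"));

    try (HttpSolrClient client =
             new HttpSolrClient("http://localhost:8983/solr/collection1")) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "big-file-1");
      doc.addField("text", text); // only the extracted text goes to Solr
      client.add(doc);
      client.commit();
    }
  }
}

For a multi-gigabyte source you'd still be holding the whole extracted
string in memory, so splitting it into several smaller documents may be
necessary.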

-- Jack Krupansky

On Mon, Dec 14, 2015 at 12:04 PM, Antelmo Aguilar  wrote:

> Hello,
>
> I am trying to index a very large file into Solr (around 5GB).  However, I
> get out-of-memory errors using curl.  I tried using the post script and had
> some success with it.  After indexing several hundred thousand records,
> though, I got the following error message:
>
> *SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written*
>
> Would it be possible to get some help on where I can start looking to solve
> this issue?  I tried finding some type of log that would give me more
> information, but I have not had any luck.  The only logs I was able to find
> related to this error were the logs from Solr, but I assume these are from
> the "server" perspective and not the "client's" perspective of the error.  I
> would really appreciate the help.
>
> Thanks,
> Antelmo
>


RE: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-12-14 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Anshum and Nobel,

I've downloaded 5.4, and this seems to be working so far

Thanks again

-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: Tuesday, December 01, 2015 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA

Hi Craig,

As part of my manual testing for the 5.3 RC, I tried out collection admin
request restriction and update restriction on a single-node setup. I don't
have the manual test steps documented, but it wasn't too intensive, I'll
admit. I think the complications involved in stopping specific nodes and
bringing them back up stop us from testing node restarts as part of the
automated tests, but we should find a way to fix that.

I've just found another issue, involving the "update" permission, and have
opened SOLR-8355 for it.

As far as patching 5.3.1 goes, it involves more than just this one patch,
and this patch alone wouldn't help you resolve this issue. You'd certainly
need the patch from SOLR-8167. Also, make sure you actually use the
'commit' and not the posted patch, as the patch on SOLR-8167 is different
from the commit. I don't think you'd need anything other than those patches,
plus whatever comes from SOLR-8355, to have a patched 5.3.1.

Any help in testing this out would be awesome and thanks for reporting and
following up on the issues!


On Tue, Dec 1, 2015 at 6:09 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you, Anshum and Nobel, for your progress on SOLR-8326
>
> I have a couple questions to tide me over until 5.4 (hoping to test
> security.json a bit further while I wait).
>
> Given that the seven steps (tar xvzf solr-5.3.1.tgz; tar xvzf
> zookeeper-3.4.6.tar.gz; zkServer.sh start zoo_sample.cfg; zkcli.sh -zkhost
> localhost:2181 -cmd putfile /security.json ~/security.json; solr start -e
> cloud -z localhost:2181; solr stop -p 7574 & solr start -c -p 7574 -s
> "example/cloud/node2/solr" -z localhost:2181) demonstrate the problem, are
> there a similar set of steps by which one can load _some_ minimal
> security.json and still be able to stop & successfully restart one node of
> the cluster? (I am wondering what steps were used in the original testing
> of 5.3.1)
>
> Also, has it been verified that the SOLR-8326 patch resolves the
> ADDREPLICA bug in addition to the
> shutdown-&-restart-one-node-while-keeping-another-node-running bug?
>
> Also, would it make sense for me to download solr-5.3.1-src.tgz and (in a
> test environment) make the changes described in the latest attachment to
> SOLR-8326? Or would it be more advisable just to wait for 5.4? I don't know
> what may be involved in compiling a new solr.war from the source code.
>
> Thanks again
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, November 24, 2015 1:25 PM
> To: solr-user 
> Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA
>
> bq: I don't suppose there is an ETA for 5.4?
>
> Actually, 5.4 is probably in the works within the next month. I'm not
> the one cutting the release, but there are rumors that a label will be
> cut this week, and then the "usual" process is a week or two (sometimes
> more if bugs are flushed out) before the official release.
>
> Call it the first of the year for safety's sake, but that's a guess.
>
> Best,
> Erick
>
> On Tue, Nov 24, 2015 at 10:22 AM, Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
> > Thanks for the reply,
> >
> > I don't suppose there is an ETA for 5.4?
> >
> >
> > Thanks again
> >
> > -Original Message-
> ...
>



-- 
Anshum Gupta


Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Erick Erickson
Currently, it'll be a little tedious but here's what you can do (going
partly from memory)...

When you create the collection, specify the special value EMPTY for
createNodeSet (Solr 5.3+).
Use ADDREPLICA to add each individual replica. When you do this, you
can add a dataDir for
each individual replica and thus keep them separate, i.e. for a
particular box the first
replica would get /data/solr/collection1_shard1_replica1, the second
/data/solr/collection1_shard2_replica1 and so forth.

If you don't have Solr 5.3+, you can still do the same thing, except
you create your collection letting
the replicas fall where they will. Then do the ADDREPLICA as above.
When that's all done,
DELETEREPLICA for the original replicas.
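
A rough SolrJ sketch of those two calls, using the plain Collections API
parameters (the collection name, node name, and directories below are made
up for illustration):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateWithExplicitDataDirs {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
      // CREATE with createNodeSet=EMPTY (5.3+) so no cores are placed yet.
      ModifiableSolrParams create = new ModifiableSolrParams();
      create.set("action", "CREATE");
      create.set("name", "products");
      create.set("numShards", "2");
      create.set("createNodeSet", "EMPTY");
      QueryRequest createReq = new QueryRequest(create);
      createReq.setPath("/admin/collections");
      client.request(createReq);

      // ADDREPLICA once per replica, each with its own dataDir on the
      // fast storage; repeat for every shard/replica you need.
      ModifiableSolrParams add = new ModifiableSolrParams();
      add.set("action", "ADDREPLICA");
      add.set("collection", "products");
      add.set("shard", "shard1");
      add.set("node", "10.1.20.31:8983_solr");
      add.set("dataDir", "/data/solr/products_shard1_replica1");
      QueryRequest addReq = new QueryRequest(add);
      addReq.setPath("/admin/collections");
      client.request(addReq);
    }
  }
}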

Best,
Erick

On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans  wrote:
> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey  wrote:
>> On 12/14/2015 10:49 AM, Tom Evans wrote:
>>> When I tried this in SolrCloud mode, specifying
>>> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>>> for the first collection, but then the second collection tried to use
>>> the same directory to store its index, which obviously failed. I fixed
>>> this by changing solrconfig.xml in each collection to specify a
>>> specific directory, like so:
>>>
>>>   <dataDir>${solr.data.dir:}products</dataDir>
>>>
>>> Looking back after the weekend, I'm not a big fan of this. Is there a
>>> way to add a core.properties to ZK, or a way to specify
>>> core.baseDatadir on the command line, or just a better way of handling
>>> this that I'm not aware of?
>>
>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
>> try to override it.  It will default to "data" relative to the
>> instanceDir.  Each instanceDir is likely to be in the solr home.
>>
>> With SolrCloud, your cores will not contain a "conf" directory (unless
>> you create it manually), therefore the on-disk locations will be *only*
>> data, there's not really any need to have separate locations for
>> instanceDir and dataDir.  All active configuration information for
>> SolrCloud is in zookeeper.
>>
>
> That makes sense, but I guess I was asking the wrong question :)
>
> We have our SSDs mounted on /data/solr, which is where our indexes
> should go, but our solr install is on /opt/solr, with the default solr
> home in /opt/solr/server/solr. How do we change where the indexes get
> put so they end up on the fast storage?
>
> Cheers
>
> Tom


Re: how to secure standalone solr

2015-12-14 Thread Ishan Chattopadhyaya
Hi Daniel,
That sounds good. It is a custom solution, and a way to secure just
about any server. I think Noble's point was about an out-of-the-box,
community-supported way of securing Solr.
Regards,
Ishan

On Mon, Dec 14, 2015 at 9:26 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Wait a second. There are other ways to secure Solr that don't rely on any
> sort of role-based security control.  What I do is place a reverse proxy
> in front of Apache Solr on port 80, and have that reverse proxy use CAS
> authentication.  I also have a list of "valid-users" who may use the Solr
> admin UI.
>
> Then, I have port 8983 open in my port-based host firewall (iptables), but
> it only allows the hosts that need to talk directly to Solr; the firewall
> blocks all other access.  Many security wrappers, such as mod_auth_cas
> (which works with Apache httpd), can set a request header such as
> REMOTE_USER to the username of the individual who has authenticated with
> the wrapper.  In fact, I'm hoping security.json can eventually be made to
> work with such a header.
>
> -Original Message-
> From: Noble Paul [mailto:noble.p...@gmail.com]
> Sent: Friday, December 11, 2015 8:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to secure standalone solr
>
> For standalone Solr, Kerberos is the only option for authentication.
> If you have a SolrCloud setup, you have other options:
>
>
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>
> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
>
> On Fri, Dec 11, 2015 at 11:02 PM, Don Bosco Durai 
> wrote:
> >>Can anyone tell me how to secure standalone Solr?
> > Recently there were a few discussions on this. In short, it is not
> > tested, and there doesn't seem to be a plan to test it.
> >
> >>1.) Is using the Kerberos plugin good practice, or is there another option?
> > The answer depends on how you are using it: where you are deploying it,
> > who is accessing it, whether you want to restrict by access type
> > (read/write), and what authentication environment (LDAP/AD, Kerberos,
> > etc.) you already have.
> >
> > Depending upon your use cases and environment, you may have one or more
> > options.
> >
> > Bosco
> >
> >
> >
> >
> >
> >
> > On 12/11/15, 4:27 AM, "Mugeesh Husain"  wrote:
> >
> >>Hello,
> >>
> >>Can anyone tell me how to secure standalone Solr?
> >>
> >>1.) Is using the Kerberos plugin good practice, or is there another option?
> >>
> >>
> >>
> >>--
> >>View this message in context:
> >>http://lucene.472066.n3.nabble.com/how-to-secure-standalone-solr-tp4244866.html
> >>Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> -
> Noble Paul
>