Re: Setting solr.data.dir for SolrCloud instance

2013-11-26 Thread Erick Erickson
The data _is_ separated from the code. It's all relative
to solr_home which need not have any relation to where
the code is executing from.

For instance, I can start Solr like
java -Dsolr.solr.home=/Users/Erick/testdir/solr -jar start.jar

and have my war in a completely different place.

Best,
Erick


On Tue, Nov 26, 2013 at 1:08 AM, adfel70 adfe...@gmail.com wrote:

 Thanks for the reply, Erick.
 Actually, I didn't think this through. I just thought it would be a good
 idea to separate the data from the application code.
 I guess I'll leave it without setting the datadir parameter and add a
 symlink.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052p4103228.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting solr.data.dir for SolrCloud instance

2013-11-26 Thread adfel70
The problem we had was that we tried to run:
java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar
and found that Solr handles these two params differently.

We created 2 collections, which created 2 cores.
We then got 2 home dirs for the cores, as expected:
/opt/solr/home/collection1_shard1_replica1
/opt/solr/home/collection2_shard1_replica1

But instead of creating 2 data dirs like:
/opt/solr/data/collection1_shard1_replica1
/opt/solr/data/collection2_shard1_replica1

Solr had both cores' data dirs pointing to the same directory:
/opt/solr/data

When we tried putting a relative path in -Dsolr.data.dir, it worked as
expected.

I don't know if this is a bug, but we thought of 2 solutions in our case:
1. Point -Dsolr.data.dir to a relative path, and symlink that path to the
absolute path we wanted in the first place.
2. Don't provide -Dsolr.data.dir at all; Solr then puts the data dir
inside the home dir, which, as noted, works the same way as a relative path.

We chose the first option for now.
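
The first option could be sketched roughly as follows. This is only an illustration under assumptions: the link_core_data helper and the example paths are hypothetical, it assumes a relative data dir of "data" resolves to <solr home>/<core name>/data, and it assumes that path does not already exist as a real directory.

```shell
#!/bin/sh
# Sketch of workaround 1: keep -Dsolr.data.dir relative and make each core's
# relative data dir a symlink to the absolute location we actually want.
# link_core_data and all paths used with it are hypothetical.

# Usage: link_core_data SOLR_HOME DATA_ROOT CORE_NAME
link_core_data() {
    solr_home=$1
    data_root=$2
    core=$3
    # Create the real data directory and the core's home directory.
    mkdir -p "$data_root/$core" "$solr_home/$core"
    # -n: if the link already exists, replace it instead of following it.
    ln -sfn "$data_root/$core" "$solr_home/$core/data"
}

# Example calls, commented out so that sourcing this file has no side effects:
# link_core_data /opt/solr/home /opt/solr/data collection1_shard1_replica1
# link_core_data /opt/solr/home /opt/solr/data collection2_shard1_replica1
# java -Dsolr.data.dir=data -Dsolr.solr.home=/opt/solr/home -jar start.jar
```

With links like these in place, each core would keep writing to its own "data" path while the index files land under /opt/solr/data/<core name>.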





Erick Erickson wrote
 The data _is_ separated from the code. It's all relative
 to solr_home which need not have any relation to where
 the code is executing from.
 
 For instance, I can start Solr like
 java -Dsolr.solr.home=/Users/Erick/testdir/solr -jar start.jar
 
 and have my war in a completely different place.
 
 Best,
 Erick
 
 
 On Tue, Nov 26, 2013 at 1:08 AM, adfel70 wrote:
 
 Thanks for the reply, Erick.
 Actually, I didn't think this through. I just thought it would be a good
 idea to separate the data from the application code.
 I guess I'll leave it without setting the datadir parameter and add a
 symlink.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052p4103334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting solr.data.dir for SolrCloud instance

2013-11-26 Thread Mark Miller

On Nov 25, 2013, at 8:12 AM, adfel70 adfe...@gmail.com wrote:

 I was expecting that the path I sent would serve as the BASE path for all
 cores that the node hosts

When running Solr on HDFS, there is a similar property you can use:
-Dsolr.hdfs.home. If you set that, all data dirs are created nicely under it.
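
A minimal sketch of what such a start line might look like. Only -Dsolr.hdfs.home comes from this thread; the directoryFactory and lock.type settings are what the Solr reference guide's HDFS section pairs with it, so check them against your version. The hdfs_start_cmd helper and the HDFS URI are hypothetical, and the helper only prints the command rather than starting Solr.

```shell
#!/bin/sh
# Dry-run sketch: prints a start command instead of launching Solr.
# hdfs_start_cmd is a hypothetical helper; the HDFS URI below is a placeholder.

# Usage: hdfs_start_cmd HDFS_HOME_URI
hdfs_start_cmd() {
    hdfs_home=$1
    printf 'java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=%s -jar start.jar\n' "$hdfs_home"
}

hdfs_start_cmd "hdfs://namenode.example.com:8020/solr"
```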

We talked about wanting a similar option for SolrCloud and the local filesystem
a while back. If there is no JIRA issue for it, please file one!

- Mark



Re: Setting solr.data.dir for SolrCloud instance

2013-11-26 Thread Shawn Heisey

On 11/26/2013 9:19 AM, adfel70 wrote:

 The problem we had was that we tried to run:
 java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar
 and found that Solr handles these two params differently.

 We created 2 collections, which created 2 cores.
 We then got 2 home dirs for the cores, as expected:
 /opt/solr/home/collection1_shard1_replica1
 /opt/solr/home/collection2_shard1_replica1

 But instead of creating 2 data dirs like:
 /opt/solr/data/collection1_shard1_replica1
 /opt/solr/data/collection2_shard1_replica1

 Solr had both cores' data dirs pointing to the same directory:
 /opt/solr/data

 When we tried putting a relative path in -Dsolr.data.dir, it worked as
 expected.

 I don't know if this is a bug, but we thought of 2 solutions in our case:
 1. Point -Dsolr.data.dir to a relative path, and symlink that path to the
 absolute path we wanted in the first place.
 2. Don't provide -Dsolr.data.dir at all; Solr then puts the data dir
 inside the home dir, which, as noted, works the same way as a relative path.

 We chose the first option for now.


The dataDir is a per-core setting; you cannot set it for the entire
application. If you make it relative, then it will be relative to each
individual instanceDir. It defaults to ./data, so you get
$instanceDir/data as the location.
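
The rule above can be illustrated with a small sketch. It only mimics the described behaviour and is not Solr's code; resolve_data_dir is a hypothetical helper, and the core names are the ones from the quoted message.

```shell
#!/bin/sh
# Illustration only: mimics the dataDir resolution rule described above.
# resolve_data_dir is a hypothetical helper, not part of Solr.

# Usage: resolve_data_dir INSTANCE_DIR [DATA_DIR]
resolve_data_dir() {
    instance_dir=$1
    data_dir=${2:-data}    # no value given: behaves like the ./data default
    case $data_dir in
        /*) printf '%s\n' "$data_dir" ;;                      # absolute: same path for every core
        *)  printf '%s\n' "$instance_dir/${data_dir#./}" ;;   # relative: under the core's instanceDir
    esac
}

for core in collection1_shard1_replica1 collection2_shard1_replica1; do
    resolve_data_dir "/opt/solr/home/$core" /opt/solr/data   # absolute value
    resolve_data_dir "/opt/solr/home/$core" data             # relative value
done
```

For both cores the absolute value resolves to the single shared /opt/solr/data, while the relative value resolves to a separate data directory under each core's instanceDir.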


Thanks,
Shawn



Re: Setting solr.data.dir for SolrCloud instance

2013-11-25 Thread Erick Erickson
The first thing I'd do is not send an absolute path. What
happens if you just send -Dsolr.data.dir=data (no '/')?

We had this discussion a while ago when we were working
on auto-discovery, and it turns out that
there _are_ legitimate cases in which more than one
core/collection can point to the same data dir. You have to very
carefully control who writes to the core, and I wouldn't do it
unless there was no choice, but some people find it useful.

And, in general, I wouldn't mix and match the _core_ admin API
with the _collections_ api unless you're very confident in what
you are doing.

Why isn't just using the default data.dir location working for you?
There are good reasons to make it explicit; I'm mostly just checking
that you're not over-thinking the problem. Usually the data dirs will be
located in a reasonable place.

Best,
Erick



On Mon, Nov 25, 2013 at 8:12 AM, adfel70 adfe...@gmail.com wrote:

 I found something strange while trying to create more than one collection
 in SolrCloud:
 I am running every instance with -Dsolr.data.dir=/data
 If I look at the Core Admin section, I can see that I have one core and its
 dataDir is set to this fixed location. The problem is, if I create a new
 collection, another core is created, but with this same fixed index
 location.
 I was expecting that the path I sent would serve as the BASE path for all
 cores that the node hosts. The current behaviour seems like a bug to me,
 because obviously one collection will see data that was not indexed to it.
 Is there a way to overcome this? I mean, change the default data dir
 location, but still be able to create more than one collection correctly?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting solr.data.dir for SolrCloud instance

2013-11-25 Thread adfel70
Thanks for the reply, Erick.
Actually, I didn't think this through. I just thought it would be a good
idea to separate the data from the application code.
I guess I'll leave it without setting the datadir parameter and add a
symlink.


