Re: SolrCloud 5.1 startup looking for standalone config

2015-06-05 Thread tuxedomoon
> I would need to look at the code to figure out how it works, but I would
> imagine that the shards are shuffled randomly among the hosts so that
> multiple collections will be evenly distributed across the cluster.  It
> would take me quite a while to familiarize myself with the code before I
> could figure out where to look.

The random assignment is OK; wherever shard3 is created will become node3
for my system, as long as each leader and replica pair remain partnered:

mycollection_shard1_replica1  -- mycollection_shard1_replica2
mycollection_shard2_replica1  -- mycollection_shard2_replica2
etc

Does this remain 'fixed' in Zookeeper once established, so that restarting
nodes will not affect their shardN assignments?
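A sketch of reading the current shard-to-node assignment straight from
Zookeeper, which is the state that persists across restarts (assuming
SolrJ 5.x and the zk/collection names used in this thread):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.connect();
ClusterState state = server.getZkStateReader().getClusterState();
// each Slice is a shard; each Replica is a core living on some node
for (Slice slice : state.getCollection("mycollection").getSlices()) {
    for (Replica replica : slice.getReplicas()) {
        System.out.println(slice.getName() + " -> " + replica.getNodeName()
            + " (" + replica.getName() + ")");
    }
}
server.shutdown();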




Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread tuxedomoon
Yes, adding _solr worked, thx.  But I also had to populate the SOLR_HOST
param for each of the 4 hosts, as in
SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com.  I'm in an EC2 VPC
environment, which might be the reason.

This command now works (leaving off the port):

http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr

The shard directories do now appear on s1, s2, s3, but the order is
different every time I DELETE the collection and rerun the CREATE. Right now it is:

s1: mycollection_shard2_replica1
s2: mycollection_shard3_replica1
s3: mycollection_shard1_replica1

I'll look further at your article, but any advice is appreciated on
controlling which hosts the shards land on.
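One parameter that may help here, assuming Solr 4.8 or later: CREATE
accepts createNodeSet.shuffle=false, which assigns shards to the listed
nodes in order instead of shuffling them, e.g.

http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr&createNodeSet.shuffle=false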

Also, are these considered leaders?  If so, I don't understand the replica1
suffix.






Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
I ran this command with Solr hosts s1 & s2 running.

http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983,s2:8983

I referred to this link,
http://heliosearch.org/solrcloud-assigning-nodes-machines/, which looks
like it is only passing the desired leaders to createNodeSet.

But I'm getting this error
-
Cannot create collection mycollection. Value of maxShardsPerNode is 1, and
the number of nodes currently live or live and part of your createNodeSet is
0. This allows a maximum of 0 to be created. Value of numShards is 2 and
value of replicationFactor is 1. This requires 2 shards to be created
(higher than the allowed number)

I get the same error with createNodeSet=s1:8983,s2:8983,s3:8983,s4:8983 with
all four Solr hosts running.

But the service status command shows that Zookeeper sees all my running
nodes:

Solr process 24603 running on port 8983
{
  "solr_home":"/volume/solr/data/",
  "version":"5.1.0 1672403 - timpotter - 2015-04-09 10:37:54",
  "startTime":"2015-06-02T18:00:06.665Z",
  "uptime":"0 days, 0 hours, 4 minutes, 35 seconds",
  "memory":"19.6 MB (%4) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"zk1:2181,zk2:2181,zk3:2181",
    "liveNodes":"4",
    "collections":"0"}}


I was expecting the absent maxShardsPerNode param to default to 1 and give
me 2 leaders, 2 replicas.
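The node-name format turned out to be the issue here; per the 2015-06-03
reply above, live_nodes entries register as host:port_solr, so a
createNodeSet that matches them (ports and suffix spelled out, host names
illustrative) would be:

http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983_solr,s2:8983_solr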





Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
ok thanks, continuing...

> (numShards in SOLR_OPTS isn't a good idea, what happens if you want to
> create a collection with 5 shards?)

Yes, I was following my old pattern: CATALINA_OPTS="${CATALINA_OPTS}
-DnumShards=n"

> down the nodes and nuke the directories you created by hand and bring the
> nodes back up

Yes, I did this.

> create the collection via the Collections API CREATE

I did this but kept getting "not running in SolrCloud mode".  Added the -c
option to my service script like this:

su -c "SOLR_INCLUDE=$SOLR_ENV $SOLR_INSTALL_DIR/bin/solr $SOLR_CMD -c" - $RUNAS

and it did start in cloud mode.  Is the -c necessary, and is that the right
place for it?  I thought uncommenting the ZK param in solr.in.sh would put
it in cloud mode.

Reran the CREATE and got a shard1 and shard2 in the GUI cloud view.   

New directories are arc_search_shard1_replica1 and
arc_search_shard2_replica1.  Is this because I have only 2 Solr hosts
running?  I'm used to adding nodes one by one and having the replica
assignments start once the numShards count is exceeded.
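For getting a second replica per shard later, once more hosts are up, one
option (assuming Solr 4.8+; host names here are illustrative) is the
Collections API ADDREPLICA call:

http://s1:8983/solr/admin/collections?action=ADDREPLICA&collection=arc_search&shard=shard1&node=s3:8983_solr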

Transitioning from 4.2 to 5.1 and it's quite different!







SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode.

1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3

2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2

3. uploaded a configset to zk1  (solr home is /volume/solr/data)
---
/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost 
zk1:2181  -confname mycollection_cloud_conf -solrhome /volume/solr/data
-confdir  /home/ec2-user/mycollection/conf


4. on s1, added these params to solr.in.sh
---
ZK_HOST=zk1:2181,zk2:2181,zk3:2181
SOLR_HOST=s1
ZK_CLIENT_TIMEOUT=15000
SOLR_OPTS="$SOLR_OPTS -DnumShards=2"


5. on s1 created core directory and file

/volume/solr/data/mycollection/core.properties (name=mycollection)


6. repeated steps 4 and 5 for s2, minus the numShards param


Starting the service on s1 gives me

mycollection:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load conf for core mycollection: Error loading solr config from
/volume/solr/data/mycollection/conf/solrconfig.xml 

but aren't the config files supposed to be in Zookeeper?  

Tux











Re: Reindex of document leaves old fields behind

2015-05-22 Thread tuxedomoon
This is fixed.  My SolrJ client was putting a JSON object into a multivalued
field in the SolrInputDocument.  Solr returned a 0 status code but did not
add the bad object; instead it performed what looks like an atomic update,
as described above.  Once I removed the illegal JSON object from the
SolrInputDocument, a regular document replacement occurred and my unwanted
fields were removed in Solr.

Is this a known behaviour, for Solr to switch into atomic update mode based
on attributes of the SolrInputDocument?
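For reference, a Map value on a field is read as an atomic-update
instruction, while a List is plain multivalued content.  A minimal sketch
of the two cases (field names are made up):

import java.util.*;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1234");

// List: ordinary multivalued value; the add replaces the whole document
doc.addField("tags_s_mv", Arrays.asList("drama", "comedy"));

// Map: parsed as an atomic-update command ("set", "add", "inc", ...),
// which merges into the stored document instead of replacing it
Map<String, Object> partial = new HashMap<String, Object>();
partial.put("set", "new value");
doc.addField("title_s", partial);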





Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs.

I just ran the same test via my SolrJ client and the result was the same:
the SolrCloud query always returns the correct number of fields.

Is there a way to find out which shard and replica a particular document
lives on?
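One way, assuming Solr 4.x or later: add the [shard] document transformer
to the fl list and the response reports which shard each hit came from
(host/collection names here are illustrative):

http://s1:8983/solr/mycollection/select?q=id:1234&fl=id,[shard]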





Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
A few further clues to this unresolved problem:

1. I found one of my 5 zookeeper instances was down
2. I tried another reindex of a bad document, but no change on the SOLR side
3. I deleted and reindexed the same doc; that worked (obviously, but at this
point I don't know what to expect)





Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
> If it is implicit then you may have indexed the new document to a
> different shard, which means that it is now in your index more than once,
> and which one gets returned may not be predictable.

If a document with uniqueKey 1234 is assigned to a shard by SolrCloud
implicit routing, won't a reindex of 1234 be assigned to the same shard?
If not, you'd have dups all over the cluster.





Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
> let's see the code.

Simplified code, with some comments:

1. solrUrl points at leader 1 of 3 leaders, each with a replica
2. createSolrDoc takes a full Mongo doc and returns a valid
SolrInputDocument
3. I have done dumps of the returned solrDoc and verified it does not have
the unwanted fields

SolrServer solrServer = new HttpSolrServer(solrUrl);
SolrInputDocument solrDoc = solrDocFactory.createSolrDoc(mongoDoc, dbName);
UpdateResponse uresponse = solrServer.add(solrDoc);


> issue a query on some of the unique ids in question

SolrCloud is returning only 1 document per uniqueKey.


> Did you push your schema up to Zookeeper and reload
> (or restart) your collection before re-indexing things?

No.  The config was pushed up to Zookeeper only once, a few months ago.  The
documents in question were updated in Mongo and given an updated
create_date.  Based on this new create_date my SolrJ client detects and
reindexes them.

> are you sure the documents are actually getting indexed and that the
> update is succeeding?

Yes, I see a new value in the timestamp field each time I reindex.
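A quick check that could go right after createSolrDoc, given the
Map-triggers-atomic-update theory from this thread, to flag any field whose
value SolrJ would serialize as an update command:

for (String name : solrDoc.getFieldNames()) {
    for (Object v : solrDoc.getFieldValues(name)) {
        if (v instanceof java.util.Map) {
            System.out.println("field '" + name + "' carries a Map value"
                + " and will be sent as an atomic update: " + v);
        }
    }
}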






Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem documents, based on this
comment from Shawn that I found on Grokbase.

> If you are trying to use a Map object as the value of a field, that is
> probably why it is interpreting your add request as an atomic update.
> If this is the case, and you're doing it because you have a multivalued
> field, you can use a List object rather than a Map.

This is just a solrDoc.toString() with line breaks where the commas were.
Maybe some of these are being seen as map fields by SOLR.
=
SolrInputDocument[

mynamespaces_s_mv=[drama],

changedates_s_mv=[Tue May 19 17:21:26 EDT 2015, Thu Dec 30 19:00:00 EST
],

networks_t_mv=[{ "abcitem-id" : "288578fd-6596-47bc-af95-80daecd1f24a" ,
"abccontentType" : "Standard:SocialHandle" , "SocialNetwork" : { "$uuid" :
"73553c4c-4919-4ba9-b16c-fb340f3e4c31"} , "Handle" : "in my
imaginationseries"}],

links_s_mv=[ { "$uuid" : "4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e"} , { "$uuid"
: "9fd75c26-35f2-4f48-b55a-6e82089cc3ba"} , { "$uuid" :
"150e43ed-9ebe-41b4-86cc-bdf4885a50fe"} , { "$uuid" :
"e20b0040-561f-4c34-9dd3-df85250b5a5b"} , { "$uuid" :
"0cff75d0-4f32-46c9-9092-60eec2dc847a"} , { "$uuid" :
"73553c4c-4919-4ba9-b16c-fb340f3e4c31"}],

ratings_t_mv=[{ "abcitem-id" : "56058649-579a-4160-9439-e59448eb3dff" ,
"abccontentType" : "Standard:TVPG" , "Rating" : { "$uuid" :
"150e43ed-9ebe-41b4-86cc-bdf4885a50fe"}}],

title_ci_t=in my imagination,

urlkey_s=in-my imagination,

title_cs_t=In My Imagination,

dp2_1_s_mv=[ { "_id" : { "$uuid" : "4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e"} ,
"_rules" : [ { "_startDate" : { "$date" : "2015-03-23T14:58:00.000Z"} ,
"_endDate" : { "$date" : "-12-31T00:00:00.000Z"} , "_r" : { "$uuid" :
"47b6b31d-d690-437a-9bab-6eeb7be3c8a4"} , "_p" : { "$uuid" :
"d478874f-8fc7-4b3d-97f3-f7e63222d633"} , "_o" : { "$uuid" :
"983b6ae9-7882-4af8-bb2f-cff342be99b3"} , "_a" : null }]}],

seriestype_s=e20b0040-561f-4c34-9dd3-df85250b5a5b,

shortid_s=x5jqqf,

shorttitle_t=In My Imagination,

uuid_s=90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,

status_s=DEFAULT,

updatedby_s=maceirar,

description_t=sometext,

review_s_mv=[{ "abcpublished" : { "$date" : "2015-05-19T21:21:30.930Z"} ,
"abcpublishedBy" : "jelly" , "abctargetEnvironment" :
"entertainment-staging" , "abcrequestId" : { "$uuid" :
"56769138-4a03-4ed6-8b29-8030d0941b08"} , "abcsourceEnvironment" : "fishing"
, "abcstate" : true}, { "abcpublished" : { "$date" :
"2015-05-19T21:21:31.731Z"} , "abcpublishedBy" : "jelly" ,
"abctargetEnvironment" : "myshow-live" , "abcrequestId" : { "$uuid" :
"56769138-4a03-4ed6-8b29-8030d0941b08"} , "abcsourceEnvironment" :
"myshow-staging" , "abcstate" : true}],

sorttitle_t=In My Imagination,

images_s_mv=[ { "$uuid" : "9fd75c26-35f2-4f48-b55a-6e82089cc3ba"} , {
"$uuid" : "0cff75d0-4f32-46c9-9092-60eec2dc847a"}],

title_ci_s=in my imagination,

firmuuids_s_mv=[ { "$uuid" : "4d8eb47c-ce2d-4e7f-a567-d8d6692fed4e"}],

id=mongo-v2.abcnservices.com-fishing-90a1fbbf-ddf8-47a7-9f00-55f05e7dc297,

timestamp=Thu May 21 17:29:58 EDT 2015

]
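The fields above that hold raw JSON (networks_t_mv, ratings_t_mv,
links_s_mv, dp2_1_s_mv, review_s_mv, images_s_mv, firmuuids_s_mv) look like
the suspects.  A sketch of one fix inside createSolrDoc, assuming the
values arrive as com.mongodb.DBObject instances: serialize them to JSON
strings so SolrJ never sees a raw Map:

// illustrative; "networks" is a hypothetical Mongo field name
for (Object value : (java.util.List<?>) mongoDoc.get("networks")) {
    if (value instanceof com.mongodb.DBObject) {
        solrDoc.addField("networks_t_mv", value.toString()); // JSON text, not a Map
    } else {
        solrDoc.addField("networks_t_mv", value);
    }
}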






Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my indexing to leader 1 and have not specified any router
configuration.  But there is an equal distribution of the 240M docs across
the 5 shards.  I think I've been stating I have 3 shards in these posts;
I have 5, sorry.

How do I know what kind of routing I am using?  
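One way to check, assuming a zkcli.sh like the one used elsewhere in these
threads: the router is recorded per collection in clusterstate.json, e.g.

/opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd get /clusterstate.json -zkhost zk1:2181

and, depending on version, shows up under the collection as
"router":"compositeId" or "router":{"name":"compositeId"} (or "implicit").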






Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK, it is compositeId.

I've just used post.sh to index a test doc with 3 fields to leader 1 of my
SolrCloud.  I then reindexed it with 1 field removed, and the query on it
shows 2 fields.  I repeated this a few times and always get the correct
field count from Solr.

I'm now wondering if SolrJ is somehow involved in performing an atomic
update rather than a replacement.  I will try the above test via SolrJ.





Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
I'm reindexing Mongo docs into SolrCloud.  The new docs have had a few
fields removed, so upon reindexing those fields should be gone in Solr.
They are not.  The result is a new doc merged with an old doc, rather than
the replacement I need.

I do not know whether the issue is with my SolrJ client, my Solr config, or
something else.







Re: Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
The uniqueKey value is the same.  

The new documents contain fewer fields than the already indexed ones.
Could this cause the updates to be treated as atomic, with the persisting
fields treated as un-updated?

Routing should be implicit, since the collection was created using
numShards.  Many requests for the same document with cache busting produce
the same unwanted fields, so I doubt the correct one is hiding somewhere.
I can also see the timestamp going up with each reindex.






Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn,

I can definitely upgrade to SolrJ 4.x and would prefer that so as to target
4.x cores as well.  I'm already on Java 7. 

One attempt I made was this:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.setParam("collection", collectionName);
updateRequest.setMethod(SolrRequest.METHOD.POST);
updateRequest.add(solrdoc);
UpdateResponse updateResponse = updateRequest.process(solrServer);

but I kept getting "Bad Request", which I suspect was a SOLR/SolrJ version
conflict.
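For what it's worth, a variant that skips the collection param and routes
by core path instead; a sketch, assuming SolrJ 4.x (with 3.5's
CommonsHttpSolrServer the same pattern should apply):

// one server object pointed at the base /solr URL, path chosen per request
SolrServer solrServer = new HttpSolrServer("http://myhost/solr");
UpdateRequest updateRequest = new UpdateRequest("/" + collectionName + "/update");
updateRequest.add(solrdoc);
UpdateResponse updateResponse = updateRequest.process(solrServer);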

I'm all ears!

Dan








Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
I have a SolrJ application that reads from a Redis queue and updates
different collections based on the message content.  New collections are
added without my knowledge, so I am creating SolrServer objects on the fly
as follows:

def solrHost = "http://myhost/solr/"  (defined at startup)

def solrTarget = solrHost + collectionName
SolrServer solrServer = new CommonsHttpSolrServer(solrTarget)
updateResponse = solrServer.add(solrdoc)


This does work, but it obviously creates a new CommonsHttpSolrServer
instance for each message.  I assume GC will eliminate these, but is there
a way to do this with a single SolrServer object?

The SOLR host is version 3.5 and I am using the 3.5 jars for my application
(not sure if that is necessary). 
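Short of a true single server object, one option is a small cache so each
collection gets exactly one instance; a sketch, assuming SolrJ 3.5 and
Java 7 (names are illustrative):

import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

ConcurrentHashMap<String, SolrServer> servers =
    new ConcurrentHashMap<String, SolrServer>();

SolrServer serverFor(String collectionName) throws Exception {
    SolrServer s = servers.get(collectionName);
    if (s == null) {
        SolrServer created = new CommonsHttpSolrServer(solrHost + collectionName);
        SolrServer prior = servers.putIfAbsent(collectionName, created);
        s = (prior != null) ? prior : created;  // keep whichever won the race
    }
    return s;
}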







Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn

I'm getting the "Bad Request" again with the original code snippet I
posted; it appears to be an 'illegal' string field.

SOLR log
-
INFO:
{add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7
Mar 12, 2015 12:15:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:
[doc=mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]
multiple values encountered for non multiValued field image_url_s:
[mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg,
mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg]
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)




The SolrJ log shows the doc being sent (this is the offending field only):

<field name="image_url_s">...</field>


I will investigate on the feeds side; the existing SolrJ code is not the
culprit.  But I'd still like a more elegant solution.  If a SolrJ 5 client
can talk to a 3.5 host, I'm willing to go there.  I know I'm not the only
one who would like to address collections on the fly.
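Since the error is two identical values landing in the non-multiValued
image_url_s, a defensive guard in the doc builder might look like this
(illustrative; imageUrl stands in for however the value is produced):

// only set single-valued fields once; drop duplicates coming from the feed
if (solrdoc.getFieldValue("image_url_s") == null) {
    solrdoc.addField("image_url_s", imageUrl);
}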

thx

Dan  





How to direct SOLR 4.9 log output to regular Tomcat logs

2015-03-06 Thread tuxedomoon
I want SOLR 4.9 to log to my rolling Tomcat logs, like
catalina.2015-03-06.log.  Instead I'm just getting a solr.log with no
timestamp.  Maybe this is just the way it has to be now?

I'm also not sure if I need to copy more SOLR jars into my tomcat lib.  

This is my setup.


tomcat6/conf/log4j.properties

log4j.rootLogger=debug, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=${catalina.home}/logs/tomcat.log
log4j.appender.R.MaxFileSize=10MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.catalina=DEBUG, R
log4j.logger.org.apache.catalina.core.ContainerBase.[Catalina].[localhost]=DEBUG,
R
log4j.logger.org.apache.catalina.core=DEBUG, R
log4j.logger.org.apache.catalina.session=DEBUG, R


tomcat6/conf/logging.properties
-
handlers = 1catalina.org.apache.juli.FileHandler,
2localhost.org.apache.juli.FileHandler,
3manager.org.apache.juli.FileHandler,
4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler,
java.util.logging.ConsoleHandler

1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = /data/tomcatlogs
1catalina.org.apache.juli.FileHandler.prefix = catalina.

2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = /data/tomcatlogs
2localhost.org.apache.juli.FileHandler.prefix = localhost.

3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
3manager.org.apache.juli.FileHandler.prefix = manager.

4host-manager.org.apache.juli.FileHandler.level = FINE
4host-manager.org.apache.juli.FileHandler.directory = /data/tomcatlogs
4host-manager.org.apache.juli.FileHandler.prefix = host-manager.

java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter =
java.util.logging.SimpleFormatter

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers =
2localhost.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers
= 3manager.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level
= INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers
= 4host-manager.org.apache.juli.FileHandler


copied solr-4.9.0/example/lib/ext/*.jar to tomcat6/lib, not the solrj-lib
and dist jars that some tutorials suggested
--
jcl-over-slf4j-1.7.6.jar
jul-to-slf4j-1.7.6.jar
log4j-1.2.17.jar
slf4j-api-1.7.6.jar
slf4j-log4j12-1.7.6.jar


copied ./solr-4.9.0/example/resources/log4j.properties to tomcat6/lib and
pointed solr.log to my chosen directory.  I also have a
tomcat6/conf/log4j.properties and don't know if I should delete it.
--
#  Logging level
solr.log=/data/tomcatlogs
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender

log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x –
%m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd
HH:mm:ss.SSS}; %C; %m\n

log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF 
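A sketch of one way to get dated, rolling Solr logs alongside the Catalina
ones, assuming stock log4j 1.2 (already on the classpath via
log4j-1.2.17.jar): switch the file appender to a daily pattern.  Note that
DailyRollingFileAppender rolls by date only, so MaxFileSize/MaxBackupIndex
no longer apply, and the date lands as a suffix (solr.log.2015-03-06)
rather than a prefix like catalina.2015-03-06.log.

log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n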





Re: Does shard splitting double host count

2015-03-02 Thread tuxedomoon
Shawn, in light of Garth's response below

You can't just add a new core to an existing collection.  You can add the
new node to the cloud, but it won't be part of any collection.  You're not
going to be able to just slide it in as a 4th shard to an established
collection of 3 shards.

how is it that you say I can just start up new hosts, especially without
modifying the numShards parameter from 3 to 4?  And then probably
reindexing, because the other options look risky (my company has no backup
system).






Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I currently have a SolrCloud with 3 shards plus replicas; it is holding
130M documents and the r3.large hosts are running out of memory.  As it's
on 4.2 there is no shard splitting, so I will have to reindex into a 4.3+
version.

If I had that feature, would I need to split each shard into 2 subshards,
resulting in a total of 6 subshards, in order to keep all shards relatively
equal?

And since host memory is the problem, I'd be migrating the subshards to new
hosts.  So it seems I'd be going from 6 hosts to 12.  Are these assumptions
correct, or is there a way to avoid doubling my host count?
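For reference, the 4.3+ call in question is SPLITSHARD, which creates the
two subshards on the same node as the parent; moving them to new hosts is a
separate step (host/collection names here are illustrative):

http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1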






Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
What about adding one new leader/replica pair?  It seems that would entail:

a) creating the r3.large instances and volumes
b) adding 2 new Zookeeper hosts?
c) updating my Zookeeper configs (new hosts, new ids, new SOLR config)
d) restarting all ZKs
e) restarting SOLR hosts in the sequence needed for correct shard/replica
assignment
f) start indexing again

So shards 1, 2, 3 start with 33% of the docs each.  As I start indexing,
new documents get sharded at 25% per shard.  If I reindex a document that
already exists in shard2, does it remain in shard2, or could it migrate to
another shard, thus removing it from shard2?

I'm looking for a migration strategy to achieve 25% of the docs per shard.
I would also consider deleting docs by date range from shards 1, 2, 3 and
reindexing them to redistribute evenly.






Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I'd forgotten that -DzkHost refers to the Zookeeper hosts, not the SOLR
hosts.  Thanks.






Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Great info.  Can I ask how much data you are handling with that 6G or 7G
heap?





Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Have you used a queue to intercept queries, and if so, what was your
implementation?  We are indexing huge amounts of data from 7 SolrJ
instances which run independently, so there's a lot of concurrent indexing.






Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
I applied the OPTS you pointed me to; here's the full string:

CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m -XX:CMSFullGCsBeforeCompaction=1
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
-XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages
-XX:+AggressiveOpts"

jConsole is now showing lower heap usage.  It had been climbing to 12G
consistently; now it only spikes to 10G every 10 minutes or so.

Here's my top output
===
  PID USER  PR  NI  VIRT  RES   SHR  S %CPU %MEM    TIME+  COMMAND
 4250 root  20   0  129g  14g   1.9g S  2.0 21.3  17:40.61 java











Re: SolrCloud OOM Problem

2014-08-12 Thread tuxedomoon
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory.  Hate
to ask this, but can you recommend Java memory and GC settings for 90G of
data and the above memory?  Currently I have:

CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:+UseConcMarkSweepGC"

Doesn't this mean I am starting with 5G and never going over 5G?

I've seen a few of those uninverted multi-valued field OOMs already on the
upgraded host.

Thanks

Tux






