RE: Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Cahill, Trey
Hi Amarnath, 

From this log snippet:
"
2017-04-12 17:53:44.900 INFO
 (searcherExecutor-12-thread-1-processing-x:amar1) [   x:amar1]
o.a.s.c.SolrCore [amar1] Registered new searcher Searcher@3f61e7f2[amar1]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_16(6.4.2):c97790)
Uninverting(_17(6.4.2):C236640) Uninverting(_b(6.4.2):C51852)
Uninverting(_d(6.4.2):C4) Uninverting(_f(6.4.2):C1)
Uninverting(_o(6.4.2):C33360) Uninverting(_r(6.4.2):C40358)
Uninverting(_y(6.4.2):C6) Uninverting(_14(6.4.2):C1) 
Uninverting(_15(6.4.2):C1)))}
2017-04-12 17:56:22.799 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCores Opening transient core amar1
2017-04-12 17:56:22.837 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.S.Request [amar1]  webapp=/solr path=/select params={q=*:*=0}
hits=59 status=0 *QTime=243787*
"

It does look like reading the data into Solr from S3 is being slow.

Running Solr on an EC2 instance in the same AWS region as your S3 bucket should 
help.  While you’re in AWS, using VPC endpoints should also help with 
performance. From your logs, it looks like you're running from your laptop.

It looks like you’re using s3a, which is a good start.  On a side note,  Hadoop 
2.8 has recently been released 
(https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release), which 
includes some work for s3a. Not promising any performance improvements if you 
use s3a with Hadoop 2.8, but it's probably the best way to access S3 right now.

Finally,  remember that S3 is a service; if the S3 service is slow (for example 
due to a heavy stream of request), then your operations with S3 will also be 
slow.  

Hope this helps and good luck, 

Trey

-Original Message-
From: Amarnath palavalli [mailto:pamarn...@gmail.com] 
Sent: Wednesday, April 12, 2017 2:09 PM
To: solr-user@lucene.apache.org
Subject: Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

Hello,

I am using S3 as the primary store for data directory of core. To achieve this, 
I have the following in Solrconfig.xml:


  **
*  s3a://amar-hdfs/solr*
*  /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop*
*  true*
*  4*
*  true*
*  16384*
*  true*
*  true*
*  16*
*  192*
*  *

When I access the core 'amar1' it is taking like 245 seconds to load the core 
of total size about 85 MB. Here is the complete solr.log for core
loading:

2017-04-12 17:52:19.079 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrResourceLoader [amar1] Added 57 libs to classloader, from
paths: [/Users/apalavalli/solr/solr-deployment/contrib/clustering/lib,
/Users/apalavalli/solr/solr-deployment/contrib/extraction/lib,
/Users/apalavalli/solr/solr-deployment/contrib/langid/lib,
/Users/apalavalli/solr/solr-deployment/contrib/velocity/lib,
/Users/apalavalli/solr/solr-deployment/dist,
/Users/apalavalli/solr/solr-deployment/dist/lib2]
2017-04-12 17:52:19.109 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.4.2
2017-04-12 17:52:19.155 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema [amar1] Schema name=log-saas
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema Loaded schema log-saas/1.6 with uniqueid field id
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.CoreContainer Creating SolrCore 'amar1' using configuration from 
configset
/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory solr.hdfs.home=s3a://amar-hdfs/solr
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Solr Kerberos Authentication disabled
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCore [[amar1] ] Opening new SolrCore at 
[/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3],
dataDir=[s3a://amar-hdfs/solr/amar1/data/]
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.JmxMonitoredMap JMX monitoring is enabled. Adding Solr mbeans to JMX 
Server: com.sun.jmx.mbeanserver.JmxMBeanServer@5745ca0e
2017-04-12 17:52:19.236 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path 
s3a://amar-hdfs/solr/amar1/data/snapshot_metadata
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct 
memory allocation set to [true]
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of 
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Creating new global HDFS BlockCache
2017-04-12 17:52:19.888 WARN  (qtp1654589030-18) [   x:amar1]
o.a.h.u.NativeCodeLoader Unable to load native-hadoop library for your 
platform... using builtin-java classes where 

Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Amarnath palavalli
Hello,

I am using S3 as the primary store for data directory of core. To achieve
this, I have the following in Solrconfig.xml:


  **
*  s3a://amar-hdfs/solr*
*  /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop*
*  true*
*  4*
*  true*
*  16384*
*  true*
*  true*
*  16*
*  192*
*  *

When I access the core 'amar1' it is taking like 245 seconds to load the
core of total size about 85 MB. Here is the complete solr.log for core
loading:

2017-04-12 17:52:19.079 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrResourceLoader [amar1] Added 57 libs to classloader, from
paths: [/Users/apalavalli/solr/solr-deployment/contrib/clustering/lib,
/Users/apalavalli/solr/solr-deployment/contrib/extraction/lib,
/Users/apalavalli/solr/solr-deployment/contrib/langid/lib,
/Users/apalavalli/solr/solr-deployment/contrib/velocity/lib,
/Users/apalavalli/solr/solr-deployment/dist,
/Users/apalavalli/solr/solr-deployment/dist/lib2]
2017-04-12 17:52:19.109 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.4.2
2017-04-12 17:52:19.155 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema [amar1] Schema name=log-saas
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema Loaded schema log-saas/1.6 with uniqueid field id
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.CoreContainer Creating SolrCore 'amar1' using configuration from
configset
/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory solr.hdfs.home=s3a://amar-hdfs/solr
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Solr Kerberos Authentication disabled
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCore [[amar1] ] Opening new SolrCore at
[/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3],
dataDir=[s3a://amar-hdfs/solr/amar1/data/]
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.JmxMonitoredMap JMX monitoring is enabled. Adding Solr mbeans to
JMX Server: com.sun.jmx.mbeanserver.JmxMBeanServer@5745ca0e
2017-04-12 17:52:19.236 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data/snapshot_metadata
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct
memory allocation set to [true]
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Creating new global HDFS BlockCache
2017-04-12 17:52:19.888 WARN  (qtp1654589030-18) [   x:amar1]
o.a.h.u.NativeCodeLoader Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
2017-04-12 17:52:20.759 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.b.BlockDirectory Block cache on write is disabled
2017-04-12 17:52:21.074 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data
2017-04-12 17:52:21.659 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data/index
2017-04-12 17:52:21.670 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct
memory allocation set to [true]
2017-04-12 17:52:21.671 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:21.947 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.b.BlockDirectory Block cache on write is disabled
2017-04-12 17:52:22.058 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
2017-04-12 17:52:22.112 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.UpdateHandler Using UpdateLog implementation:
org.apache.solr.update.UpdateLog
2017-04-12 17:52:22.112 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH
numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
2017-04-12 17:52:22.128 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 1ms;
2017-04-12 17:52:22.128 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.CommitTracker Soft AutoCommit: if uncommited for 5000ms;
2017-04-12 17:53:44.573 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.SolrIndexSearcher Opening [Searcher@3f61e7f2[amar1] main]
2017-04-12 17:53:44.575 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.r.ManagedResourceStorage File-based storage initialized to use dir: