Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Amarnath palavalli
Hello,

I am using S3 as the primary store for the core's data directory. To achieve
this, I have the following in solrconfig.xml:


  s3a://amar-hdfs/solr
  /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
  true
  4
  true
  16384
  true
  true
  16
  192
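
For reference, the block cache size that HdfsDirectoryFactory reports in the log below can be reproduced from these settings. This is only a minimal sketch. It assumes the values 4 and 16384 above are the slab count and the number of blocks per slab, and it assumes an 8192-byte cache block (the block size is not stated in the post; it is inferred from the logged slab size of 134217728 bytes divided by 16384). The function name is hypothetical.

```python
# Assumption: 8192-byte cache blocks, inferred from 134217728 / 16384.
BLOCK_SIZE_BYTES = 8192


def block_cache_bytes(slab_count, blocks_per_slab, block_size=BLOCK_SIZE_BYTES):
    """Return (slab_size_bytes, total_bytes) for the given block cache settings."""
    slab_size = blocks_per_slab * block_size
    return slab_size, slab_size * slab_count


slab_size, total = block_cache_bytes(slab_count=4, blocks_per_slab=16384)
print(slab_size, total)  # 134217728 536870912, matching the log entries
```

Under these assumptions the configuration asks for 512 MiB of direct memory for the block cache.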

When I access the core 'amar1', it takes about 245 seconds to load, although
the total size of the core is only about 85 MB. Here is the complete solr.log
for the core load:

2017-04-12 17:52:19.079 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrResourceLoader [amar1] Added 57 libs to classloader, from
paths: [/Users/apalavalli/solr/solr-deployment/contrib/clustering/lib,
/Users/apalavalli/solr/solr-deployment/contrib/extraction/lib,
/Users/apalavalli/solr/solr-deployment/contrib/langid/lib,
/Users/apalavalli/solr/solr-deployment/contrib/velocity/lib,
/Users/apalavalli/solr/solr-deployment/dist,
/Users/apalavalli/solr/solr-deployment/dist/lib2]
2017-04-12 17:52:19.109 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.4.2
2017-04-12 17:52:19.155 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema [amar1] Schema name=log-saas
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema Loaded schema log-saas/1.6 with uniqueid field id
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.CoreContainer Creating SolrCore 'amar1' using configuration from
configset
/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory solr.hdfs.home=s3a://amar-hdfs/solr
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Solr Kerberos Authentication disabled
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCore [[amar1] ] Opening new SolrCore at
[/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3],
dataDir=[s3a://amar-hdfs/solr/amar1/data/]
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.JmxMonitoredMap JMX monitoring is enabled. Adding Solr mbeans to
JMX Server: com.sun.jmx.mbeanserver.JmxMBeanServer@5745ca0e
2017-04-12 17:52:19.236 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data/snapshot_metadata
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct
memory allocation set to [true]
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Creating new global HDFS BlockCache
2017-04-12 17:52:19.888 WARN  (qtp1654589030-18) [   x:amar1]
o.a.h.u.NativeCodeLoader Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
2017-04-12 17:52:20.759 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.b.BlockDirectory Block cache on write is disabled
2017-04-12 17:52:21.074 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data
2017-04-12 17:52:21.659 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path
s3a://amar-hdfs/solr/amar1/data/index
2017-04-12 17:52:21.670 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct
memory allocation set to [true]
2017-04-12 17:52:21.671 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:21.947 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.b.BlockDirectory Block cache on write is disabled
2017-04-12 17:52:22.058 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
2017-04-12 17:52:22.112 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.UpdateHandler Using UpdateLog implementation:
org.apache.solr.update.UpdateLog
2017-04-12 17:52:22.112 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH
numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
2017-04-12 17:52:22.128 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 1ms;
2017-04-12 17:52:22.128 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.u.CommitTracker Soft AutoCommit: if uncommited for 5000ms;
2017-04-12 17:53:44.573 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.SolrIndexSearcher Opening [Searcher@3f61e7f2[amar1] main]
2017-04-12 17:53:44.575 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.r.ManagedResourceStorage File-based storage initialized to use dir:
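
Most of the time visible in this log falls between two consecutive entries: 17:52:22.128 (CommitTracker) and 17:53:44.573 (SolrIndexSearcher). Below is a small sketch for finding such pauses in a solr.log, assuming every entry starts with a `yyyy-MM-dd HH:mm:ss.SSS` timestamp as above. The function name `largest_gap` is hypothetical, and the sample lines are copied from the log.

```python
import re
from datetime import datetime

# Matches the timestamp that starts each Solr log entry.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def largest_gap(lines):
    """Return (seconds, earlier, later) for the longest pause between entries."""
    stamps = []
    for line in lines:
        match = TS.match(line)
        if match:  # continuation lines without a timestamp are skipped
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = (later - earlier).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, earlier, later)
    return best


sample = [
    "2017-04-12 17:52:22.112 INFO  (qtp1654589030-18) [   x:amar1]",
    "2017-04-12 17:52:22.128 INFO  (qtp1654589030-18) [   x:amar1]",
    "o.a.s.u.CommitTracker Soft AutoCommit: if uncommited for 5000ms;",
    "2017-04-12 17:53:44.573 INFO  (qtp1654589030-18) [   x:amar1]",
    "2017-04-12 17:53:44.575 INFO  (qtp1654589030-18) [   x:amar1]",
]
gap, start, end = largest_gap(sample)
print(f"{gap:.3f}s between {start.time()} and {end.time()}")
```

On this excerpt the longest pause is roughly 82 seconds, just before the searcher is opened. The log as posted ends shortly afterwards, so it does not account for the full 245 seconds.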

Re: Solr with HDFS on AWS S3 - Server restart fails to load the core

2017-04-07 Thread Amarnath palavalli
Hi Kevin,
Sorry for not being clear in my response.
What I meant is that setting the attribute loadOnStartup=true does not help. I
see the same issue as in the posted images, 'connection to solr lost', when
choosing the core. I also don't see any errors in the log at DEBUG level.

Thanks,
Amar

On Fri, Apr 7, 2017 at 3:38 PM, Kevin Risden <compuwizard...@gmail.com>
wrote:

> > Thank you for the response. Setting “loadOnStartup=true” results in showing
> > the connection timeout on clicking 'Core Admin' on Solr UI. Also, reload
> > does not work as the core is not loaded at all.
>
>
> Can you clarify what you mean by this? Does the core get loaded after you
> restart Solr?
>
> The initial description was the core wasn't loaded after Solr was
> restarted. What you are describing now is different I think.
>
> Kevin Risden


Re: Solr with HDFS on AWS S3 - Server restart fails to load the core

2017-04-07 Thread Amarnath palavalli
Hi Trey,

Thank you for the response. Setting “loadOnStartup=true” results in a
connection timeout when clicking 'Core Admin' in the Solr UI. Also, reload
does not work because the core is not loaded at all.

I suspect it has something to do with the HTTP connection idle time; probably
the connection is closed before the data is pulled from S3. I see that
'maxUpdateConnectionIdleTime' is 40 seconds by default. However, I don't know
how to change it.

Thanks,
Amar



On Fri, Apr 7, 2017 at 12:47 PM, Cahill, Trey <trey.cah...@siemens.com>
wrote:

> Hi Amarnath,
>
> It looks like you’ve set the core to not load on startup via the
> “loadOnStartup=false” property.   Your response also shows that the core is
> not loaded, “false”.
>
> I’m not really sure how to load cores after a restart, but possibly using
> the Core Admin Reload would do it
> (https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD).
>
> Best of luck,
>
> Trey
>
> From: Amarnath palavalli [mailto:pamarn...@gmail.com]
> Sent: Friday, April 07, 2017 3:20 PM
> To: solr-user@lucene.apache.org
> Subject: Solr with HDFS on AWS S3 - Server restart fails to load the core
>
> Hello,
>
> I configured Solr to use HDFS, which in turn configured to use S3N. I used
> the information from this issue to configure:
> https://issues.apache.org/jira/browse/SOLR-9952
>
> Here is the command I have used to start the Solr with HDFS:
> bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory
> -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=s3n://amar-hdfs/solr
> -Dsolr.hdfs.confdir=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
> -DXX:MaxDirectMemorySize=2g
>
> I am able to create a core, with the following properties:
> #Written by CorePropertiesLocator
> #Thu Apr 06 23:08:57 UTC 2017
> name=amar-s3
> loadOnStartup=false
> transient=true
> configSet=base-config
>
> I am able to ingest messages into Solr and also query the content.
> Everything seems to be fine until this stage and I can see the data dir on
> S3.
>
> However, the problem is when I restart the Solr server, that is when I see
> the core not loaded even when accessed/queried against it. Here is the
> admin API to get all cores gives:
> 0
> 617
> ...
> amar-s3
> /Users/apalavalli/solr/solr-deployment/server/solr/amar-s3
> data/
> solrconfig.xml
> schema.xml
> false
>
> I don't see any issues reported in the log as well, but see this error
> from the UI:
>
> [Inline image 1]
>
>
> Not sure about the problem. This is happening when I ingest more than 40K
> messages in core before restarting Solr server.
>
> I am using Hadoop 2.7.3 with S3N FS. Please help me on resolving this
> issue.
>
> Thanks,
> Regards,
> Amar
>
>
>
>
>


Solr with HDFS on AWS S3 - Server restart fails to load the core

2017-04-07 Thread Amarnath palavalli
Hello,

I configured Solr to use HDFS, which in turn is configured to use S3N. I used
the information from this issue for the configuration:
https://issues.apache.org/jira/browse/SOLR-9952

Here is the command I used to start Solr with HDFS:

bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory
-Dsolr.lock.type=hdfs -Dsolr.hdfs.home=s3n://amar-hdfs/solr
-Dsolr.hdfs.confdir=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
-DXX:MaxDirectMemorySize=2g

I am able to create a core, with the following properties:
#Written by CorePropertiesLocator
#Thu Apr 06 23:08:57 UTC 2017
name=amar-s3
loadOnStartup=false
transient=true
configSet=base-config
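
As a reading aid for the two flags above: in Solr's core.properties, `loadOnStartup=false` defers loading until the core is first referenced, and `transient=true` allows the core to be unloaded again when the transient core cache is full. Below is a minimal sketch that parses such a file and reports those two flags. The helper names are hypothetical, and the defaults used (loadOnStartup true, transient false) are the documented Solr defaults as I understand them.

```python
def parse_core_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def loading_behaviour(props):
    """Return (loads_at_startup, may_be_unloaded), using the assumed defaults."""
    loads_at_startup = props.get("loadOnStartup", "true").lower() == "true"
    may_be_unloaded = props.get("transient", "false").lower() == "true"
    return loads_at_startup, may_be_unloaded


core_properties = """\
#Written by CorePropertiesLocator
#Thu Apr 06 23:08:57 UTC 2017
name=amar-s3
loadOnStartup=false
transient=true
configSet=base-config
"""
props = parse_core_properties(core_properties)
print(props["name"], loading_behaviour(props))
```

With these properties the core is not opened at startup, so a status listing taken right after a restart is expected to show it as not loaded until something references it.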

I am able to ingest messages into Solr and also query the content.
Everything seems to be fine until this stage and I can see the data dir on
S3.

However, the problem appears when I restart the Solr server: the core is not
loaded, even when it is accessed or queried. Here is what the admin API call
that lists all cores returns:

0
617
...
amar-s3
/Users/apalavalli/solr/solr-deployment/server/solr/amar-s3
data/
solrconfig.xml
schema.xml
false
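
The same check can be scripted against the CoreAdmin STATUS action. This is only a sketch under stated assumptions: it assumes Solr is listening on the default `http://localhost:8983/solr`, that the response is requested as JSON, and that the final `false` in the listing above is an `isLoaded` field (which is how Trey's reply reads it). The function names and the sample dictionary are hypothetical. The generous client timeout is there because opening a core from S3 can take minutes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def status_url(base_url, core):
    """Build the CoreAdmin STATUS URL for a single core."""
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url}/admin/cores?{query}"


def core_is_loaded(status_response, core):
    """Treat a core as loaded unless it is missing or flagged isLoaded=false."""
    info = status_response.get("status", {}).get(core, {})
    if not info:
        return False
    return str(info.get("isLoaded", "true")).lower() != "false"


def fetch_status(base_url, core, timeout_s=300):
    """Call Solr; not exercised here because it needs a running server."""
    with urlopen(status_url(base_url, core), timeout=timeout_s) as resp:
        return json.load(resp)


# Hypothetical response shaped after the listing above.
sample = {
    "responseHeader": {"status": 0, "QTime": 617},
    "status": {"amar-s3": {"name": "amar-s3", "dataDir": "data/", "isLoaded": "false"}},
}
print(status_url("http://localhost:8983/solr", "amar-s3"))
print(core_is_loaded(sample, "amar-s3"))
```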




I don't see any issues reported in the log either, but I see this error in
the UI:

[image: Inline image 1]


I am not sure what the problem is. This happens when I ingest more than 40K
messages into the core before restarting the Solr server.

I am using Hadoop 2.7.3 with the S3N file system. Please help me resolve this issue.

Thanks,
Regards,
Amar