RE: Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

2017-04-12 Thread Cahill, Trey
Hi Amarnath, 

From this log snippet:
"
2017-04-12 17:53:44.900 INFO
 (searcherExecutor-12-thread-1-processing-x:amar1) [   x:amar1]
o.a.s.c.SolrCore [amar1] Registered new searcher Searcher@3f61e7f2[amar1]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_16(6.4.2):c97790)
Uninverting(_17(6.4.2):C236640) Uninverting(_b(6.4.2):C51852)
Uninverting(_d(6.4.2):C4) Uninverting(_f(6.4.2):C1)
Uninverting(_o(6.4.2):C33360) Uninverting(_r(6.4.2):C40358)
Uninverting(_y(6.4.2):C6) Uninverting(_14(6.4.2):C1) 
Uninverting(_15(6.4.2):C1)))}
2017-04-12 17:56:22.799 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCores Opening transient core amar1
2017-04-12 17:56:22.837 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.S.Request [amar1]  webapp=/solr path=/select params={q=*:*=0}
hits=59 status=0 QTime=243787
"

It does look like reading the data into Solr from S3 is slow.

Running Solr on an EC2 instance in the same AWS region as your S3 bucket should 
help.  While you’re in AWS, using VPC endpoints should also help with 
performance. From your logs, it looks like you're running from your laptop.

It looks like you’re using s3a, which is a good start.  On a side note, Hadoop 
2.8 has recently been released 
(https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release), which 
includes some work on s3a. I'm not promising any performance improvements if you 
use s3a with Hadoop 2.8, but it's probably the best way to access S3 right now.
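
For what it's worth, s3a credentials and endpoint are usually set in Hadoop's 
core-site.xml. A minimal sketch (property names are from the Hadoop s3a 
documentation; the key and endpoint values here are placeholders you'd replace 
with your own):

```xml
<!-- core-site.xml: s3a client settings (placeholder values) -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <!-- Pointing at the region-local endpoint avoids cross-region round trips -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>
</configuration>
```

Keeping the endpoint in the same region as the Solr host is part of the 
same-region advice above.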

Finally, remember that S3 is a service; if the S3 service is slow (for example, 
due to a heavy stream of requests), then your operations against S3 will also be 
slow.

Hope this helps and good luck, 

Trey

-Original Message-
From: Amarnath palavalli [mailto:pamarn...@gmail.com] 
Sent: Wednesday, April 12, 2017 2:09 PM
To: solr-user@lucene.apache.org
Subject: Solr 6.4 - Transient core loading is extremely slow with HDFS and S3

Hello,

I am using S3 as the primary store for the core's data directory. To achieve this, 
I have the following in solrconfig.xml:


<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">s3a://amar-hdfs/solr</str>
  <str name="solr.hdfs.confdir">/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">4</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

When I access the core 'amar1', it takes about 245 seconds to load the core, 
whose total size is about 85 MB. Here is the complete solr.log for the core 
loading:

2017-04-12 17:52:19.079 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrResourceLoader [amar1] Added 57 libs to classloader, from
paths: [/Users/apalavalli/solr/solr-deployment/contrib/clustering/lib,
/Users/apalavalli/solr/solr-deployment/contrib/extraction/lib,
/Users/apalavalli/solr/solr-deployment/contrib/langid/lib,
/Users/apalavalli/solr/solr-deployment/contrib/velocity/lib,
/Users/apalavalli/solr/solr-deployment/dist,
/Users/apalavalli/solr/solr-deployment/dist/lib2]
2017-04-12 17:52:19.109 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.4.2
2017-04-12 17:52:19.155 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema [amar1] Schema name=log-saas
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.s.IndexSchema Loaded schema log-saas/1.6 with uniqueid field id
2017-04-12 17:52:19.217 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.CoreContainer Creating SolrCore 'amar1' using configuration from 
configset
/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory solr.hdfs.home=s3a://amar-hdfs/solr
2017-04-12 17:52:19.223 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Solr Kerberos Authentication disabled
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.SolrCore [[amar1] ] Opening new SolrCore at 
[/Users/apalavalli/solr/solr-deployment/server/solr/configsets/base-config-s3],
dataDir=[s3a://amar-hdfs/solr/amar1/data/]
2017-04-12 17:52:19.234 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.JmxMonitoredMap JMX monitoring is enabled. Adding Solr mbeans to JMX 
Server: com.sun.jmx.mbeanserver.JmxMBeanServer@5745ca0e
2017-04-12 17:52:19.236 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory creating directory factory for path 
s3a://amar-hdfs/solr/amar1/data/snapshot_metadata
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Number of slabs of block cache [4] with direct 
memory allocation set to [true]
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Block cache target memory usage, slab size of 
[134217728] will allocate [4] slabs and use ~[536870912] bytes
2017-04-12 17:52:19.274 INFO  (qtp1654589030-18) [   x:amar1]
o.a.s.c.HdfsDirectoryFactory Creating new global HDFS BlockCache
2017-04-12 17:52:19.888 WARN  (qtp1654589030-18) [   x:amar1]
o.a.h.u.NativeCodeLoader Unable to load native-hadoop library for your 
platform... using builtin-java classes where applicable

RE: Solr with HDFS on AWS S3 - Server restart fails to load the core

2017-04-07 Thread Cahill, Trey
Hi Amarnath,

It looks like you’ve set the core to not load on startup via the 
“loadOnStartup=false” property.  Your response also shows that the core is not 
loaded (“false”).

I’m not really sure how to load cores after a restart, but possibly the 
CoreAdmin RELOAD command would do it 
(https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD).
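
For example, a CoreAdmin call looks roughly like this (I'm assuming the default 
host and port, localhost:8983, and the core name from your message; adjust as 
needed for your setup):

```shell
# Check which cores exist and whether each one is currently loaded
curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"

# Ask Solr to (re)load a specific core
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=amar-s3&wt=json"
```

Note that for a transient core, simply sending any query to it should also 
trigger a load, since transient cores are loaded on demand.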

Best of luck,

Trey

From: Amarnath palavalli [mailto:pamarn...@gmail.com]
Sent: Friday, April 07, 2017 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr with HDFS on AWS S3 - Server restart fails to load the core

Hello,

I configured Solr to use HDFS, which in turn is configured to use S3N. I used the 
information from this issue for the configuration:
https://issues.apache.org/jira/browse/SOLR-9952

Here is the command I have used to start Solr with HDFS:
bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory 
-Dsolr.lock.type=hdfs -Dsolr.hdfs.home=s3n://amar-hdfs/solr 
-Dsolr.hdfs.confdir=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop  
-DXX:MaxDirectMemorySize=2g

I am able to create a core, with the following properties:
#Written by CorePropertiesLocator
#Thu Apr 06 23:08:57 UTC 2017
name=amar-s3
loadOnStartup=false
transient=true
configSet=base-config

I am able to ingest messages into Solr and also query the content. Everything 
seems to be fine until this stage and I can see the data dir on S3.

However, the problem is that when I restart the Solr server, the core is not 
loaded, even when I access/query it. Here is what the admin API for listing all 
cores returns:


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">617</int>
  </lst>
  <lst name="status">
    ...
    <lst name="amar-s3">
      <str name="name">amar-s3</str>
      <str name="instanceDir">/Users/apalavalli/solr/solr-deployment/server/solr/amar-s3</str>
      <str name="dataDir">data/</str>
      <str name="config">solrconfig.xml</str>
      <str name="schema">schema.xml</str>
      <bool name="isLoaded">false</bool>
    </lst>
  </lst>
</response>




I don't see any issues reported in the log either, but I do see this error in the 
UI:

[Inline image 1]


I'm not sure what the problem is. It happens when I ingest more than 40K 
messages into the core before restarting the Solr server.

I am using Hadoop 2.7.3 with the S3N filesystem. Please help me resolve this issue.

Thanks,
Regards,
Amar






RE: reset version number

2017-01-11 Thread Cahill, Trey
What are you trying to accomplish by resetting the version number?

-Original Message-
From: Kris Musshorn [mailto:mussho...@comcast.net] 
Sent: Tuesday, January 10, 2017 9:31 PM
To: solr-user@lucene.apache.org
Subject: RE: reset version number

Obviously, deleting and rebuilding the core will work, but is there another way?
K

-Original Message-
From: KRIS MUSSHORN [mailto:mussho...@comcast.net] 
Sent: Tuesday, January 10, 2017 12:00 PM
To: solr-user@lucene.apache.org
Subject: reset version number

The SOLR 5.4.1 web admin interface shows a version number in the selected core's 
overview. 
How does one reset this number? 

Kris 



RE: Integration Solr Cloudera with squirrel-sql

2016-08-26 Thread Cahill, Trey
Hardika,

Parallel SQL and the accompanying JDBC connector only became available in Solr 
6.x.  Since Cloudera's Solr is only at 4.10, it will not have this feature.
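
If you do move to Solr 6.x, note that Parallel SQL is also exposed over HTTP via 
the /sql handler, so you can try it without any JDBC setup. A rough sketch (the 
collection name and field are placeholders for your own schema):

```shell
# Send a SQL statement to the Parallel SQL handler of a Solr 6.x collection
curl --data-urlencode "stmt=SELECT id FROM mycollection LIMIT 10" \
  "http://localhost:8983/solr/mycollection/sql?aggregationMode=facet"
```

The JDBC driver that squirrel-sql would use is just a client for this same 
interface.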

Trey

-Original Message-
From: Hardika Catur S [mailto:hardika.sa...@solusi247.com.INVALID] 
Sent: Friday, August 26, 2016 4:31 AM
To: solr-user@lucene.apache.org
Subject: Integration Solr Cloudera with squirrel-sql

Hi,

I want to integrate Apache Solr with squirrel-sql. This works with Solr 6.0 and 
squirrel-sql 3.7, but I am having difficulty integrating Solr Cloudera 4.10, 
because its libs do not match what squirrel-sql needs.

Can Solr Cloudera 4.10 be integrated with squirrel-sql?
Or is there software that transforms a Solr query into an SQL query, similar to 
squirrel?

Please help me to find a solution.

Thanks,
Hardika CS.