Re: Getting 500s on distributed queries with SolrCloud

2014-03-24 Thread Shalin Shekhar Mangar
The Grouping feature only works if groups are in the same shard.
Perhaps that is the problem here?

I could find https://issues.apache.org/jira/browse/SOLR-4164 which
says that once the sharding was fixed, the problem went away. We
should come up with a better exception message though.
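
For example (ids below are purely illustrative), with the default compositeId
router you can co-locate a whole group on one shard by prefixing the uniqueKey
with the group value:

  PRODUCT42!SKU1
  PRODUCT42!SKU2

Both documents hash on the "PRODUCT42" prefix and land on the same shard, so
grouping on the product field stays shard-local.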

On Fri, Mar 21, 2014 at 10:49 PM, Ugo Matrangolo
ugo.matrang...@gmail.com wrote:
 Hi,

 I have a two shard collection running and I'm getting this error on each
 query:

 2014-03-21 17:08:42,018 [qtp-75] ERROR
 org.apache.solr.servlet.SolrDispatchFilter  -
 *null:java.lang.IllegalArgumentException:
 numHits must be > 0; please use TotalHitCountCollector if you just need the
 total hit count*
 at
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1130)
 at
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
 at
 org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector.init(AbstractSecondPassGroupingCollector.java:75)
 at
 org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector.init(TermSecondPassGroupingCollector.java:49)
 at
 org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.create(TopGroupsFieldCommand.java:129)
 at
 org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:142)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)

 Note that I'm using grouping, and disabling it fixed the problem.

 I was aware that SolrCloud does not fully support grouping in a
 distributed setup, but I was expecting incorrect results (which have to be
 addressed with custom hashing, AFAIK), not an error.

 Has anyone seen this error before?

 Ugo



-- 
Regards,
Shalin Shekhar Mangar.


how to generate json response from the php solarium ?

2014-03-24 Thread Sohan Kalsariya
How can I get the JSON response from Solr?
I mean, how can I get the search results in JSON format
and print them in Solarium PHP code?

-- 
Regards,
*Sohan Kalsariya*


Re: how to generate json response from the php solarium ?

2014-03-24 Thread Gora Mohanty
On 24 March 2014 12:35, Sohan Kalsariya sohankalsar...@gmail.com wrote:
 How can I get the JSON response from Solr?
 I mean, how can I get the search results in JSON format
 and print them in Solarium PHP code?

Adding wt=json to the query will get you Solr results in JSON format.
Please refer to the Solarium documentation for how to print the
results.
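
For example (host, port, and core name below are just placeholders):

  http://localhost:8983/solr/collection1/select?q=*:*&wt=json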

Regards,
Gora


Re: Solr dih to read Clob contents

2014-03-24 Thread Prasi S
My database configuration is as below:

  <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY
                          FROM BOOK_REC fetch first 40 rows only"
          transformer="ClobTransformer">
    <field column="MBR" name="mbr" />
    <entity name="y" dataSource="xmldata" dataField="x.SMRY"
            processor="XPathEntityProcessor"
            forEach="/*:summary" rootEntity="true">
      <field column="card_no" xpath="/cardNo" />
    </entity>
  </entity>

and I get my response from Solr as below:

<doc>
  <str name="card_no">org...@1c8e807</str>

Am I missing anything?



Thanks,
Prasi


On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:
 
  Hi,
  I have a requirement to index a database table with CLOB content. Each
  row in my table has a column which is XML stored as a CLOB. I want to read
  the contents of the XML through DIH and map each XML tag to a separate
  Solr field.
 
  Below is my CLOB content.
  <root>
    <author>A</author>
    <date>02-Dec-2013</date>
    .
    .
    .
  </root>
 
  I want to read the contents of the CLOB and map author to author_solr and
  date to date_solr. Is this possible with a ClobTransformer or a
  ScriptTransformer?

 You will need to use a FieldReaderDataSource and an XPathEntityProcessor
 along with the ClobTransformer. You do not provide details of your DIH data
 configuration file, but this should look something like:

 <dataSource name="xmldata" type="FieldReaderDataSource"/>
 ...
 <document>
   <entity name="x" query="..." transformer="ClobTransformer">
     <entity name="y" dataSource="xmldata" dataField="x.clob_column"
             processor="XPathEntityProcessor" forEach="/root">
       <field column="author_solr" xpath="/author" />
       <field column="date_solr" xpath="/date" />
     </entity>
   </entity>
 </document>

 Regards,
 Gora



Re: join and filter query with AND

2014-03-24 Thread Marcin Rzewucki
Hi,

Yonik, thank you for explaining the reason for the issue to me. The workarounds
you suggested are working fine.
Kranti, your suggestion was also good :-)

Thanks a lot!



On 21 March 2014 20:00, Kranti Parisa kranti.par...@gmail.com wrote:

 My example should also work, am I missing something?

  q=({!join from=inner_id to=outer_id fromIndex=othercore
  v=$joinQuery})&joinQuery=(city:"Stara Zagora" AND prod:214)

 Thanks,
 Kranti K. Parisa
 http://www.linkedin.com/in/krantiparisa



 On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley yo...@heliosearch.com
 wrote:

  Correct.  This is only a limitation of embedding a local-params style
  subquery within lucene syntax.
  The parser, not knowing the syntax of the embedded query, currently
  assumes the query text ends at whitespace or other special punctuation
  such as ).
 
   Original:
   (({!join from=inner_id to=outer_id
   fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)
  
   Some possible workarounds that should work:
   q={!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora"
   fq=prod:214
  
   q=({!join from=inner_id to=outer_id fromIndex=othercore
   v='city:"Stara Zagora"'} AND prod:214)
  
   q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND
   prod:214)
   jq=city:"Stara Zagora"
 
 
  -Yonik
  http://heliosearch.org - solve Solr GC pauses with off-heap filters
  and fieldcache
 
 
  On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky j...@basetechnology.com
 
  wrote:
   I suspect that this is a bug in the implementation of the parsing of
   embedded nested query parsers . That's a fairly new feature compared to
   non-embedded nested query parsers - maybe Yonik could shed some light.
  This
   may date from when he made a copy of the Lucene query parser for Solr
 and
   added the parsing of embedded nested query parsers to the grammar. It
  seems
   like the embedded nested query parser is only being applied to a
 single,
   white space-delimited term, and not respecting the fact that the term
 is
  a
   quoted phrase.
  
   -- Jack Krupansky
  
   -Original Message- From: Marcin Rzewucki
   Sent: Thursday, March 20, 2014 5:19 AM
   To: solr-user@lucene.apache.org
   Subject: Re: join and filter query with AND
  
  
   Nope. There is no line break in the string and it is not feed from
 file.
   What else could be the reason ?
  
  
  
   On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com
 wrote:
  
   It looks to me like you're feeding this from some
   kind of text file and you really _do_ have a
   line break after Stara
  
   Or have a line break in the string you paste into the URL
   or something similar.
  
   Kind of shooting in the dark though.
  
   Erick
  
   On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com
 
   wrote:
Hi,
   
I have the following issue with join query parser and filter query.
  For
such query:
   
 <str name="q">*:*</str>
 <str name="fq">
 (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
 Zagora")) AND (prod:214)
 </str>
    
 I got error:
 <lst name="error">
 <str name="msg">
 org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical
 error at line 1, column 12.  Encountered: <EOF> after : "\"Stara"
 </str>
 <int name="code">400</int>
 </lst>
   
Stack:
DEBUG - 2014-03-19 13:35:20.825;
   org.eclipse.jetty.servlet.ServletHandler;
chain=SolrRequestFilter-default
DEBUG - 2014-03-19 13:35:20.826;
org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
SolrRequestFilter
ERROR - 2014-03-19 13:35:20.828;
 org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: 
org.apache.solr.search.SyntaxError:
Cannot parse 'city:Stara': Lexical error at line 1, column 12.  E
ncountered: EOF after : \Stara
at
   
  
  
 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
at
   
  
  
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at
   
  
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
   
  
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at
   
  
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at
   
  
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at
   
  
  
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
   
  
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
   
  
  
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
   
  
  
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
   

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-24 Thread Sathya
Hi Greg,

This is my Clusterstate.json.

WatchedEvent state:SyncConnected type:None path:null
[zk: 10.10.1.72:2185(CONNECTED) 0] get /clusterstate.json
{"set_recent":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{
          "10.10.1.16:4040_solr_set_recent_shard1_replica1":{
            "state":"active",
            "base_url":"http://10.10.1.16:4040/solr",
            "core":"set_recent_shard1_replica1",
            "node_name":"10.10.1.16:4040_solr"},
          "10.10.1.72:2020_solr_set_recent_shard1_replica2":{
            "state":"active",
            "base_url":"http://10.10.1.72:2020/solr",
            "core":"set_recent_shard1_replica2",
            "node_name":"10.10.1.72:2020_solr"},
          "10.10.1.19:3030_solr_set_recent_shard1_replica3":{
            "state":"active",
            "base_url":"http://10.10.1.19:3030/solr",
            "core":"set_recent_shard1_replica3",
            "node_name":"10.10.1.19:3030_solr",
            "leader":"true"},
          "10.10.1.21:1010_solr_set_recent_shard1_replica4":{
            "state":"active",
            "base_url":"http://10.10.1.21:1010/solr",
            "core":"set_recent_shard1_replica4",
            "node_name":"10.10.1.21:1010_solr"},
          "10.10.1.14:5050_solr_set_recent_shard1_replica5":{
            "state":"active",
            "base_url":"http://10.10.1.14:5050/solr",
            "core":"set_recent_shard1_replica5",
            "node_name":"10.10.1.14:5050_solr"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{
          "10.10.1.16:4040_solr_set_recent_shard2_replica1":{
            "state":"active",
            "base_url":"http://10.10.1.16:4040/solr",
            "core":"set_recent_shard2_replica1",
            "node_name":"10.10.1.16:4040_solr"},
          "10.10.1.72:2020_solr_set_recent_shard2_replica2":{
            "state":"active",
            "base_url":"http://10.10.1.72:2020/solr",
            "core":"set_recent_shard2_replica2",
            "node_name":"10.10.1.72:2020_solr"},
          "10.10.1.19:3030_solr_set_recent_shard2_replica3":{
            "state":"active",
            "base_url":"http://10.10.1.19:3030/solr",
            "core":"set_recent_shard2_replica3",
            "node_name":"10.10.1.19:3030_solr",
            "leader":"true"},
          "10.10.1.21:1010_solr_set_recent_shard2_replica4":{
            "state":"active",
            "base_url":"http://10.10.1.21:1010/solr",
            "core":"set_recent_shard2_replica4",
            "node_name":"10.10.1.21:1010_solr"},
          "10.10.1.14:5050_solr_set_recent_shard2_replica5":{
            "state":"active",
            "base_url":"http://10.10.1.14:5050/solr",
            "core":"set_recent_shard2_replica5",
            "node_name":"10.10.1.14:5050_solr"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{
          "10.10.1.16:4040_solr_set_recent_shard3_replica1":{
            "state":"active",
            "base_url":"http://10.10.1.16:4040/solr",
            "core":"set_recent_shard3_replica1",
            "node_name":"10.10.1.16:4040_solr"},
          "10.10.1.72:2020_solr_set_recent_shard3_replica2":{
            "state":"active",
            "base_url":"http://10.10.1.72:2020/solr",
            "core":"set_recent_shard3_replica2",
            "node_name":"10.10.1.72:2020_solr"},
          "10.10.1.19:3030_solr_set_recent_shard3_replica3":{
            "state":"active",
            "base_url":"http://10.10.1.19:3030/solr",
            "core":"set_recent_shard3_replica3",
            "node_name":"10.10.1.19:3030_solr",
            "leader":"true"},
          "10.10.1.21:1010_solr_set_recent_shard3_replica4":{
            "state":"active",
            "base_url":"http://10.10.1.21:1010/solr",
            "core":"set_recent_shard3_replica4",
            "node_name":"10.10.1.21:1010_solr"},
          "10.10.1.14:5050_solr_set_recent_shard3_replica5":{
            "state":"active",
            "base_url":"http://10.10.1.14:5050/solr",
            "core":"set_recent_shard3_replica5",
            "node_name":"10.10.1.14:5050_solr"}}}},
    "maxShardsPerNode":"3",
    "router":{"name":"compositeId"},
    "replicationFactor":"5"}}
cZxid = 0x10014
ctime = Tue Mar 18 13:05:38 IST 2014
mZxid = 0x5027c
mtime = Mon Mar 24 14:22:24 IST 2014
pZxid = 0x10014
cversion = 0
dataVersion = 387
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4182
numChildren = 0


Kindly let me know if you need any further inputs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-7-No-live-SolrServers-available-to-handle-this-request-tp4125679p4126478.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr dih to read Clob contents

2014-03-24 Thread Shalin Shekhar Mangar
1. I don't see the definition of a datasource named 'xmldata' in your
data-config.
2. You have forEach="/*:summary" but I don't think that is a syntax
supported by XPathRecordReader.

If you can give a sample of the xml stored as Clob in your database,
then we can help you write the right xpaths.
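
As an illustrative sketch only (assuming the Clob holds something like
<summary><cardNo>...</cardNo></summary>), the entity would usually give the
full path from the record root in both forEach and the field xpaths:

  <entity name="y" dataSource="xmldata" dataField="x.SMRY"
          processor="XPathEntityProcessor"
          forEach="/summary" rootEntity="true">
    <field column="card_no" xpath="/summary/cardNo" />
  </entity>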

On Mon, Mar 24, 2014 at 12:55 PM, Prasi S prasi1...@gmail.com wrote:
 My database configuration is as below:

   <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY
                           FROM BOOK_REC fetch first 40 rows only"
           transformer="ClobTransformer">
     <field column="MBR" name="mbr" />
     <entity name="y" dataSource="xmldata" dataField="x.SMRY"
             processor="XPathEntityProcessor"
             forEach="/*:summary" rootEntity="true">
       <field column="card_no" xpath="/cardNo" />
     </entity>
   </entity>

 and I get my response from Solr as below:

 <doc>
   <str name="card_no">org...@1c8e807</str>

 Am I missing anything?



 Thanks,
 Prasi


 On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:
 
  Hi,
  I have a requirement to index a database table with clob content. Each
 row
  in my table a column which is an xml stored as clob. I want to read the
  contents of xmlthrough dih and map each of the xml tag to a separate solr
  field,
 
  Below is my clob content.
  root
 authorA/author
 date02-Dec-2013/date
 .
 .
 .
  /root
 
  i want to read the contents of the clob and map author to author_solr and
  date to date_solr . Is this possible with a clob tranformer or a script
  tranformer.

 You will need to use a FieldReaderDataSource, and a XPathEntityProcessor
 along with the ClobTransformer. You do not provide details of your DIH data
 configuration file, but this should look something like:

 dataSource name=xmldata type=FieldReaderDataSource/
 ...
 document
   entity name=x query=... transformer=ClobTransformer
  entity name=y dataSource=xmldata dataField=x.clob_column
 processor=XPathEntityProcessor forEach=/root
field column=author_solr xpath=/author /
field column=date_solr xpath=/date /
  /entity
   /entity
 /document

 Regards,
 Gora




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr dih to read Clob contents

2014-03-24 Thread Prasi S
Below is my full configuration:

<dataConfig>
  <dataSource driver="com.ibm.db2.jcc.DB2Driver"
              url="jdbc:db2://IP:port/dbname" user="" password="" />
  <dataSource name="xmldata" type="FieldReaderDataSource"/>

  <document>

    <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY
                            FROM BOOK_REC fetch first 40 rows only"
            transformer="ClobTransformer">
      <field column="MBR" name="mbr" />
      <entity name="y" dataSource="xmldata" dataField="x.SMRY"
              processor="XPathEntityProcessor"
              forEach="/*:summary" rootEntity="true">
        <field column="card_no" xpath="/cardNo" />
      </entity>
    </entity>
  </document>
</dataConfig>

And this is my XML data:

<ns:summary xmlns:ns="***">
  <cardNo>ZAYQ5181</cardNo>
  <firstName>Sam</firstName>
  <lastName>Mathews</lastName>
  <date>2013-01-18T23:29:04.492</date>
</ns:summary>

Thanks,
Prasi


On Mon, Mar 24, 2014 at 3:23 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 1. I don't see the definition of a datasource named 'xmldata' in your
 data-config.
 2. You have forEach=/*:summary but I don't think that is a syntax
 supported by XPathRecordReader.

 If you can give a sample of the xml stored as Clob in your database,
 then we can help you write the right xpaths.

 On Mon, Mar 24, 2014 at 12:55 PM, Prasi S prasi1...@gmail.com wrote:
  My database configuration is  as below
 
entity name=x query=SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as
 SMRY
  FROM BOOK_REC  fetch first 40 rows only
 transformer=ClobTransformer 
  field column=MBR name=mbr /
 entity name=y dataSource=xmldata dataField=x.SMRY
  processor=XPathEntityProcessor
  forEach=/*:summary rootEntity=true 
   field column=card_no xpath=/cardNo /
 
 /entity
   /entity
 
  and i get my response from solr as below
 
  doc
  str name=card_noorg...@1c8e807/str
 
  Am i mising anything?
 
 
 
  Thanks,
  Prasi
 
 
  On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com
 wrote:
 
  On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:
  
   Hi,
   I have a requirement to index a database table with clob content. Each
  row
   in my table a column which is an xml stored as clob. I want to read
 the
   contents of xmlthrough dih and map each of the xml tag to a separate
 solr
   field,
  
   Below is my clob content.
   root
  authorA/author
  date02-Dec-2013/date
  .
  .
  .
   /root
  
   i want to read the contents of the clob and map author to author_solr
 and
   date to date_solr . Is this possible with a clob tranformer or a
 script
   tranformer.
 
  You will need to use a FieldReaderDataSource, and a XPathEntityProcessor
  along with the ClobTransformer. You do not provide details of your DIH
 data
  configuration file, but this should look something like:
 
  dataSource name=xmldata type=FieldReaderDataSource/
  ...
  document
entity name=x query=... transformer=ClobTransformer
   entity name=y dataSource=xmldata dataField=x.clob_column
  processor=XPathEntityProcessor forEach=/root
 field column=author_solr xpath=/author /
 field column=date_solr xpath=/date /
   /entity
/entity
  /document
 
  Regards,
  Gora
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Getting 500s on distributed queries with SolrCloud

2014-03-24 Thread Ugo Matrangolo
Hi Shalin,

Thank you for your answer.

I'm already using custom hashing to make sure all the docs that are going
to be grouped together are on the same shard. During indexing I make sure the
uniqueKey is something like:

productId!skuId

so all the SKUs belonging to the same product will end up on the same
shard. At query time I then group on the product id (I want all the
SKUs grouped by their owning product).

While the above works correctly, it did not fix the problem :/

What I have found by selectively switching off my grouping instructions to
SOLR is that the problem is in the

  group.limit=-1

that I append to each query.

This query (with all the SKUs sharing the same product sharded correctly on
the same shard) does not work:

 http://localhost:9766/skus/product_looks_for_sale?
 q=newsale&distrib=true&fl=doc_id,%20id&group=true&*group.limit=-1*

while this works fine:

  http://localhost:9766/skus/product_looks_for_sale?
  q=new-sale&start=0&distrib=true&fl=doc_id,%20id&group=true&*group.limit=100*

AFAIK the -1 only tells Solr to give back all the matching docs in a
group, so given that I do not think I will have more than a few hundred SKUs
in a single product, I'm going to fix this issue by setting the limit to 100.

Would be nice to know why the -1 makes the query fail anyway :)
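
(From the stack trace my guess is that the distributed second-pass grouping
collector passes group.limit straight into TopFieldCollector.create(), which
rejects anything <= 0, so -1 is never translated into "unlimited" on that
code path; but that is just my reading of the trace.)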

Any thought ?

Thank you,
Ugo



On Mon, Mar 24, 2014 at 6:30 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 The Grouping feature only works if groups are in the same shard.
 Perhaps that is the problem here?

 I could find https://issues.apache.org/jira/browse/SOLR-4164 which
 says that once the sharding was fixed, the problem went away. We
 should come up with a better exception message though.

 On Fri, Mar 21, 2014 at 10:49 PM, Ugo Matrangolo
 ugo.matrang...@gmail.com wrote:
  Hi,
 
  I have a two shard collection running and I'm getting this error on each
  query:
 
  2014-03-21 17:08:42,018 [qtp-75] ERROR
  org.apache.solr.servlet.SolrDispatchFilter  -
  *null:java.lang.IllegalArgumentException:
  numHits must be > 0; please use TotalHitCountCollector if you just need
 the
  total hit count*
  at
 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1130)
  at
 
 org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
  at
 
 org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector.init(AbstractSecondPassGroupingCollector.java:75)
  at
 
 org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector.init(TermSecondPassGroupingCollector.java:49)
  at
 
 org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.create(TopGroupsFieldCommand.java:129)
  at
 
 org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:142)
  at
 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
 
  Note that I'm using grouping, and disabling it fixed the problem.
 
  I was aware that SolrCloud does not fully support grouping in a
  distributed setup, but I was expecting incorrect results (which have to be
  addressed with custom hashing, AFAIK), not an error.
 
  Has anyone seen this error before?
 
  Ugo



 --
 Regards,
 Shalin Shekhar Mangar.



highlight did not work correctly

2014-03-24 Thread panzj.f...@cn.fujitsu.com
Hi all

While using Solr 4.6 to highlight results, I ran into a strange situation.
Most search results were correctly highlighted.
But a few gave back all the content of the indexed webpage without any
highlighted keywords.

Has anybody ever met this problem?

Here is my solrconfig.xml 

--
<bool name="hl">true</bool>
   <str name="hl.fl">content title</str>
   <str name="hl.simple.pre">&lt;b&gt;&lt;em&gt;&lt;big&gt;</str>
   <str name="hl.simple.post">&lt;/b&gt;&lt;/em&gt;&lt;/big&gt;</str>

   <str name="f.title.hl.fragsize">0</str>
   <str name="f.title.hl.alternateField">title</str>
   <str name="f.content.hl.snippets">1</str>
   <str name="f.content.hl.fragsize">200</str>
   <str name="f.content.hl.alternateField">content</str>
   <str name="f.content.hl.maxAlternateFieldLength">200</str>

--

I would appreciate for any reply.

THX 





Re: highlight did not work correctly

2014-03-24 Thread Ahmet Arslan
Hi,

You may need to increase hl.maxAnalyzedChars which has a default of 51200.
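
For example (the value below is only an illustration; pick something larger
than your biggest document):

  <str name="hl.maxAnalyzedChars">1000000</str>

in the request handler defaults, or add hl.maxAnalyzedChars=1000000 to the
request itself.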





On Monday, March 24, 2014 2:33 PM, panzj.f...@cn.fujitsu.com 
panzj.f...@cn.fujitsu.com wrote:

Hi all

While using Solr 4.6 to highlight results, I ran into a strange situation.
Most search results were correctly highlighted.
But a few gave back all the content of the indexed webpage without any
highlighted keywords.

Has anybody ever met this problem?

Here is my solrconfig.xml 

--
<bool name="hl">true</bool>
        <str name="hl.fl">content title</str>
        <str name="hl.simple.pre">&lt;b&gt;&lt;em&gt;&lt;big&gt;</str>
        <str name="hl.simple.post">&lt;/b&gt;&lt;/em&gt;&lt;/big&gt;</str>

        <str name="f.title.hl.fragsize">0</str>
        <str name="f.title.hl.alternateField">title</str>
        <str name="f.content.hl.snippets">1</str>
        <str name="f.content.hl.fragsize">200</str>
        <str name="f.content.hl.alternateField">content</str>
        <str name="f.content.hl.maxAlternateFieldLength">200</str>

--

I would appreciate for any reply.

THX


Ram usage

2014-03-24 Thread David Flower
Hi All

We have a 4-node cluster with a collection that's sharded into 2, each
shard having a master and a slave for redundancy. However, 1 node has decided
to use twice the RAM that the others are using within the cluster.

The only difference we can spot between the nodes is that the one with the
high RAM usage is saying it's a slave, while all the others are reporting that
they are masters.


Does anyone have any ideas why this has occurred?

Cheers,
David




Re: Ram usage

2014-03-24 Thread Furkan KAMACI
Hi David;

Which version of Solr are you using?

Thanks;
Furkan KAMACI


2014-03-24 15:15 GMT+02:00 David Flower dflo...@amplience.com:

 Hi All

 We have a 4 node cluster with a collection thats sharded into 2 and each
 shard having a master and a slave for redundancy however 1 node has decied
 to use twice the ram that the others are using within the cluster

 The only difference we can spot between the node is that the one with the
 ram usage is saying its a slave while all the other are reporting that
 they are masters



 Does any one have any ideas why this has occurred

 Cheers,
 David





Re: Ram usage

2014-03-24 Thread David Flower
We're still on 4.4.0

David

On 24/03/2014 13:19, Furkan KAMACI furkankam...@gmail.com wrote:

Hi David;

Which version of Solr are you using?

Thanks;
Furkan KAMACI


2014-03-24 15:15 GMT+02:00 David Flower dflo...@amplience.com:

 Hi All

 We have a 4 node cluster with a collection thats sharded into 2 and each
 shard having a master and a slave for redundancy however 1 node has
decied
 to use twice the ram that the others are using within the cluster

 The only difference we can spot between the node is that the one with
the
 ram usage is saying its a slave while all the other are reporting that
 they are masters



 Does any one have any ideas why this has occurred

 Cheers,
 David






Re: SolrCloud from Stopping recovery for warnings to crash

2014-03-24 Thread Lukas Mikuckis
Yes, we upgraded Solr from 4.6.1 to 4.7 three weeks ago (two weeks before Solr
started crashing).
When we were upgrading, we just upgraded Solr and changed the versions in
the collections' configs.

When Solr crashes we get OOM, but only about 2h after the first "Stopping
recovery" warnings.

Do you have any idea when "Stopping recovery" warnings are thrown?
Because right now we have no idea what could be causing this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com
:

 Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
 cause out of memory issues. Can you check your logs for out of memory
 errors?

 On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com
wrote:
  Solr version: 4.7
 
  Architecture:
  2 solrs (1 shard, leader + replica)
  3 zookeepers
 
  Servers:
  * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
  * zookeeper + solr  (heap 4gb) - RAM 8gb, 2 cpu cores
  * zookeeper
 
  Solr data:
  * 21 collections
  * Many fields, small docs, docs count per collection from 1k to 500k
 
  About a week ago solr started crashing. It crashes every day, 3-4 times
a
  day. Usually at nigh. I can't tell anything what could it be related to
  because at that time we haven't done any configuration changes. Load
  haven't changed too.
 
 
  Everything starts with Stopping recovery for .. warnings (every
warnings is
  repeated several times):
 
  WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
  zkNodeName=core_node1core=**
 
  WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not find
  election node to remove
 
  WARN  org.apache.solr.update.PeerSync; no frame of reference to tell if
  we've missed updates
 
  WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no
frame
  of reference to tell if we've missed updates
 
  WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller;
File
  _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
 
  WARN  - 2014-03-23 04:00:54.126;
  org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
 
tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272
  refcount=2} active=true starting pos=356216606
 
  Then again Stopping recovery for .. warnings:
 
  WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
  zkNodeName=core_node1core=**
 
  ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: No registered leader was found
after
  waiting for 4000ms , collection: collection1 slice: shard1
 
  ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: I was asked to wait on state down
for
  IP:PORT_solr but I still do not see the requested state. I see state:
  active live:false
 
 
  After this serves mostly didn't recover.



 --
 Regards,
 Shalin Shekhar Mangar.




Re: Solr4.7 No live SolrServers available to handle this request

2014-03-24 Thread Greg Walters
Sathya,

We're still missing a fair amount of information here though it looks like your 
cluster is healthy. How are you indexing and what's the request you're sending 
that results in the error you're seeing? Have you checked your nodes' logs for 
errors that correspond with the one you're seeing while indexing?

Thanks,
Greg

On Mar 22, 2014, at 2:32 PM, Shalin Shekhar Mangar shalinman...@gmail.com 
wrote:

 Thanks Michael! I just committed your fix. It will be released with 4.7.1
 
 On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov
 msoko...@safaribooksonline.com wrote:
 I just managed to track this down -- as you said the disconnect was a red
 herring.
 
 Ultimately the problem was caused by a custom analysis component we wrote
 that was raising an IOException -- it was missing some configuration files
 it relies on.
 
 What might be interesting for solr devs to have a look at is that exception
 was completely swallowed by JavabinCodec, making it very difficult to track
 down the problem.  Furthermore -- if the /add request was routed directly to
 the shard where the document was destined to end up, then the IOException
 raised by the analysis component (a char filter) showed up in the Solr HTTP
 response (probably because my client used XML format in one test -- javabin
 is used internally in SolrCloud).  But if the request was routed to a
 different shard, then the only exception that showed up anywhere (in the
 logs, in the HTTP response) was kind of irrelevant.
 
 I think this could be fixed pretty easily; see SOLR-5985 for my suggestion.
 
 -Mike
 
 
 
 On 03/21/2014 10:20 AM, Greg Walters wrote:
 
 Broken pipe errors are generally caused by unexpected disconnections and
 are some times hard to track down. Given the stack traces you've provided
 it's hard to point to any one thing and I suspect the relevant information
 was snipped out in the long dump of document fields. You might grab the
 entire error from the client you're uploading documents with, the server
 you're connected to and any other nodes that have an error at the same time
 and put it on pastebin or the like.
 
 Thanks,
 Greg
 
 On Mar 20, 2014, at 3:36 PM, Michael Sokolov
 msoko...@safaribooksonline.com wrote:
 
 I'm getting a similar exception when writing documents (on the client
 side).  I can write one document fine, but the second (which is being 
 routed
 to a different shard) generates the error.  It happens every time -
 definitely not a resource issue or timing problem since this database is
 completely empty -- I'm just getting started and running some tests, so
 there must be some kind of setup problem.  But it's difficult to diagnose
 (for me, anyway)!  I'd appreciate any insight, hints, guesses, etc. since
 I'm stuck. Thanks!
 
 One node (the leader?) is reporting Internal Server Error in its log,
 and another node (presumably the shard where the document is being 
 directed)
 bombs out like this:
 
 ERROR - 2014-03-20 15:56:53.022; org.apache.solr.common.SolrException;
 null:org.apache.solr.common.SolrException: ERROR adding document
 SolrInputDocument(
 
 ... long dump of document fields
 
 )
at
 org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:99)
at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at
 org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at
 org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
 ...
 Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at
 java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at
 org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
at
 

Re: SolrCloud from Stopping recovery for warnings to crash

2014-03-24 Thread Shalin Shekhar Mangar
I am guessing that it is all related to memory issues. I guess that as
the used heap increases, full GC cycles increase causing ZK timeouts
which in turn cause more recoveries to be initiated. In the end,
everything blows up with the out of memory errors. Do you log GC
activity on your servers?
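
If not, something like the following JVM flags (illustrative only, for a
HotSpot Java 7 JVM; the log path is a placeholder) will show whether full GC
pauses line up with the ZooKeeper session timeouts:

  -verbose:gc -Xloggc:/var/log/solr/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps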

I suggest that you rollback to 4.6.1 for now and upgrade to 4.7.1 when
it releases next week.

On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:
 Yes, we upgraded solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before solr
 started crashing).
 When we were upgrading, we just upgraded solr and changed versions in
 collections configs.

 When solr crashes we get OOM but only 2h after first Stopping recovery
 warnings.

 Maybe you have any ideas when Stopping recovery warnings are thrown?
 Because now we have no idea what could cause this issue.

 Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com
:

 Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
 cause out of memory issues. Can you check your logs for out of memory
 errors?

 On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com
 wrote:
  Solr version: 4.7
 
  Architecture:
  2 solrs (1 shard, leader + replica)
  3 zookeepers
 
  Servers:
  * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
  * zookeeper + solr  (heap 4gb) - RAM 8gb, 2 cpu cores
  * zookeeper
 
  Solr data:
  * 21 collections
  * Many fields, small docs, docs count per collection from 1k to 500k
 
  About a week ago solr started crashing. It crashes every day, 3-4 times
 a
  day. Usually at nigh. I can't tell anything what could it be related to
  because at that time we haven't done any configuration changes. Load
  haven't changed too.
 
 
  Everything starts with Stopping recovery for .. warnings (every
 warnings is
  repeated several times):
 
  WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
  zkNodeName=core_node1core=**
 
  WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not find
  election node to remove
 
  WARN  org.apache.solr.update.PeerSync; no frame of reference to tell if
  we've missed updates
 
  WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no
 frame
  of reference to tell if we've missed updates
 
  WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller;
 File
  _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
 
  WARN  - 2014-03-23 04:00:54.126;
  org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
 
 tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272
  refcount=2} active=true starting pos=356216606
 
  Then again Stopping recovery for .. warnings:
 
  WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
  zkNodeName=core_node1core=**
 
  ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: No registered leader was found
 after
  waiting for 4000ms , collection: collection1 slice: shard1
 
  ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: I was asked to wait on state down
 for
  IP:PORT_solr but I still do not see the requested state. I see state:
  active live:false
 
 
  After this serves mostly didn't recover.



 --
 Regards,
 Shalin Shekhar Mangar.





-- 
Regards,
Shalin Shekhar Mangar.


Re: Ram usage

2014-03-24 Thread Shawn Heisey
On 3/24/2014 7:15 AM, David Flower wrote:
 We have a 4 node cluster with a collection thats sharded into 2 and each
 shard having a master and a slave for redundancy however 1 node has decied
 to use twice the ram that the others are using within the cluster
 
 The only difference we can spot between the node is that the one with the
 ram usage is saying its a slave while all the other are reporting that
 they are masters

If you are using SolrCloud, then there are no masters and no slaves.
Each shard has a leader, but that is not a permanent role.

The master and slave designations that you see on the Replication tab
have zero meaning in SolrCloud unless a replication happens to be
happening right at that moment.  In SolrCloud, replication is only used
at node startup, and only if it's required.  The master/slave roles are
decided at the moment of replication and are not changed until another
replication becomes necessary.

When you say it's using twice the RAM, what *precisely* are you looking
at which tells you this?  Due to Solr using MMap for file access, some
of the numbers reported by the operating system will look bad but will
not indicate a problem.

Thanks,
Shawn



Re: SolrCloud from Stopping recovery for warnings to crash

2014-03-24 Thread Lukas Mikuckis
Garbage Collectors Summary:
https://apps.sematext.com/spm-reports/s/rgRnwuShgI

Pool Size:
https://apps.sematext.com/spm-reports/s/H16ndqichM

First Stopping recovery warning: 4:00, OOM error: 6:30.


2014-03-24 16:35 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com:

 I am guessing that it is all related to memory issues. I guess that as
 the used heap increases, full GC cycles increase causing ZK timeouts
 which in turn cause more recoveries to be initiated. In the end,
 everything blows up with the out of memory errors. Do you log GC
 activity on your servers?

 I suggest that you rollback to 4.6.1 for now and upgrade to 4.7.1 when
 it releases next week.

 On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com
 wrote:
  Yes, we upgraded solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before solr
  started crashing).
  When we were upgrading, we just upgraded solr and changed versions in
  collections configs.
 
  When solr crashes we get OOM but only 2h after first Stopping recovery
  warnings.
 
  Maybe you have any ideas when Stopping recovery warnings are thrown?
  Because now we have no idea what could cause this issue.
 
  Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar 
 shalinman...@gmail.com
 :
 
  Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
  cause out of memory issues. Can you check your logs for out of memory
  errors?
 
  On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis 
 lukasmikuc...@gmail.com
  wrote:
   Solr version: 4.7
  
   Architecture:
   2 solrs (1 shard, leader + replica)
   3 zookeepers
  
   Servers:
   * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
   * zookeeper + solr  (heap 4gb) - RAM 8gb, 2 cpu cores
   * zookeeper
  
   Solr data:
   * 21 collections
   * Many fields, small docs, docs count per collection from 1k to 500k
  
   About a week ago solr started crashing. It crashes every day, 3-4
 times
  a
   day. Usually at nigh. I can't tell anything what could it be related
 to
   because at that time we haven't done any configuration changes. Load
   haven't changed too.
  
  
   Everything starts with Stopping recovery for .. warnings (every
  warnings is
   repeated several times):
  
   WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
   zkNodeName=core_node1core=**
  
   WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not
 find
   election node to remove
  
   WARN  org.apache.solr.update.PeerSync; no frame of reference to tell
 if
   we've missed updates
  
   WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no
  frame
   of reference to tell if we've missed updates
  
   WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller;
  File
   _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
  
   WARN  - 2014-03-23 04:00:54.126;
   org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
  
 
 tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272
   refcount=2} active=true starting pos=356216606
  
   Then again Stopping recovery for .. warnings:
  
   WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
   zkNodeName=core_node1core=**
  
   ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: No registered leader was found
  after
   waiting for 4000ms , collection: collection1 slice: shard1
  
   ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: I was asked to wait on state
 down
  for
   IP:PORT_solr but I still do not see the requested state. I see state:
   active live:false
  
  
   After this serves mostly didn't recover.
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Ram usage

2014-03-24 Thread David Flower
I'm looking at the dashboard page on all 4 nodes and seeing
Physical Memory at 92% compared with ~41-44%.

And JVM-Memory at 52.9% compared to 23-28%.

The reason I mentioned slave is that on the core overview page there is
an entry for Slave (Searching) that doesn't appear on any of the other
nodes.

Cheers,
David




On 24/03/2014 14:47, Shawn Heisey s...@elyograg.org wrote:

On 3/24/2014 7:15 AM, David Flower wrote:
 We have a 4 node cluster with a collection thats sharded into 2 and each
 shard having a master and a slave for redundancy however 1 node has
decied
 to use twice the ram that the others are using within the cluster
 
 The only difference we can spot between the node is that the one with
the
 ram usage is saying its a slave while all the other are reporting that
 they are masters

If you are using SolrCloud, then there are no masters and no slaves.
Each shard has a leader, but that is not a permanent role.

The master and slave designations that you see on the Replication tab
have zero meaning in SolrCloud unless a replication happens to be
happening right at that moment.  In SolrCloud, replication is only used
at node startup, and only if it's required.  The master/slave roles are
decided at the moment of replication and are not changed until another
replication becomes necessary.

When you say it's using twice the RAM, what *precisely* are you looking
at which tells you this?  Due to Solr using MMap for file access, some
of the numbers reported by the operating system will look bad but will
not indicate a problem.

Thanks,
Shawn




Re: join and filter query with AND

2014-03-24 Thread Kranti Parisa
glad the suggestions are working for you!

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Mar 24, 2014 at 4:10 AM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi,

 Yonik, thank you for explaining me the reason of the issue. The workarounds
 you suggested are working fine.
 Kranti, your suggestion was also good :-)

 Thanks a lot!



 On 21 March 2014 20:00, Kranti Parisa kranti.par...@gmail.com wrote:

  My example should also work, am I missing something?
 
  q=({!join from=inner_id to=outer_id fromIndex=othercore
  v=$joinQuery})joinQuery=(city:Stara Zagora AND prod:214)
 
  Thanks,
  Kranti K. Parisa
  http://www.linkedin.com/in/krantiparisa
 
 
 
  On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley yo...@heliosearch.com
  wrote:
 
   Correct.  This is only a limitation of embedding a local-params style
   subquery within lucene syntax.
   The parser, not knowing the syntax of the embedded query, currently
   assumes the query text ends at whitespace or other special punctuation
   such as ).
  
   Original:
   (({!join from=inner_id to=outer_id fromIndex=othercore}city:Stara
   Zagora)) AND (prod:214)
  
   Some possible workarounds that should work:
   q={!join from=inner_id to=outer_id fromIndex=othercore}city:Stara
  Zagora
   fq=prod:214
  
   q=({!join from=inner_id to=outer_id fromIndex=othercore
   v='city:Stara Zagora'} AND prod:214)
  
   q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND
   prod:214)
   jq=city:Stara Zagora
  
  
   -Yonik
   http://heliosearch.org - solve Solr GC pauses with off-heap filters
   and fieldcache
  
  
   On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky 
 j...@basetechnology.com
  
   wrote:
I suspect that this is a bug in the implementation of the parsing of
embedded nested query parsers . That's a fairly new feature compared
 to
non-embedded nested query parsers - maybe Yonik could shed some
 light.
   This
may date from when he made a copy of the Lucene query parser for Solr
  and
added the parsing of embedded nested query parsers to the grammar. It
   seems
like the embedded nested query parser is only being applied to a
  single,
white space-delimited term, and not respecting the fact that the term
  is
   a
quoted phrase.
   
-- Jack Krupansky
   
-Original Message- From: Marcin Rzewucki
Sent: Thursday, March 20, 2014 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: join and filter query with AND
   
   
Nope. There is no line break in the string and it is not feed from
  file.
What else could be the reason ?
   
   
   
On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com
  wrote:
   
It looks to me like you're feeding this from some
kind of text file and you really _do_ have a
line break after Stara
   
Or have a line break in the string you paste into the URL
or something similar.
   
Kind of shooting in the dark though.
   
Erick
   
On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki 
 mrzewu...@gmail.com
  
wrote:
 Hi,

 I have the following issue with join query parser and filter
 query.
   For
 such query:

 str name=q*:*/str
 str name=fq
 (({!join from=inner_id to=outer_id fromIndex=othercore}city:Stara
 Zagora)) AND (prod:214)
 /str

 I got error:
 lst name=error
 str name=msg
 org.apache.solr.search.SyntaxError: Cannot parse 'city:Stara':
   Lexical
 error at line 1, column 12. Encountered: EOF after : \Stara
 /str
 int name=code400/int
 /lst

 Stack:
 DEBUG - 2014-03-19 13:35:20.825;
org.eclipse.jetty.servlet.ServletHandler;
 chain=SolrRequestFilter-default
 DEBUG - 2014-03-19 13:35:20.826;
 org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
 SolrRequestFilter
 ERROR - 2014-03-19 13:35:20.828;
  org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: 
 org.apache.solr.search.SyntaxError:
 Cannot parse 'city:Stara': Lexical error at line 1, column 12.  E
 ncountered: EOF after : \Stara
 at

   
   
  
 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
 at

   
   
  
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
 at

   
   
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
 at

   
   
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
 at

   
   
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
 at

   
   
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
 at

   
   
  
 
 

Re: SolrCloud from Stopping recovery for warnings to crash

2014-03-24 Thread Lukas Mikuckis
We tried to set ZK timeout to 1s and did load testing (both indexing and
search) and this issue didn't happen.


2014-03-24 17:00 GMT+02:00 Lukas Mikuckis lukasmikuc...@gmail.com:

 Garbage Collectors Summary:
 https://apps.sematext.com/spm-reports/s/rgRnwuShgI

 Pool Size:
 https://apps.sematext.com/spm-reports/s/H16ndqichM

 First Stopping recovery warning: 4:00, OOM error: 6:30.


 2014-03-24 16:35 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com:

 I am guessing that it is all related to memory issues. I guess that as
 the used heap increases, full GC cycles increase causing ZK timeouts
 which in turn cause more recoveries to be initiated. In the end,
 everything blows up with the out of memory errors. Do you log GC
 activity on your servers?

 I suggest that you rollback to 4.6.1 for now and upgrade to 4.7.1 when
 it releases next week.

 On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com
 wrote:
  Yes, we upgraded solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before solr
  started crashing).
  When we were upgrading, we just upgraded solr and changed versions in
  collections configs.
 
  When solr crashes we get OOM but only 2h after first Stopping recovery
  warnings.
 
  Maybe you have any ideas when Stopping recovery warnings are thrown?
  Because now we have no idea what could cause this issue.
 
  Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar 
 shalinman...@gmail.com
 :
 
  Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
  cause out of memory issues. Can you check your logs for out of memory
  errors?
 
  On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis 
 lukasmikuc...@gmail.com
  wrote:
   Solr version: 4.7
  
   Architecture:
   2 solrs (1 shard, leader + replica)
   3 zookeepers
  
   Servers:
   * zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
   * zookeeper + solr  (heap 4gb) - RAM 8gb, 2 cpu cores
   * zookeeper
  
   Solr data:
   * 21 collections
   * Many fields, small docs, docs count per collection from 1k to 500k
  
   About a week ago solr started crashing. It crashes every day, 3-4
 times
  a
   day. Usually at nigh. I can't tell anything what could it be related
 to
   because at that time we haven't done any configuration changes. Load
   haven't changed too.
  
  
   Everything starts with Stopping recovery for .. warnings (every
  warnings is
   repeated several times):
  
   WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
   zkNodeName=core_node1core=**
  
   WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not
 find
   election node to remove
  
   WARN  org.apache.solr.update.PeerSync; no frame of reference to tell
 if
   we've missed updates
  
   WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no
  frame
   of reference to tell if we've missed updates
  
   WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller;
  File
   _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
  
   WARN  - 2014-03-23 04:00:54.126;
   org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
  
 
 tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272
   refcount=2} active=true starting pos=356216606
  
   Then again Stopping recovery for .. warnings:
  
   WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
   zkNodeName=core_node1core=**
  
   ERROR - 2014-03-23 05:19:29.566;
 org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: No registered leader was found
  after
   waiting for 4000ms , collection: collection1 slice: shard1
  
   ERROR - 2014-03-23 05:20:03.961;
 org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: I was asked to wait on state
 down
  for
   IP:PORT_solr but I still do not see the requested state. I see state:
   active live:false
  
  
   After this serves mostly didn't recover.
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 



 --
 Regards,
 Shalin Shekhar Mangar.





Using Sentence Information For Snippet Generation

2014-03-24 Thread Furkan KAMACI
Hi;

When I generate a snippet via Solr I do not want the beginning of any
sentence to be cut off in the snippet, so I need to do sentence detection. I think
I can do it before I send documents into Solr: I can put a special
character that marks the beginning or end of a sentence, and then use that
information when generating the snippet. On the other hand, I should not show
that special character to the user.

How would you do this, or do you have any other ideas for
my purpose?

PS: I do not do it for English sentences.

Thanks;
Furkan KAMACI


Re: Ram usage

2014-03-24 Thread Shawn Heisey
 I'm looking at dashboard page on all 4 nodes and seeing
 Physical Memory 92% compared with ~41-44%

 And JVM-Memory 52.9% compared to 23-28%

 The reason I mentioned slave is that on the core overview page there is
 An entry for Slave (Searching) that doesn¹t appear on any of the other
 nodes

 Cheers,
 David

It's completely normal for physical memory to be nearly 100 percent at all
times, unless you have memory available that greatly exceeds the size of
your index and any other data used by programs on the server. This is
simply how operating systems work.

http://en.wikipedia.org/wiki/Page_cache

JVM memory usage is highly variable and will fluctuate all over. Do a
Google image search for 'JVM sawtooth' to see how memory usage graphs
within the JVM.

Thanks,
Shawn




Re: Ram usage

2014-03-24 Thread David Flower
It's not sawtoothing though, it's sitting solidly at 52%

On 24/03/2014 15:46, Shawn Heisey s...@elyograg.org wrote:

 I'm looking at dashboard page on all 4 nodes and seeing
 Physical Memory 92% compared with ~41-44%

 And JVM-Memory 52.9% compared to 23-28%

 The reason I mentioned slave is that on the core overview page there is
 An entry for Slave (Searching) that doesn't appear on any of the other
 nodes

 Cheers,
 David

It's completely normal for physical memory to be nearly 100 percent at all
times, unless you have memory available that greatly exceeds the size of
your index and any other data used by programs on the server. This is
simply how operating systems work.

http://en.wikipedia.org/wiki/Page_cache

JVM memory usage is highly variable and will fluctuate all over. Do a
Google image search for 'JVM sawtooth' to see how memory usage graphs
within the JVM.

Thanks,
Shawn





Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
The only message I get is:
 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 1, Skipped: 0

And there are no errors in the log.

Here's what the ibm atom feed looks like:

?xml version=1.0 encoding=utf-16?
atom:feed xmlns:atom=http://www.w3.org/2005/Atom;
xmlns:wplc=http://www.ibm.com/wplc/atom/1.0;
xmlns:age=http://purl.org/atompub/age/1.0;
xmlns:snx=http://www.ibm.com/xmlns/prod/sn;
xmlns:lconn=http://www.ibm.com/lotus/connections/seedlist/atom/1.0;

  atom:id
 
https://[redacted]/files/seedlist/myserver?Action=GetDocumentsamp;Format=ATOMamp;Locale=en_USamp;Range=2amp;Start=0/atom:id
  atom:link
href=https://[redacted]/files/seedlist/myserver?Action=GetDocumentsamp;Range=2amp;Start=1000amp;Format=ATOMamp;Locale=en_USamp;State=U0VDT05EXzIwMTQtMDMtMTMgMTY6MjM6NTguODRfMjAxMS0wNi0wNiAwODowNDoxNC42MjJfNmQ1YzQ3MWMtYTM3ZS00ZjlmLWE0OGEtZWZjYjMyZjU2NDgzXzEwMDBfZmFsc2U%3D;
  rel=next type=application/atom+xml title=Next page /
  atom:generator xml:lang=en-US version=1.2
  lconn:version=4.0.0.0Seedlist Service Backend
  System/atom:generator
  atom:category term=ContentSourceType/Files
  scheme=com.ibm.wplc.taxonomy://feature_taxonomy
  label=Files /
  atom:title xml:lang=en-USFiles : 1,000 entries of Seedlist
  FILES/atom:title
  wplc:action do=update /
  wplc:fieldInfo id=title name=Title type=string
  contentSearchable=true fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=false /
  wplc:fieldInfo id=author name=Owner's directory id
  type=string contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=published name=Created timestamp
  type=date contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=false
  supportsExactMatch=false /
  wplc:fieldInfo id=updated
  name=Last modification timestamp (major change only, as indicated in UI)
  type=date contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=true
  supportsExactMatch=false /
  wplc:fieldInfo id=summary name=Description type=string
  contentSearchable=true fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=false /
  wplc:fieldInfo id=tag name=Tag type=string
  contentSearchable=true fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=false /
  wplc:fieldInfo id=commentCount name=Number of comments
  type=int contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=true
  supportsExactMatch=true /
  wplc:fieldInfo id=downloadCount name=Number of downloads
  type=int contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=true
  supportsExactMatch=true /
  wplc:fieldInfo id=recommendCount
  name=Number of recommendations type=int
  contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=true
  supportsExactMatch=true /
  wplc:fieldInfo id=fileUpdated
  name=Binary file last modification timestamp type=date
  contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=true
  supportsExactMatch=true /
  wplc:fieldInfo id=fileSize name=Binary file size type=int
  contentSearchable=false fieldSearchable=false
  parametric=true returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=fileName name=File name type=string
  contentSearchable=true fieldSearchable=true
  parametric=false returnable=true sortable=true
  supportsExactMatch=false /
  wplc:fieldInfo id=sharedWithUser
  name=Shared with user's directory id type=string
  contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=sharedWithUserName
  name=Shared with user's name type=string
  contentSearchable=false fieldSearchable=false
  parametric=false returnable=true sortable=false
  supportsExactMatch=false /
  wplc:fieldInfo id=libraryId
  name=The id of library owning the file type=string
  contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=ORGANISATIONAL_ID
  name=The id of the organization the owning user belongs to
  type=string contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=communityId
  name=The id of the community associated to the file
  type=string contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=containerType
  name=The type of the container (library) associated to the file
  type=string contentSearchable=false fieldSearchable=true
  parametric=false returnable=true sortable=false
  supportsExactMatch=true /
  wplc:fieldInfo id=ATOMAPISOURCE name=Atom API link
  type=string 

Re: Ram usage

2014-03-24 Thread Shawn Heisey

On 3/24/2014 9:48 AM, David Flower wrote:

Its not saw toothing though it’s sitting solidly at 52%


It may be very difficult to see the sawtooth effect unless you actually 
connect an app like jconsole to your running Solr instance and watch the 
graphs over time.


My point was that what you've described does not sound like a problem at 
all.  If you are having symptoms noticeable from the client side, then 
we can tackle those, but these memory numbers sound fine to me.


Thanks,
Shawn



Re: Singles in solr for bigrams,trigrams in parsed_query

2014-03-24 Thread Dmitry Kan
Hi,

Query rewrite happens down the chain, after query parsing. For example, a
wildcard query triggers an index-based query rewrite where terms matching
the wildcard are added into the original query.

In your case, it looks like the query rewrite will generate the ngrams and add
them into the original query.

So just make sure that the Analysis page shows what you expect on both the indexing
and querying sides.

Out of curiosity: what are you trying to achieve with the query-side
shingles? Aren't index-time shingles alone enough?


On Thu, Mar 20, 2014 at 8:06 PM, Jyotirmoy Sundi sundi...@gmail.com wrote:

 Hi Folks,
I am using singles to index bigrams/trigrams. The same is also used
 for query in the schema.xml file. But when I run the query in debug mode
 for a collections, I dont see the bigrams in the parsed_query . Any idea
 what I might be missing.
 solr/colection/select?q=best%20pricedebugQuery=on

 str name=parsedquery_toStringtext:best text:price/str
 I was hoping to see
 str name=parsedquery_toStringtext:best text:price text:best price/str

 My schema files looks like this:
  types
 fieldType name=string class=solr.StrField sortMissingLast=true
 omitNorms=true/
 fieldType name=int class=solr.TrieIntField precisionStep=0
 omitNorms=true positionIncrementGap=0/

 fieldType name=text class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
 charFilter class=solr.HTMLStripCharFilterFactory/
 filter class=solr.ShingleFilterFactory minShingleSize=2
 maxShingleSize=4 outputUnigrams=true /
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.LengthFilterFactory min=3 max=50 /
 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=0 generateNumberParts=0 catenateWords=1
 catenateNumbers=1 catenateAll=1 preserveOriginal=1
 splitOnCaseChange=0 splitOnNumerics=0 stemEnglishPossessive=1/
 filter class=solr.StopFilterFactory/
 filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
 ignoreCase=true expand=true/
 filter class=solr.TrimFilterFactory /
 /analyzer

   analyzer type=query
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.LengthFilterFactory min=3 max=50 /
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.StopFilterFactory/
 filter class=solr.TrimFilterFactory /
 filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
 ignoreCase=true expand=true/
 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=1 splitOnCaseChange=0
 splitOnNumerics=0 stemEnglishPossessive=1/
 filter class=solr.ShingleFilterFactory minShingleSize=2
 maxShingleSize=4 outputUnigrams=true /
 filter class=solr.CommonGramsFilterFactory words=stopwords.txt
 ignoreCase=true/
 !--filter class=solr.CommonGramsFilterFactory
 words=stopwords.txt ignoreCase=true/
 filter class=solr.ShingleFilterFactory minShingleSize=2
 maxShingleSize=4 outputUnigrams=true /--
  /analyzer
 /fieldType
  /types



 --
 Best Regards,
 Jyotirmoy Sundi




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Solr Cloud collection keep going down?

2014-03-24 Thread Software Dev
Shawn,

Thanks for pointing me in the right direction. After consulting the
above document I *think* that the problem may be too large a heap,
which may be affecting garbage collection and hence causing ZK
timeouts.

We have around 20G of memory on these machines, with the heap min/max
at 6 and 10 respectively (-Xms6G -Xmx10G). The rest was set
aside for disk cache. Why did we choose 6-10? No other reason than we
wanted to allot enough for disk cache, and then everything else was
thrown at Solr. Does this sound about right?
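
In case it helps the diagnosis, one thing we can do is turn on GC logging and see whether long pauses line up with the ZK timeouts. A minimal sketch, assuming a Java 6/7 HotSpot JVM and the stock start.jar launcher (the log path and launcher are assumptions, adjust to your setup):

java -Xms6g -Xmx10g \
     -verbose:gc -Xloggc:/var/log/solr/gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime \
     -jar start.jar

Any single pause in that log longer than the ZooKeeper client timeout (15 seconds by default) would explain the flapping.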

I took some screenshots for VisualVM and our NewRelic reporting as
well as some relevant portions of our SolrConfig.xml. Any
thoughts/comments would be greatly appreciated.

http://postimg.org/gallery/4t73sdks/1fc10f9c/

Thanks




On Sat, Mar 22, 2014 at 2:26 PM, Shawn Heisey s...@elyograg.org wrote:
 On 3/22/2014 1:23 PM, Software Dev wrote:
 We have 2 collections with 1 shard each replicated over 5 servers in the
 cluster. We see a lot of flapping (down or recovering) on one of the
 collections. When this happens the other collection hosted on the same
 machine is still marked as active. When this happens it takes a fairly long
 time (~30 minutes) for the collection to come back online, if at all. I
 find that its usually more reliable to completely shutdown solr on the
 affected machine and bring it back up with its core disabled. We then
 re-enable the core when its marked as active.

 A few questions:

 1) What is the healthcheck in Solr-Cloud? Put another way, what is failing
 that marks one collection as down but the other on the same machine as up?

 2) Why does recovery take forever when a node goes down.. even if its only
 down for 30 seconds. Our index is only 7-8G and we are running on SSD's.

 3) What can be done to diagnose and fix this problem?

 Unless you are actually using the ping request handler, the healthcheck
 config will not matter.  Or were you referring to something else?

 Referencing the logs you included in your reply:  The EofException
 errors happen because your client code times out and disconnects before
 the request it made has completed.  That is most likely just a symptom
 that has nothing at all to do with the problem.

 Read the following wiki page.  What I'm going to say below will
 reference information you can find there:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Relevant side note: The default zookeeper client timeout is 15 seconds.
  A typical zookeeper config defines tickTime as 2 seconds, and the
 timeout cannot be configured to be more than 20 times the tickTime,
 which means it cannot go beyond 40 seconds.  The default timeout value
 15 seconds is usually more than enough, unless you are having
 performance problems.

 If you are not actually taking Solr instances down, then the fact that
 you are seeing the log replay messages indicates to me that something is
 taking so much time that the connection to Zookeeper times out.  When it
 finally responds, it will attempt to recover the index, which means
 first it will replay the transaction log and then it might replicate the
 index from the shard leader.

 Replaying the transaction log is likely the reason it takes so long to
 recover.  The wiki page I linked above has a slow startup section that
 explains how to fix this.

 There is some kind of underlying problem that is causing the zookeeper
 connection to timeout.  It is most likely garbage collection pauses or
 insufficient RAM to cache the index, possibly both.

 You did not indicate how much total RAM you have or how big your Java
 heap is.  As the wiki page mentions in the SSD section, SSD is not a
 substitute for having enough RAM to cache at significant percentage of
 your index.

 Thanks,
 Shawn



Solr 4.3.1 memory swapping

2014-03-24 Thread Darrell Burgan
Hello all, we have a SolrCloud implementation in production, with two servers 
running Solr 4.3.1 in a SolrCloud configuration. Our search index is about 
70-80GB in size.  The trouble is that after several days of uptime, we will 
suddenly have periods where the operating system Solr is running in starts 
swapping heavily. This gets progressively worse until the swapping slows things 
down so much that Zookeeper thinks the nodes are no longer available. If both 
nodes are swapping, it can lead to an outage, which has happened to us a couple 
of times.

My question is why is it swapping?  Here's an example with numbers from our 
prod environment:


-  Total physical memory: 16GB

-  Physical memory usage: 15.58GB (99.4%)

-  Total swap space: 4GB

-  Swap space usage: 1.51GB (37.7%)

-  Total JVM Memory: 10GB

-  JVM heap: 1.89GB/4.44GB

The top command reports that the JVM has 3.8GB resident RAM and 81.8GB 
virtual.  Note that it is using up close to half of the swap space, even though 
the JVM only needs a subset of the physical memory.

So what is causing the swapping, and what should I do about it? I can add more 
memory to the VMs if I need to, but how much? And how much should I allocate to 
JVM v. leave available for the OS?

I could attach a screen shot of our Solr console and the top output if the 
listserv allows attachments.

Any ideas?

Thanks!
Darrell Burgan


Darrell Burgan | Chief Architect, PeopleAnswers
office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | 
darrell.bur...@infor.commailto:darrell.bur...@infor.com | http://www.infor.com

CONFIDENTIALITY NOTE: This email (including any attachments) is confidential 
and may be protected by legal privilege. If you are not the intended recipient, 
be aware that any disclosure, copying, distribution, or use of the information 
contained herein is prohibited.  If you have received this message in error, 
please notify the sender by replying to this message and then delete this 
message in its entirety. Thank you for your cooperation.



Fixing corrupted index?

2014-03-24 Thread zqzuk
My Lucene index - built with Solr using Lucene4.1 - is corrupted. Upon trying
to read the index using the following code I get
org.apache.solr.common.SolrException: No such core: collection1 exception:


File configFile = new File(cacheFolder + File.separator + "solr.xml");
CoreContainer container = new CoreContainer(cacheFolder, configFile);
SolrServer server = new EmbeddedSolrServer(container, "collection1");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", idFieldName + ":" + ClientUtils.escapeQueryChars(queryId));
params.set("fl", idFieldName + "," + valueFieldName);

QueryResponse response = server.query(params);


I used the CheckIndex util to check the integrity of the index, but it was
not able to perform the task, throwing the following error:


Opening index @
/../solrindex_cache/zookeeper/solr/collection1/data/index

ERROR: could not read any segments file in directory
java.io.FileNotFoundException:
/../solrindex_cache/zookeeper/solr/collection1/data/index/segments_b5tb
(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.init(RandomAccessFile.java:233)
at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:223)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:285)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:383)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1777)


The file segments_b5tb that the index checker is looking for is indeed missing
from the index folder. The only file that looks similar is segments.gen.
However, the index segment files, including .si, .tip, .doc, .fdx etc., still
exist.

Is there any way to fix this as it took me 2 weeks to build this index...

Many many thanks for your kind advice!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fixing-corrupted-index-tp4126644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexer: java.io.IOException: Job failed!

2014-03-24 Thread Laura McCord
Hi,

I’m trying to integrate Solr with Nutch. I performed all of the necessary
steps, but after Nutch performs the crawl it appears that I’m receiving a
connection refused error.

2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
crawldb: TestCrawl/crawldb
2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
linkdb: TestCrawl/linkdb
2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduces: 
adding segment: TestCrawl/segments/20140324113941
2014-03-24 11:42:43,304 WARN  util.NativeCodeLoader - Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2014-03-24 11:42:43,942 INFO  anchor.AnchorIndexingFilter - Anchor 
deduplication is: off
2014-03-24 11:42:44,456 INFO  indexer.IndexWriters - Adding 
org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-03-24 11:42:44,465 INFO  solr.SolrUtils - Authenticating as: my username
2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: content dest: 
content
2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: title dest: title
2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: host dest: host
2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: segment dest: 
segment
2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: digest dest: 
digest
2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: tstamp dest: 
tstamp
2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: url dest: id
2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: url dest: url
2014-03-24 11:42:44,616 INFO  solr.SolrIndexWriter - Indexing 22 documents
2014-03-24 11:42:44,704 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,704 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,708 INFO  solr.SolrIndexWriter - Indexing 22 documents
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
(java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,715 WARN  mapred.LocalJobRunner - job_local319933392_0001
java.io.IOException
at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:173)
at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:159)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at 
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at 
org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.net.ConnectException: Connection refused
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at 

Re: Indexer: java.io.IOException: Job failed!

2014-03-24 Thread Laura McCord
So the problem might be that I’m running Solr on Tomcat on port 8080. Is there
a way to resolve this so I can run the command successfully?

Thanks,
 Laura



On Mar 24, 2014, at 1:33 PM, Laura McCord lmcc...@ucmerced.edu wrote:

 Hi,
 
 I’m trying to integrate Solr with Nutch and I performed all of the necessary 
 steps except after Nutch performs the crawl it appears that I’m receiving a 
 connection refused.
 
 2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
 crawldb: TestCrawl/crawldb
 2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
 linkdb: TestCrawl/linkdb
 2014-03-24 11:42:43,062 INFO  indexer.IndexerMapReduce - IndexerMapReduces: 
 adding segment: TestCrawl/segments/20140324113941
 2014-03-24 11:42:43,304 WARN  util.NativeCodeLoader - Unable to load 
 native-hadoop library for your platform... using builtin-java classes where 
 applicable
 2014-03-24 11:42:43,942 INFO  anchor.AnchorIndexingFilter - Anchor 
 deduplication is: off
 2014-03-24 11:42:44,456 INFO  indexer.IndexWriters - Adding 
 org.apache.nutch.indexwriter.solr.SolrIndexWriter
 2014-03-24 11:42:44,465 INFO  solr.SolrUtils - Authenticating as: my 
 username
 2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: content dest: 
 content
 2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: title dest: 
 title
 2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: host dest: host
 2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: segment dest: 
 segment
 2014-03-24 11:42:44,483 INFO  solr.SolrMappingReader - source: boost dest: 
 boost
 2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: digest dest: 
 digest
 2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: tstamp dest: 
 tstamp
 2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: url dest: id
 2014-03-24 11:42:44,484 INFO  solr.SolrMappingReader - source: url dest: url
 2014-03-24 11:42:44,616 INFO  solr.SolrIndexWriter - Indexing 22 documents
 2014-03-24 11:42:44,704 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,704 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,707 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,708 INFO  solr.SolrIndexWriter - Indexing 22 documents
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - I/O exception 
 (java.net.ConnectException) caught when processing request: Connection refused
 2014-03-24 11:42:44,709 INFO  httpclient.HttpMethodDirector - Retrying request
 2014-03-24 11:42:44,715 WARN  mapred.LocalJobRunner - job_local319933392_0001
 java.io.IOException
   at 
 org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:173)
   at 
 org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:159)
   at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
   at 
 org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
   at 
 org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
 Caused by: org.apache.solr.client.solrj.SolrServerException: 
 java.net.ConnectException: Connection refused
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
   at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
   at 
 org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
   ... 6 more
 Caused by: java.net.ConnectException: Connection refused
   at 

Re: Using Sentence Information For Snippet Generation

2014-03-24 Thread Dmitry Kan
Hi Furkan,

I have done an implementation with a custom filler (special character)
sequence in between sentences. A better solution I landed on was increasing
the position of each sentence's first token by a large number, like 10000
(perhaps a smaller number could be used too). Then a user search can be
conducted with a proximity query: "some tokens"~5000 (the recently
committed complexphrase parser supports rich phrase syntax, for example).
This of course expects that a sentence fits within the 5000 window size and that the
total number of sentences in the field * 10k does not exceed
Integer.MAX_VALUE. Then on the highlighter side you'd get the hits within
sentences naturally.
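
To make that concrete, here is a minimal sketch of such a filter (not the exact code I used; it assumes the indexing pipeline emits a marker token, ##SB## here, at every sentence boundary, which is just an arbitrary convention):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class SentenceGapFilter extends TokenFilter {
  private static final String BOUNDARY = "##SB##"; // assumed sentence-boundary marker
  private static final int GAP = 10000;            // position gap inserted between sentences

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private boolean pendingGap = false;

  public SentenceGapFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      if (BOUNDARY.equals(termAtt.toString())) {
        // Swallow the marker token; the gap is applied to the next real token.
        pendingGap = true;
        continue;
      }
      if (pendingGap) {
        posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + GAP);
        pendingGap = false;
      }
      return true;
    }
    return false;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pendingGap = false;
  }
}

The marker never makes it into the inverted index, so a proximity query like "some tokens"~5000 cannot match across a sentence boundary. Note that stored field values keep the raw input, so the marker would still have to be stripped from whatever copy is used for display; the sketch only covers the position side.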

Is this something you are looking for?

Dmitry



On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI furkankam...@gmail.comwrote:

 Hi;

 When I generate snippet via Solr I do not want to remove beginning of any
 sentence at the snippet. So I need to do a sentence detection. I think that
 I can do it before I send documents into Solr. I can put some special
 characters that signs beginning or end of a sentence. Then I can use that
 information when generating snippet. On the other hand I should not show
 that special character to the user.

 What do you think that how can I do it or do you have any other ideas for
 my purpose?

 PS: I do not do it for English sentences.

 Thanks;
 Furkan KAMACI




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
I confirmed the xpath is correct with a third party XPath visualizer.
/atom:feed/atom:entry parses the xml correctly.

Can anyone confirm or deny that the dataimporthandler can handle an atom
feed?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134p4126672.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread T. Kuro Kurosaka

On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,

 Guessing it's surround query parser's support for within backed by span
 queries.

 Otis

You mean this?
http://wiki.apache.org/solr/SurroundQueryParser

I guess this parser needs improvement in the documentation area.
It doesn't explain or have an example of the w/int syntax at all.
(Is this an infix notation for W?)
An example would help explain the difference between W and N;
some readers may not understand what ordered and unordered
mean in this context.
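
My current reading, which is exactly the kind of example the wiki could spell out (please correct me if this is wrong):

  jakarta 3W apache   (ordered: apache must follow jakarta, within a distance of 3)
  jakarta 3N apache   (unordered: the same two terms in either order, within a distance of 3)

with the leading number giving the maximum distance.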

Kuro



Re: Fixing corrupted index?

2014-03-24 Thread Dmitry Kan
Hi,

Have a look at:

http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/CheckIndex.html

HTH,
Dmitry


On Mon, Mar 24, 2014 at 8:16 PM, zqzuk ziqizh...@hotmail.co.uk wrote:

 My Lucene index - built with Solr using Lucene4.1 - is corrupted. Upon
 trying
 to read the index using the following code I get
 org.apache.solr.common.SolrException: No such core: collection1 exception:

 
 File configFile = new File(cacheFolder + File.separator + solr.xml);
 CoreContainer container = new CoreContainer(cacheFolder, configFile);
 SolrServer server = new EmbeddedSolrServer(container, collection1);
 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set(q, idFieldName + : + ClientUtils.escapeQueryChars(queryId));
 params.set(fl,idFieldName+,+valueFieldName);

 QueryResponse response = server.query(params)
 

 I used checkindex util to check the integrity of the index and it seems
 not able to perform the task by throwing the following error:

 
 Opening index @
 /../solrindex_cache/zookeeper/solr/collection1/data/index

 ERROR: could not read any segments file in directory
 java.io.FileNotFoundException:
 /../solrindex_cache/zookeeper/solr/collection1/data/index/segments_b5tb
 (No such file or directory)
 at java.io.RandomAccessFile.open(Native Method)
 at java.io.RandomAccessFile.init(RandomAccessFile.java:233)
 at
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:223)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:285)
 at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
 at

 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
 at

 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
 at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:383)
 at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1777)
 

 The file segments_b5tb that index checker is looking for is indeed missing
 in the index folder. The only file that looks similar is segments.gen.
 However, the index segment files including .si, tip, doc, fdx etc still
 exist.

 Is there any way to fix this as it took me 2 weeks to build this index...

 Many many thanks for your kind advice!



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Fixing-corrupted-index-tp4126644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Fixing corrupted index?

2014-03-24 Thread zqzuk
Hi
Thanks.

But I am already using CheckIndex, and the error above is given by the CheckIndex
utility: it could not even continue after reporting could not read any
segments file in directory.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fixing-corrupted-index-tp4126644p4126687.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread eShard
Ok, I found one typo:
the links need to be this: /atom:feed/atom:entry/atom:link/@href
But the import still doesn't work... :(

I guess I have to convert the feed over to RSS 2.0



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-the-solr-dataimporthandler-consume-an-atom-feed-tp4126134p4126691.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Otis Gospodnetic
I think SQP is getting axed, no?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr  Elasticsearch Support * http://sematext.com/


On Mon, Mar 24, 2014 at 3:45 PM, T. Kuro Kurosaka k...@healthline.comwrote:

 On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,
 
  Guessing it's surround query parser's support for within backed by span
  queries.
 
  Otis

 You mean this?
 http://wiki.apache.org/solr/SurroundQueryParser

 I guess this parser needs improvement in documentation area.
 It doesn't explain or have an example of the w/int syntax at all.
 (Is this the infix notation of W?)
 An example would help explaining difference between W and N;
 some readers may not understand what ordered and unordered
 in this context mean.

 Kuro




solr 4.x reindexing issues

2014-03-24 Thread Ravi Solr
Hello,
We are trying to reindex as part of our move from 3.6.2 to 4.6.1
and have faced various issues reindexing 1.5 million docs. We don't use
SolrCloud; it's still a master/slave config. For testing this I am using a
single test server, reading from it and putting the docs back into the same index.

We send docs in batches of 100 but only 10 out of 100 are getting indexed. Is this
related to the maxBufferedAddsPerServer setting that is hard-coded? I also
tried to play with the autoCommit and autoSoftCommit settings, but in vain.

<autoCommit>
   <maxDocs>5</maxDocs>
   <maxTime>5000</maxTime>
   <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
    <maxTime>1000</maxTime>
</autoSoftCommit>

I use these on the test system just to check whether docs are being indexed, but
even with a batch of 5 my SolrJ client code runs faster than the indexing,
causing some docs to not get indexed. The indexing function is a
recursive method (shown below) which fails after some time with a stack
overflow (I did not have this issue with 3.6.2 with the same code):

private static void processDocs(HttpSolrServer server, Integer start,
Integer rows) throws Exception {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    Long total = list.getNumFound();

    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            // To index full doc again
            iDoc.removeField("_version_");
            server.add(iDoc, 1000);
        }

        System.out.println("Indexed " + (start + rows) + "/" + total);
        if (total >= (start + rows)) {
            processDocs(server, (start + rows), rows);
        }
    }
}
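
For reference, the same loop written iteratively (a sketch only, keeping the field names and the commitWithin value from the method above, and using start/rows for paging the way the recursive version appears to intend) would at least take the stack overflow out of the picture:

private static void processDocsIteratively(HttpSolrServer server, int rows) throws Exception {
    int start = 0;
    long total;
    do {
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("-allfields:[* TO *]");
        query.setStart(start);
        query.setRows(rows);
        SolrDocumentList list = server.query(query).getResults();
        total = list.getNumFound();
        if (list.isEmpty()) {
            break;
        }
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            iDoc.removeField("_version_"); // let Solr assign a fresh version on re-add
            server.add(iDoc, 1000);        // commitWithin 1000 ms, as above
        }
        start += rows;
        System.out.println("Indexed " + start + "/" + total);
    } while (start < total);
}

One caveat, unchanged from the recursive version: because the -allfields:[* TO *] filter drops documents as soon as they have been reindexed, paging by start can skip rows; querying with start=0 on every pass, or paging by the unique key (or cursorMark once on 4.7+), would be more robust.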

I also tried turning on the updateLog, but it was filling up so fast that it was useless.

How do we do bulk updates in a Solr 4.x environment? Is there any setting
that I am missing?

Thanks

Ravi Kiran Bhaskar
Technical Architect
The Washington Post


Multiple Languages in Same Core

2014-03-24 Thread Jeremy Thomerson
I recently deployed Solr to back the site search feature of a site I work
on. The site itself is available in hundreds of languages. With the initial
release of site search we have enabled the feature for ten of those
languages. This is distributed across eight cores, with two Chinese
languages plus Korean combined into one CJK core and each of the other
seven languages in their own individual cores. The reason for splitting
these into separate cores was so that we could have the same field names
across all cores but have different configuration for analyzers, etc, per
core.

Now I have some questions on this approach.

1) Scalability: Considering I need to scale this to many dozens more
languages, perhaps hundreds more, is there a better way so that I don't end
up needing dozens or hundreds of cores? My initial plan was that many
languages that didn't have special support within Solr would simply get
lumped into a single default core that has some default analyzers that
are applicable to the majority of languages.

1b) Related to this: is there a practical limit to the number of cores that
can be run on one instance of Lucene?

2) Auto Suggest: In phase two I intend to add auto-suggestions as a user
types a query. In reviewing how this is implemented and how the suggestion
dictionary is built I have concerns. If I have more than one language in a
single core (and I keep the same field name for suggestions on all
languages within a core) then it seems that I could get suggestions from
another language returned with a suggest query. Is there a way to build a
separate dictionary for each language, but keep these languages within the
same core?

If it's helpful to know: I have a field in every core for Locale. Values
will be the locale of the language of that document, i.e. en, es,
zh_hans, etc. I'd like to be able to: 1) when building a suggestion
dictionary, divide it into multiple dictionaries, grouping them by locale,
and 2) supply a parameter to the suggest query that allows the suggest
component to only return suggestions from the appropriate dictionary for
that locale.

If the answer to #1 is keep splitting groups of languages that have
different analyzers into their own cores and the answer to #2 is that's
not supported, then I'd be curious: where would I start to write my own
extension that supported #2? I looked last night at the suggest lookup
classes, dictionary classes, etc. But I didn't see a clear point where it
would be clean to implement something like I'm suggesting above.

Best Regards,
Jeremy Thomerson


Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-24 Thread shushuai zhu
Jack, thanks. 

Actually the 20K events/sec is a low-end rate we estimated. It is not 
necessarily related to sensors; when you want to centralize data from many 
sources, regardless of multi-tenancy, even for a single tenant many events per 
second have to be handled.

I have a question regarding the size of nodes used in SolrCloud: what are 
the general pros/cons of using big vs. small nodes to set up a SolrCloud for 
cases similar to the one I described? For example, mainly considering memory:

256 (GB) x 4
vs.
32 (GB) x 32

or a little extreme:
256 (GB) x 4
vs.
8 (GB) x 128

Is it better to use fewer, bigger nodes to set up a SolrCloud, or to use 
more, smaller nodes? In the latter (a little extreme) 
example, multiple SolrCloud clusters could be considered, as Erick mentioned.

Regards.

Shushuai

 


 From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org 
Sent: Sunday, March 23, 2014 1:03 AM
Subject: Re: Best approach to handle large volume of documents with constantly 
high incoming rate?
  

I defer to Erick on on this level of detail and experience.

Let's continue the discussion - some of it will be a matter of how to 
configure and tune Solr, how to select, configure, and tune hardware, the 
need for further Lucene/Solr improvements, and how much further we have to 
go to get to the next level with Big Data. I mean, 20K events/sec is not 
necessarily beyond the realm of reality these days with sensor data (20K/sec 
= 1 event every 50 microseconds)

-- Jack Krupansky


-Original Message- 
From: Erick Erickson
Sent: Saturday, March 22, 2014 11:02 PM
To: solr-user@lucene.apache.org ; shushuai zhu
Subject: Re: Best approach to handle large volume of documents with 
constantly high incoming rate?

Well, the commonsense limits Jack is referring to in that post are
more (IMO) scales you should count on having to do some _serious_
prototyping/configuring/etc. As you scale out, you'll run into edge
cases that aren't the common variety, aren't reliably tested every
night, etc. I mean how would you set up a test bed that had 1,000
nodes? Sure, it can be done, but nobody's volunteered yet to provide
the Apache Solr project that much hardware. I suspect that it would
make Uwe's week if someone did though.

In the practical limit vein, one example: You'll run up against the
laggard problem. Let's assume that you successfully put up 2,000
nodes, for simplicity's sake, no replicas, just leaders and they all
stay up all the time. To successfully do a search, you need to send
out a request to all 2,000 nodes. The chance that one of them is slow
for _any_ reason (GC, high CPU load, it's just tired) increases the
more nodes you have. And since you have to wait until the slowest node
responds, your query rate will suffer correspondingly.

I've seen 4 node clusters handle 5,000 docs/sec update rate FWIW. YMMV
of course.

However, you say ...dedicated indexing servers There's no such
thing in SolrCloud. Every document gets sent to every member of the
slice it belongs to. How else could NRT be supported? When I saw that
comment I wondered how well you understand SolrCloud. I flat guarantee
you'll understand SolrCloud really, really well if yo try to scale as
you indicate :). There'll be a whole bunch of learning experiences
along the way, some will be painful. I guarantee that too.

Responding to your points

1) Yes, no, and maybe. For relatively small docs on relatively modern
hardware, it's a good place to start. Then you have to push it until
it falls over to determine your _real_ rates. See:
http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

2) Nobody knows. There's no theoretical reason why SolrCloud
shouldn't; no a-priori hard limits. I _strongly_ suspect you'll be on
the bleeding edge of size, though. Expect some things to be learning
experiences.

3) No, it doesn't mean that at all. 64 is an arbitrary number that
means, IMO, here there be dragons. As you start to scale out beyond
this you'll run into pesky issues I expect. Your network won't be as
reliable as you think. You'll find one of your VMs (which I expect
you'll be running on) has some glitches. Someone loaded a very CPU
intensive program on three of your machines and your Solrs on those
machines is being starved. Etc.

4) I've personally seen 1,000 node clusters. You ought to see the very
cool. SolrCloud admin graph I recently saw... But I expect you'll
actually be in for some kind of divide-and-conquer strategy whereby
you have a bunch of clusters that are significantly smaller. You
could, for instance, determine that the use-case you support is
searching across small ranges, say a week at a time and have 52
clusters of 128 machines or so. You could have 365 clusters of  20
machines. It all depends on how the index will be used.

5) Not at all. See above, I've seen 5K/sec on 4 nodes, also supporting
simultaneous 

Question on highlighting edgegrams

2014-03-24 Thread Software Dev
In 3.5.0 we have the following.

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If we searched for "c" with highlighting enabled we would get back
results such as:

<em>c</em>dat
<em>c</em>rocdile
<em>c</em>ool beans

But in the latest Solr (4.7) we get the full words highlighted back.
Did something change between these versions with regard to highlighting?

Thanks


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Ahmet Arslan
Hi,

There is no w/int syntax in surround. 
/* Query language operators: OR, AND, NOT, W, N, (, ), ^, *, ?,  and comma */

Ahmet



On Monday, March 24, 2014 9:46 PM, T. Kuro Kurosaka k...@healthline.com wrote:
On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,

 Guessing it's surround query parser's support for within backed by span
 queries.

 Otis

You mean this?
http://wiki.apache.org/solr/SurroundQueryParser

I guess this parser needs improvement in documentation area.
It doesn't explain or have an example of the w/int syntax at all.
(Is this the infix notation of W?)
An example would help explaining difference between W and N;
some readers may not understand what ordered and unordered
in this context mean.


Kuro


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Walter Underwood
That is similar to Verity VQL, but that used NEAR/10.  --wunder

On Mar 24, 2014, at 4:21 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 There is no w/int syntax in surround. 
 /* Query language operators: OR, AND, NOT, W, N, (, ), ^, *, ?,  and comma */
 
 Ahmet
 
 
 
 On Monday, March 24, 2014 9:46 PM, T. Kuro Kurosaka k...@healthline.com 
 wrote:
 On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,
 
 Guessing it's surround query parser's support for within backed by span
 queries.
 
 Otis
 
 You mean this?
 http://wiki.apache.org/solr/SurroundQueryParser
 
 I guess this parser needs improvement in documentation area.
 It doesn't explain or have an example of the w/int syntax at all.
 (Is this the infix notation of W?)
 An example would help explaining difference between W and N;
 some readers may not understand what ordered and unordered
 in this context mean.
 
 
 Kuro

--
Walter Underwood
wun...@wunderwood.org





Re: Required fields

2014-03-24 Thread Chris Hostetter

: What is the default value for the required attribute of a field element 
: in a schema? I've just looked everywhere I can think of in the wiki, the 
: reference manual, and the JavaDoc. Most of the documentation doesn't 
: even mention that attribute.

Good catch, fixed...
https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604269originalId=40506114

https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties
https://cwiki.apache.org/confluence/display/solr/Defining+Fields


-Hoss
http://www.lucidworks.com/


Re: Multiple Languages in Same Core

2014-03-24 Thread Alexandre Rafalovitch
Solr In Action has a significant discussion on the multi-lingual
approach. They also have some code samples out there. Might be worth a
look

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
jer...@thomersonfamily.com wrote:
 I recently deployed Solr to back the site search feature of a site I work
 on. The site itself is available in hundreds of languages. With the initial
 release of site search we have enabled the feature for ten of those
 languages. This is distributed across eight cores, with two Chinese
 languages plus Korean combined into one CJK core and each of the other
 seven languages in their own individual cores. The reason for splitting
 these into separate cores was so that we could have the same field names
 across all cores but have different configuration for analyzers, etc, per
 core.

 Now I have some questions on this approach.

 1) Scalability: Considering I need to scale this to many dozens more
 languages, perhaps hundreds more, is there a better way so that I don't end
 up needing dozens or hundreds of cores? My initial plan was that many
 languages that didn't have special support within Solr would simply get
 lumped into a single default core that has some default analyzers that
 are applicable to the majority of languages.

 1b) Related to this: is there a practical limit to the number of cores that
 can be run on one instance of Lucene?

 2) Auto Suggest: In phase two I intend to add auto-suggestions as a user
 types a query. In reviewing how this is implemented and how the suggestion
 dictionary is built I have concerns. If I have more than one language in a
 single core (and I keep the same field name for suggestions on all
 languages within a core) then it seems that I could get suggestions from
 another language returned with a suggest query. Is there a way to build a
 separate dictionary for each language, but keep these languages within the
 same core?

 If it's helpful to know: I have a field in every core for Locale. Values
 will be the locale of the language of that document, i.e. en, es,
 zh_hans, etc. I'd like to be able to: 1) when building a suggestion
 dictionary, divide it into multiple dictionaries, grouping them by locale,
 and 2) supply a parameter to the suggest query that allows the suggest
 component to only return suggestions from the appropriate dictionary for
 that locale.

 If the answer to #1 is keep splitting groups of languages that have
 different analyzers into their own cores and the answer to #2 is that's
 not supported, then I'd be curious: where would I start to write my own
 extension that supported #2? I looked last night at the suggest lookup
 classes, dictionary classes, etc. But I didn't see a clear point where it
 would be clean to implement something like I'm suggesting above.

 Best Regards,
 Jeremy Thomerson


Re: Can the solr dataimporthandler consume an atom feed?

2014-03-24 Thread Gora Mohanty
On 25 March 2014 01:15, eShard zim...@yahoo.com wrote:
 I confirmed the xpath is correct with a third party XPath visualizer.
 /atom:feed/atom:entry parses the xml correctly.

 Can anyone confirm or deny that the dataimporthandler can handle an atom
 feed?

Yes, an ATOM feed can be consumed by DIH, as noted in the documentation.
We have done this in the past, and a Google search turns up examples, e.g.,
http://blog.florian-hopf.de/2012/05/importing-atom-feeds-in-solr-using-data.html

Have not dealt with namespaces, but here is a line from the documentation
that's probably relevant to your ATOM feed:
It does not support namespaces, but it can handle xmls with namespaces.
When you provide the xpath, just drop the namespace and give the rest
(eg if the tag is 'dc:subject' the mapping should just contain 'subject').
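
Applied to the xpath mentioned earlier in this thread, that would mean (untested, just going by the documentation quote above) dropping the prefixes, along the lines of:

forEach=/feed/entry
xpath=/feed/entry/link/@href

rather than /atom:feed/atom:entry and /atom:feed/atom:entry/atom:link/@href.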

Other than that, I still see nothing wrong with your DIH data configuration. The
message from the dataimport shows that it did make a request to the
URLDataSource. If things still do not work:
* Can you double-check that the specified URL in the url attribute of the
   entity does indeed retrieve the desired XML.
* I am pretty sure that you have checked this, but are your fields properly
  defined in the Solr schema?

Regards,
Gora


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Salman Akram
Basically we just created this syntax for the ease of users; on the
back end it uses the W or N operators.
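
A minimal sketch of that front-end rewrite (not our actual code, just to illustrate the idea):

// Rewrites the user-facing w/N and n/N operators into the surround parser's
// infix form before the query string is handed to Solr.
String userQuery = "patent w/10 infringement";
String surround = userQuery
        .replaceAll("(?i)\\bw/(\\d+)\\b", "$1W")
        .replaceAll("(?i)\\bn/(\\d+)\\b", "$1N");
// surround is now "patent 10W infringement", which can be sent as
// q={!surround}patent 10W infringement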


On Tue, Mar 25, 2014 at 4:21 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 There is no w/int syntax in surround.
 /* Query language operators: OR, AND, NOT, W, N, (, ), ^, *, ?,  and
 comma */

 Ahmet



 On Monday, March 24, 2014 9:46 PM, T. Kuro Kurosaka k...@healthline.com
 wrote:
 On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,
 
  Guessing it's surround query parser's support for within backed by span
  queries.
 
  Otis

 You mean this?
 http://wiki.apache.org/solr/SurroundQueryParser

 I guess this parser needs improvement in documentation area.
 It doesn't explain or have an example of the w/int syntax at all.
 (Is this the infix notation of W?)
 An example would help explaining difference between W and N;
 some readers may not understand what ordered and unordered
 in this context mean.


 Kuro




-- 
Regards,

Salman Akram


Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Roman Chyla
Perhaps useful: here is an open source implementation with near[digit]
support, including analysis of proximity tokens. When the days become longer maybe
it will be packaged into a nice lib... :-)

https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/grammars/ADS.g
On 25 Mar 2014 00:14, Salman Akram salman.ak...@northbaysolutions.net
wrote:

 Basically we just created this syntax for the ease of users, otherwise on
 back end it uses W or N operators.


 On Tue, Mar 25, 2014 at 4:21 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  There is no w/int syntax in surround.
  /* Query language operators: OR, AND, NOT, W, N, (, ), ^, *, ?,  and
  comma */
 
  Ahmet
 
 
 
  On Monday, March 24, 2014 9:46 PM, T. Kuro Kurosaka k...@healthline.com
 
  wrote:
  On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi,
  
   Guessing it's surround query parser's support for within backed by
 span
   queries.
  
   Otis
 
  You mean this?
  http://wiki.apache.org/solr/SurroundQueryParser
 
  I guess this parser needs improvement in documentation area.
  It doesn't explain or have an example of the w/int syntax at all.
  (Is this the infix notation of W?)
  An example would help explaining difference between W and N;
  some readers may not understand what ordered and unordered
  in this context mean.
 
 
  Kuro
 



 --
 Regards,

 Salman Akram



Re: solr cloud distributed optimize() becomes serialized

2014-03-24 Thread Shalin Shekhar Mangar
Found it - https://issues.apache.org/jira/browse/LUCENE-5481

On Fri, Mar 21, 2014 at 8:11 PM, Mark Miller markrmil...@gmail.com wrote:
 Recently fixed in Lucene - should be able to find the issue if you dig a 
 little.
 --
 Mark Miller
 about.me/markrmiller

 On March 21, 2014 at 10:25:56 AM, Greg Walters (greg.walt...@answers.com) 
 wrote:

 I've seen this on 4.6.

 Thanks,
 Greg

 On Mar 20, 2014, at 11:58 PM, Shalin Shekhar Mangar shalinman...@gmail.com 
 wrote:

 That's not right. Which Solr versions are you on (question for both
 William and Chris)?

 On Fri, Mar 21, 2014 at 8:07 AM, William Bell billnb...@gmail.com wrote:
 Yeah. optimize() also used to come back immediately if the index was
 already optimized. It just reopened the index.

 We used to use that for cleaning up the old directories quickly. But now it
 does another optimize() even though the index is already optimized.

 Very strange.


 On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote:

 I wonder whether this is a known bug. In previous SOLR cloud versions, 4.4
 or maybe 4.5, an explicit optimize(), without any parameters, it usually
 took 2 minutes for a 32 core cluster.

 However, in 4.6.1, the same call took about 1 hour. Checking the index
 modification time for each core shows 2 minutes gap if sorted.

 We are using a solrj client connecting to zookeeper. I found it is talking
 to a specific solr server A, and that server A is distributing the calls to
 all other solr servers. Here is the thread dump for this server A:

 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
 at

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
 at

 org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226)
 at

 org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195)
 at

 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250)
 at

 org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)




 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Semantic search with python numpy and Solr

2014-03-24 Thread Sohan Kalsariya
I am a beginner with Solr and started playing with it about a month ago.

I am building the search mechanism for http://allevents.in and I want to
implement semantic search with Solr for when someone searches for events on our
website. The back-end is in PHP (solarium client). So can you please
guide me on semantic search with Solr?
I have gone through the article at 
http://java.dzone.com/articles/semantic-search-solr-and 

How should I proceed to implement semantic search with Solr,
specifically for my site (allevents.in)?
Will it give me results like:
if someone searches for music events in new york, it should also give
results like dj night in new york, concerts in new york and other
related results.
Is that possible?
Can anyone here please guide me or suggest some material or an example of
semantic search based on the above article?

-- 
Regards,
*Sohan Kalsariya*