deleteByQuery does not work with SolrCloud
Hi, I am using SolrCloud with Solr 4.4, and I am trying to delete from the index with the SolrJ deleteByQuery API, like this:

CloudSolrServer cloudServer = new CloudSolrServer(myZKhost);
cloudServer.connect();
cloudServer.setDefaultCollection(...);
cloudServer.deleteByQuery("indexname:shardTv_20131010");
cloudServer.commit();

It does not seem to work. I have also done some googling, but unfortunately found no help. Am I missing anything? Thanks. Regards
Re: Test mail
Dear Aleksandr, You should not test the mailing list this way. You could describe an issue you are having with Solr instead, and test the mailing list at the same time. :-) All the Best, Darius 2013/10/23 Aleksandr Elbakyan ramal...@yahoo.com Hello Testing if mail works :)
Re: Question about sharding and overlapping
You can't control that if using the compositeIdRouter, because the routing is dependent on the hash function. What you want is custom sharding, i.e. the ability to control the shard to which updates are routed. You should create a collection using the Collections API with a shards param specifying the names of the shards you want to create. Then, when indexing documents, include a shard=X param to route requests directly to that shard. While querying, you can choose to query the entire collection, or again limit the shards using the same shard parameter. On Wed, Oct 23, 2013 at 4:22 AM, yriveiro yago.rive...@gmail.com wrote: Hi, I created a collection with 12 shards and route.field=month (the month field will have values between 1 and 12). I notice that I have shards with more than one month in them. This could leave some shards empty, and I want the documents of one month in each shard. My question is: how do I configure the sharding method to avoid overlaps? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-sharding-and-overlapping-tp4097111.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
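The routing step described above (picking a shard per document and passing it as the shard=X parameter) can be sketched with a small client-side helper. This is an illustrative sketch only: the shard names month_1 .. month_12 are hypothetical and would have to match the names given in the Collections API shards parameter at creation time.

```java
// Hypothetical client-side router for custom (implicit) sharding:
// choose the target shard name from the document's month field, then
// pass it as the shard=... request parameter when indexing.
class MonthRouter {

    static String shardFor(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        // Shard names are assumed to match those used at collection creation.
        return "month_" + month;
    }

    public static void main(String[] args) {
        // a document from October goes to the shard named month_10
        System.out.println(shardFor(10));
    }
}
```

Querying can then either omit the parameter (searching the whole collection) or pass the same shard name to restrict the search to one month.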
Re: shards.tolerant throwing null pointer exception when spellcheck is on
This is a known bug. See https://issues.apache.org/jira/browse/SOLR-5204 Patches welcome. On Wed, Oct 23, 2013 at 8:27 AM, Shamik Bandopadhyay sham...@gmail.com wrote: Hi, I'm trying to simulate a fault-tolerance test where a shard and its replica(s) go down, leaving the other shard(s) running. To test it, I added <str name="shards.tolerant">true</str> in my request handler under the defaults section. This is to make sure that the condition is added to each query running against this request handler. In my test environment, I have 2 shards with a replica each. I brought down Shard 1 and Replica 1, then fired a query using SolrJ CloudSolrServer, which internally talks to the zookeeper ensemble. In my request handler, the spellcheck option is turned on. Due to this, the servers are throwing a null pointer exception. Here's the stack trace. [2013-10-22 20:24:43,875] INFO 482886 [qtp1783079124-15] - org.apache.solr.core.SolrCore.execute(SolrCore.java:1909) - [collection1] webapp=/solr path=/testhtmlhelp params={spellcheck=on&q=xref&wt=xml&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english} hits=157 status=500 QTime=70 [2013-10-22 20:24:43,876] ERROR 482887 [qtp1783079124-15] - org.apache.solr.common.SolrException.log(SolrException.java:119) - null:java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at 
java.lang.Thread.run(Thread.java:619) Here's the query detail from the server log; as you can see, spellcheck is on. [collection1] webapp=/solr path=/testhtmlhelp params={facet=on&f.TestCategory.facet.limit=160&tie=0.01&shards.qt=/testhtmlhelp&fl=id,score&facet.field=Source2&fq=TestProductLine:ADT&fq=TestProductRelease:ADT+2014&fq=language:english&rows=150&defType=edismax&start=0&spellcheck=on&shards.tolerant=true&shard.url=localhost:8984/solr/collection1/|localhost:8983/solr/collection1/&q=xref&isShard=true} hits=157 status=0 QTime=15 Now, if I comment out the spellcheck option in the request handler, the query works as expected, even if the other shard and its
Re: Solr 4.5 router.name issue?
The router.name in the collections node is not used. The router specified in clusterstate.json is the one which is actually used. This is a known issue and will be fixed in Solr 4.6. See http://issues.apache.org/jira/browse/SOLR-5319 On Wed, Oct 23, 2013 at 4:10 AM, yriveiro yago.rive...@gmail.com wrote: Hi, I created a collection with this command (Solr 4.5): http://localhost:8983/solr/admin/collections?action=CREATE&name=testDocValues&collection.configName=page-statistics&numShards=12&maxShardsPerNode=12&router.field=month The documentation says that the default router.name is compositeId. clusterstate.json has compositeId written for the testDocValues collection, but the ZooKeeper node /collections/testDocValues says: {"configName":"page-statistics","router":{"name":"implicit"}} Is this correct, or is it some kind of issue? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-5-router-name-issue-tp4097110.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: Solr Cloud Distributed IDF
On Wed, 2013-10-23 at 04:26 +0200, dboychuck wrote: I recently moved an index from 3.6 non-distributed to Solr Cloud 4.4 with three shards. My company uses a boosting function with a value assigned to each document. This boosting function no longer works dependably and I believe the cause is that IDF is not distributed. This seems like it should be a high priority for Solr Cloud. It has been relevant for several years, well before SolrCloud. We run a mixed environment (Lucene/Solr/external index) and hacked a kinda-sorta distributed IDF together by boosting the search terms, but it is a poor man's solution. Distributed IDF for Solr is a very old JIRA issue, dating back to 2009: https://issues.apache.org/jira/browse/SOLR-1632 Activity has been on/off and I can see that it was last updated in June, but I have no idea of how close it is to completion. If you want anything out-of-the-box at this time, you'll have to look at Elasticsearch, which has this feature. - Toke Eskildsen, State and University Library, Denmark
Re: Solr Cloud Distributed IDF
Can you say more about the problem? What did you see that led to that problem? How did you distribute docs between shards, and how is that different from your 3.6 setup? It might be a distributed IDF thing, or it could be something simpler. Upayavira On Wed, Oct 23, 2013, at 03:26 AM, dboychuck wrote: I recently moved an index from 3.6 non-distributed to Solr Cloud 4.4 with three shards. My company uses a boosting function with a value assigned to each document. This boosting function no longer works dependably and I believe the cause is that IDF is not distributed. This seems like it should be a high priority for Solr Cloud. Does anybody know the status of this feature? I understand that the elevate component does work for Solr Cloud in version 4.5 but unfortunately it would be a pretty big leap for how we are currently using our index and our boosting function for relevancy scoring. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Distributed-IDF-tp4097127.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH - URLDataSource import size
Anyone? On Tue, Oct 22, 2013 at 9:50 PM, Raheel Hasan raheelhasan@gmail.com wrote: Hi, I have an issue that only occurs in the live environment. DIH with URLDataSource is not working when the imported file is large (i.e. above 100kb, which is not so large). If it's large, it returns nothing (as seen in the Debug section of DataImport in the Solr Admin). However, in the local environment, this issue doesn't occur at all. (Note that I am using URLDataSource with PlainTextEntityProcessor in the entity field.) Please help me; I have tried a lot to get this done, but can't! Thanks a lot. -- Regards, Raheel Hasan -- Regards, Raheel Hasan
Re: Question about sharding and overlapping
Can I split shards as with compositeId using this method? On Wednesday, October 23, 2013, Shalin Shekhar Mangar wrote: You can't control that if using the compositeIdRouter, because the routing is dependent on the hash function. What you want is custom sharding, i.e. the ability to control the shard to which updates are routed. You should create a collection using the Collections API with a shards param specifying the names of the shards you want to create. Then, when indexing documents, include a shard=X param to route requests directly to that shard. While querying, you can choose to query the entire collection, or again limit the shards using the same shard parameter. On Wed, Oct 23, 2013 at 4:22 AM, yriveiro yago.rive...@gmail.com wrote: Hi, I created a collection with 12 shards and route.field=month (the month field will have values between 1 and 12). I notice that I have shards with more than one month in them. This could leave some shards empty, and I want the documents of one month in each shard. My question is: how do I configure the sharding method to avoid overlaps? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-sharding-and-overlapping-tp4097111.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- /Yago Riveiro
Re: Class name of parsing the fq clause
Thanks Jack for detailing the parser logic. Would it be possible for you to say something more about the filter cache code flow? Sometimes we do not use the fq parameter in the query string and pass the raw query. Regards Sandeep On Mon, Oct 21, 2013 at 7:11 PM, Jack Krupansky j...@basetechnology.com wrote: Start with org.apache.solr.handler.component.QueryComponent#prepare, which fetches the fq parameters and indirectly invokes the query parser(s):

String[] fqs = req.getParams().getParams(CommonParams.FQ);
if (fqs != null && fqs.length != 0) {
  List<Query> filters = rb.getFilters();
  // if filters already exist, make a copy instead of modifying the original
  filters = filters == null ? new ArrayList<Query>(fqs.length) : new ArrayList<Query>(filters);
  for (String fq : fqs) {
    if (fq != null && fq.trim().length() != 0) {
      QParser fqp = QParser.getParser(fq, null, req);
      filters.add(fqp.getQuery());
    }
  }
  // only set the filters if they are not empty, otherwise
  // fq=&someotherParam= will trigger an all-docs filter for every request
  // if the filter cache is disabled
  if (!filters.isEmpty()) {
    rb.setFilters(filters);

Note that this line actually invokes the parser: filters.add(fqp.getQuery()); Then in org.apache.solr.search.QParser#getParser:

QParserPlugin qplug = req.getCore().getQueryPlugin(parserName);
QParser parser = qplug.createParser(qstr, localParams, req.getParams(), req);

And for the common case of the Lucene query parser, org.apache.solr.search.LuceneQParserPlugin#createParser:

public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
  return new LuceneQParser(qstr, localParams, params, req);
}

And then in org.apache.solr.search.QParser#getQuery:

public Query getQuery() throws SyntaxError {
  if (query == null) {
    query = parse();

And then in org.apache.solr.search.LuceneQParser#parse:

lparser = new SolrQueryParser(this, defaultField);
lparser.setDefaultOperator(QueryParsing.getQueryParserDefaultOperator(getReq().getSchema(), getParam(QueryParsing.OP)));
return lparser.parse(qstr);

And then in org.apache.solr.parser.SolrQueryParserBase#parse:

Query res = TopLevelQuery(null); // pass null so we can tell later if an explicit field was provided or not

And then in org.apache.solr.parser.QueryParser#TopLevelQuery, the parsing begins. org.apache.solr.parser.QueryParser.jj is the grammar for a basic Solr/Lucene query, org.apache.solr.parser.QueryParser.java is generated from it by JavaCC, and a lot of the logic is in the base class of the generated class, org.apache.solr.parser.SolrQueryParserBase.java. Good luck! Happy hunting! -- Jack Krupansky -Original Message- From: YouPeng Yang Sent: Monday, October 21, 2013 2:57 AM To: solr-user@lucene.apache.org Subject: Class name of parsing the fq clause Hi, I search Solr with an fq clause like: fq=BEGINTIME:[2013-08-25T16:00:00Z TO *] AND BUSID:(M3 OR M9) I am curious about the parsing process and want to study it. Which Java file describes the parsing process of the fq clause? Thanks Regards.
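The fq handling Jack quotes can be mimicked in isolation. The sketch below is not Solr code: plain strings stand in for parsed Query objects, and parse() is a stand-in for QParser.getQuery(); it only reproduces the skip-blank-values and only-install-if-non-empty behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Toy re-creation of the fq handling in QueryComponent#prepare.
// Strings stand in for org.apache.lucene.search.Query objects.
class FqPrepareSketch {

    static String parse(String fq) {          // stand-in for QParser.getQuery()
        return "parsed(" + fq + ")";
    }

    static List<String> buildFilters(String[] fqs) {
        List<String> filters = new ArrayList<>();
        if (fqs != null && fqs.length != 0) {
            for (String fq : fqs) {
                if (fq != null && fq.trim().length() != 0) {  // skip blank fq=
                    filters.add(parse(fq));
                }
            }
        }
        // a caller would only install a non-empty list, mirroring rb.setFilters(filters)
        return filters;
    }

    public static void main(String[] args) {
        String[] fqs = {"BEGINTIME:[2013-08-25T16:00:00Z TO *]", "  ", null};
        List<String> filters = buildFilters(fqs);
        System.out.println(filters.size());
        System.out.println(filters.get(0));
    }
}
```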
Re: Adding documents in Solr plugin
I've tried to write the plugin code. Currently I do:

AddUpdateCommand addUpdateCommand = new AddUpdateCommand(solrQueryRequest);
DocIterator iterator = docList.iterator();
SolrIndexSearcher indexReader = solrQueryRequest.getSearcher();
while (iterator.hasNext()) {
  Document document = indexReader.doc(iterator.nextDoc());
  SolrInputDocument solrInputDocument = new SolrInputDocument();
  addUpdateCommand.clear();
  addUpdateCommand.solrDoc = solrInputDocument;
  addUpdateCommand.solrDoc.setField("id", document.get("id"));
  addUpdateCommand.solrDoc.setField("my_updated_field", new_value);
  updateRequestProcessor.processAdd(addUpdateCommand);
}

But this is very expensive, since the update handler will again fetch the document which I already hold at hand. Is there a safe way to update the Lucene document and write it back while taking into account all the Solr-related code such as caches, extra Solr logic, etc.? I was thinking of converting it to a SolrInputDocument and then just adding the document through Solr, but I would first need to convert all fields. Thanks in advance, Avner -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-documents-in-Solr-plugin-tp4071574p4097168.html Sent from the Solr - User mailing list archive at Nabble.com.
fq with { or } in Solr 4.3.1
Hi, If I do a search like /search?q=catid:{123} I get the results I expect. But if I do /search?q=*:*&fq=catid{123} I get an error from Solr like: org.apache.solr.search.SyntaxError: Cannot parse 'catid:{123}': Encountered } } at line 1, column 58. Was expecting one of: TO ... RANGE_QUOTED ... RANGE_GOOP ... Can I not use { or } in an fq? Thanks, Peter
Re: fq with { or } in Solr 4.3.1
Missing a colon before the curly bracket in the fq? On Wed, Oct 23, 2013, at 09:42 AM, Peter Kirk wrote: Hi, If I do a search like /search?q=catid:{123} I get the results I expect. But if I do /search?q=*:*&fq=catid{123} I get an error from Solr like: org.apache.solr.search.SyntaxError: Cannot parse 'catid:{123}': Encountered } } at line 1, column 58. Was expecting one of: TO ... RANGE_QUOTED ... RANGE_GOOP ... Can I not use { or } in an fq? Thanks, Peter
RE: fq with { or } in Solr 4.3.1
Sorry, that was just a typo. /search?q=*:*&fq=catid:{123} gives me the error. I think { and } must be used for ranges in an fq, and that's why I can't use them directly like this. /Peter -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: 23. oktober 2013 10:52 To: solr-user@lucene.apache.org Subject: Re: fq with { or } in Solr 4.3.1 Missing a colon before the curly bracket in the fq? On Wed, Oct 23, 2013, at 09:42 AM, Peter Kirk wrote: Hi, If I do a search like /search?q=catid:{123} I get the results I expect. But if I do /search?q=*:*&fq=catid{123} I get an error from Solr like: org.apache.solr.search.SyntaxError: Cannot parse 'catid:{123}': Encountered } } at line 1, column 58. Was expecting one of: TO ... RANGE_QUOTED ... RANGE_GOOP ... Can I not use { or } in an fq? Thanks, Peter
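Peter's guess is right: { and } are reserved by the query parser for exclusive range queries ({A TO B}), which is exactly what the "Was expecting one of: TO ..." message hints at. To match a literal brace it must be backslash-escaped, e.g. fq=catid:\{123\}. SolrJ ships an escaping helper (ClientUtils.escapeQueryChars); the standalone sketch below is modeled on that idea, not the actual SolrJ source.

```java
// Hypothetical standalone escaper for Lucene/Solr query-syntax characters,
// modeled on SolrJ's ClientUtils.escapeQueryChars (not the real source).
class QueryEscape {

    // Characters that are syntax in the Lucene query language.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/ ";

    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');      // prefix every syntax character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // fq=catid:\{123\} matches the literal text {123}
        System.out.println("catid:" + escapeQueryChars("{123}"));
    }
}
```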
RE: Facet performance
On Tue, 2013-10-22 at 17:25 +0200, Lemke, Michael SZ/HZA-ZSW wrote: On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote: This is with Solr 1.4. Really? This sounds really outdated to me. Have you tried a more recent version? 4.5 just came out. Sorry, can't. Too much `grown' stuff. I did not see that; I guess I parsed it as 4.1. Well, that rules out DocValues and fcs (as far as I remember). I am a bit surprised that the limit on #terms with fc is also in 1.4; I thought it was introduced in a later version. We too have been in a position where upgrading was hard due to homegrown addons. We even scrapped some DidYouMean-like functionality when going from 3.x to 4.x, but 4.x was so much better that there was little choice. Last suggestion for using fc: Create 2 or more CONTENT fields and choose between them randomly when indexing. Facet on all the CONTENT fields and merge the results. It will take a bit more RAM though, so it is still out on your (assumedly) 32-bit machine. Regards, Toke Eskildsen, State and University Library, Denmark
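Toke's last suggestion (spreading CONTENT over several fields and faceting on all of them) needs a client-side merge of the per-field facet counts. A minimal sketch of that merge step; the field split and the counts below are made up for illustration, assuming the per-field term counts have already been read from the facet response.

```java
import java.util.*;

class FacetMerge {

    // Merge per-field facet counts (term -> count) into one total,
    // as if CONTENT had been a single field.
    static Map<String, Long> merge(List<Map<String, Long>> perField) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> counts : perField) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical counts from two split fields, CONTENT_0 and CONTENT_1.
        Map<String, Long> content0 = new HashMap<>();
        content0.put("solr", 10L);
        content0.put("lucene", 4L);
        Map<String, Long> content1 = new HashMap<>();
        content1.put("solr", 7L);
        content1.put("facet", 2L);
        Map<String, Long> total = merge(Arrays.asList(content0, content1));
        System.out.println(total.get("solr") + " " + total.get("lucene") + " " + total.get("facet"));
    }
}
```

Note that after merging, the top-N cutoff has to be re-applied client-side, since a term near the limit in one field may rank higher in the combined counts.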
Issue with large html indexing
Hi, I have an issue while indexing large HTML. Here is the configuration: 1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH) 2) The schema has this for the field: type=text_en_splitting indexed=true stored=false required=false 3) text_en_splitting does the following work for indexing: HTMLStripCharFilterFactory, WhitespaceTokenizerFactory (create tokens), StopFilterFactory, WordDelimiterFilterFactory, ICUFoldingFilterFactory, PorterStemFilterFactory, RemoveDuplicatesTokenFilterFactory, LengthFilterFactory. However, the indexed data contains strange numbers (as in the attached image). So what are these numbers? If I put in small HTML it works fine, but as the size of the HTML file increases, this is what happens. -- Regards, Raheel Hasan
RE: fq with { or } in Solr 4.3.1
For filtering categories i'm using something like this : fq=category:(cat1 OR cat2 OR cat3) - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/fq-with-or-in-Solr-4-3-1-tp4097170p4097183.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stop/Restart Solr
Kill -9 didn't kill it... the process is now listed again, but with PPID=1, which I don't want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guys On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process; that will get you the process ID, specifically the first number on the output line. Then do 'kill ###', and if that fails, 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help: Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking whether Solr has a way to restart itself, I think the answer is no. If you lost control of the remote machine, someone will need to go and restart the machine ... You can try using a KVM or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? 
I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan
Re: Stop/Restart Solr
Also, is this -DSTOP.PORT the same as the port on which Solr is visible in a browser (i.e. 8983, from http://localhost:8983)? On Wed, Oct 23, 2013 at 2:49 PM, Raheel Hasan raheelhasan@gmail.com wrote: Kill -9 didn't kill it... the process is now listed again, but with PPID=1, which I don't want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guys On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process; that will get you the process ID, specifically the first number on the output line. Then do 'kill ###', and if that fails, 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help: Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking whether Solr has a way to restart itself, I think the answer is no. If you lost control of the remote machine, someone will need to go and restart the machine ... You can try using a KVM or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. 
François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan -- Regards, Raheel Hasan
RE: Stop/Restart Solr
Can you please share output of following command? ps -ef | grep 'start.jar' - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:19:46 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr Kill -9 didnt kill it... ... the process is now again listed, but with PPID=1 which I dont want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guyz On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process, that will get you the process ID specifically the first number on the output line, then do 'kill ###', if that fails 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help, Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking about if solr has a way to restart himself, I think that the answer is no. If you lost control of the remote machine someone will need to go and restart the machine ... 
You can try use a kvm or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan
Minor bug with CloudSolrServer and collection-alias.
I found this bug in both 4.4 and 4.5. Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than 1 collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to. You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two cores with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153:

// search with new cloud client
CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());

No unit-test bug here. However, if you change it to set the collection id not on the query but on the CloudSolrServer instead, it will produce the bug:

// search with new cloud client
CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setDefaultCollection("testalias");
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
//query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound()); // -- Assertion failure

Should I create a Jira issue for this? From, Thomas Egense
Re: Stop/Restart Solr
31173 1 0 16:45 ?00:00:08 java -jar start.jar On Wed, Oct 23, 2013 at 2:53 PM, Jeevanandam M. je...@myjeeva.com wrote: Can you please share output of following command? ps -ef | grep 'start.jar' - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:19:46 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr Kill -9 didnt kill it... ... the process is now again listed, but with PPID=1 which I dont want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guyz On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process, that will get you the process ID specifically the first number on the output line, then do 'kill ###', if that fails 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help, Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking about if solr has a way to restart himself, I think that the answer is no. If you lost control of the remote machine someone will need to go and restart the machine ... 
You can try use a kvm or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan -- Regards, Raheel Hasan
RE: Stop/Restart Solr
It seems process started recently. Is there any external cron/process triggering a startup of Solr? Kill again and monitor it. - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:29:47 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr 31173 1 0 16:45 ?00:00:08 java -jar start.jar On Wed, Oct 23, 2013 at 2:53 PM, Jeevanandam M. je...@myjeeva.com wrote: Can you please share output of following command? ps -ef | grep 'start.jar' - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:19:46 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr Kill -9 didnt kill it... ... the process is now again listed, but with PPID=1 which I dont want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guyz On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process, that will get you the process ID specifically the first number on the output line, then do 'kill ###', if that fails 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help, Windows/Linux/...? Jetty/Tomcat/...? 
François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking whether Solr has a way to restart itself, I think the answer is no. If you lost control of the remote machine, someone will need to go and restart the machine ... You can try using a KVM or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on Linux/Unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart Java? I lost control over it via SSH and the connection was closed. But Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan -- Regards, Raheel Hasan
Why does the analyzer only output part of my string?
Hi All, I have configured a custom (Chinese) analyzer in Solr 4.5.0. When I access http://localhost:8983/solr/#/collection1/analysis, choose my fieldType, and input a character string, why is only part of the string analyzed? The last part of the string is dismissed. Is there any length limitation for the analyzer? How do I configure this? Thanks, -Judy
Re: Question about sharding and overlapping
No, shard splitting does not support collections with implicit router. On Wed, Oct 23, 2013 at 1:21 PM, Yago Riveiro yago.rive...@gmail.comwrote: Can I split shards as with compositeId using this method? On Wednesday, October 23, 2013, Shalin Shekhar Mangar wrote: You can't control that if using the compositeIdRouter because the routing is dependent on the hash function. What you want is custom sharding i.e. the ability to control the shard to which updates are routed. You should create a collection using the Collections API with a shards param specifying the names of the shards you want to create. Then when indexing documents, include a shard=X param to route requests directly to that shard. While querying, you can choose to query the entire collection or again limit the shards using the same shard parameter. On Wed, Oct 23, 2013 at 4:22 AM, yriveiro yago.rive...@gmail.com javascript:; wrote: Hi, I created a collection with 12 shards and route.field=month (month field will have values between 1 .. 12) I notice that I have shards with more that a month into them. This could left empty some shard and I want the documents one month in each shard. My question is, how I configure the sharding method to avoid overlaps? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-sharding-and-overlapping-tp4097111.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- /Yago Riveiro -- Regards, Shalin Shekhar Mangar.
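A sketch of those steps as HTTP calls, following the parameter names used in the reply above. The collection and shard names here are made up, and exact parameter names (in particular router.name) varied across 4.x releases, so check the Collections API documentation for your version:

```text
# create a collection with explicitly named shards (custom sharding)
http://localhost:8983/solr/admin/collections?action=CREATE&name=months&router.name=implicit&shards=m01,m02,m03

# route an update directly to one shard
http://localhost:8983/solr/months/update?shard=m02

# query only a subset of the shards (omit the parameter to query them all)
http://localhost:8983/solr/months/select?q=*:*&shards=m01,m02
```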
Re: DIH - URLDataSource import size
Following up within 15 hours is not going to do any good -- it just increases email traffic for everyone. Please understand that a lot of people here are in different time zones and almost all of them are volunteers answering questions in addition to their day jobs. Are there any exceptions in the logs of your production environment? On Wed, Oct 23, 2013 at 1:10 PM, Raheel Hasan raheelhasan@gmail.com wrote: anyone? On Tue, Oct 22, 2013 at 9:50 PM, Raheel Hasan raheelhasan@gmail.com wrote: Hi, I have an issue that is only coming on the live environment. The DIH with URLDataSource is not working when the file size imported is large (i.e. 100kb and above - which is not so large). If it's large, it returns nothing (as seen in the Debug section of DataImport at Solr Admin). However, when working on the local environment, this issue doesn't come up at all. (Note that I am using URLDataSource with PlainTextEntityProcessor in the entity field.) Please help me as I have tried a lot to get it done, but can't! Thanks a lot. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Shalin Shekhar Mangar.
New shard leaders or existing shard replicas depends on zookeeper?
Hi solr-users,

I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening and how I can correct it.

We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa'. It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa' server has been rebuilt for the new collection.

When I start the ldwa solr nodes with their zookeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName as 'ldwacfg') pointing to the development zookeeper ensemble, all nodes initially become shard leaders and then replicas as I'd expect. But if I change the ldwa solr nodes to point to the zookeeper ensemble also used for the ukdomain collection, all ldwa solr nodes start on the same shard (that is, the first ldwa solr node becomes the shard leader, then every other solr node becomes a replica for this shard). The significant point here is that no other ldwa shards gain leaders (or replicas).

The ukdomain collection uses a zookeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa service the collection.configName of 'ldwacfg' had never been used. So I'm confused why the ldwa service would differ when the only difference is which zookeeper ensemble is used (both zookeeper ensembles are automatically built using version 3.4.5).

If anyone can explain why this is happening and how I can get the ldwa services to start correctly using the non-development zookeeper ensemble, I'd be very grateful! If more information or explanation is needed, just ask.

Thanks, Gil

Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ
Re: Indexing logs files of thousands of GBs
Prerna, The FileListEntityProcessor has a terribly inefficient recursive method, which will be using up all your heap building a list of files. I would suggest writing a client application and traversing your filesystem with the NIO API available in Java 7: Files.walkFileTree() and a FileVisitor. As you walk, post up to the server with SolrJ. Cheers, Chris

On 22 October 2013 18:58, keshari.prerna keshari.pre...@gmail.com wrote: Hello, I am trying to index log files (all text data) stored in the file system. The data can be as big as 1000 GB or more. I am working on Windows. A sample file can be found at https://www.dropbox.com/s/mslwwnme6om38b5/batkid.glnxa64.66441 I tried using FileListEntityProcessor with TikaEntityProcessor, which ended up in a Java heap exception that I couldn't get rid of no matter how much I increased my RAM size. data-config.xml:

<dataConfig>
  <dataSource name="bin" type="FileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="true"
            processor="FileListEntityProcessor" transformer="TemplateTransformer"
            baseDir="//mathworks/devel/bat/A/logs/66048/" fileName=".*\.*"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified" />
      <entity name="file" dataSource="bin" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip"
              transformer="TemplateTransformer" rootEntity="true">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Then I used FileListEntityProcessor with LineEntityProcessor, which never stopped indexing even after 40 hours or so.
data-config.xml:

<dataConfig>
  <dataSource name="bin" type="FileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="true"
            processor="FileListEntityProcessor" transformer="TemplateTransformer"
            baseDir="//mathworks/devel/bat/A/logs/" fileName=".*\.*"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified" />
      <entity name="file" dataSource="bin" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip"
              rootEntity="true">
        <field column="content" name="rawLine"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Is there any way I can use post.jar to index text files recursively? Or any other way which works without a Java heap exception and doesn't take days to index? I am completely stuck here. Any help would be greatly appreciated. Thanks, Prerna -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-logs-files-of-thousands-of-GBs-tp4097073.html Sent from the Solr - User mailing list archive at Nabble.com.
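Chris's suggestion above — walking the tree with NIO instead of FileListEntityProcessor — might look like the sketch below. The SolrJ posting step is left as a comment because it needs the SolrJ client on the classpath; walkLogs() itself only streams through file paths, so it never builds the giant recursive structure that blows the heap.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class LogWalker {

    /** Visit every regular file under root; collecting paths here stands in
     *  for building and batching SolrInputDocuments. */
    static List<Path> walkLogs(Path root) throws IOException {
        final List<Path> files = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                files.add(file);
                // In a real indexer, build a SolrInputDocument from the file here,
                // add it to a batch, and flush the batch to Solr via SolrJ every
                // few thousand documents instead of collecting everything in RAM.
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable files, much like onError="skip" in the DIH config.
                return FileVisitResult.CONTINUE;
            }
        });
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(walkLogs(root).size() + " files found under " + root);
    }
}
```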
Re: fq with { or } in Solr 4.3.1
Are you using the edismax query parser? It traps the syntax error and then escapes or ignores special characters. Curly braces are used for exclusive range queries (square brackets are inclusive ranges). The proper syntax is {term1 TO term2}. So, what were your intentions with catid:{123}? If you are simply trying to pass the braces as literal characters for a string field, either escape them with a backslash or enclose the entire term in quotes: catid:\{123\} or catid:"{123}" -- Jack Krupansky -Original Message- From: Peter Kirk Sent: Wednesday, October 23, 2013 4:57 AM To: solr-user@lucene.apache.org Subject: RE: fq with { or } in Solr 4.3.1 Sorry, that was just a typo. /search?q=*:*&fq=catid:{123} gives me the error. I think that { and } must be used in ranges for fq, and that's why I can't use them directly like this. /Peter -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: 23. oktober 2013 10:52 To: solr-user@lucene.apache.org Subject: Re: fq with { or } in Solr 4.3.1 Missing a colon before the curly bracket in the fq? On Wed, Oct 23, 2013, at 09:42 AM, Peter Kirk wrote: Hi. If I do a search like /search?q=catid:{123} I get the results I expect. But if I do /search?q=*:*&fq=catid{123} I get an error from Solr like: org.apache.solr.search.SyntaxError: Cannot parse 'catid:{123}': Encountered } } at line 1, column 58. Was expecting one of: TO ... RANGE_QUOTED ... RANGE_GOOP ... Can I not use { or } in an fq? Thanks, Peter
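When the braces are literal data rather than range syntax, they need escaping before the term reaches the query parser. SolrJ ships ClientUtils.escapeQueryChars for exactly this; a self-contained sketch of the same idea (the character list mirrors what that helper escapes):

```java
public class QueryEscape {

    /** Backslash-escape the Lucene/Solr query-syntax characters in a literal term. */
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // These characters are part of the query syntax and must be escaped
            // when they should be treated as literal text.
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')'
                    || c == ':' || c == '^' || c == '[' || c == ']' || c == '\"'
                    || c == '{' || c == '}' || c == '~' || c == '*' || c == '?'
                    || c == '|' || c == '&' || c == ';' || c == '/'
                    || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // fq=catid:{123} fails because {...} means an exclusive range;
        // escaping turns the braces into literal characters.
        System.out.println("catid:" + escapeQueryChars("{123}")); // prints catid:\{123\}
    }
}
```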
Re: Stop/Restart Solr
Did you check whether it is running as a service? If it runs as a service, it may start again even when you kill the process. 2013/10/23 Jeevanandam M. je...@myjeeva.com It seems the process started recently. Is there any external cron/process triggering a startup of Solr? Kill again and monitor it. - Jeeva
Re: Class name of parsing the fq clause
Not in just a few words. Do you have specific questions? I mean, none of that relates to parsing of fq, the topic of this particular email thread, right? -- Jack Krupansky -Original Message- From: Sandeep Gupta Sent: Wednesday, October 23, 2013 3:58 AM To: solr-user@lucene.apache.org Subject: Re: Class name of parsing the fq clause Thanks Jack for detailing out the parser logic. Would it be possible for you to say something more about the filter cache code flow... sometimes we do not use the fq parameter in the query string and pass the raw query. Regards Sandeep On Mon, Oct 21, 2013 at 7:11 PM, Jack Krupansky j...@basetechnology.com wrote: Start with org.apache.solr.handler.component.QueryComponent#prepare which fetches the fq parameters and indirectly invokes the query parser(s):

String[] fqs = req.getParams().getParams(CommonParams.FQ);
if (fqs!=null && fqs.length!=0) {
  List<Query> filters = rb.getFilters();
  // if filters already exists, make a copy instead of modifying the original
  filters = filters == null ? new ArrayList<Query>(fqs.length) : new ArrayList<Query>(filters);
  for (String fq : fqs) {
    if (fq != null && fq.trim().length()!=0) {
      QParser fqp = QParser.getParser(fq, null, req);
      filters.add(fqp.getQuery());
    }
  }
  // only set the filters if they are not empty otherwise
  // fq=&someotherParam= will trigger all docs filter for every request
  // if filter cache is disabled
  if (!filters.isEmpty()) {
    rb.setFilters( filters );

Note that this line actually invokes the parser: filters.add(fqp.getQuery()); Then in QParser#getParser:

QParserPlugin qplug = req.getCore().getQueryPlugin(parserName);
QParser parser = qplug.createParser(qstr, localParams, req.getParams(), req);

And for the common case of the Lucene query parser, org.apache.solr.search.LuceneQParserPlugin#createParser:

public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
  return new LuceneQParser(qstr, localParams, params, req);
}

And then in QParser#getQuery:

public Query getQuery() throws SyntaxError {
  if (query==null) {
    query=parse();

And then in LuceneQParser#parse:

lparser = new SolrQueryParser(this, defaultField);
lparser.setDefaultOperator(QueryParsing.getQueryParserDefaultOperator(getReq().getSchema(), getParam(QueryParsing.OP)));
return lparser.parse(qstr);

And then in org.apache.solr.parser.SolrQueryParserBase#parse:

Query res = TopLevelQuery(null); // pass null so we can tell later if an explicit field was provided or not

And then in org.apache.solr.parser.QueryParser#TopLevelQuery, the parsing begins. And org.apache.solr.parser.QueryParser.jj is the grammar for a basic Solr/Lucene query, and org.apache.solr.parser.QueryParser.java is generated from it, and a lot of the logic is in the base class of the generated class, org.apache.solr.parser.SolrQueryParserBase.java. Good luck! Happy hunting! -- Jack Krupansky -Original Message- From: YouPeng Yang Sent: Monday, October 21, 2013 2:57 AM To: solr-user@lucene.apache.org Subject: Class name of parsing the fq clause Hi, I search Solr with an fq clause like: fq=BEGINTIME:[2013-08-25T16:00:00Z TO *] AND BUSID:(M3 OR M9) I am curious about the parsing process and want to study it. Which Java file describes the parsing of the fq clause? Thanks, Regards.
Re: Chinese language search in SOLR 3.6.1
Hi Rajani, The string field type is not analyzed. But that is not the case for the text_chinese field type, for which ChineseTokenizerFactory and ChineseFilterFactory are added for index and query analysis. Below, check the schema and how the fields are defined in my mail above. Thanks, Poornima

On Wednesday, 23 October 2013 7:21 AM, Rajani Maski rajinima...@gmail.com wrote: A string field will work for any case when you do an exact key search. text_chinese also should work if you are simply searching with the exact string 676767667. Well, the best way to find an answer to this query is by using the Solr analysis tool: http://localhost:8983/solr/#/collection1/analysis Enter your field type and the index-time input that you had given, with the query value that you are searching for. You should be able to find your answers.

On Tue, Oct 22, 2013 at 8:06 PM, Poornima Jay poornima...@rocketmail.com wrote: Hi Rajani, Below is what is configured in my schema.

<fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.ChineseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.ChineseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="product_code" type="string" indexed="true" stored="false" multiValued="true" />
<field name="author_name" type="text_chinese" indexed="true" stored="false" multiValued="true"/>
<field name="author_name_string" type="string" indexed="true" stored="false" multiValued="true" />
<field name="simple" type="text_chinese" indexed="true" stored="false" multiValued="true" />

<copyField source="product_code" dest="simple" />
<copyField source="author_name" dest="author_name_string" />

If I search with the
query q=simple:总评价 it works, but it doesn't work if I search with q=simple:676767667. If the field is defined as string, the Chinese characters work, but it doesn't work if the field is defined as text_chinese. Regards, Poornima

On Tuesday, 22 October 2013 7:52 PM, Rajani Maski rajinima...@gmail.com wrote: Hi Poornima, Your statement "It works fine with the Chinese strings but not working with product code or ISBN even though the fields are defined as string" is confusing. Did you mean that the product code and ISBN fields are of type text_chinese? Is it the first or the second:

<field name="product_code" type="string" indexed="true" stored="false"/>

or

<field name="product_code" type="text_chinese" indexed="true" stored="false"/>

What do you refer to when you say that it's not working? Unable to search?

On Tue, Oct 22, 2013 at 6:09 PM, Poornima Jay poornima...@rocketmail.com wrote: Hi, Did anyone face a problem with the Chinese language in Solr 3.6.1? Below is the analyzer in the schema.xml file.

<fieldType name="text_chinese" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ChineseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.ChineseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

It works fine with the Chinese strings but not with product code or ISBN even though the fields are defined as string. Please let me know how the Chinese schema should be configured. Thanks. Poornima
Query cache and group by queries
Hi, It seems that the query cache is not used at all for group queries. Can someone explain why this is?
Having two document sets in one index, separated by filter query.
Hi, I have two document sets, both having the same schema. One set is the larger reference set (let's say a few hundred thousand documents) and the smaller set is some user-generated content (a few hundred or thousand documents). In most cases I just want to search the larger reference set, but some functionality also works on both sets. Is the following assumption correct? I could add a field to the schema which defines what type a document is (large set or small set). Now I configure two search handlers. For one handler I add a filter query which filters on the just-defined type field. If I use this handler in my application, I should only see content from the large set, and it should be impossible to get results from the small set back. cheers, Achim
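Yes, that is the usual approach. A sketch of the two handlers in solrconfig.xml, assuming a made-up "doctype" field with values like "reference" (both the field name and handler names are illustrative); putting the fq under invariants rather than defaults means a client cannot override or drop it:

```xml
<!-- searches only the large reference set; the filter cannot be overridden -->
<requestHandler name="/reference" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="fq">doctype:reference</str>
  </lst>
</requestHandler>

<!-- searches both sets: no doctype filter applied -->
<requestHandler name="/all" class="solr.SearchHandler"/>
```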
Re: Solr cloud weird behaviour
When you say missing files, do you mean the index segments are missing, or what? Are your document counts the same the night before and after? Is there any indexing going on? We need some more specifics if we're to help you. If you do have indexing going on, then you might be getting segment merges. The key is whether documents are disappearing from your index when you search. Best, Erick On Tue, Oct 22, 2013 at 11:11 AM, Andreas Weichhart a.weichh...@gmail.com wrote: Hi guys, I have had a bit of a problem for some time now and can't find any solution, nor have I any idea why this is happening :) I'm running a Solr cloud for my project at work: 3 ZKs, 4 Solrs, and I'm indexing through SolrJ because I'm indexing XML files -- parsing them first, creating Solr documents, and then putting them on the Solr. My problem scenario is as follows: I do a full import; I have everything indexed, committed and optimized -- at least that's what I do in the Java code, and the webapp shows me that every collection is committed/optimized. When I check the next day (I have a DB too where all my documents are referenced), I'm suddenly missing files on the Solr and I have no idea how they got lost :) Any idea how this could happen? cheers andi
Re: SolrCloud performance in VM environment
Be a bit careful here. 128G is lots of memory, you may encounter very long garbage collection pauses. Just be aware that this may be happening later. Best, Erick On Tue, Oct 22, 2013 at 5:04 PM, Tom Mortimer tom.m.f...@gmail.com wrote: Just tried it with no other changes than upping the RAM to 128GB total, and it's flying. I think that proves that RAM is good. =) Will implement suggested changes later, though. cheers, Tom On 22 October 2013 09:04, Tom Mortimer tom.m.f...@gmail.com wrote: Boogie, Shawn, Thanks for the replies. I'm going to try out some of your suggestions today. Although, without more RAM I'm not that optimistic.. Tom On 21 October 2013 18:40, Shawn Heisey s...@elyograg.org wrote: On 10/21/2013 9:48 AM, Tom Mortimer wrote: Hi everyone, I've been working on an installation recently which uses SolrCloud to index 45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2 identical VMs set up for replicas). The reason we're using so many shards for a relatively small index is that there are complex filtering requirements at search time, to restrict users to items they are licensed to view. Initial tests demonstrated that multiple shards would be required. The total size of the index is about 140GB, and each VM has 16GB RAM (32GB total) and 4 CPU units. I know this is far under what would normally be recommended for an index of this size, and I'm working on persuading the customer to increase the RAM (basically, telling them it won't work otherwise.) Performance is currently pretty poor and I would expect more RAM to improve things. However, there are a couple of other oddities which concern me, Running multiple shards like you are, where each operating system is handling more than one shard, is only going to perform better if your query volume is low and you have lots of CPU cores. If your query volume is high or you only have 2-4 CPU cores on each VM, you might be better off with fewer shards or not sharded at all. 
The way that I read this is that you've got two physical machines with 32GB RAM, each running two VMs that have 16GB. Each VM houses 4 shards, or 70GB of index. There's a scenario that might be better if all of the following are true: 1) I'm right about how your hardware is provisioned. 2) You or the client owns the hardware. 3) You have an extremely low-end third machine available - a single CPU with 1GB of RAM would probably be enough. In this scenario, you run one Solr instance and one zookeeper instance on each of your two big machines, and use the third wimpy machine as a third zookeeper node. No virtualization. For the rest of my reply, I'm assuming that you haven't taken this step, but it will probably apply either way. The first is that I've been reindexing a fixed set of 500 docs to test indexing and commit performance (with soft commits within 60s). The time taken to complete a hard commit after this is longer than I'd expect, and highly variable - from 10s to 70s. This makes me wonder whether the SAN (which provides all the storage for these VMs and the customer's several other VMs) is being saturated periodically. I grabbed some iostat output on different occasions to (possibly) show the variability:

Device:    tps    Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
sdb      64.50          0.00       2476.00           0        4952
...
sdb       8.90          0.00        348.00           0        6960
...
sdb       1.15          0.00         43.20           0         864

There are two likely possibilities for this. One or both of them might be in play. 1) Because the OS disk cache is small, not much of the index can be cached. This can result in a lot of disk I/O for a commit, slowing things way down. Increasing the size of the OS disk cache is really the only solution for that. 2) Cache autowarming, particularly the filter cache. In the cache statistics, you can see how long each cache took to warm up after the last searcher was opened. The solution for that is to reduce the autowarmCount values.
The other thing that confuses me is that after a Solr restart or hard commit, search times average about 1.2s under light load. After searching the same set of queries for 5-6 iterations this improves to 0.1s. However, in either case - cold or warm - iostat reports no device reads at all:

Device:    tps    Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
sdb       0.40          0.00          8.00           0         160
...
sdb       0.30          0.00         10.40           0         104

(the writes are due to logging). This implies to me that the 'hot' blocks are being completely cached in RAM - so why the variation in search time and the number of iterations required to speed it up?
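The autowarmCount mentioned earlier in the thread lives on the cache definitions in solrconfig.xml; lowering it shortens the warm-up that runs every time a new searcher is opened after a commit. The values below are illustrative, not a recommendation:

```xml
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
```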
Re: External Zookeeper and JBOSS
When you create the collection, you specify the number of shards you want. From there on, the data is stored in ZK; I don't think it shows up in your solr.xml file. Best, Erick On Tue, Oct 22, 2013 at 7:08 PM, Branham, Jeremy [HR] jeremy.d.bran...@sprint.com wrote: [collections] was empty until I used the correct zkcli script from the solr distribution. I uploaded the config - java -classpath .:/production/v8p/deploy/svc.war/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181 -confdir /data/v8p/solr/root/conf -confname defaultconfig Then ran the bootstrap - java -classpath .:/production/v8p/deploy/svc.war/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd bootstrap -zkhost 127.0.0.1:2181 -solrhome /data/v8p/solr If I'm not mistaken, I don't need to link anything if the collection names are defined in the core element [solr.xml]. The cloud admin page shows each core now, but I'm curious how it knows how many shards I want to use... I think I missed that somewhere. Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, October 22, 2013 3:57 AM To: solr-user@lucene.apache.org Subject: Re: External Zookeeper and JBOSS What happens if you look in collections? Best, Erick On Mon, Oct 21, 2013 at 9:55 PM, Shawn Heisey s...@elyograg.org wrote: On 10/21/2013 1:19 PM, Branham, Jeremy [HR] wrote: solr.xml [simplified by removing additional cores] <?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.101:2181">
  <cores adminPath="/admin/cores">
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp" dataDir="/data/v8p/solr/wdsp2/data/" />
    <core schema="/data/v8p/solr/root/schema/schema.xml" instanceDir="/data/v8p/solr/root/" name="wdsp2" dataDir="/data/v8p/solr/wdsp/data/" />
  </cores>
</solr>

These cores that you have listed here do not look like SolrCloud-related cores, because they do not reference a collection or a shard. Here's what I've got on a 4.2.1 box where all cores were automatically created by the CREATE action on the collections API:

<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="eatatjoes_shard1_replica2/" transient="false" name="eatatjoes_shard1_replica2" config="solrconfig.xml" collection="eatatjoes"/>
<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="test3_shard1_replica1/" transient="false" name="test3_shard1_replica1" config="solrconfig.xml" collection="test3"/>
<core schema="schema.xml" loadOnStartup="true" shard="shard1" instanceDir="smb2_shard1_replica1/" transient="false" name="smb2_shard1_replica1" config="solrconfig.xml" collection="smb2"/>

On the commandline script -- the zkCli.sh script comes with zookeeper, but it is not aware of anything having to do with SolrCloud. There is another script named zkcli.sh (note the lowercase C) that comes with the solr example (in example/cloud-scripts) - it's a very different script and will accept the options that you tried to give. I do wonder how much pain would be caused by renaming the Solr zkcli script so it's not so similar to the one that comes with Zookeeper. Thanks, Shawn

This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Changing indexed property on a field from false to true
Being given

<field name="title" type="string" indexed="false" stored="true" multiValued="false" />

changed to

<field name="title" type="string" indexed="true" stored="true" multiValued="false" />

Once the above is done and the collection reloaded, is there a way I can build the index on that field without reindexing everything? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: deleteByQuery does not work with SolrCloud
The first thing I'd do is go into the browser UI and make sure you can get hits on documents, something like blah/collection/select?q=indexname:shardTv_20131010 Best, Erick On Wed, Oct 23, 2013 at 8:20 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi, I am using SolrCloud with Solr 4.4, and I tried the SolrJ deleteByQuery API to delete the index as:

CloudSolrServer cloudServer = new CloudSolrServer(myZKhost);
cloudServer.connect();
cloudServer.setDefaultCollection
cloudServer.deleteByQuery("indexname:shardTv_20131010");
cloudServer.commit();

It seems not to work. I have also done some googling; unfortunately there is no help. Did I miss anything? Thanks, Regards
Multiple facet fields in defaults section of a Request Handler
I define 2 facets - brand and category. Both have been configured in a request handler inside defaults. Now a client wants to use multi-select faceting. He calls the following API: http://localhost:8983/solr/collection1/search?q=*:*&facet.field={!ex=foo}category&fq={!tag=foo}category:cat What happens in DefaultSolrParams#getParams is that it picks up the facet field from the API and discards all the other facet fields defined in defaults. Thus the response does not facet on brand. If I put the facet definitions in invariants, then whatever is provided by the client will be discarded. Putting the facet definitions in appends causes it to facet on category twice. Is there a way where he does not have to provide all the facet.field parameters in the API call? -- Regards, Varun Thacker http://www.vthacker.in/
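For context, the setup Varun describes would look roughly like this in solrconfig.xml (the handler name is an assumption; the facet fields come from the message):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <!-- both facet.field entries live in defaults -->
    <str name="facet.field">brand</str>
    <str name="facet.field">category</str>
  </lst>
</requestHandler>
```

Because defaults are consulted only for parameters entirely absent from the request, a single facet.field on the URL masks both configured entries at once, which is exactly the behavior described.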
RE: SOLR Cloud node link is wrong in the admin panel
It seems the parameters in solr.xml are being ignored. <?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.102:2181" host="localhost" hostPort="8080" hostContext="/svc/solr">
  <cores adminPath="/admin/cores">
    <core schema="schema.xml" instanceDir="root/" name="test" shard="shard1" collection="test" dataDir="/data/v8p/solr/test/data/"/>
  </cores>
</solr>
Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX http://JeremyBranham.Wordpress.com http://www.linkedin.com/in/jeremybranham -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Tuesday, October 22, 2013 3:14 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Cloud node link is wrong in the admin panel On 10/22/2013 2:01 PM, Branham, Jeremy [HR] wrote: I'm thinking I might have a configuration problem... The SOLR Cloud node link is wrong in the admin panel. I am running Solr on port 8080 in JBoss, but the SOLR Cloud admin panel has links to http://192.168.1.123:8983/solr for example. Also the context should be svc instead of solr. Is this a configuration problem, or are there some hardcoded values? You're going to need to define the hostPort value in your solr.xml file. In the example solr.xml, this is set to the following string: ${jetty.port:8983} This means that it will use the java property jetty.port unless that's not defined, in which case it will use 8983. Just remove this from hostPort and put 8080 in there. http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params You might ask why Solr doesn't just figure out what port it is running on and store what it finds in the cloud state. The reason it doesn't do that is because it can't - a java webapp/servlet has no idea what port it's on until it actually receives a request, but it's not going to receive any requests until it's initialized, and by then it's too late to do anything useful with the information ... plus you need to send it a request.
This is one of the prime motivating factors behind the project's decision that Solr will no longer be a war in a future major version. Thanks, Shawn
Is Solr can create temporary sub-index ?
Dear Solr users, We have a new web project: connect our Solr database to a web platform. This web platform will be used by several users at the same time. They run requests against our Solr and can apply filters to the result. i.e.: our Solr contains 87M docs; a user runs a request, and the result is around a few hundred to several thousand docs. On the web platform, the user will see the first 20 results (or more by using a Next Page button), but he will also need to filter the whole result by additional terms (terms that our platform will propose to him). Can Solr create a temporary index (managed by Solr itself during a web session)? My goal is to avoid downloading the whole result to the local computer to provide filtering, and to avoid re-sending the same request several times with new criteria added. Many thanks for your comments, Regards, Bruno
Re: Changing indexed property on a field from false to true
The content needs to be re-indexed; the question is whether you can use the info in the index to do it rather than pushing fresh copies of the documents to the index. I've often wondered whether atomic updates could be used to handle this sort of thing. If all fields are stored, push a nominal update to cause the document to be re-indexed. I've never tried it though. I'd be curious to know if it works. Upayavira On Wed, Oct 23, 2013, at 02:25 PM, michael.boom wrote: Given <field name="title" type="string" indexed="false" stored="true" multiValued="false"/> changed to <field name="title" type="string" indexed="true" stored="true" multiValued="false"/> - once the above is done and the collection reloaded, is there a way I can build the index on that field without reindexing everything? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html Sent from the Solr - User mailing list archive at Nabble.com.
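Upayavira's idea - a nominal atomic update that rebuilds a document from its stored fields - would be sent as an update message along these lines (the id and title values are assumptions for illustration; every field must be stored for this to be lossless):

```xml
<add>
  <doc>
    <field name="id">1</field>
    <!-- update="set" makes this an atomic update: Solr re-reads the
         stored fields, applies the change, and re-indexes the doc -->
    <field name="title" update="set">BigApple</field>
  </doc>
</add>
```

Michael's later reply in this thread reports that exactly this approach worked in a 4.5.0 test.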
Solr document with high number of fields
Hi, I have done some research on Solr documents with a very high number of fields. In the mailing list archive there's a thread on this subject which answers my question: http://lucene.472066.n3.nabble.com/Dynamic-fields-performance-question-td476337.html. That post is a little old, though, and I would like to know whether it still applies to a recent Solr version (4.5)? Thanks
Reclaiming disk space from (large, optimized) segments
*Background:*
- Our use case is to use Solr as a massive FIFO queue.
- Document additions and updates happen continuously, at a sustained rate of 50 - 100 documents per second.
- About 50% of these documents are updates to existing docs, indexed using atomic updates: the original doc is thus deleted and re-added.
- There is a separate purge operation running every four hours that deletes the oldest docs, if required, based on a number of unrelated configuration parameters.
- At some time in the past, a manual force merge / optimize with maxSegments=2 was run to troubleshoot high disk I/O and remove "too many segments" as a potential variable. Currently, the largest fdt files are 74G and 43G. There are 47 total segments; the largest other sizes are all around 2G.
- Merge policies are all at Solr 4 defaults. Index size is currently ~50M maxDocs, ~35M numDocs, 276GB.
*Issue:* The background purge operation is deleting docs on schedule, but the disk space is not being recovered.
*Presumptions:* I presume, but have not confirmed (how?), that the 15M deleted documents are predominantly in the two large segments. Because they are largely in the two large segments, and those large segments still have (some/many) live documents, the segment backing files are not deleted.
*Questions:*
- When will those segments get merged and the space recovered? Does it happen when _all_ the documents in those segments are deleted? When some percentage of the segment is deleted documents?
- Is there a way to do it right now vs. just waiting?
- In some cases, the purge delete conditional is _just_ free disk space: when index > free space, delete oldest. Those setups are now in scenarios where index > free space, and getting worse. How does low disk space affect the above two questions?
- Is there a way for me to determine stats on a per-segment basis - for example, how many deleted documents are in a particular segment?
- On the flip side, can I determine in what segment a particular document is located?
Thank you, Scott -- Scott Lundgren Director of Engineering Carbon Black, Inc. (210) 204-0483 | scott.lundg...@carbonblack.com
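One existing knob worth noting here (not an answer given in the thread, and whether it touches two outsized segments depends on the merge policy's size limits, so treat it as a sketch): Solr's XML update syntax supports an explicit commit that asks the merge policy to merge away deletes immediately rather than waiting for natural merges:

```xml
<!-- POSTed to the /update handler; forces merging of segments
     containing deleted documents, without a full optimize -->
<commit expungeDeletes="true"/>
```

This is cheaper than optimize but can still trigger large rewrites, so it is best run during a quiet period.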
Re: Stop/Restart Solr
PPID is the parent process ID. You want to kill the PID, not the PPID. wunder On Oct 23, 2013, at 3:09 AM, Jeevanandam M. wrote: It seems process started recently. Is there any external cron/process triggering a startup of Solr? Kill again and monitor it. - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:29:47 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr 31173 1 0 16:45 ?00:00:08 java -jar start.jar On Wed, Oct 23, 2013 at 2:53 PM, Jeevanandam M. je...@myjeeva.com wrote: Can you please share output of following command? ps -ef | grep 'start.jar' - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:19:46 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr Kill -9 didnt kill it... ... the process is now again listed, but with PPID=1 which I dont want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guyz On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process, that will get you the process ID specifically the first number on the output line, then do 'kill ###', if that fails 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... and using jetty with solr here.. 
On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help, Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking about if solr has a way to restart himself, I think that the answer is no. If you lost control of the remote machine someone will need to go and restart the machine ... You can try use a kvm or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Walter Underwood wun...@wunderwood.org
Stemming and Synonyms in Apache Solr
We have written a blog post with our understanding and experiments on stemming and synonyms in Apache Solr: http://theunstructuredworld.blogspot.in/ We would appreciate it if users read it and post their valuable suggestions/comments. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Stemming-and-Synonyms-in-Apache-Solr-tp4097227.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stop/Restart Solr
ok got it thanks :) On Wed, Oct 23, 2013 at 7:33 PM, Walter Underwood wun...@wunderwood.orgwrote: PPID is the parent process ID. You want to kill the PID, not the PPID. wunder On Oct 23, 2013, at 3:09 AM, Jeevanandam M. wrote: It seems process started recently. Is there any external cron/process triggering a startup of Solr? Kill again and monitor it. - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:29:47 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr 31173 1 0 16:45 ?00:00:08 java -jar start.jar On Wed, Oct 23, 2013 at 2:53 PM, Jeevanandam M. je...@myjeeva.com wrote: Can you please share output of following command? ps -ef | grep 'start.jar' - Jeeva -- Original Message -- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: October 23, 2013 3:19:46 PM GMT+05:30 To: solr-user@lucene.apache.org Subject: Re: Stop/Restart Solr Kill -9 didnt kill it... ... the process is now again listed, but with PPID=1 which I dont want to kill as many processes have this same id... On Tue, Oct 22, 2013 at 11:59 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: We use this to start/stop solr: Start: java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore -Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar Stop: java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3 -DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar --stop Thanks, -Utkarsh On Tue, Oct 22, 2013 at 10:09 AM, Raheel Hasan raheelhasan@gmail.com wrote: ok fantastic... thanks a lot guyz On Tue, Oct 22, 2013 at 10:00 PM, François Schiettecatte fschietteca...@gmail.com wrote: Yago has the right command to search for the process, that will get you the process ID specifically the first number on the output line, then do 'kill ###', if that fails 'kill -9 ###'. François On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote: its CentOS... 
and using jetty with solr here.. On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte fschietteca...@gmail.com wrote: A few more specifics about the environment would help, Windows/Linux/...? Jetty/Tomcat/...? François On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote: If you are asking about if solr has a way to restart himself, I think that the answer is no. If you lost control of the remote machine someone will need to go and restart the machine ... You can try use a kvm or other remote control system -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote: If you are on linux/unix, use the kill command. François On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com (mailto: raheelhasan@gmail.com) wrote: Hi, is there a way to stop/restart java? I lost control over it via SSH and connection was closed. But the Solr (start.jar) is still running. thanks. -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Thanks, -Utkarsh -- Regards, Raheel Hasan -- Regards, Raheel Hasan -- Walter Underwood wun...@wunderwood.org -- Regards, Raheel Hasan
Re: Is Solr can create temporary sub-index ?
Hi Bruno, Have you looked into Solr's facet support? If I'm reading your post correctly, this sounds like the classic case for facets. Each time the user selects a facet, you add a filter query (fq clause) to the original query. http://wiki.apache.org/solr/SolrFacetingOverview Tim On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr User, We have to do a new web project which is : Connect our SOLR database to a web plateform. This Web Plateform will be used by several users at the same time. They do requests on our SOLR and they can apply filter on the result. i.e.: Our SOLR contains 87M docs An user do requests, result is around few hundreds to several thousands. On the Web Plateform, user will see first 20 results (or more by using Next Page button) But he will need also to filter the whole result by additional terms. (Terms that our plateform will propose him) Is SOLR can create temporary index (manage by SOLR himself during a web session) ? My goal is to not download the whole result on local computer to provide filter, or to re-send the same request several times added to the new criterias. Many thanks for your comment, Regards, Bruno
Re: Changing indexed property on a field from false to true
I've made a test based on your suggestion. Using the example in 4.5.0 I set the title field as indexed="false" and indexed a couple of docs:
<add>
  <doc>
    <field name="id">1</field>
    <field name="title" update="set">BigApple</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title" update="set">SmallApple</field>
  </doc>
</add>
and queried fq=title:BigApple. No docs were returned, of course. Then I modified the schema, setting indexed="true" for the title field, and restarted Solr. Following that I posted a document update:
<add>
  <doc>
    <field name="id">1</field>
    <field name="title" update="set">BigApple</field>
  </doc>
</add>
Afterwards I ran the same query fq=title:BigApple and the document was returned. So at first look an atomic update can do the trick, unless I was doing something wrong. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213p4097233.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Having two document sets in one index, separated by filter query.
Sounds correct - you probably want to use an invariant parameter in solrconfig.xml, something along the lines of:
<lst name="invariants">
  <str name="fq">docset:0</str>
</lst>
where docset is the new field you add to the schema to determine which set a document belongs to. You might also consider adding a newSearcher warming query that includes this fq so that the filter gets cached every time you open a new searcher. On Wed, Oct 23, 2013 at 7:09 AM, Achim Domma do...@procoders.net wrote: Hi, I have two document sets, both having the same schema. One set is the larger reference set (let's say a few hundred thousand documents) and the smaller set is some user-generated content (a few hundred or thousand). In most cases, I just want to search the larger reference set, but some functionality also works on both sets. Is the following assumption correct? I could add a field to the schema which defines what type a document is (large set or small set). Now I configure two search handlers. For one handler I add a filter query, which filters on the just-defined type field. If I use this handler in my application, I should only see content from the large set and it should be impossible to get results from the small set back. cheers, Achim
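Putting the two pieces of advice together, a sketch of the reference-set handler plus the warming hook (the handler name, the docset field, and its value are assumptions drawn from the thread, not settled names):

```xml
<!-- handler locked to the large reference set -->
<requestHandler name="/reference" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- invariants cannot be overridden by request parameters -->
    <str name="fq">docset:0</str>
  </lst>
</requestHandler>

<!-- pre-populate the filter cache whenever a new searcher opens -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="fq">docset:0</str></lst>
  </arr>
</listener>
```

The warming query matters because the invariant fq is hit by every request to this handler; a cold filter cache after each commit would otherwise make the first query pay for a full-index bitset build.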
Re: shards.tolerant throwing null pointer exception when spellcheck is on
Thanks for the information. I think it would be good to have this issue fixed, especially for cases where the spellcheck feature is on. I'll check out the source code and take a look; even quickly suppressing the null pointer exception might make a difference. -- View this message in context: http://lucene.472066.n3.nabble.com/shards-tolerant-throwing-null-pointer-exception-when-spellcheck-is-on-tp4097133p4097234.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Minor bug with CloudSolrServer and collection-alias.
On 10/23/2013 3:59 AM, Thomas Egense wrote: Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than 1 collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to. You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two collections with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153: // search with new cloud client CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean()); cloudSolrServer.setParallelUpdates(random().nextBoolean()); query = new SolrQuery("*:*"); query.set("collection", "testalias"); res = cloudSolrServer.query(query); cloudSolrServer.shutdown(); assertEquals(5, res.getResults().getNumFound()); No unit-test bug here; however, if you change it to set the collection id not on the query but on CloudSolrServer instead, it will produce the bug: // search with new cloud client CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean()); cloudSolrServer.setDefaultCollection("testalias"); cloudSolrServer.setParallelUpdates(random().nextBoolean()); query = new SolrQuery("*:*"); //query.set("collection", "testalias"); res = cloudSolrServer.query(query); cloudSolrServer.shutdown(); assertEquals(5, res.getResults().getNumFound()); -- Assertion failure. Should I create a Jira issue for this? Thomas, I have confirmed this with the following test patch, which adds to the test rather than changing what's already there: http://apaste.info/9ke5 I'm about to head off to the train station to start my commute, so I will be unavailable for a little while. If you haven't gotten the jira filed by the time I get to another computer, I will create it. Thanks, Shawn
Re: SOLR Cloud node link is wrong in the admin panel
On 10/23/2013 7:50 AM, Branham, Jeremy [HR] wrote: It seems the parameters in solr.xml are being ignored. <?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib" zkHost="192.168.1.102:2181" host="localhost" hostPort="8080" hostContext="/svc/solr">
  <cores adminPath="/admin/cores">
    <core schema="schema.xml" instanceDir="root/" name="test" shard="shard1" collection="test" dataDir="/data/v8p/solr/test/data/"/>
  </cores>
</solr>
Did you restart Solr (actually your container - jetty, tomcat, etc.) after making that change? You'll need to make the change on all your Solr instances and restart them all. Thanks, Shawn
Re: Is Solr can create temporary sub-index ?
Hello Tim, Yes, Solr's facets could be a solution, but I need to re-send the q= each time. I was just wondering whether another solution exists. Facets seem to be the right solution. Bruno Le 23/10/2013 17:03, Timothy Potter a écrit : Hi Bruno, Have you looked into Solr's facet support? If I'm reading your post correctly, this sounds like the classic case for facets. Each time the user selects a facet, you add a filter query (fq clause) to the original query. http://wiki.apache.org/solr/SolrFacetingOverview Tim On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr User, We have to do a new web project which is : Connect our SOLR database to a web plateform. This Web Plateform will be used by several users at the same time. They do requests on our SOLR and they can apply filter on the result. i.e.: Our SOLR contains 87M docs An user do requests, result is around few hundreds to several thousands. On the Web Plateform, user will see first 20 results (or more by using Next Page button) But he will need also to filter the whole result by additional terms. (Terms that our plateform will propose him) Is SOLR can create temporary index (manage by SOLR himself during a web session) ? My goal is to not download the whole result on local computer to provide filter, or to re-send the same request several times added to the new criterias. Many thanks for your comment, Regards, Bruno
SV: fq with { or } in Solr 4.3.1
Thanks. The data for the catid comes from another system, and is actually a string with a leading { and a trailing }. I was confused that it works in a q parameter but not fq. I think the easiest thing for me is simply to strip the leading and trailing characters when I feed the index. Thanks Fra: Jack Krupansky j...@basetechnology.com Sendt: 23. oktober 2013 12:59 Til: solr-user@lucene.apache.org Emne: Re: fq with { or } in Solr 4.3.1 Are you using the edismax query parser? It traps the syntax error and then escapes or ignores special characters. Curly braces are used for exclusive range queries (square brackets are inclusive ranges). The proper syntax is {term1 TO term2}. So, what were your intentions with catid:{123}? If you are simply trying to pass the braces as literal characters for a string field, either escape them with backslash or enclose the entire term in quotes: catid:\{123\} catid:"{123}" -- Jack Krupansky -Original Message- From: Peter Kirk Sent: Wednesday, October 23, 2013 4:57 AM To: solr-user@lucene.apache.org Subject: RE: fq with { or } in Solr 4.3.1 Sorry, that was just a typo. /search?q=*:*&fq=catid:{123} Gives me the error. I think that { and } must be used in ranges for fq, and that's why I can't use them directly like this. /Peter -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: 23. oktober 2013 10:52 To: solr-user@lucene.apache.org Subject: Re: fq with { or } in Solr 4.3.1 Missing a colon before the curly bracket in the fq? On Wed, Oct 23, 2013, at 09:42 AM, Peter Kirk wrote: Hi If I do a search like /search?q=catid:{123} I get the results I expect. But if I do /search?q=*:*&fq=catid{123} I get an error from Solr like: org.apache.solr.search.SyntaxError: Cannot parse 'catid:{123}': Encountered } } at line 1, column 58. Was expecting one of: TO ... RANGE_QUOTED ... RANGE_GOOP ... Can I not use { or } in an fq? Thanks, Peter
Re: Is Solr can create temporary sub-index ?
Yes, absolutely you resend the q= each time, optionally with any facets selected by the user using fq= On Wed, Oct 23, 2013 at 10:00 AM, Bruno Mannina bmann...@free.fr wrote: Hello Tim, Yes solr's facet could be a solution, but I need to re-send the q= each time. I'm asking me just if an another solution exists. Facet seems to be the good solution. Bruno Le 23/10/2013 17:03, Timothy Potter a écrit : Hi Bruno, Have you looked into Solr's facet support? If I'm reading your post correctly, this sounds like the classic case for facets. Each time the user selects a facet, you add a filter query (fq clause) to the original query. http://wiki.apache.org/solr/SolrFacetingOverview Tim On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr User, We have to do a new web project which is : Connect our SOLR database to a web plateform. This Web Plateform will be used by several users at the same time. They do requests on our SOLR and they can apply filter on the result. i.e.: Our SOLR contains 87M docs An user do requests, result is around few hundreds to several thousands. On the Web Plateform, user will see first 20 results (or more by using Next Page button) But he will need also to filter the whole result by additional terms. (Terms that our plateform will propose him) Is SOLR can create temporary index (manage by SOLR himself during a web session) ? My goal is to not download the whole result on local computer to provide filter, or to re-send the same request several times added to the new criterias. Many thanks for your comment, Regards, Bruno
Re: Is Solr can create temporary sub-index ?
I have a little question concerning statistics on a request. I have a field defined like this:
<field name="ic" type="text_classification" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_classification" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Data sample for this field:
<arr name="ic">
  <str>A23L1/22066</str>
  <str>A23L1/227</str>
  <str>A23L1/231</str>
  <str>A23L1/2375</str>
</arr>
My question is: is it possible to get the frequency of terms over the whole result of the initial user's request? Thanks a lot, Bruno Le 23/10/2013 18:12, Timothy Potter a écrit : Yes, absolutely you resend the q= each time, optionally with any facets selected by the user using fq= On Wed, Oct 23, 2013 at 10:00 AM, Bruno Mannina bmann...@free.fr wrote: Hello Tim, Yes solr's facet could be a solution, but I need to re-send the q= each time. I'm asking me just if an another solution exists. Facet seems to be the good solution. Bruno Le 23/10/2013 17:03, Timothy Potter a écrit : Hi Bruno, Have you looked into Solr's facet support? If I'm reading your post correctly, this sounds like the classic case for facets. Each time the user selects a facet, you add a filter query (fq clause) to the original query.
http://wiki.apache.org/solr/SolrFacetingOverview Tim On Wed, Oct 23, 2013 at 8:16 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr User, We have to do a new web project which is : Connect our SOLR database to a web plateform. This Web Plateform will be used by several users at the same time. They do requests on our SOLR and they can apply filter on the result. i.e.: Our SOLR contains 87M docs An user do requests, result is around few hundreds to several thousands. On the Web Plateform, user will see first 20 results (or more by using Next Page button) But he will need also to filter the whole result by additional terms. (Terms that our plateform will propose him) Is SOLR can create temporary index (manage by SOLR himself during a web session) ? My goal is to not download the whole result on local computer to provide filter, or to re-send the same request several times added to the new criterias. Many thanks for your comment, Regards, Bruno
Re: Is Solr can create temporary sub-index ?
Hum, I think my fieldType text_classification is not appropriate for this kind of data... I don't need stopwords, synonyms, etc. IC is a field that contains codes, and the codes often contain the char '/', and if I use the Terms component, I get: lst name=ic ... int name=004563254/int int name=003763554/int int name=002263254/int ... Le 23/10/2013 18:51, Bruno Mannina a écrit :
<fieldType name="text_classification" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
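Following Bruno's own observation that the analysis chain is splitting his codes, a sketch of an alternative schema line (an assumption, not something settled in the thread): classification codes like A23L1/22066 are usually indexed unanalyzed, so the Terms component would return whole codes instead of numeric fragments:

```xml
<!-- string type: no tokenization, the full code is one term -->
<field name="ic" type="string" indexed="true" stored="true" multiValued="true"/>
```

The trade-off is that matching then becomes exact (or prefix-based), since lowercasing and tokenized search on parts of the code no longer apply.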
Re: Minor bug with CloudSolrServer and collection-alias.
I filed https://issues.apache.org/jira/browse/SOLR-5380 and just committed a fix. - Mark On Oct 23, 2013, at 11:15 AM, Shawn Heisey s...@elyograg.org wrote: On 10/23/2013 3:59 AM, Thomas Egense wrote: Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than 1 collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to. You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two collections with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153: // search with new cloud client CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean()); cloudSolrServer.setParallelUpdates(random().nextBoolean()); query = new SolrQuery("*:*"); query.set("collection", "testalias"); res = cloudSolrServer.query(query); cloudSolrServer.shutdown(); assertEquals(5, res.getResults().getNumFound()); No unit-test bug here; however, if you change it to set the collection id not on the query but on CloudSolrServer instead, it will produce the bug: // search with new cloud client CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean()); cloudSolrServer.setDefaultCollection("testalias"); cloudSolrServer.setParallelUpdates(random().nextBoolean()); query = new SolrQuery("*:*"); //query.set("collection", "testalias"); res = cloudSolrServer.query(query); cloudSolrServer.shutdown(); assertEquals(5, res.getResults().getNumFound()); -- Assertion failure. Should I create a Jira issue for this? Thomas, I have confirmed this with the following test patch, which adds to the test rather than changing what's already there: http://apaste.info/9ke5 I'm about to head off to the train station to start my commute, so I will be unavailable for a little while. If you haven't gotten the jira filed by the time I get to another computer, I will create it. Thanks, Shawn
Spellcheck with Distributed Search (sharding).
Hello! I've been trying to enable spellchecking using sharding following the steps from the wiki, but I failed, :-( What I do is:

*Solrconfig.xml*

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestion</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">suggestion</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

*Note:* I have two shards (solr1 and solr2) and both have the same solrconfig.xml. Also, both indexes were optimized to create the spellchecker indexes.

*Query*

solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

*Response*

{
  "responseHeader": {
    "status": 404,
    "QTime": 12,
    "params": {
      "shards": "solr1:8080/events/data,solr2:8080/events/data",
      "shards.qt": "/suggestion",
      "q": "m",
      "wt": "json",
      "qt": "/suggestion"
    }
  },
  "error": {
    "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
    "code": 404
  }
}

More query syntaxes that I used and that don't work:

http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

Any idea of what I'm doing wrong? Thank you very much in advance! Best regards, -- - Luis Cappa
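Judging only from the configuration quoted above, the handler is registered as /suggest, while the failing requests address /suggestion. A request using the registered handler name would presumably look like this (an observation from the quoted config, not a confirmed fix):

```
http://solr1:8080/events/data/select?q=m&qt=/suggest&shards.qt=/suggest&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data
```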
Re: Solr Cloud Distributed IDF
I am indexing documents using the domin:id format ex id = k-690kohler!670614 This ensures that all k-690kohler documents are indexed to the same shard. This does cause numDocs that are not perfectly distributed across shards probably even worse than the default sharding algorithm. Here is the search on Solr Cloud http://solrsolr/productindex/productQuery?q=categories_82_is:108996bf=linear(popularity_82_i,1,2)^3debugQuery=true And on Solr 3.6 http://solr-2-build.sys.id.build.com:8080/solr-build/select?q.alt=categoryId:108996qt=dismaxbf=linear(popularity,1,2)^3debugQuery=truefl=id,productID,manufacturer Here is the debug output from Solr Cloud lst name=explain str name=921rusticware!1210842 48481.992 = (MATCH) sum of: 4.7323933 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 248779) [DefaultSimilarity], result of: 4.7323933 = score(doc=248779,freq=1.0 = termFreq=1.0 ), product of: 0.8745785 = queryWeight, product of: 5.411056 = idf(docFreq=3181, maxDocs=262058) 0.16162805 = queryNorm 5.411056 = fieldWeight in 248779, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.411056 = idf(docFreq=3181, maxDocs=262058) 1.0 = fieldNorm(doc=248779) 48477.26 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99977.0 = 1.0*float(int(popularity_82_i)=99975)+2.0 3.0 = boost 0.16162805 = queryNorm /str str name=4706baldwin!1223898 48380.168 = (MATCH) sum of: 4.7323933 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 67238) [DefaultSimilarity], result of: 4.7323933 = score(doc=67238,freq=1.0 = termFreq=1.0 ), product of: 0.8745785 = queryWeight, product of: 5.411056 = idf(docFreq=3181, maxDocs=262058) 0.16162805 = queryNorm 5.411056 = fieldWeight in 67238, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.411056 = idf(docFreq=3181, maxDocs=262058) 1.0 = fieldNorm(doc=67238) 48375.438 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99767.0 = 1.0*float(int(popularity_82_i)=99765)+2.0 3.0 = boost 
0.16162805 = queryNorm /str str name=yb5405moen!1748274 48278.34 = (MATCH) sum of: 4.7323933 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 123982) [DefaultSimilarity], result of: 4.7323933 = score(doc=123982,freq=1.0 = termFreq=1.0 ), product of: 0.8745785 = queryWeight, product of: 5.411056 = idf(docFreq=3181, maxDocs=262058) 0.16162805 = queryNorm 5.411056 = fieldWeight in 123982, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.411056 = idf(docFreq=3181, maxDocs=262058) 1.0 = fieldNorm(doc=123982) 48273.61 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99557.0 = 1.0*float(int(popularity_82_i)=99555)+2.0 3.0 = boost 0.16162805 = queryNorm /str str name=bp53005amerock!1721790 48262.008 = (MATCH) sum of: 4.7675867 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 108146) [DefaultSimilarity], result of: 4.7675867 = score(doc=108146,freq=1.0 = termFreq=1.0 ), product of: 0.8758082 = queryWeight, product of: 5.4436426 = idf(docFreq=3131, maxDocs=266484) 0.16088642 = queryNorm 5.4436426 = fieldWeight in 108146, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.4436426 = idf(docFreq=3131, maxDocs=266484) 1.0 = fieldNorm(doc=108146) 48257.24 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99982.0 = 1.0*float(int(popularity_82_i)=99980)+2.0 3.0 = boost 0.16088642 = queryNorm /str str name=bp29340amerock!1721865 48208.918 = (MATCH) sum of: 4.7675867 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 108031) [DefaultSimilarity], result of: 4.7675867 = score(doc=108031,freq=1.0 = termFreq=1.0 ), product of: 0.8758082 = queryWeight, product of: 5.4436426 = idf(docFreq=3131, maxDocs=266484) 0.16088642 = queryNorm 5.4436426 = fieldWeight in 108031, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.4436426 = idf(docFreq=3131, maxDocs=266484) 1.0 = fieldNorm(doc=108031) 48204.15 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99872.0 = 
1.0*float(int(popularity_82_i)=99870)+2.0 3.0 = boost 0.16088642 = queryNorm /str str name=bp53001amerock!1314101 48176.516 = (MATCH) sum of: 4.7323933 = (MATCH) weight(categories_82_is:`#8;#0;#6;SD in 47622) [DefaultSimilarity], result of: 4.7323933 = score(doc=47622,freq=1.0 = termFreq=1.0 ), product of: 0.8745785 = queryWeight, product of: 5.411056 = idf(docFreq=3181, maxDocs=262058) 0.16162805 = queryNorm 5.411056 = fieldWeight in 47622, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.411056 = idf(docFreq=3181, maxDocs=262058) 1.0 = fieldNorm(doc=47622) 48171.785 = (MATCH) FunctionQuery(1.0*float(int(popularity_82_i))+2.0), product of: 99347.0 = 1.0*float(int(popularity_82_i)=99345)+2.0 3.0 = boost 0.16162805 = queryNorm /str And here is the debug output from Solr 3.6 lst name=explain str name=bp53005amerock 15421.395 = (MATCH) sum of: 1.6594616 = (MATCH) weight(categoryId:`#8;#0;#6;SD in 45538), product of: 0.29207912 = queryWeight(categoryId:`#8;#0;#6;SD), product of:
RE: Facet performance
On Tue, October 22, 2013 5:23 PM Michael Lemke wrote: On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote: On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote: QTime fc: never returns, webserver restarts itself after 30 min with 100% CPU load

It might be because it dies due to garbage collection. But since more memory (as your test server presumably has) just leads to the too-many-values error, there isn't much to do. Essentially, fc is out then.

QTime=41205 facet.prefix=q=frequent_word numFound=44532

Same query repeated:
QTime=225810 facet.prefix=q=ottomotor numFound=909
QTime=199839 facet.prefix=q=ottomotor numFound=909

I am stumped on this, sorry. I do not understand why the 'ottomotor' query can take 5 times as long as the 'frequent_word' one.

I looked into this some more this morning. I noticed the java process was doing a lot of I/O as shown in Process Explorer. For the frequent_word it read about 180 MB; for ottomotor it was about seven times as much, ~1,200 MB.

Got another observation today. The response time for q=ottomotor depends on facet.limit:

QTime=59300 facet.limit=2
QTime=69395 facet.limit=4
QTime=85208 facet.limit=6
QTime=158150 facet.limit=8
QTime=186276 facet.limit=10
QTime=231763 facet.limit=15
QTime=260437 facet.limit=20
QTime=312268 facet.limit=30

For q=frequent_word the effect is much less pronounced and shows only for facet.limit >= 15:

QTime=0 facet.limit=0
QTime=20535 facet.limit=1
QTime=13456 facet.limit=2
QTime=13925 facet.limit=4
QTime=13705 facet.limit=6
QTime=13924 facet.limit=8
QTime=13799 facet.limit=10
QTime=14361 facet.limit=15
QTime=14704 facet.limit=20
QTime=15189 facet.limit=30
QTime=16783 facet.limit=50
QTime=57128 facet.limit=500

It looks to me like, to collect enough facets to fulfill the limit constraint, Solr has to read much more of the index in the case of the infrequent word. jconsole didn't show anything unusual according to our more experienced Java experts here. Nor was the machine swapping.
Is it possible to screw up an index such that this sort of faceting leads to constant reading of the index? Something like full table scans in a db? Michael
Re: Is Solr can create temporary sub-index ?
I need your help to define the right fieldType, please. This field must be indexed and stored, and each value must be considered as one term. The char / must not be treated as a separator. Could String be a good fieldType? Thanks.

On 23/10/2013 18:51, Bruno Mannina wrote:

<arr name="ic">
  <str>A23L1/22066</str>
  <str>A23L1/227</str>
  <str>A23L1/231</str>
  <str>A23L1/2375</str>
</arr>
New query-time multi-word synonym expander
Hi, Heads up that there is new query-time multi-word synonym expander patch in https://issues.apache.org/jira/browse/SOLR-5379 This worked for our customer and we hope it works for others. Any feedback would be greatly appreciated. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com
Re: Spellcheck with Distributed Search (sharding).
More info: When executing the query against a single Solr server it works:

http://solr1:8080/events/data/suggest?q=m&wt=json

{
  "responseHeader": {
    "status": 0,
    "QTime": 1
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "spellcheck": {
    "suggestions": [
      "m",
      {
        "numFound": 4,
        "startOffset": 0,
        "endOffset": 1,
        "suggestion": [
          "marca",
          "marcacom",
          "mis",
          "mispelotas"
        ]
      }
    ]
  }
}

But when choosing the request handler this way it doesn't:

http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*

2013/10/23 Luis Cappa Banda luisca...@gmail.com Hello! I've been trying to enable spellchecking using sharding following the steps from the wiki, but I failed, :-( What I do is:

*Solrconfig.xml*

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestion</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">suggestion</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

*Note:* I have two shards (solr1 and solr2) and both have the same solrconfig.xml. Also, both indexes were optimized to create the spellchecker indexes.
*Query*

solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

*Response*

{
  "responseHeader": {
    "status": 404,
    "QTime": 12,
    "params": {
      "shards": "solr1:8080/events/data,solr2:8080/events/data",
      "shards.qt": "/suggestion",
      "q": "m",
      "wt": "json",
      "qt": "/suggestion"
    }
  },
  "error": {
    "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
    "code": 404
  }
}

More query syntaxes that I used and that don't work:

http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

Any idea of what I'm doing wrong? Thank you very much in advance! Best regards, -- - Luis Cappa -- - Luis Cappa
Re: Issue with large html indexing
Attachments and images are often eaten by the mail server; your image is not visible, at least to me. Can you describe what you're seeing? Or post the image somewhere and provide a link? Best, Erick

On Wed, Oct 23, 2013 at 11:07 AM, Raheel Hasan raheelhasan@gmail.com wrote: Hi, I have an issue here while indexing large HTML. Here is the configuration for that:

1) Data is imported via URLDataSource / PlainTextEntityProcessor (DIH)
2) Schema has this for the field: type=text_en_splitting indexed=true stored=false required=false
3) text_en_splitting does the following work for indexing:

HTMLStripCharFilterFactory
WhitespaceTokenizerFactory (create tokens)
StopFilterFactory
WordDelimiterFilterFactory
ICUFoldingFilterFactory
PorterStemFilterFactory
RemoveDuplicatesTokenFilterFactory
LengthFilterFactory

However, the indexed data is like this (as in the attached image): [image: Inline image 1] so what are these numbers? If I put small HTML, it works fine, but as the size of the HTML file increases, this is what happens. -- Regards, Raheel Hasan
Re: New query-time multi-word synonym expander
Otis, could you provide a little (well, maybe a lot!) of discussion and detailed examples that illustrate what the patch can and can't handle? I mean, I read the Jira and it is simultaneously promising and a bit vague. Does it fully solve the issue, or is it yet another partial solution? Either way, it may be reasonably satisfactory, but some clarity would help. Thanks! -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Wednesday, October 23, 2013 1:28 PM To: solr-user@lucene.apache.org Subject: New query-time multi-word synonym expander Hi, Heads up that there is a new query-time multi-word synonym expander patch in https://issues.apache.org/jira/browse/SOLR-5379 This worked for our customer and we hope it works for others. Any feedback would be greatly appreciated. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com
RE: New query-time multi-word synonym expander
Nice, but now we got three multi-word synonym parsers? Didn't the LUCENE-4499 or SOLR-4381 patches work? I know the latter has had a reasonable amount of users and committers on github, but it was never brought back to ASF it seems. -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Wednesday 23rd October 2013 18:54 To: solr-user@lucene.apache.org Subject: New query-time multi-word synonym expander Hi, Heads up that there is new query-time multi-word synonym expander patch in https://issues.apache.org/jira/browse/SOLR-5379 This worked for our customer and we hope it works for others. Any feedback would be greatly appreciated. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com
Re: New shard leaders or existing shard replicas depends on zookeeper?
My first impulse would be to ask how you created the collection. It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard, one leader and 23 replicas bq: ...to point to the zookeeper ensemble also used for the ukdomain collection... so my guess is that this ZK ensemble has the ldwa collection defined as having only one shard I admit I pretty much skimmed your post though... Best, Erick On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil gil.hogga...@bl.uk wrote: Hi solr-users, I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening/how I can correct it. We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa'. It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa' server has been rebuilt for the new collection. When I start the ldwa solr nodes with their zookeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName as 'ldwacfg') pointing to the development zookeeper ensemble, all nodes initially become shard leaders and then replicas as I'd expect. But if I change the ldwa solr nodes to point to the zookeeper ensemble also used for the ukdomain collection, all ldwa solr nodes start on the same shard (that is, the first ldwa solr node becomes the shard leader, then every other solr node becomes a replica for this shard). The significant point here is no other ldwa shards gain leaders (or replicas). The ukdomain collection uses a zookeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa service the collection.configName of 'ldwacfg' has never previously been used. 
So I'm confused why the ldwa service would differ when the only difference is which zookeeper ensemble is used (both zookeeper ensembles are built automatically using version 3.4.5). If anyone can explain why this is happening and how I can get the ldwa services to start correctly using the non-development zookeeper ensemble, I'd be very grateful! If more information or explanation is needed, just ask. Thanks, Gil Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ
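For reference, the shard count is fixed at collection-creation time, which is why Erick asks how the collection was created. A Collections API CREATE call requesting 24 shards might look roughly like this (host and port are illustrative assumptions, not taken from the posts):

```
http://localhost:8983/solr/admin/collections?action=CREATE&name=ldwa&numShards=24&collection.configName=ldwacfg
```

If the collection was instead created with numShards=1 (or left to default), every node started afterwards joins that single shard as a replica, which matches the behaviour described above.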
Re: Indexing logs files of thousands of GBs
As a supplement to what Chris said, if you can partition the walking amongst a number of clients you can also parallelize the indexing. If you're using SolrCloud 4.5+, there are also some nice optimizations in SolrCloud to keep intra-shard routing to a minimum. FWIW, Erick

On Wed, Oct 23, 2013 at 12:59 PM, Chris Geeringh geeri...@gmail.com wrote: Prerna, The FileListEntityProcessor has a terribly inefficient recursive method, which will be using up all your heap building a list of files. I would suggest writing a client application and traversing your filesystem with NIO, available in Java 7: Files.walkFileTree() and a FileVisitor. As you walk, post up to the server with SolrJ. Cheers, Chris

On 22 October 2013 18:58, keshari.prerna keshari.pre...@gmail.com wrote: Hello, I tried to index log files (all text data) stored in the file system. Data can be as big as 1000 GB or more. I am working on Windows. A sample file can be found at https://www.dropbox.com/s/mslwwnme6om38b5/batkid.glnxa64.66441

I tried using FileListEntityProcessor with TikaEntityProcessor, which ended up in a Java heap exception that I couldn't get rid of no matter how much I increased my RAM size.

data-config.xml:

<dataConfig>
  <dataSource name="bin" type="FileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="true"
            processor="FileListEntityProcessor"
            transformer="TemplateTransformer"
            baseDir="//mathworks/devel/bat/A/logs/66048/"
            fileName=".*\.*" onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <entity name="file" dataSource="bin" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip"
              transformer="TemplateTransformer" rootEntity="true">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

Then I used FileListEntityProcessor with LineEntityProcessor, which never stopped indexing even after 40 hours or so.
data-config.xml:

<dataConfig>
  <dataSource name="bin" type="FileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="true"
            processor="FileListEntityProcessor"
            transformer="TemplateTransformer"
            baseDir="//mathworks/devel/bat/A/logs/"
            fileName=".*\.*" onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <entity name="file" dataSource="bin" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip"
              rootEntity="true">
        <field column="content" name="rawLine" />
      </entity>
    </entity>
  </document>
</dataConfig>

Is there any way I can use post.jar to index text files recursively? Or any other way that works without a Java heap exception and doesn't take days to index? I am completely stuck here. Any help would be greatly appreciated. Thanks, Prerna -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-logs-files-of-thousands-of-GBs-tp4097073.html Sent from the Solr - User mailing list archive at Nabble.com.
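Chris's suggestion above, walking the tree with Java 7 NIO instead of FileListEntityProcessor, can be sketched as follows. This is a minimal illustration of Files.walkFileTree() with a FileVisitor; the class name is made up for the example, and the SolrJ posting step is only indicated in a comment since it depends on your server setup:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class LogWalker {

    // Collect every regular file under root. In a real indexer you would
    // batch documents up and post each batch to Solr with SolrJ instead of
    // accumulating the whole list in memory.
    static List<Path> collectFiles(Path root) throws IOException {
        final List<Path> files = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                files.add(file); // e.g. read the file and add a SolrInputDocument to a batch
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable files, analogous to onError="skip" in the DIH config.
                return FileVisitResult.CONTINUE;
            }
        });
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo tree instead of a real log directory.
        Path root = Files.createTempDirectory("logs");
        Files.createDirectories(root.resolve("a/b"));
        Files.write(root.resolve("a/one.log"), "line\n".getBytes("UTF-8"));
        Files.write(root.resolve("a/b/two.log"), "line\n".getBytes("UTF-8"));
        System.out.println(collectFiles(root).size()); // prints 2
    }
}
```

Partitioning the directory tree amongst several such walkers, as Erick notes, also lets the indexing run in parallel.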
Re: Query cache and group by queries
query cache? queryResultCache? filterCache? Some more details please: what are you seeing and what do you expect to see? Best, Erick On Wed, Oct 23, 2013 at 1:22 PM, Kalle Aaltonen kalle.aalto...@zemanta.com wrote: Hi, It seems that the query cache is not used at all for grouped queries? Can someone explain why this is?
What is the right fieldType for this kind of field?
Dear, the data looks like:

<str>A23L1/22066</str>
<str>A23L1/227</str>
<str>A23L1/231</str>
<str>A23L1/2375</str>

I tried:
- String, but I can't search with truncation (i.e. A23*)
- Text_General, but as my code contains /, the data are split...

What kind of field must I choose to use truncation and treat a code with / as one term? Thanks a lot for your help, Bruno
Re: What is the right fieldType for this kind of field?
Trailing wildcard should work fine for strings, but a23* will not match A23* due to case. You could use the keyword tokenizer plus the lower case filter. -- Jack Krupansky -Original Message- From: Bruno Mannina Sent: Wednesday, October 23, 2013 1:54 PM To: solr-user@lucene.apache.org Subject: What is the right fieldType for this kind of field? Dear, the data looks like: <str>A23L1/22066</str> <str>A23L1/227</str> <str>A23L1/231</str> <str>A23L1/2375</str> I tried: - String, but I can't search with truncation (i.e. A23*) - Text_General, but as my code contains /, the data are split... What kind of field must I choose to use truncation and treat a code with / as one term? Thanks a lot for your help, Bruno
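A sketch of what Jack describes, a keyword tokenizer plus lower-case filter, might look like this in schema.xml (the type name here is illustrative):

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizerFactory keeps the whole value, including "/", as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- LowerCaseFilterFactory makes a23* match A23L1/227 -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With such a type, a value like A23L1/227 is indexed as a single lower-cased term, so truncation queries work and the / is never treated as a separator.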
Re: DIH - delta query and delta import query executes transformer twice
Hello Lee. In case you haven't solved this, would you mind posting your DIH config? Arcadius. On 27 September 2013 15:06, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi, It looks like when a DIH entity has a delta query and a delta import query plus a transformer defined, the execution of both queries calls the transformer. I was expecting it to only be called on the import query. Sure, we can check for a null value or something and just return the row during the delta query execution, but is there a better way of doing this? That is, not calling the transformer in the first place? Cheers Lee C -- Arcadius Ahouansou Menelic Ltd | Information is Power M: 07908761999 W: www.menelic.com ---
Re: What is the right fieldType for this kind of field?
Hi Jack, Yes, String works fine. I forgot to restart my Solr server after changing my schema.xml... arrf. I'm so stupid, sorry! On 23/10/2013 20:09, Jack Krupansky wrote: Trailing wildcard should work fine for strings, but a23* will not match A23* due to case. You could use the keyword tokenizer plus the lower case filter. -- Jack Krupansky -Original Message- From: Bruno Mannina Sent: Wednesday, October 23, 2013 1:54 PM To: solr-user@lucene.apache.org Subject: What is the right fieldType for this kind of field? Dear, the data looks like: <str>A23L1/22066</str> <str>A23L1/227</str> <str>A23L1/231</str> <str>A23L1/2375</str> I tried: - String, but I can't search with truncation (i.e. A23*) - Text_General, but as my code contains /, the data are split... What kind of field must I choose to use truncation and treat a code with / as one term? Thanks a lot for your help, Bruno
Solr not indexing everything from MongoDB
Hi, I have a Mongo database with about 50 entries inside. I use a mongo-solr connector. When I do a Solr *:* query, I only get about 10 or 13 responses. Even if I increase the max rows. I have updated my schema.xml accordingly. I have deleted my solr index, restarted solr, restarted the connector, everything. Any ideas? Thanks! Zach -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-not-indexing-everything-from-MongoDB-tp4097302.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr not indexing everything from MongoDB
On 10/23/2013 1:14 PM, gohome190 wrote: I have a Mongo database with about 50 entries inside. I use a mongo-solr connector. When I do a Solr *:* query, I only get about 10 or 13 responses. Even if I increase the max rows. I have updated my schema.xml accordingly. I have deleted my solr index, restarted solr, restarted the connector, everything. Any ideas? What is the numFound value in the query response? If you go to the admin UI and select your core from the dropdown, what does it say for Num Docs and Max Doc? I'm assuming Solr 4.x here; 1.x and 3.x are very different. Thanks, Shawn
Solr facet field counts not correct
I am running a simple query in a non-distributed search using grouping. I am getting incorrect facet field counts and I cannot figure out why. Here is the query you will notice that the facet field and facet query counts are not the same. The facet query counts are correct. Any help is appreciated. This XML file does not appear to have any style information associated with it. The document tree is shown below. response lst name=responseHeader int name=status0/int int name=QTime18/int /lst lst name=grouped lst name=groupid int name=matches89/int int name=ngroups74/int arr name=groups lst str name=groupValueuc101cadet/str result name=doclist numFound=2 start=0 doc str name=productiduc101/str arr name=finish strWhite/str strBlack/str /arr str name=uniqueFinishBlack/str str name=manufacturercadet/str int name=productCompositeid736388/int int name=uniqueid1545116/int /doc /result /lst lst str name=groupValuerm162cadet/str result name=doclist numFound=1 start=0 doc str name=productidrm162/str arr name=finish strN/A/str /arr str name=uniqueFinishN/A/str str name=manufacturercadet/str int name=productCompositeid667690/int int name=uniqueid1545089/int /doc /result /lst lst str name=groupValuecs202cadet/str result name=doclist numFound=1 start=0 doc str name=productidcs202/str arr name=finish strN/A/str /arr str name=uniqueFinishN/A/str str name=manufacturercadet/str int name=productCompositeid460865/int int name=uniqueid1545142/int /doc /result /lst lst str name=groupValuecs152cadet/str result name=doclist numFound=1 start=0 doc str name=productidcs152/str arr name=finish strN/A/str /arr str name=uniqueFinishN/A/str str name=manufacturercadet/str int name=productCompositeid458740/int int name=uniqueid1545141/int /doc /result /lst lst str name=groupValue65201cadet/str result name=doclist numFound=1 start=0 doc str name=productid65201/str arr name=finish strWhite/str /arr str name=uniqueFinishWhite/str str name=manufacturercadet/str int name=productCompositeid773769/int int 
name=uniqueid1999873/int /doc /result /lst lst str name=groupValuermc202cadet/str result name=doclist numFound=1 start=0 doc str name=productidrmc202/str arr name=finish strWhite/str /arr str name=uniqueFinishWhite/str str name=manufacturercadet/str int name=productCompositeid667929/int int name=uniqueid1545122/int /doc /result /lst lst str name=groupValuerbf101cadet/str result name=doclist numFound=1 start=0 doc str name=productidrbf101/str arr name=finish strChrome/str /arr str name=uniqueFinishChrome/str str name=manufacturercadet/str int name=productCompositeid663553/int int name=uniqueid1820328/int /doc /result /lst lst str name=groupValuerm202cadet/str result name=doclist numFound=1 start=0 doc str name=productidrm202/str arr name=finish strN/A/str /arr str name=uniqueFinishN/A/str str name=manufacturercadet/str int name=productCompositeid667551/int int name=uniqueid1545088/int /doc /result /lst lst str name=groupValuesl151tcadet/str result name=doclist numFound=1 start=0 doc str name=productidsl151t/str arr name=finish strWhite/str /arr str name=uniqueFinishWhite/str str name=manufacturercadet/str int name=productCompositeid710375/int int name=uniqueid1545153/int /doc /result /lst lst str name=groupValueuc102cadet/str result name=doclist numFound=2 start=0 doc str name=productiduc102/str arr name=finish strWhite/str strBlack/str /arr str name=uniqueFinishWhite/str str name=manufacturercadet/str int name=productCompositeid736389/int int name=uniqueid1820349/int /doc /result /lst /arr /lst /lst lst name=facet_counts lst name=facet_queries int name=HeatingArea_numeric:[0 TO *]23/int /lst lst name=facet_fields lst name=HeatingArea_numeric int name=128.020/int int name=250.06/int int name=500.06/int int name=250.06/int int name=500.06/int int name=250.06/int int name=500.06/int int name=375.03/int int name=375.03/int int name=374.03/int int name=125.02/int int name=200.02/int int name=125.02/int int name=200.02/int int name=125.02/int int name=200.02/int int 
name=32.02/int int name=175.01/int int name=300.01/int int name=400.01/int int name=550.01/int int name=175.01/int int name=300.01/int int name=400.01/int int name=550.01/int int name=175.01/int int name=300.01/int int name=400.01/int int name=548.01/int int name=512.01/int int name=100.00/int int name=220.00/int int name=420.00/int int name=610.00/int int name=640.00/int int name=710.00/int int name=720.00/int int name=750.00/int int name=770.00/int int name=835.00/int int name=850.00/int int name=860.00/int int name=870.00/int int name=900.00/int int name=910.00/int int name=920.00/int int name=930.00/int int name=940.00/int int name=950.00/int int name=1000.00/int int name=1010.00/int int name=1015.00/int int name=1020.00/int int name=1040.00/int int name=1050.00/int int name=1070.00/int int name=1090.00/int int name=1100.00/int int name=1150.00/int int name=1175.00/int int name=1200.00/int int name=1250.00/int int name=1300.00/int int name=1330.00/int int name=1360.00/int int
Re: Solr facet field counts not correct
Here is my query string:

/solr/singleproductindex/productQuery?fq=siteid:82&q=categories_82_is:109124&facet=true&facet.query=HeatingArea_numeric:[0%20TO%20*]&facet.field=HeatingArea_numeric&debugQuery=true

Here is my schema for that field:

<dynamicField name="*_numeric" type="tfloat" indexed="true" stored="false" multiValued="true"/>

Here is my request handler definition:

<requestHandler name="/productQuery" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">sku^9.0 upc^9.1 keywords_82_txtws^1.9 series^2.8 productTitle^1.2 productid^9.0 manufacturer^4.0 masterFinish^1.5 theme^1.1 categoryNames_82_txt^0.2 finish^1.4 uniqueFinish^1</str>
    <str name="pf">keywords_82_txtws^2.1 productTitle^1.5 manufacturer^4.0 finish^1.9</str>
    <str name="bf">linear(popularity_82_i,1,2)^3.0</str>
    <str name="fl">uniqueid,productCompositeid,productid,manufacturer,uniqueFinish,finish</str>
    <str name="mm">3&lt;-1 5&lt;-2 6&lt;90%</str>
    <bool name="group">true</bool>
    <str name="group.field">groupid</str>
    <bool name="group.ngroups">true</bool>
    <bool name="group.facet">true</bool>
    <int name="ps">100</int>
    <int name="qs">3</int>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-field-counts-not-correct-tp4097305p4097306.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr facet field counts not correct
if I do group=false&group.facet=false the counts are what they should be for the ungrouped counts... seems like group.facet isn't working correctly -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-field-counts-not-correct-tp4097305p4097314.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is the right fieldType for this kind of field?
On 23/10/2013 20:09, Jack Krupansky wrote: You could use the keyword tokenizer plus the lower case filter. Jack, could you help me write the right fieldType please? (index and query) Another thing: I don't know if I must use the keyword tokenizer, because the codes contain the / character, and the tokenizer seems to split the code, no? Many thanks, Bruno
Re: What is the right fieldType for this kind of field?
On 23/10/2013 22:44, Bruno Mannina wrote: On 23/10/2013 20:09, Jack Krupansky wrote: You could use the keyword tokenizer plus the lower case filter. Jack, could you help me write the right fieldType please? (index and query) Another thing: I don't know if I must use the keyword tokenizer, because the codes contain the / character, and the tokenizer seems to split the code, no? Many thanks, Bruno Maybe an answer (I haven't tested it yet): http://pietervogelaar.nl/solr-3-5-search-case-insensitive-on-a-string-field-for-exact-match/
Re: What is the right fieldType for this kind of field?
On 23/10/2013 22:49, Bruno Mannina wrote: On 23/10/2013 22:44, Bruno Mannina wrote: On 23/10/2013 20:09, Jack Krupansky wrote: You could use the keyword tokenizer plus the lower case filter. Jack, could you help me write the right fieldType please? (index and query) Another thing: I don't know if I must use the keyword tokenizer, because the codes contain the / character, and the tokenizer seems to split the code, no? Many thanks, Bruno Maybe an answer (I haven't tested it yet): http://pietervogelaar.nl/solr-3-5-search-case-insensitive-on-a-string-field-for-exact-match/ OK, it works fine!
Re: What is the right fieldType for this kind of field?
Yes, that blog post appears to use the proper technique for case-insensitive string fields. The so-called keyword tokenizer merely treats the whole string value as a single token (AKA keyword) and does NOT do any further tokenization. -- Jack Krupansky -Original Message- From: Bruno Mannina Sent: Wednesday, October 23, 2013 4:57 PM To: solr-user@lucene.apache.org Subject: Re: What is the right fieldType for this kind of field? On 23/10/2013 22:49, Bruno Mannina wrote: On 23/10/2013 22:44, Bruno Mannina wrote: On 23/10/2013 20:09, Jack Krupansky wrote: You could use the keyword tokenizer plus the lower case filter. Jack, could you help me write the right fieldType please? (index and query) Another thing: I don't know if I must use the keyword tokenizer, because the codes contain the / character, and the tokenizer seems to split the code, no? Many thanks, Bruno Maybe an answer (I haven't tested it yet): http://pietervogelaar.nl/solr-3-5-search-case-insensitive-on-a-string-field-for-exact-match/ OK, it works fine!
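The technique Jack describes maps onto a small schema fragment. A sketch — the type name string_ci is illustrative, not from the thread:

```xml
<!-- Case-insensitive "string": the keyword tokenizer emits the whole value
     (slashes included) as a single token, and the lowercase filter normalizes it. -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because no word splitting happens at index or query time, a query must supply the whole code; only the letter case is normalized away.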
Terms function join with a Select function ?
Dear Solr users, I use the Terms function to see the frequency data in a field but it's for the whole database. I have 2 questions: - Is it possible to increase the number of statistic ? actually I have the 10 first frequency term. - Is it possible to limit this statistic to the result of a request ? PS: the second question is very important for me. Many thanks
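For reference, both asks map onto standard request parameters. A sketch, with the field name and limits invented for illustration: terms.limit raises the Terms component's cutoff above the default 10, but /terms always reads whole-index statistics; restricting frequencies to the result set of a request is what field faceting does.

```
# More than 10 top terms from the Terms component (still whole-index):
/terms?terms.fl=myfield&terms.limit=50

# Term frequencies restricted to the documents matching a query:
/select?q=some+request&rows=0&facet=true&facet.field=myfield&facet.limit=50
```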
Re: Solr not indexing everything from MongoDB
numFound is 10. numDocs is 10, maxDoc is 23. Yeah, Solr 4.x! Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-not-indexing-everything-from-MongoDB-tp4097302p4097340.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr facet field counts not correct
: if I do group=false&group.facet=false the counts are what they should be for : the ungrouped counts... seems like group.facet isn't working correctly yeah ... thanks for digging in -- definitely seems like a problem with group.facet and Trie fields that use precisionStep. I've opened a Jira: https://issues.apache.org/jira/browse/SOLR-5383 -Hoss
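Until SOLR-5383 is resolved, one workaround sometimes suggested for Trie fields is to facet on a copy of the field indexed with precisionStep="0", so only a single term per value is indexed and no extra precision terms exist for group.facet to trip on. A hypothetical schema sketch — names invented, and untested against this particular bug:

```xml
<fieldType name="tfloat_p0" class="solr.TrieFloatField" precisionStep="0"/>
<dynamicField name="*_numeric_facet" type="tfloat_p0" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_numeric" dest="*_numeric_facet"/>
```

Range queries on the precisionStep="0" copy get slower, so the original field would still be used for the [0 TO *] style facet.query shown earlier in the thread.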
Re: Solr facet field counts not correct
Hoss created: https://issues.apache.org/jira/browse/SOLR-5383 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-field-counts-not-correct-tp4097305p4097346.html Sent from the Solr - User mailing list archive at Nabble.com.
single core for extracted text from pdf/other doc types and metadata fields about that doc from the database
Can I create a core where one subset of fields comes from the database source using the DataImportHandler, and another subset of fields using the Apache Tika data import handler? For example, in the indexed doc I want the following fields to come from the database source: 1 Id 2 DocFilePath (nullable) 3 Subject 4 KeyWords 5 Description 6 Text and another set of field(s) to come from documents on the filesystem, with text extracted using the Apache Tika processor: 7 DocText so that the final doc fields are as follows, where DocText is the text of the document whose path is mentioned in the DocFilePath column: 1 Id 2 DocFilePath (nullable) 3 Subject 4 KeyWords 5 Description 6 Text 7 DocText Thanks, Vikas Vikas Sharma | Senior Software Engineer | MedAssets 14405 SE 36th Street, Suite 206 | Bellevue, WA, 98006 | Work: 425.519.1305 vsha...@medassets.com Visit us at www.medassets.com Follow us on LinkedIn (http://www.linkedin.com/company/medassets), YouTube (https://www.youtube.com/user/MedAssetsInc), Twitter (https://twitter.com/MedAssets), and Facebook (https://www.facebook.com/MedAssets) *Attention* This electronic transmission may contain confidential, sensitive, proprietary and/or privileged information belonging to the sender. This information, including any attached files, is intended only for the persons or entities to which it is addressed. Authorized recipients of this information are prohibited from disclosing the information to any unauthorized party and are required to properly dispose of the information upon fulfillment of its need/use, unless otherwise required by law. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by any person or entity other than the intended recipient is prohibited. If you have received this electronic transmission in error, please notify the sender and properly dispose of the information immediately.
Re: single core for extracted text from pdf/other doc types and metadata fields about that doc from the database
You can accomplish your end goal easily if you just write your own indexer, which is easy and gives you power and flexibility. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 23, 2013 6:39 PM, Sharma, Vikas vsha...@medassets.com wrote: Can I create a core where one subset of fields comes from the Database source using the DataImport handler for database and another subset of fields using the Apache Tika dataimport handler For example if in the indexed doc I want following fields to come from the database source 1 Id 2 DocFilePath (nullable) 3 Subject 4 KeyWords 5 Description 6 Text and another set of field(s) to come from documents on the filesystem with text extracted using Apache Tika processor 7 DocText so that Final Doc fields are as follows where DocText is the text of the document whose path is mentioned in the DocFilePath column 1 Id 2 DocFilePath (nullable) 3 Subject 4 KeyWords 5 Description 6 Text 7 DocText Thanks, Vikas
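For what it's worth, the two-source layout asked about can also be expressed inside the DataImportHandler itself, by nesting a TikaEntityProcessor entity under the SQL entity. A hypothetical data-config.xml sketch — the entity names, table name, and driver/url placeholders are all invented for illustration:

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource" driver="..." url="..."/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Parent row supplies the metadata fields from the database. -->
    <entity name="doc" dataSource="db"
            query="SELECT Id, DocFilePath, Subject, KeyWords, Description, Text FROM docs">
      <!-- Nested Tika entity extracts DocText from the file named by the parent row. -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${doc.DocFilePath}" format="text">
        <field column="text" name="DocText"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

A nullable DocFilePath would need handling (for example, skipping the nested entity when the path is empty), which is one reason a hand-written indexer, as Otis suggests, can be the simpler route.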
Solr operation problem
Dear user, Could you please help me to solve the following problem: I have installed Java and Tomcat and arranged all the files for Solr 4.5 according to the instructions from the Solr wiki and various web pages. My Tomcat is running well, but I get a problem when I try to open Solr using http://localhost:8983/solr. It shows: type Status report, message /solr, description The requested resource is not available. Apache Tomcat/7.0.42 I am seeking help to continue with my Solr setup. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-operation-problem-tp4097390.html Sent from the Solr - User mailing list archive at Nabble.com.
Global IDF vs. Routing
Hi, Seeing so much work being put in routing and seeing the recent questions about the status of global IDF support made me realize, for the first time really, that with people using routing more and more we should be seeing more and more issues caused by the lack of global IDF because routing by definition doesn't randomly and evenly spread data across shards. Is this correct or am I missing something and this is in fact not (such a big) problem? Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Carrot2 Clustering with Field Collapsing
When I try to use Carrot2 clustering in Solr with grouping based on a field, I get a null pointer exception. However, the clustering query works fine without field grouping. For example, the query below works fine: /clustering?q=text:apple&rows=500&carrot.title=title but this query throws an error: /clustering?q=text:tiger&rows=500&carrot.title=title&group=true&group.field=id&group.main=true I would like to know whether clustering is supported with field collapsing, or if I'm doing something incorrect. The stack trace for the error is given below: java.lang.NullPointerException at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:161) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.jav Thanigai Vellore Sr. Software Architect Art.com Phone: (510) 879-4791 [Art.com Inc.] If you have received this e-mail in error, please immediately notify the sender by reply e-mail and destroy the original e-mail and its attachments without reading or saving them. This e-mail and any documents, files or previous e-mail messages attached to it, may contain confidential or privileged information that is prohibited from disclosure under confidentiality agreement or applicable law. 
If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of this e-mail or any of the information contained in or attached to this e-mail is STRICTLY PROHIBITED. Thank you.
Re: Global IDF vs. Routing
On Wed, Oct 23, 2013 at 9:03 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Seeing so much work being put in routing and seeing the recent questions about the status of global IDF support made me realize, for the first time really, that with people using routing more and more we should be seeing more and more issues caused by the lack of global IDF because routing by definition doesn't randomly and evenly spread data across shards. Many people are using routing to partition users data - in this case, global IDF would normally not be what you want anyway. -Yonik
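The uneven spread Otis describes comes straight from hash routing: with compositeIdRouter the hash of the route key, not its value, picks the shard, so distinct keys routinely land on the same shard (which is also why the month-per-shard layout asked about earlier needs custom sharding). A toy sketch — MD5 stands in for the MurmurHash3 that Solr actually uses, and the 12-shard setup mirrors the earlier thread; the exact assignments are illustrative only:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeSet;

public class RoutingSketch {
    // Stand-in for Solr's compositeIdRouter: any uniform hash of the route key
    // shows the same collision behaviour as the real MurmurHash3 mapping.
    static int shardFor(String routeKey, int numShards) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(routeKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.valueOf(numShards)).intValue();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TreeSet<Integer> used = new TreeSet<>();
        // Route 12 month values onto 12 shards, as in the sharding thread.
        for (int month = 1; month <= 12; month++) {
            int shard = shardFor(String.valueOf(month), 12);
            used.add(shard);
            System.out.println("month " + month + " -> shard " + shard);
        }
        // With 12 keys hashed into 12 buckets, collisions are overwhelmingly
        // likely, so some shards hold several months and others hold none.
        System.out.println("distinct shards used: " + used.size());
    }
}
```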
Re: Solr operation problem
Have you already used Solr with the default setup (Jetty)? If not, I recommend you do the Jetty setup first and the online tutorial, just so you understand what the files are, where they are, and so on. Then add Tomcat into the mix. If you still have a problem, let us know which operating system you are on and what exceptions you are getting in the log files. Currently, the information you provided is insufficient, exactly because Tomcat is not the primary out-of-the-box solution. Regards, Alex. P.s. The latest version of Solr requires additional logging libraries for usage with Tomcat. It's documented on the wiki. But I am not sure you are even at this point yet. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Oct 24, 2013 at 7:58 AM, masum.uia masum@gmail.com wrote: Dear user, Could you please help me to solve the following problem: I have installed Java and Tomcat and arranged all the files for Solr 4.5 according to the instructions from the Solr wiki and various web pages. My Tomcat is running well, but I get a problem when I try to open Solr using http://localhost:8983/solr. It shows: type Status report, message /solr, description The requested resource is not available. Apache Tomcat/7.0.42 I am seeking help to continue with my Solr setup. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-operation-problem-tp4097390.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: deleteByQuery does not work with SolrCloud
Hi Erick It can get hits on these documents. And I tried this: myhost/solr/mycore/update?stream.body=<delete><query>name:shardTv_20131010</query></delete>&commit=true and the document could be deleted. Regards 2013/10/23 Erick Erickson erickerick...@gmail.com The first thing I'd do is go into the browser UI and make sure you can get hits on documents, something like blah/collection/q=indexname:shardTv_20131010 Best, Erick On Wed, Oct 23, 2013 at 8:20 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi I am using SolrCloud within Solr 4.4, and I try the SolrJ API deleteByQuery to delete the index as: CloudSolrServer cloudServer = new CloudSolrServer(myZKhost) cloudServer.connect() cloudServer.setDefaultCollection cloudServer.deleteByQuery(indexname:shardTv_20131010); cloudServer.commit(); It seems not to work. I have also done some googling; unfortunately there is no help. Do I miss anything? Thanks Regard
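For comparison with the URL form that worked, a hedged SolrJ sketch of the same delete (the ZooKeeper host and collection name are placeholders, and running it needs a live SolrCloud plus the solr-solrj dependency, so it is a sketch rather than something verified here). Note that the snippet in the original post called setDefaultCollection with no argument; without a default or explicit collection the delete has no target, which would explain the silent failure:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class DeleteByQueryExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble address.
        CloudSolrServer cloudServer = new CloudSolrServer("zkhost1:2181");
        cloudServer.connect();
        // Must name the collection; the original snippet omitted the argument.
        cloudServer.setDefaultCollection("mycollection");
        cloudServer.deleteByQuery("indexname:shardTv_20131010");
        cloudServer.commit();
        cloudServer.shutdown();
    }
}
```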
Re: Global IDF vs. Routing
Duh, right, right, sorry for the noise. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Oct 23, 2013 at 9:13 PM, Yonik Seeley ysee...@gmail.com wrote: On Wed, Oct 23, 2013 at 9:03 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Seeing so much work being put in routing and seeing the recent questions about the status of global IDF support made me realize, for the first time really, that with people using routing more and more we should be seeing more and more issues caused by the lack of global IDF because routing by definition doesn't randomly and evenly spread data across shards. Many people are using routing to partition users data - in this case, global IDF would normally not be what you want anyway. -Yonik
why Analyzer in solr always hang ?
Hi All, My custom analyser always hangs when I click the Analysis Values button on the analysis page. The thread dump is the following: http-bio-8080-exec-7 daemon prio=5 tid=7ffc7e0a9800 nid=0x1152d6000 runnable [1152d3000] java.lang.Thread.State: RUNNABLE at gnu.trove.impl.hash.TObjectHash.insertKeyRehash(TObjectHash.java:348) at gnu.trove.impl.hash.TObjectHash.insertKey(TObjectHash.java:294) at gnu.trove.map.custom_hash.TObjectIntCustomHashMap.put(TObjectIntCustomHashMap.java:252) at gnu.trove.map.custom_hash.TObjectIntCustomHashMap.readExternal(TObjectIntCustomHashMap.java:1141) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at java.util.HashMap.readObject(HashMap.java:1030) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874) at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) …….. at org.apache.solr.handler.FieldAnalysisRequestHandler.doAnalysis(FieldAnalysisRequestHandler.java:101) at org.apache.solr.handler.AnalysisRequestHandlerBase.handleRequestBody(AnalysisRequestHandlerBase.java:59) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312) - locked 7810afd50 (a org.apache.tomcat.util.net.SocketWrapper) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Locked ownable synchronizers: - 780f81530 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) Can anybody give some hints and suggestions for this kind of issue ? Thanks, -Mingz
Re: Major GC does not reduce the old gen size
help please -- View this message in context: http://lucene.472066.n3.nabble.com/Major-GC-does-not-reduce-the-old-gen-size-tp4096880p4097429.html Sent from the Solr - User mailing list archive at Nabble.com.