Re: Clone (or Restore) SolrCloud
Hi David, The parent metadata persists only until the sub-shards become active. The logic that makes the sub-shards active depends on knowing when all 'sibling' sub-shards' replicas have recovered successfully, and we store the parent to make that easier to look up. Once all replicas of all sub-shards have recovered, the shard states are updated. The 'updateshardstate' command also removes the 'parent' key from the sub-shards while switching them to 'active'. If you're seeing the 'parent' key on an 'active' sub-shard then it may be a bug. Please paste your clusterstate and I'll look into why it was left over. On Mon, Feb 3, 2014 at 10:19 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: I think I figured this out; I hope people find this useful. It may not be possible to declare what the hash ranges are when you create the collection, but you *can* do so when you split, via the 'ranges' parameter, which is a comma-delimited list. So this means you can create a new collection with one shard and then immediately split it to the desired ranges to line up with those of your backup. I also observed that if you create a collection and then split every shard (in 2), the result is equivalent to a collection created with twice as many shards to begin with. I hoped that was so and verified that the ranges end up the same both ways. The only thing that seems like it may be benign, though I'm not 100% certain, is that if you split a shard, the new shards have a 'parent' reference to the name of the shard they were split from. That reference remains even if you delete the parent shard (it's not needed anymore; it becomes inactive). I'm not sure why this metadata is recorded because, at least after the split, I can't see why it's pertinent to anything. ~ David David Smiley (@MITRE.org) wrote: Hi, I'm attempting to come up with a SolrCloud restore/clone process, either to recover to a known good state or to clone the environment for experimentation. At the moment my process involves either creating a new ZooKeeper environment or at least deleting the existing collection so that I can create a new one. This works; I use the Core API: the first command defines the collection parameters, and I invoke it once for each replica. I don't use the Collection API because I don't want SolrCloud to go off creating all the replicas on its own -- I know where each one should be pre-positioned. What I'm concerned about is what happens once I start wanting to use shard splitting, *especially* if I don't want to split all shards because the shards are uneven due to custom routing (e.g. id:customer!myid). In that case I don't know how to create the collection with the post-split hash ranges. Solr doesn't have an API for me to explicitly say what the hash ranges should be on each shard (to match up with a backup). And I'm concerned about undocumented pitfalls that may exist in manually constructing a clusterstate.json, as another approach. Any ideas? ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- Regards, Shalin Shekhar Mangar.
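[Editor's note] As an illustration of the create-then-split trick David describes, here is a minimal SolrJ sketch. It is a sketch only, assuming a Solr 4.x SolrJ client, a local Solr at localhost:8983, and a hypothetical one-shard collection named 'restored' whose shard1 covers the full hash range:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class SplitToRanges {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // SPLITSHARD with an explicit 'ranges' list so the resulting
        // sub-shard hash ranges line up with those of the backup.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "SPLITSHARD");
        params.set("collection", "restored");                  // hypothetical collection name
        params.set("shard", "shard1");
        params.set("ranges", "80000000-ffffffff,0-7fffffff");  // example target ranges
        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");                 // Collections API endpoint
        server.request(request);
        server.shutdown();
      }
    }

The same call can of course be made over plain HTTP; the ranges string here is just the two halves a one-shard collection would normally split into.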
Apache Solr.
Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Solr and SDL Tridion Integration
Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
Fwd: Need help for integrating solr-4.5.1 with UIMA
Hi, I'm trying to integrate Solr 4.5.1 with UIMA, following the steps in solr-4.5.1\contrib\uima\readme.txt. I edited solrconfig.xml as given in the readme, and I have registered the required keys. But each time I index data, Solr returns this error:

Feb 3, 2014 2:04:32 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
  at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
  at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
  at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
  at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
  at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
  at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
  at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:173)
  at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:79)
  at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
  at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection timed out: connect
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:529) at
Re: Solr and SDL Tridion Integration
This is a new one. You may want to start from Tridion's list and ask about APIs, exports, or any other ways to get at the data. Then come back with a more specific question once you know what the data looks like and the granularity of updates (hook on document change vs. full export only). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Feb 3, 2014 at 4:16 PM, Prasi S prasi1...@gmail.com wrote: Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
Re: Apache Solr.
Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Special NGRAMish requirement
Hi, we need to use something very similar to EdgeNGram (minGramSize=1 maxGramSize=50 side=front). The only thing missing is that we would like to reduce the number of matches: the requirement is to return only those matches with the longest tokens (or terms, if that is the right word). Is there a way to do this in Solr (not necessarily with EdgeNGram)? Thanks, Alexander
Re: Solr and SDL Tridion Integration
If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script, or maybe even Java code, to read from SDL Tridion and output a supported Solr format such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
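[Editor's note] If the CSV route is taken, posting the export to Solr's CSV handler is straightforward. Here is a hedged SolrJ sketch, assuming a Solr 4.x client, a core at localhost:8983/solr/collection1, a hypothetical export file path, and a CSV whose first row holds the field names:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class CsvImport {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Stream the CSV export to the CSV update handler.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("/data/tridion-export.csv"), "text/csv"); // hypothetical path
        req.setParam("header", "true");                    // first row contains the field names
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // commit when done
        server.request(req);
        server.shutdown();
      }
    }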
weird exception on update
Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Score of Search Term for every character remove
Hi, I'm new to using Solr and I'm curious whether it is capable of doing the following or something similar. Sample:

Query: ABCDEF
Returns:
ABCDEF 0 hits
ABCDE 2 hits
ABCD 3 hits
ABC 10 hits
AB 20 hits
A 100 hits

In one request only. Thanks. Abner G. Lusung Jr. | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 | Website: http://www.vishay.com/
Re: Import data from MySQL to Solr
I've been using DIH to import large databases into XML file batches, and it's blazing fast. alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 rachun rachun.c...@gmail.com: Dear all gurus, I would like to import my data (MySQL), about 4 million rows, into Solr 4.6. What is the best way to do it? Please suggest me. Million thanks, Chun.
Re: Geospatial clustering + zoom in/out help
Hi David, I was hoping to get an answer on a geospatial topic from you :). These links basically confirm that the approach I wanted to take should work OK with a similar (or even bigger) amount of data than I plan to have. Instead of my custom NxM division of the world, I'll try the existing geohash encoding; it may be good enough (and will be quicker to implement). Thanks! Bojan On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Bojan. You've got some good ideas here, along the lines of some that others have tried. I threw together a page on the wiki about this subject some time ago that I'm sure you will find interesting. It references a relevant Stack Overflow post, and also a presentation at DrupalCon which had a segment from a guy using the same approach you suggest here, involving field collapsing and/or the stats component. The video shows it in action. http://wiki.apache.org/solr/SpatialClustering It would be helpful for everyone if you share your experience with whatever you choose, once you give an approach a try. ~ David From: Bojan Šmid [bos...@gmail.com] Sent: Thursday, January 30, 2014 1:15 PM To: solr-user@lucene.apache.org Subject: Geospatial clustering + zoom in/out help Hi, I have an index with 300K docs with lat,lon. I need to cluster the docs based on lat,lon for display in the UI. The user then needs to be able to click on any cluster and zoom in (up to 11 levels deep). I'm using Solr 4.6 and I'm wondering how best to implement this efficiently. A few more specific questions below. I need to: 1) cluster data points at different zoom levels 2) click on a specific cluster and zoom in 3) be able to select a region (bounding box or polygon) and show clusters in the selected area. What's the best way to implement this so that queries are fast? What I thought I would try (but maybe there are better ways): * divide the world into NxM large squares, then each of these squares into 4 more squares, and so on - 11 levels deep * at index time, figure out all the squares (at all 11 levels) each data point belongs to and index that info into 11 different fields: e.g. id=1 name=foo lat=x lon=y zoom1=square1_62 zoom2=square1_62_47 zoom3=square1_62_47_33 * at search time, use field collapsing on the zoomX field to get which docs belong to which square at a particular level * calculate the center point of each square (the mean of the positions of all points in that square) using StatsComponent (facet on the zoomX field, avg on the lat and lon fields) - I would consider those squares as separate clusters (one square is one cluster) and the center points of those squares as the center points of the clusters derived from them. I *think* the problems with this approach are: * there will be many unique values for the bigger zoom levels, which means field collapsing / StatsComponent may not work fast enough * clusters will not look very natural, because I would have many clusters at each zoom level, and what is really one geographical cluster could be displayed as multiple clusters, since its points would in some cases be dispersed across multiple squares. But that may be OK * a lot depends on how the squares are calculated - linearly dividing 360 degrees by N to get squares of equal size in degrees would produce issues with real square sizes and the counts of points in each of them. So I'm wondering if there is a better way? Thanks, Bojan
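[Editor's note] For readers who want to experiment with the index-time grid idea from this thread, here is a small self-contained Java sketch of one way to compute per-zoom-level cell ids. The field naming and the simple linear division are illustrative assumptions only (and, as Bojan notes, linear division has known drawbacks that geohash encoding avoids):

    /**
     * Hedged sketch of the index-time "zoom cell" idea from this thread:
     * assign each (lat, lon) a cell id per zoom level, then facet or
     * field-collapse on the zoomN field at query time.
     */
    public class ZoomCells {
      // Cell id for one zoom level; each level quadruples the grid.
      static String cellId(double lat, double lon, int zoom) {
        int cells = 1 << zoom;                       // cells per axis at this zoom
        int x = (int) ((lon + 180.0) / 360.0 * cells);
        int y = (int) ((lat + 90.0) / 180.0 * cells);
        x = Math.min(x, cells - 1);                  // clamp the edge case lon == 180
        y = Math.min(y, cells - 1);
        return zoom + "_" + x + "_" + y;
      }

      public static void main(String[] args) {
        double lat = 45.81, lon = 15.98;             // example point
        for (int zoom = 1; zoom <= 11; zoom++) {
          // e.g. index this value into a field named "zoom" + zoom
          System.out.println("zoom" + zoom + " = " + cellId(lat, lon, zoom));
        }
      }
    }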
Re: Apache Solr.
That's right, Solr doesn't import PDFs the way it imports XML. You'll need to use Tika to import binary/specific file types. http://tika.apache.org/1.4/formats.html alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 Siegfried Goeschl sgoes...@gmx.at: Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
Re: Apache Solr.
PDF files can be directly imported into Solr using Solr Cell (AKA ExtractingRequestHandler). See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Internally, Solr Cell uses Tika, which in turn uses PDFBox. -- Jack Krupansky -Original Message- From: Alexei Martchenko Sent: Monday, February 3, 2014 8:04 AM To: solr-user@lucene.apache.org Subject: Re: Apache Solr. That's right, Solr doesn't import PDFs the way it imports XML. You'll need to use Tika to import binary/specific file types. http://tika.apache.org/1.4/formats.html alexei martchenko Facebook http://www.facebook.com/alexeiramone | LinkedIn http://br.linkedin.com/in/alexeimartchenko | Steam http://steamcommunity.com/id/alexeiramone/ | 4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone | Github https://github.com/alexeiramone | (11) 9 7613.0966 | 2014-02-03 Siegfried Goeschl sgoes...@gmx.at: Hi Vignesh, a few keywords for further investigation: * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, and I am now trying to index PDF files but am not able to. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks Regards. Vignesh.V Ninestars Information Technologies Limited., 72, Greams Road, Thousand Lights, Chennai - 600 006. India. Landline : +91 44 2829 4226 / 36 / 56 X: 144 http://www.ninestars.in
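[Editor's note] To make the Solr Cell route concrete, here is a hedged SolrJ sketch posting a PDF to /update/extract. It assumes a Solr 4.x SolrJ client (on Solr 3.6 the client class names differ, e.g. CommonsHttpSolrServer), a server at localhost:8983/solr with the extract handler configured, and a hypothetical file path, unique key, and field mapping:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class PdfIndexer {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Send the PDF to Solr Cell (ExtractingRequestHandler); Tika and
        // PDFBox do the text extraction on the Solr side.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("sample.pdf"), "application/pdf");
        req.setParam("literal.id", "doc1");          // hypothetical unique key
        req.setParam("fmap.content", "text");        // map extracted body to a 'text' field
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
        server.shutdown();
      }
    }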
Announce list
Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Writing a custom updateRequestHandler
Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that?
Re: weird exception on update
This exception is similar to what is talked about here: https://gist.github.com/mbklein/6367133 http://irc.projecthydra.org/2013-08-28.html We found out that: 1. this happens iff, on two cores inside the same container, there is a query parser defined via defType. 2. after removing the index files on one of the cores, the delete by query works just fine. Right after restarting the container, the same query fails. Is there a JIRA for this? Should I create one? Dmitry On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: Writing a custom updateRequestHandler
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing new Solr plugins; perhaps it would be a good place to start. There is also a page about this in the wiki, but it’s a light introduction. I’ve found that a very good starting point is just to browse through the code of some standard components similar to the one you’re trying to customize. On Feb 3, 2014, at 9:00 AM, neerajp neeraj_star2...@yahoo.com wrote: Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that?
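[Editor's note] For many update customizations, the natural extension point is an UpdateRequestProcessor in the update chain rather than a whole new handler. As a hedged illustration (the class name, field name, and behavior are made up; the factory and processor signatures are from the Solr 4.x plugin API), a minimal plugin might look like this:

    import java.io.IOException;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    // Minimal sketch: stamps each incoming document with an indexing timestamp.
    // The field name "indexed_at_dt" is hypothetical.
    public class TimestampUpdateProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            cmd.getSolrInputDocument().setField("indexed_at_dt", new java.util.Date());
            super.processAdd(cmd); // hand off to the rest of the chain
          }
        };
      }
    }

The factory would then be referenced from an updateRequestProcessorChain in solrconfig.xml.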
Strange Error Message during Full Import
Hallo, when I do a full import of a Solr index I get a strange error message: org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next It is only a simple query: select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA This seems to be a follow-on error, but there is no other cause in the stack trace. Thanks for any hints. Ciao Peter Schütt P.S. The error stack trace:

Feb 03, 2014 2:11:01 PM org.apache.solr.common.SolrException log
SEVERE: getNext() failed for query 'select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER, ADRESS_ID from DAT_FIRMA': org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:404)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.access$600(PreparedStatementJdbcDataSource.java:256)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator$1.hasNext(PreparedStatementJdbcDataSource.java:324)
  at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:116)
  at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.handleQuery(PreparedStatementSqlEntityProcessor.java:119)
  at org.apache.solr.handler.dataimport.PreparedStatementSqlEntityProcessor.nextRow(PreparedStatementSqlEntityProcessor.java:124)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: Closed Resultset: next
  at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:214)
  at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
  at org.apache.tomcat.dbcp.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
  at org.apache.solr.handler.dataimport.PreparedStatementJdbcDataSource$ResultSetIterator.hasnext(PreparedStatementJdbcDataSource.java:396)
  ... 13 more
Re: Announce list
I don't think so. What would be the value? Would you be upgrading every 6-8 weeks as the new versions come out? Or are you downstream of Solr and want to check compatibility? Curious what the use case would be. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Feb 3, 2014 at 8:59 PM, Arie Zilberstein azilberst...@salesforce.com wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: Announce list
There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: weird exception on update
The solution (or workaround?) is to drop defType from one of the cores and use a {!qparser} local param on every query, including the delete by query. It would be really great if this could be handled on the Solr config side only, without involving client changes. On Mon, Feb 3, 2014 at 4:02 PM, Dmitry Kan solrexp...@gmail.com wrote: This exception is similar to what is talked about here: https://gist.github.com/mbklein/6367133 http://irc.projecthydra.org/2013-08-28.html We found out that: 1. this happens iff, on two cores inside the same container, there is a query parser defined via defType. 2. after removing the index files on one of the cores, the delete by query works just fine. Right after restarting the container, the same query fails. Is there a JIRA for this? Should I create one? Dmitry On Mon, Feb 3, 2014 at 2:03 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello! We are hitting a really strange and nasty issue when trying to delete by query, but not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query references the unique field, and the core's index does not contain the value that is being deleted. Solr: 4.3.1. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
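[Editor's note] To illustrate the per-query local-param workaround Dmitry describes, here is a hedged SolrJ sketch; the core URL, parser choices, and field names are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class LocalParamQueries {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");

        // Pick the parser per request with a {!parser} local param
        // instead of a core-wide defType in solrconfig.xml.
        SolrQuery query = new SolrQuery("{!edismax qf=text}some terms");
        server.query(query);

        // The same applies to delete-by-query; force the plain Lucene
        // parser so the unique-key syntax is interpreted as intended.
        server.deleteByQuery("{!lucene}id:\"doc-42\"");
        server.commit();
        server.shutdown();
      }
    }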
Re: shard1 gone missing ... (upgrade to 4.6.1)
Mark, I am testing the upgrade and indexing gives me this error: 914379 [http-apr-8080-exec-4] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe0 (at char #1, byte #-1) ... and a bunch of these: request: http://xx.xx.xx.xx/col1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A8080%2Fcol1%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 1581335 [updateExecutor-1-thread-7] ERROR org.apache.solr.update.StreamingSolrServers ? error org.apache.solr.common.SolrException: Bad Request Nothing else in the process chain has changed. Does this have anything to do with the deprecation warnings: WARN org.apache.solr.handler.UpdateRequestHandler ? Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler thanks David On 01/31/2014 11:22 AM, Mark Miller wrote: On Jan 31, 2014, at 11:15 AM, David Santamauro david.santama...@gmail.com wrote: On 01/31/2014 10:22 AM, Mark Miller wrote: I’d also highly recommend you try moving to Solr 4.6.1 when you can though. We have fixed many, many, many bugs around SolrCloud in the 4 releases since 4.4. You can follow the progress in the CHANGES file we update for each release. Can I do a drop-in replacement of 4.4.0 ? It should be a drop-in replacement. For some that use deep APIs in plugins, sometimes you might have to make a couple small changes to your code. Always best to do a test with a copy of your index, but for most, it should be a drop-in replacement. - Mark http://about.me/markrmiller
SolrCloud query results order master vs replica
Greetings, My setup is:
- SolrCloud v4.3
- one collection, one shard
- 1 master, 1 replica, so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split the index into shards (yet, until the index gets bigger).
My question:
- if I do a query on a product name (that is what the index is about) on the master, I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results, but the docs are in a different order.
- I do not specify a sort parameter in my query, simply q=product name.
- obviously, if I force a sort order, everything is OK: same results, same order from both instances.
- am I wrong in expecting the same results, in the SAME order?
Follow-up questions if the order is not guaranteed:
- should I force the devs to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order, do we not?
- should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?
Other useful information:
- the admin page shows the same number of documents in both instances.
- the logs are clean; load, replication, and queries worked OK.
- the web application that queries Solr round-robins between the two instances, so getting results in a different order is bad for consistency.
Thank you for your help! Nic
Elevation and nested queries
I have a simple query 'q=hurco' (parser type edismax). Elevation is properly configured, so I get the expected results:

...
<doc>
  <str name="id">7HURCO</str>
  <arr name="debtoritem">
    <str>0~*</str>
  </arr>
  <bool name="[elevated]">true</bool>
</doc>

A similar query with a nested query 'q=(hurco AND _query_:"{!field f=debtoritem v=0~*}")' returns the same document but without elevation:

...
<doc>
  <str name="id">7HURCO</str>
  <arr name="debtoritem">
    <str>0~*</str>
  </arr>
  <bool name="[elevated]">false</bool>
</doc>

Does a nested query disable elevation? There is an additional spellcheck component added to the query, which is working as expected:

<arr name="last-components">
  <str>spellcheck</str>
  <str>elevator</str>
</arr>

Thanks, Holger
Re: Need help for integrating solr-4.5.1 with UIMA
On Mon, Feb 3, 2014 at 10:20 AM, rashi gandhi gandhirash...@gmail.com wrote: Hi, I'm trying to integrate Solr 4.5.1 with UIMA, following the steps in solr-4.5.1\contrib\uima\readme.txt. I edited solrconfig.xml as given in the readme, and I have registered the required keys. [...] at java.lang.Thread.run(Thread.java:619) *Caused by: java.net.ConnectException: Connection timed out:* *connect* [...] What is going wrong? Please help me on this. In principle I've never integrated UIMA and Solr, but quickly looking at your exception (please send only the meaningful part of the stack trace), it seems you have a problem connecting. I would start from there. Regards Luca -- Luca Foppiano Software Engineer +31615253280 l...@foppiano.org www.foppiano.org
Re: SolrCloudServer questions
I've seen best throughput while indexing by sending batches of documents rather than individual documents per request. You might try queueing on your indexing machines for a bit, then sending off a batch every N documents. Thanks, Greg On Feb 1, 2014, at 6:49 PM, Software Dev static.void@gmail.com wrote: Also, if we are seeing a huge CPU spike on the leader when doing a bulk index, would changing any of the options help? On Sat, Feb 1, 2014 at 2:59 PM, Software Dev static.void@gmail.com wrote: Our use case is we have 3 indexing machines pulling off a Kafka queue and they are all sending individual updates. On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller markrmil...@gmail.com wrote: Just make sure parallel updates is set to true. If you want to load even faster, you can use the bulk add methods, or if you need more fine-grained responses, use the single add from multiple threads (though bulk add can also be done via multiple threads if you really want to try and push the max). - Mark http://about.me/markrmiller On Jan 31, 2014, at 3:50 PM, Software Dev static.void@gmail.com wrote: Which, if any, of these settings would be beneficial when bulk uploading? On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller markrmil...@gmail.com wrote: On Jan 31, 2014, at 1:56 PM, Greg Walters greg.walt...@answers.com wrote: I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore my response. -updatesToLeaders Only send documents to shard leaders while indexing. This saves cross-talk between slaves and leaders, which results in more efficient document routing. Right, but recently this has less of an effect because CloudSolrServer can now hash documents and directly send them to the right place. This option has become more historical. Just make sure you set the correct id field on the CloudSolrServer instance for this hashing to work (I think it defaults to id). shutdownLBHttpSolrServer CloudSolrServer uses an LBHttpSolrServer behind the scenes to distribute requests (that aren't updates directly to leaders). Where did you find this? I don't see this in the javadoc anywhere, but it is a boolean in the CloudSolrServer class. It looks like when you create a new CloudSolrServer and pass it your own LBHttpSolrServer, the boolean gets set to false and the CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down. parallelUpdates The javadocs don't have any description for this one, but I checked out the code for CloudSolrServer, and if parallelUpdates is set it looks like it executes update statements to multiple shards at the same time. Right, we should def add some javadoc, but this sends updates to shards in parallel rather than with a single thread. Can really increase update speed. Still not as powerful as using CloudSolrServer from multiple threads, but a nice improvement nonetheless. - Mark http://about.me/markrmiller I'm no dev, but I can read, so please excuse any errors on my part. Thanks, Greg On Jan 31, 2014, at 11:40 AM, Software Dev static.void@gmail.com wrote: Can someone clarify what the following options are: - updatesToLeaders - shutdownLBHttpSolrServer - parallelUpdates Also, I remember in older versions of Solr there was an efficient format used between SolrJ and Solr that is more compact. Does this still exist in the latest version of Solr? If so, is it the default? Thanks
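[Editor's note] To make the batching suggestion concrete, here is a hedged SolrJ sketch of the flush-every-N pattern; the ZooKeeper hosts, collection name, batch size, and the loop standing in for the Kafka consumer are all assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
      private static final int BATCH_SIZE = 500; // illustrative; tune for your documents

      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("col1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {        // stands in for the consumer loop
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-" + i);
          batch.add(doc);
          if (batch.size() >= BATCH_SIZE) {      // flush a whole batch per request
            server.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) server.add(batch); // flush the remainder
        server.commit();
        server.shutdown();
      }
    }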
Duplicate Facet.Fields cause same results, should dedupe?
If we add: facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code, prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: need help in understanding solr cloud stats data
I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
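[Editor's note] For anyone curious what that aggregation looks like in-process, here is a hedged sketch of summing one attribute across the per-core beans from inside the Solr JVM (for example, from a custom handler like the one Greg describes). The ObjectName pattern follows the solr/<coreName> bean naming used elsewhere in this thread, and the attribute name is an assumption:

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class AggregateSolrStats {
      public static void main(String[] args) throws Exception {
        MBeanServer mbeans = ManagementFactory.getPlatformMBeanServer();

        // Match the per-core /select handler beans (domain is "solr/<coreName>").
        Set<ObjectName> names =
            mbeans.queryNames(new ObjectName("solr/*:type=/select,*"), null);

        long totalRequests = 0;
        for (ObjectName name : names) {
          Object value = mbeans.getAttribute(name, "requests");
          if (value instanceof Number) {
            totalRequests += ((Number) value).longValue(); // sum across cores
          }
        }
        System.out.println("requests across all cores: " + totalRequests);
      }
    }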
Re: need help in understanding solr cloud stats data
You should contribute that and spread the dev load with others :) We need something like that at some point; it’s just that no one has done it. We currently expect you to aggregate in the monitoring layer, and that’s a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
Re: need help in understanding solr cloud stats data
The code I wrote is currently a bit of an ugly hack, so I'm a bit reluctant to share it, and there are some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors with coding guidelines and the like? Spruced up some, I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point; it’s just that no one has done it. We currently expect you to aggregate in the monitoring layer, and that’s a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded and then registers itself as an mbean. When called, it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get JVM-wide stats via JMX, but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all Solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime, I see that each core in a single collection has different values. Is the value for each core the time that specific core spent on a request? So, to get an idea of total request time, should I sum the values across all the cores? 2. update_handler/commits - does this include auto-commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? Pending for what? For flush to disk? thanks.
SolrCloud multiple data center support
Hello, we are using Solr in a SolrCloud configuration, with two Solr instances and three ZooKeepers running in a single data center. We presently have a single search index with about 35 million entries in it, using about 60GB of disk space on each of the two Solr servers (120GB total). I expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I'm writing because we need to grow beyond a single data center, with two (potentially incompatible) goals:
1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index.
2. We'd like to have the option of multiple active/active data centers that each see and update the same search index, distributed across data centers.
The options I'm aware of from reading the archives:
a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster. This essentially involves standing up multiple ZooKeepers and multiple Solr instances in the second data center, and they will all keep each other in sync magically. This would also solve both of our goals. However, I'm concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers ... ? Also, how does such a cluster tolerate and recover from network partitions?
b. The remote Solr instances form their own completely unrelated SolrCloud cluster, and I invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option, since the application really should not concern itself with data distribution. But I'll do it if I must.
So my questions are:
- Can anyone give me any guidance as to option a? Is anyone using this in a real production setting? Words of wisdom? Does it work?
- Are there any other options that I'm not considering?
- What is Solr's answer to such configurations (we can't be alone in needing one)? Any big enhancements coming on the Solr roadmap to deal with this?
Thanks! Darrell Burgan
Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Re: Announce list
I have seen other projects that have a releases mailing list. The only use cases I can think of are: 1) users who want notifications about new releases but don't want the flood of the full user list; 2) historical searching to see how often releases were made. Given there isn't an official timetable, it's not really going to be useful as a forward planner, but it might have some value for looking at how often patch releases come out. One could attempt to infer some degree of stability (or, more accurately, lack of stability) if lots of patches for a given release came out quickly. I wasn't aware of the RSS feed; that's useful as an indicator for use case 1, at least. Use case 2 is probably too vague and has lots of assumptions/inferences that mean it's a bad idea anyway :) On 3 February 2014 14:37, Lajos la...@protulae.com wrote: There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: Solr and SDL Tridion Integration
There are many ways to do this, Prasi. You have a lot of thinking to do on the subject. You could decide to publish your content to a database, and then index that database in Solr. You could publish XML or CSV files of your content for Solr to read and index. You could use Nutch or some other tool to crawl your web server. There are probably many more methods, these being some of the more common. Does your site have dynamic content presentation? If so, you may want to consider having Solr examine your broker database. Static pages on your site? You may want to go with either a crawler or publishing a special file for Solr. Please check out https://tridion.stackexchange.com/ for more on this topic. -- chris_war...@yahoo.com On Monday, February 3, 2014 3:54 AM, Jack Krupansky j...@basetechnology.com wrote: If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script, or maybe even Java code, to read from SDL Tridion and output a supported Solr format such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any documentation/tutorial for this? Thanks, Prasi
[ANN] Heliosearch 0.03 with off-heap field cache
A new Heliosearch pre-release has been cut for people to try out: https://github.com/Heliosearch/heliosearch/releases
Release Notes: This is Heliosearch v0.03. Heliosearch is forked from Apache Solr and includes the following additional features:
- Off-Heap Filters to reduce garbage collection pauses and overhead. http://www.heliosearch.org/off-heap-filters
- Removed the 1024 limit on the number of clauses in a boolean query. For example, q=id:(doc1 doc2 doc3 doc4 doc5 ... doc2000) will now work correctly without throwing an exception.
- Deep Paging with cursorMark. This is not yet in a current release of Apache Solr, but should be in Solr 4.7. http://heliosearch.org/solr/paging-and-deep-paging/
- nCache - the new Off-Heap FieldCache to reduce garbage collection overhead and accelerate sorting, faceting, and function queries. http://heliosearch.org/solr-off-heap-fieldcache
-Yonik http://heliosearch.com -- making solr shine
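[Editor's note] As a taste of the cursorMark feature, here is a hedged SolrJ paging loop; the getNextCursorMark() accessor and the cursorMark parameter are per the Solr 4.7 SolrJ API described at the link above, and the URL, sort field, and page size are assumptions (the sort must impose a total order that includes the unique key):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class DeepPaging {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(100);
        query.setSort("id", SolrQuery.ORDER.asc); // "id" assumed to be the unique key

        String cursorMark = "*";                  // "*" starts the cursor
        boolean done = false;
        while (!done) {
          query.set("cursorMark", cursorMark);
          QueryResponse rsp = server.query(query);
          for (SolrDocument doc : rsp.getResults()) {
            // process each page of results here
          }
          String next = rsp.getNextCursorMark();
          done = cursorMark.equals(next);         // an unchanged mark means we're past the last page
          cursorMark = next;
        }
        server.shutdown();
      }
    }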
Re: Apache Solr.
You can have this kind of configuration in a Data Import Handler XML file to index different types of files:

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="(enter the file repository path)"
            fileName=".*.(doc)|(pdf)|(docx)|(txt)|(ppt)|(xls)|(xlsx)|(sql)|(vsd)|(zip)"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <entity name="tika-documentimport" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="File" name="fileName" />
        <field column="Author" name="author" meta="true" />
      </entity>
    </entity>
  </document>
</dataConfig>

Hope this helps.
RE: JVM heap constraints and garbage collection
i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge? i2.xlarge is a relatively new instance type (December 2013). In our case, we're partway through a yearlong reservation of m2.2xlarges and won't be up for reconsidering that for a few months. I don't think that Amazon has ever dropped a legacy instance type, so there's bound to be some overlap as they roll out new ones. And I imagine someone setting up a huge memcached pool might rather have the extra RAM over the SSD, so it still makes sense for the m2.2xlarge to be around. It can be kind of hard to understand how the various parameters that make up an instance type get decided on, though. I have to consult that ec2instances.info link all the time to make sure I'm not missing something regarding what types we should be using. On Feb 1, 2014 1:51 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Michael Della Bitta [michael.della.bi...@appinions.com] wrote: Here at Appinions, we use mostly m2.2xlarges, but the new i2.xlarges look pretty tasty primarily because of the SSD, and I'll probably push for a switch to those when our reservations run out. http://www.ec2instances.info/ i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge? Anyhow, it is good to see that Amazon now has 11 different setups with SSD. The IOPS looks solid at around 40K/s (estimated) for the i2.xlarge and they even have TRIM ( http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance/). - Toke Eskildsen
Getting index schema in SolrCloud mode
I'm indexing data with a SolrJ client via SolrServer. Currently, I parse the schema returned by an HTTP GET on: localhost:8983/solr/collection/schema/fields What is the recommended way to read the schema with CloudSolrServer? Can it be done with a single HTTP GET to a ZK server? Thanks, Peter
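[Editor's note] One approach that should work (a hedged sketch, not a definitive answer): let CloudSolrServer route the schema REST call to a live node for its default collection, so no host needs to be hardcoded. The ZooKeeper address and collection name here are assumptions, and the API shown is 4.x SolrJ:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class SchemaFields {
      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");

        // Route the schema REST call through whichever live node the
        // cloud client picks, instead of hardcoding a host and port.
        QueryRequest request = new QueryRequest(new ModifiableSolrParams());
        request.setPath("/schema/fields");
        NamedList<Object> response = server.request(request);
        System.out.println(response.get("fields"));
        server.shutdown();
      }
    }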
Re: SolrCloud multiple data center support
SolrCloud has not tackled multi data center yet. I don’t think a or b are very good options yet. Honestly, I think the best current bet is to use something like Apache Flume to send data to both data centers - it will handle retries and keeping things in sync and splitting the stream. Doesn’t satisfy all use cases though. At some point, multi data center support will happen. I can’t remember where ZooKeeper’s support for it is at, but with that and some logic to favor nodes in your data center, that might be a viable route. - Mark http://about.me/markrmiller On Feb 3, 2014, at 11:48 AM, Darrell Burgan darrell.bur...@infor.com wrote: Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running with three Zookeepers in a single data center. We presently have a single search index with about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB total). I would expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I’m writing because we’re needing to grow beyond a single data center, with two (potentially incompatible) goals: 1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index. 2. We’d like to have the option to have multiple active/active data centers that each see and update the same search index, distributed across data centers. The options I’m aware of from reading archives: a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster. This will essentially involve us standing up multiple Zookeepers in the second data center, and multiple Solr instances, and they will all keep each other in sync magically. This will also solve both of our goals. However, I’m concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers … ? Also, how does such a cluster tolerate and recover from network partitions? b. The remote Solr instances form their own completely unrelated SolrCloud cluster. I have to invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option since the application really should not concern itself with data distribution. But I’ll do it if I must. So my questions are: - Can anyone give me any guidance as to option a? Anyone using this in a real production setting? Words of wisdom? Does it work? - Are there any other options that I’m not considering? - What is Solr’s answer to such configurations (we can’t be alone in needing one)? Any big enhancements coming on the Solr road map to deal with this? Thanks! Darrell Burgan Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Re: need help in understanding solr cloud stats data
I had to come up with some Solr stats monitoring for my Zabbix instance. I found that using JMX was the easiest way for us. There is a command-line JMX client that works quite well for me. http://crawler.archive.org/cmdline-jmxclient/ I wrote a shell script to wrap around that and shove the data back to Zabbix for ingestion and monitoring. I've listed the stats that I am gathering and the mbean that is called. My shell script is rather simplistic:

#!/bin/bash
cmdLineJMXJar=/usr/local/lib/cmdline-jmxclient.jar
jmxHost=$1
port=$2
query=$3
value=$4
java -jar ${cmdLineJMXJar} user:pass ${jmxHost}:${port} ${query} ${value} 2>&1 | awk '{print $NF}'

The script is called as so: jmxstats.sh <solr server name or IP> <jmx port> <name of mbean> <value to query from mbean>

My collection name is productCatalog, so swap that with yours.

*select requests*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select requests
*select errors*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select errors
*95th percentile request time*: solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select 95thPcRequestTime
*update requests*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update requests
*update errors*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update errors
*95th percentile update time*: solr/productCatalog:id=org.apache.solr.handler.UpdateRequestHandler,type=/update 95thPcRequestTime
*query result cache lookups*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_lookups
*query result cache inserts*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_inserts
*query result cache evictions*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_evictions
*query result cache hit ratio*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=queryResultCache cumulative_hitratio
*document cache lookups*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_lookups
*document cache inserts*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_inserts
*document cache evictions*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_evictions
*document cache hit ratio*: solr/productCatalog:id=org.apache.solr.search.LRUCache,type=documentCache cumulative_hitratio
*filter cache lookups*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_lookups
*filter cache inserts*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_inserts
*filter cache evictions*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_evictions
*filter cache hit ratio*: solr/productCatalog:type=filterCache,id=org.apache.solr.search.FastLRUCache cumulative_hitratio
*field value cache lookups*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_lookups
*field value cache inserts*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_inserts
*field value cache evictions*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_evictions
*field value cache hit ratio*: solr/productCatalog:type=fieldValueCache,id=org.apache.solr.search.FastLRUCache cumulative_hitratio

This set of stats gets me a pretty
good idea of what's going on with my SolrCloud at any time. Anyone have any thoughts or suggestions? Joel Cohen Senior System Engineer Bluefly, Inc. On Mon, Feb 3, 2014 at 11:25 AM, Greg Walters greg.walt...@answers.comwrote: The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to share it and there's some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors with coding guidelines and the like? Spruced up some I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded then registers itself as an mbean. When called it polls all the per-core mbeans then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it
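(For anyone who'd rather pull those same numbers from code than shell out to cmdline-jmxclient: the stock JDK JMX API can read the mbeans listed above directly. A minimal sketch, assuming Solr's JMX connector is exposed on port 9999 with the same user:pass credentials; host, port, and core name are placeholders.)

import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxPoll {
    public static void main(String[] args) throws Exception {
        // Standard RMI connector URL; swap in your Solr host and JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://solr-host:9999/jmxrmi");
        Map<String, Object> env = new HashMap<String, Object>();
        env.put(JMXConnector.CREDENTIALS, new String[] { "user", "pass" });
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Same mbean/attribute pair as the "select requests" entry above.
            ObjectName selectHandler = new ObjectName(
                    "solr/productCatalog:id=org.apache.solr.handler.component.SearchHandler,type=/select");
            System.out.println("select requests = "
                    + conn.getAttribute(selectHandler, "requests"));
        } finally {
            connector.close();
        }
    }
}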
Re: need help in understating solr cloud stats data
Zabbix 2.2 has a JMX client built in, as well as a few JVM templates. I wrote my own templates for my Solr instance, and monitoring and graphing is wonderful. David On 02/03/2014 12:55 PM, Joel Cohen wrote: [...]
Re: Announce list
: Is there a mailing list for getting just announcements about new versions? This is the primary use case for the general list, although it does occasionally get other traffic from people with questions/discussion about the project as a whole... https://lucene.apache.org/solr/discussion.html#general-discussion-generallucene https://mail-archives.apache.org/mod_mbox/lucene-general/ If you are looking for a really low-volume list where release announcements are made, that's the place to start. -Hoss http://www.lucidworks.com/
Solr and Polygon/Radius based spatial searches
We have a public property search site that we are looking to replace the back end index server on, and we are looking at Solr as a possible replacement (ElasticSearch is another possibility). One of the key search components of our site is to search on a bounding box (rectangle), custom multi-point polygon, and/or a radius from a point. It appears that Solr3 and Solr4 both supported spatial searching, but using different methods. Also, per this link, http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that Solr only supports point, rectangle and circle shapes and needs JTS and/or WKT to support non-rectangular polygon shapes. Our indexed data will include the long/lat values for all property records. If someone can provide sample queries for the following situations, it would be appreciated: - All properties/points that fall within a multi-point polygon (ie: Polygon points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1 La1) - All properties that fall within 1.5 miles (radius) of point: Lo1 La1 Other spatial search type functionality that may be targeted includes: - Ability to search within multiple polygons (intersecting, non-intersecting, and combinations) - Ability to search for properties that fall outside of a polygon Thanks Lee -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Polygon-Radius-based-spatial-searches-tp4115121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Score of Search Term for every character remove
Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs you care about are... Best, Erick On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner abner.lus...@vishay.com wrote: Hi, I'm new with using SOLR and I'm curious if this is capable of doing the following or similar. Sample: Query: ABCDEF Returns: ABCDEF 0 hits ABCDE 2 hits ABCD 3 hits ABC 10 hits AB 20 hits A 100 hits In one request only. Thanks. *Abner G. Lusung Jr.* | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 Website: www.vishay.com [image: Vishay] http://www.vishay.com/
Re: Score of Search Term for every character remove
I think he wants to do a bunch of separate queries and return separate result sets for each. Hmmm... maybe it would be nice to allow multiple q parameters in one query request, each returning a separate set of results. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Monday, February 3, 2014 2:08 PM To: solr-user@lucene.apache.org Subject: Re: Score of Search Term for every character remove Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs you care about are... Best, Erick On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner abner.lus...@vishay.com wrote: Hi, I'm new with using SOLR and I'm curious if this is capable of doing the following or similar. Sample: Query: ABCDEF Returns: ABCDEF 0 hits ABCDE 2 hits ABCD 3 hits ABC 10 hits AB 20 hits A 100 hits In one request only. Thanks. *Abner G. Lusung Jr.* | Java Web Development, Internet and Commerce, Global Web Services | Vishay Philippines Inc. 10th Floor Pacific Star Building, Makati Avenue corner Buendia Avenue, Makati City, Philippines 1200 Phone: +63 2 8387421 loc. 7995 | Mobile: +63 9169674514 Website: www.vishay.com [image: Vishay] http://www.vishay.com/
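(Until something like multiple q parameters exists, Jack's reading - a bunch of separate queries - is easy to script client-side. A SolrJ sketch under that assumption, issuing one rows=0 count query per left-anchored prefix, longest first; the core URL and the field name "code" are hypothetical.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PrefixCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        String term = "ABCDEF";
        for (int len = term.length(); len >= 1; len--) {
            String prefix = term.substring(0, len);
            SolrQuery q = new SolrQuery("code:" + prefix); // "code" is a hypothetical field
            q.setRows(0); // we only need numFound, not documents
            long hits = solr.query(q).getResults().getNumFound();
            System.out.println(prefix + " " + hits + " hits");
        }
    }
}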
Re: SolrCloud query results order master vs replica
This should only be happening if the scores are _exactly_ the same, which is actually quite rare. In that case, the tied scores are broken by the internal Lucene document ID, and the relative order of the docs on the two machines isn't guaranteed to be the same; the internal ID can change during segment merging, which is NOT the same on both machines. But this should be relatively rare. If you're doing *:* queries or other such, then they aren't scored (see ConstantScoreQuery). So in practical terms, I suspect you're seeing some kind of test artifact. Try adding debug=all to the query and you'll see how documents are scored. Best, Erick On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie nicflatte...@yahoo.com wrote: Greetings, My setup is: - SolrCloud V4.3 - one collection - one shard - 1 master, 1 replica, so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split shard the index (yet, until the index gets bigger). My question: - if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents. - if I do the same query on the replica, I get the same number of results but the docs are in a different order. - I do not specify a sort parameter in my query, simply a q=product name. - obviously if I force a sort order, everything is ok, same results, same order from both instances. - am I wrong in expecting the same results, in the SAME order? Follow up question if the order is not guaranteed: - should I force the dev. to use an explicit sort order? - if we force the sort, we then bypass the ranking / score order, do we not? - should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master? Other useful information: - the admin page shows the same number of documents in both instances. - logs are clean, load and replication and queries worked ok. - the web application that queries SOLR round-robins between the two instances, so getting results in a different order is bad for consistency. Thank you for your help! Nic
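(If identical ordering across instances matters more than the rare exact-score tie, adding the uniqueKey as a secondary sort makes tie-breaking deterministic without bypassing relevance, since it only applies among equal scores. A SolrJ sketch, assuming the uniqueKey field is "id".)

import org.apache.solr.client.solrj.SolrQuery;

public class DeterministicOrder {
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        // Primary sort stays on score; the uniqueKey only decides exact ties,
        // so master and replica return tied docs in the same order.
        q.set("sort", "score desc,id asc");
        return q;
    }
}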
Re: need help in understating solr cloud stats data
See: http://wiki.apache.org/solr/HowToContribute It outlines how to get the code, how to work with patches, how to set up IntelliJ and Eclipse IDEs (links near the bottom?). There are formatting files for both IntelliJ and Eclipse that'll do the right thing in terms of indents and such. Legal issues aside, you don't need to be very compulsive about cleaning up the code before posting the first patch! Just let people know you don't consider it ready to commit. You'll want to open a JIRA to attach it to. People often put in //nocommit in places they especially don't like, and the precommit ant target takes care of keeping these from getting into the code. People are quite happy to see hack, first-cut patches. You'll often get suggestions on approaches that may be easier, and nobody will complain about bad code when they know that _you_ don't consider it submittable. Google for Yonik's law of half-baked patches. One thing that escapes people often... When attaching a patch to a JIRA, just call it SOLR-<nnnn>.patch, where <nnnn> is the JIRA number. Successive versions of the patch should have the _same_ name; they'll all be listed and the newest one will be live. It's easier to know what is the right patch that way. No big deal either way. Best, Erick On Mon, Feb 3, 2014 at 8:25 AM, Greg Walters greg.walt...@answers.com wrote: The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to share it, and there are some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors, with coding guidelines and the like? Spruced up some, I'd be OK with submitting a patch. Thanks, Greg On Feb 3, 2014, at 10:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded, then registers itself as an mbean. When called it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime - I see that each core in a single collection has different values. Is the value of each core the time that specific core spent on a request? So to get an idea of total request time, should I sum the values of all the cores? 2. update_handler/commits - does this include auto_commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? pending for what? for flush to disk? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not finding part of fulltext field when word ends in dot
That was a complicated answer, but ultimately the right one. Thank you very much. 2014-01-30 Jack Krupansky j...@basetechnology.com: The word delimiter filter will turn 26KA into two tokens, as if you had written 26 KA without the quotes. The autoGeneratePhraseQueries option will cause the multiple terms to be treated as if they actually were enclosed within quotes; otherwise they will be treated as separate and unquoted terms. If you do enclose 26KA in quotes in your query, then autoGeneratePhraseQueries is not relevant. Ah... maybe the problem is that you have preserveOriginal=true in your query analyzer. Do you have your default query operator set to AND? If so, it would treat 26KA as 26 AND KA AND 26KA, which requires 26KA (without the trailing dot) to be in the index. It seems counter-intuitive, but the attributes of the index and query word delimiter filters need to be slightly asymmetric. -- Jack Krupansky -Original Message- From: Thomas Michael Engelke Sent: Thursday, January 30, 2014 2:16 AM To: solr-user@lucene.apache.org Subject: Re: Not finding part of fulltext field when word ends in dot I'm not sure I got my problem across. If I understand the snippet of documentation right, autoGeneratePhraseQueries only affects queries that result in multiple tokens, which mine does not. The version also is 3.6.0.1, and we're not planning on upgrading to any 4.x version. 2014-01-29 Jack Krupansky j...@basetechnology.com: You might want to add autoGeneratePhraseQueries=true to your field type, but I don't think that would cause a break when going from 3.6 to 4.x. The default for that attribute changed in Solr 3.5. What release was your data indexed using? There may have been some subtle word delimiter filter changes between 3.x and 4.x. Read: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.adsroot.itcs.umich.edu%3E -Original Message- From: Thomas Michael Engelke Sent: Wednesday, January 29, 2014 11:16 AM To: solr-user@lucene.apache.org Subject: Re: Not finding part of fulltext field when word ends in dot The fieldType definition is a tad on the longer side:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" preserveOriginal="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="german/german-common-nouns.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="0" catenateNumbers="0" generateWordParts="1" splitOnCaseChange="1" generateNumberParts="1" catenateAll="0" preserveOriginal="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
Re: Solr and Polygon/Radius based spatial searches
Hi Lee, On 2/3/14, 1:59 PM, leevduhl ld...@corp.realcomp.com wrote: We have a public property search site that we are looking to replace the back end index server on, and we are looking at Solr as a possible replacement (ElasticSearch is another possibility). Both should work equally well. One of the key search components of our site is to search on a bounding box (rectangle), custom multi-point polygon, and/or a radius from a point. It appears that Solr3 and Solr4 both supported spatial searching, but using different methods. Also, per this link, http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, it appears that Solr only supports point, rectangle and circle shapes and needs JTS and/or WKT to support non-rectangular polygon shapes. Yup. I'm not sure what you mean by a "multi-point" polygon though... is that somehow different than a polygon that isn't multi-point? All polygons are comprised of at least 3 distinct points (a triangle). Our indexed data will include the long/lat values for all property records. If someone can provide sample queries for the following situations, it would be appreciated: - All properties/points that fall within a multi-point polygon (ie: Polygon points: Lo1 La1, Lo2 La2, Lo3 La3, Lo4 La4, Lo5 La5, Lo1 La1) mygeorptfieldname:"Intersects(POLYGON((x1 y1, x2 y2, x3 y3, ..., x1 y1)))" Inside of the immediate parenthesis of Intersects is a standard WKT formatted polygon. Note "x y" order (longitude space latitude). - All properties that fall within 1.5 miles (radius) of point: Lo1 La1 Just use Solr's standard "geofilt" query parser: fq={!geofilt}&pt=lat,lon&d=2.414016 I got the distance value by converting miles to kilometers, which is what geofilt expects (1.5 * 1.60934400061469). Other spatial search type functionality that may be targeted includes: - Ability to search within multiple polygons (intersecting, non-intersecting, and combinations) No problem for union: use standard WKT: MULTIPOLYGON or GEOMETRYCOLLECTION. If you want to combine them in interesting ways then you're going to have to compute that client-side and send the resulting polygon(s) to Solr (or ElasticSearch). You could use JTS to do that, which has a trove of spatial functionality for such things. I'm thinking of some day adding some basic operator extensions to the WKT so you don't have to do this on the client end. Leveraging JTS server-side it would be particularly easy, but it would also be pretty easy to do it as a custom shape aggregate, similar to Spatial4j 0.4's ShapeCollection. - Ability to search for properties that fall outside of a polygon You could use "IsDisjointTo" (instead of "Intersects") but you'll generally get faster results by negating intersects. For an example, simply precede the first polygonal example with a "NOT ". Thanks Lee ~ David
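(Expressed through SolrJ, the two examples above look like the sketch below - assuming a 4.x core at a placeholder URL and an RPT field named "geo"; the coordinates are made up.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpatialExamples {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/properties");

        // Polygon containment: WKT in x y (lon lat) order, ring closed by repeating the first point.
        SolrQuery polygon = new SolrQuery("*:*");
        polygon.addFilterQuery(
                "geo:\"Intersects(POLYGON((-83.5 42.2, -83.1 42.2, -83.1 42.5, -83.5 42.5, -83.5 42.2)))\"");
        QueryResponse inPolygon = solr.query(polygon);

        // Radius: geofilt takes pt=lat,lon and d in km (1.5 mi ~= 2.414016 km).
        SolrQuery radius = new SolrQuery("*:*");
        radius.addFilterQuery("{!geofilt sfield=geo pt=42.33,-83.04 d=2.414016}");
        QueryResponse inRadius = solr.query(radius);

        System.out.println("in polygon: " + inPolygon.getResults().getNumFound()
                + ", in radius: " + inRadius.getResults().getNumFound());
    }
}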
Re: SolrCloud multiple data center support
Option a) doesn't really work out of the box, *if you need NRT support*. The main reason (for us at least) is the ZK ensemble and maintaining quorum. If you have a single ensemble, say 3 ZKs in 1 DC and 2 in another, then if you lose DC 2, you lose 2 ZKs and the rest are fine. But if you lose the main DC that has 3 ZKs, you lose quorum. Searches will be ok, but if you are an NRT setup, your updates will all stall until you get another ZK started (and reload the whole Solr Cloud to give them the ID of that new ZK). For us, availability is more important than consistency, so we currently have 2 independent setups, 1 ZK ensemble and Solr Cloud per DC. We already had an indexing system that serviced DCs so we didn't need something like Flume. We also have external systems that handle routing to some extent, so we can route locally to each Cloud, and not have to worry about cross-DC traffic. One solution to that is to have a 3rd DC with a few instances in it, say another 2 ZKs. That would take your total ensemble to 7, and you can lose 3 whilst still maintaining quorum. Since ZK is relatively light-weight, that 3rd Data Centre doesn't have to be as robust, or contain Solr replicas; it's just a place to house 1 or 2 machines for holding ZKs. We will probably migrate to this kind of setup soon, as it ticks more of our boxes. One other option, in ZK trunk but not yet in a release, is the ability to dynamically reconfigure ZK ensembles ( https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give the ability to create new ZK instances in the event of a DC failure, and reconfigure the Solr Cloud without having to reload everything. That would help to some extent. If you don't need NRT, then the solution is somewhat easier, as you don't have to worry as much about ZK quorum; a single ZK ensemble across DCs might be sufficient for you in that case. On 3 February 2014 17:44, Mark Miller markrmil...@gmail.com wrote: SolrCloud has not tackled multi data center yet. I don't think a or b are very good options yet. Honestly, I think the best current bet is to use something like Apache Flume to send data to both data centers - it will handle retries and keeping things in sync and splitting the stream. Doesn't satisfy all use cases though. At some point, multi data center support will happen. I can't remember where ZooKeeper's support for it is at, but with that and some logic to favor nodes in your data center, that might be a viable route. - Mark http://about.me/markrmiller On Feb 3, 2014, at 11:48 AM, Darrell Burgan darrell.bur...@infor.com wrote: Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running with three Zookeepers in a single data center. We presently have a single search index with about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB total). I would expect our usage of Solr to grow to include other search indexes, and likely larger data volumes. I'm writing because we're needing to grow beyond a single data center, with two (potentially incompatible) goals: 1. We need to be able to have a hot disaster recovery site, in a completely separate data center, that has a near-realtime replica of the search index. 2. We'd like to have the option to have multiple active/active data centers that each see and update the same search index, distributed across data centers. The options I'm aware of from reading archives: a. Simply set up the remote Solr instances as active parts of the same SolrCloud cluster.
This will essentially involve us standing up multiple Zookeepers in the second data center, and multiple Solr instances, and they will all keep each other in sync magically. This will also solve both of our goals. However, I'm concerned about performance and whether SolrCloud is smart enough to route local search queries only to local Solr servers ... ? Also, how does such a cluster tolerate and recover from network partitions? b. The remote Solr instances form their own completely unrelated SolrCloud cluster. I have to invent some kind of replication logic of my own to sync data between them. This replication would have to be bidirectional to satisfy both of our goals. I strongly dislike this option since the application really should not concern itself with data distribution. But I'll do it if I must. So my questions are: - Can anyone give me any guidance as to option a? Anyone using this in a real production setting? Words of wisdom? Does it work? - Are there any other options that I'm not considering? - What is Solr's answer to such configurations (we can't be alone in needing one)? Any big enhancements coming on the Solr road map to deal with this? Thanks! Darrell Burgan Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692
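(As a worked example of the quorum arithmetic above: seven ZKs laid out 3/2/2 across three data centers survive the loss of any one DC, because at most three nodes vanish and the remaining four are still a majority of seven. A sketch of the zoo.cfg server list; hostnames are hypothetical.)

# 7-node ensemble, 3/2/2 across three data centers; quorum is 4.
# DC1
server.1=zk1.dc1.example.com:2888:3888
server.2=zk2.dc1.example.com:2888:3888
server.3=zk3.dc1.example.com:2888:3888
# DC2
server.4=zk1.dc2.example.com:2888:3888
server.5=zk2.dc2.example.com:2888:3888
# DC3: lightweight site, ZK only, no Solr replicas required
server.6=zk1.dc3.example.com:2888:3888
server.7=zk2.dc3.example.com:2888:3888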
Adding HTTP Request Header in SolrJ
Our web services are using PKI authentication, so we have a user DN; however, we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request being made by SolrJ? I looked through the source code and I see that we can specify an HttpClient when we create a new instance of an HttpSolrServer. I can set the header there, but that seems slightly hacky to me. I'd prefer to use a servlet filter if possible. Do you have any other suggestions? Thanks! -- Andrew Doyle Software Engineer II 10620 Guilford Road, Suite 200, Jessup, MD 20794 direct: 410 854 5560 cell: 410 440 8478 ado...@clearedgeit.com www.ClearEdgeIT.com
Re: Adding HTTP Request Header in SolrJ
On 2/3/2014 3:40 PM, Andrew Doyle wrote: Our web services are using PKI authentication, so we have a user DN; however, we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request being made by SolrJ? I looked through the source code and I see that we can specify an HttpClient when we create a new instance of an HttpSolrServer. I can set the header there, but that seems slightly hacky to me. I'd prefer to use a servlet filter if possible. Do you have any other suggestions? I don't think there's any servlet information (like the filters you mentioned) available in SolrJ. There is in Solr itself, which uses SolrJ, but unless you're writing a servlet or custom server-side code for Solr, you won't have access to any of that. If you are writing a servlet or custom server-side code, then they'll be available -- but not from SolrJ. I could be wrong about what I just said, but just now when I looked through the code for HttpSolrServer and SolrServer, I did not see anything about servlets or filters. In my own SolrJ application, I create an HttpClient instance that is used across dozens of HttpSolrServer instances. The following is part of the constructor code for my custom Core class:

/*
 * If this is the first time a Core has been created, create the shared
 * httpClient with some increased connection properties. Synchronized to
 * ensure thread safety.
 */
synchronized (firstInstance) {
    if (firstInstance) {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.add(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, "200");
        params.add(HttpClientUtil.PROP_MAX_CONNECTIONS, "5000");
        httpClient = HttpClientUtil.createClient(params);
        firstInstance = false;
    }
}

These are the static class members used in the above code:

/**
 * A static boolean value indicating whether this is the first instance of
 * this object. Also used for thread synchronization.
 */
private static Boolean firstInstance = true;

/**
 * A static http client to use on all Solr server objects.
 */
private static HttpClient httpClient = null;

Just so you know, the deprecations introduced by the recent upgrade to HttpClient 4.3 might complicate things further when it comes to user code. See SOLR-5604. I have some ideas about how to proceed on that issue, but haven't had a lot of time to look into it, and before I do anything, I need to discuss it with people who are smarter than me. https://issues.apache.org/jira/browse/SOLR-5604 Thanks, Shawn
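(To make the shared-HttpClient route concrete: with the pre-4.3 HttpClient API that SolrJ 4.x builds against, a request interceptor registered on the client stamps a header onto every outgoing request, including Solr queries. A sketch only - the header name and DN value are placeholders for whatever the proxy expects, and note that DefaultHttpClient is among the classes deprecated by the 4.3 upgrade discussed above.)

import java.io.IOException;
import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.HttpContext;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HeaderStampingClient {
    public static HttpSolrServer create() {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        // Every request sent through this client gets the extra header.
        httpClient.addRequestInterceptor(new HttpRequestInterceptor() {
            public void process(HttpRequest request, HttpContext context)
                    throws HttpException, IOException {
                // Placeholder header name and DN value.
                request.addHeader("X-Proxied-User-DN", "cn=someuser,ou=people,o=example");
            }
        });
        return new HttpSolrServer("http://solr-host:8983/solr/collection1", httpClient);
    }
}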
Re: Special NGRAMish requirement
Hi, Can you provide an example, Alexander? Otis Solr ElasticSearch Support http://sematext.com/ On Feb 3, 2014 5:28 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi, we need to use something very similar to EdgeNGram (minGramSize=1 maxGramSize=50 side=front). The only thing missing is that we would like to reduce the number of matches. The request we need to implement is returning only those matches with the longest tokens (or terms if that is the right word). Is there a way to do this in Solr (not necessarily with EdgeNGram)? Thanks, Alexander
Re: how to write an efficient query with a subquery to restrict the search space?
Hi, Sounds like a possible document and query routing use case. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 31, 2014 7:11 AM, svante karlsson s...@csi.se wrote: It seems to be faster to first restrict the search space and then do the scoring, compared to just using the full query and letting solr handle everything. For example, in my application one of the scoring fields effectively hits 1/12 of the database (a month field), and if we have 100'' items in the database then this matters. /svante 2014-01-30 Jack Krupansky j...@basetechnology.com: Lucene's default scoring should give you much of what you want - ranking hits of low-frequency terms higher - without any special query syntax - just list out your terms and use OR as your default operator. -- Jack Krupansky -Original Message- From: svante karlsson Sent: Thursday, January 23, 2014 6:42 AM To: solr-user@lucene.apache.org Subject: how to write an efficient query with a subquery to restrict the search space? I have a solr db containing 1 billion records that I'm trying to use in a NoSQL fashion. What I want to do is find the best matches using all search terms but restrict the search space to the most unique terms. In this example I know that val2 and val4 are rare terms and val1 and val3 are more common. In my real scenario I'll have 20 fields that I want to include or exclude in the inner query depending on the uniqueness of the requested value. my first approach was: q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2 OR field4:val4)&rows=100&fl=* but what I think I get is ... field4:val4 AND (field2:val2 OR field4:val4), and this result is then OR'ed with the rest. if I write q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND (field2:val2 OR field4:val4)&rows=100&fl=* then what I think I get is two sub-queries that are evaluated separately and then joined - performance-wise this is bad. What's the best way to write these types of queries? Are there any performance issues when running it on several solrcloud nodes vs a single instance, or should it scale? /svante
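(One common way to get the restrict-then-score behavior described above is to put the restriction in a filter query: fq matches without contributing to the score and is cached independently of q. A SolrJ sketch using the field names from the example; the core URL is a placeholder.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RestrictThenScore {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        SolrQuery q = new SolrQuery("field1:val1 OR field2:val2 OR field3:val3 OR field4:val4");
        // The rare terms go into fq: it narrows the candidate set without
        // affecting scores, and the filter is cached separately from q.
        q.addFilterQuery("field2:val2 OR field4:val4");
        q.setRows(100);
        q.setFields("*");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound() + " matches");
    }
}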
Re: Adding DocValues in an existing field
Hi, You can change the field definition and then reindex. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 30, 2014 1:12 PM, yriveiro yago.rive...@gmail.com wrote: Hi, Can I add docValues to an existing field without wiping the current index? The modification on the schema will be something like this:

<field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false"/>
<field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false" docValues="true"/>

I want to use the existing data to reindex it again into the same collection, but in the process create the docValues too; is that possible? I'm using solr 4.6.1 - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-DocValues-in-an-existing-field-tp4114462.html Sent from the Solr - User mailing list archive at Nabble.com.
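(A sketch of the reindex-in-place step, under two assumptions: all fields are stored, so each document can be read back whole, and paging uses start/rows because cursorMark only arrives in 4.7. Best run against an index that isn't being written to concurrently; the core URL is a placeholder.)

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class ReindexInPlace {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        int rows = 500;
        for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*");
            q.set("sort", "surrogate_id asc"); // stable page order
            q.setStart(start);
            q.setRows(rows);
            SolrDocumentList page = solr.query(q).getResults();
            if (page.isEmpty()) break;
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrDocument doc : page) {
                SolrInputDocument in = ClientUtils.toSolrInputDocument(doc);
                in.removeField("_version_"); // avoid optimistic-locking clashes on re-add
                batch.add(in);
            }
            solr.add(batch); // rewriting each document builds docValues under the new schema
        }
        solr.commit();
    }
}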
Re: need help in understating solr cloud stats data
Hi, Oh, I just saw Greg's email on dev@ about this. IMHO aggregating in the search engine is not the way to go. Leave that to external tools, which are likely to be more flexible when it comes to this. For example, our SPM for Solr can do all kinds of aggregations and filtering by a number of Solr and SolrCloud-specific dimensions already, without Solr having to do any sort of aggregation that it thinks Ops people will really want. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote: You should contribute that and spread the dev load with others :) We need something like that at some point, it's just no one has done it. We currently expect you to aggregate in the monitoring layer and it's a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Greg Walters greg.walt...@answers.com wrote: I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom request handler that gets loaded, then registers itself as an mbean. When called it polls all the per-core mbeans, then adds or averages them where appropriate before returning the requested value. I'm not sure if there's a better way to get jvm-wide stats via jmx but it is *a* way to get it done. Thanks, Greg On Feb 3, 2014, at 1:33 AM, adfel70 adfe...@gmail.com wrote: I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime - I see that each core in a single collection has different values. Is the value of each core the time that specific core spent on a request? So to get an idea of total request time, should I sum the values of all the cores? 2. update_handler/commits - does this include auto_commits? Because I'm pretty sure I'm not doing any manual commits and yet I see a number there. 3. update_handler/docs pending - what does this mean? pending for what? for flush to disk? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/need-help-in-understating-solr-cloud-stats-data-tp4114992.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Duplicate Facet.Fields cause same results, should dedupe?
Hi, I don't know if this is an old or new problem, but it does feel like a bug to me. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 10:48 AM, William Bell billnb...@gmail.com wrote: If we add facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code,prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Duplicate Facet.Fields cause same results, should dedupe?
This is in 4.6.1. On Mon, Feb 3, 2014 at 9:11 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I don't know if this is an old or new problem, but it does feel like a bug to me. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 10:48 AM, William Bell billnb...@gmail.com wrote: If we add facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice. Thoughts? Seems like a new bug in 4.6? facet.field: [prac_spec_heir,all_proc_name_code,all_cond_name_code,prac_spec_heir,{!ex=exgender}gender,{!ex=expayor}payor_code_name], -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076
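(Until the duplicate key is fixed server-side, it's easy to guard against at the client by de-duplicating the facet fields before the request is built. A trivial SolrJ sketch.)

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;

public class FacetDedupe {
    public static SolrQuery build(String query, String... facetFields) {
        // LinkedHashSet drops repeats while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<String>(Arrays.asList(facetFields));
        SolrQuery q = new SolrQuery(query);
        q.setFacet(true);
        for (String field : unique) {
            q.addFacetField(field);
        }
        return q;
    }
}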
Re: Solr and SDL Tridion Integration
Thanks a lot for the options. Our site has dynamic content as well. I will look into what suits best. Thanks, Prasi On Mon, Feb 3, 2014 at 10:34 PM, Chris Warner chris_war...@yahoo.com wrote: There are many ways to do this, Prasi. You have a lot of thinking to do on the subject. You could decide to publish your content to database, and then index that database in Solr. You could publish XML or CSV files of your content for Solr to read and index. You could use nutch or some other tool to crawl your web server. There are many more methods, probably. These being some of the more common. Does your site have dynamic content presentation? If so, you may want to consider having Solr examine your broker database. Static pages on your site? You may want to go with either a crawler or publishing a special file for Solr. Please check out https://tridion.stackexchange.com/ for more on this topic. -- chris_war...@yahoo.com On Monday, February 3, 2014 3:54 AM, Jack Krupansky j...@basetechnology.com wrote: If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script or even maybe Java code to read from SDL Tridion and output a supported Solr format, such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message- From: Prasi S Sent: Monday, February 3, 2014 4:16 AM To: solr-user@lucene.apache.org Subject: Solr and SDL Tridion Integration Hi, I want to index sdl tridion content to solr. Can you suggest how this can be achieved? Is there any document/tutorial for this? Thanks Thanks, Prasi
Solr ranking query..
Hi, I have a document structure that looks like the below. I would like to implement something like:

(urlKeywords:+keyword+ AND domainRank:[3 TO 10000] AND adultFlag:N)^60
OR (title:+keyword+ AND domainRank:[3 TO 10000] AND adultFlag:N)^20
OR (title:+keyword+ AND domainRank:[10001 TO *] AND adultFlag:N)^2
OR (fulltxt:+keyword+)

In case we have multiple words in keywords - A B C D - then the documents that have all the words should rank highest (Group 1), then 3 words (Group 2), then 2 words (Group 3), etc. AND within each group (Group 1, 2, 3) I would want the ones with the lowest domainRank value to rank higher (but within the group). How can I do this in a single query? Please advise on the fastest way possible (open to implementing fq and other techniques to speed it up). Document structure in XML:

<doc>
  <str name="subDomain">www</str>
  <str name="domain">ncoah.com</str>
  <str name="path">/links.html</str>
  <str name="urlFull">http://www.ncoah.com/links.html</str>
  <str name="title">North Carolina Office of Administrative Hearings - Links</str>
  <arr name="text">
    <str>North Carolina Office of Administrative Hearings - Links</str>
  </arr>
  <str name="relatedLinks">
    <a href="http://www.ncoah.com/links.html" title="Hearings">Hearings</a>
    <a href="http://www.ncoah.com/links.html" title="Rules">Rules</a>
    <a href="http://www.ncoah.com/links.html" title="Civil Rights">Civil Rights</a>
    <a href="http://www.nc.gov/" title="Visit the North Carolina State web portal">Visit the North Carolina State web portal</a>
    <a href="http://www.nccourts.org/" title="Administrative Office of the Courts">Administrative Office of the Courts</a>
    <a href="http://www.ncleg.net/" title="North Carolina General Assembly">North Carolina General Assembly</a>
    ...
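(A sketch of the tiered query from SolrJ, reading the ranges as [3 TO 10000] / [10001 TO *] per the original post and assuming a single-term keyword. Since adultFlag:N applies to every tier, it can be factored into a filter query, which is cached and doesn't affect scores.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class TieredRankingQuery {
    public static SolrQuery build(String rawKeyword) {
        String keyword = ClientUtils.escapeQueryChars(rawKeyword);
        SolrQuery q = new SolrQuery(
                "(urlKeywords:" + keyword + " AND domainRank:[3 TO 10000])^60"
                + " OR (title:" + keyword + " AND domainRank:[3 TO 10000])^20"
                + " OR (title:" + keyword + " AND domainRank:[10001 TO *])^2"
                + " OR fulltxt:" + keyword);
        // Common to all tiers, so filter it once instead of repeating it per clause.
        q.addFilterQuery("adultFlag:N");
        q.setRows(10);
        return q;
    }
}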