Re: Solr hanging when extracting some broken .doc files

2013-12-18 Thread Charlie Hull

On 17/12/2013 15:29, Augusto Camarotti wrote:

Hi guys,
I'm having a problem with Solr when trying to index some broken .doc
files.
I have set up a test case using Solr to index all the files the
users save on the shared directories of the company that I work for, and
Solr is hanging when trying to index this file in particular (the one I'm
attaching to this e-mail). There are some other broken .doc files that
Solr indexes by name without a problem, even logging some Tika errors
during the process, but when it reaches this file in particular, it
hangs and I have to cancel the upload.
I cannot guarantee the directories will never hold a broken .doc
file, or a broken file with some other extension, so I guess Solr could
just return a failure message, or something like that.
These are the logging messages solr is recording:
03:38:23  ERROR  SolrCore  org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.microsoft.OfficeParser@386f9474
03:38:25  ERROR  SolrDispatchFilter
null:org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.microsoft.OfficeParser@386f9474

So, how do I prevent solr from hanging when trying to index broken files?
Regards,
Augusto Camarotti


We don't like to run Tika from within Solr ourselves, as it has been 
known to barf (especially on large PDF files, yes there are such horrors 
as 3000 page PDFs!). We usually run it in an external process so it can 
be watched and killed if necessary.
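For illustration, a minimal sketch of that pattern (not our actual setup): invoke the Tika command-line app in a separate JVM and kill it if it runs too long. The tika-app jar name/version, the output location and the 60-second timeout are assumptions, and it relies on the Java 8 Process API:

import java.io.File;
import java.util.concurrent.TimeUnit;

public class ExternalTikaExtractor {

    public static void main(String[] args) throws Exception {
        File input = new File(args[0]);
        // Run the Tika CLI in its own JVM; the jar name/version is an assumption.
        ProcessBuilder pb = new ProcessBuilder(
                "java", "-jar", "tika-app-1.4.jar", "--text", input.getAbsolutePath());
        pb.redirectOutput(new File(input.getName() + ".txt"));  // extracted text lands here
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);      // keep Tika's own errors visible
        Process tika = pb.start();

        // Watch the external process: a hung or crashed parse is killed
        // and skipped instead of hanging the indexing request.
        if (!tika.waitFor(60, TimeUnit.SECONDS)) {
            tika.destroyForcibly();
            System.err.println("Tika timed out on " + input + ", skipping file");
        }
    }
}

The extracted text can then be posted to Solr as a plain field, so a broken file only costs one killed child process rather than a hung upload.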


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
hi Josip

For the first question we've done similar things: copying search fields to a
text field. But highlighting is normally done on specific fields such as title;
depending on how the search content is displayed to the front end, you can
search on text and highlight on the field you want by specifying hl.fl.

ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl
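As a rough illustration of that pattern, a small SolrJ sketch that searches the catch-all field and highlights only the title field; the core URL and the title field name are assumptions, searchable_text is the copyField target from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightOtherField {

    public static void main(String[] args) throws Exception {
        // Core URL is an assumption
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("searchable_text:labore");
        q.setHighlight(true);
        q.set("hl.fl", "title");  // highlight only the title field
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getHighlighting());

        solr.shutdown();
    }
}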


On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

 Hi @all,

 I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0
 and my configuration is from here:

 https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/
 PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

 <searchComponent class="solr.HighlightComponent" name="highlight">
   <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
 </searchComponent>

 So this is working just fine, but now I have some questions:

 1.) With the old default highlighter component it was possible to search
 in searchable_text and to retrieve highlighted text. This is essential,
 because we use copyField to put almost everything into searchable_text
 (title, subtitle, description, ...)

 2.) I can't get the ellipsis working. I tried hl.tag.ellipsis=...,
 f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing
 seems to work. maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic




-- 
All the best

Liu Bo


Re: Solr hanging when extracting some broken .doc files

2013-12-18 Thread Alexandre Rafalovitch
Charlie,

Does it mean you are talking to it from a client program? Or are you
running Tika in a listen/server mode and building some adapters for standard
Solr processes?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Dec 18, 2013 at 3:47 PM, Charlie Hull char...@flax.co.uk wrote:

 On 17/12/2013 15:29, Augusto Camarotti wrote:

 Hi guys,
 I'm having a problem with Solr when trying to index some broken .doc
 files.
 I have set up a test case using Solr to index all the files the
 users save on the shared directories of the company that I work for, and
 Solr is hanging when trying to index this file in particular (the one I'm
 attaching to this e-mail). There are some other broken .doc files that
 Solr indexes by name without a problem, even logging some Tika errors
 during the process, but when it reaches this file in particular, it
 hangs and I have to cancel the upload.
 I cannot guarantee the directories will never hold a broken .doc
 file, or a broken file with some other extension, so I guess Solr could
 just return a failure message, or something like that.
 These are the logging messages solr is recording:
 03:38:23  ERROR  SolrCore  org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException
 from org.apache.tika.parser.microsoft.OfficeParser@386f9474
 03:38:25  ERROR  SolrDispatchFilter
 null:org.apache.solr.common.SolrException:
 org.apache.tika.exception.TikaException: Unexpected RuntimeException
 from org.apache.tika.parser.microsoft.OfficeParser@386f9474

 So, how do I prevent solr from hanging when trying to index broken files?
 Regards,
 Augusto Camarotti


 We don't like to run Tika from within Solr ourselves, as it has been known
 to barf (especially on large PDF files, yes there are such horrors as 3000
 page PDFs!). We usually run it in an external process so it can be watched
 and killed if necessary.

 Cheers

 Charlie

 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk



Re: an array-like string is treated as multivalued when adding doc to solr

2013-12-18 Thread Liu Bo
Hi Alexandre

It's quite a rare case, just one out of tens of thousands.

I'm planning to make every multilingual field multivalued and just get the
first value while formatting the response into our business object.

The first-value update processor seems very helpful, thank you.

All the best

Liu Bo


On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If this happens rarely and you want to deal with it on the way into Solr,
 you could just keep one of the values, using a URP:

 http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html

 Regards,
Alex

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote:

  Hey Furkan and solr users
 
  This is a misreported problem. It's not a Solr problem but our data
  issue. Sorry for this.

  It's a data issue on our side: a coupon happened to have two pieces of
  English description, which is not allowed in our business logic, but it
  happened, and we added name_en_US twice to the Solr document.

  I've done a set of tests and some deep debugging of the Solr source code,
  and found out that an array-like string such as [Get 20% Off Official
  Barca Kits, coupon] won't be treated as a multivalued field.

  Sorry again for not digging more before sending out the question email. I
  trusted our business logic and data integrity more than Solr; I will
  definitely not do this again. ;-)
 
  All the best
 
  Liu Bo
 
 
 
  On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi Liu;
  
   Yes, it is expected behavior. If you send data within square brackets,
   Solr will treat it as a multivalued field. You can test it this way:
   if you use SolrJ and use a List for a field, it will be considered
   multivalued too, because when you call the toString() method of your List
   you can see that the elements are printed within square brackets. This is
   the reason a List can be used for a multivalued field.
  
   If you explain your situation I can offer a way to do it.
  
   Thanks;
   Furkan KAMACI
  
  
   2013/12/6 Liu Bo diabl...@gmail.com
  
Dear solr users:
   
I've met this kind of error several times:

when adding an array-like string such as [Get 20% Off Official Barça Kits,
coupon] to a multiValued="false" field, Solr will complain:
   
org.apache.solr.common.SolrException: ERROR:
 [doc=7781396456243918692]
multiple values encountered for non multiValued field name_en_US:
 [Get
   20%
Off Official Barca Kits, coupon]
   
my schema definition:
<field name="name_en_US" type="text_en" indexed="true" stored="true"
multiValued="false" />
   
This field is stored because the search result needs this field and its
value in the original format, and indexed to give it a boost while searching.

What I do is add the name (java.lang.String) to the SolrInputDocument via the
addField("name_en_US", product.getName()) method, and then add this to Solr
using an AddUpdateCommand.
   
It seems Solr treats this kind of string data as multivalued, even though I
add this field to Solr only once.

Is this a bug or expected behavior?

Is there any way to tell Solr this is not a multivalued value and not to
break it up?

Your help and suggestions will be much appreciated.
   
--
All the best
   
Liu Bo
   
  
 
 
 
  --
  All the best
 
  Liu Bo
 




-- 
All the best

Liu Bo


DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Mathias Lux
Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


PeerSync Recovery fails, starting Replication Recovery

2013-12-18 Thread Anca Kopetz



Hi,

In our SolrCloud cluster (2 shards, 8 replicas), the replicas go into recovering state from time to time, and it takes more than 10 minutes for them to finish recovering.

In the logs, we see that PeerSync recovery fails with the message:

PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

Then Replication Recovery starts. 

Is there something we can do to avoid the failure of PeerSync recovery so that the recovery process is faster (less than 10 minutes)?

The full trace log is here : 

2013-12-05 13:51:53,740 [http-8080-46] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action="" status=0 QTime=0
2013-12-05 13:51:53,740 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,741 [http-8080-46] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action="" status=0 QTime=1
2013-12-05 13:51:53,740 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,743 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,746 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,755 [Thread-1543] WARN org.apache.solr.cloud.RecoveryStrategy:close:105 - Stopping recovery for zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green
2013-12-05 13:51:53,756 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:495 - Finished recovery process. core=fr_green
2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,767 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:51:54,777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
2013-12-05 13:51:56,804 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:356 - Attempting to PeerSync from

http://solr-02/searchsolrnodefr/fr_green/ core=fr_green - recoveringAfterStartup=false
2013-12-05 13:51:56,806 [RecoveryThread] WARN org.apache.solr.update.PeerSync:sync:232 - PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:394 - PeerSync Recovery was not successful - trying replication. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:397 - Starting Replication Recovery. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:399 - Begin buffering updates. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:replicate:127 - Attempting to replicate from

http://solr-02/searchsolrnodefr/fr_green/. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:52:01,203 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:init:211 - No value set for 'pollInterval'. Timer Task not started.
2013-12-05 13:52:01,209 [RecoveryThread] INFO 

Re: PostingsSolrHighlighter

2013-12-18 Thread Josip Delic

Am 18.12.2013 09:55, schrieb Liu Bo:

hi Josip


hi liu,


for the 1 question we've done similar things: copying search field to a
text field. But highlighting is normally on specific fields such as tittle
depending on how the search content is displayed to the front end, you can
search on text and highlight on the field you wanted by specify hl.fl

ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl


That's exactly what I'm doing in that pastebin:

http://pastebin.com/13Uan0ZF

I'm searching there for 'q=searchable_text:labore'; the term is present in
'text' and in the copyField 'searchable_text', but it is not highlighted
in 'text' (hl.fl=text).


The same query works if I set 'q=text:labore', as you can see in

http://pastebin.com/4CP8XKnr

For the second question, I figured out that the PostingsSolrHighlighter
ellipsis is not, as I thought, for adding an ellipsis at the start and/or end
of the highlighted text. It is instead used to join multiple snippets
together when hl.snippets is > 1.
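For reference, a small hedged SolrJ sketch of the parameters involved, assuming the snippet-joining behaviour described above; the core URL and field name are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class EllipsisBetweenSnippets {

    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("text:labore");
        q.setHighlight(true);
        q.set("hl.fl", "text");
        q.set("hl.snippets", 3);            // ask for more than one passage per field
        q.set("hl.tag.ellipsis", " ... ");  // the string used to join those passages
        System.out.println(solr.query(q).getHighlighting());

        solr.shutdown();
    }
}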


cheers

josip




On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:


Hi @all,

I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0
and my configuration is from here:

https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/
PostingsSolrHighlighter.html

Search query and result (not working):

http://pastebin.com/13Uan0ZF

Schema (not complete):

http://pastebin.com/JGa38UDT

Search query and result (working):

http://pastebin.com/4CP8XKnr

Solr config:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

So this is working just fine, but now I have some questions:

1.) With the old default highlighter component it was possible to search
in searchable_text and to retrieve highlighted text. This is essential,
because we use copyField to put almost everything into searchable_text
(title, subtitle, description, ...)

2.) I can't get the ellipsis working. I tried hl.tag.ellipsis=...,
f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing
seems to work. maxAnalyzedChars is just cutting the sentence?

Kind Regards

Josip Delic












Wildcard queries and custom char filter

2013-12-18 Thread michallos
Hello,

I have a problem with configuring a custom char filter. When there are no
wildcards in the query, my filter is invoked. When there are wildcards, my
filter is not invoked.

Is it possible to configure a charFilter to be used with wildcard queries? I
can see that with wildcards, TokenizerChain.charFilters is null.

configuration:

<analyzer type="query">
  <charFilter class="a.b.c.MyFilterFactory" />
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

What is more interesting, I can see that solr.LowerCaseFilterFactory is
invoked even with wildcards. I tried to transform the charFilter into a normal
Filter, but the result is the same (it is not invoked with wildcards).

Best



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-queries-and-custom-char-filter-tp4107241.html
Sent from the Solr - User mailing list archive at Nabble.com.


Service Unavailable Error.

2013-12-18 Thread yriveiro
I'm having this error in my logs:

ERROR - dat1 - 2013-12-18 11:40:11.704;
org.apache.solr.update.StreamingSolrServers$1; error
org.apache.solr.common.SolrException: Service Unavailable



request:
http://192.168.20.106:8983/solr/statistics-13_shard12_replica4/update?update.distrib=FROMLEADER&distrib.from=http%3A%2F%2F192.168.20.101%3A8983%2Fsolr%2Fstatistics-13_shard12_replica5%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

The machine is zen: no load, no IO. How is it possible for it to be unavailable?
I'm on Solr 4.6.0 in SolrCloud mode.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Service-Unavailable-Error-tp4107242.html
Sent from the Solr - User mailing list archive at Nabble.com.


No registered leader was found, but the UI says that I have.

2013-12-18 Thread yriveiro
I'm getting an error on Solr 4.6.0 about leader registration; the admin UI shows
this:

http://picpaste.com/a839446d0808df205aa7be78c780ed32.png

But my logs says:

ERROR - dat6 - 2013-12-18 11:43:54.253;
org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
No registered leader was found, collection:statistics-13 slice:shard23_1
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)

Any idea how I can fix this?



-
Best regards
--
View this message in context: 

Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Jens Grivolla
You can do range queries without an upper bound and just limit the 
number of results. Then you look at the last result to obtain the new 
lower bound.


-- Jens


On 17/12/13 20:23, Petersen, Robert wrote:

My use case is basically to do a dump of all contents of the index with no 
ordering needed.  It's actually to be a product data export for third parties.  
Unique key is product sku.  I could take the min sku and range query up to the 
max sku but the skus are not contiguous because some get turned off and only 
some are valid for export so each range would return a different number of 
products (which may or may not be acceptable and I might be able to kind of 
hide that with some code).

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Tuesday, December 17, 2013 10:41 AM
To: solr-user
Subject: Re: solr as nosql - pulling all docs vs deep paging limitations

Hoss,

What about misusing Solr for SELECT * FROM ... WHERE ...-style queries? I'm sure
you've been asked about that many times.
What if the client doesn't need to rank results, but just requests an unordered
filtered result like they are used to in an RDBMS?
Do you feel that will never be considered a reasonable use case for Solr, or is
there a well-known approach for dealing with it?


On Tue, Dec 17, 2013 at 10:16 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:



: Then I remembered we currently don't allow deep paging in our
current
: search indexes as performance declines the deeper you go.  Is this
still
: the case?

Coincidentally, I'm working on a new cursor-based API to make this much
more feasible as we speak...

https://issues.apache.org/jira/browse/SOLR-5463

I did some simple perf testing of the strawman approach and posted the
results last week...


http://searchhub.org/coming-soon-to-solr-efficient-cursor-based-iterat
ion-of-large-result-sets/

...current iterations on the patch are to eliminate the strawman code
to improve performance even more and beef up the test cases.

: If so, is there another approach to make all the data in a
collection
: easily available for retrieval?  The only thing I can think of is to
 ...
: Then I was thinking we could have a field with an incrementing
numeric
: value which could be used to perform range queries as a substitute
for
: paging through everything.  Ie queries like 'IncrementalField:[1 TO
: 100]' 'IncrementalField:[101 TO 200]' but this would be difficult to
: maintain as we update the index unless we reindex the entire
collection
: every time we update any docs at all.

As I mentioned in the blog above, as long as you have a uniqueKey
field that supports range queries, bulk exporting of all documents is
fairly trivial: sort on your uniqueKey field and use an fq that
also filters on your uniqueKey field, modifying the fq each time to change
the lower bound to match the highest ID you got on the previous page.

This approach works really well in simple cases where you want to
fetch all documents matching a query and then process/sort them by
some other criteria on the client -- but it's not viable if it's
important to you that the documents come back from Solr in score order
before your client gets them, because you want to stop fetching once
some criteria is met in your client.  Example: you have billions of
documents matching a query, you want to fetch all sorted by score desc
and crunch them on your client to compute some stats, and once your
client-side stat crunching tells you you have enough results (which
might be after the 1000th result, or might be after the millionth result) then
you want to stop.

SOLR-5463 will help even in that latter case.  The bulk of the patch
should be easy to use in the next day or so (having other people try it out
and test it in their applications would be *very* helpful) and hopefully
show up in Solr 4.7.

-Hoss
http://www.lucidworks.com/





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
  mkhlud...@griddynamics.com







Re: Wildcard queries and custom char filter

2013-12-18 Thread Ahmet Arslan
Hi,

Yes, some factories implement
org.apache.lucene.analysis.util.MultiTermAwareComponent.
Please see http://wiki.apache.org/solr/MultitermQueryAnalysis for more.
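As a rough sketch (against the Lucene/Solr 4.x API) of what that can look like for a custom char filter factory; the package and class names follow the configuration quoted below, and the actual character-mapping logic is omitted:

package a.b.c;

import java.io.Reader;
import java.util.Map;
import org.apache.lucene.analysis.util.AbstractAnalysisFactory;
import org.apache.lucene.analysis.util.CharFilterFactory;
import org.apache.lucene.analysis.util.MultiTermAwareComponent;

public class MyFilterFactory extends CharFilterFactory implements MultiTermAwareComponent {

    public MyFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public Reader create(Reader input) {
        // A real implementation would wrap the reader with the custom
        // character-mapping logic; returned unchanged here to keep the sketch runnable.
        return input;
    }

    // Returning this factory asks Solr to apply the same char filter in the
    // implicit "multiterm" analyzer it builds for wildcard/prefix/fuzzy queries.
    @Override
    public AbstractAnalysisFactory getMultiTermComponent() {
        return this;
    }
}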




On Wednesday, December 18, 2013 1:05 PM, michallos michal.ware...@gmail.com 
wrote:
Hello,

I have a problem with configuring a custom char filter. When there are no
wildcards in the query, my filter is invoked. When there are wildcards, my
filter is not invoked.

Is it possible to configure a charFilter to be used with wildcard queries? I
can see that with wildcards, TokenizerChain.charFilters is null.

configuration:

<analyzer type="query">
        <charFilter class="a.b.c.MyFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

What is more interesting, I can see that solr.LowerCaseFilterFactory is
invoked even with wildcards. I tried to transform the charFilter into a normal
Filter, but the result is the same (it is not invoked with wildcards).

Best



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-queries-and-custom-char-filter-tp4107241.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcard queries and custom char filter

2013-12-18 Thread michallos
It works! Thanks.

Last question: how do I invoke the charFilter before the tokenizer? I can see
that with the StandardTokenizerFactory tokenizer, without wildcards the text
123-abc is broken into two tokens, 123 and abc, but the text *123-abc* remains
unchanged as *123-abc*.

Is it possible to use a charFilter before the tokenizer?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-queries-and-custom-char-filter-tp4107241p4107252.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try and run it not in debug mode.  DIH's 
debug mode limits the number of documents it will take in, so that might be all 
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias 
Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor

Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec



solrcloud no server hosting shard

2013-12-18 Thread gf80
Hi guys,

Before starting, note that I am new to Solr and in particular to SolrCloud.
I have to index many, many documents (10 million). Last week I completed my
import handler and configuration, so I started the import activity on Solr
using SolrCloud with 10 shards (and without replicas :S) on a VM with 30 GB
of RAM and good performance (I don't know if 10 is too many).
Today I saw that during a specific update (deletion of a wrong document), the
following exception was thrown:

org.apache.solr.common.SolrException: no servers hosting shard: 
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) 

After restarting SolrCloud, the exception is thrown during the update of another
document. Nevertheless, search queries work fine, just slowly.

Thank you so much in advance if you can help me with this exception or if you
have any suggestions for my configuration. Is it a must to have some replicas
of the shards? Can I add a replica now, after some millions of documents have
been indexed? To configure SolrCloud I essentially used the default
configuration and read the general SolrCloud wiki; are there any suggestions
for using Solr with this number of documents in a more comfortable way?

Thanks again,
Giuseppe

p.s. sorry for my english :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-no-server-hosting-shard-tp4107268.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Mathias Lux
Unfortunately it is the same in non-debug mode: just the first document. I
also print the params to stdout, but it seems only the first one ever
arrives at my custom class. I have the feeling that I'm doing
something seriously wrong here, based on a complete misunderstanding
:) I basically assume that the nested entity processor will be called
for each of the rows that come out of its parent. I've read
somewhere that the data has to be taken from the data source, and
I've implemented that, but it doesn't seem to change anything.
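For what it's worth, a rough skeleton of a sub-entity processor that hands back exactly one row per parent row. It assumes (an assumption about the DIH contract, not verified against 4.6) that init() is re-invoked for every row the parent FileListEntityProcessor emits, and that the entity's filePath attribute resolves to the current file's absolute path:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class LireEntityProcessorSketch extends EntityProcessorBase {

    private boolean rowEmitted;

    @Override
    public void init(Context context) {
        super.init(context);
        // Reset per-parent-row state; init() should run once for each file row.
        rowEmitted = false;
    }

    @Override
    public Map<String, Object> nextRow() {
        if (rowEmitted) {
            return null;  // no more rows for this parent row
        }
        rowEmitted = true;

        // Resolves e.g. filePath="${files.fileAbsolutePath}" from the dataConfig
        String filePath = context.getResolvedEntityAttribute("filePath");

        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", filePath);
        // ... open the image at filePath and put the extracted LIRE fields here ...
        return row;
    }
}

If only the first image ends up indexed, per-parent-row state that is never reset (like rowEmitted above surviving across init() calls) is one possible culprit.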

cheers,
Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James
james.d...@ingramcontent.com wrote:
 The first thing I would suggest is to try and run it not in debug mode.  
 DIH's debug mode limits the number of documents it will take in, so that 
 might be all that is wrong here.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of 
 Mathias Lux
 Sent: Wednesday, December 18, 2013 4:04 AM
 To: solr-user@lucene.apache.org
 Subject: DataImport Handler, writing a new EntityProcessor

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Dynamically deriving the param value in solrconfig requestHandler

2013-12-18 Thread Senthilnathan Vijayaraja
Hi,

Is there any possibility of deriving the value of a param from other params,
as below?

<requestHandler name="/main" class="com.solr.custom.handler.MySearchHandler">
  <arr name="components">
    <str>query</str>
    <str>debug</str>
  </arr>
  <lst name="defaults">
    <str name="size_relaxed">size:['$minSize' TO '$maxSize']</str>
    <!-- minSize and maxSize will be supplied as query parameters, or else
         default to the values below (i.e. size_relaxed=size:[0 TO 1]) -->
    <str name="minSize">0</str>
    <str name="maxSize">1</str>
  </lst>
</requestHandler>


Thanks & Regards,
Senthilnathan V


Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Mikhail Khludnev
Aha! SOLR-5244 is a particular case of what I'm asking about. I wonder who
else considers it useful?
(I'm sorry if I hijacked the thread.)
On 18.12.2013 at 5:41, Joel Bernstein joels...@gmail.com wrote:

 They are for different use cases. Hoss's approach, I believe, focuses on
 deep paging of ranked search results. SOLR-5244 focuses on the batch export
 of an entire unranked search result in binary format. It's basically a very
 efficient bulk extract for Solr.


 On Tue, Dec 17, 2013 at 6:51 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

  Joel - can you please elaborate a bit on how this compares with Hoss'
  approach?  Complementary?
 
  Thanks,
  Otis
  --
  Performance Monitoring * Log Analytics * Search Analytics
  Solr & Elasticsearch Support * http://sematext.com/
 
 
  On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein joels...@gmail.com
  wrote:
 
   SOLR-5244 is also working in this direction. This focuses on efficient
   binary extract of entire search results.
  
  
   On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic 
   otis.gospodne...@gmail.com wrote:
  
Hoss is working on it. Search for deep paging or cursor in JIRA.
   
Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 17, 2013 12:30 PM, Petersen, Robert 
robert.peter...@mail.rakuten.com wrote:
   
 Hi solr users,

 We have a new use case where we need to make a pile of data available
 as
   XML
 to a client and I was thinking we could easily put all this data
  into a
 solr collection and the client could just do a star search and page
through
 all the results to obtain the data we need to give them.  Then I
remembered
 we currently don't allow deep paging in our current search indexes
 as
 performance declines the deeper you go.  Is this still the case?

 If so, is there another approach to make all the data in a
 collection
 easily available for retrieval?  The only thing I can think of is
 to
query
 our DB for all the unique IDs of all the documents in the
 collection
   and
 then pull out the documents out in small groups with successive
  queries
 like 'UniqueIdField:(id1 OR id2 OR ... OR idn)'
 'UniqueIdField:(idn+1
   OR
 idn+2 OR ... etc)' which doesn't seem like a very good approach
  because
the
 DB might have been updated with new data which hasn't been indexed
  yet
and
 so all the ids might not be in there (which may or may not matter I
 suppose).

 Then I was thinking we could have a field with an incrementing
  numeric
 value which could be used to perform range queries as a substitute
  for
 paging through everything.  Ie queries like 'IncrementalField:[1 TO
   100]'
 'IncrementalField:[101 TO 200]' but this would be difficult to
  maintain
as
 we update the index unless we reindex the entire collection every
  time
   we
 update any docs at all.

 Is this perhaps not a good use case for solr?  Should I use
 something
else
 or is there another approach that would work here to allow a client
  to
pull
 groups of docs in a collection through the rest api until the
 client
   has
 gotten them all?

 Thanks
 Robi


   
  
  
  
   --
   Joel Bernstein
   Search Engineer at Heliosearch
  
 



 --
 Joel Bernstein
 Search Engineer at Heliosearch



RE: Solr failure results in misreplication?

2013-12-18 Thread Tim Potter
Any chance you still have the logs from the servers hosting 1 & 2? I would open 
a JIRA ticket for this one as it sounds like something went terribly wrong on 
restart. 

You can update the /clusterstate.json to fix this situation.

Lastly, it's recommended to use an OOM killer script with SolrCloud so that you 
don't end up with zombie nodes hanging around in your cluster. I use something 
like: -XX:OnOutOfMemoryError=$SCRIPT_DIR/oom_solr.sh $x %p

$x in start script is the port # and %p is the process ID ... My oom_solr.sh 
script is something like this:

#!/bin/bash
SOLR_PORT=$1
SOLR_PID=$2
NOW=$(date +%F%T)
(
echo Running OOM killer script for process $SOLR_PID for Solr on port 
89$SOLR_PORT
kill -9 $SOLR_PID
echo Killed process $SOLR_PID
) | tee oom_killer-89$SOLR_PORT-$NOW.log

I use supervisord do handle the restart after the process gets killed by the 
OOM killer, which is why you don't see the restart in this script ;-)

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: youknow...@heroicefforts.net youknow...@heroicefforts.net
Sent: Tuesday, December 17, 2013 10:31 PM
To: solr-user@lucene.apache.org
Subject: Solr failure results in misreplication?

My client has a test cluster Solr 4.6 with three instances 1, 2, and 3 hosting 
shards 1, 2, and 3, respectively.  There is no replication in this cluster.  We 
started receiving OOME during indexing; likely the batches were too large.  The 
cluster was rebooted to restore the system.  However, upon reboot, instance 2 
now shows as a replica of shard 1 and its shard2 is down with a null range.  
Instance 2 is queryable with shards.tolerant=true&distribute=false and returns a 
different set of records than instance 1 (as would be expected during normal 
operations).  Clusterstate.json is similar to the following:

mycollection:{
shard1:{
range:800-d554,
state:active,
replicas:{
instance1state:active...,
instance2state:active...
}
},
shard3:{state:active.},
shard2:{
range:null,
state:active,
replicas:{
instance2{state:down}
}
},
maxShardsPerNode:1,
replicationFactor:1
}

Any ideas on how this would come to pass?  Would manually correcting the 
clusterstate.json in Zk correct this situation?

Re: Wildcard queries and custom char filter

2013-12-18 Thread michallos
Hoh, I can see that when there are wildcards then KeywordTokenizerFactory is
used instead of StandardTokenizerFactory.
I created custom wildcard remover char filter for few specific cases (so I
cannot use any of regex replacer filters) but event with that,
KeywordTokenizerFactory is used.

I thought charFilter is enough but there is more complicated logic in
SolrQueryParserBase#handleBareTokenQuery that chooses
KeywordTokenizerFactory before my charFilter is invoked!

Is it possible to handle custom wildcard remover, so that
StandardTokenizerFactory may be used?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-queries-and-custom-char-filter-tp4107241p4107275.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Chris Hostetter
: 
: What about SELECT * FROM WHERE ... like misusing Solr? I'm sure you've been
: asked many times for that.
: What if client don't need to rank results somehow, but just requesting
: unordered filtering result like they are used to in RDBMS?
: Do you feel it will never considered as a resonable usecase for Solr? or
: there is a well known approach for dealing with?

If you don't care about ordering, then the approach i described (either 
using SOLR-5463, or just using a sort by uniqueKey with increasing 
range filters on the id) should work fine -- the fact that they come back 
sorted by id is just an implementation detail that makes it possible to 
batch the records (the same way most SQL databases will likely give you 
back the docs based on whatever primary key index you have)

I think the key difference between approaches like SOLR-5244 vs the cursor 
work in SOLR-5463 is that SOLR-5244 is really targeted at dumping all 
data about all docs from a core (matching the query) in a single 
request/response -- for something like SolrCloud, the client would 
manually need to hit each shard (but as i understand it fro mthe 
dscription, that's kind of the point, it's aiming to be a very low level 
bulk export).  With the cursor approach in SOLR-5463, we do 
agregation across all shards, and we support arbitrary sorts, and you can 
control the batch size from the client and iterate over multiple 
request/responses of that size.  if there is any network hucups, you can 
re-do a request.  If you process half the docs that match (in a 
particular order) and then decide I've got all the docs i need for my 
purposes, ou can stop requesting the continuation of that cursor.



-Hoss
http://www.lucidworks.com/


Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Chris Hostetter

: You can do range queries without an upper bound and just limit the number of
: results. Then you look at the last result to obtain the new lower bound.

exactly.  instead of this:

   First: q=foo&start=0&rows=$ROWS
   After: q=foo&start=$X&rows=$ROWS

...where $ROWS is how big a batch of docs you can handle at one time, 
and you increase the value of $X by the value of $ROWS on each successive 
request, you can just do this...

   First: q=foo&start=0&rows=$ROWS&sort=id+asc
   After: q=foo&start=0&rows=$ROWS&sort=id+asc&fq=id:{$X TO *]

...where $X is whatever the last id you got on the previous page.

Or: you try out the patch in SOLR-5463 and do something like this...

   First: q=foo&start=0&rows=$ROWS&sort=id+asc&cursorMark=*
   After: q=foo&start=0&rows=$ROWS&sort=id+asc&cursorMark=$X

...where $X is whatever nextCursorMark you got from the previous page.



-Hoss
http://www.lucidworks.com/


Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Michael Della Bitta
Us too. That's going to be huge for us!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Wed, Dec 18, 2013 at 9:55 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Aha! SOLR-5244 is a particular case of what I'm asking about. I wonder who
 else considers it useful?
 (I'm sorry if I hijacked the thread.)
 On 18.12.2013 at 5:41, Joel Bernstein joels...@gmail.com
 wrote:

  They are for different use cases. Hoss's approach, I believe, focuses on
  deep paging of ranked search results. SOLR-5244 focuses on the batch
 export
  of an entire unranked search result in binary format. It's basically a
 very
  efficient bulk extract for Solr.
 
 
  On Tue, Dec 17, 2013 at 6:51 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
   Joel - can you please elaborate a bit on how this compares with Hoss'
   approach?  Complementary?
  
   Thanks,
   Otis
   --
   Performance Monitoring * Log Analytics * Search Analytics
    Solr & Elasticsearch Support * http://sematext.com/
  
  
   On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein joels...@gmail.com
   wrote:
  
SOLR-5244 is also working in this direction. This focuses on
 efficient
binary extract of entire search results.
   
   
On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:
   
 Hoss is working on it. Search for deep paging or cursor in JIRA.

 Otis
  Solr & ElasticSearch Support
 http://sematext.com/
 On Dec 17, 2013 12:30 PM, Petersen, Robert 
 robert.peter...@mail.rakuten.com wrote:

  Hi solr users,
 
   We have a new use case where we need to make a pile of data
 available
  as
XML
  to a client and I was thinking we could easily put all this data
   into a
  solr collection and the client could just do a star search and
 page
 through
  all the results to obtain the data we need to give them.  Then I
 remembered
  we currently don't allow deep paging in our current search
 indexes
  as
  performance declines the deeper you go.  Is this still the case?
 
  If so, is there another approach to make all the data in a
  collection
  easily available for retrieval?  The only thing I can think of is
  to
 query
  our DB for all the unique IDs of all the documents in the
  collection
and
  then pull out the documents out in small groups with successive
   queries
  like 'UniqueIdField:(id1 OR id2 OR ... OR idn)'
  'UniqueIdField:(idn+1
OR
  idn+2 OR ... etc)' which doesn't seem like a very good approach
   because
 the
  DB might have been updated with new data which hasn't been
 indexed
   yet
 and
  so all the ids might not be in there (which may or may not
 matter I
  suppose).
 
  Then I was thinking we could have a field with an incrementing
   numeric
  value which could be used to perform range queries as a
 substitute
   for
  paging through everything.  Ie queries like 'IncrementalField:[1
 TO
100]'
  'IncrementalField:[101 TO 200]' but this would be difficult to
   maintain
 as
  we update the index unless we reindex the entire collection every
   time
we
  update any docs at all.
 
  Is this perhaps not a good use case for solr?  Should I use
  something
 else
  or is there another approach that would work here to allow a
 client
   to
 pull
  groups of docs in a collection through the rest api until the
  client
has
  gotten them all?
 
  Thanks
  Robi
 
 

   
   
   
--
Joel Bernstein
Search Engineer at Heliosearch
   
  
 
 
 
  --
  Joel Bernstein
  Search Engineer at Heliosearch
 



Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Jonathan Rochkind

On 12/17/13 1:16 PM, Chris Hostetter wrote:

As I mentioned in the blog above, as long as you have a uniqueKey field
that supports range queries, bulk exporting of all documents is fairly
trivial: sort on your uniqueKey field and use an fq that also
filters on your uniqueKey field, modifying the fq each time to change the
lower bound to match the highest ID you got on the previous page.


Aha, very nice suggestion. I hadn't thought of this when I was trying 
to figure out decent ways to 'fetch all documents matching a query' for 
some bulk offline processing.


One question that I was never sure about when trying to do things like 
this -- is this going to end up blowing the query and/or document caches 
if used on a live Solr?  By filling up those caches with the results of 
the 'bulk' export?  If so, is there any way to avoid that? Or does it 
probably not really matter?


Jonathan


Re: Solr3.4 on tomcat 7.0.23 - hung with error threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed

2013-12-18 Thread solr-user
Were you able to resolve this issue, and if so, how?

I am encountering the same issue in a couple of Solr versions (including 4.0
and 4.5).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr3-4-on-tomcat-7-0-23-hung-with-error-threw-exception-java-lang-IllegalStateException-Cannot-call-tp4087342p4107286.html
Sent from the Solr - User mailing list archive at Nabble.com.


org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.

2013-12-18 Thread neerajp
Hi,
I am using the ExtractingRequestHandler to extract text from binary data and
then index the text, but I'm getting the error:
org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
Skipping IW.commit.

solrconfig.xml:
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">attachment</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<lib dir="/var/solrdev/solr-4.5.0/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/var/solrdev/solr-4.5.0/dist/" regex=".*\.jar" />
<lib dir="/var/solrdev/solr-4.5.0/dist/" regex="solr-cell-4.5.0.jar" />

schema.xml:

<field name="attachment" type="string" indexed="true" stored="true"
       required="false" multiValued="true"/>
<fieldType name="string" class="solr.TextField" omitNorms="true">

CURL request:
curl
"http://localhost:8085/solr/openwave/update/extract?literal.msg-uid=9&commit=true"
-F myFile=Dummy.doc

I do not understand where the problem is. Please advise.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-update-DirectUpdateHandler2-No-uncommitted-changes-Skipping-IW-commit-tp4107285.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Chris Hostetter

: One question that I was never sure about when trying to do things like this --
: is this going to end up blowing the query and/or document caches if used on a
: live Solr?  By filling up those caches with the results of the 'bulk' export?
: If so, is there any way to avoid that? Or does it probably not really matter?

  q={!cache=false}...


-Hoss
http://www.lucidworks.com/


Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-18 Thread cwhi
I called SPLITSHARD on a shard in an existing SolrCloud instance, where the
shard had ~1 million documents in it.  It's been about 3 hours since the
splitting completed, and the sub-shards are still stuck in a "down"
state.  They are reported as down in localhost/solr/#/~cloud, and I'm unable
to query my index.

How can we recover from a failed SPLITSHARD operation?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shards-stuck-in-down-state-after-splitting-shard-How-can-we-recover-from-a-failed-SPLITSHARD-tp4107297.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread P Williams
Hi Mathias,

I'd recommend testing one thing at a time.  See if you can get it to work
for one image before you try a directory of images.  Also try testing using
the solr-testframework in your IDE (I use Eclipse) to debug rather than
your browser/print statements.  Hopefully that will give you some more
specific knowledge of what's happening around your plugin.

I also wrote an EntityProcessor plugin to read from a properties file
(https://issues.apache.org/jira/browse/SOLR-3928).
Hopefully that'll give you some insight into this kind of Solr plugin and
how to test it.

Cheers,
Tricia




On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.atwrote:

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec



Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Mikhail Khludnev
On Wed, Dec 18, 2013 at 8:03 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:

 :
 : What about SELECT * FROM WHERE ... like misusing Solr? I'm sure you've
 been
 : asked many times for that.
 : What if client don't need to rank results somehow, but just requesting
 : unordered filtering result like they are used to in RDBMS?
 : Do you feel it will never considered as a resonable usecase for Solr? or
 : there is a well known approach for dealing with?

 If you don't care about ordering, then the approach I described (either
 using SOLR-5463, or just using a sort by uniqueKey with increasing
 range filters on the id) should work fine -- the fact that they come back
 sorted by id is just an implementation detail that makes it possible to
 batch the records

From the functional standpoint that's true, but performance might matter in
such edge cases, e.g. I wonder why the priority queue is needed even if we
request sort=_docid_.

 (the same way most SQL databases will likely give you
 back the docs based on whatever primary key index you have)

 I think the key difference between approaches like SOLR-5244 vs the cursor
 work in SOLR-5463 is that SOLR-5244 is really targeted at dumping all
 data about all docs from a core (matching the query) in a single
 request/response -- for something like SolrCloud, the client would
 manually need to hit each shard (but as I understand it from the
 description, that's kind of the point, it's aiming to be a very low level
 bulk export).  With the cursor approach in SOLR-5463, we do
 aggregation across all shards, and we support arbitrary sorts, and you can
 control the batch size from the client and iterate over multiple
 request/responses of that size.  If there are any network hiccups, you can
 re-do a request.  If you process half the docs that match (in a
 particular order) and then decide I've got all the docs I need for my
 purposes, you can stop requesting the continuation of that cursor.



 -Hoss
 http://www.lucidworks.com/




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com
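
A concrete shape of the "sort by uniqueKey plus increasing range filter"
batching described above, as a SolrJ sketch (hypothetical core name, and
assuming the uniqueKey field is a string called "id"):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class DumpAllDocs {
  public static void main(String[] args) throws SolrServerException {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastId = null;
    while (true) {
      SolrQuery q = new SolrQuery("{!cache=false}*:*");
      q.setSort("id", SolrQuery.ORDER.asc);   // sort by the uniqueKey
      q.setRows(1000);
      if (lastId != null) {
        // strictly greater than the last id of the previous batch
        // (ids containing special characters would need escaping)
        q.addFilterQuery("id:{" + lastId + " TO *]");
      }
      SolrDocumentList docs = solr.query(q).getResults();
      if (docs.isEmpty()) {
        break;
      }
      for (SolrDocument doc : docs) {
        // process each document here
      }
      lastId = (String) docs.get(docs.size() - 1).getFieldValue("id");
    }
    solr.shutdown();
  }
}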


Solr could replace shards

2013-12-18 Thread Max Hansmire
I am considering using SolrCloud, but I have a use case that I am not sure
if it covers.

I would like to keep an index up to date in realtime, but also I would like
to sometimes restate the past. The way that I would restate the past is to
do batch processing over historical data.

My idea is that I would have the Solr collection sharded by date range. As
I move forward in time I would add more shards.

For restating historical data I would have a separate process that actually
indexes a shard's worth of data. (This keeps the servers that are meant for
production search from having to handle the load of indexing historical data.)
I would then move the index files to the Solr servers and register the
newly created index with the server, replacing the existing shard.

I used to be able to do something similar pre-SolrCloud by using the core
admin. But this did not have the benefit of having one search for the
entire collection. I had to manually query each of the cores to get the
full search index.

Essentially the question is:
1- is it possible to shard by date range in this way?
2- is it possible to swap out the index used by a shard?
3- is there a different way I should be thinking of this?

Max
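
For what it's worth, date-range shards of this kind are sometimes modelled with
the implicit router, where shard names are chosen by the client and further
shards can be added later; a rough sketch with hypothetical collection and
shard names (swapping a freshly built index into a shard would still be manual
core-admin work):

http://localhost:8983/solr/admin/collections?action=CREATE&name=events&router.name=implicit&shards=2013-11,2013-12
http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=events&shard=2014-01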


Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-18 Thread Anshum Gupta
Hi,

Is the parent shard currently active? What does the clusterstate.json say?
The subshard could be stuck in down when it's trying to recover but as far
as I remember, the sub-shards only get marked active (and the parent goes
inactive) once the recovery and replication (for as many replicas as the
parent shard) are completed.


On Wed, Dec 18, 2013 at 10:01 AM, cwhi chris.whi...@gmail.com wrote:

 I called SPLITSHARD on a shard in an existing SolrCloud instance, where the
 shard had ~1 million documents in it.  It's been about 3 hours since that
 splitting completed, and the subshards are still stuck in a Down
 state.  They are reported as down in localhost/solr/#/~cloud, and I'm
 unable
 to query my index.

 How can we recover from a failed SPLITSHARD operation?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Shards-stuck-in-down-state-after-splitting-shard-How-can-we-recover-from-a-failed-SPLITSHARD-tp4107297.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

Anshum Gupta
http://www.anshumgupta.net
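
To see what the cluster state actually says, one option (assuming the zkcli.sh
shipped under example/cloud-scripts and a reachable ZooKeeper) is:

example/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd get /clusterstate.json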


Re: solrcloud no server hosting shard

2013-12-18 Thread Furkan KAMACI
Hi Giuseppe;

First of all, you should give us the full error log so we can understand the
reason behind the error. On the other hand, it is not a must to have extra
replicas for your shards, but you really should consider having replicas. When
you start up a new Solr instance, it will be assigned to one of your shards in
a round-robin fashion, as directed by the ZooKeeper ensemble.

Thanks;
Furkan KAMACI
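
For what it's worth, on 4.x a replica is usually added by creating a new core
on the extra node that points at the existing collection and shard; the new
core then pulls the shard's index from the leader. A rough sketch with
hypothetical host, collection and shard names:

http://newnode:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica2&collection=mycollection&shard=shard1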


On Wednesday, 18 December 2013, gf80 giuseppe_fe...@hotmail.com wrote:
 Hi guys,

 before starting, note that I am new to Solr and in particular to
 SolrCloud.
 I have to index many documents (10 million). Last week I completed my
 import handler and configuration, so I started the import on Solr
 using SolrCloud with 10 shards (and without replicas :S ) on VMs with 30 GB
 of RAM and good performance (I don't know if 10 shards are too many).
 Today I saw that during a specific update (the deletion of a wrong document)
 the following exception was thrown:

 org.apache.solr.common.SolrException: no servers hosting shard:
 at

org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
 at

org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
 at java.lang.Thread.run(Unknown Source)

  After restarting SolrCloud, the exception is thrown during the update of another
  document. Nevertheless, search queries work fine but are slow.

  Thank you so much in advance if you can help me with this exception or if
  you have any suggestions for my configuration. Is it a must to have some
  replicas of shards? Can I add a replica now, after some millions of documents
  have been indexed? To configure SolrCloud I have essentially used the default
  configuration and I have read the general SolrCloud wiki; are there any
  suggestions for using Solr with this number of documents in a more comfortable
  way?

 Thanks again,
 Giuseppe

 p.s. sorry for my english :)



 --
 View this message in context:
http://lucene.472066.n3.nabble.com/solrcloud-no-server-hosting-shard-tp4107268.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: No registered leader was found, but the UI says that I have.

2013-12-18 Thread Furkan KAMACI
Hi;

Do you have any error log for leader election? Also do you have this error
always, or only during the time period while the other replica is in
recovery mode?

Thanks;
Furkan KAMACI


On Wednesday, 18 December 2013, yriveiro yago.rive...@gmail.com wrote:
 I'm getting an error on Solr 4.6.0 about leader registration; the admin UI shows
 this:

 http://picpaste.com/a839446d0808df205aa7be78c780ed32.png

 But my logs says:

 ERROR - dat6 - 2013-12-18 11:43:54.253;
 org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException:
 No registered leader was found, collection:statistics-13 slice:shard23_1
 at

org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
 at

org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
 at

org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223)
 at

org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
 at

org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at

org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
 at

org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
 at

org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
 at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
 at

org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
 at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
 at
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
 at

org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
 at

org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
 at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
 at

org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at

org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at

org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
 at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
 at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 at

org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at

org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at

org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at

org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at

org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at

org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at

org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at

org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at

org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at

org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at

org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at

org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 at

org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 at
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
 at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at

org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at

org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 

Re: PeerSync Recovery fails, starting Replication Recovery

2013-12-18 Thread Furkan KAMACI
Hi Anca;

Could you check the conversation at here:
http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html

Thanks;
Furkan KAMACI


On Wednesday, 18 December 2013, Anca Kopetz anca.kop...@kelkoo.com wrote:
 Hi,

  In our SolrCloud cluster (2 shards, 8 replicas), the replicas go from
 time to time into the recovering state, and it takes more than 10 minutes for
 them to finish recovering.

 In logs, we see that PeerSync Recovery fails with the message :

 PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many
updates received since start - startingUpdates no longer overlaps with our
currentUpdates

 Then Replication Recovery starts.

  Is there something we can do to avoid the failure of PeerSync recovery so
 that the recovery process is faster (less than 10 minutes)?

 The full trace log is here :

 2013-12-05 13:51:53,740 [http-8080-46] INFO
org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705
- It has been requested that we recover
 2013-12-05 13:51:53,740 [http-8080-112] INFO
org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705
- It has been requested that we recover
 2013-12-05 13:51:53,740 [http-8080-112] INFO
org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658  -
[admin] webapp=null path=/admin/cores
 params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0
QTime=0
 2013-12-05 13:51:53,740 [Thread-1544] INFO
org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green
state=recovering
 2013-12-05 13:51:53,741 [http-8080-46] INFO
org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658  -
[admin] webapp=null path=/admin/cores
 params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0
QTime=1
 2013-12-05 13:51:53,740 [Thread-1543] INFO
org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green
state=recovering
 2013-12-05 13:51:53,743 [Thread-1544] INFO
org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on
descriptor - reading it from system property
 2013-12-05 13:51:53,746 [Thread-1543] INFO
org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on
descriptor - reading it from system property
 2013-12-05 13:51:53,755 [Thread-1543] WARN
org.apache.solr.cloud.RecoveryStrategy:close:105  - Stopping recovery for
zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green
 2013-12-05 13:51:53,756 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:run:216  - Starting recovery
process.  core=fr_green recoveringAfterStartup=false
 2013-12-05 13:51:53,762 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:doRecovery:495  - Finished recovery
process. core=fr_green
 2013-12-05 13:51:53,762 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:run:216  - Starting recovery
process.  core=fr_green recoveringAfterStartup=false
 2013-12-05 13:51:53,765 [RecoveryThread] INFO
org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green
state=recovering
 2013-12-05 13:51:53,765 [RecoveryThread] INFO
org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on
descriptor - reading it from system property
 2013-12-05 13:51:53,767 [RecoveryThread] INFO
org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103  -
Creating new http client,
 config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
 2013-12-05 13:51:54,777 [main-EventThread] INFO
org.apache.solr.common.cloud.ZkStateReader:process:210  - A cluster state
change: WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
 2013-12-05 13:51:56,804 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:doRecovery:356  - Attempting to
PeerSync from http://solr-02/searchsolrnodefr/fr_green/ core=fr_green -
recoveringAfterStartup=false
 2013-12-05 13:51:56,806 [RecoveryThread] WARN
org.apache.solr.update.PeerSync:sync:232  - PeerSync: core=fr_green url=
http://solr-08/searchsolrnodefr too many updates received since start -
startingUpdates no longer overlaps with our currentUpdates
 2013-12-05 13:51:56,806 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:doRecovery:394  - PeerSync Recovery
was not successful - trying replication. core=fr_green
 2013-12-05 13:51:56,806 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:doRecovery:397  - Starting
Replication Recovery. core=fr_green
 2013-12-05 13:51:56,806 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:doRecovery:399  - Begin buffering
updates. core=fr_green
 2013-12-05 13:51:56,806 [RecoveryThread] INFO
org.apache.solr.cloud.RecoveryStrategy:replicate:127  - Attempting to
replicate from http://solr-02/searchsolrnodefr/fr_green/. core=fr_green
 2013-12-05 13:51:56,806 [RecoveryThread] INFO
org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103  -
Creating new http client,

Solr 4.5 - Solr Cloud is creating new cores on random nodes

2013-12-18 Thread Ryan Wilson
Hello all,

I am currently in the process of building out a solr cloud with solr 4.5 on
4 nodes with some pretty hefty hardware. When we create the collection we
have a replication factor of 2 and store 2 replicas per node.

While we have been experimenting, which has involved bringing nodes up and
down as well as tanking them with OOM errors while messing with jvm
settings, we have observed a disturbing trend where we will bring nodes
back up and suddenly shard x has 6 replicas spread across the nodes. These
replicas will have been created with no action on our part and we would
much rather they not be created at all.

I have not been able to determine whether this is a bug or a feature. If
it's a bug, I will happily provide what I can to track it down. If it is a
feature, I would very much like to turn it off.

Any information is appreciated.

Regards,
Ryan Wilson
rpwils...@gmail.com


email datasource connect timeout issue

2013-12-18 Thread xie kidd
Hi all,

When I try to set up an email data source as described at
http://wiki.apache.org/solr/MailEntityProcessor , a connect timeout
exception happens.  I am sure the user and password are correct, and the
RSS data source also works well. Can anyone do me a favor?
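
For reference, the wiki-style data-config such a setup usually follows looks
roughly like this (the account details below are placeholders):

<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
            user="someone@example.com"
            password="secret"
            host="imap.gmail.com"
            protocol="imaps"
            folders="INBOX"/>
  </document>
</dataConfig>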

This issue is on Solr 4.5 with Tomcat 7; the exception information is as follows:
--

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
Connection failed Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
Connection failed Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Connection failed Processing Document # 1
at 
org.apache.solr.handler.dataimport.MailEntityProcessor.connectToMailBox(MailEntityProcessor.java:271)
at 
org.apache.solr.handler.dataimport.MailEntityProcessor.getNextMail(MailEntityProcessor.java:121)
at 
org.apache.solr.handler.dataimport.MailEntityProcessor.nextRow(MailEntityProcessor.java:112)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 5 more
Caused by: javax.mail.MessagingException: Connection timed out;
  nested exception is:
java.net.ConnectException: Connection timed out
at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:571)
at javax.mail.Service.connect(Service.java:288)
at javax.mail.Service.connect(Service.java:169)
at 
org.apache.solr.handler.dataimport.MailEntityProcessor.connectToMailBox(MailEntityProcessor.java:267)
... 10 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:542)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:570)
at 
sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160)
at com.sun.mail.util.SocketFetcher.createSocket(SocketFetcher.java:233)
at com.sun.mail.util.SocketFetcher.getSocket(SocketFetcher.java:189)
at com.sun.mail.iap.Protocol.init(Protocol.java:107)
at com.sun.mail.imap.protocol.IMAPProtocol.init(IMAPProtocol.java:104)
at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:538)
... 13 more

--

Thanks in advance.

Thanks,
Kidd


For the ideal, never give up, fighting!


Re: Solr-839 and version 4.5 (XmlQueryParser)

2013-12-18 Thread Puneet Pawaia
Hi,

Just in case it is of use to anyone, I managed to compile the 4.0 patch by
changing the line where new CoreParser is created to below.

CoreParser parser = new CoreParser(defaultField,
getReq().getSchema().getQueryAnalyzer());

The parser seems to work for the simple tests that I have done so far.

Regards
Puneet


On Tue, Dec 17, 2013 at 10:18 PM, Daniel Collins danwcoll...@gmail.comwrote:

 Do you need it?  Our workaround was to pass null, from what we could tell
  the (Lucene) QueryParser which it needs is only used for parsing UserQuery
 constructs, and we never used that construct.  The problem is that
 SolrQueryParser is derived from Solr's QueryParser class which has now
 diverged from the Lucene one.

 Will try to get our patches updated and issued over Xmas.


 On 17 December 2013 14:53, Puneet Pawaia puneet.paw...@gmail.com wrote:

  Hi All,
 
  Not being a Java expert, I used Daniel Collins' modification to patch
 with
  version 4.0 source. It works for a start. Have not been able to test
 much.
 
  Next, I tried the same modifications with Solr 4.6.0. This throws up 2
  errors.
 
  I resolved
  public Query parse() throws ParseException {
  by changing to
  public Query parse() throws SyntaxError {
 
  However, I am not able to get the second error resolved.
  SolrQueryParser lparser;
  CoreParser parser = new
 CoreParser(getReq().getSchema().getQueryAnalyzer(),
  lparser);
 
  CoreParser does not take SolrQueryParser as its parameter. It asks for
  QueryParser.
 
  Is there something I am missing or should be doing that I am not doing?
 
  TIA
 
  Regards
  Puneet
 



Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
Hi Josip

that's quite weird; in my experience highlighting is strict on string fields,
which need an exact match, while text fields should be fine.

I copied your schema definition and did a quick test in a new core; everything
is the default from the tutorial, and the search component is
using solr.HighlightComponent.

A search on searchable_text can highlight text. I copied your search URL and
just changed the host part; the input parameters are exactly the same.

The result is attached.

Can you upload your complete solrconfig.xml and schema.xml?


On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote:

 On 18.12.2013 09:55, Liu Bo wrote:

 hi Josip


 hi liu,


  for the 1 question we've done similar things: copying search field to a
 text field. But highlighting is normally on specific fields such as tittle
 depending on how the search content is displayed to the front end, you can
 search on text and highlight on the field you wanted by specify hl.fl

 ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl


 that's exactly what I'm doing in that pastebin:

 http://pastebin.com/13Uan0ZF

 I'm searching there for 'q=searchable_text:labore'; this is present in 'text'
 and in the copyfield 'searchable_text', but it is not highlighted in 'text'
 (hl.fl=text).

 The same query works if I set 'q=text:labore', as you can see in

 http://pastebin.com/4CP8XKnr

 For question 2, I figured out that the PostingsSolrHighlighter ellipsis
 is not, as I thought, for adding an ellipsis to the start and/or end of the
 highlighted text. It is instead used to join multiple snippets together
 if snippets is > 1.

 cheers

 josip




 On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

  Hi @all,

 i am playing with the PostingsSolrHighlighter. I'm running solr 4.6.0
 and my configuration is from here:

 https://lucene.apache.org/solr/4_6_0/solr-core/org/
 apache/solr/highlight/
 PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

  <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
  </searchComponent>

 So this is working just fine, but now i have some questions:

 1.) With the old default highlighter component it was possible to search
 in searchable_text and to retrive highlighted text. This is
 essential,
 because we use copyfield to put almost everything to searchable_text
 (title, subtitle, description, ...)

 2.) I can't get ellipsis working i tried hl.tag.ellipsis=...,
 f.text.hl.tag.ellipsis=..., configuring it in RequestHandler noting seems
 to work, maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic









-- 
All the best

Liu Bo
http://localhost:8080/solr/try/select?wt=json&fl=text%2Cscore&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0

{
  "responseHeader": {
    "status": 0,
    "QTime": 36,
    "params": {
      "sort": "score desc",
      "fl": "text,score",
      "start": "0",
      "q": "(searchable_text:labore)",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata

Concurrent request configurations for Solr Processors

2013-12-18 Thread Dileepa Jayakody
Hi All,

I have written a custom update request processor and configured an
UpdateRequestProcessorChain in solrconfig.xml as below:

<updateRequestProcessorChain name="stanbolInterceptor">
  <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Can I please know how I can configure the number of concurrent requests for
my processor? What is the default number of concurrent requests per Solr
processor?

Thanks,
Dileepa