Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Does this work? I can suggest -XX:-UseLoopPredicate to switch off loop predicates. Which version of Java 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40; it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212. This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
Re: SolrCore 'collection1' is not available due to init failure
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock: java.io.FileNotFoundException: /usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock (Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) It seems to be a permission problem: the user that starts Tomcat doesn't have permission to access your index folder. Try granting read and write permission on your Solr data folder to that user, then restart Tomcat and see what happens. -- All the best Liu Bo
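A minimal sketch of the suggested fix. The data path comes from the stack trace above; the "tomcat" user and "tomcat6" service name are assumptions — adjust them to your installation.

```shell
# Solr data folder from the stack trace above:
SOLR_DATA=/usr/share/solr-4.5.0/example/solr/collection1/data

# 1) give the user that runs Tomcat ownership of the data folder (as root):
#      chown -R tomcat:tomcat "$SOLR_DATA"
# 2) make sure that user can read and write it:
#      chmod -R u+rwX "$SOLR_DATA"
# 3) if a stale write.lock was left behind by a run under another user,
#    remove it while Tomcat is stopped:
#      rm -f "$SOLR_DATA/index/write.lock"
# 4) restart Tomcat:
#      service tomcat6 restart
echo "target: $SOLR_DATA"
```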
Re: Questions developing custom functionquery
Hello JT, what is the field and fieldType definition for resname? Can't you check how '/some example/data/here/2013/09/12/testing.text' is handled on the analysis page in the Solr Admin UI? On Fri, Oct 11, 2013 at 4:53 AM, Richard Lee rockiee...@gmail.com wrote: It seems what you got is the terms rather than the raw data. Maybe you should check the API docs for more details. On 2013-10-11 3:56 AM, JT handyrems...@gmail.com wrote: I'm running into some issues developing a custom function query. My goal is to be able to implement a custom sorting technique. I have a field defined called resname; it is a single-valued str. Example: <str name="resname">/some example/data/here/2013/09/12/testing.text</str> I would like to do a custom sort based on this resname field. Basically, I would like to parse out the date there (2013/09/12) and sort on that date. I've followed various tutorials - http://java.dzone.com/news/how-write-custom-solr - http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery I'm at the point where my code compiles, runs, executes, etc. Solr is happy with my code. I have classes that inherit from ValueSourceParser and ValueSource, etc. I've overridden parse and instantiated my class with a ValueSource: public ValueSource parse(FunctionQParser fqp) { return new MyCustomClass(fqp.parseValueSource()); } public class MyCustomClass extends ValueSource { ValueSource source; public MyCustomClass(ValueSource source) { this.source = source; } public FunctionValues getValues(Map context, AtomicReaderContext readerContext) { final FunctionValues sourceDV = source.getValues(context, readerContext); return new IntDocValues(this) { public int intVal(int doc) { // parse the value of resname here String value = sourceDV.strVal(doc); ...more stuff } }; } } The issue I'm running into is that my call to sourceDV.strVal(doc) only returns part of the field, not all of it. It appears to be very random. I guess my actual question is, how do I access/reference the EXACT RAW value of a field while writing a function query.
Do I need to change my ValueSource to a String, then somehow look up the field name while inside my getValues call? Is there a way to access the raw field data when referencing it as a FunctionValues? Maybe I'm going about this totally incorrectly? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
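Since the FunctionValues plumbing is the part that already works, a standalone sketch of just the date-extraction logic may help (plain JDK, no Lucene classes; the class and method names are made up for illustration). It turns the date embedded in the path into a sortable integer:

```java
// Extracts the /yyyy/MM/dd/ portion of a resname-style path and turns it
// into a sortable int (e.g. 20130912). Paths without a date sort first as 0.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathDateSortKey {
    private static final Pattern DATE =
        Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/");

    public static int sortKey(String path) {
        Matcher m = DATE.matcher(path);
        if (!m.find()) {
            return 0; // no date found in the path
        }
        return Integer.parseInt(m.group(1)) * 10000
             + Integer.parseInt(m.group(2)) * 100
             + Integer.parseInt(m.group(3));
    }
}
```

Inside intVal(doc) you would then return sortKey(sourceDV.strVal(doc)). Note that the "only part of the field" symptom usually means the source field is tokenized, which matches Richard's hint that you are seeing terms rather than the raw value; pointing the function at a non-analyzed (plain str) copy of the field should make strVal return the whole value.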
Re: Multiple schemas in the same SolrCloud ?
Thanks! My only doubt is: upload a new set of configuration files to the same configuration name, like so: Initial configuration: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_initial/ -confname my_custom_config and afterwards, to change it, do: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_changed/ -confname my_custom_config Is this correct? If so, what happens afterwards: will ZK distribute these changes to all cores and reload them? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094895.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple schemas in the same SolrCloud ?
Here is a topic you should read: http://lucene.472066.n3.nabble.com/Reloading-config-to-zookeeper-td4021901.html 2013/10/11 maephisto my_sky...@yahoo.com Thanks! My only doubt is: upload a new set of configuration files to the same configuration name, like so: Initial configuration: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_initial/ -confname my_custom_config and afterwards, to change it, do: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_changed/ -confname my_custom_config Is this correct? If so, what happens afterwards: will ZK distribute these changes to all cores and reload them?
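To make the steps in the quoted question concrete: re-uploading a config with upconfig overwrites the files in ZooKeeper, but the cores do not reload by themselves; you then trigger a reload through the Collections API. A sketch (host, ports, and the collection name `mycollection` are assumptions):

```shell
# 1) re-upload the changed config under the same config name:
#      zkcli.sh -zkhost localhost:9983 -cmd upconfig \
#               -confdir conf_changed/ -confname my_custom_config
# 2) ask every core of the collection to pick it up:
RELOAD_URL="http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
#      curl "$RELOAD_URL"
echo "$RELOAD_URL"
```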
Solr Cloud Basic Authentication
I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would like to add some basic authentication. My question is: how can I provide the credentials so that they're used by the Collections API when creating a new collection, or by ZK? Are there any useful docs/wikis on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
For pre 4.x Solr (aka Solr 3.x) basic authentication works fine. Check this site: http://wiki.apache.org/solr/SolrSecurity Even master-slave replication architecture (*not* SolrCloud) works for me. There could be some problems with *cross-shard* queries etc. though (see SOLR-1861, SOLR-3421). I know I haven't answered your question but hopefully I have given you some more information on the subject. Best regards, Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 10:55 Subject:Solr Cloud Basic Authentification I've deployed a SolrCloud cluster in Jetty 9 using solr 4.4.0 and I would like to add some basic authentification. My question is how can I provide the credentials so that they're used in the collection API when creating a new collection or by ZK? Are there any useful docs/wiki on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Please help!, Highlighting exact phrases with solr
Dear Koji, Thanks a lot for your answer and Sorry about my english I tried to configure FastVectorHighlighterhttp://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter However, I have this error: lst name=error str name=msg fragCharSize(1) is too small. It must be 18 or higher. /str str name=trace java.lang.IllegalArgumentException: fragCharSize(1) is too small. It must be 18 or higher. at org.apache.lucene.search.vectorhighlight.BaseFragListBuilder.createFieldFragList(BaseFragListBuilder.java:51) at org.apache.lucene.search.vectorhighlight.WeightedFragListBuilder.createFieldFragList(WeightedFragListBuilder.java:38) at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:195) at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:184) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:588) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:413) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:139) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) /str int name=code500/int /lst /response Then, If I modify like this: (setHighlightFragsize(1) -- setHighlightFragsize(80)): SolrQuery solrQuery = 
new SolrQuery(); solrQuery.setQuery(queryEnt); solrQuery.set("collectionName", myCollection); solrQuery.addHighlightField("texto") .addHighlightField("titular") .setHighlightSnippets(50) .setHighlightFragsize(80); solrQuery.setHighlight(true); solrQuery.setHighlightRequireFieldMatch(true); solrQuery.set("hl.useFastVectorHighlighter", true); solrQuery.setHighlightSimplePre("<span class=\"item\">"); solrQuery.setHighlightSimplePost("</span>"); solrQuery.set("hl.usePhraseHighlighter", true); Then it works (the error disappears), but highlighting does not work :( : <lst name="highlighting"> <lst name="35254502"/> <lst name="35237409"/> </lst> <lst name="termVectors"> <str name="uniqueKeyFieldName">c_noticia</str> <lst name="warnings"> <arr name="noTermVectors"> <str>c_region</str> <str>c_idioma</str> <str>c_pais</str> <str>c_tipo</str> <str>c_categoria</str> <str>fecha_captura</str> <str>medio</str> <str>c_fuente_docu</str> </arr> </lst> <lst name="35254502"> <str name="uniqueKey">35254502</str> </lst> <lst name="35237409"> <str
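One hedged guess about why the snippet lists come back empty: the FastVectorHighlighter only works on fields indexed with term vectors, positions, and offsets, and the noTermVectors warnings in the response suggest at least some fields lack them. In schema.xml that would look roughly like this (field names taken from the mails; the type name is an assumption, and a full reindex is needed after the change):

```xml
<field name="texto"   type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="titular" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```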
Re: Multiple schemas in the same SolrCloud ?
Hi, kamaci. Does that mean I just need to upload new config files, and do not need to reload every node in SolrCloud, when I want to change my configurations? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094908.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Here is more information about security that you can use: http://wiki.apache.org/solr/SolrSecurity 2013/10/11 maephisto my_sky...@yahoo.com I've deployed a SolrCloud cluster in Jetty 9 using solr 4.4.0 and I would like to add some basic authentification. My question is how can I provide the credentials so that they're used in the collection API when creating a new collection or by ZK? Are there any useful docs/wiki on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Thank you! I'm more interested in the SolrCloud architecture, with shards, shard replicas, and distributed indexing and search. These are the features I use and would like to protect with some basic authentication. I imagine that there must be a way to have this, otherwise anybody could mess with or even drop my collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Please help!, Highlighting exact phrases with solr
Here is a similar question: http://search-lucene.com/m/vnMGKACGM1/%252218+or+higher.%2522subj=FastVectorHighlighter+and+hl+fragsize+parameter+set+to+zero+causes+exception and a related fixed issue: https://issues.apache.org/jira/browse/SOLR-1268 2013/10/11 Silvia Suárez s...@anpro21.com Dear Koji, Thanks a lot for your answer, and sorry about my English. I tried to configure FastVectorHighlighter http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter However, I got this error: fragCharSize(1) is too small. It must be 18 or higher.
Re: Solr Cloud Basic Authentication
One possible solution is to firewall access to the SolrCloud server(s). Only proxy/load-balancing servers should have unrestricted access to the Solr infrastructure. Then you can implement basic/advanced authentication on the proxy/LB side. Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:17 Subject: Re: Solr Cloud Basic Authentication Thank you! I'm more interested in the SolrCloud architecture, with shards, shard replicas, and distributed indexing and search. These are the features I use and would like to protect with some basic authentication. I imagine that there must be a way to have this, otherwise anybody could mess with or even drop my collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Thank you, but I'm afraid that wiki page does not cover my topic of interest. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
If you want to deploy basic authentication so that a login is required when creating collections, it is simply a matter of constraining a URL pattern (e.g. /solr/admin/collections/*). Maybe this link will help: http://stackoverflow.com/questions/5323855/jetty-webserver-security/5332049#5332049 But keep in mind that intra-node requests in SolrCloud must also be authenticated (because the HTTP stack is used). If I understand correctly, this is currently not possible. Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:25 Subject: Re: Solr Cloud Basic Authentication Thank you, but I'm afraid that wiki page does not cover my topic of interest -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html Sent from the Solr - User mailing list archive at Nabble.com.
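The url-pattern approach from that StackOverflow answer looks roughly like this in Jetty's web.xml/webdefault.xml. This is a sketch only: the realm name, role, and pattern are assumptions, and as noted above it does not cover intra-node SolrCloud requests:

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Collections API</web-resource-name>
    <url-pattern>/admin/collections/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr</realm-name>
</login-config>
```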
Cores with a lot of folders with prefix index.XXXXXXX
Hi, I have some cores with a lot of folders named index.X; my question is why. The collateral effect of this is shards that are 50% larger than their replicas on other nodes. Is there any way to delete these folders to free space? Is it a bug? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cores with a lot of folders with prefix index.XXXXXXX
I think this is connected to replications being made. I also have quite a few of them, but currently I am not worried :) Primož From: yriveiro yago.rive...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:54 Subject: Cores with lot of folders with prefix index.XXX Hi, I have some cores with a lot of folders named index.X; my question is why. The collateral effect of this is shards that are 50% larger than their replicas on other nodes. Is there any way to delete these folders to free space? Is it a bug? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrNet sample
I want to change the schema file of the SolrNet sample, add an XML file, and facet the data. What do I need to do in the sample file? Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!
Re: Cores with a lot of folders with prefix index.XXXXXXX
I have SSDs, therefore my space is like gold; I can't afford 30% of my space wasted on failed replications, or replications that are not cleaned up. The question for me is whether this is normal behaviour or a bug. If it is normal behaviour, I have a problem, because an SSD with more than 512G is expensive. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote: I think this is connected to replications being made? I also have quite some of them but currently I am not worried :)
Re: Please help!, Highlighting exact phrases with solr
Hi, Thanks for your answer, Furkan. I'm sorry, I don't understand the proposed solution... I did this: 1. eliminate the hl.useHighlighter parameter 2. introduce hl.useFastVectorHighlighter However, the result is the same... is something missing? Thanks a lot in advance for your help... Sil. *Tecnologías y SaaS para el análisis de marcas comerciales.* 2013/10/11 Furkan KAMACI furkankam...@gmail.com Here is a similar question: http://search-lucene.com/m/vnMGKACGM1/%252218+or+higher.%2522subj=FastVectorHighlighter+and+hl+fragsize+parameter+set+to+zero+causes+exception and a related fixed issue: https://issues.apache.org/jira/browse/SOLR-1268
Re: Cores with a lot of folders with prefix index.XXXXXXX
Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable check index.properties file under data folder and try to determine which folders can be safely deleted. Primož From: Yago Riveiro yago.rive...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 12:13 Subject:Re: Cores with lot of folders with prefix index.XXX I have ssd's therefor my space is like gold, I can have 30% of my space waste in failed replications, or replications that are not cleaned. The question for me is if this a normal behaviour or is a bug. If is a normal behaviour I have a trouble because a ssd with more than 512G is expensive. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote: I think this is connected to replications being made? I also have quite some of them but currently I am not worried :)
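A sketch of the suggested check: the data folder's index.properties names the index.* folder actually in use, and every other index.* folder is a cleanup candidate. The layout below is mocked up in a temp directory for the demo (the folder names are made up); stop the node and take a backup before deleting anything for real.

```shell
# mock up a data dir like the ones discussed above:
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/index.20131011120000" "$DATA_DIR/index.20131010090000"
echo "index=index.20131011120000" > "$DATA_DIR/index.properties"

# the folder Solr is actually using, per index.properties:
ACTIVE=$(grep '^index=' "$DATA_DIR/index.properties" | cut -d= -f2)
echo "active: $ACTIVE"

# every other index.* folder is a candidate for cleanup:
for d in "$DATA_DIR"/index.*/; do
  name=$(basename "$d")
  [ "$name" != "$ACTIVE" ] && echo "candidate for deletion: $name"
done
rm -r "$DATA_DIR"
```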
Re: Cores with a lot of folders with prefix index.XXXXXXX
The thread that you point to is about master/slave replication. Is this issue valid in a SolrCloud context? I checked index.properties and indeed the variable index=index.X points to a folder; can the others be deleted without any scary side effects? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote: Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable, check the index.properties file under the data folder and try to determine which folders can be safely deleted. Primož
Re: Cores with a lot of folders with prefix index.XXXXXXX
There are open issues related to extra index.XXX folders lying around if replication/recovery fails. See https://issues.apache.org/jira/browse/SOLR-4506 On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro yago.rive...@gmail.com wrote: The thread that you point to is about master/slave replication. Is this issue valid in a SolrCloud context? I checked index.properties and indeed the variable index=index.X points to a folder; can the others be deleted without any scary side effects? -- Regards, Shalin Shekhar Mangar.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Not so hard switching it to Oracle JDK 7u40: just download it and change the JAVA_HOME path in /etc/default/jetty, so it's not necessary to switch the Java version with update-java-alternatives. The machine is 64-bit :) 2013/10/11 Bill Bell billnb...@gmail.com Does this work? I can suggest -XX:-UseLoopPredicate to switch off loop predicates. Which version of Java 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40; it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212. This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
Re: Cores with lot of folders with prefix index.XXXXXXX
Honestly I don't know for sure if you can delete them. Maybe make a backup, then delete them and see if it still works :) Replication works differently in the SolrCloud world, as far as I currently know. I don't think there are any additional index.* folders because fallback does not work in SolrCloud (someone correct me if I am wrong!). Primož
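The cleanup being discussed above can be sketched in shell. The directory layout and folder names below are hypothetical; the point is that index.properties in a core's data directory names the live index folder, and any other index.* folder is a candidate leftover. The sketch only prints candidates, it deletes nothing:

```shell
# Hypothetical data dir: index.properties names the live index folder.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/index.20131011" "$DATA_DIR/index.20130901"
printf 'index=index.20131011\n' > "$DATA_DIR/index.properties"

# The folder named in index.properties is in use; other index.* folders
# are leftovers (e.g. from failed replications) and candidates for cleanup.
LIVE=$(grep '^index=' "$DATA_DIR/index.properties" | cut -d= -f2)
for d in "$DATA_DIR"/index.*; do
  name=$(basename "$d")
  [ "$name" = "$LIVE" ] && continue
  [ "$name" = "index.properties" ] && continue
  echo "stale: $name"   # prints "stale: index.20130901"
done
```

As the thread notes, back up before removing anything; SOLR-4506 tracks cases where such folders are left behind.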
Re: Cores with lot of folders with prefix index.XXXXXXX
Thanks, I guess I was wrong after all in my last post. Primož From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 12:43 Subject: Re: Cores with lot of folders with prefix index.XXX There are open issues related to extra index.XXX folders lying around if replication/recovery fails. See https://issues.apache.org/jira/browse/SOLR-4506
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
So the main problem was that the libs must be copied to the WEB-INF/lib directory instead of the Jetty lib/ext directory. Is the fact that you should use WEB-INF/lib documented somewhere?
Re: Re: feedback on Solr 4.x LotsOfCores feature
bq: sharing the underlying solrconfig object the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode SOLR-4478 will NOT share the underlying config objects, it simply shares the underlying directory. Each core will, at least as presently envisioned, simply read the files that exist there and create their own solrconfig object. Schema objects may be shared, but not config objects. It may turn out to be relatively easy to do in the configset situation, but last time I looked at sharing the underlying config object it was too fraught with problems. bq: 15K cores is around 4 minutes I find this very odd. On my laptop, spinning disk, I think I was seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I have no idea what's going on here. If this is just reading the files, you should be seeing horrible disk contention. Are you on some kind of networked drive? bq: To do that in background and to block on that request until core discovery is complete, should not work for us (due to the worst case). What other choices are there? Either you have to do it up front or with some kind of blocking. Hmmm, I suppose you could keep some kind of custom store (DB? File? ZooKeeper?) that would keep the last known layout. You'd still have some kind of worst-case situation where the core you were trying to load wouldn't be in your persistent store and you'd _still_ have to wait for the discovery process to complete. bq: and we will use the cores Auto option to create load or only load the core on Interesting. I can see how this could all work without any core discovery but it does require a very specific setup. 
On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier olivier.so...@worldline.com wrote: The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, including the new Cores options: - numBuckets, to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir - Auto, with 3 different values: 1) false: default behaviour 2) createLoad: create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad: load the core on the fly on the first incoming request (update, select), if it exists on disk Concerning: - sharing the underlying solrconfig object: the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. We need to test it for our use case. If another solution exists, please tell me. We are very interested in such functionality and willing to contribute, if we can. - the possibility of LotsOfCores in SolrCloud: we don't know in detail how SolrCloud works, but one possible limit is the maximum number of entries that can be added to a ZooKeeper node. Maybe a solution would be just a kind of hashing in the ZooKeeper tree. - the time to discover cores in Solr 4.4: with a spinning disk under Linux, all cores with transient=true and loadOnStartup=false, and the Linux buffer cache empty before starting Solr, 15K cores takes around 4 minutes. It's linear in the number of cores, so for 50K it's more than 13 minutes. In fact, it corresponds to the time to read all core.properties files. Doing that in the background and blocking on a request until core discovery is complete will not work for us (due to the worst case). So we will just disable core discovery, because we don't need to know all cores from the start: start Solr without any core entries in solr.xml, and use the cores Auto option to create and load, or only load, the core on the fly, based on the existence of the core on disk (absolute path calculated from the core name).
Thanks for your interest, Olivier From: Erick Erickson [erickerick...@gmail.com] Sent: Monday, 7 October 2013 14:33 To: solr-user@lucene.apache.org Subject: Re: feedback on Solr 4.x LotsOfCores feature Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. A couple of questions though: bq: We added 2 Cores options: Do you mean you patched Solr? If so, are you willing to share the code back? If both are yes, please open a JIRA, attach the patch and assign it to me. bq: the number of file descriptors, it used a lot (need to increase global max and per process fd) Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady. bq: the overhead to parse solrconfig.xml and load dependencies to open each core Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA on the problems I foresaw. There may be some action on this in the future. bq: lotsOfCores doesn't work with SolrCloud Right, we haven't concentrated on that, it's an interesting problem. In
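For readers unfamiliar with the options Olivier relies on: core discovery in Solr 4.4+ reads a per-core core.properties file, and transient/loadOnStartup are the standard keys that make a core lazily loadable. A minimal hypothetical example (the core name is made up):

```properties
# core.properties for a lazily loaded, transient core (hypothetical name)
name=customer_12345
transient=true
loadOnStartup=false
```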
Re: Find documents that are composed of % words
bq: but you cannot ask this to client. You _can_ ask this of a client. IMO you are obligated to. A gentle way to do that is to say something like: "Solr doesn't do that out-of-the-box. I estimate it will take me XXX weeks to implement that in custom code. I will be unable to make progress on features A-F during that time. We can try tweaking Solr's ranking with the standard configurations and see if that satisfies your ranking requirements in YYY days. Please prioritize this relative to the other features." I have, quite literally, been in very similar situations. The client was convinced that BM25 ranking would give better results (this was before flexible scoring). They never needed the BM25 stuff. And their project was wildly successful. It's amazing how often software people don't give this feedback and then the project managers are surprised later by time/cost overruns or lack of features. We _must_ inform our clients of the costs of a feature and cheaper alternatives before they can make informed decisions. It's also amazing how often, when given realistic cost estimates, features like this get put off forever. On those occasions when it _does_ make a difference, at least the client has the information necessary to prioritize, and their expectations are set appropriately. Rant done, Erick On Thu, Oct 10, 2013 at 3:03 PM, shahzad73 shahzad...@yahoo.com wrote: Yes, the correct answer may be "Why", but you cannot ask this of the client. He thinks there is something interesting with this formula, and if it works we can index websites with Nutch + Solr and let users input queries that can locate documents which have a given % of foreign words other than the list provided. I will check the answer provided. Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094778.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
I can report that Jetty is running now with these options: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS @Guido: I reduced the min/max heap size to 256m; I will increase this on the production server.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
I can't tell for sure if that is documented somewhere. I did that straight away because of the years I have been developing Java webapps: a class-not-found usually means that some jar/class is missing somewhere. Because of all the issues I have seen with parent-child class loaders, my 1st choice is usually to make the jars/classes available to the relevant webapp classloader, in this case WEB-INF/lib of the Solr webapp; which, if running several webapps, will require more PermGen space, but in this case that is not a problem because there is only one webapp running, so there won't be several child class loaders loading the same set of classes from a jar. I have seen too many weird things with class loaders. Well, enough about class loading, I don't want to hijack the subject of this thread. HTH, Guido.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Remember the -server flag, which for Java webapps or dedicated Java services will improve things. Guido.
Re: Find documents that are composed of % words
Erick, agreed. The Solr + Nutch solution was proposed by myself, and I had never used these technologies; this is the first time I'm handling these two. My initial response to the client's requirements was to try to work with existing industry tools and then modify them according to the client's requirements, instead of re-inventing the wheel. I started from zero to this point and was not even aware Solr could handle this sort of requirement. Now all the infrastructure is there (crawler + index and an app to make searches); it's just this base requirement to fulfill. At the moment I am moving in the dark configuring Solr to handle this requirement. Here is what I am thinking of doing: develop a filter which is called at search time on a field that holds all tokens for the page. It will determine how many tokens (words) match the criteria words and which tokens remain, get the total number of tokens for the document, and produce the % ratio of matched to unmatched. Not sure the above solution will work, so I need suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094953.html Sent from the Solr - User mailing list archive at Nabble.com.
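The ratio described above (matched tokens over total tokens) is simple to state precisely; this is a plain shell sketch of the arithmetic only, not Solr filter code, and the token and criteria lists are made up:

```shell
# Count how many of a document's tokens appear in a criteria word list,
# then report matched/total. All data here is illustrative.
TOKENS="the quick brown fox jumps over the lazy dog"
CRITERIA="the fox dog"
total=0
matched=0
for t in $TOKENS; do
  total=$((total + 1))
  for c in $CRITERIA; do
    # count this token as matched on the first criteria hit
    [ "$t" = "$c" ] && matched=$((matched + 1)) && break
  done
done
echo "$matched/$total"   # prints "4/9"
```

A real implementation would do this per document against the indexed terms of the field, which is where the analysis-chain questions in this thread come in.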
Re: Find documents that are composed of % words
Aloke Ghoshal, I'm trying to work out your equation. I am using the standard schema provided by Nutch for Solr and am not aware of how to calculate myfieldwordcount in the first query; no idea where this count will come from. Is there any filter that will store the number of tokens generated for a specific field and store it as another field? That way we could use it. I'm also not sure what norm does in the second equation; I tried to find information on this online and have not found any yet. Please explain. Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094955.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
@Guido: I tried it before, and then I thought you marked just the server options. Because the -server causes a: sudo service jetty start * Starting Jetty servlet engine. jetty Invalid option -server Cannot parse command line arguments Or should I substitute server with ...? Options with -server: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
It is a JVM parameter, example: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms256m -Xmx256m If you want to concatenate more JVM parameters you do it like this: JAVA_OPTIONS=-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS Take a good look at the format, Guido.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Strange. When I add -server to the arguments, I get the error every time on Jetty startup: Invalid option -server Cannot parse command line arguments
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Oh, I got it ( http://stackoverflow.com/a/5273166/326905 ): at least 2 cores and at least 2 GB physical memory. Until now I'm using a VM with a single core and 1 GB RAM, so this will come later on the production server :) Thank you Guido.
SolrCloud on SSL
I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have SSL certs configured on them to protect the Solr indexes. Right now, I can do queries on idx1 and it works fine. If I try to query on idx3, I get: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at http://idx1:8443/solr/test1 (and then a long stack trace -- can't copy it, on a test network) Is there a spot in a Solr configuration where I can set this up to use HTTPS? Let me know if you need more information to determine the problem. Thanks! -- Chris
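One likely explanation, offered tentatively since the stack trace is abbreviated: each SolrCloud node registers its own base URL in ZooKeeper, and if the nodes registered with http:// the inter-node requests bypass the HTTPS setup, which matches the http://idx1:8443 URL in the error. In Solr releases that support SSL in SolrCloud, the cluster-wide scheme is set via the urlScheme cluster property; the zkcli.sh path and ZooKeeper address below are hypothetical, so check your release's SSL documentation before relying on this:

```shell
# Tell SolrCloud nodes to address each other over https
# (version-dependent; zk1:2181 is a placeholder for your ZK ensemble).
./cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd clusterprop -name urlScheme -val https
```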
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
If your single-core machine is 32-bit, use Oracle JDK 7u25 or Ubuntu OpenJDK 7; JDK 7u40 for 32-bit will corrupt indexes, as stated in the Lucene bug report. Guido.
On 11/10/13 12:26, Peter Schmidt wrote: I can report that jetty is running now with these options: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS @Guido: I reduced the min/max heap size to 256m, I will increase this on the production server. 2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com So the main problem was that the libs must be copied to the WEB-INF/lib directory instead of the jetty lib/ext directory. Is the fact that you should use WEB-INF/lib documented somewhere? 2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com Not so hard switching it to Oracle JDK 7u40. Just download it and change the JAVA_HOME path in /etc/default/jetty, so it's not necessary to switch the Java version with update-java-alternatives. The machine is 64bit :) 2013/10/11 Bill Bell billnb...@gmail.com Does this work? I can suggest -XX:-UseLoopPredicate to switch off predicates. ??? Which version of 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40, it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212 This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
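The JAVA_OPTIONS handling debated in this thread is ordinary shell variable expansion, so ordering matters: the base options must be assigned before the line that re-expands $JAVA_OPTIONS. A minimal sketch of an /etc/default/jetty-style fragment (the option values are the examples from the thread, not recommendations):

```shell
# Base JVM options first (example values from the thread above).
JAVA_OPTIONS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m"
# Append further options by re-expanding the variable; if this line ran
# before the one above, the $JAVA_OPTIONS expansion would be empty.
JAVA_OPTIONS="-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS"
echo "$JAVA_OPTIONS"
```

As the thread notes, -server belongs in the same string, but only a 64-bit (or otherwise server-capable) JVM will accept it; the 32-bit client JVM rejects it with "Invalid option -server".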
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
no it is 64bit and just a development VM. In production the solr will use multicore, also 64bit and some gb ram. 2013/10/11 Guido Medina guido.med...@temetra.com If your single core is at 32bits use Oracle JDK 7u25 or Ubuntu Open JDK 7, the JDK 7u40 for 32bits will corrupt indexes as stated on the lucene bug report. Guido.
Problems using DataImportHandler and TikaEntityProcessor
Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd
Re: Multiple schemas in the same SolrCloud ?
Upload the new configuration and then use the Collections API to reload your collection: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-ReloadaCollection -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094978.html Sent from the Solr - User mailing list archive at Nabble.com.
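As a hedged sketch of the reload call described on the linked page (host, port and the collection name "mycollection" are placeholders to adapt):

```shell
# Placeholders: adjust host, port and collection name for your cluster.
collection="mycollection"
url="http://localhost:8983/solr/admin/collections?action=RELOAD&name=${collection}"
echo "$url"
# Against a live SolrCloud node you would then issue: curl "$url"
```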
Re: Questions developing custom functionquery
Hey Mikhail, Thanks for responding. Field: resourcename Field-Type: org.apache.solr.schema.TextField All 9 boxes checked (indexed, tokenized, stored). I have various other fields (including MD5 checksums) in my schema. When I use an md5sum field (which is a str field, but doesn't have spaces, forward slashes, etc.) the plugin I've written performs exactly as I expected. I think the large part of my problem is that my ValueSource is being instantiated as the class StrFieldSource. When you call getValues on a StrFieldSource, you end up with a DocTermsIndexDocValues (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java#DocTermsIndexDocValues). Calling getVal() on a DocTermsIndexDocValues does some really weird stuff that I really don't understand. I assumed that calling ValueSource.getValues(...).strVal(int doc) would simply return the data that my field corresponds to, but I don't think that is true. It's possible I'm going about this wrong and need to redo my approach; I'm just currently at a loss for what that approach is. On Fri, Oct 11, 2013 at 2:48 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello JT, what is the field and fieldType definition for resname? Can't you check how '/some example/data/here/2013/09/12/ testing.text ' is handled on the analysis page in SolrAdmin? On Fri, Oct 11, 2013 at 4:53 AM, Richard Lee rockiee...@gmail.com wrote: seems what u got is the terms rather than the raw data. maybe u should check the api docs for more details 2013-10-11 3:56 AM, JT handyrems...@gmail.com wrote: I'm running into some issues developing a custom functionquery. My goal is to be able to implement a custom sorting technique. I have a field defined called resname, it is a single-value str. Example: str name=resname/some example/data/here/2013/09/12/testing.text/str I would like to do a custom sort based on this resname field.
Basically, I would like to parse out that date there (2013/09/12) and sort on that date. I've followed various tutorials - http://java.dzone.com/news/how-write-custom-solr - http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery I'm at the point where my code compiles, runs, executes, etc. Solr is happy with my code. I have classes that inherit from ValueSourceParser and ValueSource, etc. I've overridden parse and instantiated my class with a ValueSource: public ValueSource parse(FunctionQParser fqp) { return new MyCustomClass(fqp.parseValueSource()); } public class MyCustomClass extends ValueSource { ValueSource source; public MyCustomClass(ValueSource source) { this.source = source; } public FunctionValues getValues(Map context, AtomicReaderContext readerContext) { final FunctionValues sourceDV = source.getValues(context, readerContext); return new IntDocValues(this) { public int intVal(int doc) { // parse the value of resname here String value = sourceDV.strVal(doc); ...more stuff } }; } } The issue I'm running into is that my call to sourceDV.strVal(doc) only returns part of the field, not all of it. It appears to be very random. I guess my actual question is, how do I access / reference the EXACT RAW value of a field while writing a functionquery. Do I need to change my ValueSource to a String?, then somehow look up the field name while inside my getValues call? Is there a way to access the raw field data when referencing it as a FunctionValues? Maybe I'm going about this totally incorrectly? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
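Outside Solr, the date-parsing step the poster wants inside intVal() can be sketched on its own: pull the YYYY/MM/DD segment out of each resname-style path and prefix it as a sortable YYYYMMDD integer (the extra paths below are invented examples, not from the thread):

```shell
paths='/some example/data/here/2013/09/12/testing.text
/other/archive/2012/01/03/a.text
/misc/files/2014/05/20/b.text'
# Rewrite each line as "YYYYMMDD <original path>", then sort numerically
# on that leading integer -- the same keying a custom sort would use.
sorted=$(printf '%s\n' "$paths" \
  | sed -E 's|^(.*/)([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$|\2\3\4 \1\2/\3/\4/\5|' \
  | sort -n)
printf '%s\n' "$sorted"
```

The same extraction (regex on the stored string, then compare the resulting integer) is what the intVal() body above would need to do once it can see the whole field value.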
Re: Problems using DataImportHandler and TikaEntityProcessor
There may be a problem with your schema. Could you send your solr logs? 2013/10/11 Peter Bleackley bleackl...@zooey.co.uk Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd
Re: SolrCloud on SSL
On 10/11/2013 8:17 AM, Christopher Gross wrote: I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have SSL certs configured on them to protect the Solr indexes. Right now, I can do queries on idx1 and it works fine. If I try to query on idx3, I get: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at http://idx1:8443/solr/test1 (and then a long stack trace -- can't copy it, on a test network) Is there a spot in a Solr configuration that I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 I'm wondering why you want to do this, though. It adds extra CPU overhead. Perhaps not a lot, but it's not free. As for protecting Solr against eavesdropping, is it in a location where that's possible? The bottom line is this: People that you cannot trust should not have direct access to Solr. It should be firewalled so only trusted personnel and applications can talk to it. Anyone who has direct access to Solr can change your index, delete your index, and send denial of service queries. If you take steps to block access to the update handler(s) and the admin UI, denial of service queries are still possible. Blocking access to the update handlers and admin UI is not something Solr itself can do - that's a job for the servlet container. Related general issue: The /browse handler included in the example (which utilizes code written in velocity) requires that the user have direct access to Solr. This makes its very design insecure. That handler is intended as a demonstration of Solr's capabilities and how to use them, it's not for production. Thanks, Shawn
Re: Cores with lot of folders with prefix index.XXXXXXX
On 10/11/2013 4:36 AM, Yago Riveiro wrote: The thread that you point is about master / slave - replication, Is this issue valid on SolrCloud context? I check the index.properties and indeed the variable index=index.X point to a folder, the others can be deleted without any scary side effect? SolrCloud uses traditional replication behind the scenes as a last resort to recover an index when there's some kind of failure, or when it determines that things are too far out of sync after a Solr restart, or when adding replicas. During normal operation, traditional replication is *NOT* used. If you are getting a lot of index. directories, this may be an indication of an underlying issue, unless you are testing things and doing a lot of Solr restarts, in which case it may be expected. The index.properties file may be one way to go. I would want to be absolutely sure before deleting directories. You should be able to manually check which index directory Solr is using (with tools like lsof for Linux or Process Explorer for Windows) and delete the others. Thanks, Shawn
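Before removing any stale index.XXXXXXX directories, the index.properties check Shawn mentions can be sketched like this (the data directory below is a throwaway temp dir standing in for your core's real data directory):

```shell
datadir=$(mktemp -d)                       # stand-in for your core's data dir
printf 'index=index.20131011\n' > "$datadir/index.properties"
# The active directory is whatever the index= property names; any other
# index.* sibling directory is then a cleanup candidate (after double-
# checking with lsof / Process Explorer that Solr has no handles in it).
active=$(sed -n 's/^index=//p' "$datadir/index.properties")
echo "active: $active"
```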
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Then I think you downloaded the wrong JDK 7 (32bits JDK?), if you are running JDK 7 64bits the -server flag should be recognized. According to the stackoverflow link you mentioned before. Guido. On 11/10/13 15:48, Peter Schmidt wrote: no it is 64bit and just a development VM. In production the solr will use multicore, also 64bit and some gb ram.
Solr Slave warning: No content recieved for file
Hello. We are running a master-slave solr 3.x and we are seeing more and more of this in the slave log file: Oct 10, 2013 10:17:00 PM org.apache.solr.handler.SnapPuller$FileFetcher fetchPackets WARNING: No content recieved for file: {name=_56l.prx, lastmodified=1381443413000, size=0} Is this something we should worry about? Note that we are running some deleteDocByQuery commands on the master. Thanks. Arcadius.
SOLR Cloud on JBOSS
Hello - This wiki page is gone - https://wiki.apache.org/solr/SolrCloud%20using%20Jboss I have been able to configure an external instance of Zookeeper, and an instance of SOLR in JBOSS.. But I am unsure how to point my SOLR instance to the ZK instance and upload the configuration. All the examples I have found, show using script parameters to start SOLR rather than using a container like JBOSS. Can someone point me in the right direction? Thanks! Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://jeremybranham.wordpress.com/ http://www.linkedin.com/in/jeremybranham This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Re: Problems using DataImportHandler and TikaEntityProcessor
kamaci wrote There may be a problem with your schema. Could you send your solr logs? 2013/10/11 Peter Bleackley <bleackleyp@.co> Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd 11228 [qtp1831924725-17] INFO org.apache.solr.update.processor.LogUpdateProcessor – [tika] webapp=/solr path=/update params={} {} 0 0 11229 [qtp1831924725-17] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Unsupported ContentType: application/pdf Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json] at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:724) I 
tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 error, apparently caused by post.jar adding /extract to the end of the URL -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
On 10/11/2013 4:55 AM, Peter Schmidt wrote: So the main problem was that the libs must be copied to the WEB-INF/lib directory insteed of the jetty lib/ext directory. Is the fact that you should you use WEB-INF/lib somewhere documented? Actually, jetty's lib/ext is preferred, modifying the .war file is NOT recommended. Solr used to ship with the logging jars in the .war file, similar to the result that Guido's procedure gives you. http://wiki.apache.org/solr/SolrLogging#What_changed This was changed in version 4.3.0 because many people were having to take manual steps to change logging frameworks. There is a strong preference among people who really care about logging for using log4j or logback instead of java.util.logging. Now nobody needs to compile Solr themselves or perform surgery on the .war file when they want to change their logging, and the default produces much better results. Thanks, Shawn
Re: SolrCloud on SSL
On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 8:17 AM, Christopher Gross wrote: Is there a spot in a Solr configuration that I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 Dang. I'm wondering why you want to do this, though. It adds extra CPU overhead. Perhaps not a lot, but it's not free. As for protecting Solr against eavesdropping, is it in a location where that's possible? The bottom line is this: People that you cannot trust should not have direct access to Solr. It should be firewalled so only trusted personnel and applications can talk to it. Oh, they should be firewalled, but I can't (yet) with the existing network architecture. It's out of my direct control -- I'm just trying to stay one step ahead of the game. Anyone who has direct access to Solr can change your index, delete your index, and send denial of service queries. If you take steps to block access to the update handler(s) and the admin UI, denial of service queries are still possible. Blocking access to the update handlers and admin UI is not something Solr itself can do - that's a job for the servlet container. Related general issue: The /browse handler included in the example (which utilizes code written in velocity) requires that the user have direct access to Solr. This makes its very design insecure. That handler is intended as a demonstration of Solr's capabilities and how to use them, it's not for production. Good to know, I'll make sure that I've bumped this in my configs. Thanks!
Re: SOLR Cloud on JBOSS
On 10/11/2013 9:24 AM, Branham, Jeremy [HR] wrote: This wiki page is gone - https://wiki.apache.org/solr/SolrCloud%20using%20Jboss I have been able to configure an external instance of Zookeeper, and an instance of SOLR in JBOSS.. But I am unsure how to point my SOLR instance to the ZK instance and upload the configuration. All the examples I have found, show using script parameters to start SOLR rather than using a container like JBOSS. With version 4.4.0, you can put the zkHost parameter required to turn SolrCloud mode on in your solr.xml file. This is the case whether you use the new solr.xml format or the old solr.xml format. With versions 4.3.0 and older (which can only use the old solr.xml format), there was a bug that prevented this parameter from working correctly in solr.xml. Alternatively, you can use whatever mechanism JBoss provides for setting java system properties to set the zkHost parameter. As for uploading configurations, I strongly recommend that you do not do this with startup parameters, but rather do it with the command-line zookeeper utility. The example includes scripts for using this utility, but those scripts rely pretty heavily on the example jetty. Here's a reference that shows how to use it directly, but you must know where JBoss extracted the war file so you can use the correct classpath argument: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities Thanks, Shawn
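For the solr.xml route Shawn describes, a hedged sketch of a 4.4-style solr.xml carrying the zkHost parameter (the ZooKeeper host names and the /solr chroot are placeholders; verify the exact element names against the reference guide for your Solr version):

```xml
<solr>
  <solrcloud>
    <!-- Placeholder ZooKeeper ensemble; adjust hosts and chroot. -->
    <str name="zkHost">zk1:2181,zk2:2181,zk3:2181/solr</str>
  </solrcloud>
</solr>
```

The alternative Shawn mentions is setting the same value as a JVM system property (-DzkHost=...) through whatever mechanism JBoss provides.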
Re: SolrCloud on SSL
You could resolve that with SSH tunnels. Autossh with the right parameters works like a charm. HTH, Guido.
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Thanks folks, As an update for future readers --- the problem was on my side (my logic in picking the _route_ was flawed) as expected. :) On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey s...@elyograg.org wrote: There is also the distrib=false parameter that will cause the request to be handled directly by the core it is sent to rather than being distributed/balanced by SolrCloud. Right - this is probably the best option for diagnosing what is in what index. -Yonik
Re: Problems using DataImportHandler and TikaEntityProcessor
Here is a similar conversation: http://search-lucene.com/m/GeXcg1YfgQ32/Re%253A+Solr+4.0+error+message%253A+%2522Unsupported+ContentType%253A+Content-type%253Atext%252Fxml%2522subj=Re+Solr+4+0+error+message+Unsupported+ContentType+Content+type+text+xml+ Could you change -Dauto into -Dtype=application/pdf and try it again?
Re: Problems using DataImportHandler and TikaEntityProcessor
On 10/11/2013 9:32 AM, PeteBleackley wrote: I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 error, apparently caused by post.jar adding /extract to the end of the URL In order to use post.jar, you would need the /update/extract handler, which is not defined in the tika core under example-DIH. The example-DIH configurations are intended to use and illustrate the dataimport handler - documents are imported using the /dataimport handler and its config file, not sent directly with post.jar. Here's a page covering what you would need in order to send PDFs directly rather than import them using DIH: http://wiki.apache.org/solr/ExtractingRequestHandler Thanks, Shawn
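For reference, the /update/extract handler Shawn mentions is defined in the stock example's solrconfig.xml roughly as below. This is a sketch: the lib paths and the fmap/uprefix defaults are assumptions that vary by installation layout, and the tika core under example-DIH omits this handler entirely.

```xml
<!-- Sketch of the Solr Cell handler definition (not present in the
     example-DIH tika core). Lib paths are relative to the core's
     instanceDir and will differ per layout. -->
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- where the extracted body lands; field name is an assumption -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

With a handler like this in place, the /extract URL that post.jar's -Dauto mode was appending would actually resolve.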
Re: Question about plug-in update handler failure
Issue resolved. Not a Solr issue; a really hard-to-discover missing library in my installation. On Thu, Oct 10, 2013 at 7:10 PM, Jack Park jackp...@topicquests.org wrote: I have an interceptor which grabs SolrDocument instances in the update handler chain. It feeds those documents as a JSON string out to an agent system. That system has been running fine all the way up to Solr 4.3.1. I have discovered that, as of 4.4 and now 4.5, the very same config files, agent jar, and test harness show that no documents are intercepted, even though the index is built. I am wondering if I missed something in changes to Solr beyond 4.3.1 that would invalidate my setup. For the record, earlier trials opened the war and dropped my agent jar into WEB-INF/lib; the most recent trials on all systems leave the war intact and drop the agent jar into collection1/lib -- it still works on 4.3.1, but nothing beyond that. Many thanks in advance for any thoughts. Jack
RE: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID
Hi Otis, Thanks for the response. The log files can be found here. Master log: http://pastebin.com/DPLKMPcF Slave log: http://pastebin.com/DX9sV6Jx One more point worth mentioning: when we issue the commit with expungeDeletes=true, the delete-by-id replication is successful, i.e. http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true Regards, Bharat Akkinepalli -----Original Message----- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Wednesday, October 09, 2013 6:35 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID Bharat, Can you look at the logs on the Master when you issue the delete and the subsequent commits and share that? Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi, We have recently migrated from Solr 3.6 to Solr 4.4. We are using the Master/Slave configuration in Solr 4.4 (not SolrCloud). We have noticed the following behavior/defect.

Configuration:
===
1. Hard commit and soft commit are disabled in the configuration (we control the commits from the application).
2. We have 1 master and 2 slaves configured, and the pollInterval is configured to 10 minutes.
3. The master is configured with replicateAfter set to commit and startup.

Steps to reproduce the problem:
===
1. Delete a document in Solr (using delete by id). URL - http://localhost:8983/solr/annotation/update with body as <delete><id>change.me</id></delete>
2. Issue a commit on the master (http://localhost:8983/solr/annotation/update?commit=true).
3. The replication of the DELETE WILL NOT happen. The master and slave have the same index version.
4. If we issue another commit on the master, we see that it replicates fine.

Request you to please confirm if this is a known issue. Thank you.
Regards, Bharat Akkinepalli
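For anyone following along, the reproduction steps and the workaround from this thread can be written out as commands (host, core name, and document id taken from the report; a sketch, not re-verified against 4.4):

```shell
# 1. Delete a document by id on the master:
curl http://localhost:8983/solr/annotation/update \
     -H 'Content-Type: text/xml' \
     --data-binary '<delete><id>change.me</id></delete>'

# 2. Commit on the master -- per the report, slaves do NOT pick up the delete:
curl 'http://localhost:8983/solr/annotation/update?commit=true'

# 3. Workaround from the thread: an expunging commit replicates correctly:
curl 'http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true'
```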
Re: Using split in updateCSV for SolrCloud 4.4
Interestingly this URL by Jack works:

1. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&stream.contentType=text/csv&stream.file=/tmp/test.csv'

But this doesn't (i.e. it doesn't split the column):

2. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/catalog.txt'

The only difference was escape=\. I added that to Jack's example and it didn't work either. So the culprit was escape=\, not sure why. Thanks, -Utkarsh

On Thu, Oct 10, 2013 at 6:11 PM, Yonik Seeley ysee...@gmail.com wrote: Perhaps try adding echoParams=all to check that all of the input params are being parsed as expected. -Yonik

On Thu, Oct 10, 2013 at 8:10 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Didn't help. This is the complete data: https://gist.github.com/utkarsh2012/6927649 (see merchantList column). I tried this URL: curl 'http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/log_20130101' Can this be a bug in the UpdateCSV split function? Thanks, -Utkarsh

On Thu, Oct 10, 2013 at 3:11 PM, Jack Krupansky j...@basetechnology.com wrote: Using the standard Solr example for Solr 4.5, the following works, splitting the features CSV field into multiple values:

curl 'http://localhost:8983/solr/update/csv?commit=true&f.features.split=true&f.features.separator=%3A&f.features.encapsulator=%22' -H "Content-Type: text/csv" -d '
id,name,features
doc-1,doc1,feat1:feat2'

You may need to add stream.contentType=text/csv to your command. -- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar Sent: Thursday, October 10, 2013 4:51 PM To: solr-user@lucene.apache.org Subject: Using split in updateCSV for SolrCloud 4.4 Hello, I am trying to use split (http://wiki.apache.org/solr/UpdateCSV#split) while loading some CSV data via updateCSV. This is the field:

<field name="merchantList" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="false" termPositions="false" termOffsets="false"/>

This is the column in CSV (merchantList): values,16179:10950,...values...

This is the URL I call: http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=&escape=\&stream.file=/data/dump/log_20130101

Currently when I load the data, I see this: merchantList: [16179:10950], But I want this: merchantList: [16179,10950]. This example is int but I have intentionally kept it as a string since some values can also be a string. Any suggestions where I am going wrong? -- Thanks, -Utkarsh
Re: Using split in updateCSV for SolrCloud 4.4
There is this note for escape: "If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both." -- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar Sent: Friday, October 11, 2013 4:35 PM To: solr-user@lucene.apache.org Subject: Re: Using split in updateCSV for SolrCloud 4.4

Interestingly this URL by Jack works: 1. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&stream.contentType=text/csv&stream.file=/tmp/test.csv' But this doesn't (i.e. it doesn't split the column): 2. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/catalog.txt' The only difference was escape=\. I added that to Jack's example and it didn't work either. So the culprit was escape=\, not sure why. Thanks, -Utkarsh
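Stepping back, the split semantics being debugged here can be modeled offline. This is a toy Python illustration of what f.&lt;field&gt;.split, separator, and encapsulator are meant to do to a single CSV cell; it is not Solr's parser, and it assumes no escape is in play (which, per Jack's note, disables the encapsulator):

```python
# Toy model of Solr's per-field CSV split (f.<field>.split=true): the cell
# value is re-parsed as a one-row CSV with the per-field separator and
# encapsulator. Illustration only -- not Solr's actual implementation.
import csv
import io

def split_field(cell, separator=":", encapsulator='"'):
    """Split one CSV cell into multiple values, Solr-split style."""
    reader = csv.reader(io.StringIO(cell),
                        delimiter=separator,
                        quotechar=encapsulator)
    return next(reader)

# The merchantList cell from the thread:
print(split_field("16179:10950"))   # ['16179', '10950']
# Encapsulated values may contain the separator without being split:
print(split_field('"a:b":c'))       # ['a:b', 'c']
```

This also shows why dropping the encapsulator (as escape=\ effectively does) changes how the cell is tokenized.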
Setting SolrCloudServer collection
If using one static SolrCloudServer, how can I add a bean to a certain collection? Do I need to update setDefaultCollection() each time? I doubt that's thread safe. Thanks
Re: Solr Cloud hangs when replicating updates
Hey guys, We just hit a deadlock similar to this one on 4.5, and it seems to be related to leaked connections probably due to https://issues.apache.org/jira/browse/SOLR-4327. We're going to apply the suggested change to add method.abort() in the finally block and see if it fixes things. Jessica -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-hangs-when-replicating-updates-tp4088083p4095061.html Sent from the Solr - User mailing list archive at Nabble.com.
Replace NULL with 0 while Indexing
Hello, One of my indexed fields has NULL values, and I want them replaced with 0 at index time, so that when I search after indexing it gives me 0 instead of NULL. This is my data-config.xml, and duration is the field which has the null values.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://trdbadhoc/test_results" responseBuffering="adaptive" batchSize="-1" user="results" password="resultsloader"/>
  <document>
    <entity name="Test_Syndrome" pk="id" query="SELECT TS.id AS id, TET.type AS error_type, TS.syndrome AS syndrome, S.start_date, SE.session_id AS sessionid, S.duration, TL.logfile, J.job_number AS job, cluster, S.hostname, platform FROM Test_Syndrome AS TS STRAIGHT_JOIN Session_Errors AS SE ON (SE.test_syndrome_id = TS.id) STRAIGHT_JOIN Session AS S ON (S.id = SE.session_id) STRAIGHT_JOIN Test_Run AS TR ON (TR.session_id = SE.session_id) STRAIGHT_JOIN Test_Log AS TL ON (TL.id = TR.test_log_id) STRAIGHT_JOIN Job AS J ON (J.id = TL.job_id) STRAIGHT_JOIN Cluster AS C ON (C.id = J.cluster_id) STRAIGHT_JOIN Platform ON (TR.platform_id = Platform.id) STRAIGHT_JOIN Test_Error_Type TET ON (SE.test_error_type_id = TET.id)">
      <field column="id" name="id"/>
      <field column="error_type" name="error_type"/>
      <field column="syndrome" name="syndrome"/>
      <field column="sessionid" name="sessionid"/>
      <field column="duration" name="duration"/>
      <field column="logfile" name="logfile"/>
      <field column="job" name="job"/>
      <field column="cluster" name="cluster"/>
      <field column="hostname" name="hostname"/>
      <field column="platform" name="platform"/>
    </entity>
  </document>
</dataConfig>

Please help. Thanks & Regards, Prerna -- View this message in context: http://lucene.472066.n3.nabble.com/Replace-NULL-with-0-while-Indexing-tp4095059.html
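One way to handle this without touching Solr at all is to coalesce the NULL in the DIH query itself; a sketch using MySQL's IFNULL, with only the relevant columns shown (the remaining columns and STRAIGHT_JOINs stay exactly as in the query above):

```sql
-- Sketch: map NULL durations to 0 in the SQL, so DIH never sees a NULL
-- for the duration column (MySQL syntax; COALESCE works too).
SELECT TS.id                 AS id,
       TET.type              AS error_type,
       IFNULL(S.duration, 0) AS duration
       -- ... remaining columns and joins as in the original query ...
FROM Test_Syndrome AS TS
```

Alternatively, a default="0" attribute on the duration field definition in schema.xml should give missing values a 0 at index time, since DIH typically omits NULL columns from the document.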
Re: Setting SolrCloudServer collection
Set the collection param per request. It only uses the default if you don't set it. - Mark On Oct 11, 2013, at 5:26 PM, Mark static.void@gmail.com wrote: If using one static SolrCloudServer how can I add a bean to a certain collection. Do I need to update setDefaultCollection() each time? I doubt that thread safe? Thanks
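In SolrJ 4.x terms, Mark's suggestion looks roughly like the following. This is a sketch, not a verified implementation: the ZooKeeper host, bean variable, and collection name are made up, and error handling is omitted.

```java
// Sketch: one shared CloudSolrServer, collection chosen per request
// rather than via setDefaultCollection() (names here are illustrative).
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;

CloudSolrServer server = new CloudSolrServer("zkhost:2181");

UpdateRequest req = new UpdateRequest();
// myBean is assumed to be a @Field-annotated bean of your own
req.add(server.getBinder().toSolrInputDocument(myBean));
req.setParam("collection", "collection2"); // overrides the default collection
req.process(server);
```

Because the collection is carried on the request rather than mutated on the shared server, this avoids the thread-safety concern with calling setDefaultCollection() per operation.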
Re: Solr's Filtering approaches
Groups are pharmaceutical research experiments. The user is presented with a graph view; he can select some region and all the groups in that region get included. The user can also modify the groups here, so we didn't maintain group information in the same Solr index but have externalized it. I looked at the post-filter article. So my understanding is that I simply have to extend it as you did and include an implementation for isAllowed(acls[doc], groups). This will filter the documents in the collector, and finally this collector will be returned. Am I right?

@Override public void collect(int doc) throws IOException { if (isAllowed(acls[doc], user, groups)) super.collect(doc); }

Erick, I am interested to know whether I can extend any class that can return me only the bitset of the documents that match the search query. I could then do bitset1.and(bitset2OfGroups) and finally collect only those documents to return to the user. How do I try this approach? Any pointers for bit sets? Thanks - David

On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson erickerick...@gmail.com wrote: Well, my first question is why 50K groups are necessary, and whether you can simplify that. How a user can manually choose from among that many groups is interesting. But assuming they're all necessary, I can think of two things. If the user can only select ranges, just put in filter queries using ranges. Or possibly both ranges and individual entries, as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc. You need to be a little careful how you index these so range queries work properly; in the above you'd miss 2A because it sorts lexicographically, so you'd need to store in some form that sorts like 001A 01A and so on. You wouldn't need to show that form to the user, just form your fq's in the app to work with that form.
If that won't work (you wouldn't want this to get huge), think about a post filter that would only operate on documents that had made it through the select, although how to convey which groups the user selected to the post filter is an open question. Best, Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip davidphilipshe...@gmail.com wrote: Hi All, I have an issue in handling filters for one of our requirements and would like to get suggestions for the best approaches.

Use Case:
1. We have a list of groups, and the number of groups can increase up to 1 million. Currently we have almost 90 thousand groups in the Solr search system.
2. Just before the user hits a search, he has options to select the number of groups he wants to retrieve. [The distinct list of these group names for display is retrieved from another Solr index that has more information about groups.]
3. User operation: Say the user selected group 1A - group 1A, and searches for key:cancer.

The current approach I was thinking of is: get search results and filter the query by the group-id list selected by the user. But my concern is that when this group list increases to 50k unique ids, it can cause a lot of delay in getting search results. So I wanted to know whether there are different filtering approaches I can try.

I was also thinking of one more approach, as suggested by my colleague: do an intersection. Get the group ids selected by the user, get the list of group ids from the search results, perform an intersection of both, and then get the entire result set of only those group ids that intersected. Is this a better way? Can I use any cache technique in this case? - David.
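For the post-filter route discussed in this thread, the usual Solr 4.x shape is a PostFilter that returns a DelegatingCollector and drops disallowed documents in collect(). A bare sketch follows; the isAllowed() body and the ACL lookup are placeholders for your own logic, and the query-parser plumbing that creates the filter is omitted.

```java
// Bare sketch of a Solr 4.x post filter. Post filters must report
// cost >= 100 and cache=false to run after the main query and filters.
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class GroupAccessFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() { return false; }

    @Override
    public int getCost() { return Math.max(super.getCost(), 100); }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Only pass docs whose ACL intersects the user's groups.
                if (isAllowed(doc)) {
                    super.collect(doc);
                }
            }
        };
    }

    private boolean isAllowed(int doc) {
        // Placeholder: look up acls[doc] and intersect with the
        // externalized group selection for the current user.
        return true;
    }
}
```

The bitset-intersection idea maps naturally onto this: the delegating collector is exactly where a per-doc membership test against a precomputed bitset of allowed groups would go.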