Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Bill Bell
Does this work?
I can suggest -XX:-UseLoopPredicate to switch off predicates.

???

Which version of 7 is recommended?

Bill Bell
Sent from mobile


 On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote:
 
 *Don't* use JDK 7u40, it's been known to cause index corruption and
 SIGSEGV faults with Lucene: LUCENE-5212. This has not gone unnoticed by
 Oracle.
 
 ~ David
 
 On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote:
 
 2. Java version: There are huge performance wins between Java 5, 6
   and 7; we use Oracle JDK 7u40.
 


Re: SolrCore 'collection1' is not available due to init failure

2013-10-11 Thread Liu Bo
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused
by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
NativeFSLock@/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock:
java.io.FileNotFoundException:
/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock
(Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) at

It seems to be a permission problem: the user that starts Tomcat doesn't
have permission to access your index folder.

Try granting read and write permission on your Solr data folder to that
user, then restart Tomcat to see what happens.
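
For example (a sketch: the tomcat6 user/group is an assumption, the path is
taken from your stack trace; adjust both to your install):

    # give the Tomcat user ownership of the Solr index directory
    sudo chown -R tomcat6:tomcat6 /usr/share/solr-4.5.0/example/solr/collection1/data
    # then restart Tomcat
    sudo service tomcat6 restart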


-- 
All the best

Liu Bo


Re: Questions developing custom functionquery

2013-10-11 Thread Mikhail Khludnev
Hello JT,

What are the field and fieldType definitions for resname?
Can you check how '/some example/data/here/2013/09/12/testing.text'
is handled on the Analysis page in the Solr admin UI?
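
If resname is a tokenized text type, that would explain seeing indexed terms
instead of the raw value; a plain string type keeps the value verbatim. A
sketch (standard schema.xml attributes; only the field name comes from your
mail):

    <field name="resname" type="string" indexed="true" stored="true"/>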


On Fri, Oct 11, 2013 at 4:53 AM, Richard Lee rockiee...@gmail.com wrote:

 It seems what you got is the indexed terms rather than the raw data. Maybe
 you should check the API docs for more details.
 On 2013-10-11 at 3:56 AM, JT handyrems...@gmail.com wrote:

  I'm running into some issues developing a custom functionquery.
 
  My goal is to be able to implement a custom sorting technique.
 
  I have a field defined called resname; it is a single-valued str.
 
  Example: <str name="resname">/some
  example/data/here/2013/09/12/testing.text</str>
 
  I would like to do a custom sort based on this resname field.
  Basically, I would like to parse out that date there (2013/09/12) and
 sort
  on that date.
 
 
  I've followed various tutorials
 - http://java.dzone.com/news/how-write-custom-solr
 -
 
 http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery
 
 
  I'm at the point where my code compiles, runs, executes, etc. Solr is
  happy with my code.
 
  I have classes that inherit from ValueSourceParser and ValueSource, etc.
  I've overridden parse() and instantiated my class with the ValueSource:
 
  public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
      return new MyCustomClass(fqp.parseValueSource());
  }

  public class MyCustomClass extends ValueSource {
      ValueSource source;

      public MyCustomClass(ValueSource source) {
          this.source = source;
      }

      // plus description()/equals()/hashCode(), which ValueSource also requires
      public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
              throws IOException {
          final FunctionValues sourceDV = source.getValues(context, readerContext);
          return new IntDocValues(this) {
              public int intVal(int doc) {
                  // parse the value of resname here
                  String value = sourceDV.strVal(doc);
                  // ...more stuff: compute and return the sort value here
              }
          };
      }
  }
 
  The issue I'm running into is that my call to sourceDV.strVal(doc) only
  returns part of the field, not all of it. It appears to be very random.
 
  I guess my actual question is: how do I access / reference the EXACT RAW
  value of a field while writing a functionquery?
 
  Do I need to change my ValueSource to a String, then somehow look up the
  field name inside my getValues call?
 
  Is there a way to access the raw field data when referencing it as a
  FunctionValues?
 
 
  Maybe I'm going about this totally incorrectly?
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Multiple schemas in the same SolrCloud ?

2013-10-11 Thread maephisto
Thanks!
My only doubt is: upload a new set of configuration files to the same
configuration name like so:

Initial configuration:
zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_initial/
-confname my_custom_config
and afterwards, to change it do:
zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_changed/
-confname my_custom_config

Is this correct?
If so, what happens afterwards: will ZK distribute these changes to all cores
and reload them?





Re: Multiple schemas in the same SolrCloud ?

2013-10-11 Thread Furkan KAMACI
Here is a thread you should read:
http://lucene.472066.n3.nabble.com/Reloading-config-to-zookeeper-td4021901.html
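
In short: upconfig only uploads the files to ZooKeeper; running cores pick up
the change when the collection is reloaded. A sketch, assuming a collection
named my_collection and a node at localhost:8983:

    curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection'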


2013/10/11 maephisto my_sky...@yahoo.com




Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would
like to add some basic authentication.
My question is: how can I provide the credentials so that they're used in the
collection API when creating a new collection, or by ZK?

Are there any useful docs/wiki on this topic?
Thanks!





Re: Solr Cloud Basic Authentication

2013-10-11 Thread primoz . skale
For pre-4.x Solr (aka Solr 3.x) basic authentication works fine. Check 
this site: http://wiki.apache.org/solr/SolrSecurity

Even a master-slave replication architecture (*not* SolrCloud) works for 
me. There could be some problems with *cross-shard* queries etc., though 
(see SOLR-1861, SOLR-3421).

I know I haven't answered your question but hopefully I have given you 
some more information on the subject.

Best regards,

Primož




From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 10:55
Subject:Solr Cloud Basic Authentification






Re: Please help!, Highlighting exact phrases with solr

2013-10-11 Thread Silvia Suárez
Dear Koji,

Thanks a lot for your answer, and sorry about my English.

I tried to configure FastVectorHighlighter:
http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter

However, I have this error:


<lst name="error">
<str name="msg">
fragCharSize(1) is too small. It must be 18 or higher.
</str>
<str name="trace">
java.lang.IllegalArgumentException: fragCharSize(1) is too small. It must
be 18 or higher. at
org.apache.lucene.search.vectorhighlight.BaseFragListBuilder.createFieldFragList(BaseFragListBuilder.java:51)
at
org.apache.lucene.search.vectorhighlight.WeightedFragListBuilder.createFieldFragList(WeightedFragListBuilder.java:38)
at
org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:195)
at
org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:184)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:588)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:413)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:139)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
</str>
<int name="code">500</int>
</lst>
</response>



Then, if I modify it like this (setHighlightFragsize(1) ->
setHighlightFragsize(80)):

SolrQuery solrQuery = new SolrQuery();

solrQuery.setQuery(queryEnt);
solrQuery.set("collectionName", myCollection);
solrQuery.addHighlightField("texto")
         .addHighlightField("titular")
         .setHighlightSnippets(50)
         .setHighlightFragsize(80);
solrQuery.setHighlight(true);
solrQuery.setHighlightRequireFieldMatch(true);
solrQuery.set("hl.useFastVectorHighlighter", true);
solrQuery.setHighlightSimplePre("<span class=\"item\">");
solrQuery.setHighlightSimplePost("</span>");
solrQuery.set("hl.usePhraseHighlighter", true);


Then it works (the error disappears), but highlighting does not work :( :

<lst name="highlighting">
<lst name="35254502"/>
<lst name="35237409"/>
</lst>
<lst name="termVectors">
<str name="uniqueKeyFieldName">c_noticia</str>
<lst name="warnings">
<arr name="noTermVectors">
<str>c_region</str>
<str>c_idioma</str>
<str>c_pais</str>
<str>c_tipo</str>
<str>c_categoria</str>
<str>fecha_captura</str>
<str>medio</str>
<str>c_fuente_docu</str>
</arr>
</lst>
<lst name="35254502">
<str name="uniqueKey">35254502</str>
</lst>
<lst name="35237409">
<str 

Re: Multiple schemas in the same SolrCloud ?

2013-10-11 Thread xinwu
Hi, kamaci.
Does that mean I just need to upload new config files, and do not need to
reload every node in SolrCloud, when I want to change my configurations?





Re: Solr Cloud Basic Authentication

2013-10-11 Thread Furkan KAMACI
Here is more information about security that you can use:
http://wiki.apache.org/solr/SolrSecurity


2013/10/11 maephisto my_sky...@yahoo.com




Re: Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
Thank you!

I'm more interested in the SolrCloud architecture, with shards, shard
replicas, and distributed indexing and search.
These are the features I use and would like to protect with some basic
authentication.

I imagine that there must be a way to have this, otherwise anybody could
mess with or even drop my collection.





Re: Please help!, Highlighting exact phrases with solr

2013-10-11 Thread Furkan KAMACI
Here is a similar question:

http://search-lucene.com/m/vnMGKACGM1/%252218+or+higher.%2522subj=FastVectorHighlighter+and+hl+fragsize+parameter+set+to+zero+causes+exception

and a related fixed issue: https://issues.apache.org/jira/browse/SOLR-1268


2013/10/11 Silvia Suárez s...@anpro21.com


Re: Solr Cloud Basic Authentication

2013-10-11 Thread primoz . skale
One possible solution is to firewall access to the SolrCloud server(s). Only 
proxy/load-balancing servers should have unrestricted access to the Solr 
infrastructure. Then you can implement basic/advanced authentication on 
the proxy/LB side.

Primož



From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:17
Subject:Re: Solr Cloud Basic Authentification






Re: Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
Thank you,
But I'm afraid that wiki page does not cover my topic of interest.





Re: Solr Cloud Basic Authentication

2013-10-11 Thread primoz . skale
If you want to deploy basic authentication in a way that a login is 
required when creating collections, it is only a simple matter of 
constraining a URL pattern (e.g. /solr/admin/collections/*). Maybe this 
link will help: 
http://stackoverflow.com/questions/5323855/jetty-webserver-security/5332049#5332049

But keep in mind that intra-node requests in SolrCloud must also be 
authenticated (because the HTTP stack is used). If I understand correctly, 
this is currently not possible.
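
For illustration, a minimal sketch of such a constraint in the Solr webapp's 
web.xml (the role and realm names are assumptions, and the realm itself still 
has to be configured on the Jetty side):

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>Collections API</web-resource-name>
        <url-pattern>/admin/collections/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-admin</role-name>
      </auth-constraint>
    </security-constraint>
    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>solr-realm</realm-name>
    </login-config>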

Primož




From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:25
Subject:Re: Solr Cloud Basic Authentification






Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread yriveiro
Hi,

I have some cores with lots of folders with the format index.X; my question
is why?

The side effect of this is shards that are 50% larger than their replicas on
other nodes.

Is there any way to delete these folders to free space?

Is it a bug?

/Yago



-
Best regards


Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
I think this is connected to replications being made? I also have quite 
some of them but currently I am not worried :)

Primož



From:   yriveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:54
Subject:Cores with lot of folders with prefix index.XXX






solrnet sample

2013-10-11 Thread Kishan Parmar
I want to change the schema file of the solrnet sample, add an XML file,
and facet the data.

What do I need to do in the sample file?

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!


Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread Yago Riveiro
I have SSDs, therefore my space is like gold; I can't afford 30% of my space 
wasted in failed replications, or replications that are not cleaned up. 

The question for me is whether this is normal behaviour or a bug. If it is 
normal behaviour I am in trouble, because an SSD with more than 512G is 
expensive.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote:

 I think this is connected to replications being made? I also have quite
 some of them but currently I am not worried :)




Re: Please help!, Highlighting exact phrases with solr

2013-10-11 Thread Silvia Suárez
Hi,

Thanks for your answer, Furkan.
I'm sorry, I don't understand the proposed solution...

I did this:


   1. eliminated the hl.useHighlighter parameter
   2. introduced hl.useFastVectorHighlighter


However, the result is the same...

Is something missing?

Thanks a lot in advance for your help...

Sil.





2013/10/11 Furkan KAMACI furkankam...@gmail.com


Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Do you have a lot of failed replications? Maybe those folders have 
something to do with this (please see the last answer at 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing
). If your disk space is valuable, check the index.properties file under the 
data folder and try to determine which folders can be safely deleted.
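
For example, an index.properties might look like this sketch (the suffix is 
illustrative); the folder it names is the live index, so any other index.* 
siblings are the candidates for cleanup:

    index=index.20131011120000000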

Primož




From:   Yago Riveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:13
Subject:Re: Cores with lot of folders with prefix index.XXX







Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread Yago Riveiro
The thread that you point to is about master/slave replication; is this issue 
valid in a SolrCloud context?  

I checked index.properties, and indeed the variable index=index.X points to 
a folder; can the others be deleted without any scary side effect?


--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote:





Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread Shalin Shekhar Mangar
There are open issues related to extra index.XXX folders lying around if
replication/recovery fails. See
https://issues.apache.org/jira/browse/SOLR-4506


On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro yago.rive...@gmail.com wrote:





-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
Not so hard switching to Oracle JDK 7u40.
Just download it and change the JAVA_HOME path in /etc/default/jetty, so
it's not necessary to switch the Java version with update-java-alternatives.

The machine is 64bit :)
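
For example, the relevant line in /etc/default/jetty (the JDK path is an
assumption; adjust it to wherever you unpacked the JDK):

    JAVA_HOME=/usr/lib/jvm/jdk1.7.0_40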



2013/10/11 Bill Bell billnb...@gmail.com

 



Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Honestly I don't know for sure if you can delete them. Maybe make a backup, 
then delete them and see if it still works :)

Replication works differently in the SolrCloud world, as far as I currently 
know. I don't think there should be any additional index.* folders, because 
fallback does not work in SolrCloud (someone correct me if I am wrong!).

Primož



From:   Yago Riveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:36
Subject:Re: Cores with lot of folders with prefix index.XXX







Re: Cores with lots of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Thanks, I guess I was wrong after all in my last post.

Primož




From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:43
Subject:Re: Cores with lot of folders with prefix index.XXX






Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
So the main problem was that the libs must be copied to the WEB-INF/lib
directory instead of the jetty lib/ext directory. Is the fact that you
should use WEB-INF/lib documented somewhere?


2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com

 





Re: Re: feedback on Solr 4.x LotsOfCores feature

2013-10-11 Thread Erick Erickson
bq: sharing the underlying solrconfig object the configset introduced
in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode

SOLR-4478 will NOT share the underlying config objects, it simply
shares the underlying directory. Each core will, at least as presently
envisioned, simply read the files that exist there and create their
own solrconfig object. Schema objects may be shared, but not config
objects. It may turn out to be relatively easy to do in the configset
situation, but last time I looked at sharing the underlying config
object it was too fraught with problems.

bq: 15K cores is around 4 minutes

I find this very odd. On my laptop, spinning disk, I think I was
seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
have no idea what's going on here. If this is just reading the files,
you should be seeing horrible disk contention. Are you on some kind of
networked drive?

bq: To do that in background and to block on that request until core
discovery is complete, should not work for us (due to the worst case).
What other choices are there? Either you have to do it up front or
with some kind of blocking. Hmmm, I suppose you could keep some kind
of custom store (DB? File? ZooKeeper?) that would keep the last known
layout. You'd still have some kind of worst-case situation where the
core you were trying to load wouldn't be in your persistent store and
you'd _still_ have to wait for the discovery process to complete.

bq: and we will use the cores Auto option to create load or only load
the core on
Interesting. I can see how this could all work without any core
discovery but it does require a very specific setup.
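
For reference, the transient/lazy-load setup being discussed boils down to a
per-core core.properties along these lines (a sketch; the core name is made
up):

    name=customer_12345
    transient=true
    loadOnStartup=false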

On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
olivier.so...@worldline.com wrote:
 The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, 
 including the new Cores options :
 - numBuckets to create a subdirectory based on a hash on the corename % 
 numBuckets in the core Datadir
 - Auto with 3 differents values :
   1) false : default behaviour
   2) createLoad : create, if not exist, and load the core on the fly on the 
 first incoming request (update, select)
   3) onlyLoad : load the core on the fly on the first incoming request 
 (update, select), if exist on disk

 Concerning :
 - sharing the underlying solrconfig object, the configset introduced in JIRA 
 SOLR-4478 seems to be the solution for non-SolrCloud mode.
 We need to test it for our use case. If another solution exists, please tell 
 me. We are very interested in such functionality and to contribute, if we can.

 - the possibility of lotsOfCores in SolrCloud, we don't know in details how 
 SolrCloud is working.
 But one possible limit is the maximum number of entries that can be added to 
 a zookeeper node.
 Maybe, a solution will be just a kind of hashing in the zookeeper tree.

 - the time to discover cores in Solr 4.4 : with spinning disk under linux, 
 all cores with transient=true and loadOnStartup=false, the linux buffer 
 cache empty before starting Solr :
 15K cores is around 4 minutes. It's linear in the cores number, so for 50K 
 it's more than 13 minutes. In fact, it corresponding to the time to read all 
 core.properties files.
 To do that in background and to block on that request until core discovery is 
 complete, should not work for us (due to the worst case).
 So, we will just disable the core Discovery, because we don't need to know 
 all cores from the start. Start Solr without any core entries in solr.xml, 
 and we will use the cores Auto option to create load or only load the core on 
 the fly, based on the existence of the core on the disk (absolute path 
 calculated from the core name).

 Thanks for your interest,

 Olivier
 
 From: Erick Erickson [erickerick...@gmail.com]
 Sent: Monday, 7 October 2013 14:33
 To: solr-user@lucene.apache.org
 Subject: Re: feedback on Solr 4.x LotsOfCores feature

 Thanks for the great writeup! It's always interesting to see how
 a feature plays out in the real world. A couple of questions
 though:

 bq: We added 2 Cores options :
  Do you mean you patched Solr? If so are you willing to share the code
  back? If both are yes, please open a JIRA, attach the patch and assign
  it to me.

 bq:  the number of file descriptors, it used a lot (need to increase global
 max and per process fd)

 Right, this makes sense since you have a bunch of cores all with their
 own descriptors open. I'm assuming that you hit a rather high max
 number and it stays pretty steady

 bq: the overhead to parse solrconfig.xml and load dependencies to open
 each core

 Right, I tried to look at sharing the underlying solrconfig object but
 it seemed pretty hairy. There are some extensive comments in the
 JIRA of the problems I foresaw. There may be some action on this
 in the future.

 bq: lotsOfCores doesn’t work with SolrCloud

 Right, we haven't concentrated on that, it's an interesting problem.
 In 

Re: Find documents that are composed of % words

2013-10-11 Thread Erick Erickson
bq: but you cannot ask this to client.

You _can_ ask this of a client. IMO you are obligated to.
A gentle way to do that is say something like:

Solr doesn't do that out-of-the-box. I estimate it will
take me XXX weeks to implement that in custom code.
I will be unable to make progress on features A-F during
that time. We can try tweaking Solr's ranking with the
standard  configurations and see if that satisfies your
ranking requirements in YYY days. Please prioritize this
relative to the other features.

I have, quite literally been in very similar situations.
The client was convinced that BM25 ranking would give
better results (this was before flexible scoring). They
never needed the BM25 stuff. And their project was wildly
successful.

It's amazing how often software people don't give this
feedback and then the project managers are surprised
later by time/cost overruns or lack of features. We _must_
inform our clients of the costs of a feature and cheaper
alternatives before they can make informed decisions.

It's also amazing how often, when given realistic cost
estimates, features like this get put off forever. On those
occasions when it _does_ make a difference, at least
the client has the information necessary to prioritize,
and their expectations are set appropriately.

Rant done,
Erick

On Thu, Oct 10, 2013 at 3:03 PM, shahzad73 shahzad...@yahoo.com wrote:
 Yes, the correct answer may be Why, but you cannot ask this to client.
 He thinks there is something interesting with this formula, and if it works we
 can index websites with Nutch + Solr, and let users input queries that
 can locate documents which have a % of foreign words other than the list provided.
 I will check the answer provided.

 Shahzad





Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
I can report that jetty is running now with this options:

JAVA_OPTIONS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m
-Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat
-XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS"

@Guido: I reduced the min/max heap size to 256m; I will increase this on the
production server.


2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com

 






Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Guido Medina
I can't tell for sure if that is documented somewhere; I did it straight 
away because of the years I have been developing Java webapps. A 
class-not-found error usually means that some jar/class is missing somewhere. 
Because of all the issues I have seen with parent-child class loaders, 
my 1st choice is usually to make the jars/classes available to the 
relevant webapp classloader, in this case the Solr webapp's WEB-INF/lib. 
If running several webapps this will require more PERM GEN space, but 
in this case that is not a problem, because there is only one webapp running, 
which won't lead to several child class loaders loading the same set of 
classes from a jar.

I have seen too many weird things with class loaders. Well, enough about 
class loading; I don't want to hijack the subject of this thread.

HTH,

Guido.


On 11/10/13 11:55, Peter Schmidt wrote:







Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Guido Medina
Remember the -server flag, which will improve things for Java webapps or 
dedicated Java services.


Guido.

On 11/10/13 12:26, Peter Schmidt wrote:







Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Erick, agreed. The Solr + Nutch solution was proposed by myself, and I had
never used these technologies; this is the first time I'm handling these two.
My initial response to the client's requirements was to try to make existing
industry tools work and then modify them according to the client's
requirements, instead of re-inventing the wheel. I started from zero to this
point and was not even aware Solr could handle this sort of requirement.

Now all the infrastructure is there (crawler + index and an app to make
searches); it's just this base requirement to fulfill. At the moment I am
moving in the dark trying to configure Solr to handle this requirement. Here
is what I am thinking of doing:

Develop a filter which is called at search time for a field that will hold
all tokens for the page. It will determine how many tokens (words) match
the criteria words and what the remaining tokens are, get the total number
of tokens for the document, and produce the % ratio of matched to unmatched.

Not sure the above solution will work, so I need suggestions.
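
As a plain-Java illustration of the ratio itself (not Solr-integrated; all
names are hypothetical):

    import java.util.List;
    import java.util.Set;

    class WordRatio {
        // percentage of a document's tokens NOT found in the provided word list
        static double foreignPercent(List<String> docTokens, Set<String> knownWords) {
            if (docTokens.isEmpty()) return 0.0;
            int unmatched = 0;
            for (String t : docTokens) {
                if (!knownWords.contains(t)) unmatched++;
            }
            return 100.0 * unmatched / docTokens.size();
        }
    }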








Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Aloke Ghoshal, I'm trying to work out your equation. I am using the standard
schema provided by Nutch for Solr and am not aware of how to calculate
myfieldwordcount in the first query; no idea where this count will come
from. Is there any filter that will count the tokens generated for a
specific field and store that count as another field? That way we could use
it. Not sure what norm does in the second equation; I tried to find
information on this online and have not found any yet. Please explain.


Shahzad





Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
@Guido: I tried it before, and then I thought you had marked just the server
options.

Because the -server causes a:

sudo service jetty start
 * Starting Jetty servlet engine.
jetty
Invalid option -server
Cannot parse command line arguments

Or should I substitute server with ...?

Options with -server:


JAVA_OPTIONS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server
-Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50
-XX:+OptimizeStringConcat -XX:+UseStringCache
-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS"



2013/10/11 Guido Medina guido.med...@temetra.com







Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Guido Medina

It is a JVM parameter, for example:
JAVA_OPTIONS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server
-Xms256m -Xmx256m"


If you want to concatenate more JVM parameters, you do it like this:
JAVA_OPTIONS="-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS"

Take a good look at the format,

Guido.






Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
Strange. When I add -server to the arguments, I get this error every time
on jetty startup:

Invalid option -server
Cannot parse command line arguments







Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
Oh, I got it: http://stackoverflow.com/a/5273166/326905

"at least 2 cores and at least 2 GB physical memory"

Until now I'm using a VM with a single core and 1GB RAM,
so this will come later for production :)

Thank you Guido.








SolrCloud on SSL

2013-10-11 Thread Christopher Gross
I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have
SSL  certs configured on them to protect the Solr Indexes.

Right now, I can do queries on idx1 and it works fine.
If I try to query on idx3, I get:
org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: IOException occurred when
talking to server at http://idx1:8443/solr/test1
(and then a long stack trace -- can't copy it, on a test network)

Is there a spot in a Solr configuration that I can set this up to use HTTPS?

Let me know if you need more information to determine the problem.

Thanks!

-- Chris


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Guido Medina
If your single core is 32-bit, use Oracle JDK 7u25 or Ubuntu OpenJDK 7;
the 32-bit JDK 7u40 will corrupt indexes, as stated in the Lucene bug
report.


Guido.







Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Peter Schmidt
No, it is 64-bit and just a development VM. In production Solr will use
multiple cores, also 64-bit, and several GB of RAM.








Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Peter Bleackley

Starting Solr with the command line


java -Dsolr.solr.home=example-DIH/solr -jar start.jar


and then trying to import some data with

java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf 
-jar post.jar *.pdf


fails with error

SimplePostTool: WARNING: Solr returned an error #400 Bad Request
SimplePostTool: WARNING: IOException while reading response: 
java.io.IOException: Server returned HTTP response code: 400 for URL: 
http://localhost:8983/solr/tika/update


These are all valid PDFs that I have previously been able to import with 
Solr Cell.


What am I doing wrong?

Dr Peter J Bleackley
Computational Linguistics Contractor
Playful Technology Ltd




Re: Multiple schemas in the same SolrCloud ?

2013-10-11 Thread maephisto
Upload the new configuration and then use the Collections API to reload your
collection:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-ReloadaCollection
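
For example (host, port, and collection name are placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'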



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094978.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Questions developing custom functionquery

2013-10-11 Thread JT
Hey Mikhail,

Thanks for responding.

Field: resourcename
Field-Type: org.apache.solr.schema.TextField
All 9 boxes checked (indexed, tokenized, stored).


I have various other fields (including MD5 checksums) in my schema. When I
use an md5sum field (which is a str field, but doesn't have spaces,
forward slashes, etc.), the plugin I've written performs exactly as
expected.

I think a large part of my problem is that my ValueSource is being
instantiated as the class StrFieldSource. When you call getValues on a
StrFieldSource, you end up with a DocTermsIndexDocValues
(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java).
Calling strVal() on a DocTermsIndexDocValues does some really weird stuff
that I don't understand.
I assumed that calling ValueSource.getValues(...).strVal(int doc) would
simply return the data that my field corresponds to, but I don't think
that is true.


It's possible I'm going about this wrong and need to redo my approach. I'm
just currently at a loss for what that approach is.
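
One hedged possibility, given that resourcename is a tokenized TextField:
since DocTermsIndexDocValues is backed by the indexed terms, strVal(doc)
can only ever see a single term, not the original string. Copying the value
into an untokenized string field and pointing the function query at that
field would sidestep the problem. A schema sketch (the field name
resourcename_raw is hypothetical):

<field name="resourcename_raw" type="string" indexed="true" stored="true"/>
<copyField source="resourcename" dest="resourcename_raw"/>

A StrField indexes the whole value as one term, so the ord-based doc values
would then return the full path, ready for date parsing.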



On Fri, Oct 11, 2013 at 2:48 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello JT,

 what's is the field and fieldType definition for resname ?
 can't you check how '/some
 example/data/here/2013/09/12/
 testing.text
 ' is handled on analysis page in SolrAdmin?





 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Furkan KAMACI
There may be a problem with your schema. Could you send your Solr logs?







Re: SolrCloud on SSL

2013-10-11 Thread Shawn Heisey
On 10/11/2013 8:17 AM, Christopher Gross wrote:
 I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have
 SSL  certs configured on them to protect the Solr Indexes.
 
 Right now, I can do queries on idx1 and it works fine.
 If I try to query on idx3, I get:
 org.apache.solr.common.SolrException:
 org.apache.solr.client.solrj.SolrServerException: IOException occurred when
 talking to server at http://idx1:8443/solr/test1
 (and then a long stack trace -- can't copy it, on a test network)
 
 Is there a spot in a Solr configuration that I can set this up to use HTTPS?

From what I can tell, not yet.

https://issues.apache.org/jira/browse/SOLR-3854
https://issues.apache.org/jira/browse/SOLR-4407
https://issues.apache.org/jira/browse/SOLR-4470

I'm wondering why you want to do this, though.  It adds extra CPU
overhead.  Perhaps not a lot, but it's not free.

As for protecting Solr against eavesdropping, is it in a location where
that's possible?  The bottom line is this:  People that you cannot trust
should not have direct access to Solr.  It should be firewalled so only
trusted personnel and applications can talk to it.

Anyone who has direct access to Solr can change your index, delete your
index, and send denial of service queries.  If you take steps to block
access to the update handler(s) and the admin UI, denial of service
queries are still possible.  Blocking access to the update handlers and
admin UI is not something Solr itself can do - that's a job for the
servlet container.

Related general issue: The /browse handler included in the example
(which utilizes code written in velocity) requires that the user have
direct access to Solr.  This makes its very design insecure.  That
handler is intended as a demonstration of Solr's capabilities and how to
use them, it's not for production.

Thanks,
Shawn



Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread Shawn Heisey
On 10/11/2013 4:36 AM, Yago Riveiro wrote:
 The thread that you point to is about master/slave replication. Is this
 issue valid in a SolrCloud context?

 I checked the index.properties and indeed the variable index=index.X points
 to a folder; can the others be deleted without any scary side effects?

SolrCloud uses traditional replication behind the scenes as a last
resort to recover an index when there's some kind of failure, or when it
determines that things are too far out of sync after a Solr restart, or
when adding replicas.  During normal operation, traditional replication
is *NOT* used.

If you are getting a lot of index.XXXXXXX directories, this may be an
indication of an underlying issue, unless you are testing things and
doing a lot of Solr restarts, in which case it may be expected.

The index.properties file may be one way to go.  I would want to be
absolutely sure before deleting directories.  You should be able to
manually check which index directory Solr is using (with tools like lsof
for Linux or Process Explorer for Windows) and delete the others.
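
For example, on Linux something along these lines (the PID is hypothetical)
shows which index directory the Solr process actually holds open:

lsof -p 12345 | grep index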

Thanks,
Shawn



Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Guido Medina
Then I think you downloaded the wrong JDK 7 (a 32-bit JDK?); if you are
running a 64-bit JDK 7, the -server flag should be recognized, according to
the stackoverflow link you mentioned before.


Guido.








Solr Slave warning: No content recieved for file

2013-10-11 Thread Arcadius Ahouansou
Hello.

We are running a master-slave solr 3.x and we are seeing more and more of
this in the slave log file:

Oct 10, 2013 10:17:00 PM org.apache.solr.handler.SnapPuller$FileFetcher
fetchPackets
WARNING: No content recieved for file: {name=_56l.prx,
lastmodified=1381443413000, size=0}

Is this something we should worry about?

Note that we are running some deleteDocByQuery commands on the master.


Thanks.

Arcadius.


SOLR Cloud on JBOSS

2013-10-11 Thread Branham, Jeremy [HR]
Hello -

This wiki page is gone - https://wiki.apache.org/solr/SolrCloud%20using%20Jboss

I have been able to configure an external instance of ZooKeeper, and an
instance of Solr in JBoss.
But I am unsure how to point my Solr instance to the ZK instance and upload the
configuration.

All the examples I have found show using script parameters to start Solr
rather than using a container like JBoss.

Can someone point me in the right direction?

Thanks!


Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.comhttp://jeremybranham.wordpress.com/
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread PeteBleackley
kamaci wrote
 There may be a problem with your schema. Could you send your Solr logs?




11228 [qtp1831924725-17] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [tika] webapp=/solr
path=/update params={} {} 0 0
11229 [qtp1831924725-17] ERROR org.apache.solr.core.SolrCore  –
org.apache.solr.common.SolrException: Unsupported ContentType:
application/pdf  Not in: [application/xml, text/csv, text/json,
application/csv, application/javabin, text/xml, application/json]
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)


I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404
error, apparently caused by post.jar adding /extract to the end of the URL.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-11 Thread Shawn Heisey
On 10/11/2013 4:55 AM, Peter Schmidt wrote:
 So the main problem was that the libs must be copied to the WEB-INF/lib
 directory instead of the jetty lib/ext directory. Is the fact that you
 should use WEB-INF/lib documented somewhere?

Actually, jetty's lib/ext is preferred; modifying the .war file is NOT
recommended.

Solr used to ship with the logging jars in the .war file, similar to the
result that Guido's procedure gives you.

http://wiki.apache.org/solr/SolrLogging#What_changed

This was changed in version 4.3.0 because many people were having to
take manual steps to change logging frameworks.  There is a strong
preference among people who really care about logging for using log4j or
logback instead of java.util.logging.  Now nobody needs to compile Solr
themselves or perform surgery on the .war file when they want to change
their logging, and the default produces much better results.

Thanks,
Shawn



Re: SolrCloud on SSL

2013-10-11 Thread Christopher Gross
On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote:

 On 10/11/2013 8:17 AM, Christopher Gross wrote: 
  Is there a spot in a Solr configuration that I can set this up to use
 HTTPS?

 From what I can tell, not yet.

 https://issues.apache.org/jira/browse/SOLR-3854
 https://issues.apache.org/jira/browse/SOLR-4407
 https://issues.apache.org/jira/browse/SOLR-4470


Dang.


 I'm wondering why you want to do this, though.  It adds extra CPU
 overhead.  Perhaps not a lot, but it's not free.

 As for protecting Solr against eavesdropping, is it in a location where
 that's possible?  The bottom line is this:  People that you cannot trust
 should not have direct access to Solr.  It should be firewalled so only
 trusted personnel and applications can talk to it.


Oh, they should be firewalled, but I can't (yet) with the existing network
architecture.  It's out of my direct control -- I'm just trying to stay one
step ahead of the game.


 Anyone who has direct access to Solr can change your index, delete your
 index, and send denial of service queries.  If you take steps to block
 access to the update handler(s) and the admin UI, denial of service
 queries are still possible.  Blocking access to the update handlers and
 admin UI is not something Solr itself can do - that's a job for the
 servlet container.

 Related general issue: The /browse handler included in the example
 (which utilizes code written in velocity) requires that the user have
 direct access to Solr.  This makes its very design insecure.  That
 handler is intended as a demonstration of Solr's capabilities and how to
 use them, it's not for production.


Good to know, I'll make sure that I've bumped this in my configs.  Thanks!


Re: SOLR Cloud on JBOSS

2013-10-11 Thread Shawn Heisey
On 10/11/2013 9:24 AM, Branham, Jeremy [HR] wrote:
 This wiki page is gone - 
 https://wiki.apache.org/solr/SolrCloud%20using%20Jboss
 
 I have been able to configure an external instance of Zookeeper, and an 
 instance of SOLR in JBOSS..
 But I am unsure how to point my SOLR instance to the ZK instance and upload 
 the configuration.
 
 All the examples I have found, show using script parameters to start SOLR 
 rather than using a container like JBOSS.

With version 4.4.0, you can put the zkHost parameter required to turn
SolrCloud mode on in your solr.xml file.  This is the case whether you
use the new solr.xml format or the old solr.xml format.  With versions
4.3.0 and older (which can only use the old solr.xml format), there was
a bug that prevented this parameter from working correctly in solr.xml.

Alternatively, you can use whatever mechanism JBoss provides for setting
java system properties to set the zkHost parameter.
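
As a rough illustration (the ZooKeeper addresses and chroot are
hypothetical, and the exact file depends on the JBoss version), this could
be a line in JBoss's bin/standalone.conf:

JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr"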

As for uploading configurations, I strongly recommend that you do not do
this with startup parameters, but rather do it with the command-line
zookeeper utility.  The example includes scripts for using this utility,
but those scripts rely pretty heavily on the example jetty.  Here's a
reference that shows how to use it directly, but you must know where
JBoss extracted the war file so you can use the correct classpath argument:

https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
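
A sketch of an upconfig invocation (every host, path, and name below is
hypothetical; the classpath must point at the extracted war's WEB-INF/lib
plus the logging jars):

java -classpath "/path/to/exploded-war/WEB-INF/lib/*:/path/to/logging/lib/*" \
  org.apache.solr.cloud.ZkCLI -cmd upconfig \
  -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -confdir /path/to/collection/conf -confname myconf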

Thanks,
Shawn



Re: SolrCloud on SSL

2013-10-11 Thread Guido Medina
You could resolve that with SSH tunnels. Autossh with the right 
parameters works like a charm.
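
For instance, a minimal sketch (host names, ports, and user are
hypothetical) that keeps a tunnel open from a local port to Solr on idx1:

autossh -M 0 -f -N -L 8443:localhost:8443 user@idx1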


HTH,

Guido.






Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-11 Thread Brett Hoerner
Thanks folks,

As an update for future readers --- the problem was on my side (my logic in
picking the _route_ was flawed) as expected. :)


On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey s...@elyograg.org wrote:
  There is also the distrib=false parameter that will cause the request
 to
  be handled directly by the core it is sent to rather than being
  distributed/balanced by SolrCloud.

 Right - this is probably the best option for diagnosing what is in what
 index.

 -Yonik



Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Furkan KAMACI
Here is a similar conversation:
http://search-lucene.com/m/GeXcg1YfgQ32/Re%253A+Solr+4.0+error+message%253A+%2522Unsupported+ContentType%253A+Content-type%253Atext%252Fxml%2522subj=Re+Solr+4+0+error+message+Unsupported+ContentType+Content+type+text+xml+

Could you change -Dauto into -Dtype=application/pdf and try it again?



Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Shawn Heisey
On 10/11/2013 9:32 AM, PeteBleackley wrote:
 I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404
 error, apparently caused by post.jar adding /extract to the end of the URL.

In order to use post.jar, you would need the /update/extract handler,
which is not defined in the tika core under example-DIH.

The example-DIH configurations are intended to use and illustrate the
dataimport handler - documents are imported using the /dataimport
handler and its config file, not sent directly with post.jar.

Here's a page covering what you would need in order to send PDFs
directly rather than import them using DIH:

http://wiki.apache.org/solr/ExtractingRequestHandler
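
If you do want post.jar to work against that core, a minimal sketch of the
missing handler definition for solrconfig.xml might look like this (the
fmap target field is hypothetical, and the extraction contrib jars must be
on the core's lib path):

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>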

Thanks,
Shawn



Re: Question about plug-in update handler failure

2013-10-11 Thread Jack Park
Issue resolved. Not a Solr issue; a really hard-to-discover missing
library in my installation.

On Thu, Oct 10, 2013 at 7:10 PM, Jack Park jackp...@topicquests.org wrote:
 I have an interceptor which grabs SolrDocument instances in the
 update handler chain. It feeds those documents as a JSON string out to
 an agent system.

 That system has been running fine all the way up to Solr 4.3.1
 I have discovered that, as of 4.4 and now 4.5, the very same config
 files, agent jar, and test harness shows that no documents are
 intercepted, even though the index is built.

 I am wondering if I missed something in changes to Solr beyond 4.3.1
 which would invalidate my setup.

 For the record, earlier trials opened the war and dropped my agent jar
 into WEB-INF/lib; most recent trials on all systems leaves the war
 intact and drops the agent jar into collection1/lib -- it still works
 on 4.3.1, but nothing beyond that.

 Many thanks in advance for any thoughts.

 Jack


RE: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

2013-10-11 Thread Akkinepalli, Bharat (ELS-CON)
Hi Otis,
Thanks for the response.  The log files can be found here.  

MasterLog : http://pastebin.com/DPLKMPcF 
Slave Log:  http://pastebin.com/DX9sV6Jx

One more point worth mentioning here is that when we issue the commit with
expungeDeletes=true, the delete-by-id replication is successful, i.e.
http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true

Regards,
Bharat Akkinepalli

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Wednesday, October 09, 2013 6:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with 
Commits after deleting documents using Delete by ID

Bharat,

Can you look at the logs on the Master when you issue the delete and the 
subsequent commits and share that?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON) 
b.akkinepa...@elsevier.com wrote:
 Hi,
 We have recently migrated from Solr 3.6 to Solr 4.4.  We are using the 
 Master/Slave configuration in Solr 4.4 (not Solr Cloud).  We have noticed the 
 following behavior/defect.

 Configuration:
 ===

 1.   The Hard Commit and Soft Commit are disabled in the configuration 
 (we control the commits from the application)

 2.   We have 1 Master and 2 Slaves configured and the pollInterval is 
 configured to 10 Minutes.

 3.   The Master is configured to have the replicateAfter as commit &
 startup

 Steps to reproduce the problem:
 ==

 1.   Delete a document in Solr (using delete by id).  URL -
 http://localhost:8983/solr/annotation/update with body as
 <delete><id>change.me</id></delete>

 2.   Issue a commit in Master 
 (http://localhost:8983/solr/annotation/update?commit=true).

 3.   The replication of the DELETE WILL NOT happen.  The master and slave
 have the same index version.

 4.   If we try to issue another commit in Master, we see that it 
 replicates fine.

 Request you to please confirm if this is a known issue.  Thank you.

 Regards,
 Bharat Akkinepalli



Re: Using split in updateCSV for SolrCloud 4.4

2013-10-11 Thread Utkarsh Sengar
Interestingly, this URL by Jack works:
1. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&stream.contentType=text/csv&stream.file=/tmp/test.csv'

But this one doesn't (i.e. it doesn't split the column):
2. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/catalog.txt'

The only difference was escape=\. I added that to Jack's example and it
didn't split either, so the culprit was escape=\, though I'm not sure why.


Thanks,
-Utkarsh




On Thu, Oct 10, 2013 at 6:11 PM, Yonik Seeley ysee...@gmail.com wrote:

 Perhaps try adding echoParams=all
 to check that all of the input params are being parsed as expected.

 -Yonik

 On Thu, Oct 10, 2013 at 8:10 PM, Utkarsh Sengar utkarsh2...@gmail.com
 wrote:
  Didn't help.
 
  This is the complete data: https://gist.github.com/utkarsh2012/6927649
  (see merchantList column).
  I tried this URL:
  curl 'http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/log_20130101'
 
  Can this be a bug in the UpdateCSV split function?
 
  Thanks,
  -Utkarsh
 
 
 
  On Thu, Oct 10, 2013 at 3:11 PM, Jack Krupansky j...@basetechnology.com
 wrote:
 
  Using the standard Solr example for Solr 4.5, the following works,
  splitting the features CSV field into multiple values:
 
  curl 'http://localhost:8983/solr/update/csv?commit=true&f.features.split=true&f.features.separator=%3A&f.features.encapsulator=%22'
  -H "Content-Type: text/csv" -d '
  id,name,features
  doc-1,doc1,feat1:feat2'
 
  You may need to add stream.contentType=text/csv to your command.
 
  -- Jack Krupansky
 
  -Original Message- From: Utkarsh Sengar
  Sent: Thursday, October 10, 2013 4:51 PM
  To: solr-user@lucene.apache.org
  Subject: Using split in updateCSV for SolrCloud 4.4
 
 
  Hello,
 
  I am trying to use split (http://wiki.apache.org/solr/UpdateCSV#split)
  while loading some csv data via updateCSV.
 
  This is the field:
  <field name="merchantList" type="string" indexed="true" stored="true"
  multiValued="true" omitNorms="true" termVectors="false"
  termPositions="false" termOffsets="false"/>
 
  This is the column in the CSV (merchantList):
  ...values...,16179:10950,...values...
 
 
  This is the URL I call:
  http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=&escape=\&stream.file=/data/dump/log_20130101
 
  Currently when I load the data, I see this:
 merchantList: [16179:10950],
  But I want this:
 merchantList: [16179,10950],
 
 
  This example is an int, but I have intentionally kept it as a string since
  some values can also be strings.
 
  Any suggestions where I am going wrong?
 
  --
  Thanks,
  -Utkarsh
 
 
 
 
  --
  Thanks,
  -Utkarsh




-- 
Thanks,
-Utkarsh


Re: Using split in updateCSV for SolrCloud 4.4

2013-10-11 Thread Jack Krupansky
There is this note for escape: "If an escape is specified, the encapsulator
is not used unless also explicitly specified, since most formats use either
encapsulation or escaping, not both."


-- Jack Krupansky

-Original Message- 
From: Utkarsh Sengar

Sent: Friday, October 11, 2013 4:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Using split in updateCSV for SolrCloud 4.4

Interestingly this URL by Jack works:
1. curl '
http://localhost/solr/prodinfo/update/csv?commit=truef.merchantList.split=truef.merchantList.separator=%3Af.merchantList.encapsulator=%22stream.contentType=text/csvstream.file=/tmp/test.csv
'

But this doesn't (i.e. it doesn't split the column):
2. curl '
http://localhost/solr/prodinfo/update/csv?commit=truef.merchantList.split=truef.merchantList.separator=%3Af.merchantList.encapsulator=%22escape=\stream.contentType=text/csvstream.file=/data/dump/catalog.txt
'

The only difference was escape=\, I added that in Jack's example and it
didn't work either. So the culprit was escape=\, not sure why.


Thanks,
-Utkarsh




On Thu, Oct 10, 2013 at 6:11 PM, Yonik Seeley ysee...@gmail.com wrote:


Perhaps try adding echoParams=all
to check that all of the input params are being parsed as expected.

-Yonik

On Thu, Oct 10, 2013 at 8:10 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
 Didn't help.

 This is the complete data: 
 https://gist.github.com/utkarsh2012/6927649(see

 merchantList column).
 I tried this URL:
 curl '

http://localhost/solr/coll1/update/csv?commit=truef.merchantList.split=truef.merchantList.separator=%3Af.merchantList.encapsulator=%22escape=\stream.contentType=text/csvstream.file=/data/dump/log_20130101
 '

 Can this be a bug in the UpdateCSV split function?

 Thanks,
 -Utkarsh



 On Thu, Oct 10, 2013 at 3:11 PM, Jack Krupansky j...@basetechnology.com
wrote:

 Using the standard Solr example for Solr 4.5, the following works,
 splitting the features CSV field into multiple values:

 curl "http://localhost:8983/solr/update/csv?commit=true&f.features.split=true&f.features.separator=%3A&f.features.encapsulator=%22" \
 -H "Content-Type: text/csv" -d '
 id,name,features
 doc-1,doc1,feat1:feat2'

 You may need to add stream.contentType=text/csv to your command.

 -- Jack Krupansky

 -Original Message- From: Utkarsh Sengar
 Sent: Thursday, October 10, 2013 4:51 PM
 To: solr-user@lucene.apache.org
 Subject: Using split in updateCSV for SolrCloud 4.4


 Hello,

 I am trying to use split (http://wiki.apache.org/solr/UpdateCSV#split)
 while loading some CSV data via updateCSV.

 This is the field:
 <field name="merchantList" type="string" indexed="true" stored="true"
        multiValued="true" omitNorms="true" termVectors="false"
        termPositions="false" termOffsets="false"/>

 This is the column in CSV (merchantList):
 ..values..,16179:10950,..values..


 This is the URL I call:
 http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=&escape=\&stream.file=/data/dump/log_20130101

 Currently when I load the data, I see this:
"merchantList": ["16179:10950"],
 But I want this:
"merchantList": ["16179", "10950"],


 This example is an int, but I have intentionally kept the field a string
 since some values can also be strings.

 Any suggestions where I am going wrong?

 --
 Thanks,
 -Utkarsh




 --
 Thanks,
 -Utkarsh





--
Thanks,
-Utkarsh 



Setting SolrCloudServer collection

2013-10-11 Thread Mark
If using one static SolrCloudServer, how can I add a bean to a certain 
collection? Do I need to update setDefaultCollection() each time? I doubt 
that's thread-safe.

Thanks

Re: Solr Cloud hangs when replicating updates

2013-10-11 Thread mewmewball
Hey guys,

We just hit a deadlock similar to this one on 4.5, and it seems to be
related to leaked connections, probably due to
https://issues.apache.org/jira/browse/SOLR-4327. We're going to apply the
suggested change to add method.abort() in the finally block and see if it
fixes things.

Jessica
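
For reference, a minimal sketch of the connection-release pattern that change
describes, assuming HttpClient 4.x (this is not the actual SOLR-4327 patch;
the class and method names here are hypothetical):

    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.util.EntityUtils;

    public class AbortOnFailure {
      public static void post(HttpClient client, String url) throws Exception {
        HttpPost method = new HttpPost(url);
        boolean success = false;
        try {
          HttpResponse rsp = client.execute(method);
          EntityUtils.consume(rsp.getEntity()); // fully consume so the connection can be reused
          success = true;
        } finally {
          if (!success) {
            method.abort(); // release the leased connection on any failure path
          }
        }
      }
    }

If the request never completes normally, the pooled connection is never
returned; aborting on the failure path is what prevents the leak.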





Replace NULL with 0 while Indexing

2013-10-11 Thread keshari.prerna
Hello,

One of my indexed fields has NULL values, and I want them replaced with 0
during indexing itself, so that when I search after indexing it gives me 0
instead of NULL.

This is my data-config.xml, and duration is the field that has NULL values.

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://trdbadhoc/test_results"
              responseBuffering="adaptive"
              batchSize="-1"
              user="results"
              password="resultsloader"/>
  <document>
    <entity name="Test_Syndrome"
            pk="id"
            query="SELECT TS.id AS id, TET.type AS error_type, TS.syndrome AS syndrome,
                          S.start_date, SE.session_id AS sessionid, S.duration, TL.logfile,
                          J.job_number AS job, cluster, S.hostname, platform
                   FROM Test_Syndrome AS TS
                   STRAIGHT_JOIN Session_Errors AS SE ON (SE.test_syndrome_id = TS.id)
                   STRAIGHT_JOIN Session AS S ON (S.id = SE.session_id)
                   STRAIGHT_JOIN Test_Run AS TR ON (TR.session_id = SE.session_id)
                   STRAIGHT_JOIN Test_Log AS TL ON (TL.id = TR.test_log_id)
                   STRAIGHT_JOIN Job AS J ON (J.id = TL.job_id)
                   STRAIGHT_JOIN Cluster AS C ON (C.id = J.cluster_id)
                   STRAIGHT_JOIN Platform ON (TR.platform_id = Platform.id)
                   STRAIGHT_JOIN Test_Error_Type TET ON (SE.test_error_type_id = TET.id)">
      <field column="id" name="id"/>
      <field column="error_type" name="error_type"/>
      <field column="syndrome" name="syndrome"/>
      <field column="sessionid" name="sessionid"/>
      <field column="duration" name="duration"/>
      <field column="logfile" name="logfile"/>
      <field column="job" name="job"/>
      <field column="cluster" name="cluster"/>
      <field column="hostname" name="hostname"/>
      <field column="platform" name="platform"/>
    </entity>
  </document>
</dataConfig>

Please help.

Thanks & Regards,
Prerna
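
One common fix is to coerce the value in the SQL itself, e.g. MySQL's
IFNULL(S.duration, 0) AS duration in the query above. Another is a custom
DataImportHandler transformer; a minimal sketch, assuming the DIH jars are on
the classpath (the package and class names here are hypothetical):

    package com.example.dih;

    import java.util.Map;

    public class NullToZeroTransformer {
      // DataImportHandler invokes transformRow(Map) by convention on any
      // class named in an entity's transformer attribute.
      public Object transformRow(Map<String, Object> row) {
        if (row.get("duration") == null) {
          row.put("duration", 0); // replace NULL with 0 before the doc is built
        }
        return row;
      }
    }

The entity would then declare it with
transformer="com.example.dih.NullToZeroTransformer".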







Re: Setting SolrCloudServer collection

2013-10-11 Thread Mark Miller
Set the collection param per request. It only uses the default if you don't set 
it.

- Mark

On Oct 11, 2013, at 5:26 PM, Mark static.void@gmail.com wrote:

 If using one static SolrCloudServer, how can I add a bean to a certain 
 collection? Do I need to update setDefaultCollection() each time? I doubt 
 that's thread-safe.
 
 Thanks
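
A minimal SolrJ 4.x sketch of the per-request approach (the ZooKeeper address
and collection names are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class PerRequestCollection {
      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zkhost:2181");
        server.setDefaultCollection("collection1"); // shared default, set once

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");

        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setParam("collection", "collection2"); // overrides the default for this request only
        req.process(server);
      }
    }

For beans, one option is to bind the bean to a SolrInputDocument first (e.g.
via DocumentObjectBinder) so the same per-request collection param applies.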



Re: Solr's Filtering approaches

2013-10-11 Thread David Philip
The groups are pharmaceutical research experiments. The user is presented
with a graph view; he can select some region, and all the groups in that
region get included. The user can also modify the groups here, so we didn't
maintain group information in the same Solr index but have externalized it.
I looked at the post-filter article. My understanding is that I simply have
to extend it as you did and include an implementation of
isAllowed(acls[doc], user, groups). This will filter the documents in the
collector, and finally this collector will be returned. Am I right?

  @Override
  public void collect(int doc) throws IOException {
    if (isAllowed(acls[doc], user, groups)) super.collect(doc);
  }
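
For context, a fuller sketch of the shape this snippet plugs into, assuming
Solr 4.x's PostFilter API (GroupAclFilter and the isAllowed lookup are
hypothetical stand-ins, not code from the thread):

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    public class GroupAclFilter extends ExtendedQueryBase implements PostFilter {

      @Override
      public boolean getCache() {
        return false; // post filters must not be cached
      }

      @Override
      public int getCost() {
        return Math.max(super.getCost(), 100); // cost >= 100 runs it as a post filter
      }

      @Override
      public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
          @Override
          public void collect(int doc) throws IOException {
            if (isAllowed(doc)) {
              super.collect(doc); // pass only permitted docs down the chain
            }
          }
        };
      }

      private boolean isAllowed(int doc) {
        // hypothetical: look up the doc's ACL and test it against the user's groups
        return true;
      }
    }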


Erick, I am interested to know whether there is any class I can extend that
returns just the bitset of the documents matching the search query. I could
then do bitset1.and(bitset2OfGroups) and finally collect only those
documents to return to the user. How do I try this approach? Any pointers
for bitsets?

Thanks - David
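
On the bitset question: Solr's DocSet API exposes this kind of intersection
directly; a rough sketch under that assumption (names hypothetical), usable
from a custom SearchComponent or handler:

    import org.apache.lucene.search.Query;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SolrIndexSearcher;

    public class GroupIntersection {
      // Intersect the query's matches with a precomputed DocSet of allowed groups.
      public static DocSet allowedMatches(SolrIndexSearcher searcher,
                                          Query query,
                                          DocSet groupDocs) throws java.io.IOException {
        DocSet queryDocs = searcher.getDocSet(query); // bitset-like set of matching docs
        return queryDocs.intersection(groupDocs);     // the AND described above
      }
    }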




On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson erickerick...@gmail.com wrote:

 Well, my first question is why 50K groups is necessary, and
 whether you can simplify that. How a user can manually
 choose from among that many groups is interesting. But
 assuming they're all necessary, I can think of two things.

 If the user can only select ranges, just put in filter queries
 using ranges. Or possibly both ranges and individual entries,
 as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc.
 You need to be a little careful how you index these so
 range queries work properly: in the above you'd miss
 2A because it's sorting lexicographically; you'd need to
 store in some form that sorts like 001A, 01A,
 and so on. You wouldn't need to show that form to the
 user, just form your fq's in the app to work with
 that form.
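
A tiny illustration of the zero-padding idea (the format string is just an
example, not from the thread):

    public class PadDemo {
      public static void main(String[] args) {
        // Zero-pad the numeric prefix so lexicographic order matches numeric order.
        System.out.println(String.format("%04d%s", 2, "A"));   // 0002A
        System.out.println(String.format("%04d%s", 45, "C"));  // 0045C, sorts after 0002A
      }
    }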

 If that won't work (you wouldn't want this to get huge), think
 about a post filter that would only operate on documents that
 had made it through the select, although how to convey which
 groups the user selected to the post filter is an open
 question.

 Best,
 Erick

 On Wed, Oct 9, 2013 at 12:23 PM, David Philip
 davidphilipshe...@gmail.com wrote:
  Hi All,
 
  I have an issue in handling filters for one of our requirements and
  would like to get suggestions on the best approaches.
 
 
  Use case:
 
  1. We have a list of groups, and the number of groups can increase up to 1
  million. Currently we have almost 90 thousand groups in the Solr search
  system.
 
  2. Just before the user hits search, he has the option to select the
  number of groups he wants to retrieve. [The distinct list of these group
  names for display is retrieved from another Solr index that has more
  information about the groups.]
 
  3. User operation:
  Say the user selected group 1A - group 1A and searches for
  key:cancer.
 
 
  The current approach I was thinking of is: get search results and filter
  the query by the list of group IDs selected by the user. But my concern is
  that when this group list grows to 50k unique IDs, it can cause a lot of
  delay in getting search results. So I wanted to know whether there are
  different filtering approaches that I can try.
 
  I was also thinking of one more approach, suggested by my colleague:
  intersection.
  Get the group IDs selected by the user.
  Get the list of group IDs from the search results.
  Perform the intersection of both, and then get the result set of only
  those group IDs that intersected. Is this a better way? Can I use any
  caching technique in this case?
 
 
  - David.