Using Solr from Github or SVN

2013-03-21 Thread Furkan KAMACI
I want to branch Solr (the latest version) locally and implement some
custom code. After some time (maybe every month) I will merge my code with
Solr. However, there is Solr code in both SVN and on GitHub, and I see that
they are not exactly in sync. Which one do you suggest? If there is no
delay between the SVN and GitHub repositories, do you think using Git is
much better because merging is easier?


Re: Using Solr from Github or SVN

2013-03-21 Thread Furkan KAMACI
How about deciding between Maven and Ant + Ivy? I also need another
suggestion: should I use Eclipse or IntelliJ IDEA? Which do developers
commonly use?

2013/3/21 Jan Høydahl jan@cominvent.com

 See http://wiki.apache.org/solr/HowToContribute

 Whether you choose to work locally with a Git checkout or SVN is up to
 you. At the end of the day, when you want to contribute stuff back, you'd
 generate a patch and attach it to JIRA. SVN is the main repo, so if you
 want to be 100% in sync, choose the official SVN.
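
 (A minimal sketch of that workflow with Git, assuming the read-only Apache
 Git mirror that existed at the time; the branch name and the SOLR-XXXX
 issue number are placeholders, not from the thread:)

    # clone the read-only Apache Git mirror of the official SVN repo
    git clone git://git.apache.org/lucene-solr.git
    cd lucene-solr
    git checkout -b my-custom-work      # make local changes on a branch
    # when contributing back, generate a patch to attach to JIRA
    git diff trunk > SOLR-XXXX.patch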

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com





How can I compile and debug Solr from source code?

2013-03-21 Thread Furkan KAMACI
I use IntelliJ IDEA 12 and Solr 4.1 on a 64-bit CentOS 6.4 computer.

I have opened the Solr source code in IntelliJ IDEA as explained in the
documentation. I want to deploy Solr into Tomcat 7. When I open the project
there are run configurations set up previously (I ran the 'ant idea'
command before opening the project). However, they are all test
configurations, and some of the tests do not pass (that is another issue;
no need to go into detail in this e-mail). I have added a local Tomcat
configuration, but I don't know which class holds the main method of Solr,
and whether there is any documentation that explains the code, i.e. I want
to set a breakpoint to see what Solr receives when I run '-index' from
Nutch and what Solr does with it.

I tried something to run the code (I don't think I managed to generate a
.war or an exploded folder), and this is the error I get (I didn't set any
artifact in the run configurations):

Error: Exception thrown by the agent : java.net.MalformedURLException:
Local host name unknown: java.net.UnknownHostException: me.local: me.local:
Name or service not known

(me.local is the host name I set when I installed CentOS 6.4 on my
computer.)

Any ideas on how to run the source code would be welcome.
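
(A side note for readers hitting the same UnknownHostException: the JVM is
failing to resolve the machine's own host name. A minimal sketch of one
common fix on CentOS is to map the name to the loopback address; 'me.local'
here is the host name from the message above:)

    # map the local host name to loopback so the JVM can resolve it
    echo "127.0.0.1   me.local" | sudo tee -a /etc/hosts
    # verify that the name now resolves
    getent hosts me.local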


Re: How can I compile and debug Solr from source code?

2013-03-21 Thread Furkan KAMACI
Using embedded Jetty is an option. However, I see that there is a .war file
inside the Solr source distribution, which means I can generate a .war file
and deploy it to Tomcat or something like that. My main question arises
here: how can I generate a .war file from my customized Solr source code?
That's why I mentioned Tomcat. Any ideas?

2013/3/21 Shawn Heisey s...@elyograg.org


 There actually isn't a way to execute Solr by itself; it doesn't have a
 main method. Solr is a servlet, so it requires a servlet container to run.
 The container it ships with is Jetty. You have mentioned Tomcat.

 I don't know how you might go about running Tomcat and Solr within
 IntelliJ. Perhaps someone else here might. The debugging instructions on
 the wiki for IntelliJ seem to indicate that you debug remotely and start
 the included Jetty with some special options:

 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

 If you do figure out how to get IntelliJ to deploy directly to a locally
 installed Tomcat, please update the wiki with the steps required.

 Thanks,
 Shawn




Re: How can I compile and debug Solr from source code?

2013-03-21 Thread Furkan KAMACI
Is the suggestion you mentioned only for the example application? Can I
apply it to pure Solr (I don't want to generate the example application,
because my aim is not just debugging Solr; I want to extend it, and I will
debug that extended code)?

2013/3/22 Alexandre Rafalovitch arafa...@gmail.com

 That's nice. Can we put that on a Wiki? Or as a quick screencast?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Mar 21, 2013 at 5:42 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  Here's my development/debug workflow:

    - ant idea at the top level to generate the IntelliJ project
    - cd solr; ant example - to build the full example
    - cd example; java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar start.jar - to launch Jetty+Solr in debug mode
    - set breakpoints in IntelliJ, set up a Remote run configuration (localhost:5005) in IntelliJ, and debug pleasantly

  All the unit tests in Solr run very nicely in IntelliJ too, and for tight
  development loops, I spend my time doing that instead of running full-on
  Solr.
 
  Erik
 
 



Re: How can I compile and debug Solr from source code?

2013-03-21 Thread Furkan KAMACI
What I mean is this: there is a .war file shipped with the Solr source
code. How can I regenerate it (build my code and produce a .war file) in
the same way, so that I can then deploy it to Tomcat?





Could not load config for solrconfig.xml

2013-03-21 Thread Furkan KAMACI
I ran the 'ant idea' command for Solr 4.1.0, opened the source code in
IntelliJ IDEA 12.0.4, and I use CentOS 6.4 on my 64-bit computer.

I debugged JettySolrRunner (I am not sure, but I think this is the way to
run Solr with embedded Jetty from IntelliJ IDEA). However, I get this error:

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:897)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:957)
	at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:579)
	at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:574)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or './collection1/conf/', cwd=/home/kamaci/projects/lucene-solr
	at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:319)
	at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:284)
	at org.apache.solr.core.Config.<init>(Config.java:112)
	at org.apache.solr.core.Config.<init>(Config.java:82)
	at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:117)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:894)
	... 11 more

What should I do?
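
(A note for readers with the same error: the resource loader falls back to
./collection1/conf relative to the current working directory, which here is
the checkout root. A minimal sketch of one way around that, assuming the
stock example config shipped with 4.1, is to point the run configuration at
the example Solr home via a JVM option:)

    # add to the IntelliJ run configuration's VM options (path from the trace above)
    -Dsolr.solr.home=/home/kamaci/projects/lucene-solr/solr/example/solr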


Re: How can I compile and debug Solr from source code?

2013-03-21 Thread Furkan KAMACI
OK, I ran that and I see that there is a .war file at

/lucene-solr/solr/dist

Do you know how I can run that Ant target from IntelliJ without the command
line (there are many targets under the Ant Build window)? On the other
hand, within IntelliJ IDEA, how can I auto-deploy it into Tomcat? All in
all, can I edit the run configurations so that IntelliJ runs that Ant
command and deploys the result to Tomcat itself?

2013/3/22 Steve Rowe sar...@gmail.com

 Perhaps you didn't see what I wrote earlier?:

 Sounds like you want 'ant dist', which will create the .war and put it
 into the solr/dist/ directory:

 PROMPT$ ant dist

 Steve
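
 (To round this out for readers: a sketch of deploying the built .war to a
 stand-alone Tomcat 7. The exact .war file name varies by version, and the
 paths here are illustrative, not from the thread.)

    # from the checkout root: build the war into solr/dist/
    cd solr && ant dist
    # copy it into Tomcat's webapps directory; Tomcat deploys it on startup
    cp dist/*.war /usr/share/tomcat/webapps/solr.war
    # tell Solr where its home (solr.xml and cores) lives, e.g. via Tomcat's JAVA_OPTS
    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/solr_home"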

 




Re: Could not load config for solrconfig.xml

2013-03-21 Thread Furkan KAMACI
Should I create a collection1 folder like in the example? On the other
hand, if I deploy the .war, how can I resolve that problem there as well?




Using Solr For a Real Search Engine

2013-03-22 Thread Furkan KAMACI
If I want to use Solr in a web search engine, what kind of strategy should
I follow for running it? I mean, should I run it via embedded Jetty, or
build the .war and deploy it to a container? You should take into account
that my Solr will be under a heavy workload.


NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I use Solr 4.1.0 and Nutch 2.1, Java 1.7.0_17, Tomcat 7.0, and IntelliJ
IDEA 12 on a 64-bit CentOS 6.4 computer.

I ran this command successfully:

bin/nutch solrindex http://localhost:8080/solr -index

However, when I run this command:

bin/nutch solrindex http://localhost:8080/solr -reindex

I get this error:

Mar 22, 2013 6:48:27 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
	at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoSuchMethodError: org.apache.lucene.index.IndexWriter.updateDocument(Lorg/apache/lucene/index/Term;Lorg/apache/lucene/index/IndexDocument;Lorg/apache/lucene/analysis/Analyzer;)V
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:201)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
	at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
	... 16 more


Re: NoSuchMethodError updateDocument

2013-03-22 Thread Furkan KAMACI
I just set this JVM parameter:

-Dsolr.solr.home=/home/projects/lucene-solr/solr/solr_home

solr_home is where my config files etc. live. My solr.xml has these lines:

<cores adminPath="/admin/cores" defaultCoreName="collection1"
       host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
       zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="collection1"/>
</cores>

On the other hand, I run it from Tomcat, without using the example's
embedded Jetty start.jar.

Any ideas?




Re: NoSuchMethodError updateDocument

2013-03-23 Thread Furkan KAMACI
Hi Jan,

I will check the jar versions. By the way, I think I should create a Solr
home directory for my application (my application is this: I use Nutch to
crawl web sites and Solr to index them). Which folder from the Solr source
tree (maybe lucene-solr/solr/example/example-DIH/solr?) should I copy
somewhere and pass its path as the Solr home JVM parameter? And I don't
know what extra changes I should make for my situation (Nutch crawling and
Solr indexing).

In solr.xml there is a field ${jetty.port:} and I didn't define a port for
it. I use Tomcat and it runs at 8080, and I think the Jetty port is 8983;
that's why I think there may be a point of confusion here.

2013/3/23 Jan Høydahl jan@cominvent.com

 Are you 100% sure you use the exact jars for 4.1.0 *everywhere*, and that
 you're not blending older versions from the Nutch distro in your classpath
 here?

  Any ideas?
 BTW: What was your question here regarding Jetty vs Tomcat?

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
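
 (A quick sketch of how to check for the jar mix Jan describes; the webapp
 path is illustrative and depends on where the war was exploded.)

    # list the Lucene/Solr jars the deployed webapp actually loads
    ls /usr/share/tomcat/webapps/solr/WEB-INF/lib | grep -E 'lucene|solr'
    # every version suffix printed should be the same (here, 4.1.0)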


NoSuchMethodError SolrIndexSearcher.doc(I)

2013-03-23 Thread Furkan KAMACI
I have just configured my Solr to index Nutch crawl data. I ran 'dist-war'
for Solr, and when I deploy the resulting war file from IntelliJ IDEA
12.0.4 I get this SEVERE entry in my logs:

Mar 23, 2013 7:14:32 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.NoSuchMethodError: org.apache.solr.search.SolrIndexSearcher.doc(I)Lorg/apache/lucene/index/StoredDocument;
	at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:78)
	at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1601)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)

However, it is deployed, indexing and reindexing worked, and I can get
results from my searches. What may be the reason for that, and is
'dist-war' the right target for compiling my changes into the generated war
file?


Re: NoSuchMethodError SolrIndexSearcher.doc(I)

2013-03-23 Thread Furkan KAMACI
I set

-Dsolr.data.dir

as a JVM parameter, and the error is gone.
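
(For completeness, a sketch of how such options are typically passed to a
Tomcat-hosted Solr; the paths are placeholders, not from the thread.)

    # e.g. in Tomcat's bin/setenv.sh or the service environment
    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/solr_home \
      -Dsolr.data.dir=/path/to/solr_home/collection1/data"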






Re: NoSuchMethodError updateDocument

2013-03-23 Thread Furkan KAMACI
I am using:

bin/nutch solrindex http://localhost:8983/solr -index
bin/nutch solrindex http://localhost:8983/solr -reindex

I don't get this error anymore. By the way, who sets jetty.port?

2013/3/24 Jan Høydahl jan@cominvent.com

 How have you set up Nutch to index into Solr? Are you running this over
 HTTP between two different servers?

 The jetty.port is a silly name, but you can rename it to anything you
 like. Its only task is to select which port to start an embedded ZooKeeper
 on if you use -DzkRun. If you don't, just forget about it.
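
 (A one-line illustration of Jan's point, assuming the stock example
 distribution: with -DzkRun and no explicit ZooKeeper address, the embedded
 ZooKeeper listens on the Solr port plus 1000.)

    java -Djetty.port=8983 -DzkRun -jar start.jar   # embedded ZooKeeper ends up on 9983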

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com


Re: Recommendation for integration test framework

2013-03-24 Thread Furkan KAMACI
Unrelated to your question: you said "We are utilizing Apache Maven as
build management tool". I think Ant + Ivy are currently the build and
dependency management tools, and the Maven POM is generated via a plugin
(correct me if I am wrong). Are there any plans to move the project over to
Maven?

2013/3/25 Jan Morlock jan.morl...@googlemail.com

 Hi,

 our Solr implementation consists of several cores, sometimes interacting
 with each other. Using SolrTestCaseJ4 didn't work out for us. Instead, we
 would like to test the resulting war from the outside using integration
 tests. We are utilizing Apache Maven as our build management tool,
 therefore we are currently thinking about using the Maven Failsafe plugin.
 Does anybody have experience using it in combination with Solr? Or does
 somebody have a better recommendation for us?

 Thank you very much in advance
 Jan






Re: multicore vs multi collection

2013-03-26 Thread Furkan KAMACI
Did you check this document:
http://wiki.apache.org/solr/SolrCloud#A_little_about_SolrCores_and_Collections
It says:

On a single instance, Solr has something called a SolrCore that is
essentially a single index. If you want multiple indexes, you create
multiple SolrCores. With SolrCloud, a single index can span multiple Solr
instances. This means that a single index can be made up of multiple
SolrCores on different machines. We call all of these SolrCores that make
up one logical index a collection. A collection is essentially a single
index that spans many SolrCores, both for index scaling as well as
redundancy. If you wanted to move your 2-SolrCore Solr setup to SolrCloud,
you would have 2 collections, each made up of multiple individual
SolrCores.


2013/3/26 J Mohamed Zahoor zah...@indix.com

 Hi

 I am kind of confused between multi-core and multi-collection.
 The docs don't seem to clarify this. Can someone enlighten me on the
 difference between a core and a collection?
 Are they the same?

 ./zahoor


Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Furkan KAMACI
Is there any easy way (tools, etc.) to debug the MapReduce jobs of Solr?


Re: Debugging Map Reduce Jobs at Solr

2013-03-26 Thread Furkan KAMACI
OK, thanks for your responses. Actually, I was wondering about indexing and
reindexing from Nutch into Solr and debugging that. From your responses I
understand that, on the Solr side, it makes no difference whether the data
arrives through a MapReduce job or not.

2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 Solr doesn't really do MapReduce jobs.  Maybe you mean distributed
 search where queries are dispatched to N servers and then responses
 are merged/reduced to top N and returned?

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/








There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Furkan KAMACI
I use Solr 4.2 on CentOS 6.4 at AWS, and I have deployed the Solr wars into
Tomcat on two different Amazon instances. *When I run them without
SolrCloud they are OK.* However, I want to use them as a SolrCloud. I want
to start embedded ZooKeeper on one of them. When I run:

ps aux | grep catalina

I get this:

/usr/java/default/bin/java
-Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties
-Dbootstrap_confdir=/usr/share/solrhome/collection1/conf
-Dcollection.configName=custom_conf -DnumShards=2 -DzkRun
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/usr/share/tomcat/endorsed -classpath
/usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat
-Djava.io.tmpdir=/usr/share/tomcat/temp
org.apache.catalina.startup.Bootstrap start

solrhome is my Solr home.

My solr.xml has this:

<cores adminPath="/admin/cores" defaultCoreName="collection1"
       host="${host:}" hostPort="${jetty.port:8080}"
       hostContext="${hostContext:search}"
       zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="collection1" />
</cores>

When I open the web page I get this error:

*There are no SolrCores running. Using the Solr Admin UI currently requires
at least one SolrCore.*

When I look at catalina.out I see this:

Mar 26, 2013 8:54:35 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=collection1 state=down
Mar 26, 2013 8:54:35 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
Mar 26, 2013 8:54:36 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState
INFO: Updating cloud state from ZooKeeper...
Mar 26, 2013 8:54:36 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater updateState
INFO: Update state numShards=2 message={
  "operation":"state",
  "core_node_name":null,
  "numShards":"2",
  "shard":null,
  "roles":null,
  "state":"down",
  "core":"collection1",
  "collection":"collection1",
  "node_name":"**.**.***.**:8080_search",       // I have put * in place of the IP
  "base_url":"http://**.**.***.**:8080/search"} // I have put * in place of the IP
Mar 26, 2013 8:54:36 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater createCollection
INFO: Create collection collection1 with numShards 2
Mar 26, 2013 8:54:36 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater updateState
INFO: Assigning new node to shard shard=shard1
Mar 26, 2013 8:54:36 PM org.apache.zookeeper.server.NIOServerCnxnFactory$1 uncaughtException
SEVERE: Thread Thread[Thread-3,5,Overseer state updater.] died
java.lang.NoSuchMethodError: org.apache.solr.common.cloud.SolrZkClient.setData(Ljava/lang/String;[BZ)Lorg/apache/zookeeper/data/Stat;
	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:144)
	at java.lang.Thread.run(Thread.java:722)

Mar 26, 2013 8:59:55 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Could not get shard_id for core: collection1 coreNodeName:10.36.163.29:8080_search_collection1
	at org.apache.solr.cloud.ZkController.doGetShardIdProcess(ZkController.java:1221)
	at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1290)
	at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:861)
	at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)

Mar 26, 2013 8:59:55 PM org.apache.solr.core.SolrCore close
INFO: [collection1]  CLOSING SolrCore org.apache.solr.core.SolrCore@64e5472e
Mar 26, 2013 8:59:55 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=0,autocommit maxTime=15000ms,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
Mar 26, 2013 8:59:55 PM org.apache.solr.update.SolrCoreState decrefSolrCoreState
INFO: Closing SolrCoreState

Mar 26, 2013 8:59:56 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 327928 ms
Mar 26, 2013 8:59:57 PM org.apache.solr.servlet.SolrDispatchFilter handleAdminRequest
INFO:

Re: There are no SolrCores running. Using the Solr Admin UI currently requires at least one SolrCore.

2013-03-26 Thread Furkan KAMACI
Yes, I cleaned and compiled with Ant again and that fixed it; somehow there
were some other jars in my lib directory. How could you tell that there was
a mix of jars? Just from the NoSuchMethodError, or from something else?

2013/3/26 Mark Miller markrmil...@gmail.com

 java.lang.NoSuchMethodError:

 There must be something off with the jars you are using - a mix of
 versions or something.

 - Mark


SolrCloud On Different AWS Instances With Embedded Zookeeper

2013-03-26 Thread Furkan KAMACI
I have two Amazon Web Services instances, and I have set up SolrCloud on
them. The Solr .wars are deployed into Tomcat. When I start the Solr that
runs ZooKeeper, it is OK; it cannot find the second shard yet, as expected.
But when I start up the second Solr, it throws an error.

This is the first Solr config:

JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/usr/share/solr_home/collection1/conf -Dcollection.configName=custom_conf -DnumShards=2 -DzkRun"

This is for the second one:

JAVA_OPTS="$JAVA_OPTS -DzkHost=**.**.***.**:9080"   // I have masked the IP

This is the error I get in catalina.out:

Mar 26, 2013 10:42:14 PM org.apache.zookeeper.ClientCnxn$SendThread logStartConnect
INFO: Opening socket connection to server ip-**-**-***-**.eu-west-1.compute.internal/**.**.***.**:9080. Will not attempt to authenticate using SASL (unknown error)
Mar 26, 2013 10:42:14 PM org.apache.zookeeper.ClientCnxn$SendThread run
WARNING: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And these are the things I am confused about:
1) I didn't define a -DzkHost for the first one.
2) The first Solr runs on 8080, and I added +1000 and set 9080 for the
second one's -DzkHost. I would like to know whether I need to define that
8080 anywhere else on the first one.


Any ideas?
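
(A sketch for readers of what a working pair of settings might look like,
assuming the +1000 convention: with -DzkRun the embedded ZooKeeper listens
on the Solr port plus 1000, so with jetty.port resolving to 8080 it would
be 9080. The IP is a placeholder, and the ZooKeeper port must also be
reachable between the instances, e.g. opened in the AWS security group.)

    # instance 1: runs embedded ZooKeeper (Solr on 8080, ZooKeeper on 9080)
    JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/usr/share/solr_home/collection1/conf \
      -Dcollection.configName=custom_conf -DnumShards=2 -DzkRun -Djetty.port=8080"
    # instance 2: points at instance 1's embedded ZooKeeper
    JAVA_OPTS="$JAVA_OPTS -DzkHost=10.0.0.1:9080 -Djetty.port=8080"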


Re: multicore vs multi collection

2013-03-26 Thread Furkan KAMACI
Also, from http://wiki.apache.org/solr/SolrCloud:

*Q:* What is the difference between a Collection and a SolrCore?

*A:* In classic single-node Solr, a SolrCore is basically equivalent to a
Collection. It presents one logical index. In SolrCloud, the SolrCores on
multiple nodes form a Collection. This is still just one logical index, but
multiple SolrCores host different 'shards' of the full collection. So a
SolrCore encapsulates a single physical index on an instance. A Collection
is a combination of all of the SolrCores that together provide a logical
index that is distributed across many nodes.

2013/3/26 J Mohamed Zahoor zah...@indix.com

 Thanks.

 This makes it clearer than the wiki.

 How do you create multiple collections which can have different schemas?

 ./zahoor

 On 26-Mar-2013, at 3:52 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Did you check that document:
 
 http://wiki.apache.org/solr/SolrCloud#A_little_about_SolrCores_and_CollectionsIt
  says:
  On a single instance, Solr has something called a
  SolrCorehttp://wiki.apache.org/solr/SolrCorethat is essentially a
  single index. If you want multiple indexes, you
  create multiple SolrCores http://wiki.apache.org/solr/SolrCores. With
  SolrCloud, a single index can span multiple Solr instances. This means
 that
  a single index can be made up of multiple
  SolrCorehttp://wiki.apache.org/solr/SolrCore's
  on different machines. We call all of these
  SolrCoreshttp://wiki.apache.org/solr/SolrCoresthat make up one
  logical index a collection. A collection is a essentially
  a single index that spans many
  SolrCorehttp://wiki.apache.org/solr/SolrCore's,
  both for index scaling as well as redundancy. If you wanted to move your
 2
  SolrCore http://wiki.apache.org/solr/SolrCore Solr setup to SolrCloud,
  you would have 2 collections, each made up of multiple individual
  SolrCoreshttp://wiki.apache.org/solr/SolrCores.
 
 
  2013/3/26 J Mohamed Zahoor zah...@indix.com
 
  Hi
 
  I am kind of confused between multi core and multi collection.
  Docs don't seem to clarify this.. can someone enlighten me: what is the
  difference between a core and a collection?
  Are they the same?
 
  ./zahoor
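On Zahoor's question about collections with different schemas: one approach
on Solr 4.x, sketched here with placeholder paths and names, is to upload a
second config set to ZooKeeper using the zkcli script that ships with Solr,
then reference it when creating the collection:

cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig \
 -confdir /path/to/other/conf -confname otherconf

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=other&numShards=2&collection.configName=otherconf'

Each collection then uses whichever config set (schema.xml plus
solrconfig.xml) it was created against.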




Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-27 Thread Furkan KAMACI
Hi Nate;

This may be off topic, but could you explain why you want to use Tomcat
instead of Jetty or embedded Jetty?


2013/3/27 Michael Della Bitta michael.della.bi...@appinions.com

 You're using the blocking IO connector, which isn't so great for heavy
 loads.

 Give this a shot... You'll end up with 8192 max connections by
 default, although this is tunable too:

 Run:
 apt-get install libapr1 libtcnative-1

 Add this to the list of Listeners at the top of server.xml:

 <Listener className="org.apache.catalina.core.AprLifecycleListener"
 SSLEngine="off" />

 These instructions assume you're running Tomcat 6 or 7.

 Here's some documentation:
 http://tomcat.apache.org/tomcat-7.0-doc/apr.html
 http://tomcat.apache.org/tomcat-7.0-doc/config/http.html


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Tue, Mar 26, 2013 at 5:31 PM, Nate Fox n...@neogov.com wrote:
  We're not using ELB and I have no idea which connector I'm using - I'm
  guessing whatever is default (I'm a total noob). This is from my
 server.xml:
  <Connector port="8080" protocol="HTTP/1.1"
 connectionTimeout="6"
 URIEncoding="UTF-8" redirectPort="8443" />
 
 
 
  --
  Nate Fox
  Sr Systems Engineer
 
  o: 310.658.5775
  m: 714.248.5350
 
  Follow us @NEOGOV http://twitter.com/NEOGOV and on
  Facebookhttp://www.facebook.com/neogov
 
  NEOGOV http://www.neogov.com/ is among the top fastest growing
 software
  companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and
  the LA Business Journal. We are hiring!
 http://www.neogov.com/#/company/careers
 
 
 
  On Tue, Mar 26, 2013 at 1:02 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Nate,
 
  We just cleared up a problem similar to this by ditching Elastic Load
  Balancer and switching over to the APR connector in Tomcat. Are you
  using either of those?
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Tue, Mar 26, 2013 at 2:58 PM, Otis Gospodnetic
  otis.gospodne...@gmail.com wrote:
   Hi Nate,
  
   Try adding some warmup queries and making sure the setting for using
   the cold searcher in solrconfig.xml is set to false.  Your warmup
   queries should use facets and sorting if your normal queries use them.
In SPM you'll actually see how much time warming up takes, so you'll
   get a better idea of the cost of that (when you don't do it).
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Tue, Mar 26, 2013 at 2:50 PM, Nate Fox n...@neogov.com wrote:
   I was wondering if the warmup stuff was one of the culprits (we don't
  have
   warmups at all - the configs are pretty stock).
   As for the system, it seems capable of quite a bit more: memory
 usage is
   ~30%, jvm-memory (from the dashboard) is very low (~220Mb out of 3Gb)
  and
   load below 1.00.
  
   The seed data and queries were put together by one of our developers.
  I've
   put all the solrmeter files here:
   https://gist.github.com/natefox/ee5cef3d4fbbc73e9bce
   Unfortunately I'm quite new to solr (and tomcat) so I'm not entirely
  sure
   which file does which specifically.
  
   Does the system's reaction to a 'fast load' without a warmup sound
  normal?
   I would have expected the first couple hundred queries to be very
 slow
   (500ms) and then the system catch up after a while. But it just dies
  very
   quickly and never recovers.
  
   I'll check out your SPM - I've seen it mentioned before. Thanks!
  
  
  
   --
   Nate Fox
   Sr Systems Engineer
  
   o: 310.658.5775
   m: 714.248.5350
  
   Follow us @NEOGOV http://twitter.com/NEOGOV and on
   Facebookhttp://www.facebook.com/neogov
  
   NEOGOV http://www.neogov.com/ is among the top fastest growing
  software
   companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500,
 and
   the LA Business Journal. We are hiring!
  http://www.neogov.com/#/company/careers
  
  
  
   On Tue, Mar 26, 2013 at 11:12 AM, Otis Gospodnetic 
   otis.gospodne...@gmail.com wrote:
  
   Hi,
  
   In short, certain data structures need to load from index in the
   beginning, (for sorting and faceting) caches need to warm up, JVM
   needs to warm up, etc., so going slowly in the beginning makes
 sense.
   Why things die after that is a different Q.  Maybe it OOMs?  Maybe
   queries are very complex?  What do your queries look like?  I see
   newrelic.jar in the command-line.  May want to try SPM for Solr, it
   has better Solr metrics.
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Tue, Mar 26, 2013 at 1:24 PM, Nate Fox n...@neogov.com wrote:
I'm new to solr and I'm load testing our setup to see what we can
  handle.

Re: Setup solrcloud on tomcat

2013-03-28 Thread Furkan KAMACI
First of all, check your catalina.out log; it gives details about what is
wrong. Secondly, you can separate such JVM parameters from solr.xml and put
them into a file called setenv.sh (create it under the bin folder of
Tomcat). Here is what you should do:

#!/bin/sh
JAVA_OPTS="$JAVA_OPTS \
 -Dbootstrap_confdir=/usr/share/solrhome/collection1/conf \
 -Dcollection.configName=custom_conf -DnumShards=2 -DzkRun"
export JAVA_OPTS

You should change /usr/share/solrhome to wherever your Solr home is.

That should start up an embedded ZooKeeper.

On the other hand, a client that will connect to the embedded ZooKeeper
should have this setenv.sh:

#!/bin/sh
JAVA_OPTS="$JAVA_OPTS -DzkHost=**.**.***.**:2181"
export JAVA_OPTS

I have masked the IP address; you should put in yours.


2013/3/28 하정대 jungdae...@ahnlab.com

 Hi, all

 I tried to set up SolrCloud on Tomcat, but I couldn't see the Cloud tab in
 the Solr menu. I think the embedded ZooKeeper might not be loaded.
 This is my solr.xml file that was supposed to run ZooKeeper:

 <solr persistent="true">
   <cores adminPath="/admin/cores" defaultCoreName="collection1"
    host="${host:}" hostPort="8080" hostContext="${hostContext:}"
    numShards="2" zkRun="http://localhost:9081"
    zkClientTimeout="${zkClientTimeout:15000}">
     <core name="collection1" instanceDir="collection1" />
   </cores>
 </solr>

 What am I missing? I need your help.
 Also, an example file or tutorial would be a good help for me.
 I am working through this with the SolrCloud wiki.

 Thanks. All.


 
 "The safest name in the world - AhnLab"
 Jungdae Ha, Senior Researcher / ASD Dept.
 Tel: 031-722-8338
 e-mail: jungdae...@ahnlab.com  http://www.ahnlab.com
 http://www.ahnlab.com/
 673 Sampyeong-dong, Bundang-gu, Seongnam-si, Gyeonggi-do 463-400, Korea
 




Combining Solr Indexes at SolrCloud

2013-03-29 Thread Furkan KAMACI
Let's assume that I have two machines in a SolrCloud, each working as part
of the cloud. If I want to shut down one of them and combine its indexes
into the other, how can I do that?
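One way to do this, sketched with placeholder host, core, and path names,
is the CoreAdmin mergeindexes action: copy the retiring node's index
directory somewhere the surviving node can read, merge it into an existing
core, and commit:

curl 'http://surviving-host:8080/solr/admin/cores?action=mergeindexes&core=collection1&indexDir=/path/to/retired/node/data/index'
curl 'http://surviving-host:8080/solr/collection1/update?commit=true'

Note this is a plain index merge; it does not deduplicate, so a document
present on both nodes would end up in the merged core twice.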


SOAP for Solr indexing mechanism

2013-03-29 Thread Furkan KAMACI
Is there any support for communication over SOAP in Solr's indexing
mechanism?


Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Does Solr allow parallelism (parallel computing) for indexing?


Suggestions for Customizing Solr Admin Page

2013-03-29 Thread Furkan KAMACI
I want to customize the Solr Admin Page. I think I will need more
complicated things to manage my cloud. I will separate my Solr cluster into
indexing-only nodes and response-only nodes. I will index my documents by
category, and I will index them into different collections.

In my admin page I will combine those collections and separate my
collections into new ones. I will add, remove, and query documents, etc.

Here is an old topic about the Solr admin page:
http://lucene.472066.n3.nabble.com/Extending-Solr-s-Admin-functionality-td473974.html

My needs may change, and some of them could be done via the existing Solr
admin page. What do you suggest: extending the existing admin page, or
wrapping a new one over SolrJ? Which factors should I consider, and how can
I decide between them?


Re: Parallel Indexing With Solr?

2013-03-29 Thread Furkan KAMACI
Can you tell me more about "You can index from a MapReduce job"? I use
Nutch, and it tells Solr to index and reindex. I know that I can use
MapReduce jobs on the Nutch side, but can I use MapReduce jobs on the Solr
side (i.e. for indexing etc.)?


2013/3/29 Otis Gospodnetic otis.gospodne...@gmail.com

 Yes.  You can index from any app that can hit Solr with multiple
 threads.  You can use StreamingUpdateSolrServer, at least in older
 Solrs, to handle multi-threading for you.  You can index from a
 MapReduce job ...

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/
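 As a rough HTTP-level illustration of the multi-threading Otis describes
 (file names are placeholders; any client that posts update batches
 concurrently gets the same effect):

 # fire four concurrent indexing jobs, one per pre-split JSON batch
 for f in batch-1.json batch-2.json batch-3.json batch-4.json; do
   curl -s 'http://localhost:8983/solr/update' \
     -H 'Content-type: application/json' --data-binary @"$f" &
 done
 wait
 curl 'http://localhost:8983/solr/update?commit=true'

 Indexing throughput usually scales with a handful of parallel senders
 until CPU or I/O on the Solr side becomes the bottleneck.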





 On Fri, Mar 29, 2013 at 5:26 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Does Solr allows parallelism (parallel computing) for indexing?



Filtering Search Cloud

2013-04-01 Thread Furkan KAMACI
I want to separate my cloud into two logical parts. One of them is the
indexer part of the SolrCloud; the second is the searcher part.

My first question: does separating my cloud this way make sense as a
performance improvement? I think that indexing takes time away from query
responses, so if I separate them I may get a performance improvement. On
the other hand, maybe using all Solr machines as a whole (I mean not
partitioning as I mentioned) lets SolrCloud do better load balancing; I
would like to learn which it is.

My second question: let's assume that I have separated my machines as I
mentioned. Can I filter some indexes to be searchable or not from the
searcher SolrCloud?


Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Actually, maybe one of the most important core things is the Analysis part
in the last diagram, but there is nothing about it, i.e. stemming,
lemmatizing etc., in any of them.


2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com


 On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:

 (13/04/02 21:45), Furkan KAMACI wrote:

 Is there any documentation something like flow chart of Solr. i.e.
 Documents comes into Solr(maybe indicating which classes get documents)
 and
 goes to parsing process (i.e. stemming processes etc.) and then reverse
 indexes are get so on so forth?

  There is an interesting ticket:

 Architecture Diagrams needed for Lucene, Solr and Nutch
 https://issues.apache.org/jira/browse/LUCENE-2412

 koji


 I like this one, it is a bit more detailed :

 http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/


 Kelkoo SAS
 Société par Actions Simplifiée
 Au capital de € 4.168.964,30
 Siège social : 8, rue du Sentier 75002 Paris
 425 093 069 RCS Paris

 This message and its attachments are confidential and intended solely for
 their addressees. If you are not the intended recipient of this message,
 please delete it and notify the sender.



Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
You are right to mention developer docs versus user docs; users split along
that line. Some of them use Solr for indexing and monitoring via the admin
interface, and that is quite enough for them. However, some people want to
modify it, so it would be nice if there were some documentation for the
developer side too.


2013/4/2 Yago Riveiro yago.rive...@gmail.com

 For beginners it is complicated to understand the complexity of
 Solr/Lucene. I'm trying to develop a custom search component, and it's too
 hard to keep in mind the flow, inheritance and interaction between
 classes. I think that there is a gap between developer docs and user docs,
 or maybe I don't search enough T_T. The Javadoc is not always clear.

 The fact that I'm a beginner in the Solr world doesn't help.

 Either way, this thread was very helpful, I found some very good resources
 here :)

 Regards

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:

  Actually maybe one the most important core thing is that Analysis part at
  last diagram but there is nothing about it i.e. stamming, lemmitazing
 etc.
  at any of them.
 
 
  2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com (mailto:
 andre.b...@kelkoo.com)
 
  
   On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
  
(13/04/02 21:45), Furkan KAMACI wrote:
   
 Is there any documentation something like flow chart of Solr. i.e.
 Documents comes into Solr(maybe indicating which classes get
 documents)
 and
 goes to parsing process (i.e. stemming processes etc.) and then
 reverse
 indexes are get so on so forth?

 There is an interesting ticket:
   
Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412
   
koji
  
   I like this one, it is a bit more detailed :
  
   http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
  
   --
   André Bois-Crettez
  
   Search technology, Kelkoo
   http://www.kelkoo.com/
  
  
   Kelkoo SAS
   Société par Actions Simplifiée
   Au capital de € 4.168.964,30
   Siège social : 8, rue du Sentier 75002 Paris
   425 093 069 RCS Paris
  
   Ce message et les pièces jointes sont confidentiels et établis à
   l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
   destinataire de ce message, merci de le détruire et d'en avertir
   l'expéditeur.
  
 
 
 





Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Furkan KAMACI
Hi;

Please add FurkanKAMACI to the group.

Thanks;
Furkan KAMACI


2013/4/2 Steve Rowe sar...@gmail.com

 On Apr 2, 2013, at 11:23 AM, Ryan Ernst r...@iernst.net wrote:
  Please add RyanErnst to the contributors group.  Thanks!

 Added to solr wiki ContributorsGroup.



Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Take me as an example: I started researching Solr just a few weeks ago. I
have learned Solr and its related projects. My next step is writing down
the main steps of Solr. We have separated the learning curve of Solr into
two main categories.
The first is for people who use it as an out-of-the-box component. The
second is the developer side.

Actually, the developer side branches into two paths.

The first covers the general steps, i.e. a document comes into Solr (e.g.
crawled data from Nutch); which analysis processes are going to be done
(stemming, lemmatizing etc.); what happens after parsing, step by step.
When a search query happens, what happens step by step, at which step
scores are calculated, and so on.
The second is more code-specific, i.e. which handlers take in the data that
will be indexed (no need to explain every handler at this step); which are
the analyzer and tokenizer classes, and what is the flow between them; how
response handlers work and what they are.

Explaining the cloud side is another piece of work.

Some explanations are currently present in the wiki (but some of them are
in very deep places and it is not easy to find the parent topic; maybe
starting the wiki from a top page and branching out all other topics from
it would be better).

If we could show the big picture, and beside it the smaller pictures within
it, it would be great (if you know the main parts it becomes easy to go
deep into the code, i.e. you don't need to explain every handler; if you
show the way to the developer, he or she can debug and find what is
needed).

Taking myself as an example again: I have to write down the steps of Solr
in some detail, and even though I have read many wiki pages and a book
about it, I see that it is not easy even to write down the big picture of
the developer side.


2013/4/2 Alexandre Rafalovitch arafa...@gmail.com

 Yago,

 My point - perhaps lost in too much text - was that Solr is presented - and
 can function - as a black-box. Which makes it different from more
 traditional open-source project. So, the stage-2 happens exactly when the
 non-programmers have to cross the boundary from the black-box into
 code-first approach and the hand-off is not particularly smooth. Or even
 when - say - a PHP or .Net programmer tries to get beyond the basic
 operations of their client library and has to understand the server-side
 aspects of Solr.

 Regards,
Alex.

 On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:

  Alexandre,
 
  You describe the normal path when a beginner tries to use a codebase
  they don't understand: black box, reading code, hacking, "OK, now I know
  10% of the project", with luck :p.
 


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Flow Chart of Solr

2013-04-03 Thread Furkan KAMACI
So, all in all, is there anybody who can write down just the main steps of
Solr (including parsing, stemming, etc.)?


2013/4/2 Furkan KAMACI furkankam...@gmail.com

 I think about myself as an example. I have started to make research about
 Solr just for some weeks. I have learned Solr and its related projects. My
 next step writing down the main steps Solr. We have separated learning
 curve of Solr into two main categories.
 First one is who are using it as out of the box components. Second one is
 developer side.

 Actually developer side branches into two way.

 First one is general steps of it. i.e. document comes into Solr (i.e.
 crawled data of Nutch). which analyzing processes are going to done
 (stamming, hamming etc.), what will be doing after parsing step by step.
 When a search query happens what happens step by step, at which step scores
 are calculated so on so forth.
 Second one is more code specific i.e. which handlers takes into account
 data that will going to be indexed(no need the explain every handler at
 this step) . Which are the analyzer, tokenizer classes and what are the
 flow between them. How response handlers works and what are they.

 Also explaining about cloud side is other work.

 Some of explanations are currently presents at wiki (but some of them are
 at very deep places at wiki and it is not easy to find the parent topic of
 it, maybe starting wiki from a top age and branching all other topics as
 possible as from it could be better)

 If we could show the big picture, and beside of it the smaller pictures
 within it, it would be great (if you know the main parts it will be easy to
 go deep into the code i.e. you don't need to explain every handler, if you
 show the way to the developer he/she could debug and find the needs)

 When I think about myself as an example, I have to write down the steps of
 Solr a bit detail  even I read many pages at wiki and a book about it, I
 see that it is not easy even writing down the big picture of developer side.


 2013/4/2 Alexandre Rafalovitch arafa...@gmail.com

 Yago,

 My point - perhaps lost in too much text - was that Solr is presented -
 and
 can function - as a black-box. Which makes it different from more
 traditional open-source project. So, the stage-2 happens exactly when the
 non-programmers have to cross the boundary from the black-box into
 code-first approach and the hand-off is not particularly smooth. Or even
 when - say - php or .Net programmer  tries to get beyond the basic
 operations their client library and has the understand the server-side
 aspects of Solr.

 Regards,
Alex.

 On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com
 wrote:

  Alexandre,
 
  You describe the normal path when a beginner try to use a source of code
  that doesn't understand, black-box, reading code, hacking, ok now I know
  10% of the project, with lucky :p.
 


 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Shawn, thanks for your detailed explanation. My system will work under high
load. I mean I will always be indexing something, and something will always
be queried in my system. That is why I am considering physically separating
the indexer and query-serving machines. Think of it this way: imagine a
machine that both does indexing (disk I/O for it; I don't know the
underlying system, maybe Solr does sequential I/O) and also tries to answer
queries (another kind of I/O). That is my main challenge in deciding
whether to separate them. And the next step: if I separate them, can I
filter the data of the indexer machines before it is served? (I don't have
any filtering issues right now; I just think I may need it in the future.)


2013/4/3 Shawn Heisey s...@elyograg.org

 On 4/1/2013 3:02 PM, Furkan KAMACI wrote:
  I want to separate my cloud into two logical parts. One of them is
 indexer
  cloud of SolrCloud. Second one is Searcher cloud of SolrCloud.
 
  My first question is that. Does separating my cloud system make sense
 about
  performance improvement. Because I think that when indexing, searching
 make
  time to response and if I separate them I get a performance improvement.
 On
  the other hand maybe using all Solr machines as whole (I mean not
  partitioning as I mentioned) SolrCloud can make a better load balancing,
 I
  would want to learn it.
 
  My second question is that. Let's assume that I have separated my
 machines
  as I mentioned. Can I filter some indexes to be searchable or not from
  Searcher SolrCloud?

 SolrCloud gets rid of the master and slave designations.  It also gets
 rid of the line between indexing and querying.  Each shard has a replica
 that is designated the leader, but that has no real impact on searching
 and indexing, only on deciding which data to use when replicas get out
 of sync.

 In the old master-slave architecture, you indexed to the master and the
 updated index files were replicated to the slave.  The slave did not
 handle the analysis for indexing, so it was usually better to send
 queries to slaves and let the master only do indexing.

 SolrCloud is very different.  When you index, the documents are indexed
 on all replicas at about the same time.  When you query, the requests
 are load balanced across all replicas.  During normal operation,
 SolrCloud does not use replication at all.  The replication feature is
 only used when a replica gets out of sync with the leader, and in that
 case, the entire index is replicated.

 Thanks,
 Shawn




Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Thanks for your explanation; you explained everything I need. Just one more
question. I see that I cannot do it with SolrCloud, but I can do something
like it with Solr's master-slave replication. If I use master-slave
replication, can I eliminate (filter) something that was indexed on the
master from appearing in responses when querying the slaves?


2013/4/3 Shawn Heisey s...@elyograg.org

 On 4/3/2013 1:13 PM, Furkan KAMACI wrote:
  Shawn, thanks for your detailed explanation. My system will work on high
  load. I mean I will always index something and something always will be
  queried at my system. That is why I consider about physically separating
  indexer and query reply machines. I think about that: imagine a machine
  that both does indexing (a disk IO for it, I don't know the underlying
  system maybe Solr makes a sequential IO) and both trying to reply queries
  (another kind of IO) That is my main challenge to decide separating them.
  And the next step is that, if I separate them before response can I
 filter
  the data of indexer machines (I don't have any filtering  issues right
 now,
  I just think that maybe I can need it at future)

 We do seem to have a language barrier, so let me try to be very clear:
 if you use SolrCloud, you can't separate querying and indexing.  You
 will have to use the master-slave replication that has been part of Solr
 since at least 1.4, possibly earlier.

 Thanks,
 Shawn




Difference Between Indexing and Reindexing

2013-04-03 Thread Furkan KAMACI
OK, this could be a very easy question, but I want to learn a bit more of
the technical detail.
When I use Nutch to send documents to Solr to be indexed, there are two
parameters:

-index and -reindex.

What does Solr do differently for each one?


Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
Hi Otis, then what is the difference between add and update? And how do we
update or add documents in Solr (I see that there is just one update
handler)?


2013/4/4 Otis Gospodnetic otis.gospodne...@gmail.com

 I don't recall what Nutch does, so it's hard to tell.

 In Solr (Lucene, really), you can:
 * add documents
 * update documents
 * delete documents

 Currently, update is really a delete + readd under the hood.  It's
 been like that for 13+ years, but this may change:
 https://issues.apache.org/jira/browse/LUCENE-4258

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Wed, Apr 3, 2013 at 9:15 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  OK, This could be a so easy question but I want to learn just a bit more
  technical detail of it.
  When I use Nutch to send documents to Solr to be indexing there are two
  parameters:
 
  -index and -reindex.
 
  What Solr does at each one different from the other one?



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I crawl webpages with Nutch and send them to Solr for indexing. There are
two parameters for sending data into Solr: one of them is -index and the
other is -reindex. I just want to learn what they do.


2013/4/4 Jack Krupansky j...@basetechnology.com

 Technically, update and add are identical from a user perspective - you
 don't need to worry about whether the document already exists.

 But there is another, newer form of update, "selective" or "atomic", which
 updates a subset of the fields in an existing document without needing
 to re-send all of the other fields of the existing document.
 See:
 http://wiki.apache.org/solr/Atomic_Updates
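 A minimal sketch of what such an atomic update looks like on the wire in
 Solr 4.x (field names are invented; only the id and the fields being
 changed are sent):

 curl 'http://localhost:8983/solr/update?commit=true' \
   -H 'Content-type: application/json' -d '
 [{"id":"doc-1","title":{"set":"new title"},"popularity":{"inc":1}}]'

 Here "set" replaces a field value and "inc" increments a numeric one; the
 remaining fields are carried over from the stored document, which is why
 atomic updates require fields to be stored.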

 But... none of this has to do with indexing vs. reindexing... you need
 to be clear about what real question you are trying to ask; otherwise we
 keep following your questions, answering each in detail, bouncing all
 over the place without understanding what it is that you are really
 looking for.

 More specifically, what exactly is the problem you are trying to solve?

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Thursday, April 04, 2013 2:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Difference Between Indexing and Reindexing


 Hi Otis, then what is the difference between add and update? And how we
 update or add documents into Solr (I see that there is just one update
 handler)?


 2013/4/4 Otis Gospodnetic otis.gospodne...@gmail.com

  I don't recall what Nutch does, so it's hard to tell.

 In Solr (Lucene, really), you can:
 * add documents
 * update documents
 * delete documents

 Currently, update is really a delete + readd under the hood.  It's
 been like that for 13+ years, but this may change:
 https://issues.apache.org/jira/browse/LUCENE-4258

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Wed, Apr 3, 2013 at 9:15 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  OK, This could be a so easy question but I want to learn just a bit more
  technical detail of it.
  When I use Nutch to send documents to Solr to be indexing there are two
  parameters:
 
  -index and -reindex.
 
  What Solr does at each one different from the other one?





Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
I use Nutch 2.1 and using that:

bin/nutch solrindex http://localhost:8983/solr -index
bin/nutch solrindex http://localhost:8983/solr -reindex


2013/4/4 Gora Mohanty g...@mimirtech.com

 On 4 April 2013 18:33, Furkan KAMACI furkankam...@gmail.com wrote:
  I craw webages with Nutch and send them to Solr for indexing. There are
 two
  parameters to send data into Solr. One of them is -index and the other
 one
  is -reindex. I just want to learn what they do.
 [...]

 Which version of Nutch are you using?
 Unless I have completely missed something, both 1.6 and 2.1
 use solrindex:
 http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch

 Where do you see -index and -reindex?

 Regards,
 Gora



Re: Difference Between Indexing and Reindexing

2013-04-04 Thread Furkan KAMACI
It may be a deprecated usage (maybe not), but I can certainly run -index
and -reindex on Nutch 2.1.


2013/4/4 Gora Mohanty g...@mimirtech.com

 On 4 April 2013 20:16, Gora Mohanty g...@mimirtech.com wrote:
  On 4 April 2013 19:29, Furkan KAMACI furkankam...@gmail.com wrote:
  I use Nutch 2.1 and using that:
 
  bin/nutch solrindex http://localhost:8983/solr -index
  bin/nutch solrindex http://localhost:8983/solr -reindex
  [...]
 
  Sorry, but are you sure that you are using 2.1? Here is
  what I get with:
  ./bin/nutch solrindex
 [...]

 I am running in local mode, however, as I do not currently
 have access to a Hadoop cluster.

 Regards,
 Gora



Re: Filtering Search Cloud

2013-04-05 Thread Furkan KAMACI
OK, I will test it and give you a detailed report; thanks for your help.


2013/4/5 Erick Erickson erickerick...@gmail.com

 I cannot emphasize strongly enough that you need to _prove_ you have
 a problem before you decide on a solution! Do you have any evidence
 that solrcloud can't handle the load you intend? Might a better approach
 be just to create more shards thus spreading the load and get all the
 HA/DR goodness of SolrCloud?

 So far you've said you'll have a heavy load without giving us any
 numbers.
 10,000 update/second? 10 updates/second? 1 query/second? 100,000
 queries/second? 100,000 documents? 1,000,000,000,000 documents?

 Best
 Erick

 On Wed, Apr 3, 2013 at 5:15 PM, Shawn Heisey s...@elyograg.org wrote:
  On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
  Thanks for your explanation, you explained every thing what I need. Just
  one more question. I see that I can not make it with Solr Cloud, but I
 can
  do something like that with master-slave replication of Solr. If I use
  master-slave replication of Solr, can I eliminate (filter) something
  (something that is indexed from master) from being a response after
  querying (querying from slaves) ?
 
  I don't understand the question.  I will attempt to give you more
  information, but it might not answer your question.  If not, you'll have
  to try to improve your question.
 
  Your master and each of that master's slaves will have the same index as
  soon as replication is done.  A query on the slave has no idea that the
  master exists.
 
  Thanks,
  Shawn
 



Re: Flow Chart of Solr

2013-04-05 Thread Furkan KAMACI
  of
  the package and class names are OBVIOUS, really, and follow the class
  hierarchy and code flow using the standard features of any modern Java
  IDE.
  If you are wondering where to start for some specific user-level
 feature,
  please ask specifically about that feature. But... make a diligent
 effort
  to
  discover and learn on your own before asking open-ended questions.
 
  Sure, there are lots of things in Lucene and Solr that are rather
 complex
  and seemingly convoluted, and not obvious, but people are more than
  willing
  to help you out if you simply ask a specific question. I mean, not
  everybody
  needs to know the fine detail of query parsing, analysis, building a
  Lucene-level stemmer, etc. If we tried to put all of that in a diagram,
  most
  people would be more confused than enlightened.
 
  At which step are scores calculated? That's more of a Lucene question.
  Or,
  are you really asking what code in Solr invokes Lucene search methods
  that
  calculate basic scores?
 
  In short, you need to be more specific. Don't force us to guess what
  problem
  you are trying to solve.
 
  -- Jack Krupansky
 
  -Original Message- From: Furkan KAMACI
  Sent: Wednesday, April 03, 2013 6:52 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Flow Chart of Solr
 
 
  So, all in all, is there anybody who can write down just main steps of
  Solr(including parsing, stemming etc.)?
 
 
  2013/4/2 Furkan KAMACI furkankam...@gmail.com
 
  I think about myself as an example. I have started to make research
  about
  Solr just for some weeks. I have learned Solr and its related
 projects.
  My
  next step writing down the main steps Solr. We have separated learning
  curve of Solr into two main categories.
  First one is who are using it as out of the box components. Second one
  is
  developer side.
 
  Actually developer side branches into two way.
 
  First one is general steps of it. i.e. document comes into Solr (i.e.
  crawled data of Nutch). which analyzing processes are going to done
  (stamming, hamming etc.), what will be doing after parsing step by
 step.
  When a search query happens what happens step by step, at which step
  scores
  are calculated so on so forth.
  Second one is more code specific i.e. which handlers takes into
 account
  data that will going to be indexed(no need the explain every handler
 at
  this step) . Which are the analyzer, tokenizer classes and what are
 the
  flow between them. How response handlers works and what are they.
 
  Also explaining about cloud side is other work.
 
  Some of explanations are currently presents at wiki (but some of them
  are
  at very deep places at wiki and it is not easy to find the parent
 topic
  of
  it, maybe starting wiki from a top age and branching all other topics
 as
  possible as from it could be better)
 
  If we could show the big picture, and beside of it the smaller
 pictures
  within it, it would be great (if you know the main parts it will be
 easy
  to
  go deep into the code i.e. you don't need to explain every handler, if
  you
  show the way to the developer he/she could debug and find the needs)
 
  When I think about myself as an example, I have to write down the
 steps
  of
  Solr a bit detail  even I read many pages at wiki and a book about
 it, I
  see that it is not easy even writing down the big picture of developer
  side.
 
 
  2013/4/2 Alexandre Rafalovitch arafa...@gmail.com
 
  Yago,
 
  My point - perhaps lost in too much text - was that Solr is
 presented -
  and
  can function - as a black-box. Which makes it different from more
  traditional open-source project. So, the stage-2 happens exactly when
  the
  non-programmers have to cross the boundary from the black-box into
  code-first approach and the hand-off is not particularly smooth. Or
  even
  when - say - php or .Net programmer  tries to get beyond the basic
  operations their client library and has the understand the
 server-side
  aspects of Solr.
 
  Regards,
 Alex.
 
  On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com
 
  wrote:
 
   Alexandre,
  
   You describe the normal path when a beginner try to use a source
 of 
   code
   that doesn't understand, black-box, reading code, hacking, ok now
 I 
   know
   10% of the project, with lucky :p.
  
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
 at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
 
 
 
 



Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Daire Mac Mathúna;

If there is a way to copy one Solr instance's indexes into another Solr
instance, that may also solve the problem. One instance generates indexes,
and some other instances get a copy of them. During the synchronization
process you could drop some of the indexes on the reader instance, so you
could filter something to make it unsearchable. *This may not be efficient
or a good idea, and maybe it is solved with built-in functionality
somehow.* However, I think somebody may need that mechanism.


2013/4/6 Amit Nithian anith...@gmail.com

 I don't understand why this would be more performant.. seems like it'd be
 more memory and resource intensive as you'd have multiple class-loaders and
 multiple cache spaces for no good reason. Just have a single core with
 sufficiently large caches to handle your response needs.

 If you want to load balance reads consider having multiple physical nodes
 with a master/slaves or SolrCloud.


 On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
 wrote:

  Hi. What are the thoughts on having multiple Solr instances, i.e.
  multiple Solr war files, sharing the same index (i.e. sharing the same
  solr_home), where only one Solr instance is used for writing and the
  others for reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know the
  index has been changed and re-populate cache etc.?
 
  Solr 3.6.1
 
  Thanks.
 



Pointing to HBase for Documents or Directly Saving Documents in HBase

2013-04-06 Thread Furkan KAMACI
Hi;

First of all, I should mention that I am new to Solr and doing research on
it. What I am trying to do: I will crawl some websites with Nutch and then
index them with Solr (Nutch 2.1, Solr/SolrCloud 4.2).

I wonder about something. I have a cloud of machines that crawls websites
and stores the documents. Then I send those documents into SolrCloud. Solr
indexes the documents, generates indexes, and saves them. I know from
information retrieval theory that it *may* not be efficient to store
indexes in a NoSQL database (they are something like linked lists, and if
you store them in such a database you *may* end up with a sparse
representation; by the way, there may be some solutions for this, and if
you can explain them you are welcome to).

However, Solr also stores some documents (i.e. for highlighting), so some
of my documents will be duplicated somehow. Considering that I will have
many documents, those duplicated documents may cause a problem for me. So
is there any way not to store those documents in Solr and instead point to
them in HBase (where I save my crawled documents), or to store them
directly in HBase (is that efficient or not)?
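If the goal is to keep Solr as a pure index and HBase as the document
store, one common pattern (sketched; field and key names are placeholders)
is to mark the body fields stored="false" in schema.xml, store only the id,
and use Solr just to get back row keys that are then fetched from HBase:

# ask Solr only for matching ids, then look the documents up in HBase by key
curl 'http://localhost:8983/solr/select?q=content:example&fl=id&wt=json'

This keeps the Solr index small, at the cost of features that need stored
text on the Solr side, such as highlighting.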


Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Walter;

I am new to Solr and digging into the code to understand it. I think that
when the indexer copies indexes, they are unsearchable before the commit.

Where exactly in the code does that commit occur, and can I say: roll back
something because I don't want those indexes (the reason may be anything
else; maybe I will decline some indexes (index filtering) because of the
documents they point to). Is that possible?



2013/4/7 Walter Underwood wun...@wunderwood.org

 This is precisely how Solr replication works. It copies the indexes then
 does a commit.

 wunder
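 As context, the copy can be observed and driven through the slave's
 replication handler over HTTP (host names below are placeholders; this
 assumes classic master-slave replication is configured):

 # show replication state: index version, generation, last replication
 curl 'http://slave-host:8983/solr/replication?command=details'
 # trigger a one-off pull of the master's last committed index
 curl 'http://slave-host:8983/solr/replication?command=fetchindex'

 The fetched files only become searchable once the copy completes and a
 new searcher is opened, matching the copy-then-commit behavior Walter
 describes.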

 On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:

  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr instance,
  this may also solve the problem. Somebody generates indexes and some of
  other instances could get a copy of them. At synchronizing process you
 may
  eliminate some of indexes at reader instance. So you can filter something
  to become unsearchable. *This may not be efficient and good thing and
 maybe
  solved with built-in functionality somehow.* However I think somebody may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like it'd
 be
  more memory and resource intensive as you'd have multiple class-loaders
 and
  multiple cache spaces for no good reason. Just have a single core with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
 nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
 multiple
  SOLR war files, sharing the same index (i.e. sharing the same
 solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
Hi Walter;

Thanks for your explanation. You said "Indexing happens on one Solr
server." Is that true even for SolrCloud?


2013/4/7 Walter Underwood wun...@wunderwood.org

 Indexing happens on one Solr server. After a commit, the documents are
 searchable. In Solr 4, there is a soft commit, which makes the documents
 searchable, but does not create on-disk indexes.

 Solr replication copies the committed indexes to another Solr server.

 Solr Cloud uses a transaction log to make documents available before a
 hard commit.

 Solr does not have rollback. A commit succeeds or fails. After it
 succeeds, there is no going back.

 wunder
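 Both commit flavors Walter mentions can be issued explicitly over HTTP,
 for example:

 # hard commit: flushes index segments to disk and opens a new searcher
 curl 'http://localhost:8983/solr/update?commit=true'
 # soft commit (Solr 4): makes recent documents searchable without
 # writing new segments to disk
 curl 'http://localhost:8983/solr/update?softCommit=true'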

 On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  I am new to Solr and digging into code to understand it. I think that
 when
  indexer copies indexes, before the commit it is unsearchable.
 
  Where exactly that commit occurs at code and can I say that: rollback
  something because I don't want that indexes (reason maybe anything else,
  maybe I will decline some indexes(index filtering) because of the
 documents
  they points. Is it possible?
 
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  This is precisely how Solr replication works. It copies the indexes then
  does a commit.
 
  wunder
 
  On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr
 instance,
  this may also solve the problem. Somebody generates indexes and some of
  other instances could get a copy of them. At synchronizing process you
  may
  eliminate some of indexes at reader instance. So you can filter
 something
  to become unsearchable. *This may not be efficient and good thing and
  maybe
  solved with built-in functionality somehow.* However I think somebody
 may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like it'd
  be
  more memory and resource intensive as you'd have multiple
 class-loaders
  and
  multiple cache spaces for no good reason. Just have a single core with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
  nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
  multiple
  SOLR war files, sharing the same index (i.e. sharing the same
  solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know
 the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Sharing index amongst multiple nodes

2013-04-06 Thread Furkan KAMACI
My last questions:

1) If I send a document to a replica, does it pass the document to the
shard leader? And do you mean that even if I send a document to the shard
leader, it can pass that document to one of the replicas to be indexed?

2) Is it possible to copy a shard into another shard, or to merge them?

By the way, thanks for your explanations.


2013/4/7 Walter Underwood wun...@wunderwood.org

 In Solr Cloud, a document is indexed on the shard leader. The replicas in
 that shard get the document and add it to their indexes. There is some
 indexing that happens on the replicas, but that is managed by Solr.

 wunder

 On Apr 6, 2013, at 3:58 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  Thanks for your explanation. You said Indexing happens on one Solr
  server. Is it true even for SolrCloud?
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  Indexing happens on one Solr server. After a commit, the documents are
  searchable. In Solr 4, there is a soft commit, which makes the
 documents
  searchable, but does not create on-disk indexes.
 
  Solr replication copies the committed indexes to another Solr server.
 
  Solr Cloud uses a transaction log to make documents available before a
  hard commit.
 
  Solr does not have rollback. A commit succeeds or fails. After it
  succeeds, there is no going back.
 
  wunder
 
  On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote:
 
  Hi Walter;
 
  I am new to Solr and digging into code to understand it. I think that
  when
  indexer copies indexes, before the commit it is unsearchable.
 
  Where exactly that commit occurs at code and can I say that: rollback
  something because I don't want that indexes (reason maybe anything
 else,
  maybe I will decline some indexes(index filtering) because of the
  documents
  they points. Is it possible?
 
 
 
  2013/4/7 Walter Underwood wun...@wunderwood.org
 
  This is precisely how Solr replication works. It copies the indexes
 then
  does a commit.
 
  wunder
 
  On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:
 
  Hi Daire Mac Mathúna;
 
  If there is a way copying one Solr's indexes into another Solr
  instance,
  this may also solve the problem. Somebody generates indexes and some
 of
  other instances could get a copy of them. At synchronizing process
 you
  may
  eliminate some of indexes at reader instance. So you can filter
  something
  to become unsearchable. *This may not be efficient and good thing and
  maybe
  solved with built-in functionality somehow.* However I think somebody
  may
  need that mechanism.
 
 
  2013/4/6 Amit Nithian anith...@gmail.com
 
  I don't understand why this would be more performant.. seems like
 it'd
  be
  more memory and resource intensive as you'd have multiple
  class-loaders
  and
  multiple cache spaces for no good reason. Just have a single core
 with
  sufficiently large caches to handle your response needs.
 
  If you want to load balance reads consider having multiple physical
  nodes
  with a master/slaves or SolrCloud.
 
 
  On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna 
 daire...@gmail.com
  wrote:
 
  Hi. Wat are the thoughts on having multiple SOLR instances i.e.
  multiple
  SOLR war files, sharing the same index (i.e. sharing the same
  solr_home)
  where only one SOLR instance is used for writing and the others for
  reading?
 
  Is this possible?
 
  Is it beneficial - is it more performant than having just one solr
  instance?
 
  How does it affect auto-commits i.e. how would the read nodes know
  the
  index has been changed and re-populate cache etc.?
 
  Sole 3.6.1
 
  Thanks.
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Prediction About Index Sizes of Solr

2013-04-08 Thread Furkan KAMACI
This may not be a well-detailed question, but I will try to make it clear.

I am crawling web pages and will index them in SolrCloud 4.2. What I want
to predict is the index size.

I will have approximately 2 billion web pages, and I assume each of them
will be 100 KB.
I know that it depends on storing documents, stop words, etc. If you want
to ask about details of my question, I can give you more explanation.
However, there should be some analysis to help me, because I need to
predict what the index size will be.

On the other hand, my other important question is how SolrCloud makes
replicas of indexes, and whether I can change how many replicas there will
be, because I should multiply the total index size by the replica count.

Here is an article related to my analysis:
http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

I know this question may lack details, but if you can give ideas about it
you are welcome.
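As a back-of-the-envelope only (the 20-100% index-to-raw ratio below is a
pure assumption; real ratios vary widely with stored fields, analysis, and
compression):

# 2 billion pages at ~100 KB each, expressed in TB
echo $(( 2000000000 * 100 / 1024 / 1024 / 1024 ))   # prints 186 (TB of raw data)
# if the index ends up between 20% and 100% of the raw size, that is
# roughly 37-186 TB per copy, and the total disk need is that figure
# multiplied by the replication factor

On the replica question: in SolrCloud you choose the replica count
yourself, e.g. via the replicationFactor parameter of the Collections API
when creating a collection.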


Solr Admin Page Master Size

2013-04-08 Thread Furkan KAMACI
When I check my Solr Admin Page, the Replication (Master) panel shows:

Master:
  Version: 1365458125729
  Gen:     5
  Size:    18.24 MB

It is one shard on one computer. What is that 18.24 MB? Does it contain
just the indexes, or the indexes plus highlights etc.?

My Solr home folder was 512.7 KB and it has become 22,860 KB; that is why I
ask this question.


Average Solr Server Spec.

2013-04-09 Thread Furkan KAMACI
This question may not have a general answer and may be open-ended, but is
there any commodity server spec for a typical Solr machine? I mean, what is
the average server specification for a Solr machine? (For example, for a
system running Hadoop it is not recommended to have machines with very
large storage capacity.) I will use Solr for indexing web-crawled data.


Re: How can I set configuration options?

2013-04-09 Thread Furkan KAMACI
Hi Edd;

The parameters you mentioned are JVM parameters. There are two ways to
define them.
The first: if you are using an IDE, you can set them as VM parameters. For
example, in IntelliJ IDEA, under Run/Debug Configurations there is a field
called VM Options. You can write your parameters there, without the word
"java" in front of them.

The second is deploying your war file into Tomcat without using an IDE (I
think this is what you want). Here is what to do:

Go to the Tomcat home folder and, under the bin folder, create a file
called setenv.sh. Then add these lines:

#!/bin/sh
#
#
export JAVA_OPTS="$JAVA_OPTS \
 -Dbootstrap_confdir=./solr/collection1/conf \
 -Dcollection.configName=myconf -DzkRun \
 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2"



2013/4/9 Edd Grant e...@eddgrant.com

 Hi all,

 I have been working through the examples on the SolrCloud page:
 http://wiki.apache.org/solr/SolrCloud

 I am now at the point where, rather than firing up Solr through start.jar,
 I'm deploying the Solr war in to Tomcat instances. Taking the following
 command as an example:

 java -Dbootstrap_confdir=./solr/collection1/conf
 -Dcollection.configName=myconf -DzkRun
 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2
 -jar start.jar

 I can't figure out from the documentation how/ where I set the above
 properties when deploying Solr as a war file. I initially thought these
 might be configurable through solr.xml but can't find anything in the
 documentation to support this.

 Most grateful for any pointers here.

 Cheers,

 Edd
 --
 Web: http://www.eddgrant.com
 Email: e...@eddgrant.com
 Mobile: +44 (0) 7861 394 543



Re: Average Solr Server Spec.

2013-04-09 Thread Furkan KAMACI
Hi Walter;

May I ask what the average size of your Solr indexes is, and the average
queries per second against your Solr? Maybe I can come up with an
assumption.

2013/4/9 Walter Underwood wun...@wunderwood.org

 We mostly run m1.xlarge with an 8GB heap. --wunder

 On Apr 9, 2013, at 10:57 AM, Otis Gospodnetic wrote:

  Hi,
 
  You are right there is no average.  I saw a Solr cluster with a
  few EC2 micro instances yesterday and regularly see Solr running on 16
  or 32 GB RAM and sometimes well over 100 GB RAM.  Sometimes they have
  just 2 CPU cores, sometimes 32 or more.  Some use SSDs, some HDDs,
  some local storage, some SAN, some EBS on AWS. etc.
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  This question may not have a generel answer and may be open ended but is
  there any commodity server spec. for a usual Solr running machine? I
 mean
  what is the average server spesification for a Solr machine (i.e. Hadoop
  running system it is not recommended to have very big storage capably
  computers.) I will use Solr for indexing web crawled data.








Re: Slow qTime for distributed search

2013-04-09 Thread Furkan KAMACI
Hi Shawn;

You say that:

*... your documents are about 50KB each.  That would translate to an index
that's at least 25GB*

I know we cannot give an exact size, but what is the approximate ratio of
document size to index size in your experience?


2013/4/9 Shawn Heisey s...@elyograg.org

 On 4/9/2013 2:10 PM, Manuel Le Normand wrote:

 Thanks for replying.
 My config:

 - 40 dedicated servers, dual-core each
 - Running Tomcat servlet on Linux
 - 12 Gb RAM per server, splitted half between OS and Solr
 - Complex queries (up to 30 conditions on different fields), 1 qps
 rate

 Sharding my index was done for two reasons, based on 2 servers (4shards)
 tests:

 1. As index grew above few million of docs qTime raised greatly, while
 sharding the index to smaller pieces (about 0.5M docs) gave way better
 results, so I bound every shard to have 0.5M docs.
 2. Tests showed i was cpu-bounded during queries. As i have low qps
 rate
 (emphasize: lower than expected qTime) and as a query runs
 single-threaded
 on each shard, it made sense to accord a cpu to each shard.

 For the same amount of docs per shards I do expect a raise in total qTime
 for the reasons:

 1. The response should wait for the slowest shard
 2. Merging the responses from 40 different shards takes time

 What i understand from your explanation is that it's the merging that
 takes
 time and as qTime ends only after the second retrieval phase, the qTime on
 each shard will take longer. Meaning during a significant proportion of
 the
 first query phase (right after the [id,score] are retieved), all cpu's are
 idle except the response-merger thread running on a single cpu. I thought
 of the merge as a simple sorting of [id,score], way more simple than
 additional 300 ms cpu time.

 Why would a RAM increase improve my performances, as it's a
 response-merge (CPU resource) bottleneck?


 If you have not tweaked the Tomcat configuration, that can lead to
 problems, but if your total query volume is really only one query per
 second, this is probably not a worry for you.  A Tomcat connector can be
 configured with a maxThreads parameter.  The recommended value there is
 10000, but Tomcat defaults to 200.

 You didn't include the index sizes.  There's half a million docs per
 shard, but I don't know what that translates to in terms of MB or GB of
 disk space.

 On another email thread you mention that your documents are about 50KB
 each.  That would translate to an index that's at least 25GB, possibly
 more.  That email thread also says that optimization for you takes an hour,
 further indications that you've got some really big indexes.

 You're saying that you have given 6GB out of the 12GB to Solr, leaving
 only 6GB for the OS and caching.  Ideally you want to have enough RAM to
 cache the entire index, but in reality you can usually get away with
 caching between half and two thirds of the index.  Exactly what ratio works
 best is highly dependent on your schema.

 If my numbers are even close to right, then you've got a lot more index on
 each server than available RAM.  Based on what I can deduce, you would want
 24 to 48GB of RAM per server.  If my numbers are wrong, then this estimate
 is wrong.

 I would be interested in seeing your queries.  If the complexity can be
 expressed as filter queries that get re-used a lot, the filter cache can be
 a major boost to performance.  Solr's caches in general can make a big
 difference.  There is no guarantee that caches will help, of course.

 Thanks,
 Shawn
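 As an example of Shawn's filter-query point, fixed conditions moved into
 fq parameters let Solr cache and reuse the matching document sets across
 queries (field names here are invented):

 # each distinct fq is computed once and then served from the filterCache
 curl 'http://localhost:8983/solr/select?q=some+text&fq=type:article&fq=lang:en'

 The more the 30 conditions repeat across queries, the bigger the win.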




Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Is there anybody who can help me estimate the approximate RAM needed for
5000 queries/second on a Solr machine?


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Actually, I will propose a system, and I need to figure out the machine
specifications. There will be no faceting mechanism at first, just the
simple search queries of a web search engine. We can assume I will have a
commodity server (I don't know whether there is any benchmark for a typical
Solr machine).

2013/4/10 Jack Krupansky j...@basetechnology.com

 It all depends on the nature of your query and the nature of the data in
 the index. Does returning results from a result cache count in your QPS?
 Not to mention how many cores and CPU speed and CPU caching as well. Not to
 mention network latency.

 The best way to answer is to do a proof of concept implementation and
 measure it yourself.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, April 09, 2013 6:06 PM
 To: solr-user@lucene.apache.org
 Subject: Approximately needed RAM for 5000 query/second at a Solr machine?


 Are there anybody who can help me about how to guess the approximately
 needed RAM for 5000 query/second at a Solr machine?



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Hi Walter;

Firstly, thanks for your detailed reply. I know this is not a
well-detailed question, but I don't have any metrics yet. Regarding your
system, what is the average RAM size of your Solr machines? Maybe that
can help me make a comparison.

2013/4/10 Walter Underwood wun...@wunderwood.org

 On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:

  Are there anybody who can help me about how to guess the approximately
  needed RAM for 5000 query/second at a Solr machine?

 No.

 That depends on the kind of queries you have, the size and content of the
 index, the required response time, how frequently the index is updated, and
 many more factors. So anyone who can guess that is wrong.

 You can only find that out by running your own benchmarks with your own
 queries against your own index.

 In our system, we can meet our response time requirements at a rate of
 4000 queries/minute. We have several cores, but most traffic goes to a 3M
 document index. This index is small documents, mostly titles and authors of
 books. We have no wildcard queries and less than 5% of our queries use
 fuzzy matching. We update once per day and have cache hit rates of around
 30%.

 We run new benchmarks twice each year, before our busy seasons. We use the
 current index and configuration and the queries from the busiest day of the
 previous season.

 Our key benchmark is the 95th percentile response time, but we also
 measure median, 90th, and 99th percentile.

 We are currently on Solr 3.3 with some customizations. We're working on
 transitioning to Solr 4.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org






Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Thanks for your answer.

2013/4/10 Walter Underwood wun...@wunderwood.org

 We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).

 http://aws.amazon.com/ec2/instance-types/

 wunder

 On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  Firstly thank for your detailed reply. I know that this is not a well
  detailed question but I don't have any metrics yet. If we talk about your
  system, what is the average RAM size of your Solr machines? Maybe that
 can
  help me to make a comparison.
 
  2013/4/10 Walter Underwood wun...@wunderwood.org
 
  On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
 
  Are there anybody who can help me about how to guess the approximately
  needed RAM for 5000 query/second at a Solr machine?
 
  No.
 
  That depends on the kind of queries you have, the size and content of
 the
  index, the required response time, how frequently the index is updated,
 and
  many more factors. So anyone who can guess that is wrong.
 
  You can only find that out by running your own benchmarks with your own
  queries against your own index.
 
  In our system, we can meet our response time requirements at a rate of
  4000 queries/minute. We have several cores, but most traffic goes to a
 3M
  document index. This index is small documents, mostly titles and
 authors of
  books. We have no wildcard queries and less than 5% of our queries use
  fuzzy matching. We update once per day and have cache hit rates of
 around
  30%.
 
  We run new benchmarks twice each year, before our busy seasons. We use
 the
  current index and configuration and the queries from the busiest day of
 the
  previous season.
 
  Our key benchmark is the 95th percentile response time, but we also
  measure median, 90th, and 99th percentile.
 
  We are currently on Solr 3.3 with some customizations. We're working on
  transitioning to Solr 4.
 
  wunder
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Pushing a whole set of pdf-files to solr

2013-04-09 Thread Furkan KAMACI
The Apache Solr 4 Cookbook says:

curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" \
  -F "myfile=@cookbook.pdf"

Is that what you want?
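
If you need to push a whole directory of PDFs, a simple shell loop along
these lines should work (the path and the id scheme are just an example):

  for f in /path/to/pdfs/*.pdf; do
    curl "http://localhost:8983/solr/update/extract?literal.id=$(basename "$f")" \
      -F "myfile=@$f"
  done
  curl "http://localhost:8983/solr/update?commit=true"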

2013/4/10 sdspieg sdsp...@mail.ru

 If anybody could still help me out with this, I'd really appreciate it.
 Thanks!






Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
These are really good metrics for me:

You say that the RAM size should be at least the index size, and that it
is better to have a RAM size of twice the index size (because of the
worst-case scenario).

On the other hand, let's assume that I have more RAM than twice the index
size on the machine. Can Solr use that extra RAM, or is twice the index
size an approximate upper limit?


2013/4/10 Shawn Heisey s...@elyograg.org

 On 4/9/2013 4:06 PM, Furkan KAMACI wrote:

 Are there anybody who can help me about how to guess the approximately
 needed RAM for 5000 query/second at a Solr machine?


 You've already gotten some good replies, and I'm aware that they haven't
 really answered your question.  This is the kind of question that cannot be
 answered.

 The amount of RAM that you'll need for extreme performance actually isn't
 hard to figure out - you need enough free RAM for the OS to cache the
 maximum amount of disk space all your indexes will ever use. Normally this
 will be twice the size of all the indexes on the machine, because that's
 how much disk space will likely be used in a worst-case merge scenario
 (optimize).  That's very expensive, so it is cheaper to budget for only the
 size of the index.

 A load of 5000 queries per second is pretty high, and probably something
 you will not achieve with a single-server (not counting backup) approach.
  All of the tricks that high-volume website developers use are also
 applicable to Solr.

 Once you have enough RAM, you need to worry more about the number of
 servers, the number of CPU cores in each server, and the speed of those CPU
 cores.  Testing with actual production queries is the only way to find out
 what you really need.

 Beyond hardware design, making the requests as simple as possible and
 taking advantage of caches is important.  Solr has caches for queries,
 filters, and documents.  You can also put a caching proxy (something like
 Varnish) in front of Solr, but that would make NRT updates pretty much
 impossible, and that kind of caching can be difficult to get working right.
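
 For reference, Solr's own caches are sized in solrconfig.xml; a typical
 starting point looks roughly like this (the sizes are illustrative, not
 tuned for any particular index):

   <filterCache class="solr.FastLRUCache" size="512"
                initialSize="512" autowarmCount="128"/>
   <queryResultCache class="solr.LRUCache" size="512"
                     initialSize="512" autowarmCount="32"/>
   <documentCache class="solr.LRUCache" size="512"
                  initialSize="512"/>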

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
I am sorry, but you said:

*you need enough free RAM for the OS to cache the maximum amount of disk
space all your indexes will ever use*

I have made an assumption about the indexes on my machine. Let's assume
they total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr
will use RAM up to however much I allot to it as a Java process. When we
think about the indexes on storage and the OS caching them in RAM, is
this what you are talking about: having more than 5 GB - or 10 GB - of
RAM on my machine?

2013/4/10 Shawn Heisey s...@elyograg.org

 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
 
  You say that RAM size should be at least index size, and it is better to
  have a RAM size twice the index size (because of worst case scenario).
 
  On the other hand let's assume that I have a RAM size that is bigger than
  twice of indexes at machine. Can Solr use that extra RAM or is it a
  approximately maximum limit (to have twice size of indexes at machine)?

 What we have been discussing is the OS cache, which is memory that is
 not used by programs.  The OS uses that memory to make everything run
 faster.  The OS will instantly give that memory up if a program requests
 it.

 Solr is a java program, and java uses memory a little differently, so
 Solr most likely will NOT use more memory when it is available.

 In a normal directly executable program, memory can be allocated at
 any time, and given back to the system at any time.

 With Java, you tell it the maximum amount of memory the program is ever
 allowed to use.  Because of how memory is used inside Java, most
 long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the system
 even if it is not required.

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-10 Thread Furkan KAMACI
Thank you for your explanations, this will help me to figure out my system.

2013/4/10 Shawn Heisey s...@elyograg.org

 On 4/9/2013 9:12 PM, Furkan KAMACI wrote:
  I am sorry but you said:
 
  *you need enough free RAM for the OS to cache the maximum amount of disk
  space all your indexes will ever use*
 
  I have made an assumption my indexes at my machine. Let's assume that it
 is
  5 GB. So it is better to have at least 5 GB RAM? OK, Solr will use RAM up
  to how much I define it as a Java processes. When we think about the
  indexes at storage and caching them at RAM by OS, is that what you talk
  about: having more than 5 GB - or - 10 GB RAM for my machine?

 If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you
 would want at least 8GB of total RAM for that machine - the 3GB of RAM
 given to Solr, plus the rest so the OS can cache the index in RAM.  If
 you plan for double the cache memory, you'd need 13 to 14GB.

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Furkan KAMACI
Hi Walter;

Is there any document or anything else that says the worst case is three
times the disk space? Two times or three times - it makes a real
difference when we are talking about GBs of disk space.


2013/4/10 Walter Underwood wun...@wunderwood.org

 Correct, except the worst case maximum for disk space is three times.
 --wunder

 On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:

  You're mixing up disk and RAM requirements when you talk
  about having twice the disk size. Solr does _NOT_ require
  twice the index size of RAM to optimize, it requires twice
  the size on _DISK_.
 
  In terms of RAM requirements, you need to create an index,
  run realistic queries at the installation and measure.
 
  Best
  Erick
 
  On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:
 
 
 
  On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
  You say that RAM size should be at least index size, and it is
  better to have a RAM size twice the index size (because of worst
  case scenario).
  On the other hand let's assume that I have a RAM size that is
  bigger than twice of indexes at machine. Can Solr use that extra
  RAM or is it a approximately maximum limit (to have twice size of
  indexes at machine)?
  What we have been discussing is the OS cache, which is memory that
  is not used by programs.  The OS uses that memory to make everything
  run faster.  The OS will instantly give that memory up if a program
  requests it.
  Solr is a java program, and java uses memory a little differently,
  so Solr most likely will NOT use more memory when it is available.
  In a normal directly executable program, memory can be allocated
  at any time, and given back to the system at any time.
  With Java, you tell it the maximum amount of memory the program is
  ever allowed to use.  Because of how memory is used inside Java,
  most long-running Java programs (like Solr) will allocate up to the
  configured maximum even if they don't really need that much memory.
  Most Java virtual machines will never give the memory back to the
  system even if it is not required.
  Thanks, Shawn
 
 
  Furkan KAMACI furkankam...@gmail.com writes:
 
  I am sorry but you said:
 
  *you need enough free RAM for the OS to cache the maximum amount of
  disk space all your indexes will ever use*
 
  I have made an assumption my indexes at my machine. Let's assume that
  it is 5 GB. So it is better to have at least 5 GB RAM? OK, Solr will
  use RAM up to how much I define it as a Java processes. When we think
  about the indexes at storage and caching them at RAM by OS, is that
  what you talk about: having more than 5 GB - or - 10 GB RAM for my
  machine?
 
  2013/4/10 Shawn Heisey s...@elyograg.org
 
 
  10 GB.  Because when Solr shuffles the data around, it could use up to
  twice the size of the index in order to optimize the index on disk.
 
  -- Justin

 --
 Walter Underwood
 wun...@wunderwood.org






Re: migration solr 3.5 to 4.1 - JVM GC problems

2013-04-11 Thread Furkan KAMACI
Hi Marc;

Could you tell me your index size, and your performance measured in
queries per second?

2013/4/11 Marc Des Garets marc.desgar...@192.com

 Big heap because very large number of requests with more than 60 indexes
 and hundreds of million of documents (all indexes together). My problem
 is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
 or 2mn and 20Gb of the heap is used.

 With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
 (it changed the size capacity to 6Mb at some point) and I have 2 sec GCs
 every minute.

 There must be something that has changed in 4.1 compared to 3.5 to cause
 this behavior. It's the same requests, same schemas (excepted 4 fields
 changed from sint to tint) and same config.

 On 04/10/2013 07:38 PM, Shawn Heisey wrote:
  On 4/10/2013 9:48 AM, Marc Des Garets wrote:
  The JVM behavior is now radically different and doesn't seem to make
  sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
 
  The perm gen went from 410Mb to 600Mb.
 
  The eden space usage is a lot bigger and the survivor space usage is
  100% all the time.
 
  I don't really understand what is happening. GC behavior really doesn't
  seem right.
 
  My jvm settings:
  -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
  -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
  As Otis has already asked, why do you have a 40GB heap?  The only way I
  can imagine that you would actually NEED a heap that big is if your
  index size is measured in hundreds of gigabytes.  If you really do need
  a heap that big, you will probably need to go with a JVM like Zing.  I
  don't know how much Zing costs, but they claim to be able to make any
  heap size perform well under any load.  It is Linux-only.
 
  I was running into extreme problems with GC pauses with my own setup,
  and that was only with an 8GB heap.  I was using the CMS collector and
  NewRatio=1.  Switching to G1 didn't help at all - it might have even
  made the problem worse.  I never did try the Zing JVM.
 
  After a lot of experimentation (which I will admit was not done very
  methodically) I found JVM options that have reduced the GC pause problem
  greatly.  Below is what I am using now on Solr 4.2.1 with a total
  per-server index size of about 45GB.  This works properly on CentOS 6
  with Oracle Java 7u17, UseLargePages may require special kernel tuning
  on other operating systems:
 
  -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
  -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
  -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
 
  These options could probably use further tuning, but I haven't had time
  for the kind of testing that will be required.
 
  If you decide to pay someone to make the problem going away instead:
 
  http://www.azulsystems.com/products/zing/whatisit
 
  Thanks,
  Shawn
 
 
 





Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Furkan KAMACI
Actually, I am not planning to store the documents in Solr. I want to
store just the highlights (snippets) in HBase and retrieve them from
HBase when needed.
What do you think about separating the highlights from Solr and storing
them in HBase with SolrCloud? By the way, if you could explain at which
stage and how highlights are generated in Solr, you are welcome to.


2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com

 You may also be interested in looking at things like solrbase (on Github).

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi;
 
  First of all should mention that I am new to Solr and making a research
  about it. What I am trying to do that I will crawl some websites with
 Nutch
  and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 )
 
  I wonder about something. I have a cloud of machines that crawls websites
  and stores that documents. Then I send that documents into SolrCloud.
 Solr
  indexes that documents and generates indexes and save them. I know that
  from Information Retrieval theory: it *may* not be efficient to store
  indexes at a NoSQL database (they are something like linked lists and if
  you store them in such kind of database you *may* have a sparse
  representation -by the way there may be some solutions for it. If you
  explain them you are welcome.)
 
  However Solr stores some documents too (i.e. highlights) So some of my
  documents will be doubled somehow. If I consider that I will have many
  documents, that dobuled documents may cause a problem for me. So is there
  any way not storing that documents at Solr and pointing to them at
  Hbase(where I save my crawled documents) or instead of pointing directly
  storing them at Hbase (is it efficient or not)?



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Furkan KAMACI
Thanks Walter, you guys gave me really nice ideas about RAM approximation.

2013/4/11 Walter Underwood wun...@wunderwood.org

 Here is the situation where merging can require 3X space. It can only
 happen if you force merge, then index with merging turned off, but we had
 Ultraseek customers do that.

 * All documents are merged into a single segment.
 * Without a merge, all documents are replaced.
 * This results in one segment of deleted documents and one of new
 documents (2X).
 * A merge takes place, creating a new segment of the same size, thus 3X.

 For normal operation, 2X is plenty of room.

 wunder

 On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:

  I've investigated this in the past. The worst case is 2*indexSize
 additional disk space (3*indexSize total) during an optimize.
 
  In our system, we use LogByteSizeMergePolicy, and used to have a
 mergeFactor of 10. We would see the worst case happen when there were
 exactly 20 segments (or some other multiple of 10, I believe) at the start
 of the optimize. IIRC, it would merge those 20 segments down to 2 segments,
 and then merge those 2 segments down to 1 segment. 1*indexSize space was
 used by the original index (because there is still a reader open on it),
 1*indexSpace was used by the 2 segments, and 1*indexSize space was used by
 the 1 segment. This is the worst case because there are two full additional
 copies of the index on disk. Normally, when the number of segments is not a
 multiple of the mergeFactor, there will be some part of the index that was
 not part of both merges (and this part that is excluded usually would be
 the largest segments).
 
  We worked around this by doing multiple optimize passes, where the first
 pass merges down to between 2 and 2*mergeFactor-1 segments (based on a
 great tip from Lance Norskog on the mailing list a couple years ago).
 
  I'm not sure if the current merge policy implementations still have this
 issue.
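
  If the current implementations do still behave this way, the same
  two-pass workaround can be expressed with the maxSegments parameter on
  an optimize (a sketch, assuming a mergeFactor of 10):

    curl "http://localhost:8983/solr/update?optimize=true&maxSegments=11"
    curl "http://localhost:8983/solr/update?optimize=true&maxSegments=1"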
 
  -Michael
 
  -Original Message-
  From: Furkan KAMACI [mailto:furkankam...@gmail.com]
  Sent: Thursday, April 11, 2013 2:44 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Approximately needed RAM for 5000 query/second at a Solr
 machine?
 
  Hi Walter;
 
  Is there any document or something else says that worst case is three
 times of disk space? Twice times or three times. It is really different
 when we talk about GB's of disk spaces.
 
 
  2013/4/10 Walter Underwood wun...@wunderwood.org
 
  Correct, except the worst case maximum for disk space is three times.
  --wunder
 
  On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
 
  You're mixing up disk and RAM requirements when you talk about
  having twice the disk size. Solr does _NOT_ require twice the index
  size of RAM to optimize, it requires twice the size on _DISK_.
 
  In terms of RAM requirements, you need to create an index, run
  realistic queries at the installation and measure.
 
  Best
  Erick
 
  On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es
 wrote:
 
 
 
  On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
  You say that RAM size should be at least index size, and it is
  better to have a RAM size twice the index size (because of worst
  case scenario).
  On the other hand let's assume that I have a RAM size that is
  bigger than twice of indexes at machine. Can Solr use that extra
  RAM or is it a approximately maximum limit (to have twice size
  of indexes at machine)?
  What we have been discussing is the OS cache, which is memory
  that is not used by programs.  The OS uses that memory to make
  everything run faster.  The OS will instantly give that memory up
  if a program requests it.
  Solr is a java program, and java uses memory a little
  differently, so Solr most likely will NOT use more memory when it
 is available.
  In a normal directly executable program, memory can be
  allocated at any time, and given back to the system at any time.
  With Java, you tell it the maximum amount of memory the program
  is ever allowed to use.  Because of how memory is used inside
  Java, most long-running Java programs (like Solr) will allocate
  up to the configured maximum even if they don't really need that
 much memory.
  Most Java virtual machines will never give the memory back to the
  system even if it is not required.
  Thanks, Shawn
 
 
  Furkan KAMACI furkankam...@gmail.com writes:
 
  I am sorry but you said:
 
  *you need enough free RAM for the OS to cache the maximum amount
  of disk space all your indexes will ever use*
 
  I have made an assumption my indexes at my machine. Let's assume
  that it is 5 GB. So it is better to have at least 5 GB RAM? OK,
  Solr will use RAM up to how much I define it as a Java processes.
  When we think about the indexes at storage and caching them at RAM
  by OS, is that what you talk about: having more than 5 GB - or -
  10 GB RAM for my machine?
 
  2013/4/10 Shawn Heisey s...@elyograg.org
 
 
  10 GB.  Because when

Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-11 Thread Furkan KAMACI
Hi Otis;

It seems that I should read more about highlighting. Is there anywhere
that explains in detail how highlights are generated in Solr?

2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 You can't store highlights ahead of time because they are query
 dependent.  You could store documents in HBase and use Solr just for
 indexing.  Is that what you want to do?  If so, a custom
 SearchComponent executed after QueryComponent could fetch data from
 external store like HBase.  I'm not sure if I'd recommend that.
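
 As a rough sketch of that idea (the class, table, and field names here
 are invented, and error handling is omitted), such a component would be
 registered on the search handler after QueryComponent:

   import java.io.IOException;
   import org.apache.hadoop.hbase.HBaseConfiguration;
   import org.apache.hadoop.hbase.client.Get;
   import org.apache.hadoop.hbase.client.HTable;
   import org.apache.hadoop.hbase.client.Result;
   import org.apache.hadoop.hbase.util.Bytes;
   import org.apache.lucene.document.Document;
   import org.apache.solr.handler.component.ResponseBuilder;
   import org.apache.solr.handler.component.SearchComponent;
   import org.apache.solr.search.DocIterator;

   public class HBaseFetchComponent extends SearchComponent {
     @Override
     public void prepare(ResponseBuilder rb) { /* nothing to prepare */ }

     @Override
     public void process(ResponseBuilder rb) throws IOException {
       // look up each hit's unique key and fetch its row from HBase
       HTable table = new HTable(HBaseConfiguration.create(), "documents");
       DocIterator it = rb.getResults().docList.iterator();
       while (it.hasNext()) {
         Document doc = rb.req.getSearcher().doc(it.nextDoc());
         Result row = table.get(new Get(Bytes.toBytes(doc.get("id"))));
         // attach fields from 'row' to the Solr response here
       }
       table.close();
     }

     @Override
     public String getDescription() { return "fetch fields from HBase"; }

     @Override
     public String getSource() { return ""; }
   }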

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Actually I don't think to store documents at Solr. I want to store just
  highlights (snippets) at Hbase and I want to retrieve them from Hbase
 when
  needed.
  What do you think about separating just highlights from Solr and storing
  them into Hbase at Solrclod. By the way if you explain at which process
 and
  how highlights are genareted at Solr you are welcome.
 
 
  2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
 
  You may also be interested in looking at things like solrbase (on
 Github).
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com
  wrote:
   Hi;
  
   First of all should mention that I am new to Solr and making a
 research
   about it. What I am trying to do that I will crawl some websites with
  Nutch
   and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 )
  
   I wonder about something. I have a cloud of machines that crawls
 websites
   and stores that documents. Then I send that documents into SolrCloud.
  Solr
   indexes that documents and generates indexes and save them. I know
 that
   from Information Retrieval theory: it *may* not be efficient to store
   indexes at a NoSQL database (they are something like linked lists and
 if
   you store them in such kind of database you *may* have a sparse
   representation -by the way there may be some solutions for it. If you
   explain them you are welcome.)
  
   However Solr stores some documents too (i.e. highlights) So some of my
   documents will be doubled somehow. If I consider that I will have many
   documents, that dobuled documents may cause a problem for me. So is
 there
   any way not storing that documents at Solr and pointing to them at
   Hbase(where I save my crawled documents) or instead of pointing
 directly
   storing them at Hbase (is it efficient or not)?
 



Re: Slow qTime for distributed search

2013-04-12 Thread Furkan KAMACI
Manuel Le Normand, I am sorry, but I want to learn something: you said
you have 40 dedicated servers. What is your total document count, total
document size, and total shard size?

2013/4/11 Manuel Le Normand manuel.lenorm...@gmail.com

 Hi,
 We have different working hours, sorry for the reply delay. Your assumed
 numbers are right, about 25-30Kb per doc. giving a total of 15G per shard,
 there are two shards per server (+2 slaves that should do no work
 normally).
 An average query has about 30 conditions (OR AND mixed), most of them
 textual, a small part on dateTime. They use only simple queries (no facet,
 filters etc.) as it is taken from the actual query set of my entreprise
 that works with an old search engine.

 As we said, if the shards in collection1 and collection2 have the same
 number of docs each (and same RAM  CPU per shard), it is apparently not a
 slow IO issue, right? So the fact of not having cached all my index doesn't
 seem the be the bottleneck.Moreover, i do store the fields but my query set
 requests only the id's and rarely snippets so I'd assume that the plenty of
 RAM i'd give the OS wouldn't make any difference as these *.fdt files don't
 need to get cached.

 The conclusion i get to is that the merging issue is the problem, and the
 only possibility of outsmarting it is to distribute to much fewer shards,
 meaning that i'll get back to few millions of docs per shard which are
 about linearly slower with the num of docs per shard. Though the latter
 should improve if i give much more RAM per server.

 I'll try tweaking a bit my schema and making better use of solr cache
 (filter query as an example), but i have something telling me the problem
 might be elsewhere. My main clue to it is that merging seems a simple CPU
 task, and tests show that even with a small amount of responses it takes a
 long time (and clearly the merging task on few docs is very short)


 On Wed, Apr 10, 2013 at 2:50 AM, Shawn Heisey s...@elyograg.org wrote:

  On 4/9/2013 3:50 PM, Furkan KAMACI wrote:
 
  Hi Shawn;
 
  You say that:
 
  *... your documents are about 50KB each.  That would translate to an
 index
  that's at least 25GB*
 
  I know we can not say an exact size but what is the approximately ratio
 of
  document size / index size according to your experiences?
 
 
  If you store the fields, that is actual size plus a small amount of
  overhead.  Starting with Solr 4.1, stored fields are compressed.  I
 believe
  that it uses LZ4 compression.  Some people store all fields, some people
  store only a few or one - an ID field.  The size of stored fields does
 have
  an impact on how much OS disk cache you need, but not as much as the
 other
  parts of an index.
 
  It's been my experience that termvectors take up almost as much space as
  stored data for the same fields, and sometimes more.  Starting with Solr
  4.2, termvectors are also compressed.
 
  Adding docValues (new in 4.2) to the schema will also make the index
  larger.  The requirements here are similar to stored fields.  I do not
 know
  whether this data gets compressed, but I don't think it does.
 
  As for the indexed data, this is where I am less clear about the storage
  ratios, but I think you can count on it needing almost as much space as
 the
  original data.  If the schema uses types or filters that produce a lot of
  information, the indexed data might be larger than the original input.
   Examples of data explosions in a schema: trie fields with a non-zero
  precisionStep, the edgengram filter, the shingle filter.
 
  Thanks,
  Shawn
 
 



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-13 Thread Furkan KAMACI
Hi Jack;

Since I am new to Solr, can you explain two things that you said:

1) when most people say "index size" they are referring to all fields
collectively, not individual fields (what do you mean by segments being
on a per-field basis, and by all fields vs. individual fields?)
2) more cores might make the worst-case scenario worse, since it will
maximize the amount of data processed at a given moment


2013/4/13 Erick Erickson erickerick...@gmail.com

 bq: disk space is three times

 True, I keep forgetting about compound since I never use it...

 On Wed, Apr 10, 2013 at 11:05 AM, Walter Underwood
 wun...@wunderwood.org wrote:
  Correct, except the worst case maximum for disk space is three times.
 --wunder
 
  On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
 
  You're mixing up disk and RAM requirements when you talk
  about having twice the disk size. Solr does _NOT_ require
  twice the index size of RAM to optimize, it requires twice
  the size on _DISK_.
 
  In terms of RAM requirements, you need to create an index,
  run realistic queries at the installation and measure.
 
  Best
  Erick
 
  On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:
 
 
 
  On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
  You say that RAM size should be at least index size, and it is
  better to have a RAM size twice the index size (because of worst
  case scenario).
  On the other hand let's assume that I have a RAM size that is
  bigger than twice of indexes at machine. Can Solr use that extra
  RAM or is it a approximately maximum limit (to have twice size of
  indexes at machine)?
  What we have been discussing is the OS cache, which is memory that
  is not used by programs.  The OS uses that memory to make everything
  run faster.  The OS will instantly give that memory up if a program
  requests it.
  Solr is a java program, and java uses memory a little differently,
  so Solr most likely will NOT use more memory when it is available.
  In a normal directly executable program, memory can be allocated
  at any time, and given back to the system at any time.
  With Java, you tell it the maximum amount of memory the program is
  ever allowed to use.  Because of how memory is used inside Java,
  most long-running Java programs (like Solr) will allocate up to the
  configured maximum even if they don't really need that much memory.
  Most Java virtual machines will never give the memory back to the
  system even if it is not required.
  Thanks, Shawn
 
 
  Furkan KAMACI furkankam...@gmail.com writes:
 
  I am sorry but you said:
 
  *you need enough free RAM for the OS to cache the maximum amount of
  disk space all your indexes will ever use*
 
  I have made an assumption my indexes at my machine. Let's assume that
  it is 5 GB. So it is better to have at least 5 GB RAM? OK, Solr will
  use RAM up to how much I define it as a Java processes. When we think
  about the indexes at storage and caching them at RAM by OS, is that
  what you talk about: having more than 5 GB - or - 10 GB RAM for my
  machine?
 
  2013/4/10 Shawn Heisey s...@elyograg.org
 
 
  10 GB.  Because when Solr shuffles the data around, it could use up to
  twice the size of the index in order to optimize the index on disk.
 
  -- Justin
 
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 



Listing Priority

2013-04-14 Thread Furkan KAMACI
I have crawled some internet pages and indexed them in Solr.

When I list my results from Solr, I want pages whose URL (my schema
includes a field for the URL) ends with .edu, .edu.az or .co.uk to be
given a higher priority.

How can I do this efficiently in Solr?
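
For example, would an edismax boost query over a separate domain-suffix
field be the right direction (a sketch, assuming I index the suffix into
a field called "tld" at index time)?

  q=some query&defType=edismax&bq=tld:edu^2.0 tld:edu.az^2.0 tld:co.uk^2.0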


Some Questions About Using Solr as Cloud

2013-04-14 Thread Furkan KAMACI
I read the wiki and am reading the Solr Guide from Lucidworks. However, I
want to clear up a few things in my mind. Here are my questions:

1) Does SolrCloud allow a multi-master design (is there any document that
I can read about it)?
2) Let's assume that I use multiple cores, i.e. core A and core B, and
that a document has just been indexed at core B. If I send a search
request to core A, can I get a result?
3) When I use a multi-master design (if it exists), can I transfer one
master's index data into another (with or without its slaves)?
4) When I use a multi-core design, can I transfer index data from one
core into another core or anywhere else?

By the way, thanks for the quick responses and kindness on the mailing
list.


Re: Some Questions About Using Solr as Cloud

2013-04-14 Thread Furkan KAMACI
5) When I use a multi-core design, can I transfer index data from one
core into another core or anywhere else?
6) Does Solr keep old versions of documents, or remove them?

2013/4/15 Furkan KAMACI furkankam...@gmail.com

 I read wiki and reading SolrGuide of Lucidworks. However I want to clear
 something in my mind. Here are my questions:

 1) Does SolrCloud lets a multi master design (is there any document that I
 can read about it)?
 2) Let's assume that I use multiple cores i.e. core A and core B. Let's
 assume that there is a document just indexed at core B. If I send a search
 request to core A can I get result?
 3) When I use multi master design (if exists) can I transfer one master's
 index data into another (with its slaves or not)?
 4) When I use multi core design can I transfer one index data into another
 core or anywhere else?

 By the way thanks for the quick responses and kindness at mail list.



Re: Some Questions About Using Solr as Cloud

2013-04-15 Thread Furkan KAMACI
Hi Jack;

I see that SolrCloud automates everything. When I use SolrCloud, is it
true that there may be more than one computer responsible for indexing at
any given time?

2013/4/15 Jack Krupansky j...@basetechnology.com

 There are no masters or slaves in SolrCloud - it's fully distributed. Some
 cluster nodes will be leaders (of the shard on that node) at a given
 point in time, but different nodes may be leaders at different points in
 time as they become elected.

 In a distributed cluster you would never want to store documents only on
 one node. Sure, you can do that by setting the replication factor to 1, but
 that defeats half the purpose for SolrCloud.

 Index transfer is automatic - SolrCloud supports fully distributed update.

 You might be getting confused with the old Master-Slave-Replication
 model that Solr had (and still has) which is distinct from SolrCloud.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Sunday, April 14, 2013 7:45 PM
 To: solr-user@lucene.apache.org
 Subject: Some Questions About Using Solr as Cloud


 I read wiki and reading SolrGuide of Lucidworks. However I want to clear
 something in my mind. Here are my questions:

 1) Does SolrCloud lets a multi master design (is there any document that I
 can read about it)?
 2) Let's assume that I use multiple cores i.e. core A and core B. Let's
 assume that there is a document just indexed at core B. If I send a search
 request to core A can I get result?
 3) When I use multi master design (if exists) can I transfer one master's
 index data into another (with its slaves or not)?
 4) When I use multi core design can I transfer one index data into another
 core or anywhere else?

 By the way thanks for the quick responses and kindness at mail list.



SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Is the number of leaders in a SolrCloud cluster equal to the number of
shards?


Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
Can leaders respond to search requests (I mean, do they store indexes),
both when I first start SolrCloud and some time later?

2013/4/15 Jack Krupansky j...@basetechnology.com

 When the cluster is fully operational, yes. But if part of the cluster is
 down or split and unable to communicate, or leader election is in progress,
 the actual count of leaders will not be indicative of the number of shards.

 Leaders and shards are apples and oranges. If you take down a cluster, by
 definition it would have no leaders (because leaders are running code), but
 shards are the files in the index on disk that continue to exist even if
 the code is not running. So, in the extreme, the number of leaders can be
 zero while the number of shards is non-zero on disk.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Monday, April 15, 2013 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud Leaders


 Does number of leaders at a SolrCloud is equal to number of shards?



Re: SolrCloud Leaders

2013-04-15 Thread Furkan KAMACI
This page:
https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
says:

Both leaders and replicas index items and perform searches.

How do replicas index items?


2013/4/15 Furkan KAMACI furkankam...@gmail.com

 Does leaders may response search requests (I mean do they store indexes)
 at when I run SolrCloud at first and after a time later?


 2013/4/15 Jack Krupansky j...@basetechnology.com

 When the cluster is fully operational, yes. But if part of the cluster is
 down or split and unable to communicate, or leader election is in progress,
 the actual count of leaders will not be indicative of the number of shards.

 Leaders and shards are apples and oranges. If you take down a cluster, by
 definition it would have no leaders (because leaders are running code), but
 shards are the files in the index on disk that continue to exist even if
 the code is not running. So, in the extreme, the number of leaders can be
 zero while the number of shards is non-zero on disk.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Monday, April 15, 2013 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud Leaders


 Does number of leaders at a SolrCloud is equal to number of shards?





Usage of CloudSolrServer?

2013-04-15 Thread Furkan KAMACI
I am reading the Lucidworks Solr Guide; it says in the SolrCloud section:

*Read Side Fault Tolerance*
With earlier versions of Solr, you had to set up your own load balancer.
Now each individual node
load balances requests across the replicas in a cluster. You still need a
load balancer on the
'outside' that talks to the cluster, or you need a smart client. (Solr
provides a smart Java Solrj
client called CloudSolrServer.)

My system is as follows: I crawl data with Nutch and send it into
SolrCloud. Users will search against Solr.

What is this CloudSolrServer - should I use it for load balancing, or is
it something else entirely?
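
(For context, the basic SolrJ usage I have seen looks roughly like this -
the ZooKeeper addresses and collection name are just placeholders, and
exception handling is omitted:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
  server.setDefaultCollection("collection1");
  QueryResponse rsp = server.query(new SolrQuery("some query"));
  System.out.println(rsp.getResults().getNumFound() + " docs found");

so I am asking where, if anywhere, it fits in this picture.)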


Re: Usage of CloudSolrServer?

2013-04-16 Thread Furkan KAMACI
Hi Shawn;

I am sorry, but what kind of load balancing is that? I mean, does it
check whether some leaders are using more CPU or RAM, etc.? I think a
problem may occur in a scenario like this: if some leaders are getting
more documents than other leaders (I don't know how it is decided which
shard a document will go to), then there will be a bottleneck on those
leaders.


2013/4/15 Shawn Heisey s...@elyograg.org

 On 4/15/2013 8:05 AM, Furkan KAMACI wrote:

 My system is as follows: I crawl data with Nutch and send them into
 SolrCloud. Users will search at Solr.

 What is that CloudSolrServer, should I use it for load balancing or is it
 something else different?


 It appears that the Solr integration in Nutch currently does not use
 CloudSolrServer.  There is an issue to add it.  The mutual dependency on
 HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
 HttpClient 4.

 https://issues.apache.org/jira/browse/NUTCH-1377

 Until that is fixed, a load balancer would be required for full redundancy
 for updates with SolrCloud.  You don't have to use a load balancer for it
 to work, but if the Solr server that Nutch is using goes down, then
 indexing will stop unless you reconfigure Nutch or bring the Solr server
 back up.

 Thanks,
 Shawn




Re: Storing Solr Index on NFS

2013-04-16 Thread Furkan KAMACI
Hi Walter;

You said: "It is not safe to share Solr index files between two Solr
servers." Why is that?


2013/4/16 Tim Vaillancourt t...@elementspace.com

 If centralization of storage is your goal by choosing NFS, iSCSI works
 reasonably well with SOLR indexes, although good local-storage will always
 be the overall winner.

 I noticed a near 5% degredation in overall search performance (casual
 testing, nothing scientific) when moving a 40-50GB indexes to iSCSI (10GBe
 network) from a 4x7200rpm RAID 10 local SATA disk setup.

 Tim


 On 15/04/13 09:59 AM, Walter Underwood wrote:

 Solr 4.2 does have field compression which makes smaller indexes. That
 will reduce the amount of network traffic. That probably does not help
 much, because I think the latency of NFS is what causes problems.

 wunder

 On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

  Hello Walter,

 Thanks for the response. That has been my experience in the past as well.
 But I was wondering if there new are things in Solr 4 and NFS 4.1 that
 make
 the storing of indexes on a NFS mount feasible.

 Thanks,
 Saqib


 On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org
 wrote:

  On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

  Greetings,

 Are there any issues with storing Solr Indexes on a NFS share? Also any
 recommendations for using NFS for Solr indexes?

 I recommend that you do not put Solr indexes on NFS.

 It can be very slow, I measured indexing as 100X slower on NFS a few
 years
 ago.

 It is not safe to share Solr index files between two Solr servers, so
 there is no benefit to NFS.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org




  --
 Walter Underwood
 wun...@wunderwood.org







Re: Usage of CloudSolrServer?

2013-04-16 Thread Furkan KAMACI
Thanks for your detailed explanation. However you said:

It will then choose one of those hosts/cores for each shard, and send a
request to them as a distributed search request. Is there any document
that explains of distributed search? What is the criteria for it?
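
(For context: I know that classic distributed search can be spelled out
by hand with the shards parameter, e.g., with placeholder host names,

  q=some query&shards=host1:8983/solr,host2:8983/solr

so I am asking how SolrCloud builds the equivalent list automatically.)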


2013/4/16 Upayavira u...@odoko.co.uk

 If you are accessing Solr from Java code, you will likely use the SolrJ
 client to do so. If your users are hitting Solr directly, you should
 think about whether this is wise - as well as providing them with direct
 search access, you are also providing them with the ability to delete
 your entire index with a single command.

 SolrJ isn't really a load balancer as such. When SolrJ is used to make a
 request against a collection, it will ask Zookeeper for the names of the
 shards that make up that collection, and for the hosts/cores that make
 up the set of replicas for those shards.

 It will then choose one of those hosts/cores for each shard, and send a
 request to them as a distributed search request.

 This has the advantage over traditional load balancing that if you bring
 up a new node, that node will register itself with ZooKeeper, and thus
 your SolrJ client(s) will know about it, without any intervention.

 Upayavira

 On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote:
  Hi Shawn;
 
  I am sorry but what kind of Load Balancing is that? I mean does it check
  whether some leaders are using much CPU or RAM etc.? I think a problem
  may
  occur at such kind of scenario: if some of leaders getting more documents
  than other leaders (I don't know how it is decided that into which shard
  a
  document will go) than there will be a bottleneck on that leader?
 
 
  2013/4/15 Shawn Heisey s...@elyograg.org
 
   On 4/15/2013 8:05 AM, Furkan KAMACI wrote:
  
   My system is as follows: I crawl data with Nutch and send them into
   SolrCloud. Users will search at Solr.
  
   What is that CloudSolrServer, should I use it for load balancing or
 is it
   something else different?
  
  
   It appears that the Solr integration in Nutch currently does not use
   CloudSolrServer.  There is an issue to add it.  The mutual dependency
 on
   HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
   HttpClient 4.
  
    https://issues.apache.org/jira/browse/NUTCH-1377
  
   Until that is fixed, a load balancer would be required for full
 redundancy
   for updates with SolrCloud.  You don't have to use a load balancer for
 it
   to work, but if the Solr server that Nutch is using goes down, then
   indexing will stop unless you reconfigure Nutch or bring the Solr
 server
   back up.
  
   Thanks,
   Shawn
  
  



Re: Some Questions About Using Solr as Cloud

2013-04-16 Thread Furkan KAMACI
Hi Erick;

Thanks for the explanation. You said:

"You cannot transfer just the indexed form of a document from one core to
another, you have to re-index the doc." Why is that?

2013/4/16 Erick Erickson erickerick...@gmail.com

 Yes. Every node is really self-contained. When you send a doc to a
 cluster where each shard has a replica, the raw doc is sent to
 each node of that shard and indexed independently.

 About old docs, it's the same as Solr 3.6. Data associated with
 docs stays around in the index until it's merged away.

 You cannot transfer just the indexed form of a document from one
 core to another, you have to re-index the doc.

 Best
 Erick

 On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Jack;
 
  I see that SolrCloud makes everything automated. When I use SolrCloud is
 it
  true that: there may be more than one computer responsible for indexing
 at
  any time?
 
  2013/4/15 Jack Krupansky j...@basetechnology.com
 
  There are no masters or slaves in SolrCloud - it's fully distributed.
 Some
  cluster nodes will be leaders (of the shard on that node) at a given
  point in time, but different nodes may be leaders at different points in
  time as they become elected.
 
  In a distributed cluster you would never want to store documents only on
  one node. Sure, you can do that by setting the replication factor to 1,
 but
  that defeats half the purpose for SolrCloud.
 
  Index transfer is automatic - SolrCloud supports fully distributed
 update.
 
  You might be getting confused with the old Master-Slave-Replication
  model that Solr had (and still has) which is distinct from SolrCloud.
 
  -- Jack Krupansky
 
  -Original Message- From: Furkan KAMACI
  Sent: Sunday, April 14, 2013 7:45 PM
  To: solr-user@lucene.apache.org
  Subject: Some Questions About Using Solr as Cloud
 
 
  I read wiki and reading SolrGuide of Lucidworks. However I want to clear
  something in my mind. Here are my questions:
 
  1) Does SolrCloud lets a multi master design (is there any document
 that I
  can read about it)?
  2) Let's assume that I use multiple cores i.e. core A and core B. Let's
  assume that there is a document just indexed at core B. If I send a
 search
  request to core A can I get result?
  3) When I use multi master design (if exists) can I transfer one
 master's
  index data into another (with its slaves or not)?
  4) When I use multi core design can I transfer one index data into
 another
  core or anywhere else?
 
  By the way thanks for the quick responses and kindness at mail list.
 



SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
When a leader responds to a query, does it say: "If I have the data being
looked for, I should build the response from it; otherwise I should find
it elsewhere, which may take long"?
or
does it say: "I only index the data; I will tell the other guys to build
up the query response"?


Same Shards at Different Machines

2013-04-16 Thread Furkan KAMACI
Is it possible to host the same shards on different machines in
SolrCloud?
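
For example (a sketch, with the collection name and counts as
placeholders), would creating the collection with a replicationFactor
greater than 1 put a copy of each shard on more than one machine?

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2"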


Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
Hi Mark;

To use the proper terms, I want to ask: is there data locality, in the
sense of spatial locality (
http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
- I mean: if you have the data on your machine, use it and don't look for
it anywhere else, only search for the remaining parts), when querying on
a leader in SolrCloud?

2013/4/16 Mark Miller markrmil...@gmail.com

 Leaders don't have much to do with querying - the node that you query will
 determine what other nodes it has to query to search the whole index and do
 a scatter/gather for you. (Though in some cases that request can be proxied
 to another node)

 - Mark

 On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

  When a leader responses for a query, does it says that: If I have the
 data
  what I am looking for, I should build response with it, otherwise I
 should
  find it anywhere. Because it may be long to search it?
  or
  does it says I only index the data, I will tell it to other guys to build
  up the response query?




Why indexing and querying performance is better at SolrCloud compared to older versions of Solr?

2013-04-16 Thread Furkan KAMACI
Is there any document that describes why indexing and querying
performance is better with SolrCloud compared to older versions of Solr?

I was considering this architecture: one cloud of Solr machines that only
does indexing, and another cloud that copies those indexes and serves
only queries, in order to get better performance. However, if I use
SolrCloud I think there is no need to build an architecture like that.


Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-16 Thread Furkan KAMACI
Hi Otis and Jack;

I have done some research about highlights and debugged the code. I see
that highlights are query-dependent and not stored. Why does Solr use
Lucene for storing text, i.e. the content of a web page? Is there any
comparison of storing text in HBase or other databases versus Lucene?

Also, is there anybody on our solr-user list who has used anything other
than Lucene to store document text?

2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Source code is your best bet.  Wiki has info about how to use it, but
 not how highlighting is implemented.  But you don't need to understand
 the implementation details to understand that they are dynamic,
 computed specifically for each query for each matching document, so
 you cannot store them anywhere ahead of time.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  It seems that I should read more about highlights. Is there any where
 that
  explains in detail how highlights are generated at Solr?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Hi,
 
  You can't store highlights ahead of time because they are query
  dependent.  You could store documents in HBase and use Solr just for
  indexing.  Is that what you want to do?  If so, a custom
  SearchComponent executed after QueryComponent could fetch data from
  external store like HBase.  I'm not sure if I'd recommend that.
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
   Actually I don't think to store documents at Solr. I want to store
 just
   highlights (snippets) at Hbase and I want to retrieve them from Hbase
  when
   needed.
   What do you think about separating just highlights from Solr and
 storing
   them into Hbase at Solrclod. By the way if you explain at which
 process
  and
   how highlights are genareted at Solr you are welcome.
  
  
   2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
  
   You may also be interested in looking at things like solrbase (on
  Github).
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
 furkankam...@gmail.com
   wrote:
Hi;
   
First of all should mention that I am new to Solr and making a
  research
about it. What I am trying to do that I will crawl some websites
 with
   Nutch
and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud
 4.2 )
   
I wonder about something. I have a cloud of machines that crawls
  websites
and stores that documents. Then I send that documents into
 SolrCloud.
   Solr
indexes that documents and generates indexes and save them. I know
  that
from Information Retrieval theory: it *may* not be efficient to
 store
indexes at a NoSQL database (they are something like linked lists
 and
  if
you store them in such kind of database you *may* have a sparse
representation -by the way there may be some solutions for it. If
 you
explain them you are welcome.)
   
However Solr stores some documents too (i.e. highlights) So some
 of my
documents will be doubled somehow. If I consider that I will have
 many
documents, that dobuled documents may cause a problem for me. So is
  there
any way not storing that documents at Solr and pointing to them at
Hbase(where I save my crawled documents) or instead of pointing
  directly
storing them at Hbase (is it efficient or not)?
  
 



When a search query comes to a replica what happens?

2013-04-16 Thread Furkan KAMACI
I want to make this clear in my mind:

When a search query comes to a replica, what happens?

- Does it forward the search query to the leader, and the leader collects
all the data and prepares the response (this would cause a performance
issue, because the leader is responsible for indexing at the same time)?
or
- Does the replica communicate with the leader and learn where the
remaining data is (the leader asks ZooKeeper and tells the replica), and
then the replica collects all the data and builds the response?


How SolrCloud Balance Number of Documents at each Shard?

2013-04-16 Thread Furkan KAMACI
Is it possible for different shards to have different numbers of
documents, or does SolrCloud balance them?

I ask this question because I want to understand the mechanism behind how
Solr calculates the hash value of a document's identifier. Is it possible
that the hash function puts more documents into one shard than into the
others? (Because this may cause a bottleneck at some leaders in
SolrCloud.)


Re: When a search query comes to a replica what happens?

2013-04-16 Thread Furkan KAMACI
All in all, will the replica ask its leader where the remaining data is,
or does it ask ZooKeeper directly?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 No, I believe redirect from replica to leader would happen only at
 index time, so a doc first gets indexed to leader and from there it's
 replicated to non-leader shards.  At query time there is no redirect
 to leader, I imagine, as that would quickly turn leaders into
 hotspots.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  I want to make this clear in my mind:

  When a search query comes to a replica, what happens?

  - Does it forward the search query to the leader, and the leader collects
  all the data and prepares the response (this would cause a performance
  issue, because the leader is responsible for indexing at the same time)?
  or
  - Does the replica communicate with the leader to learn where the
  remaining data is (the leader asks ZooKeeper and tells the replica), and
  then the replica collects all the data and builds the response itself?



Re: How Does SolrCloud Balance the Number of Documents at Each Shard?

2013-04-16 Thread Furkan KAMACI
Hi Otis;

Firstly, thanks for your answers. So do you mean that the hashing
mechanism will route a document to an essentially random shard? I ask
because I am considering putting a load balancer in front of my SolrCloud
and manually routing some documents to other shards to avoid a bottleneck.

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 They won't be exact, but should be close.  Are you seeing some *big*
 differences?

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Is it possible that different shards have different numbers of documents,
  or does SolrCloud balance them?

  I ask this question because I want to learn the mechanism behind how Solr
  calculates the hash value of the document's identifier. Is it possible
  that the hash function routes more documents into one shard than into the
  others? (This may cause a bottleneck at some leaders of the SolrCloud.)
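For reference, SolrCloud's default document routing hashes the uniqueKey
into a 32-bit value and gives each shard an equal slice of that hash
range, which is why shard sizes stay close but are not exactly equal. A
minimal Java sketch of the idea, illustrative only: String.hashCode()
stands in here for the MurmurHash3 that Solr actually uses, and the range
math is simplified:

    public class ShardRouting {
        public static void main(String[] args) {
            int numShards = 4;
            String docId = "doc-42";
            // Stand-in for Solr's MurmurHash3 over the uniqueKey.
            int hash = docId.hashCode();
            // Each shard owns an equal slice of the full 32-bit range.
            long sliceWidth = (1L << 32) / numShards;
            // Shift the signed hash into [0, 2^32) and find its slice.
            long normalized = (long) hash - Integer.MIN_VALUE;
            int shard = (int) (normalized / sliceWidth);
            System.out.println(docId + " -> shard" + (shard + 1));
        }
    }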



Re: Pointing to Hbase for Documents or Directly Saving Documents at Hbase

2013-04-16 Thread Furkan KAMACI
Thanks again for your answer. If I find any document about such
comparisons, I would like to read it.

By the way, is there any advantage to using Lucene for storage instead of
anything else, along these lines: storing with Lucene is natively
supported in Solr, whereas with anything else I may face compatibility
problems or communication issues?


2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 People do use other data stores to retrieve data sometimes. e.g. Mongo
 is popular for that.  Like I hinted in another email, I wouldn't
 necessarily recommend this for common cases.  Don't do it unless you
 really know you need it.  Otherwise, just store in Solr.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis and Jack;
 
  I have done some research about highlights and debugged the code. I see
  that highlights are query dependent and not stored. Why does Solr use
  Lucene for storing text, i.e. the content of a web page? Is there any
  comparison of storing text in Hbase or other databases versus Lucene?

  Also, I want to learn: is there anybody on our solr user list who has
  used anything other than Lucene to store document text?
 
  2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
 
  Source code is your best bet.  Wiki has info about how to use it, but
  not how highlighting is implemented.  But you don't need to understand
  the implementation details to understand that they are dynamic,
  computed specifically for each query for each matching document, so
  you cannot store them anywhere ahead of time.
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com
 
  wrote:
   Hi Otis;
  
   It seems that I should read more about highlights. Is there anywhere
   that explains in detail how highlights are generated in Solr?
  
   2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com
  
   Hi,
  
   You can't store highlights ahead of time because they are query
   dependent.  You could store documents in HBase and use Solr just for
   indexing.  Is that what you want to do?  If so, a custom
   SearchComponent executed after QueryComponent could fetch data from
   external store like HBase.  I'm not sure if I'd recommend that.
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI 
 furkankam...@gmail.com
  
   wrote:
Actually I don't plan to store documents in Solr. I want to store just
highlights (snippets) in Hbase and retrieve them from Hbase when needed.
What do you think about separating just the highlights from Solr and
storing them in Hbase, in SolrCloud? By the way, if you can explain at
which step and how highlights are generated in Solr, you are welcome.
   
   
2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com
   
You may also be interested in looking at things like solrbase (on
   Github).
   
Otis
--
Solr  ElasticSearch Support
http://sematext.com/
   
   
   
   
   
On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI 
  furkankam...@gmail.com
wrote:
 Hi;

 First of all, I should mention that I am new to Solr and doing research
 on it. What I am trying to do: I will crawl some websites with Nutch and
 then index them with Solr. (Nutch 2.1, Solr/SolrCloud 4.2)

 I wonder about something. I have a cloud of machines that crawls websites
 and stores those documents. Then I send those documents into SolrCloud.
 Solr indexes those documents, generates the indexes, and saves them. I
 know from Information Retrieval theory that it *may* not be efficient to
 store indexes in a NoSQL database (they are something like linked lists,
 and if you store them in such a database you *may* end up with a sparse
 representation - by the way, there may be some solutions for this; if you
 can explain them, you are welcome.)

 However, Solr stores some documents too (i.e. for highlights), so some of
 my documents will be doubled somehow. Considering that I will have many
 documents, those doubled documents may cause a problem for me. So is
 there any way to not store those documents in Solr and instead point to
 them in Hbase (where I save my crawled documents), or to store them
 directly in Hbase (is it efficient or not)?
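A bare-bones sketch of the custom SearchComponent idea Otis describes
above: registered after QueryComponent, it would fetch display text for
the matched documents from an external store instead of from Solr. The
external-store lookup is left as a comment because it is hypothetical;
the component hooks themselves are the Solr 4.x SearchComponent API:

    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class ExternalStoreComponent extends SearchComponent {

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // Nothing to do before the query runs.
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // QueryComponent has already run, so rb.getResults().docList
            // holds the matching internal doc ids. A real implementation
            // would resolve them to uniqueKey values via rb.req.getSearcher(),
            // fetch the text from the external store (HBase, etc.), and
            // attach it to the response, e.g. rb.rsp.add("content", map).
        }

        @Override
        public String getDescription() {
            return "Fetches document text from an external store";
        }

        @Override
        public String getSource() {
            return null;
        }
    }

Such a component would then be registered in solrconfig.xml as a
last-components entry on the search handler.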
   
  
 



Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
Hi Otis;

You said:

It can just do it because it knows where things are.

Does it learn this from ZooKeeper?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 If query comes to shard X on some node and this shard X is NOT a
 leader, but HAS data, it will just execute the query.  If it needs to
 query shards on other nodes, it will have the info about which shards
 to query and will just do that and aggregate the results.  It doesn't
 have to ask leader for permission, for info, etc.  It can just do it
 because it knows where things are.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Mark;
 
   To use proper terms: is there data locality or spatial locality (
   http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
   - I mean, if you have the data on your machine, use it and don't search
   for it anywhere else; only search for the remaining parts) when querying
   a leader in SolrCloud?
 
  2013/4/16 Mark Miller markrmil...@gmail.com
 
  Leaders don't have much to do with querying - the node that you query
 will
  determine what other nodes it has to query to search the whole index
 and do
  a scatter/gather for you. (Though in some cases that request can be
 proxied
  to another node)
 
  - Mark
 
  On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 
    When a leader responds to a query, does it say: if I have the data I am
    looking for, I should build the response with it; otherwise I should
    find it elsewhere, because searching for it may take long?
    or
    does it say: I only index the data; I will tell the other guys to build
    up the query response?
 
 



Re: SolrCloud Leader Response Mechanism

2013-04-16 Thread Furkan KAMACI
So the replica asks ZooKeeper, and the leader does not do anything extra.
Thanks for your answer, Otis.

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Oui, ZK holds the map.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 6:33 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Otis;
 
  You said:
 
  It can just do it because it knows where things are.
 
  Does it learn this from ZooKeeper?
 
  2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com
 
  If query comes to shard X on some node and this shard X is NOT a
  leader, but HAS data, it will just execute the query.  If it needs to
  query shards on other nodes, it will have the info about which shards
  to query and will just do that and aggregate the results.  It doesn't
  have to ask leader for permission, for info, etc.  It can just do it
  because it knows where things are.
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com
  wrote:
   Hi Mark;
  
   To use proper terms: is there data locality or spatial locality (
   http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html
   - I mean, if you have the data on your machine, use it and don't search
   for it anywhere else; only search for the remaining parts) when querying
   a leader in SolrCloud?
  
   2013/4/16 Mark Miller markrmil...@gmail.com
  
   Leaders don't have much to do with querying - the node that you query
  will
   determine what other nodes it has to query to search the whole index
  and do
   a scatter/gather for you. (Though in some cases that request can be
  proxied
   to another node)
  
   - Mark
  
   On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com
  wrote:
  
 When a leader responds to a query, does it say: if I have the data I am
 looking for, I should build the response with it; otherwise I should
 find it elsewhere, because searching for it may take long?
 or
 does it say: I only index the data; I will tell the other guys to build
 up the query response?
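In SolrJ terms, the map Otis mentions is exactly what a cloud-aware
client reads from ZooKeeper on its own; a minimal sketch (the zkHost
address and collection name are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ClusterAwareQuery {
        public static void main(String[] args) throws Exception {
            // The client fetches the cluster state (the shard -> nodes
            // map) from ZooKeeper, so it never has to ask a leader.
            CloudSolrServer server = new CloudSolrServer("zkhost1:2181");
            server.setDefaultCollection("collection1");
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }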
  
  
 



Re: Storing Solr Index on NFS

2013-04-16 Thread Furkan KAMACI
I don't want to be a bother, but I am trying to understand this part:

When you perform a commit in Solr you have (for an instant) two versions
of the index. The commit produces new segments (with new documents, new
deletions, etc.). After creating these new segments, a new index searcher
is created and its caches begin to autowarm. At this point the old index
searcher that you were using is still active, receiving requests. After
the new index searcher finishes loading and autowarming, the old searcher
is discarded.

So does this mean that when I have multiple Solr servers and a shared
index, I should synchronize the caches in the different machines' RAM?

2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Yesterday, we spent 1 hour with a client looking at their cluster's
 performance metrics in SPM, their indexing logs, etc., trying to figure
 out why some indexing was slower than it should have been.  We traced the
 issues to network hiccups, to VMs that would move from host to host,
 etc.  Really fancy and powerful system in terms of hardware resources,
 but in the end a bit too far from just locally attached HDD or SSD
 that would not have issues like the ones we found.  I'd stay away from
 NFS for the same reason - it's another moving part on the other side
 of the network.

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Tue, Apr 16, 2013 at 7:15 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Hi Walter;
 
  You said: It is not safe to share Solr index files between two Solr
  servers. Why do you think that?
 
 
  2013/4/16 Tim Vaillancourt t...@elementspace.com
 
  If centralization of storage is your goal by choosing NFS, iSCSI works
  reasonably well with SOLR indexes, although good local-storage will
 always
  be the overall winner.
 
  I noticed a near 5% degradation in overall search performance (casual
  testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE
  network) from a 4x7200rpm RAID 10 local SATA disk setup.
 
  Tim
 
 
  On 15/04/13 09:59 AM, Walter Underwood wrote:
 
  Solr 4.2 does have field compression which makes smaller indexes. That
  will reduce the amount of network traffic. That probably does not help
  much, because I think the latency of NFS is what causes problems.
 
  wunder
 
  On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:
 
   Hello Walter,
 
  Thanks for the response. That has been my experience in the past as
  well. But I was wondering if there are new things in Solr 4 and NFS 4.1
  that make storing indexes on an NFS mount feasible.
 
  Thanks,
  Saqib
 
 
  On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:
 
   On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:
 
   Greetings,
 
  Are there any issues with storing Solr indexes on an NFS share? Also,
  any recommendations for using NFS for Solr indexes?
 
  I recommend that you do not put Solr indexes on NFS.
 
  It can be very slow, I measured indexing as 100X slower on NFS a few
  years
  ago.
 
  It is not safe to share Solr index files between two Solr servers, so
  there is no benefit to NFS.
 
  wunder
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 
   --
  Walter Underwood
  wun...@wunderwood.org
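The autowarming behaviour described at the top of this thread is
configured per cache in solrconfig.xml. An illustrative filterCache entry
(the sizes are example values to tune, not recommendations):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>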
 
 
 
 
 



Re: Push/pull model between leader and replica in one shard

2013-04-16 Thread Furkan KAMACI
Really nice presentation.

2013/4/17 Mark Miller markrmil...@gmail.com


 On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:

  Hi, can someone explain in more detail what model is used to sync docs
  between the leader and replica in a shard?
  The model can be push or pull. Supposing I have only one shard with 1
  leader and 2 replicas: when the leader receives an update request, does
  it scatter the request to each available and active replica first and
  then process the request locally last? In this case, if the replicas
  are able to catch up with the leader, can I consider this a push model
  in which the leader pushes updates to its replicas?

 Currently, the leader adds the doc locally and then sends it to all
 replicas concurrently.

 
 
  What happens if a replica is behind the leader? Will the replica pull
  docs from the leader and keep track of the incoming updates from the
  leader in a log (called the tlog)? If so, when it completes pulling
  docs, will it replay the updates in the tlog at the end?

 If an update forwarded from a leader to a replica fails it's likely
 because that replica died. Just in case, the leader will ask that replica
 to enter recovery.

 When a node comes up and is not a leader, it also enters recovery.

 Recovery tries to peersync from the leader, and if that fails (works if
 off by about 100 updates), it replicates the entire index.

 If you are interested in more details on the SolrCloud architecture, I've
 given a few talks on it - two of them here:

 http://vimeo.com/43913870
 http://www.youtube.com/watch?v=eVK0wLkLw9w

 - Mark
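A minimal SolrJ sketch of that update path (zkHost and collection name
are placeholders): wherever the add lands, it is routed to the shard
leader, indexed there, and then sent on to the replicas concurrently, as
Mark describes:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdatePath {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("zkhost1:2181");
            server.setDefaultCollection("collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            // The receiving node forwards this to the shard leader; the
            // leader indexes it locally, then pushes it to each replica.
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }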




Re: Push/pull model between leader and replica in one shard

2013-04-17 Thread Furkan KAMACI
Hey Mark;

What did you use to prepare your presentation? It's really nice.

2013/4/17 Furkan KAMACI furkankam...@gmail.com

 Really nice presentation.


 2013/4/17 Mark Miller markrmil...@gmail.com


 On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:

  Hi, can someone explain in more detail what model is used to sync docs
  between the leader and replica in a shard?
  The model can be push or pull. Supposing I have only one shard with 1
  leader and 2 replicas: when the leader receives an update request, does
  it scatter the request to each available and active replica first and
  then process the request locally last? In this case, if the replicas
  are able to catch up with the leader, can I consider this a push model
  in which the leader pushes updates to its replicas?

 Currently, the leader adds the doc locally and then sends it to all
 replicas concurrently.

 
 
  What happens if a replica is behind the leader? Will the replica pull
  docs from the leader and keep track of the incoming updates from the
  leader in a log (called the tlog)? If so, when it completes pulling
  docs, will it replay the updates in the tlog at the end?

 If an update forwarded from a leader to a replica fails it's likely
 because that replica died. Just in case, the leader will ask that replica
 to enter recovery.

 When a node comes up and is not a leader, it also enters recovery.

 Recovery tries to peersync from the leader, and if that fails (works if
 off by about 100 updates), it replicates the entire index.

 If you are interested in more details on the SolrCloud architecture, I've
 given a few talks on it - two of them here:

 http://vimeo.com/43913870
 http://www.youtube.com/watch?v=eVK0wLkLw9w

 - Mark





Solr Caching

2013-04-17 Thread Furkan KAMACI
I've just started to read about Solr caching and I want to clarify one
thing. Let's assume that I have given 4 GB of RAM to my Solr application
and my machine has 10 GB of RAM. When the Solr caching mechanism starts to
work, does it use memory from that 4 GB part, or does it let the operating
system cache things in the 6 GB of RAM that remains outside the Solr
application?
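For what it's worth, Solr's own caches (filterCache, queryResultCache,
documentCache) are plain Java objects, so they live inside the JVM heap -
the 4 GB in this example - while the operating system's page cache uses
the remaining free RAM to hold index files read from disk. Illustrative
only (flags and sizes are examples):

    # Solr's caches live inside this 4 GB heap:
    java -Xmx4g -jar start.jar
    # Index files read from disk are cached by the OS page cache in
    # whatever RAM is left over (up to roughly 6 GB here).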

