Re: SolrCloud setup - any advice?

2013-09-20 Thread Neil Prosser
Sorry, my bad. For SolrCloud soft commits are enabled (every 15 seconds). I
do a hard commit from an external cron task via curl every 15 minutes.
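
For reference, the soft-commit side of that policy would typically sit inside the
<updateHandler> section of solrconfig.xml along these lines (a minimal sketch based
on the intervals above; the hard commit itself is issued externally via cron and
curl, so no <autoCommit> block is shown):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- make newly indexed documents visible every 15 seconds -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>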

The version I'm using for the SolrCloud setup is 4.4.0.

Document cache warm-up times are 0ms.
Filter cache warm-up times are between 3 and 7 seconds.
Query result cache warm-up times are between 0 and 2 seconds.

I haven't tried disabling the caches, I'll give that a try and see what
happens.

This isn't a static index. We are indexing documents into it. We're keeping
up with our normal update load, which is to make updates to a percentage of
the documents (thousands, not hundreds).




On 19 September 2013 20:33, Shreejay Nair shreej...@gmail.com wrote:

 Hi Neil,

 Although you haven't mentioned it, just wanted to confirm - do you have
 soft commits enabled?

 Also, what's the version of Solr you are using for the SolrCloud setup?
 4.0.0 had lots of memory and ZooKeeper-related issues. What's the warm-up
 time for your caches? Have you tried disabling the caches?

 Is this a static index, or are documents added continuously?

 The answers to these questions might help us pinpoint the issue...

 On Thursday, September 19, 2013, Neil Prosser wrote:

  Apologies for the giant email. Hopefully it makes sense.
 
  We've been trying out SolrCloud to solve some scalability issues with our
  current setup and have run into problems. I'd like to describe our
 current
  setup, our queries and the sort of load we see and am hoping someone
 might
  be able to spot the massive flaw in the way I've been trying to set
 things
  up.
 
  We currently run Solr 4.0.0 in the old style Master/Slave replication. We
  have five slaves, each running Centos with 96GB of RAM, 24 cores and with
  48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs)
 but
  aren't slow either. Our GC parameters aren't particularly exciting, just
  -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.
 
  Our index size ranges between 144GB and 200GB (when we optimise it back
  down, since we've had bad experiences with large cores). We've got just
  over 37M documents some are smallish but most range between 1000-6000
  bytes. We regularly update documents so large portions of the index will
 be
  touched leading to a maxDocs value of around 43M.
 
  Query load ranges between 400req/s to 800req/s across the five slaves
  throughout the day, increasing and decreasing gradually over a period of
  hours, rather than bursting.
 
  Most of our documents have upwards of twenty fields. We use different
  fields to store territory variant (we have around 30 territories) values
  and also boost based on the values in some of these fields (integer
 ones).
 
  So an average query can do a range filter by two of the territory variant
  fields, filter by a non-territory variant field. Facet by a field or two
  (may be territory variant). Bring back the values of 60 fields. Boost
 query
  on field values of a non-territory variant field. Boost by values of two
  territory-variant fields. Dismax query on up to 20 fields (with boosts)
 and
  phrase boost on those fields too. They're pretty big queries. We don't do
  any index-time boosting. We try to keep things dynamic so we can alter
 our
  boosts on-the-fly.
 
  Another common query is to list documents with a given set of IDs and
  select documents with a common reference and order them by one of their
  fields.
 
  Auto-commit every 30 minutes. Replication polls every 30 minutes.
 
  Document cache:
* initialSize - 32768
* size - 32768
 
  Filter cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192
 
  Query result cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192
 
  After a replicated core has finished downloading (probably while it's
  warming) we see requests which usually take around 100ms taking over 5s.
 GC
  logs show concurrent mode failure.
 
  I was wondering whether anyone can help with sizing the boxes required to
  split this index down into shards for use with SolrCloud and roughly how
  much memory we should be assigning to the JVM. Everything I've read
  suggests that running with a 48GB heap is way too high but every attempt
  I've made to reduce the cache sizes seems to wind up causing
 out-of-memory
  problems. Even dropping all cache sizes by 50% and reducing the heap by
 50%
  caused problems.
 
  I've already tried using SolrCloud 10 shards (around 3.7M documents per
  shard, each with one replica) and kept the cache sizes low:
 
  Document cache:
* initialSize - 1024
* size - 1024
 
  Filter cache:
* autowarmCount - 128
* initialSize - 512
* size - 512
 
  Query result cache:
* autowarmCount - 32
* initialSize - 128
* size - 128
 
  Even when running on six machines in AWS with SSDs, 24GB heap (out of
 60GB
  memory) and four shards on two boxes and three on the rest I still see
   concurrent mode failure. This looks like it's causing ZooKeeper to mark the
   node as down and things begin to struggle.

Spellchecking

2013-09-20 Thread Gastone Penzo
Hi,
I'd like to know if it is possible to get suggestions from only a part of
the index.
For example:

an e-commerce site:
there are a lot of typologies of products (book, dvd, cd...)

If I search inside books, I only want suggestions for book products, not CDs,
but the spellcheck indexes are all together.

Is it possible to divide the indexes, or to get suggestions for only one
typology?

Thanks

-- 
Gastone


Hash range to shard assignment

2013-09-20 Thread lochri
Hello folks,

we would like to control where certain hash values or ranges are located.
The reason is that we want to shard per user, but we know ahead of time that
one or more specific users could grow much faster than others. We would
therefore like to place them on separate shards (which may be on the same
server initially and can be moved out later).

So my question: can we control the hash ranges and the hash-range-to-shard
assignment in SolrCloud?

Regards,
Lochri 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hash range to shard assignment

2013-09-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
This would require you to plug in your own router; it is not yet possible
otherwise.

But you can split that shard repeatedly and keep the number of users in that
shard limited.
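
For example, a shard split is issued through the Collections API along these lines
(a sketch; the collection and shard names here are hypothetical):

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=users&shard=shard1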


On Fri, Sep 20, 2013 at 3:52 PM, lochri loc...@web.de wrote:

 Hello folks,

 we would like to have control of where certain hash values or ranges are
 being located.
 The reason is that we want to shard per user but we know ahead that one or
 more specific users could grow way faster than others. Therefore we would
 like to locate them on separate shards (which may be on the same server
 initially and can be moved out later).

 So my question: can we control the hash-ranges and hash-range to shard
 assignment in SolrCloud ?

 Regards,
 Lochri




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
-
Noble Paul


RE: Spellchecking

2013-09-20 Thread Dyer, James
If you're using spellcheck.collate you can also set 
spellcheck.maxCollationTries to validate each collation against the index 
before suggesting it.  This validation takes into account any fq parameters 
on your query, so if your original query has fq=Product:Book, then the 
collations returned will all be vetted by internally running the query with 
that filter applied.

If for some reason your main query does not have fq=Product:Book, but you 
want it considered when collations are being built, you can include 
spellcheck.collateParam.fq=Product:Book.

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and 
following sections.
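
Configured in solrconfig.xml, a request handler using this might look roughly like
the sketch below (handler name and values are only examples; it assumes a
spellcheck search component is already defined):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- optional: force a filter to be applied when testing collations, per the note above -->
    <str name="spellcheck.collateParam.fq">Product:Book</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>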

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Friday, September 20, 2013 4:00 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking

Hi,
I'd like to know if it is possible to get suggestions from only a part of
the index.
For example:

an e-commerce site:
there are a lot of typologies of products (book, dvd, cd...)

If I search inside books, I only want suggestions for book products, not CDs,
but the spellcheck indexes are all together.

Is it possible to divide the indexes, or to get suggestions for only one
typology?

Thanks

-- 
Gastone



Re: Will Solr work with a mapped drive?

2013-09-20 Thread Aloke Ghoshal
Hi,

Try the UNC path instead: http://wiki.apache.org/tomcat/FAQ/Windows#Q6
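
For instance, if the index currently lives on a mapped drive such as Z:\solr\data,
the UNC form would look something like the line below wherever the path is
configured (e.g. a dataDir in solrconfig.xml; the server and share names here are
hypothetical):

<dataDir>\\fileserver\solr\data</dataDir>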

Regards,
Aloke

On 9/20/13, johnmu...@aol.com johnmu...@aol.com wrote:
 Hi,


 I'm having this same problem as described here:
 http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-windows
   Does anyone know whether this is a limitation of Solr or not?


 I searched the web, nothing came up.


 Thanks!!!


 -- MJ



Re: check which file/document cause solr to work hard

2013-09-20 Thread Erick Erickson
you can always commit them one at a time to the ExtractingRequestHandler

http://wiki.apache.org/solr/ExtractingRequestHandler
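
A minimal way to do that is to post each file individually, e.g. with curl (the
URL, id and file path here are only placeholders):

curl "http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@/path/to/somefile.pdf"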

Best,
Erick


On Tue, Sep 17, 2013 at 6:47 AM, Yossi Nachum nachum...@gmail.com wrote:

 Hi,

 I am trying to index my Windows PC files with ManifoldCF version 1.3 and
 Solr version 4.4.

 A few minutes after I start the crawler job I see that the Tomcat process
 constantly consumes 100% of one CPU (I have two CPUs).

 I checked the thread dump in the Solr admin UI and saw that the following
 threads take the most CPU/user time
 
 http-8080-3 (32)

- java.io.FileInputStream.readBytes(Native Method)
- java.io.FileInputStream.read(FileInputStream.java:236)
- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
- java.io.FilterInputStream.read(FilterInputStream.java:133)
- org.apache.tika.io.TailStream.read(TailStream.java:117)
- org.apache.tika.io.TailStream.skip(TailStream.java:140)
- org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
- org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
- org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
- org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
- org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
- org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
- org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
- org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
- org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
- org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
- org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
- org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
- org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
- org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
- org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
- org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- java.lang.Thread.run(Thread.java:679)

 

 How can I check which file causes Tika to work so hard?
 I don't see anything in the log files and I am stuck.
 Thanks,
 Yossi



Re: Solr node goes down while trying to index records

2013-09-20 Thread Erick Erickson
What happens if you bump up your ZooKeeper timeout? This has been an issue
at times in the past.
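
In a 4.x-style solr.xml the timeout is the zkClientTimeout setting on the <cores>
element, roughly like the sketch below (30 seconds is just an example value):

<cores adminPath="/admin/cores" hostPort="${jetty.port:}"
       zkClientTimeout="${zkClientTimeout:30000}">
  ...
</cores>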

Best,
Erick


On Tue, Sep 17, 2013 at 1:48 PM, Furkan KAMACI furkankam...@gmail.comwrote:

 Could you give some information about your jetty.xml and give more info
 about your index rate and RAM usage of your machines?

 On Tuesday, 17 September 2013, neoman harira...@gmail.com wrote:
  yes. the nodes go down while indexing. if we stop indexing, it does not
 go
  down.
 
 
 
  --
  View this message in context:

 http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-20 Thread Erick Erickson
You're probably exceeding the size that your servlet container allows.
This assumes you're using curl or some such. You can change it.
How big is the document and how are you sending it to Solr?

Best,
Erick


On Tue, Sep 17, 2013 at 2:24 PM, Furkan KAMACI furkankam...@gmail.comwrote:

 Currently I have over 50 million documents in my index and, as I mentioned
 before in another question, I have some problems while indexing (jetty EOF
 exception). I know the problem may not be about index size, but I just want
 to learn whether there is any limit on document size in Solr such that I
 could run into problems if I exceed it. I am not talking about the
 theoretical limit.

 What is the maximum index size for folks, and what do they do to handle a
 heavy indexing rate with millions of documents? What tuning strategies do
 they use?

 PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr
 4.2.1 for my SolrCloud.



Need help understanding the use cases behind core auto-discovery

2013-09-20 Thread Timothy Potter
Trying to add some information about core.properties and auto-discovery in
Solr in Action and am at a loss for what to tell the reader is the purpose
of this feature.

Can anyone point me to any background information about core
auto-discovery? I'm not interested in the technical implementation details.
Mainly I'm trying to understand the motivation behind having this feature
as it seems unnecessary with the Core Admin API. Best I can tell is it
removes a manual step of firing off a call to the Core Admin API or loading
a core from the Admin UI. If that's it and I'm overthinking it, then cool
but was expecting more of an ah-ha moment with this feature ;-)

Any insights you can share are appreciated.

Thanks.
Tim


Problem running EmbeddedSolr (spring data)

2013-09-20 Thread JMill
What is the cause of this stacktrace?

I am working with the following Solr Maven dependency versions:

<solr-core-version>4.4.0</solr-core-version>
<spring-data-solr-version>1.0.0.RC1</spring-data-solr-version>

Stacktrace

SEVERE: Exception sending context initialized event to listener instance of
class org.springframework.web.context.ContextLoaderListener
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'solrServerFactoryBean' defined in class path resource
[com/project/core/config/EmbeddedSolrContext.class]: Invocation of init
method failed; nested exception is java.lang.NoSuchMethodError:
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1482)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:521)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
at
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295)
at
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:608)
at
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
at
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
at
org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:389)
at
org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:294)
at
org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
at
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4887)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5381)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoSuchMethodError:
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
at
org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.createPathConfiguredSolrServer(EmbeddedSolrServerFactory.java:96)
at
org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.initSolrServer(EmbeddedSolrServerFactory.java:72)
at
org.springframework.data.solr.server.support.EmbeddedSolrServerFactoryBean.afterPropertiesSet(EmbeddedSolrServerFactoryBean.java:41)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1541)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1479)
... 22 more



//Config Class
@Configuration
@EnableSolrRepositories("core.solr.repository")
@Profile("dev")
@PropertySource("classpath:solr.properties")
public class EmbeddedSolrContext {

    @Resource
    private Environment environment;

    @Bean
    public EmbeddedSolrServerFactoryBean solrServerFactoryBean() {
        EmbeddedSolrServerFactoryBean factory = new EmbeddedSolrServerFactoryBean();
        factory.setSolrHome(environment.getRequiredProperty("solr.solr.home"));
        return factory;
    }

    @Bean
    public SolrTemplate solrTemplate() throws Exception {
        return new SolrTemplate(solrServerFactoryBean().getObject());
    }
}

Solr.properties

solr.server.url=http://localhost:8983/solr/
solr.solr.home=classpath*:com/project/core/solr -- NOTE: points to an
empty package inside the project


Re: Need help understanding the use cases behind core auto-discovery

2013-09-20 Thread Yonik Seeley
On Fri, Sep 20, 2013 at 11:56 AM, Timothy Potter thelabd...@gmail.com wrote:
 Trying to add some information about core.properties and auto-discovery in
 Solr in Action and am at a loss for what to tell the reader is the purpose
 of this feature.

IMO, it was more a removal of unnecessary central configuration.
You previously had to list the core in solr.xml, and now you don't.
Cores should be fully self-describing so that it should be easy to
move them in the future just by moving the core directory (although
that may not yet work...)

-Yonik
http://lucidworks.com

 Can anyone point me to any background information about core
 auto-discovery? I'm not interested in the technical implementation details.
 Mainly I'm trying to understand the motivation behind having this feature
 as it seems unnecessary with the Core Admin API. Best I can tell is it
 removes a manual step of firing off a call to the Core Admin API or loading
 a core from the Admin UI. If that's it and I'm overthinking it, then cool
 but was expecting more of an ah-ha moment with this feature ;-)

 Any insights you can share are appreciated.

 Thanks.
 Tim


Re: Need help understanding the use cases behind core auto-discovery

2013-09-20 Thread Timothy Potter
Exactly the insight I was looking for! Thanks Yonik ;-)


On Fri, Sep 20, 2013 at 10:37 AM, Yonik Seeley yo...@lucidworks.com wrote:

 On Fri, Sep 20, 2013 at 11:56 AM, Timothy Potter thelabd...@gmail.com
 wrote:
  Trying to add some information about core.properties and auto-discovery
 in
  Solr in Action and am at a loss for what to tell the reader is the
 purpose
  of this feature.

 IMO, it was more a removal of unnecessary central configuration.
 You previously had to list the core in solr.xml, and now you don't.
 Cores should be fully self-describing so that it should be easy to
 move them in the future just by moving the core directory (although
 that may not yet work...)

 -Yonik
 http://lucidworks.com

  Can anyone point me to any background information about core
  auto-discovery? I'm not interested in the technical implementation
 details.
  Mainly I'm trying to understand the motivation behind having this feature
  as it seems unnecessary with the Core Admin API. Best I can tell is it
  removes a manual step of firing off a call to the Core Admin API or
 loading
  a core from the Admin UI. If that's it and I'm overthinking it, then cool
  but was expecting more of an ah-ha moment with this feature ;-)
 
  Any insights you can share are appreciated.
 
  Thanks.
  Tim




Re: Migrating from Endeca

2013-09-20 Thread Shawn Heisey
On 9/19/2013 5:50 AM, Gareth Poulton wrote:
 A customer wants us to move their entire enterprise platform - of which one
 of the many components is Oracle Endeca - to open source.
 However, customers being the way they are, they don't want to have to give
 up any of the features they currently use, the most prominent of which are
 user friendly web-based editors for non-technical people to be able to edit
 things like:
 - Schema
 - Dimensions (i.e. facets)
 - Dimension groups (not sure what these are)
 - Thesaurus
 - Stopwords
 - Report generation
 - Boosting individual records (i.e. sponsored links)
 - Relevance ranking settings
 - Process pipeline editor for, e.g. adding new languages
 -...all without touching any xml.

I think Jack and Alexandre have pretty much covered what exists now for
Solr without paying someone for features and support - not much.  There
is however some background work underway to bring features exactly like
this to Solr.  Except for the Schema REST API that exists right now, I
don't think any of it has much priority.  The priority is likely to
increase in the future, but probably not fast enough for your needs.

There is a strong desire among the top Solr developers to have Solr
always be in SolrCloud mode in a future major version release -- which
means it would use Zookeeper to store all config information, just like
SolrCloud does now.

When your config is in a separate network service instead of traditional
config files, the ability to edit the config using API calls is very
important.  Creating a UI front-end that uses the API and doesn't
require editing XML would be EXTREMELY nice.  I'm pretty sure that this
is the goal with the current work on the Schema REST API.

If you have any idea how to bring these features to Solr, patches are
always welcome!

Some of the things in your list, particularly facets and grouping (which
is what dimension groups might be equivalent to) are normally handled in
client code.  The application creates the parameters it needs and
handles the response.  With Solr they aren't normally configured on the
server side.  You could do so, by putting parameters in request handler
definitions.

Thanks,
Shawn



Cause of NullPointer Exception? (Solr with Spring Data)

2013-09-20 Thread JMill
I am unsure about the cause of the following NullPointer Exception.  Any
Ideas?

Thanks

Exception in thread main
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'aDocumentService': Injection of autowired dependencies
failed; nested exception is
org.springframework.beans.factory.BeanCreationException: Could not autowire
field: com.project.core.solr.repository.DocumentRepository
com.project.core.solr.service.impl.DocumentServiceImpl.DocRepo; nested
exception is org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'DocumentRepository': FactoryBean threw exception
on object creation; nested exception is java.lang.NullPointerException
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:288)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1116)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
at
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295)
at
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:626)
at
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
at
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
at
org.springframework.context.annotation.AnnotationConfigApplicationContext.init(AnnotationConfigApplicationContext.java:73)
at com.project.core.solr..DocumentTester.main(DocumentTester.java:18)
Caused by: org.springframework.beans.factory.BeanCreationException: Could
not autowire field: com.project.core.solr.repository.DocumentRepository
com.project.core.solr.service.impl.DocumentServiceImpl.DocRepo; nested
exception is org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'DocumentRepository': FactoryBean threw exception
on object creation; nested exception is java.lang.NullPointerException
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:514)
at
org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:87)
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:285)
... 12 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'DocumentRepository': FactoryBean threw exception
on object creation; nested exception is java.lang.NullPointerException
at
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:149)
at
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:102)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1454)
at
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:306)
at
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:910)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:853)
at
org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:768)
at
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:486)
... 14 more
Caused by: java.lang.NullPointerException
at
org.springframework.data.solr.repository.support.MappingSolrEntityInformation.getIdAttribute(MappingSolrEntityInformation.java:68)
at
org.springframework.data.solr.repository.support.SimpleSolrRepository.init(SimpleSolrRepository.java:73)
at

Re: SolrCloud setup - any advice?

2013-09-20 Thread Shawn Heisey
On 9/19/2013 9:20 AM, Neil Prosser wrote:
 Apologies for the giant email. Hopefully it makes sense.

Because of its size, I'm going to reply inline like this and I'm going
to trim out portions of your original message.  I hope that's not
horribly confusing to you!  Looking through my archive of the mailing
list, I see that I have given you some of this information before.

 Our index size ranges between 144GB and 200GB (when we optimise it back
 down, since we've had bad experiences with large cores). We've got just
 over 37M documents some are smallish but most range between 1000-6000
 bytes. We regularly update documents so large portions of the index will be
 touched leading to a maxDocs value of around 43M.
 
 Query load ranges between 400req/s to 800req/s across the five slaves
 throughout the day, increasing and decreasing gradually over a period of
 hours, rather than bursting.

With indexes of that size and 96GB of RAM, you're starting to get into
the size range where severe performance problems begin happening.  Also,
with no GC tuning other than turning on CMS (and a HUGE 48GB heap on top
of that), you're going to run into extremely long GC pause times.  Your
query load is what I would call quite high, which will make those GC
problems quite frequent.

This is the problem I was running into with only an 8GB heap, with
similar tuning where I just turned on CMS.  When Solr disappears for 10+
seconds at a time for garbage collection, the load balancer will
temporarily drop that server from the available pool.

I'm aware that this is your old setup, so we'll put it aside for now  so
we can concentrate on your SolrCloud setup.

 Most of our documents have upwards of twenty fields. We use different
 fields to store territory variant (we have around 30 territories) values
 and also boost based on the values in some of these fields (integer ones).
 
 So an average query can do a range filter by two of the territory variant
 fields, filter by a non-territory variant field. Facet by a field or two
 (may be territory variant). Bring back the values of 60 fields. Boost query
 on field values of a non-territory variant field. Boost by values of two
 territory-variant fields. Dismax query on up to 20 fields (with boosts) and
 phrase boost on those fields too. They're pretty big queries. We don't do
 any index-time boosting. We try to keep things dynamic so we can alter our
 boosts on-the-fly.

The nature of your main queries (and possibly your filters) is probably
always going to be a little memory hungry, but it sounds like the facets
are probably what's requiring such incredible amounts of heap RAM.

Try putting a facet.method parameter into your request handler defaults
and set it to enum.  The default is fc which means fieldcache - it
basically loads all the indexed terms for that field on the entire index
into the field cache.  Multiply that by the number of fields that you
facet on (across all your queries), and it can be a real problem.
Memory is always going to be required for quick facets, but it's
generally better to let the OS handle it automatically with disk caching
than to load it into the java heap.
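
A sketch of what that looks like in solrconfig.xml (the handler name is whatever
you already use for queries; other defaults omitted):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet.method">enum</str>
  </lst>
</requestHandler>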

Your next paragraph (which I trimmed) talks about sorting, which is
another thing that eats up java heap.  The amount taken is based on the
number of documents in the index, and a chunk is taken for every field
that you use for sorting.  See if you can reduce the number of fields
you use for sorting.

 Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB
 memory) and four shards on two boxes and three on the rest I still see
 concurrent mode failure. This looks like it's causing ZooKeeper to mark the
 node as down and things begin to struggle.
 
 Is concurrent mode failure just something that will inevitably happen or is
 it avoidable by dropping the CMSInitiatingOccupancyFraction?

I assume that concurrent mode failure is what gets logged preceding a
full garbage collection.  Aggressively tuning your GC will help
immensely.  The link below has what I am currently using.  Someone on
IRC was saying that they have a 48GB heap with similar settings and they
never see huge pauses.  These tuning parameters don't use fixed memory
sizes, so it should work with any size max heap:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Otis has mentioned G1.  What I found when I used G1 was that it worked
extremely well *almost* all of the time.  The occasions for full garbage
collections were a LOT less frequent, but when they happened, the pause
was *even longer* than the untuned CMS.  That caused big problems for me
and my load balancer.  Until someone can come up with some awesome G1
tuning parameters, I personally will continue to avoid it except for
small-heap applications.  G1 is an awesome idea.  If it can be tuned, it
will probably be better than a tuned CMS.

Switching to facet.method=enum as outlined above will probably do the
most for letting you decrease your max java heap.  

Re: JVM Crash using solr 4.4 on Centos

2013-09-20 Thread Oak McIlwain
Thanks Michael, I thought I had the latest but it turned out to be from
July 2011. Working Fine with the latest build :-)


On Thu, Sep 19, 2013 at 7:29 PM, Michael Ryan mr...@moreover.com wrote:

 This is a known bug in that JDK version. Upgrade to a newer version of JDK
 7 (any build within the last two years or so should be fine). If that's not
 possible for you, you can add -XX:-UseLoopPredicate as a command line
 option to java to work around this.

 -Michael

 -Original Message-
 From: Oak McIlwain [mailto:oak.mcilw...@gmail.com]
 Sent: Thursday, September 19, 2013 10:10 PM
 To: solr-user@lucene.apache.org
 Subject: JVM Crash using solr 4.4 on Centos

 I have solr 4.4 running on tomcat 7 on my local development environment
 which is ubuntu based and it works fine (Querying, Posting Documents, Data
 Import etc.)

 I am trying to move to a staging environment which is CentOS based
 (still using Tomcat 7 and Solr 4.4). However, when attempting to post
 documents and do a data import from MySQL through JDBC, after a few hundred
 documents the Tomcat server crashes and it logs:

 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
 #
 # JRE version: 7.0-b147
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops)
 # Problematic frame:
 # J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

 I'm using Sun Java JDK 1.7.0

 Anyone got any ideas I can pursue to resolve this?



java.lang.LinkageError when using custom filters in multiple cores

2013-09-20 Thread Hayden Muhl
I have two cores favorite and user running in the same Tomcat instance.
In each of these cores I have identical field types text_en, text_de,
text_fr, and text_ja. These fields use some custom token filters I've
written. Everything was going smoothly when I only had the favorite core.
When I added the user core, I started getting java.lang.LinkageErrors
being thrown when I start up Tomcat. The error always happens with one of
the classes I've written, but it's unpredictable which class the
classloader chokes on.

Here's the really strange part. I comment out the text_* fields in the
user core and the errors go away (makes sense). I add text_en back in, no
error (OK). I add text_fr back in, no error (OK). I add text_de back
in, and I get the error (ah ha!). I comment text_de out again, and I
still get the same error (wtf?).

I also put a break point at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424),
and when I load everything one at a time, I don't get any errors.

I'm running Tomcat 5.5.28, Java version 1.6.0_39 and Solr 4.2.0. I'm
running this all within Eclipse 1.5.1 on a mac. I have not tested this on a
production-like system yet.

Here's an example stack trace. In this case it was one of my Japanese
filters, but other times it will choke on my synonym filter, or my compound
word filter. The specific class it fails on doesn't seem to be relevant.

SEVERE: null:java.lang.LinkageError: loader (instance of
 org/apache/catalina/loader/WebappClassLoader): attempted  duplicate class
definition for name: com/shopstyle/solrx/KatakanaVuFilterFactory
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at
org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:904)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1353)
at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:462)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:392)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:373)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:121)
at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1018)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

- Hayden


Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-20 Thread Erick Erickson
Ah, good to know Shawn...

Erick


On Fri, Sep 20, 2013 at 1:04 PM, Shawn Heisey s...@elyograg.org wrote:

 On 9/20/2013 12:34 PM, Erick Erickson wrote:
  You're probably exceeding the size that your servlet container allows.
  This assumes you're using curl or some such. You can change it.
  How big is the document and how are you sending it to Solr?

 The maximum form size is configurable in Solr, not sure whether that
 change went in for 4.1 or 4.2.  Solr will override what the servlet
 container itself has configured.

 In the requestDispatcher section of solrconfig.xml, you can have a
 requestParsers tag.  One of the attributes for that tag can be
 formdataUploadLimitInKB.  The default value for that setting is 2048,
 for a maximum POST size of 2MB.  This should be described in the example
 solrconfig.xml file.

 Thanks,
 Shawn




Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-20 Thread Shawn Heisey
On 9/20/2013 12:34 PM, Erick Erickson wrote:
 You're probably exceeding the size that your servlet container allows.
 This assumes you're using curl or some such. You can change it.
 How big is the document and how are you sending it to Solr?

The maximum form size is configurable in Solr, not sure whether that
change went in for 4.1 or 4.2.  Solr will override what the servlet
container itself has configured.

In the requestDispatcher section of solrconfig.xml, you can have a
requestParsers tag.  One of the attributes for that tag can be
formdataUploadLimitInKB.  The default value for that setting is 2048,
for a maximum POST size of 2MB.  This should be described in the example
solrconfig.xml file.
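
For example, raising the POST limit to 10MB would look roughly like this (the
other values shown are illustrative):

<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="10240" />
</requestDispatcher>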

Thanks,
Shawn



Getting term offsets from Solr

2013-09-20 Thread Nalini Kartha
Hi,

We're looking at implementing highlighting for some fields which may be too
large to store in the index.

As an alternative to using the Solr Highlighter (which needs fields to be
stored), I was wondering if a) the offsets of terms are stored BY DEFAULT
in the index (even if we're not using the TermVectorComponent) and if so,
b) is there a way to get the offset information from Solr.

Thanks,
Nalini


Re: Getting term offsets from Solr

2013-09-20 Thread Jack Krupansky

Set:

termVectors=true
termPositions=true
termOffsets=true

And use the fast vector highlighter.
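
On a schema.xml field that would look something like the following sketch (the
field name and type are hypothetical), and the query would then add
hl=true&hl.useFastVectorHighlighter=true:

<field name="body" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>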

-- Jack Krupansky

-Original Message- 
From: Nalini Kartha 
Sent: Friday, September 20, 2013 7:34 PM 
To: solr-user@lucene.apache.org 
Subject: Getting term offsets from Solr 


Hi,

We're looking at implementing highlighting for some fields which may be too
large to store in the index.

As an alternative to using the Solr Highlighter (which needs fields to be
stored), I was wondering if a) the offsets of terms are stored BY DEFAULT
in the index (even if we're not using the TermVectorComponent) and if so,
b) is there a way to get the offset information from Solr.

Thanks,
Nalini


Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-20 Thread Erick Erickson
You're probably exceeding the size that your servlet container allows.
This assumes you're using curl or some such. You can change it.
How big is the document and how are you sending it to Solr?

Best,
Erick

On Tue, Sep 17, 2013 at 4:28 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi

 50m docs across 18 servers 48gb RAM ain't much. I doubt you are hitting any
 limits in lucene or solr.

 How heavy is your index rate?

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Sep 17, 2013 5:25 PM, Furkan KAMACI furkankam...@gmail.com wrote:

  Currently I have over 50 million documents in my index and, as I mentioned
  before in another question, I have some problems while indexing (jetty EOF
  exception). I know the problem may not be about index size, but I just want
  to learn whether there is any limit on document size in Solr such that I
  could run into problems if I exceed it. I am not talking about the
  theoretical limit.

  What is the maximum index size for folks, and what do they do to handle a
  heavy indexing rate with millions of documents? What tuning strategies do
  they use?

 PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr
 4.2.1 for my SolrCloud.



Re: Getting term offsets from Solr

2013-09-20 Thread Nalini Kartha
Thanks for the reply.

We tried enabling these options but that's also causing too much index
bloat so I was wondering if there's a way to get at the offset information
more cheaply?

Thanks,
Nalini


On Fri, Sep 20, 2013 at 4:41 PM, Jack Krupansky j...@basetechnology.comwrote:

 Set:

 termVectors=true
 termPositions=true
 termOffsets=true

 And use the fast vector highlighter.

 -- Jack Krupansky

 -Original Message- From: Nalini Kartha Sent: Friday, September 20,
 2013 7:34 PM To: solr-user@lucene.apache.org Subject: Getting term
 offsets from Solr
  Hi,

 We're looking at implementing highlighting for some fields which may be too
 large to store in the index.

 As an alternative to using the Solr Highlighter (which needs fields to be
 stored), I was wondering if a) the offsets of terms are stored BY DEFAULT
 in the index (even if we're not using the TermVectorComponent) and if so,
 b) is there a way to get the offset information from Solr.

 Thanks,
 Nalini



Re: Getting term offsets from Solr

2013-09-20 Thread Nalini Kartha
I'm wondering if storing just the offset as a payload would be cheaper, from a
storage perspective, than enabling termOffsets, termVectors and
termPositions? Maybe we could then get the offset info returned with the
results from there?

Thanks,
Nalini


On Fri, Sep 20, 2013 at 5:02 PM, Nalini Kartha nalinikar...@gmail.comwrote:

 Thanks for the reply.

 We tried enabling these options but that's also causing too much index
 bloat so I was wondering if there's a way to get at the offset information
 more cheaply?

 Thanks,
 Nalini


 On Fri, Sep 20, 2013 at 4:41 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

 Set:

 termVectors=true
 termPositions=true
 termOffsets=true

 And use the fast vector highlighter.

 -- Jack Krupansky

 -Original Message- From: Nalini Kartha Sent: Friday, September
 20, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: Getting term
 offsets from Solr
  Hi,

 We're looking at implementing highlighting for some fields which may be
 too
 large to store in the index.

 As an alternative to using the Solr Highlighter (which needs fields to be
 stored), I was wondering if a) the offsets of terms are stored BY DEFAULT
 in the index (even if we're not using the TermVectorComponent) and if so,
 b) is there a way to get the offset information from Solr.

 Thanks,
 Nalini