How to migrate content of a collection to a new collection

2014-07-22 Thread Per Steffensen

Hi

We have numerous collections each with numerous shards spread across 
numerous machines. We just discovered that all documents have a field 
with a wrong value, and besides that we would like to add a new field to 
all documents:
* The field with the wrong value is a long, DocValued, Indexed and 
Stored. Some (about half) of the documents need to have a constant added 
to their current value
* The field we want to add will be an int, DocValued, Indexed and 
Stored. It needs to be added to all documents, but will have different 
values among the documents


How to achieve our goal in the easiest possible way?

We thought about spooling/streaming from the existing collection into a 
"twin"-collection, then deleting the existing collection and finally 
renaming the "twin"-collection to have the same name as the original 
collection - basically indexing all documents again. If that is the 
easiest way, how do we query in a way so that we get all documents 
streamed? We cannot just do a *:* query that returns everything into 
memory and index from there, because we have billions of documents 
(not enough memory). Please note that we are on 4.4, which does not 
contain the new cursorMark feature. Please also note that speed is an 
important factor for us.
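
To make the idea concrete, here is a minimal SolrJ 4.x sketch of the kind of
streaming we have in mind - walking the indexed long field in range-filtered
batches instead of deep start=N paging. Everything here (URLs, "value_l" as
the long field, the step size, and the needsOffset/computeNewId/
computeNewIntField helpers) is an illustrative assumption, not tested code:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer source = new HttpSolrServer("http://host:8983/solr/collection_old");
HttpSolrServer target = new HttpSolrServer("http://host:8983/solr/collection_twin");

final long STEP = 100000L; // sized so one range of documents fits in memory
for (long lower = MIN_VALUE; lower <= MAX_VALUE; lower += STEP) { // known bounds of value_l
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("value_l:[" + lower + " TO " + (lower + STEP - 1) + "]");
    q.setRows(Integer.MAX_VALUE); // safe only because each range is kept small
    for (SolrDocument in : source.query(q).getResults()) {
        SolrInputDocument out = new SolrInputDocument();
        long v = (Long) in.getFieldValue("value_l");
        long fixed = needsOffset(in) ? v + CONSTANT : v;   // fix the wrong values
        out.addField("value_l", fixed);
        out.addField("id", computeNewId(in, fixed));       // id depends on the field, so recompute
        out.addField("new_int_i", computeNewIntField(in)); // the new int field
        // ... copy the remaining stored fields ...
        target.add(out);
    }
}
target.commit();

Whether something like this is also the fastest option is exactly what we are
unsure about.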


Guess this could also be achieved by doing 1-1 migration on shard-level 
instead of collection-level, keeping everything in the new collections 
on the same machine as where it lived in the old collections. That 
could probably complete faster than the 1-1 on collection-level 
approach. But this 1-1 on shard-level approach is not very good for us, 
because the long field we need to change is also part of the id 
(controlling the routing to a particular shard), and therefore we 
actually also need to change the id on all documents. So if we do the 
1-1 on shard-level approach, we will end up having documents in shards 
that they actually do not belong to (they would not have been routed 
there by the routing system in Solr). We might be able to live with this 
disadvantage if 1-1 on shard-level can be achieved much faster than the 
1-1 on collection-level.


Any input is very much appreciated! Thanks

Regards, Per Steffensen


Re: Replication Problem from solr-3.6 to solr-4.0

2014-07-22 Thread askumar1444
Same here too, in a multi-core Master/Slave setup.

11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Master's
generation: 87
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Slave's
generation: 3
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Starting
replication process
11:17:30.713 [snapPuller-8-thread-1] ERROR o.a.s.h.SnapPuller - No files to
download for index generation: 87

Any solution/fix for it?





Re: NoClassDefFoundError while indexing in Solr

2014-07-22 Thread Shalin Shekhar Mangar
Solr is trying to load "com/uwyn/jhighlight/renderer/XhtmlRendererFactory",
but that is not a class which is shipped or used by Solr. I think you have
some custom plugins (a highlighter, perhaps?) which use that class, and the
classpath is not set up correctly.
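
If that is the case, a hedged sketch of one way to wire the jar in - the
directory, and the assumption that a jhighlight jar is what is missing, are
purely illustrative - is a <lib> directive in solrconfig.xml:

<!-- solrconfig.xml: make the directory holding the missing jar visible
     to Solr's class loader. Path and regex are assumptions. -->
<lib dir="/path/to/extra/libs" regex="jhighlight-.*\.jar" />

Alternatively, dropping the jar into the core's lib/ directory should have
the same effect.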


On Wed, Jul 23, 2014 at 2:20 AM, Ameya Aware  wrote:

> Hi
>
> I am running into the below error while indexing a file in Solr.
>
> Can you please help to fix this?
>
> ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.NoClassDefFoundError:
> com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> ...
> Caused by: java.lang.NoClassDefFoundError:
> com/uwyn/jhighlight/renderer/XhtmlRendererFactory
> at org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
> at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
> ...
> Caused by: java.lang.ClassNotFoundException:
> com.uwyn.jhighlight.renderer.XhtmlRendererFactory
> ...
>
> WARN  - 2014-07-22 16:40:32.193; org.eclipse.jetty.servlet.ServletHandler;
> Error for /solr/collection1/update/extract
> java.lang.NoClassDefFoundError:
> com/uwyn/jhighlight/renderer/

Re: Query using doc Id

2014-07-22 Thread Alexandre Rafalovitch
Do you mean something different from docId:[100 TO 200] ?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Wed, Jul 23, 2014 at 11:49 AM, Mukundaraman Valakumaresan
 wrote:
> Hi,
>
> Is it possible to execute queries using doc Id as a query parameter
>
> For eg, query docs whose doc Id is between 100 and 200
>
> Thanks & Regards
> Mukund


Re: Query using doc Id

2014-07-22 Thread santosh sidnal
I guess you can use these two params in your query:

rows=100&start=100

which will give you 100 documents after the 100th document.
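
For example (the core name is an assumption, and a deterministic sort keeps
the pages stable across requests):

http://localhost:8983/solr/collection1/select?q=*:*&sort=id+asc&start=100&rows=100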


On Wed, Jul 23, 2014 at 10:19 AM, Mukundaraman Valakumaresan <
muk...@8kmiles.com> wrote:

> Hi,
>
> Is it possible to execute queries using doc Id as a query parameter
>
> For eg, query docs whose doc Id is between 100 and 200
>
> Thanks & Regards
> Mukund
>



-- 
Regards,
Santosh Sidnal


Query using doc Id

2014-07-22 Thread Mukundaraman Valakumaresan
Hi,

Is it possible to execute queries using doc Id as a query parameter

For eg, query docs whose doc Id is between 100 and 200

Thanks & Regards
Mukund


Re: SOLR 4.4 - Slave always replicates full index

2014-07-22 Thread Shawn Heisey
On 7/22/2014 5:00 PM, Robin Woods wrote:
> I think I found the issue!
>
> I actually forgot to mention a very important step that I did, which is
> CORE SWAP; otherwise, it's not replicating the full index.
>
> When we do CORE SWAP, doesn't it do the same checks of copying only deltas?

Yes, it will look for differences and only copy what's changed ... but
when you swap cores, you're pretty much guaranteed that the entire index
is different on the master compared to the slave, so it will have to
copy the entire thing.  Even if you build the index in exactly the same
way in two cores at exactly the same time on the same machine, the end
result will have minor differences, such as the timestamp on each file.

Thanks,
Shawn



Re: SOLR 4.4 - Slave always replicates full index

2014-07-22 Thread Robin Woods
I think I found the issue!

I actually forgot to mention a very important step that I did, which is
CORE SWAP; otherwise, it's not replicating the full index.

When we do CORE SWAP, doesn't it do the same checks of copying only deltas?







Re: How to get Lacuma to match Lucuma

2014-07-22 Thread Jack Krupansky
Or possibly use the synonym filter at query or index time for common 
misspellings or misunderstandings about the spelling. That would be 
automatic, without the user needing to add the explicit fuzzy query 
operator.
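
A hedged sketch of what that could look like, using the stock example
schema's query-time synonym filter (which misspellings matter is of course
an assumption):

# synonyms.txt - map common misspellings onto the indexed term
lacuma => lucuma

<!-- schema.xml, query analyzer of the field type -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>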


-- Jack Krupansky

-Original Message- 
From: Anshum Gupta

Sent: Tuesday, July 22, 2014 4:54 PM
To: solr-user@lucene.apache.org
Subject: Re: How to get Lacuma to match Lucuma

Hi Warren,

Check out the section about fuzzy search here
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.


On Tue, Jul 22, 2014 at 1:29 PM, Warren Bell 
wrote:


What field type or filters do I use to get something like the word
“Lacuma” to return results with “Lucuma” in it? The word “Lucuma” has been
indexed in a field with field type text_en_splitting that came with the
original Solr examples.

Thanks,

Warren






--

Anshum Gupta
http://www.anshumgupta.net 



Re: How to get Lacuma to match Lucuma

2014-07-22 Thread Anshum Gupta
Hi Warren,

Check out the section about fuzzy search here
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.
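
For this particular case, a hedged example of what the fuzzy operator looks
like (the field name is an assumption):

name_txt:lacuma~1

The ~1 asks for matches within an edit distance of 1, which covers
lacuma -> lucuma.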


On Tue, Jul 22, 2014 at 1:29 PM, Warren Bell 
wrote:

> What field type or filters do I use to get something like the word
> “Lacuma” to return results with “Lucuma” in it? The word “Lucuma” has been
> indexed in a field with field type text_en_splitting that came with the
> original Solr examples.
>
> Thanks,
>
> Warren
>
> <fieldType name="text_en_splitting" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>



-- 

Anshum Gupta
http://www.anshumgupta.net


NoClassDefFoundError while indexing in Solr

2014-07-22 Thread Ameya Aware
Hi

I am running into the below error while indexing a file in Solr.

Can you please help to fix this?

ERROR - 2014-07-22 16:40:32.126; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory
    at org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
    at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    ... 26 more
Caused by: java.lang.ClassNotFoundException: com.uwyn.jhighlight.renderer.XhtmlRendererFactory
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 38 more

WARN  - 2014-07-22 16:40:32.193; org.eclipse.jetty.servlet.ServletHandler;
Error for /solr/collection1/update/extract
java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory
    at org.apache.tika.parser.code.SourceCodeParser.getRenderer(SourceCodeParser.java:121)
    at org.apache.tika.parser.code.SourceCodeParser.parse(SourceCodeParser.java:102)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase

How to get Lacuma to match Lucuma

2014-07-22 Thread Warren Bell
What field type or filters do I use to get something like the word “Lacuma” to 
return results with “Lucuma” in it? The word “Lucuma” has been indexed in a 
field with field type text_en_splitting that came with the original Solr 
examples.

Thanks,

Warren


<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>




Re: Java heap Space error

2014-07-22 Thread Rafał Kuć
Hello!

Yes, just edit your Jetty configuration file and add the -Xmx and -Xms
parameters. For example, the file you may be looking for is
/etc/default/jetty.
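
A hedged example of what to add there (the values are illustrative, and
JAVA_OPTIONS follows the Debian-style Jetty init script; your setup may use
a different variable):

# /etc/default/jetty - give the JVM a fixed, larger heap
JAVA_OPTIONS="-Xms512m -Xmx2g $JAVA_OPTIONS"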

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


> So can I get around this exception by increasing the heap size somewhere?

> Thanks,
> Ameya


> On Tue, Jul 22, 2014 at 2:00 PM, Shawn Heisey  wrote:

>> On 7/22/2014 11:37 AM, Ameya Aware wrote:
>> > I am running into a Java heap space issue. Please see the log below.
>>
>> All we have here is an out of memory exception.  It is impossible to
>> know *why* you are out of memory from the exception.  With enough
>> investigation, we could determine the area of code where the error
>> occurred, but that doesn't say anything at all about what allocated all
>> the memory, it's simply the final allocation that pushed it over the edge.
>>
>> Here is some generic information about Solr and high heap usage:
>>
>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>
>> Thanks,
>> Shawn
>>
>>



Re: Java heap Space error

2014-07-22 Thread Ameya Aware
So can I get around this exception by increasing the heap size somewhere?

Thanks,
Ameya


On Tue, Jul 22, 2014 at 2:00 PM, Shawn Heisey  wrote:

> On 7/22/2014 11:37 AM, Ameya Aware wrote:
> > I am running into a Java heap space issue. Please see the log below.
>
> All we have here is an out of memory exception.  It is impossible to
> know *why* you are out of memory from the exception.  With enough
> investigation, we could determine the area of code where the error
> occurred, but that doesn't say anything at all about what allocated all
> the memory, it's simply the final allocation that pushed it over the edge.
>
> Here is some generic information about Solr and high heap usage:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>


Re: Java heap Space error

2014-07-22 Thread Shawn Heisey
On 7/22/2014 11:37 AM, Ameya Aware wrote:
> I am running into a Java heap space issue. Please see the log below.

All we have here is an out of memory exception.  It is impossible to
know *why* you are out of memory from the exception.  With enough
investigation, we could determine the area of code where the error
occurred, but that doesn't say anything at all about what allocated all
the memory, it's simply the final allocation that pushed it over the edge.

Here is some generic information about Solr and high heap usage:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Java heap Space error

2014-07-22 Thread Ameya Aware
Hi

I am running into a Java heap space issue. Please see the log below.

ERROR - 2014-07-22 11:38:59.370; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.common.util.JavaBinCodec.writeStr(JavaBinCodec.java:567)
    at org.apache.solr.common.util.JavaBinCodec.writePrimitive(JavaBinCodec.java:646)
    at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:240)
    at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:153)
    at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:409)
    at org.apache.solr.update.TransactionLog.write(TransactionLog.java:353)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:397)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:382)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:255)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandl

RE: Multiterm analysis in complexphrase query

2014-07-22 Thread Allison, Timothy B.
Hi Gopal,

I just started a repository on github 
(https://github.com/tballison/tallison-lucene-addons) to host a standalone 
version of LUCENE-5205 (with other patches to come).  SOLR-5410 is next (Solr 
wrapper of the SpanQueryParser), and then I'll try to add LUCENE-5317 
(concordance) and LUCENE-5318 (co-occurrence) over the next week or so.

The code in this repository is "standalone" (not a fork of lucene-solr) and is 
aimed at the most recent stable release of Lucene/Solr.

For "trunk" versions of this code, check out the lucene5205 branch of my 
lucene-solr fork.

Much more work remains.

-Original Message-
From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com] 
Sent: Monday, July 21, 2014 5:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Multiterm analysis in complexphrase query

That would be really useful.

Can you upload the jar and its requirements?

It also makes it pluggable with different versions of Solr.
 On Jul 1, 2014 9:01 PM, "Allison, Timothy B."  wrote:

> If there's enough interest, I might get back into the code and throw a
> standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto
> github.  That would make it more widely available until there's a chance to
> integrate it into Lucene/Solr.  If you'd be interested in this, let me know
> (and/or vote on the issue pages on Jira).
>
> Best,
>
>Tim
>
> -Original Message-
> From: Michael Ryan [mailto:mr...@moreover.com]
> Sent: Tuesday, July 01, 2014 9:24 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Multiterm analysis in complexphrase query
>
> Thanks. This looks interesting...
>
> -Michael
>
> -Original Message-
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Monday, June 30, 2014 8:15 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Multiterm analysis in complexphrase query
>
> Ahmet, please correct me if I'm wrong, but the ComplexPhraseQueryParser
> does not perform analysis (as you, Michael, point out).  The
> SpanQueryParser in LUCENE-5205 does perform analysis and might meet your
> needs.  Work on it has gone on pause, though, so you'll have to build from
> the patch or the LUCENE-5205 branch.  Let me know if you have any questions.
>
> LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and
> make it available to all parsers that use QueryParserBase, including the
> ComplexPhraseQueryParser.
>
> Best,
>
> Tim
>
> -Original Message-
> From: Michael Ryan [mailto:mr...@moreover.com]
> Sent: Sunday, June 29, 2014 11:09 AM
> To: solr-user@lucene.apache.org
> Subject: Multiterm analysis in complexphrase query
>
> I've been using a modified version of the complex phrase query parser
> patch from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6,
> and I'm currently upgrading to 4.9, which has this built-in.
>
> I'm having trouble with using accents in wildcard queries, support for
> which was added in https://issues.apache.org/jira/browse/SOLR-2438. In
> 3.6, I was using a modified version of SolrQueryParser, which simply used
> ComplexPhraseQueryParser in place of QueryParser. In the version of
> ComplexPhraseQParserPlugin in 4.9, it just directly uses
> ComplexPhraseQueryParser, and doesn't go through SolrQueryParser at all.
> SolrQueryParserBase.analyzeIfMultitermTermText() is where the multiterm
> analysis magic happens.
>
> So, my problem is that ComplexPhraseQParserPlugin/ComplexPhraseQueryParser
> doesn't use SolrQueryParserBase, which breaks doing fun things like this:
> {!complexPhrase}"barac* óba*a"
> And expecting it to match "Barack Obama".
>
> Anyone run into this before, or have a way to get this working?
>
> -Michael
>


Re: Mixing ordinary and nested documents

2014-07-22 Thread Umesh Prasad
public static DocSet mapChildDocsToParentOnly(DocSet childDocSet) {
    DocSet mappedParentDocSet = new BitDocSet();
    DocIterator childIterator = childDocSet.iterator();
    while (childIterator.hasNext()) {
        int childDoc = childIterator.nextDoc();
        // childToParentDocMapping is the int[] built in the snippet below
        int parentDoc = childToParentDocMapping[childDoc];
        mappedParentDocSet.addUnique(parentDoc);
    }
    int[] matches = new int[mappedParentDocSet.size()];
    DocIterator parentIter = mappedParentDocSet.iterator();
    for (int i = 0; parentIter.hasNext(); i++) {
        matches[i] = parentIter.nextDoc();
    }
    // You will need the SortedIntDocSet impl, else DocSet interaction in some
    // facet queries fails later.
    return new SortedIntDocSet(matches);
}
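
A hedged usage sketch tying the two snippets together (childQuery is whatever
matched the chapter-level documents; searcher is the SolrIndexSearcher the
mapping was built against):

DocSet childHits = searcher.getDocSet(childQuery);
DocSet parentHits = mapChildDocsToParentOnly(childHits);
// parentHits can now be used to fetch or facet the book-level documents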



On 22 July 2014 19:59, Umesh Prasad  wrote:

> Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));
>
> int[] childToParentDocMapping = new int[searcher.maxDoc()];
> DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
> DocIterator iter = allParentDocSet.iterator();
> int child = 0;
> while (iter.hasNext()) {
>     int parent = iter.nextDoc();
>     while (child <= parent) {
>         childToParentDocMapping[child] = parent;
>         child++;
>     }
> }
>
>
> On 22 July 2014 16:28, Bjørn Axelsen 
> wrote:
>
>> Thanks, Umesh
>>
>> > You can get the parent bitset by running the parent doc type query on
>> > the Solr IndexSearcher.
>> > Then the child bitset by running the child doc type query. Then use these
>> > together to create an int[] where int[i] = parent of i.
>> >
>>
>> Can you kindly add an example? I am not quite sure how to put this into a
>> query?
>>
>> I can easily make the join from child to parent, but what I want to
>> achieve
>> is to get the parent document added to the result if it exists but
>> maintain
>> the scoring from the child as well as the full child document. Is this
>> possible?
>>
>> Cheers,
>> Bjørn
>>
>> 2014-07-18 19:00 GMT+02:00 Umesh Prasad :
>>
>> > Comments inline
>> >
>> >
>> > On 16 July 2014 20:31, Bjørn Axelsen > >
>> > wrote:
>> >
>> > > Hi Solr users
>> > >
>> > > I would appreciate your inputs on how to handle a *mix *of *simple
>> *and
>> > > *nested
>> > > *documents in the most easy and flexible way.
>> > >
>> > > I need to handle:
>> > >
>> > >- simple documents: webpages, short articles etc. (approx. 90% of
>> the
>> > >content)
>> > >- nested documents: books containing chapters etc. (approx 10% of
>> the
>> > >content)
>> > >
>> > >
>> >
>> >
>> > > For simple documents I just want to present straightforward search
>> > results
>> > > without any grouping etc.
>> > >
>> > > For the nested documents I want to group by book and show book title,
>> > book
>> > > price etc. AND the individual results within the book. Lets say there
>> is
>> > a
>> > > hit on "Chapters 1" and "Chapter 7" within "Book 1" and a hit on
>> "Article
>> > > 1", I would like to present this:
>> > >
>> > > *Book 1 title*
>> > > Book 1 published date
>> > > Book 1 description
>> > > - *Chapter 1 title*
>> > >   Chapter 1 snippet
>> > > - *Chapter 7 title*
>> > >   CHapter 7 snippet
>> > >
>> > > *Article 1 title*
>> > > Article 1 published date
>> > > Article 1 description
>> > > Article 1 snippet
>> > >
>> > > It looks like it is pretty straightforward to use the
>> CollapsingQParser
>> > to
>> > > collapse the book results into one result and not to collapse the
>> other
>> > > results. But how about showing the information about the book (the
>> parent
>> > > document of the chapters)?
>> > >
>> >
>> > You can map the child document to parent  doc id space and extract the
>> > information from parent doc id.
>> >
>> > First you need to generate child doc to parent doc id mapping one time.
>> >   You can get the parent bitset by running the parent doc type query on
>> > the Solr IndexSearcher.
>> > Then the child bitset by running the child doc type query. Then use these
>> > together to create an int[] where int[i] = parent of i. This result is
>> > cachable till next commit. I am doing that for computing facets from
>> fields
>> > in parent docs and sorting on values from parent docs (while getting
>> child
>> > docs as output).
>> >
>> >
>> >
>> >
>> > > 1) Is there a way to do an* optional block join* to a *parent
>> *document
>> > and
>> > > return it together *with *the *child *document - but not to require a
>> > > parent document?
>> > >
>> > > - or -
>> > >
>> > > 2) Do I need to require parent-child documents for everything? This is
>> > > really not my preferred strategy as only a small part of the
>> documents is
>> > > in a real parent-child relationship. This would mean a lot of dummy
>> child
>> > > documents.
>> > >
>> > >
>> >
>> > >
>> > > - or -
>> > >
>> > > 3) Should I just denormalize data 

Re: Mixing ordinary and nested documents

2014-07-22 Thread Umesh Prasad
Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));

int[] childToParentDocMapping = new int[searcher.maxDoc()];
DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
DocIterator iter = allParentDocSet.iterator();
int child = 0;
while (iter.hasNext()) {
    int parent = iter.nextDoc();
    while (child <= parent) {
        childToParentDocMapping[child] = parent;
        child++;
    }
}


On 22 July 2014 16:28, Bjørn Axelsen 
wrote:

> Thanks, Umesh
>
> > You can get the parent bitset by running the parent doc type query on
> > the Solr IndexSearcher.
> > Then the child bitset by running the child doc type query. Then use these
> > together to create an int[] where int[i] = parent of i.
> >
>
> Can you kindly add an example? I am not quite sure how to put this into a
> query?
>
> I can easily make the join from child to parent, but what I want to achieve
> is to get the parent document added to the result if it exists but maintain
> the scoring from the child as well as the full child document. Is this
> possible?
>
> Cheers,
> Bjørn
>
> 2014-07-18 19:00 GMT+02:00 Umesh Prasad :
>
> > Comments inline
> >
> >
> > On 16 July 2014 20:31, Bjørn Axelsen 
> > wrote:
> >
> > > Hi Solr users
> > >
> > > I would appreciate your inputs on how to handle a *mix *of *simple *and
> > > *nested
> > > *documents in the most easy and flexible way.
> > >
> > > I need to handle:
> > >
> > >- simple documents: webpages, short articles etc. (approx. 90% of the
> > >content)
> > >- nested documents: books containing chapters etc. (approx 10% of
> the
> > >content)
> > >
> > >
> >
> >
> > > For simple documents I just want to present straightforward search
> > results
> > > without any grouping etc.
> > >
> > > For the nested documents I want to group by book and show book title,
> > book
> > > price etc. AND the individual results within the book. Lets say there
> is
> > a
> > > hit on "Chapters 1" and "Chapter 7" within "Book 1" and a hit on
> "Article
> > > 1", I would like to present this:
> > >
> > > *Book 1 title*
> > > Book 1 published date
> > > Book 1 description
> > > - *Chapter 1 title*
> > >   Chapter 1 snippet
> > > - *Chapter 7 title*
> > >   CHapter 7 snippet
> > >
> > > *Article 1 title*
> > > Article 1 published date
> > > Article 1 description
> > > Article 1 snippet
> > >
> > > It looks like it is pretty straightforward to use the CollapsingQParser
> > to
> > > collapse the book results into one result and not to collapse the other
> > > results. But how about showing the information about the book (the
> parent
> > > document of the chapters)?
> > >
> >
> > You can map the child document to parent  doc id space and extract the
> > information from parent doc id.
> >
> > First you need to generate child doc to parent doc id mapping one time.
> >   You can get the parent bitset by running the parent doc type query on
> > the Solr IndexSearcher.
> > Then the child bitset by running the child doc type query. Then use these
> > together to create an int[] where int[i] = parent of i. This result is
> > cachable till next commit. I am doing that for computing facets from
> fields
> > in parent docs and sorting on values from parent docs (while getting
> child
> > docs as output).
> >
> >
> >
> >
> > > 1) Is there a way to do an* optional block join* to a *parent *document
> > and
> > > return it together *with *the *child *document - but not to require a
> > > parent document?
> > >
> > > - or -
> > >
> > > 2) Do I need to require parent-child documents for everything? This is
> > > really not my preferred strategy as only a small part of the documents
> is
> > > in a real parent-child relationship. This would mean a lot of dummy
> child
> > > documents.
> > >
> > >
> >
> > >
> > > - or -
> > >
> > > 3) Should I just denormalize data and include the book information
> within
> > > each chapter document?
> > >
> > > - or -
> > >
> > > 4) ... or is there a smarter way?
> > >
> > > Your help is very much appreciated.
> > >
> > > Cheers,
> > >
> > > Bjørn Axelsen
> > >
> >
> >
> >
> > --
> > ---
> > Thanks & Regards
> > Umesh Prasad
> >
>



-- 
---
Thanks & Regards
Umesh Prasad


Re: Edit Example Post.jar to read ALL file types

2014-07-22 Thread jrusnak
I am copy-pasting the file extensions from the text document into the
source code, not from the source code. My typing mistake.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312p4148567.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Edit Example Post.jar to read ALL file types

2014-07-22 Thread jrusnak
So by using the SimplePostTool I can define the application type and handling
of specific documents (such as Word, PowerPoint, XML, PNG, etcetera). I have
defined these and they are handled based on their type. In my file system,
however, I have a large number of files that can be read as plain text but
do not have the .txt extension due to the manner in which they were saved. I
would like them to be read in as text/plain.

Since posting I have found a workaround - I am using a batch file to read
all the directory's file extensions into a text document and copy/pasting
the extensions from the SimplePostTool Source code. Though not ideal, it
does get the job done.
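
For anyone attempting the same edit, a hedged sketch of the kind of entries
involved - this assumes the extension-to-content-type map in the 4.x
SimplePostTool source is the one named mimeMap; verify against your version:

// SimplePostTool.java (Solr 4.x): extra extensions mapped to plain text
mimeMap.put("log", "text/plain");
mimeMap.put("cfg", "text/plain");
mimeMap.put("data", "text/plain");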

My thanks for the blog, I will look into it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312p4148566.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DocValues without re-index?

2014-07-22 Thread Shawn Heisey
On 7/22/2014 6:14 AM, Michael Ryan wrote:
> I mean re-adding all of the documents in my index. The DocValues wiki page 
> says that this is necessary, but I wanted to know if there was a way around 
> it.

If your index meets the strict criteria for Atomic Updates, you could
"update" all the documents by setting one field to the value it's
already got.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
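
A hedged illustration of that trick (core, id, and field names are
assumptions; atomic updates also require the update log to be enabled):

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc-1","title_s":{"set":"the value it already has"}}]'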

If the index does not meet the requirements for Atomic Updates, then
you'll need to completely reindex after adding docValues to a field.
Features that use docValues (like sorting and facets) will not work on
that field until you reindex.  As I understand it, those features cannot
fall back to indexed values.

It sounds like you already know about what this page says:

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: Solr Cassandra MySQL Best Practice Indexing

2014-07-22 Thread Yavar Husain
Exactly. Thanks a lot Jack. +1 for "Your best bet is to get that RDBMS data
moved to Cassandra or DSE ASAP."


On Tue, Jul 22, 2014 at 5:15 PM, Jack Krupansky 
wrote:

> I don't think the Solr Data Import Handler has a Cassandra plugin (entity
> processor) yet, so the most straight forward approach is to write a Java
> app that reads from Cassandra, then reads the corresponding RDBMS data,
> combines the data, and then uses SolrJ to add documents to Solr.
>
> Your best bet is to get that RDBMS data moved to Cassandra or DSE ASAP.
> All you have until then is a stopgap measure rather than a robust
> architecture.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Yavar Husain
> Sent: Tuesday, July 22, 2014 2:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cassandra MySQL Best Practice Indexing
>
>
> Thanks Jack for your guidance on DSE. However it would be great if somebody
> could help me solving my use case:
>
> So my full text data lies on Cassandra along with an ID. Now I have a lot
> of structured data linked to the ID which lies on an RDBMS (read MySQL). I
> need this structured data as it would help me with my faceting and other
> needs. What is the best practice in going about indexing in this scenario.
>
> I will think about incremental indexing for the new records later.
>
> Bit confused. Any help would be appreciated.
>
>
> On Mon, Jul 21, 2014 at 6:51 PM, Jack Krupansky 
> wrote:
>
>  Solandra is not a supported product. DataStax Enterprise (DSE) supersedes
>> it. With DSE, just load your data into a Solr-enabled Cassandra data
>> center
>> and it will be indexed automatically in the embedded Solr within DSE, as
>> per a Solr schema that you provide. Then use any of the nodes in that
>> Solr-enabled Cassandra data center just the same as with normal Solr.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Yavar Husain
>> Sent: Monday, July 21, 2014 8:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr Cassandra MySQL Best Practice Indexing
>>
>>
>> So my full text data lies on Cassandra along with an ID. Now I have a lot
>> of structured data linked to the ID which lies on an RDBMS (read MySQL). I
>> need this structured data as it would help me with my faceting and other
>> needs. What is the best practice in going about indexing in this scenario.
>> My thoughts (maybe weird):
>>
>> 1. Read the data from Cassandra, for each ID read, read the corresponding
>> row from MySQL for that ID, form an XML on the fly (for each ID) and send
>> it to Solr for Indexing without storing anything.
>> 2. I do not have much idea on Solandra. However even if I use it I will
>> have to go to MySQL for fetching the structured data.
>> 3. Duplicate the data and either get all of Cassandra to MySQL or vice
>> versa but then data duplication would happen.
>>
>> I will think about incremental indexing for the new records later.
>>
>> Bit confused. Any help would be appreciated.
>>
>>
>


RE: DocValues without re-index?

2014-07-22 Thread Michael Ryan
I mean re-adding all of the documents in my index. The DocValues wiki page says 
that this is necessary, but I wanted to know if there was a way around it.

-Michael

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Tuesday, July 22, 2014 2:14 AM
To: solr-user
Subject: Re: DocValues without re-index?

Michael,

What's "first re-indexing"?
I'm sure you are aware about binary/number DocValues updates, but it works for 
existing column strides. I can guess you are talking about something like 
sidecar index http://www.youtube.com/watch?v=9h3ax5Wmxpk



On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan  wrote:

> Is it possible to use DocValues on an existing index without first 
> re-indexing?
>
> -Michael
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr Cassandra MySQL Best Practice Indexing

2014-07-22 Thread Jack Krupansky
I don't think the Solr Data Import Handler has a Cassandra plugin (entity 
processor) yet, so the most straight forward approach is to write a Java app 
that reads from Cassandra, then reads the corresponding RDBMS data, combines 
the data, and then uses SolrJ to add documents to Solr.
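
A hedged sketch of that stopgap (SolrJ 4.x plus the DataStax Java driver;
every host, keyspace, table, and field name below is an illustrative
assumption, not a recipe):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CassandraMysqlIndexer {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
        Session cass = cluster.connect("my_keyspace");
        Connection mysql = DriverManager.getConnection(
                "jdbc:mysql://mysql-host/mydb", "user", "pass");
        HttpSolrServer solr = new HttpSolrServer("http://solr-host:8983/solr/collection1");

        PreparedStatement meta = mysql.prepareStatement(
                "SELECT category, price FROM doc_meta WHERE doc_id = ?");

        // For each Cassandra row, look up the RDBMS metadata for the same ID,
        // combine both into one SolrInputDocument, and send it to Solr.
        for (Row row : cass.execute("SELECT doc_id, body FROM docs")) {
            String id = row.getString("doc_id");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("body_t", row.getString("body"));
            meta.setString(1, id);
            ResultSet rs = meta.executeQuery();
            if (rs.next()) {
                doc.addField("category_s", rs.getString("category"));
                doc.addField("price_f", rs.getFloat("price"));
            }
            rs.close();
            solr.add(doc); // batching via add(Collection) is more efficient; omitted for brevity
        }
        solr.commit();
        mysql.close();
        cluster.close();
    }
}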


Your best bet is to get that RDBMS data moved to Cassandra or DSE ASAP. All 
you have until then is a stopgap measure rather than a robust architecture.


-- Jack Krupansky

-Original Message- 
From: Yavar Husain

Sent: Tuesday, July 22, 2014 2:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cassandra MySQL Best Practice Indexing

Thanks Jack for your guidance on DSE. However it would be great if somebody
could help me solving my use case:

So my full text data lies on Cassandra along with an ID. Now I have a lot
of structured data linked to the ID which lies on an RDBMS (read MySQL). I
need this structured data as it would help me with my faceting and other
needs. What is the best practice in going about indexing in this scenario.

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.


On Mon, Jul 21, 2014 at 6:51 PM, Jack Krupansky 
wrote:


Solandra is not a supported product. DataStax Enterprise (DSE) supersedes
it. With DSE, just load your data into a Solr-enabled Cassandra data center
and it will be indexed automatically in the embedded Solr within DSE, as
per a Solr schema that you provide. Then use any of the nodes in that
Solr-enabled Cassandra data center just the same as with normal Solr.

-- Jack Krupansky

-Original Message- From: Yavar Husain
Sent: Monday, July 21, 2014 8:37 AM
To: solr-user@lucene.apache.org
Subject: Solr Cassandra MySQL Best Practice Indexing


So my full text data lies on Cassandra along with an ID. Now I have a lot
of structured data linked to the ID which lies on an RDBMS (read MySQL). I
need this structured data as it would help me with my faceting and other
needs. What is the best practice in going about indexing in this scenario.
My thoughts (maybe weird):

1. Read the data from Cassandra, for each ID read, read the corresponding
row from MySQL for that ID, form an XML on the fly (for each ID) and send
it to Solr for Indexing without storing anything.
2. I do not have much idea on Solandra. However even if I use it I will
have to go to MySQL for fetching the structured data.
3. Duplicate the data and either get all of Cassandra to MySQL or vice
versa but then data duplication would happen.

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.





Re: wrong docFreq while executing query based on uniqueKey-field

2014-07-22 Thread Jack Krupansky
Deleted documents remain in the Lucene index until an "optimize" or segment 
merge operation removes them. As a result they are still counted in document 
frequency. An update is a combination of a delete and an add of a fresh 
document.
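
A hedged illustration (core name assumed): forcing a merge expunges the
deleted documents, after which the uniqueKey docFreq drops back to 1:

curl 'http://localhost:8983/solr/collection1/update?optimize=true'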


-- Jack Krupansky

-Original Message- 
From: Johannes Siegert

Sent: Tuesday, July 22, 2014 7:26 AM
To: solr-user@lucene.apache.org
Subject: wrong docFreq while executing query based on uniqueKey-field

Hi.

My solr-index (version=4.7.2.) has an id-field:


<field name="id" ... />
...
<uniqueKey>id</uniqueKey>

The index will be updated once per hour.

I use the following query to retrieve some documents:

"q=id:2^2 id:1^1"

I would expect that the document(2) should always be before the
document(1). But after many index updates document(1) is before document(2).

With debug=true I could see the problem. The document(1) has a
docFreq=2, while the document(2) has a docFreq=1.

How could the docFreq of the uniqueKey-field be higher than 1? Could
anyone explain this behavior to me?

Thanks!

Johannes



Re: wrong docFreq while executing query based on uniqueKey-field

2014-07-22 Thread Apoorva Gaurav
I faced the same issue some time back; the root cause is docs getting deleted
and created again without the index getting optimized. Here is the discussion:
http://www.signaldump.org/solr/qpod/22731/docfreq-coming-to-be-more-than-1-for-unique-id-field


On Tue, Jul 22, 2014 at 4:56 PM, Johannes Siegert <
johannes.sieg...@marktjagd.de> wrote:

> Hi.
>
> My solr-index (version=4.7.2.) has an id-field:
>
> <field name="id" ... />
> ...
> <uniqueKey>id</uniqueKey>
>
> The index will be updated once per hour.
>
> I use the following query to retrieve some documents:
>
> "q=id:2^2 id:1^1"
>
> I would expect that the document(2) should always be before the
> document(1). But after many index updates document(1) is before document(2).
>
> With debug=true I could see the problem. The document(1) has a docFreq=2,
> while the document(2) has a docFreq=1.
>
> How could the docFreq of the uniqueKey-field be higher than 1? Could anyone
> explain this behavior to me?
>
> Thanks!
>
> Johannes
>
>


-- 
Thanks & Regards,
Apoorva


wrong docFreq while executing query based on uniqueKey-field

2014-07-22 Thread Johannes Siegert

Hi.

My solr-index (version=4.7.2.) has an id-field:


<field name="id" ... />
...
<uniqueKey>id</uniqueKey>

The index will be updated once per hour.

I use the following query to retrieve some documents:

"q=id:2^2 id:1^1"

I would expect that the document(2) should always be before the 
document(1). But after many index updates document(1) is before document(2).


With debug=true I could see the problem. The document(1) has a 
docFreq=2, while the document(2) has a docFreq=1.


How could the docFreq of the uniqueKey-field be higher than 1? Could 
anyone explain this behavior to me?


Thanks!

Johannes



Re: Mixing ordinary and nested documents

2014-07-22 Thread Bjørn Axelsen
Thanks, Umesh

> You can get the parent bitset by running the parent doc type query on
> the Solr IndexSearcher.
> Then the child bitset by running the child doc type query. Then use these
> together to create an int[] where int[i] = parent of i.
>

Can you kindly add an example? I am not quite sure how to put this into a
query?

I can easily make the join from child to parent, but what I want to achieve
is to get the parent document added to the result if it exists but maintain
the scoring from the child as well as the full child document. Is this
possible?

Cheers,
Bjørn

2014-07-18 19:00 GMT+02:00 Umesh Prasad :

> Comments inline
>
>
> On 16 July 2014 20:31, Bjørn Axelsen 
> wrote:
>
> > Hi Solr users
> >
> > I would appreciate your inputs on how to handle a *mix *of *simple *and
> > *nested
> > *documents in the most easy and flexible way.
> >
> > I need to handle:
> >
> >- simple documents: webpages, short articles etc. (approx. 90% of the
> >content)
> >- nested documents: books containing chapters etc. (approx 10% of the
> >content)
> >
> >
>
>
> > For simple documents I just want to present straightforward search
> results
> > without any grouping etc.
> >
> > For the nested documents I want to group by book and show book title,
> book
> > price etc. AND the individual results within the book. Lets say there is
> a
> > hit on "Chapters 1" and "Chapter 7" within "Book 1" and a hit on "Article
> > 1", I would like to present this:
> >
> > *Book 1 title*
> > Book 1 published date
> > Book 1 description
> > - *Chapter 1 title*
> >   Chapter 1 snippet
> > - *Chapter 7 title*
> >   CHapter 7 snippet
> >
> > *Article 1 title*
> > Article 1 published date
> > Article 1 description
> > Article 1 snippet
> >
> > It looks like it is pretty straightforward to use the CollapsingQParser
> to
> > collapse the book results into one result and not to collapse the other
> > results. But how about showing the information about the book (the parent
> > document of the chapters)?
> >
>
> You can map the child document to parent  doc id space and extract the
> information from parent doc id.
>
> First you need to generate child doc to parent doc id mapping one time.
>   You can get the parent bitset by running the parent doc type query on
> the Solr IndexSearcher.
> Then the child bitset by running the child doc type query. Then use these
> together to create an int[] where int[i] = parent of i. This result is
> cachable till next commit. I am doing that for computing facets from fields
> in parent docs and sorting on values from parent docs (while getting child
> docs as output).
>
>
>
>
> > 1) Is there a way to do an* optional block join* to a *parent *document
> and
> > return it together *with *the *child *document - but not to require a
> > parent document?
> >
> > - or -
> >
> > 2) Do I need to require parent-child documents for everything? This is
> > really not my preferred strategy as only a small part of the documents is
> > in a real parent-child relationship. This would mean a lot of dummy child
> > documents.
> >
> >
>
> >
> > - or -
> >
> > 3) Should I just denormalize data and include the book information within
> > each chapter document?
> >
> > - or -
> >
> > 4) ... or is there a smarter way?
> >
> > Your help is very much appreciated.
> >
> > Cheers,
> >
> > Bjørn Axelsen
> >
>
>
>
> --
> ---
> Thanks & Regards
> Umesh Prasad
>


spatial search: find result in bbox OR first result outside bbox

2014-07-22 Thread elisabeth benoit
Hello,

I am using Solr 4.2.1. I have the following use case.

I need to find results inside the bbox OR, if there are none, the first
result outside the bbox within a 1000 km distance. I was wondering what the
best way to proceed is.

I was considering doing a geofilt search from the center of my bounding box
and post filtering results.

fq={!geofilt sfield=store}&pt=45.15,-93.85&d=1000

From a performance point of view I don't think it's a good solution though,
since solr will have to calculate every document distance, then sort.

I was wondering if there was another way to do this and avoid sending more
than one request to Solr.

Thanks,
Elisabeth