Re: Import from S3

2016-11-24 Thread vrindavda
Thanks for the quick response, Aniket.

Do I need to make any specific configuration to get data from Amazon S3
storage?





Re: Import from S3

2016-11-24 Thread Aniket Khare
You can use the Solr DataImportHandler (DIH) to index CSV data into Solr:
https://wiki.apache.org/solr/DataImportHandler
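
If DIH turns out to be awkward for S3, a minimal alternative sketch (not from this
thread) is to fetch the object yourself and post it to Solr's CSV-aware /update
handler with SolrJ. The bucket URL and collection name below are illustrative, and
it assumes the object is readable over a public or pre-signed HTTPS URL:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class S3CsvImport {
  public static void main(String[] args) throws Exception {
    // Download the CSV to a temp file (hypothetical bucket/key).
    URL csvUrl = new URL("https://my-bucket.s3.amazonaws.com/data/products.csv");
    Path tmp = Files.createTempFile("s3-import", ".csv");
    try (InputStream in = csvUrl.openStream()) {
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
    }

    // Post the file to /update as text/csv and commit.
    try (SolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
      req.addFile(tmp.toFile(), "text/csv");
      req.setParam("commit", "true");
      req.process(solr);
    }
  }
}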

On Fri, Nov 25, 2016 at 12:47 PM, vrindavda  wrote:

> Hello,
>
> I have some data in S3, say in text/CSV format. Please provide pointers on how
> I can ingest this data into Solr.
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Import-from-S3-tp4307382.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Aniket S. Khare


Import from S3

2016-11-24 Thread vrindavda
Hello,

I have some data in S3, say in text/CSV format. Please provide pointers on how
I can ingest this data into Solr.

Thank you,
Vrinda Davda





Re: SOl6.3 Alchemy Annotator Not Working

2016-11-24 Thread soumitra80
I have managed to get it working.
Now I am getting an exception for a different annotator:

org.apache.solr.common.SolrException: processing error
java.lang.NullPointerException.
id=C:\IBMASSIGNMENTS\WatsonCognitive\data\WatsonConferenceRooms.pdf, 
text="null
C:\NONIBM\uimaj-2.9.0-bin\apache-uima\examples\data\WatsonConferenceRooms.txt
Thursday, Novembe..."
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:120)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:74)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:254)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:526)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.execute(ExecuteProduceConsume.java:102)
at org.eclipse.jetty.io.ManagedSelector.run(ManagedSelector.java:137)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:785)
Caused by: org.apache.solr.uima.processor.FieldMappingException:
java.lang.NullPointerException
at
org.apache.solr.uima.processor.UIMAToSolrMapper.map(UIMAToSolrMapper.java:83)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:88)
... 41 more
Caused by: java.lang.NullPointerException
at
org.apache.uima.cas.impl.FSIndexRepositoryImpl.getAllIndexedFS(FSIndexRepositoryImpl.java:3010)
at
org.apache.uima.cas.impl.FSIndexRepositoryImpl.getAllIndexedFS(FSIndexRepositoryImpl.java:2987)
at
org.apache.solr.uima.processor.UIMAToSolrMapper.map(UIMAToSolrMapper.java:58)
... 42 more




Re: Search opening hours

2016-11-24 Thread David Smiley
I just saw this conversation now.  I didn't read every word but I have to
ask immediately: does DateRangeField address your needs?
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates  It was
introduced in 5.0.

On Wed, Nov 16, 2016 at 4:59 AM O. Klein  wrote:

> Above implementation was too slow, so wondering if Solr 6 with all its new
> features provides a better solution to tackle operating hours. Especially
> dealing with different timezones.
>
> Any thoughts?
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4306073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
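
A minimal sketch of what that could look like from SolrJ, assuming a multi-valued
"open_hours" field of type solr.DateRangeField, an existing SolrClient named
client, and illustrative collection/field values:

// Index one range per open interval (store open 09:00-17:00 UTC on this day).
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "store-1");
doc.addField("open_hours", "[2016-11-25T09:00:00Z TO 2016-11-25T17:00:00Z]");
client.add("stores", doc);
client.commit("stores");

// Ask whether any open interval contains a given instant.
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("{!field f=open_hours op=Intersects}2016-11-25T10:30:00Z");
QueryResponse rsp = client.query("stores", q);
System.out.println("open stores: " + rsp.getResults().getNumFound());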


Re: Reload schema or configs failed then drop index, can not recreate that index.

2016-11-24 Thread Jerome Yang
Thanks Erick!

On Fri, Nov 25, 2016 at 1:38 AM, Erick Erickson 
wrote:

> This is arguably a bug. I raised a JIRA, see:
>
>  https://issues.apache.org/jira/browse/SOLR-9799
>
> Managed schema is not necessary to show this problem, generically if
> you upload a bad config by whatever means, then
> RELOAD/DELETE/correct/CREATE it fails. The steps I outlined
> in the JIRA force the same replica to be created on the same Solr instance
> to insure it can be reproduced at will.
>
> In the meantime, you can keep from having to restart Solr by:
> - correcting the schema
> - pushing it to Zookeeper (managed schema API does this for you)
> - RELOAD the collection (do NOT delete it first).
>
> Since you can just RELOAD, I doubt this will be a high priority though.
>
> Thanks for reporting!
> Erick
>
>
> On Wed, Nov 23, 2016 at 6:37 PM, Jerome Yang  wrote:
> > It's solr 6.1, cloud mode.
> >
> > Please ignore the first message. Just check my second email.
> >
> > I mean if I modify an existing collection's managed-schema and the
> > modification makes reloading the collection fail.
> > Then I delete the collection, and delete the configs from zookeeper.
> > After that I upload a config with the same name as before, where the
> > managed-schema is the unmodified version.
> > Then I recreate the collection, and it throws an error, "core already
> > exists". But actually the core does not exist.
> > After restarting the whole cluster, recreating the collection succeeds.
> >
> > Regards,
> > Jerome
> >
> >
> > On Wed, Nov 23, 2016 at 3:26 PM, Erick Erickson  >
> > wrote:
> >
> >> The mail server is pretty heavy-handed at deleting attachments, none of
> >> your
> >> (presumably) screenshots came through.
> >>
> >> You also haven't told us what version of Solr you're using.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Nov 22, 2016 at 6:25 PM, Jerome Yang  wrote:
> >> > Sorry, wrong message.
> >> > To correct.
> >> >
> >> > In cloud mode.
> >> >
> >> >1. I created a collection called "test" and then modified the
> >> >managed-schema. I wrote something wrong, for example around
> >> >"id", and then reloading the collection failed.
> >> >2. Then I drop the collection "test" and delete the configs from
> >> zookeeper.
> >> >It works fine. The collection is removed both from zookeeper and
> >> hard disk.
> >> >3. Upload the right configs with the same name as before and try to
> >> create a collection named "test"; it fails and the error is "core
> >> with name '*' already exists". But actually the core does not exist.
> >> >4. Then restart the whole cluster, do the create again, and everything
> >> works fine.
> >> >
> >> >
> >> > I think when doing the delete collection, there's something still
> hold in
> >> > somewhere not deleted.
> >> > Please have a look
> >> >
> >> > Regards,
> >> > Jerome
> >> >
> >> > On Wed, Nov 23, 2016 at 10:16 AM, Jerome Yang 
> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >>
> >> >> Here's my situation:
> >> >>
> >> >> In cloud mode.
> >> >>
> >> >>1. I created a collection called "test" and then modified the
> >> >>managed-schema. I got an error as shown in picture 2.
> >> >>2. To get the full error message, I checked the Solr logs and got the
> >> >>message shown in picture 3.
> >> >>3. If I corrected the managed-schema, everything would be fine. But I
> >> >>dropped the index. The index couldn't be created again, as in
> >> picture 4.
> >> >>I restarted gptext using "gptext-start -r" and recreated the
> index,
> >> it was
> >> >>created successfully like picture 5.
> >> >>
> >> >>
> >>
>


Re: Best python 3 client for solrcloud

2016-11-24 Thread Dorian Hoxha
Hi Nick,

What I care about most is that the low-level stuff works well (cloud support,
retries, ZooKeeper, which I don't think is needed for normal requests, and maybe
even routing to the right core/replica).
Your client looked best at first glance.

On Thu, Nov 24, 2016 at 10:07 PM, Nick Vasilyev 
wrote:

> I am a committer for
>
> https://github.com/moonlitesolutions/SolrClient.
>
> I think it's pretty good. My aim with it is to provide several reusable
> modules for working with Solr in Python: not just querying, but also working
> with collections, indexing, reindexing, etc.
>
> Check it out and let me know what you think.
>
> On Nov 24, 2016 3:51 PM, "Dorian Hoxha"  wrote:
>
> > Hi searchers,
> >
> > I see multiple clients for solr in python but each one looks like misses
> > many features. What I need is for at least the low-level api to work with
> > cloud (like retries on different nodes and nice exceptions). What is the
> > best that you use currently ?
> >
> > Thank You!
> >
>


Re: Best python 3 client for solrcloud

2016-11-24 Thread Nick Vasilyev
I am a committer for

https://github.com/moonlitesolutions/SolrClient.

I think it's pretty good. My aim with it is to provide several reusable
modules for working with Solr in Python: not just querying, but also working
with collections, indexing, reindexing, etc.

Check it out and let me know what you think.

On Nov 24, 2016 3:51 PM, "Dorian Hoxha"  wrote:

> Hi searchers,
>
> I see multiple clients for solr in python but each one looks like misses
> many features. What I need is for at least the low-level api to work with
> cloud (like retries on different nodes and nice exceptions). What is the
> best that you use currently ?
>
> Thank You!
>


Best python 3 client for solrcloud

2016-11-24 Thread Dorian Hoxha
Hi searchers,

I see multiple clients for Solr in Python, but each one seems to be missing
many features. What I need is for at least the low-level API to work with
SolrCloud (retries on different nodes and nice exceptions). Which is the
best one that you currently use?

Thank You!


Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
Not sure. What have you tried?

For production situations, or when you want to take total control of
the indexing process, I strongly recommend that you put the Tika
parsing on the _client_.

Here's a writeup on this topic:

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick
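
A minimal sketch of that client-side approach, assuming the Tika and SolrJ jars are
on the classpath; the file name, field names and Solr URL are illustrative:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaClientIndexer {
  public static void main(String[] args) throws Exception {
    // Extract the body text and metadata locally with Tika.
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1);   // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = Files.newInputStream(Paths.get("MARLON BRANDO.rtf"))) {
      parser.parse(in, handler, metadata);
    }

    // Send only the fields you actually want; no stream_* noise in content.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "brando-rtf");
    doc.addField("content", handler.toString().trim());
    doc.addField("content_type_s", metadata.get(Metadata.CONTENT_TYPE));

    try (SolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      solr.add(doc);
      solr.commit();
    }
  }
}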

On Thu, Nov 24, 2016 at 10:37 AM, Furkan KAMACI  wrote:
> Hi Erick,
>
> When I check the *Solr* documentation I see that [1]:
>
> *In addition to Tika's metadata, Solr adds the following metadata (defined
> in ExtractingMetadataConstants):*
>
> *"stream_name" - The name of the ContentStream as uploaded to Solr.
> Depending on how the file is uploaded, this may or may not be set.*
> *"stream_source_info" - Any source info about the stream. See
> ContentStream.*
> *"stream_size" - The size of the stream in bytes(?)*
> *"stream_content_type" - The content type of the stream, if available.*
>
> So, it seems that these may not be added by Tika, but Solr. Do you know how
> to enable/disable this feature?
>
> Kind Regards,
> Furkan KAMACI
>
> [1] https://wiki.apache.org/solr/ExtractingRequestHandler
>
> On Thu, Nov 24, 2016 at 6:51 PM, Erick Erickson 
> wrote:
>
>> about PatternCaptureGroupFilterFactory. This isn't going to help. The
>> data you see when you return stored data is _before_ any analysis so
>> the PatternFactory won't be applied. You could do this in a
>> ScriptUpdateProcessorFactory. Or, just don't worry about it and have
>> the real app deal with it.
>>
>> I don't particularly know about the Tika settings, that's largely a guess.
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI 
>> wrote:
>> > Hi Erick,
>> >
>> > 1) I am looking stored data via Solr Admin UI. I send the query and check
>> > what is in content field.
>> >
>> > 2) I can debug the Tika settings if you think that this is not the
>> desired
>> > behaviour to have such metadata fields combined into content field.
>> >
>> > *PS: *Is there any solution to get rid of it except for
>> > using PatternCaptureGroupFilterFactory?
>> >
>> > Kind Regards,
>> > Furkan KAMACI
>> >
>> > On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson > >
>> > wrote:
>> >
>> >> 1> I'm assuming when you "see" this data you're looking at the stored
>> >> data, right? It's a verbatim copy of whatever you sent to the field.
>> >> I'm guessing it's a character-encoding mismatch between the source and
>> >> what you use to display.
>> >>
>> >> 2> How are you extracting this data? There are Tika options I think
>> >> that can/do mush fields together.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >>
>> >>
>> >> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm testing Solr 4.9.1 I've indexed documents via it. Content field at
>> >> > schema has text_general field type which is not modified from
>> original. I
>> >> > do not copy any fields to content. When I check the data  I see
>> content
>> >> > values as like:
>> >> >
>> >> >  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
>> >> > application/rtf   \nstream_size 13580   \nstream_name MARLON
>> BRANDO.rtf
>> >> > \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n
>> >> \n
>> >> > \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
>> >> > directed by Elia Kazan \n"
>> >> >
>> >> > My questions:
>> >> >
>> >> > 1) Is it usual to have that newline characters?
>> >> > 2) Is it usual to have file metadata at the beginning of the content
>> >> (i.e.
>> >> > stream source, stream_content_type) or related to tool that I post
>> data
>> >> to
>> >> > Solr?
>> >> >
>> >> > Kind Regards,
>> >> > Furkan KAMACI
>> >>
>>


Re: Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi Erick,

When I check the *Solr* documentation I see that [1]:

*In addition to Tika's metadata, Solr adds the following metadata (defined
in ExtractingMetadataConstants):*

*"stream_name" - The name of the ContentStream as uploaded to Solr.
Depending on how the file is uploaded, this may or may not be set.*
*"stream_source_info" - Any source info about the stream. See
ContentStream.*
*"stream_size" - The size of the stream in bytes(?)*
*"stream_content_type" - The content type of the stream, if available.*

So it seems that these may be added not by Tika but by Solr. Do you know how
to enable/disable this feature?

Kind Regards,
Furkan KAMACI

[1] https://wiki.apache.org/solr/ExtractingRequestHandler

On Thu, Nov 24, 2016 at 6:51 PM, Erick Erickson 
wrote:

> about PatternCaptureGroupFilterFactory. This isn't going to help. The
> data you see when you return stored data is _before_ any analysis so
> the PatternFactory won't be applied. You could do this in a
> ScriptUpdateProcessorFactory. Or, just don't worry about it and have
> the real app deal with it.
>
> I don't particularly know about the Tika settings, that's largely a guess.
>
> Best,
> Erick
>
> On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI 
> wrote:
> > Hi Erick,
> >
> > 1) I am looking stored data via Solr Admin UI. I send the query and check
> > what is in content field.
> >
> > 2) I can debug the Tika settings if you think that this is not the
> desired
> > behaviour to have such metadata fields combined into content field.
> >
> > *PS: *Is there any solution to get rid of it except for
> > using PatternCaptureGroupFilterFactory?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson  >
> > wrote:
> >
> >> 1> I'm assuming when you "see" this data you're looking at the stored
> >> data, right? It's a verbatim copy of whatever you sent to the field.
> >> I'm guessing it's a character-encoding mismatch between the source and
> >> what you use to display.
> >>
> >> 2> How are you extracting this data? There are Tika options I think
> >> that can/do mush fields together.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>
> >> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI 
> >> wrote:
> >> > Hi,
> >> >
> >> > I'm testing Solr 4.9.1 I've indexed documents via it. Content field at
> >> > schema has text_general field type which is not modified from
> original. I
> >> > do not copy any fields to content. When I check the data  I see
> content
> >> > values as like:
> >> >
> >> >  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
> >> > application/rtf   \nstream_size 13580   \nstream_name MARLON
> BRANDO.rtf
> >> > \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n
> >> \n
> >> > \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> >> > directed by Elia Kazan \n"
> >> >
> >> > My questions:
> >> >
> >> > 1) Is it usual to have that newline characters?
> >> > 2) Is it usual to have file metadata at the beginning of the content
> >> (i.e.
> >> > stream source, stream_content_type) or related to tool that I post
> data
> >> to
> >> > Solr?
> >> >
> >> > Kind Regards,
> >> > Furkan KAMACI
> >>
>


Re: Comparing a Date value in solr

2016-11-24 Thread Erick Erickson
bq: The requirement doesn't really let me use the query like that.

Why not? Why can't you index a start date and an end date? At ingestion
time, if your data is a start date and the number of days the event (let's
call it an event) will run, why not index a second field that contains the
end date, alongside the number of days the event runs?


Or perhaps use the DateRangeField rather than separate start/end dates
to make your queries easier?
(see: https://cwiki.apache.org/confluence/display/solr/Working+with+Dates)

Many times I'll add extra fields to the doc at index time to support the
search use-cases. Almost always, the extra work at index time is repaid
many times over in query efficiency.

Best,
Erick
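
A minimal sketch of that filter query from SolrJ (field and collection names are
illustrative; it assumes an existing SolrClient named client):

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("StartDate__d:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]");
QueryResponse rsp = client.query("events", q);
System.out.println("events in the last 7 days: " + rsp.getResults().getNumFound());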

On Wed, Nov 23, 2016 at 2:27 PM, Sadheera Vithanage  wrote:
> Thankyou Erick,
>
> The requirement doesn't really let me use the query like that.
>
> Rather, what I would be storing in my document is the day number.
>
> E.g: Day : 1, Day : 2 etc I can even store this in milliseconds
> like 8640,17280.
>
> And I want to compare if those days falls within the difference of current
> day and another day.
>
> Something like below.
>
> StoredDay_ms:ms(NOW/DAY+1DAY,NOW/DAY)
>
> And it should return the documents with values set to 8640 as
> StoredDay_ms.
>
> Thank you very much, Really appreciate your help.
>
> On Wed, Nov 23, 2016 at 6:20 PM, Erick Erickson 
> wrote:
>
>> I wouldn't do it this way, it's far more complex than you need. Try
>> fq=Startdate__D:[NOW/DAY-7DAYS TO NOW/DAY+1DAY].
>>
>> Why the weird NOW/DAY+1DAY? Well, that makes fq clauses far
>> more likely to be reused, see:
>> https://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/
>>
>> Best,
>> Erick
>>
>> On Tue, Nov 22, 2016 at 7:29 PM, Sadheera Vithanage 
>> wrote:
>> > Hi All,
>> >
>> > I am struggling to get the difference of 2 days and return the matching
>> > documents.
>> >
>> > I got the below function query to work, however I am unable to pass a
>> > fieldname for *u *in frange function.
>> >
>> > {!frange l=0 u=86400}ms(NOW,StartDate__d)
>> >
>> >
>> > What I really want to do is compare the start date with today's date and
>> > return the documents that falls within a date range.For example 7 days.
>> >
>> > Thank you.
>> >
>> > --
>> > Regards
>> >
>> > Sadheera Vithanage
>>
>
>
>
> --
> Regards
>
> Sadheera Vithanage


Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi All, Erick,
Please suggest. I would like to use the ComplexPhraseQueryParser for searching
text (with wildcards) that may contain special characters.
For example:
John* should match John V. Doe
John* should match Johnson Smith
Bruce-Willis* should match Bruce-Willis
V.* should match John V. F. Doe
SRK

On Thursday, November 24, 2016 5:57 PM, Sandeep Khanzode 
 wrote:
 

 Hi,
This is the typical TextField with ...             
            



SRK 

    On Thursday, November 24, 2016 1:38 AM, Reth RM  
wrote:
 

 what is the fieldType of those records?  
On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode 
 wrote:

Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named 
"John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any 
results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 



Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 

SRK

    On Sunday, November 13, 2016 7:43 AM, Erick Erickson 
 wrote:


Right, for that kind of use case you want complexPhraseQueryParser, see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick
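
For reference, a minimal sketch of such a query from SolrJ (the field and
collection names are illustrative; it assumes an existing SolrClient named client):

// Wildcard inside a phrase, kept in order, via the complex phrase parser.
SolrQuery q = new SolrQuery("{!complexphrase inOrder=true}name:\"John D*\"");
QueryResponse rsp = client.query("people", q);
rsp.getResults().forEach(d -> System.out.println(d.getFieldValue("name")));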

On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
 wrote:
> Thanks, Erick.
>
> I am actually not trying to use the String field (prefer a TextField here).
> But, in my comparisons with TextField, it seems that something like phrase
> matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> say, 'my dog has*') can only be accomplished with a string type field,
> especially because, with a WhitespaceTokenizer in TextField, the space will
> be lost, and all tokens will be individually considered. Am I missing
> something?
>
> SRK
>
>
> On Friday, November 11, 2016 10:05 PM, Erick Erickson
>  wrote:
>
>
> You have to query text and string fields differently, that's just the
> way it works. The problem is getting the query string through the
> parser as a _single_ token or as multiple tokens.
>
> Let's say you have a string field with the "a b" example. You have a
> single token
> a b that starts at offset 0.
>
> But with a text field, you have two tokens,
> a at position 0
> b at position 1
>
> But when the query parser sees "a b" (without quotes) it splits it
> into two tokens, and only the text field has both tokens so the string
> field won't match.
>
> OTOH, when the query parser sees "a\ b" it passes this through as a
> single token, which only matches the string field as there's no
> _single_ token "a b" in the text field.
>
> But a more interesting question is why you want to search this way.
> String fields are intended for keywords, machine-generated IDs and the
> like. They're pretty useless for searching anything except
> 1> exact tokens
> 2> prefixes
>
> While if you have "my dog has fleas" in a string field, you _can_
> search "*dog*" and get a hit but the performance is poor when you get
> a large corpus. Performance for "my*" will be pretty good though.
>
> In all this sounds like an XY problem, what's the use-case you're
> trying to solve?
>
> Best,
> Erick
>
>
>
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
>  wrote:
>> Hi Erick, Reth,
>>
>> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
>> for StrField for me.
>>
>> Any attempt at creating a 'a\ b*' for a TextField does not match any
>> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
>> there are documents that should match.
>> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
>> parsedQuery is field:a field:b. Which does not match as expected (matches
>> individually).
>>
>> Can you please provide an example that I can use in Solr Query dashboard?
>> That will be helpful.
>>
>> I have also seen that wildcard queries work irrespective of field type
>> i.e. StrField as well as TextField. That makes sense because a
>> WhitespaceTokenizer only creates word boundaries when we do not use an
>> EdgeNGramFilter. If I am not wrong, that is. SRK
>>
>>    On Friday, November 11, 2016 5:00 AM, Erick Erickson
>>  wrote:
>>
>>
>>  You can escape the space with a backslash as  'a\ b*'
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
>>> I don't think you can do wildcard on StrField. For text field, if your
>>> query is "category:(test m*)"  the parsed query will be  "category:test
>>> OR
>>> category:m*"
>>> You can add q.op=AND to 

Re: Reload schema or configs failed then drop index, can not recreate that index.

2016-11-24 Thread Erick Erickson
This is arguably a bug. I raised a JIRA, see:

 https://issues.apache.org/jira/browse/SOLR-9799

Managed schema is not necessary to show this problem. Generically, if
you upload a bad config by whatever means, then the
RELOAD/DELETE/correct/CREATE sequence fails. The steps I outlined
in the JIRA force the same replica to be created on the same Solr instance
to ensure it can be reproduced at will.

In the meantime, you can keep from having to restart Solr by:
- correcting the schema
- pushing it to Zookeeper (managed schema API does this for you)
- RELOAD the collection (do NOT delete it first).

Since you can just RELOAD, I doubt this will be a high priority though.

Thanks for reporting!
Erick
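
A minimal sketch of the RELOAD step from SolrJ (assuming a 6.x SolrJ client and an
existing CloudSolrClient named cloudClient; only the collection name "test" comes
from this thread):

// Reload the collection after the corrected config is back in ZooKeeper.
CollectionAdminRequest.Reload reload = CollectionAdminRequest.reloadCollection("test");
CollectionAdminResponse rsp = reload.process(cloudClient);
System.out.println("reload succeeded: " + rsp.isSuccess());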


On Wed, Nov 23, 2016 at 6:37 PM, Jerome Yang  wrote:
> It's solr 6.1, cloud mode.
>
> Please ignore the first message. Just check my second email.
>
> I mean if I modify an existing collection's managed-schema and the
> modification makes reloading the collection fail.
> Then I delete the collection, and delete the configs from zookeeper.
> After that I upload a config with the same name as before, where the
> managed-schema is the unmodified version.
> Then I recreate the collection, and it throws an error, "core already
> exists". But actually the core does not exist.
> After restarting the whole cluster, recreating the collection succeeds.
>
> Regards,
> Jerome
>
>
> On Wed, Nov 23, 2016 at 3:26 PM, Erick Erickson 
> wrote:
>
>> The mail server is pretty heavy-handed at deleting attachments, none of
>> your
>> (presumably) screenshots came through.
>>
>> You also haven't told us what version of Solr you're using.
>>
>> Best,
>> Erick
>>
>> On Tue, Nov 22, 2016 at 6:25 PM, Jerome Yang  wrote:
>> > Sorry, wrong message.
>> > To correct.
>> >
>> > In cloud mode.
>> >
>> >1. I created a collection called "test" and then modified the
>> >managed-schema. I wrote something wrong, for example around
>> >"id", and then reloading the collection failed.
>> >2. Then I drop the collection "test" and delete the configs from
>> zookeeper.
>> >It works fine. The collection is removed both from zookeeper and hard
>> disk.
>> >3. Upload the right configs with the same name as before and try to
>> create a collection named "test"; it fails and the error is "core
>> with name '*' already exists". But actually the core does not exist.
>> >4. Then restart the whole cluster, do the create again, and everything
>> works fine.
>> >
>> >
>> > I think when doing the delete collection, there's something still hold in
>> > somewhere not deleted.
>> > Please have a look
>> >
>> > Regards,
>> > Jerome
>> >
>> > On Wed, Nov 23, 2016 at 10:16 AM, Jerome Yang  wrote:
>> >
>> >> Hi all,
>> >>
>> >>
>> >> Here's my situation:
>> >>
>> >> In cloud mode.
>> >>
>> >>1. I created a collection called "test" and then modified the
>> >>managed-schema. I got an error as shown in picture 2.
>> >>2. To get the full error message, I checked the Solr logs and got the
>> >>message shown in picture 3.
>> >>3. If I corrected the managed-schema, everything would be fine. But I
>> >>dropped the index. The index couldn't be created again, as in
>> picture 4.
>> >>I restarted gptext using "gptext-start -r" and recreated the index,
>> it was
>> >>created successfully like picture 5.
>> >>
>> >>
>>


Re: Zookeeper version

2016-11-24 Thread Erick Erickson
Well, 3.4.6 gets the most testing, so if you want to upgrade it's at
your own risk.

See: https://issues.apache.org/jira/browse/SOLR-8724, there are
problems with 3.4.8 in the Solr context for instance.

There's currently an open ZooKeeper JIRA for 3.4.9; once that is fixed,
Solr will try to upgrade to it.

Best,
Erick

On Thu, Nov 24, 2016 at 2:12 AM, Novin Novin  wrote:
> Hi Guys,
>
> I found in solr docs that "Solr currently uses Apache ZooKeeper v3.4.6".
> Can I use a higher version, or do I have to use ZooKeeper 3.4.6?
>
> Thanks in advance,
> Novin


Re: Zookeeper version

2016-11-24 Thread Shawn Heisey
On 11/24/2016 3:12 AM, Novin Novin wrote:
> I found in solr docs that "Solr currently uses Apache ZooKeeper
> v3.4.6". Can I use a higher version, or do I have to use ZooKeeper 3.4.6?

Solr should be fine working with zookeeper servers running any 3.4.x
version.  I believe 3.4.9 is the highest stable version currently available.

It looks like Zookeeper does not follow the same release philosophy that
Solr does.  In Solr, changes in the third version number are bugfix-only
releases.  Zookeeper does appear to add new features when the third
version number changes.  They haven't had a minor release in quite a
while, but Solr makes minor releases frequently.

So far, there are only alpha releases of Zookeeper version 3.5.  One of
the big features in that version is the ability to dynamically change
the zookeeper cluster by adding or removing servers. I have no idea
whether current Solr versions will work out of the box with a 3.5 server
cluster ... but even if it does work, it won't handle the new dynamic
membership feature.  Some time after Zookeeper releases a stable 3.5
version, Solr will be updated to use it.

Thanks,
Shawn



Re: Need help to update multiple documents

2016-11-24 Thread Erick Erickson
_What_ issue? You haven't told us what the results are, what if anything
the Solr logs show when you try this, in short anything that could help
us diagnose the problem.

Solr has "atomic updates" that work to update partial documents, but
that requires that all fields be stored. Are you trying that?

Have you seen: 
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-Solr-StyleJSON
?

More details please.
Erick
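
For reference, a minimal sketch of one atomic update via SolrJ (field, id and
collection names are illustrative; it assumes an existing SolrClient named client
and that all other fields are stored):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-42");
doc.addField("price", java.util.Collections.singletonMap("set", 19.99)); // replace one field
doc.addField("tags", java.util.Collections.singletonMap("add", "sale")); // append to a multi-valued field
client.add("mycollection", doc);
client.commit("mycollection");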

On Thu, Nov 24, 2016 at 3:01 AM, GW  wrote:
> I've not looked at your file. If you are really thinking update, there is
> no such thing. You can only replace the entire document or delete it.
>
> On 23 November 2016 at 23:47, Reddy Sankar 
> wrote:
>
>> Hi Team ,
>>
>>
>>
>> I am facing an issue updating multiple documents in Solr at a time in my batch job.
>>
>> Could you please help me by giving an example or documentation for the
>> same?
>>
>>
>>
>> Thanks
>>
>> Sankar Reddy M.B
>>


Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
About PatternCaptureGroupFilterFactory: this isn't going to help. The
data you see when you return stored data is _before_ any analysis, so
the pattern filter won't be applied. You could do this in a
ScriptUpdateProcessorFactory. Or just don't worry about it and have
the real app deal with it.

I don't particularly know about the Tika settings, that's largely a guess.

Best,
Erick

On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI  wrote:
> Hi Erick,
>
> 1) I am looking stored data via Solr Admin UI. I send the query and check
> what is in content field.
>
> 2) I can debug the Tika settings if you think that this is not the desired
> behaviour to have such metadata fields combined into content field.
>
> *PS: *Is there any solution to get rid of it except for
> using PatternCaptureGroupFilterFactory?
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson 
> wrote:
>
>> 1> I'm assuming when you "see" this data you're looking at the stored
>> data, right? It's a verbatim copy of whatever you sent to the field.
>> I'm guessing it's a character-encoding mismatch between the source and
>> what you use to display.
>>
>> 2> How are you extracting this data? There are Tika options I think
>> that can/do mush fields together.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI 
>> wrote:
>> > Hi,
>> >
>> > I'm testing Solr 4.9.1 I've indexed documents via it. Content field at
>> > schema has text_general field type which is not modified from original. I
>> > do not copy any fields to content. When I check the data  I see content
>> > values as like:
>> >
>> >  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
>> > application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
>> > \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n
>> \n
>> > \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
>> > directed by Elia Kazan \n"
>> >
>> > My questions:
>> >
>> > 1) Is it usual to have that newline characters?
>> > 2) Is it usual to have file metadata at the beginning of the content
>> (i.e.
>> > stream source, stream_content_type) or related to tool that I post data
>> to
>> > Solr?
>> >
>> > Kind Regards,
>> > Furkan KAMACI
>>


Re: AW: AW: Resync after restart

2016-11-24 Thread Erick Erickson
Hold on. Are you using SolrCloud or not? There is a lot of talk here
about masters and slaves, then you say "I always add slaves with the
collection API", collections are a SolrCloud construct.

It sounds like you're mixing the two. You should _not_ configure
master/slave replication parameters with SolrCloud. Take a look at the
sample configs

And you haven't told us what version of Solr you're using, we can
infer a relatively recent one because of the high number you have for
numVersionBuckets, but that's guessing.

If you are _not_ in SolrCloud, then maybe:
https://issues.apache.org/jira/browse/SOLR-9036 is relevant.

Best,
Erick

On Thu, Nov 24, 2016 at 3:10 AM, Arkadi Colson  wrote:
> This is the config from the master node. All configs are the same on all nodes.
> I always add slaves with the collection API. Is there another place to look
> for this part of the config?
>
>
>
> On 24-11-16 12:02, Michael Aleythe, Sternwald wrote:
>>
>> You need to change this on the master node. The part of the config you
>> pasted here, looks like it is from the slave node.
>>
>> -Ursprüngliche Nachricht-
>> Von: Arkadi Colson [mailto:ark...@smartbit.be]
>> Gesendet: Donnerstag, 24. November 2016 11:56
>> An: solr-user@lucene.apache.org
>> Betreff: Re: AW: Resync after restart
>>
>> Hi Michael
>>
>> Thanks for the quick response! The line does not exist in my config. So
>> can I assume that the default configuration is to not replicate at startup?
>>
>> 
>>   
>> 18.75
>> 05:00:00
>> 15
>> 30
>>   
>> 
>>
>> Any other idea's?
>>
>>
>> On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:
>>>
>>> Hi Arkadi,
>>>
>>> you need to remove the line "startup"
>>> from your ReplicationHandler-config in solrconfig.xml ->
>>> https://wiki.apache.org/solr/SolrReplication.
>>>
>>> Greetings
>>> Michael
>>>
>>> -Ursprüngliche Nachricht-
>>> Von: Arkadi Colson [mailto:ark...@smartbit.be]
>>> Gesendet: Donnerstag, 24. November 2016 09:26
>>> An: solr-user 
>>> Betreff: Resync after restart
>>>
>>> Hi
>>>
>>> Almost every time when restarting a solr instance the index is replicated
>>> completely. Is there a way to avoid this somehow? The index currently has a
>>> size of about 17GB.
>>> Some advice here would be great.
>>>
>>> 99% of the config is default:
>>>
>>> <updateLog>
>>>   <str name="dir">${solr.ulog.dir:}</str>
>>>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>>> </updateLog>
>>> <autoCommit>
>>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>>
>>> If you need more info, just let me know...
>>>
>>> Thx!
>>> Arkadi
>>>
>


Re: Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi Erick,

1) I am looking at the stored data via the Solr Admin UI. I send the query and check
what is in the content field.

2) I can debug the Tika settings if you think it is not the desired
behaviour to have such metadata fields combined into the content field.

*PS: *Is there any solution to get rid of it except for
using PatternCaptureGroupFilterFactory?

Kind Regards,
Furkan KAMACI

On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson 
wrote:

> 1> I'm assuming when you "see" this data you're looking at the stored
> data, right? It's a verbatim copy of whatever you sent to the field.
> I'm guessing it's a character-encoding mismatch between the source and
> what you use to display.
>
> 2> How are you extracting this data? There are Tika options I think
> that can/do mush fields together.
>
> Best,
> Erick
>
>
>
> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI 
> wrote:
> > Hi,
> >
> > I'm testing Solr 4.9.1 I've indexed documents via it. Content field at
> > schema has text_general field type which is not modified from original. I
> > do not copy any fields to content. When I check the data  I see content
> > values as like:
> >
> >  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
> > application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
> > \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n
> \n
> > \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> > directed by Elia Kazan \n"
> >
> > My questions:
> >
> > 1) Is it usual to have that newline characters?
> > 2) Is it usual to have file metadata at the beginning of the content
> (i.e.
> > stream source, stream_content_type) or related to tool that I post data
> to
> > Solr?
> >
> > Kind Regards,
> > Furkan KAMACI
>


Re: SolrCloud -Distribued Indexing

2016-11-24 Thread Shawn Heisey
On 11/23/2016 3:43 AM, Udit Tyagi wrote:
> I am a solr user, I am using solr-6.3.0 version, I have some doubts
> about distributed indexing and sharding in SolrCloud, please clarify:
>
> 1. How can I index documents to a specific shard? (I heard about
> document routing, but the documentation is not proper for that.)

One of the really nice things about SolrCloud is the automatic document routing.
You can shard your index and not worry about which document ends up
where, because SolrCloud will automatically figure that out for you.
You can turn that off, but aside from time series data (like logs) I
don't see much reason to.

The documentation (link below) says that you can set a _route_ parameter
to name the shard you want to index to.  I have not tried this.  If the
router is implicit and you send a request directly to a replica for the
shard you want to index, then you wouldn't need to name a shard.

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting

> I am using the solr create command from the terminal to create a collection. I
> don't have any option to specify the router name while creating the
> collection from the terminal, so how can I implement the implicit router for
> my collection?

It looks like you can't specify the router when using the script to
create the collection.  Use the Collections API instead, where you can
make an HTTP call to create the collection.  You can even use a browser
to make the request.  Note that this is what the script actually does
when you create a collection in cloud mode and don't upload a config at
the same time -- it just calls this HTTP api.

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

> 2. In the documentation of Solr 6.3.0 for the SolrJ client API, the way to
> connect to SolrCloud is specified as:
>
> String zkHostString = "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr";
> SolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build();

Sugar methods on the client like "add" and "query" have alternate forms
that accept the name of a collection as one of the parameters.  If you
are building a request from scratch and not using those sugar methods on
the client, you can set the "collection" parameter on the request.  The
client has a "setDefaultCollection" method that sets the default
collection for requests that don't mention which collection to use.  You
can't use setDefaultCollection if you have assigned the client to a
SolrClient -- it must be a CloudSolrClient.  If you cast it back to the
cloud object, then you'd be able to do it.  This additional line after
the code above would work:

((CloudSolrClient) solr).setDefaultCollection("foo");

Thanks,
Shawn
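
A minimal end-to-end sketch of the pattern described above (ZooKeeper addresses,
collection name and fields are illustrative):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexExample {
  public static void main(String[] args) throws Exception {
    String zkHostString = "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr";
    try (CloudSolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build()) {
      // Route all requests to this collection unless one is named explicitly.
      solr.setDefaultCollection("foo");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      doc.addField("title_s", "hello solrcloud");
      solr.add(doc);
      solr.commit();
    }
  }
}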



Re: SOLR vs mongdb

2016-11-24 Thread Shawn Heisey
On 11/23/2016 11:27 AM, Prateek Jain J wrote:
> 1.   Solr is an indexing engine, but it stores both data and indexes in the same
> directory. Although we can select fields to store/persist in Solr via
> schema.xml, in a nutshell it's not possible to distinguish between data
> and indexes; i.e., I can't remove all indexes and still have persisted data
> with Solr.

Solr uses Lucene for most of its functionality.  Although the Lucene
file format does have different files for stored data than it does for
the index, it's not separate enough that you can manually manipulate it
and delete one or the other while leaving part of it intact.  The files
that make up the Lucene index are NOT meant to be manipulated by
anything other than Lucene code.  Changing them in any other way can
lead to a corrupt index.

> 2.   Solr indexing capabilities are far better than any other nosql db 
> like mongodb etc. like faceting, weighted search.

This is vague.  Solr is good at search and associated details, databases
typically aren't.  I removed your next numbered point -- whether or not
mongodb uses shards doesn't matter.  Exactly how scaling happens isn't
all that important.
 
> 4.   We can have architecture where data is stored in separate db like 
> mongodb or mysql. SOLR can connect with db and index data (in SOLR).
>
> I tried googling for question "solr vs mongodb" and there are various threads 
> on sites like stackoverflow. But I still can't understand why would anyone go 
> for mongodb and when for SOLR (except for features like faceting, may be CAP 
> theorem). Are there any specific use-cases for choosing NoSQL databases like 
> mongoDB over SOLR?

Solr and MongoDB are designed for very different uses.  Although Solr
*can* be used as a NoSQL database, that is not what it is *designed*
for.  It is a *search engine*.  There are redundancy and scalability
features, and Solr does try really hard to never lose data, but it has
not been hardened against those problems.

Solr is good at combing a large dataset for random keywords plus other
filtering and returning the top N results, where N is typically a small
number that's two or three digits.  If you ask it for a million results,
it's going to be REALLY slow ... but if you ask a database for the same
thing, it is probably going to return it pretty quickly.

Those who have experience with Solr *as a search engine* will tell you
this:  "Always be prepared to completely rebuild your Solr indexes from
scratch, because a large percentage of changes will require a reindex." 
This is less problematic if you only use Solr as a data store, not for
searching ... but if that's the plan, why use Solr at all?

Slipping into the subjective:  This is purely my opinion.  Somewhat
informed, but still MY opinion:

I wouldn't use either Solr or MongoDB as the canonical datastore for
anything where I care about the reliability.  Solr is not designed for
it, and I've read from sources that are normally trustworthy that
MongoDB has serious issues with reliability.  Here's a couple of things
I found with only minimal poking:

https://aphyr.com/posts/322-jepsen-mongodb-stale-reads
http://hackingdistributed.com/2013/01/29/mongo-ft/

The Jepsen testing concluded that MongoDB had serious problems with its
architecture, not just bugs that lose data.

It's only fair to mention that SolrCloud was also subjected to Jepsen
testing.  Bugs were found, but because of its reliance on Zookeeper for
cluster management, it actually did fairly well:

https://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/

Thanks,
Shawn



Re: Metadata and Newline Characters at Content

2016-11-24 Thread Erick Erickson
1> I'm assuming when you "see" this data you're looking at the stored
data, right? It's a verbatim copy of whatever you sent to the field.
I'm guessing it's a character-encoding mismatch between the source and
what you use to display.

2> How are you extracting this data? There are Tika options I think
that can/do mush fields together.

Best,
Erick



On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI  wrote:
> Hi,
>
> I'm testing Solr 4.9.1 I've indexed documents via it. Content field at
> schema has text_general field type which is not modified from original. I
> do not copy any fields to content. When I check the data  I see content
> values as like:
>
>  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
> application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
> \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n  \n
> \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> directed by Elia Kazan \n"
>
> My questions:
>
> 1) Is it usual to have that newline characters?
> 2) Is it usual to have file metadata at the beginning of the content (i.e.
> stream source, stream_content_type) or related to tool that I post data to
> Solr?
>
> Kind Regards,
> Furkan KAMACI


Re: SOLR vs mongdb

2016-11-24 Thread Walter Underwood
Solr is not designed to be a repository, so don’t use it as a repository.

If you want to keep the original copy of your data, put it in something designed
to do that. It could be a database, it could be files in Amazon S3.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 24, 2016, at 3:11 AM, Prateek Jain J  
> wrote:
> 
> 
> Hi Walter,
> 
> With the solr support to sharding, is the storage capability still in 
> question? Or we are only talking about features like transaction logs, which 
> can be used to re-build database.
> 
> Regards,
> Prateek Jain
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: 24 November 2016 05:14 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR vs mongdb
> 
> Sure. Someone sends an HTTP request that deletes all the content. I’m glad to 
> share the curl request.
> 
> Or you can put content in with fields that are indexed but not stored. Then 
> the content is “gone” as soon as you send it to Solr.
> 
> Or you change the schema and need to reindex, but don’t have copies of the 
> original content.
> 
> Or there there is some disk problem and some docs are not in the backup 
> because the backups aren’t transactional.
> 
> I’m sure there are other situations.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Nov 23, 2016, at 9:00 PM, Kris Musshorn  wrote:
>> 
>> Will someone please give me a detailed scenario where solr content could 
>> "disappear"? 
>> 
>> Disappear means what exactly?
>> 
>> TIA,
>> Kris
>> 
>> 
>> -Original Message-
>> From: Walter Underwood [mailto:wun...@wunderwood.org]
>> Sent: Wednesday, November 23, 2016 7:47 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SOLR vs mongdb
>> 
>> Well, I didn’t actually recommend MongoDB as a repository. :-)
>> 
>> If you want transactions and search, buy MarkLogic. I worked there for two 
>> years, and that is serious non-muggle technology.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Nov 23, 2016, at 4:43 PM, Alexandre Rafalovitch  
>>> wrote:
>>> 
>>> Actually, you need to be ok that your content will disappear when you 
>>> use MongoDB as well :-(
>>> 
>>> But I understand what you were trying to say.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and 
>>> experienced
>>> 
>>> 
>>> On 24 November 2016 at 11:34, Walter Underwood  
>>> wrote:
 The choice is simple. Are you OK if all your content disappears and you 
 need to reload?
 If so, use Solr. If not, you need some kind of repository. It can be files 
 in Amazon S3.
 But Solr is not designed to preserve your data.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 
> On Nov 23, 2016, at 4:12 PM, Alexandre Rafalovitch  
> wrote:
> 
> Solr supports automatic detection of content types for new fields.
> That was - unfortunately - named as schemaless mode. It still is 
> typed under the covers and has limitations. Such as needing all 
> automatically created fields to be multivalued (by the default 
> schemaless definition).
> 
> MongoDB is better about actually storing content, especially nested 
> content. Solr can store content, but that's not what it is about.
> You can totally turn off all the stored flags in Solr and return 
> just document ids, while storing the content in MongoDB.
> 
> You can search in Mongo and you can store content in Solr, so for 
> simple use cases you can use either one to serve both cause. But 
> you can also pound nails with a brick and make holes with a hammer.
> 
> Oh, and do not read this as me endorsing MongoDB. I would probably 
> look at Postgress with JSON columns instead, as it is more reliable 
> and feature rich.
> 
> Regards,
> Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
> 
> 
> On 24 November 2016 at 07:34, Prateek Jain J 
>  wrote:
>> SOLR also supports, schemaless behaviour. and my question is same that, 
>> why and where should we prefer mongodb. Web search didn’t helped me on 
>> this.
>> 
>> 
>> Regards,
>> Prateek Jain
>> 
>> -Original Message-
>> From: Rohit Kanchan [mailto:rohitkan2...@gmail.com]
>> Sent: 23 November 2016 07:07 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SOLR vs mongdb
>> 
>> Hi Prateek,
>> 
>> I think you are talking about two different animals. Solr(actually 
>> embedded
>> lucene) is actually a search engine where you 

Metadata and Newline Characters at Content

2016-11-24 Thread Furkan KAMACI
Hi,

I'm testing Solr 4.9.1 and I've indexed documents via it. The content field in the
schema has the text_general field type, which is not modified from the original. I
do not copy any fields to content. When I check the data, I see content
values like:

 " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
\nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n  \n
\n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
directed by Elia Kazan \n"

My questions:

1) Is it usual to have that newline characters?
2) Is it usual to have file metadata at the beginning of the content (i.e.
stream source, stream_content_type) or related to tool that I post data to
Solr?

Kind Regards,
Furkan KAMACI


Re: Query parser behavior with AND and negative clause

2016-11-24 Thread Alessandro Benedetti
Hey Sandeep,
can you debug the query (debugQuery=on) and show how the query is parsed?

Cheers
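
A minimal sketch of that debug step from SolrJ (it assumes an existing SolrClient
named client pointed at the collection in question):

// Run the problem query with debugQuery=on and print the parsed form.
SolrQuery q = new SolrQuery(
    "{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'}"
    + " AND (*:* -{!field f=dateRange2 op=Contains v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})");
q.set("debugQuery", "on");
QueryResponse rsp = client.query(q);
System.out.println(rsp.getDebugMap().get("parsedquery"));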



On Thu, Nov 24, 2016 at 12:38 PM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Hi Erick,
> The example record contains:
> dateRange1 = [2016-11-22T18:00:00Z TO 2016-11-22T20:00:00Z], [2016-11-22T06:00:00Z TO 2016-11-22T14:00:00Z]
> dateRange2 = [2016-11-22T12:00:00Z TO 2016-11-22T14:00:00Z]
> The first query works ... which means that it is able to EXCLUDE this
> record from the result (since the negative dateRange2 clause should return
> false). Whereas the second query should also work, but it does not and
> actually pulls the record into the result.
> WORKS:
> +{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO
> 2016-11-22T13:59:00Z]'} +(*:* -{!field f=dateRange2 op=Contains
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
>
>
> DOES NOT WORK :
> {!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO
> 2016-11-22T13:59:00Z]'} AND (*:* -{!field f=dateRange2 op=Contains
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
>  SRK
>
> On Tuesday, November 22, 2016 9:41 PM, Erick Erickson <
> erickerick...@gmail.com> wrote:
>
>
>  _How_ does it "not work"? You haven't told us what you expect .vs.
> what you get back.
>
> Plus a sample doc that that violates your expectations (just the
> dateRange field) would
> also help.
>
> Best,
> Erick
>
> On Tue, Nov 22, 2016 at 4:23 AM, Sandeep Khanzode
>  wrote:
> > Hi,
> > I have a simple query that should intersect with dateRange1 and NOT be
> contained within dateRange2. I have tried the following options:
> >
> > WORKS:
> > +{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO
> 2016-11-22T13:59:00Z]'} +(*:* -{!field f=dateRange2 op=Contains
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
> >
> >
> > DOES NOT WORK :
> > {!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO
> 2016-11-22T13:59:00Z]'} AND (*:* -{!field f=dateRange2 op=Contains
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
> >
> > Why?
> >
> > WILL NOT WORK (because of the negative clause at the top level?):
> > {!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO
> 2016-11-22T13:59:00Z]'} AND -{!field f=dateRange2 op=Contains
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'}
> >
> >
> > SRK
>
>
>
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Again : Query formulation help

2016-11-24 Thread Prasanna S. Dhakephalkar
:(

Thanks Michael.

Regards,

Prasanna.

-Original Message-
From: Michael Kuhlmann [mailto:k...@solr.info] 
Sent: Thursday, November 24, 2016 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Again : Query formulation help

Hi Prasanna,

there's no such filter out-of-the-box. It's similar to the mm parameter in
the (e)dismax parser, but that only works for full-text searches on the same
fields.

So you have to build the query on your own using all possible permutations:

fq=(code1: AND code2:) OR (code1: AND code3:) OR .

Of course, such a query can become huge when there are more than four
constraints.

Best,
Michael
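
A minimal Java sketch of building that "all pairs" filter query programmatically
(the field names and values are purely illustrative, since the original values did
not survive the mail archive):

// Require at least 2 of the 4 code constraints: OR together every pair.
String[][] clauses = {
    {"code1", "1234"}, {"code2", "2345"}, {"code3", "3456"}, {"code4", "4567"}
};
java.util.List<String> pairs = new java.util.ArrayList<>();
for (int i = 0; i < clauses.length; i++) {
  for (int j = i + 1; j < clauses.length; j++) {
    pairs.add("(" + clauses[i][0] + ":" + clauses[i][1]
        + " AND " + clauses[j][0] + ":" + clauses[j][1] + ")");
  }
}
String fq = String.join(" OR ", pairs);
// e.g. (code1:1234 AND code2:2345) OR (code1:1234 AND code3:3456) OR ...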

Am 24.11.2016 um 11:40 schrieb Prasanna S. Dhakephalkar:
> Hi,
>
>  
>
> Need to formulate a distinctive field values query on 4 fields with 
> minimum match on 2 fields
>
>  
>
> I have 4 fields in my core
>
> Code 1 : Values between 1001 to 
>
> Code 2 : Values between 1001 to 
>
> Code 3 : Values between 1001 to 
>
> Code 4 : Values between 1001 to 
>
>  
>
> I want to formulate a query in following manner
>
>  
>
> Code 1 : 
>
> Code 2 : 
>
> Code 3 : 
>
> Code 4 : 
>
>  
>
> I want to formulate a query, given above parameters, the result should 
> contain documents where at least 2 of the above match.
>
>  
>
> Thanks and Regards,
>
>  
>
> Prasanna
>
>  
>
>




Re: Query parser behavior with AND and negative clause

2016-11-24 Thread Sandeep Khanzode
Hi Erick,
The example record contains ...dateRange1 = [2016-11-22T18:00:00Z TO 
2016-11-22T20:00:00Z], [2016-11-22T06:00:00Z TO 2016-11-22T14:00:00Z]dateRange2 
= [2016-11-22T12:00:00Z TO 2016-11-22T14:00:00Z]"
The first query works ... which means that it is able to EXCLUDE this record 
from the result (since the negative dateRange2 clause should return false). 
Whereas the second query should also work but it does not and actually pulls 
the record in the result.
WORKS:
+{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 
2016-11-22T13:59:00Z]'} +(*:* -{!field f=dateRange2 op=Contains 
v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})


DOES NOT WORK :
{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 
2016-11-22T13:59:00Z]'} AND (*:* -{!field f=dateRange2 op=Contains 
v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
 SRK 

On Tuesday, November 22, 2016 9:41 PM, Erick Erickson 
 wrote:
 

 _How_ does it "not work"? You haven't told us what you expect vs.
what you get back.

Plus a sample doc that violates your expectations (just the
dateRange field) would also help.

Best,
Erick

On Tue, Nov 22, 2016 at 4:23 AM, Sandeep Khanzode
 wrote:
> Hi,
> I have a simple query that should intersect with dateRange1 and NOT be 
> contained within dateRange2. I have tried the following options:
>
> WORKS:
> +{!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 
> 2016-11-22T13:59:00Z]'} +(*:* -{!field f=dateRange2 op=Contains 
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
>
>
> DOES NOT WORK :
> {!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 
> 2016-11-22T13:59:00Z]'} AND (*:* -{!field f=dateRange2 op=Contains 
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'})
>
> Why?
>
> WILL NOT WORK (because of the negative clause at the top level?):
> {!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 
> 2016-11-22T13:59:00Z]'} AND -{!field f=dateRange2 op=Contains 
> v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'}
>
>
> SRK
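One way to sidestep the boolean-parsing question entirely (an untested sketch, reusing the field names and ranges from above) is to put each clause in its own fq and express the negated one as a nested query:

q=*:*
&fq={!field f=dateRange1 op=Intersects v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'}
&fq=*:* -_query_:"{!field f=dateRange2 op=Contains v='[2016-11-22T12:01:00Z TO 2016-11-22T13:59:00Z]'}"

Each fq is parsed on its own, so the AND/NOT interaction between the two range clauses never has to pass through a single query string.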


   

Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi,
This is the typical TextField with ...             
            



SRK 

On Thursday, November 24, 2016 1:38 AM, Reth RM  
wrote:
 

 what is the fieldType of those records?  
On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode 
 wrote:

Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named 
"John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any 
results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 



Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 

SRK

    On Sunday, November 13, 2016 7:43 AM, Erick Erickson 
 wrote:


 Right, for that kind of use case you want complexPhraseQueryParser,
see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick
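
A minimal example of that parser, assuming the name field from this thread:

q={!complexphrase inOrder=true}name:"john* smith"

Unlike the standard lucene parser, it honours wildcards inside the quoted phrase.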

On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
 wrote:
> Thanks, Erick.
>
> I am actually not trying to use the String field (prefer a TextField here).
> But, in my comparisons with TextField, it seems that something like phrase
> matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> say, 'my dog has*') can only be accomplished with a string type field,
> especially because, with a WhitespaceTokenizer in TextField, the space will
> be lost, and all tokens will be individually considered. Am I missing
> something?
>
> SRK
>
>
> On Friday, November 11, 2016 10:05 PM, Erick Erickson
>  wrote:
>
>
> You have to query text and string fields differently, that's just the
> way it works. The problem is getting the query string through the
> parser as a _single_ token or as multiple tokens.
>
> Let's say you have a string field with the "a b" example. You have a
> single token
> a b that starts at offset 0.
>
> But with a text field, you have two tokens,
> a at position 0
> b at position 1
>
> But when the query parser sees "a b" (without quotes) it splits it
> into two tokens, and only the text field has both tokens so the string
> field won't match.
>
> OTOH, when the query parser sees "a\ b" it passes this through as a
> single token, which only matches the string field as there's no
> _single_ token "a b" in the text field.
>
> But a more interesting question is why you want to search this way.
> String fields are intended for keywords, machine-generated IDs and the
> like. They're pretty useless for searching anything except
> 1> exact tokens
> 2> prefixes
>
> While if you have "my dog has fleas" in a string field, you _can_
> search "*dog*" and get a hit but the performance is poor when you get
> a large corpus. Performance for "my*" will be pretty good though.
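
To make that concrete, a small sketch (field names are made up): say name_s is a string field and name_t a whitespace-tokenized text field, both holding the value "my dog has fleas".

name_s:"my dog*"       -- no match: inside quotes the * is not treated as a wildcard
name_s:my\ dog*        -- match: a single prefix query against the whole stored string
name_t:my\ dog*        -- no match: the text field never indexed a single token "my dog"
name_t:(my AND dog*)   -- match: two tokens, with a prefix query on the second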
>
> In all this sounds like an XY problem, what's the use-case you're
> trying to solve?
>
> Best,
> Erick
>
>
>
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
>  wrote:
>> Hi Erick, Reth,
>>
>> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
>> for StrField for me.
>>
>> Any attempt at creating a 'a\ b*' for a TextField does not match any
>> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
>> there are documents that should match.
>> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
>> parsedQuery is field:a field:b. Which does not match as expected (matches
>> individually).
>>
>> Can you please provide an example that I can use in Solr Query dashboard?
>> That will be helpful.
>>
>> I have also seen that wildcard queries work irrespective of field type,
>> i.e. StrField as well as TextField. That makes sense because a
>> WhitespaceTokenizer only creates word boundaries when we do not use an
>> EdgeNGramFilter. If I am not wrong, that is. SRK
>>
>>    On Friday, November 11, 2016 5:00 AM, Erick Erickson
>>  wrote:
>>
>>
>>  You can escape the space with a backslash as  'a\ b*'
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
>>> I don't think you can do wildcard on StrField. For text field, if your
>>> query is "category:(test m*)"  the parsed query will be  "category:test
>>> OR
>>> category:m*"
>>> You can add q.op=AND to make an AND between those terms.
>>>
>>> For phrase type wild card query support, as per docs, it
>>> is ComplexPhraseQueryParser that supports it. (I haven't tested it
>>> myself)
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>>
>>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>>> sandeep_khanz...@yahoo.com. invalid> wrote:
>>>

RE: SOLR vs mongdb

2016-11-24 Thread Prateek Jain J

Hi Walter,

With Solr's support for sharding, is the storage capability still in question?
Or are we only talking about features like transaction logs, which can be used
to rebuild the database?

Regards,
Prateek Jain

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: 24 November 2016 05:14 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR vs mongdb

Sure. Someone sends an HTTP request that deletes all the content. I’m glad to 
share the curl request.
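The classic one looks something like this (the collection name is a placeholder):

curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
     -H 'Content-Type: text/xml' \
     --data-binary '<delete><query>*:*</query></delete>'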

Or you can put content in with fields that are indexed but not stored. Then the 
content is “gone” as soon as you send it to Solr.
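For example, a field declared along these lines (name and type are illustrative) keeps only the inverted index, so the original text can never be read back out of Solr:

<field name="body" type="text_general" indexed="true" stored="false"/>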

Or you change the schema and need to reindex, but don’t have copies of the 
original content.

Or there there is some disk problem and some docs are not in the backup because 
the backups aren’t transactional.

I’m sure there are other situations.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 23, 2016, at 9:00 PM, Kris Musshorn  wrote:
> 
> Will someone please give me a detailed scenario where solr content could 
> "disappear"? 
> 
> Disappear means what exactly?
> 
> TIA,
> Kris
> 
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Wednesday, November 23, 2016 7:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR vs mongdb
> 
> Well, I didn’t actually recommend MongoDB as a repository. :-)
> 
> If you want transactions and search, buy MarkLogic. I worked there for two 
> years, and that is serious non-muggle technology.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Nov 23, 2016, at 4:43 PM, Alexandre Rafalovitch  
>> wrote:
>> 
>> Actually, you need to be ok that your content will disappear when you 
>> use MongoDB as well :-(
>> 
>> But I understand what you were trying to say.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and 
>> experienced
>> 
>> 
>> On 24 November 2016 at 11:34, Walter Underwood  wrote:
>>> The choice is simple. Are you OK if all your content disappears and you 
>>> need to reload?
>>> If so, use Solr. If not, you need some kind of repository. It can be files 
>>> in Amazon S3.
>>> But Solr is not designed to preserve your data.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
 On Nov 23, 2016, at 4:12 PM, Alexandre Rafalovitch  
 wrote:
 
 Solr supports automatic detection of content types for new fields.
 That was - unfortunately - named as schemaless mode. It still is 
 typed under the covers and has limitations. Such as needing all 
 automatically created fields to be multivalued (by the default 
 schemaless definition).
 
 MongoDB is better about actually storing content, especially nested 
 content. Solr can store content, but that's not what it is about.
 You can totally turn off all the stored flags in Solr and return 
 just document ids, while storing the content in MongoDB.
 
 You can search in Mongo and you can store content in Solr, so for 
 simple use cases you can use either one to serve both causes. But 
 you can also pound nails with a brick and make holes with a hammer.
 
 Oh, and do not read this as me endorsing MongoDB. I would probably 
 look at Postgres with JSON columns instead, as it is more reliable 
 and feature rich.
 
 Regards,
 Alex.
 
 http://www.solr-start.com/ - Resources for Solr users, new and 
 experienced
 
 
 On 24 November 2016 at 07:34, Prateek Jain J 
  wrote:
> Solr also supports schemaless behaviour, and my question remains the same:
> why and where should we prefer MongoDB? Web searches didn't help me on
> this.
> 
> 
> Regards,
> Prateek Jain
> 
> -Original Message-
> From: Rohit Kanchan [mailto:rohitkan2...@gmail.com]
> Sent: 23 November 2016 07:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR vs mongdb
> 
> Hi Prateek,
> 
> I think you are talking about two different animals. Solr (actually embedded
> Lucene) is a search engine where you can use different features
> like faceting, highlighting etc., but it is a document store in the sense that
> for each text it creates an inverted index and maps it back to documents. MongoDB
> is also a document store, but I think it adds basic search capability. This
> is my understanding. We are using Mongo for temporary storage, and I think
> it is good for cases where you want to store a key-value document in a
> collection without any static schema. In Solr you need to define your
> schema. In Solr you can define dynamic fields too. This is all my 
> 
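As an aside on that last point, a dynamic field rule in schema.xml looks roughly like this (name and type are only examples):

<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

Any incoming field whose name ends in _txt is then indexed without having to be declared explicitly.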

Re: AW: AW: Resync after restart

2016-11-24 Thread Arkadi Colson
This is the config from the master node. All configs are the same on all
nodes. I always add slaves with the collection API. Is there another
place to look for this part of the config?



On 24-11-16 12:02, Michael Aleythe, Sternwald wrote:

You need to change this on the master node. The part of the config you pasted
here looks like it is from the slave node.

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be]
Gesendet: Donnerstag, 24. November 2016 11:56
An: solr-user@lucene.apache.org
Betreff: Re: AW: Resync after restart

Hi Michael

Thanks for the quick response! The line does not exist in my config. So can I 
assume that the default configuration is to not replicate at startup?


  
18.75
05:00:00
15
30
  


Any other ideas?


On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:

Hi Arkadi,

you need to remove the line "startup" from your 
ReplicationHandler-config in solrconfig.xml -> https://wiki.apache.org/solr/SolrReplication.

Greetings
Michael

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be]
Gesendet: Donnerstag, 24. November 2016 09:26
An: solr-user 
Betreff: Resync after restart

Hi

Almost every time when restarting a solr instance the index is replicated 
completely. Is there a way to avoid this somehow? The index currently has a 
size of about 17GB.
Some advice here would be great.

99% of the config is default:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


If you need more info, just let me know...

Thx!
Arkadi





Re: Need help to update multiple documents

2016-11-24 Thread GW
I've not looked at your file. If you are really thinking update, there is
no such thing. You can only replace the entire document or delete it.
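Sending several documents in one update request is straightforward, though; a minimal sketch (collection and field names are made up):

curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
     -H 'Content-Type: application/json' \
     --data-binary '[
       {"id": "1", "title": "first doc"},
       {"id": "2", "title": "second doc"}
     ]'

Any id that already exists in the index is replaced wholesale, in line with the point above.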

On 23 November 2016 at 23:47, Reddy Sankar 
wrote:

> Hi Team,
>
> I am facing an issue updating multiple documents in Solr at a time in my
> batch job.
>
> Could you please help me by giving an example or documentation for the
> same.
>
>
>
> Thanks
>
> Sankar Reddy M.B
>


Re: Again : Query formulation help

2016-11-24 Thread Michael Kuhlmann
Hi Prasanna,

there's no such filter out-of-the-box. It's similar to the mm parameter
in (e)dismax parser, but this only works for full text searches on the
same fields.

So you have to build the query on your own using all possible permutations:

fq=(code1: AND code2:) OR (code1: AND code3:) OR .

Of course, such a query can become huge when there are more than four
constraints.

Best,
Michael

Am 24.11.2016 um 11:40 schrieb Prasanna S. Dhakephalkar:
> Hi,
>
>  
>
> Need to formulate a distinctive field values query on 4 fields with minimum
> match on 2 fields
>
>  
>
> I have 4 fields in my core
>
> Code 1 : Values between 1001 to 
>
> Code 2 : Values between 1001 to 
>
> Code 3 : Values between 1001 to 
>
> Code 4 : Values between 1001 to 
>
>  
>
> I want to formulate a query in following manner
>
>  
>
> Code 1 : 
>
> Code 2 : 
>
> Code 3 : 
>
> Code 4 : 
>
>  
>
> I want to formulate a query, given above parameters, the result should
> contain documents where at least 2 of the above match.
>
>  
>
> Thanks and Regards,
>
>  
>
> Prasanna
>
>  
>
>



AW: AW: Resync after restart

2016-11-24 Thread Michael Aleythe, Sternwald
You need to change this on the master node. The part of the config you pasted
here looks like it is from the slave node.

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be] 
Gesendet: Donnerstag, 24. November 2016 11:56
An: solr-user@lucene.apache.org
Betreff: Re: AW: Resync after restart

Hi Michael

Thanks for the quick response! The line does not exist in my config. So can I 
assume that the default configuration is to not replicate at startup?

   
 
   18.75
   05:00:00
   15
   30
 
   

Any other ideas?


On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:
> Hi Arkadi,
>
> you need to remove the line "startup" from 
> your ReplicationHandler-config in solrconfig.xml -> 
> https://wiki.apache.org/solr/SolrReplication.
>
> Greetings
> Michael
>
> -Ursprüngliche Nachricht-
> Von: Arkadi Colson [mailto:ark...@smartbit.be]
> Gesendet: Donnerstag, 24. November 2016 09:26
> An: solr-user 
> Betreff: Resync after restart
>
> Hi
>
> Almost every time when restarting a solr instance the index is replicated 
> completely. Is there a way to avoid this somehow? The index currently has a 
> size of about 17GB.
> Some advice here would be great.
>
> 99% of the config is default:
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
>   <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
> </updateLog>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
>
> If you need more info, just let me know...
>
> Thx!
> Arkadi
>



Re: AW: Resync after restart

2016-11-24 Thread Arkadi Colson

Hi Michael

Thanks for the quick response! The line does not exist in my config. So 
can I assume that the default configuration is to not replicate at startup?


  

  18.75
  05:00:00
  15
  30

  

Any other ideas?


On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:

Hi Arkadi,

you need to remove the line "startup" from your 
ReplicationHandler-config in solrconfig.xml -> https://wiki.apache.org/solr/SolrReplication.

Greetings
Michael

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be]
Gesendet: Donnerstag, 24. November 2016 09:26
An: solr-user 
Betreff: Resync after restart

Hi

Almost every time when restarting a solr instance the index is replicated 
completely. Is there a way to avoid this somehow? The index currently has a 
size of about 17GB.
Some advice here would be great.

99% of the config is default:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

If you need more info, just let me know...

Thx!
Arkadi





AW: Resync after restart

2016-11-24 Thread Michael Aleythe, Sternwald
Hi Arkadi,

you need to remove the line "startup" from 
your ReplicationHandler-config in solrconfig.xml -> 
https://wiki.apache.org/solr/SolrReplication.
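
For reference, the master-side snippet from that wiki page looks roughly like this (the confFiles value is only an example); dropping the startup entry is what the advice above refers to:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>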

Greetings
Michael

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be] 
Gesendet: Donnerstag, 24. November 2016 09:26
An: solr-user 
Betreff: Resync after restart

Hi

Almost every time when restarting a solr instance the index is replicated 
completely. Is there a way to avoid this somehow? The index currently has a 
size of about 17GB.
Some advice here would be great.

99% of the config is default:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

If you need more info, just let me know...

Thx!
Arkadi



Again : Query formulation help

2016-11-24 Thread Prasanna S. Dhakephalkar
Hi,

 

Need to formulate a distinctive field values query on 4 fields with minimum
match on 2 fields

 

I have 4 fields in my core

Code 1 : Values between 1001 to 

Code 2 : Values between 1001 to 

Code 3 : Values between 1001 to 

Code 4 : Values between 1001 to 

 

I want to formulate a query in following manner

 

Code 1 : 

Code 2 : 

Code 3 : 

Code 4 : 

 

I want to formulate a query, given above parameters, the result should
contain documents where at least 2 of the above match.

 

Thanks and Regards,

 

Prasanna

 



RE: SOLR vs mongdb

2016-11-24 Thread Prateek Jain J

I have used MarkLogic for around 6 months, but it is mainly used for custom
ontologies, and we had serious issues once you start asking for more search results
(more than the default) in one go.


Regards,
Prateek Jain

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: 24 November 2016 12:47 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR vs mongdb

Well, I didn’t actually recommend MongoDB as a repository. :-)

If you want transactions and search, buy MarkLogic. I worked there for two 
years, and that is serious non-muggle technology.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 23, 2016, at 4:43 PM, Alexandre Rafalovitch  wrote:
> 
> Actually, you need to be ok that your content will disappear when you 
> use MongoDB as well :-(
> 
> But I understand what you were trying to say.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
> 
> 
> On 24 November 2016 at 11:34, Walter Underwood  wrote:
>> The choice is simple. Are you OK if all your content disappears and you need 
>> to reload?
>> If so, use Solr. If not, you need some kind of repository. It can be files 
>> in Amazon S3.
>> But Solr is not designed to preserve your data.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Nov 23, 2016, at 4:12 PM, Alexandre Rafalovitch  
>>> wrote:
>>> 
>>> Solr supports automatic detection of content types for new fields.
>>> That was - unfortunately - named as schemaless mode. It still is 
>>> typed under the covers and has limitations. Such as needing all 
>>> automatically created fields to be multivalued (by the default 
>>> schemaless definition).
>>> 
>>> MongoDB is better about actually storing content, especially nested 
>>> content. Solr can store content, but that's not what it is about. 
>>> You can totally turn off all the stored flags in Solr and return 
>>> just document ids, while storing the content in MongoDB.
>>> 
>>> You can search in Mongo and you can store content in Solr, so for 
>>> simple use cases you can use either one to serve both causes. But you 
>>> can also pound nails with a brick and make holes with a hammer.
>>> 
>>> Oh, and do not read this as me endorsing MongoDB. I would probably 
>>> look at Postgres with JSON columns instead, as it is more reliable 
>>> and feature rich.
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and 
>>> experienced
>>> 
>>> 
>>> On 24 November 2016 at 07:34, Prateek Jain J 
>>>  wrote:
 Solr also supports schemaless behaviour, and my question remains the same:
 why and where should we prefer MongoDB? Web searches didn't help me on
 this.
 
 
 Regards,
 Prateek Jain
 
 -Original Message-
 From: Rohit Kanchan [mailto:rohitkan2...@gmail.com]
 Sent: 23 November 2016 07:07 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SOLR vs mongdb
 
 Hi Prateek,
 
 I think you are talking about two different animals. Solr(actually 
 embedded
 lucene) is actually a search engine where you can use different features 
 like faceting, highlighting etc but it is a document store where for each 
 text it does create an Inverted index and map that to documents.  Mongodb 
 is also document store but I think it adds basic search capability.  This 
 is my understanding. We are using mongo for temporary storage and I think 
 it is good for that where you want to store a key value document in a 
 collection without any static schema. In Solr you need to define your 
 schema. In solr you can define dynamic fields too. This is all my 
 understanding.
 
 -
 Rohit
 
 
 On Wed, Nov 23, 2016 at 10:27 AM, Prateek Jain J < 
 prateek.j.j...@ericsson.com> wrote:
 
> 
> Hi All,
> 
> I have started to use mongodb and solr recently. Please feel free 
> to correct me where my understanding is not upto the mark:
> 
> 
> 1.   Solr is indexing engine but it stores both data and indexes in
> same directory. Although we can select fields to store/persist in 
> solr via schema.xml. But in nutshell, it's not possible to 
> distinguish between data and indexes like, I can't remove all 
> indexes and still have persisted data with SOLR.
> 
> 2.   Solr indexing capabilities are far better than any other nosql db
> like mongodb etc. like faceting, weighted search.
> 
> 3.   Both support scalability via sharding.
> 
> 4.   We can have architecture where data is stored in separate db like
> mongodb or mysql. SOLR can connect with db and index data (in SOLR).
> 
> I tried googling for question "solr vs mongodb" and 

Zookeeper version

2016-11-24 Thread Novin Novin
Hi Guys,

I found in the Solr docs that "Solr currently uses Apache ZooKeeper v3.4.6".
Can I use a higher version, or do I have to use ZooKeeper 3.4.6?

Thanks in advance,
Novin


Resync after restart

2016-11-24 Thread Arkadi Colson

Hi

Almost every time when restarting a solr instance the index is 
replicated completely. Is there a way to avoid this somehow? The index 
currently has a size of about 17GB.

Some advice here would be great.
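
One thing that may help narrow it down is comparing the index version and generation on master and slave before and after a restart, e.g. (the core name is a placeholder):

curl 'http://localhost:8983/solr/mycore/replication?command=details&wt=json'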

99% of the config is default:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


If you need more info, just let me know...

Thx!
Arkadi