Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Hi,

I am using Solr Cloud and I have created a single index that hosts around
70M documents distributed into 2 shards (each having 35M documents) and 2
replicas. The queries are very slow to run, so I was thinking to distribute
the index into multiple indexes and consequently use distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

Appreciate your feedback regarding this.

Regards,
Salman


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
What is your index size? How much memory is used? What type of queries are
slow?
Are there GC pauses as they can be a cause of slowness?
Are document updates/additions happening in parallel?

The queries are very slow to run so I was thinking to distribute
the indexes into multiple indexes and consequently distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

This is what you are already doing. Did you mean that you want to add more
shards?

Regards,
Modassar

On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari 
wrote:

> Hi,
>
> I am using Solr cloud and I have created a single index that host around
> 70M documents distributed into 2 shards (each having 35M documents) and 2
> replicas. The queries are very slow to run so I was thinking to distribute
> the indexes into multiple indexes and consequently distributed search. Can
> anyone guide me to some sources (articles) that discuss this in Solr Cloud?
>
> Appreciate your feedback regarding this.
>
> Regards,
> Salman
>


Re: OpenNLP plugin or similar NER software for Solr ??? !!!

2015-11-05 Thread Alessandro Benedetti
Apparently this mail thread is duplicated; anyway, I will copy and paste my
previous comment here as well:

Hi Christian.
This has been quite easy to do since 2011.
But you can complicate this as much as you want,
or customise it as much as you want.

Take a look :

https://cwiki.apache.org/confluence/display/solr/UIMA+Integration

https://wiki.apache.org/solr/SolrUIMA

This is a good, painless starting point.

Then you can complicate the scenario as much as you want by developing your
own updateProcessor.
This is a simple customisation, and you can decide to use the best location
NER available (among the open-source ones, for example, I would suggest you
explore http://nlp.stanford.edu/software/corenlp.shtml).

Apache OpenNLP could be a good choice as well.
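To give an idea, here is a minimal sketch of such a custom update processor
(the class name, field names and the NER call are hypothetical placeholders,
not a real implementation):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class LocationNerUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object text = doc.getFieldValue("content");
        if (text != null) {
          // call the NER library of your choice here and add
          // each extracted location to a multi-valued field
          doc.addField("location_ss", extractLocations(text.toString()));
        }
        super.processAdd(cmd); // continue with the rest of the chain
      }
    };
  }

  // placeholder: wire in CoreNLP / OpenNLP here
  private String extractLocations(String text) {
    return "UNRESOLVED";
  }
}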

Let us know, if this is what you wanted.

Cheers


On 4 November 2015 at 20:10, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> David Smiley had a place name and general tagging engine that for the life
> of me I can't find.
>
> It didn't do NER for you (I'm not sure you want to do this in the search
> engine) but it helps you tag entities in a search engine based on a
> predefined list. At least that's what I remember.
>
> On Wed, Nov 4, 2015 at 3:05 PM,  wrote:
>
> > Hi everyone,
> >
> > I need to install a plugin to extract Location (Country/State/City) from
> > free text documents - any professional advice? Does OpenNLP really do
> > the job? Is it English only? US only? Or does it cover worldwide place
> > names?
> > Could someone help me with this job - installation, configuration,
> > model-training etc?
> >
> > Please help. Kind regards, Christian
> >  Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
> >
> >
> >  From: Upayavira 
> >  To: solr-user@lucene.apache.org
> >  Sent: Tuesday, November 3, 2015 12:13 PM
> >  Subject: Re: language plugin
> >
> > Looking at the code, this is not going to work without modifications to
> > Solr (or at least a custom component).
> >
> > The atomic update code is closely embedded into the Solr
> > DistributedUpdateProcessor, which expands the atomic update into a full
> > document and then posts it to the shards.
> >
> > You need to do the update expansion before your lang detect processor,
> > but there is no gap between them.
> >
> > From my reading of the code, you could create an AtomicUpdateProcessor
> > that simply expands updates, and insert that before the
> > LangDetectUpdateProcessor.
> >
> > Upayavira
> >
> > On Tue, Nov 3, 2015, at 06:38 AM, Chaushu, Shani wrote:
> > > Hi
> > > When I make atomic update - set field - also on content field and also
> > > another field, the language field became generic. Meaning, it doesn’t
> > > work in the set field, only in the first inserting. Even if in the
> first
> > > time the language was detected, it just became generic after the
> update.
> > > Any idea?
> > >
> > > The chain is
> > >
> > > <updateRequestProcessorChain name="langid">
> > >   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> > >     <str name="langid.fl">title,content,text</str>
> > >     <str name="langid.langField">language_t</str>
> > >     <str name="langid.langsField">language_all_t</str>
> > >     <str name="langid.fallback">generic</str>
> > >     <bool name="langid.overwrite">false</bool>
> > >     <float name="langid.threshold">0.8</float>
> > >   </processor>
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > > </updateRequestProcessorChain>
> > >
> > >
> > > Thanks,
> > > Shani
> > >
> > >
> > >
> > >
> > > -Original Message-
> > > From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> > > Sent: Thursday, October 29, 2015 17:04
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: language plugin
> > >
> > > Are you trying to do an atomic update without the content field? If so,
> > > it sounds like Solr needs an enhancement (bug fix?) so that language
> > > detection would be skipped if the input field is not present. Or maybe
> > > that could be an option.
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani <
> shani.chau...@intel.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > > I'm using the solr language detection plugin on the field named
> > > > "content" (solr 4.10, plugin
> > > > LangDetectLanguageIdentifierUpdateProcessorFactory).
> > > > When I'm indexing for the first time it works fine, but if I want to
> > > > set one field again (regardless of whether it's the content or not) it
> > > > goes back to its default language. If I'm setting another field I
> > > > would like the language to stay the way it was before, and I don't
> > > > want to insert all the content again. Is there an option to set the
> > > > plugin so that it won't calculate the language again? (Setting
> > > > langid.overwrite to false didn't work.)
> > > >
> > > > Thanks,
> > > > Shani
> > > >
> > > >

Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Here is the current info

How much memory is used?
Physical memory consumption: 5.48 GB out of 14 GB.
Swap space consumption: 5.83 GB out of 15.94 GB.
JVM-Memory consumption: 1.58 GB out of 3.83 GB.

What is your index size?
I have around 70M documents distributed on 2 shards (so each shard has 35M
document)

What type of queries are slow?
I am running normal queries (queries on a field); no faceting or highlights
are requested. Currently I am facing delays of 2-3 seconds, but previously I
had delays of around 28 seconds.

Are there GC pauses as they can be a cause of slowness?
I doubt this, as the slowness was happening over a long period of time.

Are document updates/additions happening in parallel?
No, I have stopped adding/updating documents and am doing queries only.

This is what you are already doing. Did you mean that you want to add more
shards?
No, what I meant is that I read that there was previously a way to chunk a
large index into multiple smaller ones and then do a distributed search over
them, as in this article: https://wiki.apache.org/solr/DistributedSearch.
What I was looking for is how this is handled in Solr Cloud.


Regards,
Salman





On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather 
wrote:

> What is your index size? How much memory is used? What type of queries are
> slow?
> Are there GC pauses as they can be a cause of slowness?
> Are document updates/additions happening in parallel?
>
> The queries are very slow to run so I was thinking to distribute
> the indexes into multiple indexes and consequently distributed search. Can
> anyone guide me to some sources (articles) that discuss this in Solr Cloud?
>
> This is what you are already doing. Did you mean that you want to add more
> shards?
>
> Regards,
> Modassar
>
> On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari 
> wrote:
>
> > Hi,
> >
> > I am using Solr cloud and I have created a single index that host around
> > 70M documents distributed into 2 shards (each having 35M documents) and 2
> > replicas. The queries are very slow to run so I was thinking to
> distribute
> > the indexes into multiple indexes and consequently distributed search.
> Can
> > anyone guide me to some sources (articles) that discuss this in Solr
> Cloud?
> >
> > Appreciate your feedback regarding this.
> >
> > Regards,
> > Salman
> >
>


Re: Invalid parsing with solr edismax operators

2015-11-05 Thread Mahmoud Almokadem
Thanks Jack. I have reported it as a bug on JIRA 

https://issues.apache.org/jira/browse/SOLR-8237 


Mahmoud 

> On Nov 4, 2015, at 5:30 PM, Jack Krupansky  wrote:
> 
> I think you should go ahead and file a Jira ticket for this as a bug since
> either it is an actual bug or some behavior nuance that needs to be
> documented better.
> 
> -- Jack Krupansky
> 
> On Wed, Nov 4, 2015 at 8:24 AM, Mahmoud Almokadem 
> wrote:
> 
>> I removed the q.op=“AND” and added mm=2
>> when searching for (public libraries) I got 19 with
>> "parsedquery_toString": "+(((Title:public^200.0 | TotalField:public^0.1)
>> (Title:libraries^200.0 | TotalField:libraries^0.1))~2)",
>> 
>> and when adding + and searching for +(public libraries) I got 1189 with
>> "parsedquery_toString": "+(+((Title:public^200.0 | TotalField:public^0.1)
>> (Title:libraries^200.0 | TotalField:libraries^0.1)))",
>> 
>> 
>> I think when adding + before parentheses I got all terms mandatory despite
>> the value of mm=2 in the two cases.
>> 
>> Mahmoud
>> 
>> 
>> 
>>> On Nov 4, 2015, at 3:04 PM, Alessandro Benedetti 
>> wrote:
>>> 
>>> Here we go :
>>> 
>>> Title^200 TotalField^1
>>> 
>>> + Jack explanation and you have the parsed query explained !
>>> 
>>> Cheers
>>> 
>>> On 4 November 2015 at 12:56, Mahmoud Almokadem 
>>> wrote:
>>> 
 Thank you Alessandro for your reply.
 
 Here is the request handler
 
 
 
 
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">TotalField</str>
    <str name="q.op">AND</str>
    <str name="defType">edismax</str>
    <str name="qf">Title^200 TotalField^1</str>
  </lst>
</requestHandler>
 
 
 
 Mahmoud
 
 
> On Nov 4, 2015, at 2:43 PM, Alessandro Benedetti <
>> abenede...@apache.org>
 wrote:
> 
> Hi Mahmoud,
> can you send us the solrconfig.xml snippet of your request handler
 please ?
> 
> It's kinda strange you get a boost factor for the Title field and that
> parsing query, according to your config.
> 
> Cheers
> 
> On 4 November 2015 at 08:39, Mahmoud Almokadem >> 
> wrote:
> 
>> Hello,
>> 
>> I'm using solr 4.8.1. Using edismax as the parser we got the
>> undesirable
>> parsed queries and results. The following is two different cases with
>> strange behavior: Searching with these parameters
>> 
>> "mm":"2",
>> "df":"TotalField",
>> "debug":"true",
>> "indent":"true",
>> "fl":"Title",
>> "start":"0",
>> "q.op":"AND",
>> "fq":"",
>> "rows":"10",
>> "wt":"json"
>> and the query is
>> 
>> "q":"+(public libraries)",
>> Retrieve 502 documents with these parsed query
>> 
>> "rawquerystring":"+(public libraries)",
>> "querystring":"+(public libraries)",
>> "parsedquery":"(+(+(DisjunctionMaxQuery((Title:public^200.0 |
>> TotalField:public^0.1)) DisjunctionMaxQuery((Title:libraries^200.0 |
>> TotalField:libraries^0.1)/no_coord",
>> "parsedquery_toString":"+(+((Title:public^200.0 |
>> TotalField:public^0.1)
>> (Title:libraries^200.0 | TotalField:libraries^0.1)))"
>> and if the query is
>> 
>> "q":" (public libraries) "
>> then it retrieves 8 documents with these parsed query
>> 
>> "rawquerystring":" (public libraries) ",
>> "querystring":" (public libraries) ",
>> "parsedquery":"(+((DisjunctionMaxQuery((Title:public^200.0 |
>> TotalField:public^0.1)) DisjunctionMaxQuery((Title:libraries^200.0 |
>> TotalField:libraries^0.1)))~2))/no_coord",
>> "parsedquery_toString":"+(((Title:public^200.0 |
>> TotalField:public^0.1)
>> (Title:libraries^200.0 | TotalField:libraries^0.1))~2)"
>> So the results of adding "+" to get all tokens before the parenthesis
>> retrieve more results than removing it.
>> 
>> Is this a bug on this version or there are something missing?
> 
> 
> 
> 
> --
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England
 
 
>>> 
>>> 
>>> --
>>> --
>>> 
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>> 
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>> 
>>> William Blake - Songs of Experience -1794 England
>> 
>> 



[SolrJ Clients] RequestWriter VS BinaryRequestWriter

2015-11-05 Thread Alessandro Benedetti
Hi guys,
I was taking a look at the implementation details to understand how Solr
requests are written by the SolrJ APIs.
The interesting classes are :

*org.apache.solr.client.solrj.request.RequestWriter*

*org.apache.solr.client.solrj.impl.BinaryRequestWriter* ( wrong package ? )

I discovered that :

*CloudSolrClient* - is using the javabin format (*BinaryRequestWriter*)
*HttpSolrClient* and *LBHttpSolrClient* - are using the *RequestWriter*
(which writes XML)

Consequently, the ConcurrentUpdateSolrClient is using the XML RequestWriter
as well.

Is there any reason for this?
As far as I know, the javabin format is the most efficient for Solr requests.
Why is the XML RequestWriter still used as the default with those SolrClients?
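
For reference, a minimal sketch of how a client can opt in to javabin
explicitly (the base URL and collection name are placeholders):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class JavabinClientExample {
  public static void main(String[] args) {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/mycollection");
    // send requests as javabin instead of the default XML
    client.setRequestWriter(new BinaryRequestWriter());
  }
}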

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Boosting a document score when advertised! Please help!

2015-11-05 Thread liviuchristian
Hi everyone, I'm building a food recipe search engine based on Solr.

I need to boost the document score for recipes whose authors paid in order
to have them returned first when somebody searches for "chocolate cake with
hazelnuts". So those recipes that match the query terms and whose authors
paid to be listed first need to be returned first, ahead of the unpaid ones
that match the query.

How do I do that in Solr?
PLEASE HELP!
Regards, 
Christian
 


Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Alessandro Benedetti
Hi Christian,
there are several ways:

1) The Query Elevation component - it should be your winner:
https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component

2) Play with boosting according to your requirements
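
For option 2, a minimal sketch, assuming you index a hypothetical boolean
field named "paid" with each recipe (the boost value is arbitrary and needs
tuning):

q=chocolate cake with hazelnuts
defType=edismax
qf=title ingredients
bq=paid:true^100

A boost query like this lifts paid recipes that match the user's terms above
unpaid matches, while paid recipes that don't match the query are still not
returned at all.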

Cheers

On 5 November 2015 at 10:52,  wrote:

> Hi everyone,I'm building a food recipe search engine based on solr.
>
> I need to boost documents score for the recipes that their authors paid
> for in order to have them returned first when somebody searches for
> "chocolate cake with hazelnuts". So those recipes that match the query
> terms and their authors paid to be listed first need to be returned first,
> ahead of the unpaid ones that match the query.
>
> How do I do that in Solr?
> PLEASE HELP!
> Regards,
> Christian
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
Well, I've started to answer, but it hit a nerve and turned into a guide,
which is now a blog post with 6 steps (not counting step 0:
admitting you have a problem).

I hope this is helpful:
http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 01:08, Salman Ansari  wrote:
> Hi,
>
> I am in the process of looking for a comprehensive list of Solr features in
> order to assess how much have we implemented, what are some features that
> we were unaware of that we can utilize etc. I have looked at the following
> link for Solr features http://lucene.apache.org/solr/features.html but it
> looks like it highlights the main features. I also looked at this page
> http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> details and I am looking for more of such list and possibly a comprehensive
> list that combines them all.
>
> Regards,
> Salman


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Thanks for your response. I have already gone through those documents
before. My point was: if I am using Solr Cloud, is the only way to
distribute my indexes by adding shards? And I don't have to do anything
manually (because all the distributed search is handled by Solr Cloud)?

What is the Xms and Xmx you are allocating to Solr and how much max is used by
your solr?
Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB

How many segments are there in the index? The more the segment the slower is
the search.
How do I check how many segments there are in the index?

Is this after you moved to solrcloud?
I have been using SolrCloud from the beginning.

Regards,
Salman


On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather 
wrote:

> SolrCloud makes the distributed search easier. You can find details about
> it under following link.
> https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
>
> You can also refer to following link:
>
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
> From size of your index I meant index size and not the total document
> alone.
> How many segments are there in the index? The more the segment the slower
> is the search.
> What is the Xms and Xmx you are allocating to Solr and how much max is used
> by your solr?
>
> I doubt this as the slowness was happening for a long period of time.
> I mentioned this point as I have seen gc pauses of 30 seconds and more in
> some complex queries.
>
> I am facing delay of 2-3 seconds but previously I
> had delays of around 28 seconds.
> Is this after you moved to solrcloud?
>
> Regards,
> Modassar
>
>
> On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari 
> wrote:
>
> > Here is the current info
> >
> > How much memory is used?
> > Physical memory consumption: 5.48 GB out of 14 GB.
> > Swap space consumption: 5.83 GB out of 15.94 GB.
> > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> >
> > What is your index size?
> > I have around 70M documents distributed on 2 shards (so each shard has
> 35M
> > document)
> >
> > What type of queries are slow?
> > I am running normal queries (queries on a field) no faceting or
> highlights
> > are requested. Currently, I am facing delay of 2-3 seconds but
> previously I
> > had delays of around 28 seconds.
> >
> > Are there GC pauses as they can be a cause of slowness?
> > I doubt this as the slowness was happening for a long period of time.
> >
> > Are document updates/additions happening in parallel?
> > No, I have stopped adding/updating documents and doing queries only.
> >
> > This is what you are already doing. Did you mean that you want to add
> more
> > shards?
> > No, what I meant is that I read that previously there was a way to chunk
> a
> > large index into multiple and then do distributed search on that as in
> this
> > article https://wiki.apache.org/solr/DistributedSearch. What I was
> looking
> > for how this is handled in Solr Cloud?
> >
> >
> > Regards,
> > Salman
> >
> >
> >
> >
> >
> > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather 
> > wrote:
> >
> > > What is your index size? How much memory is used? What type of queries
> > are
> > > slow?
> > > Are there GC pauses as they can be a cause of slowness?
> > > Are document updates/additions happening in parallel?
> > >
> > > The queries are very slow to run so I was thinking to distribute
> > > the indexes into multiple indexes and consequently distributed search.
> > Can
> > > anyone guide me to some sources (articles) that discuss this in Solr
> > Cloud?
> > >
> > > This is what you are already doing. Did you mean that you want to add
> > more
> > > shards?
> > >
> > > Regards,
> > > Modassar
> > >
> > > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari  >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am using Solr cloud and I have created a single index that host
> > around
> > > > 70M documents distributed into 2 shards (each having 35M documents)
> > and 2
> > > > replicas. The queries are very slow to run so I was thinking to
> > > distribute
> > > > the indexes into multiple indexes and consequently distributed
> search.
> > > Can
> > > > anyone guide me to some sources (articles) that discuss this in Solr
> > > Cloud?
> > > >
> > > > Appreciate your feedback regarding this.
> > > >
> > > > Regards,
> > > > Salman
> > > >
> > >
> >
>


Re: Child document and parent document with same key

2015-11-05 Thread Jamie Johnson
The field is "key", and this is the value of uniqueKey in schema.xml.
On Oct 17, 2015 3:23 AM, "Mikhail Khludnev" 
wrote:

> Hello,
>
> What are the field names for parent and child docs exactly?
> Whats'  in schema.xml?
> What you've got if you actually try to do this?
>
> On Fri, Oct 16, 2015 at 12:41 PM, Jamie Johnson  wrote:
>
> > I am looking at using child documents and noticed that if I specify a
> child
> > and parent with the same key solr indexes this fine and I can retrieve
> both
> > documents separately.  Is this expected to work?
> >
> > -Jamie
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
SolrCloud makes distributed search easier. You can find details about
it under the following link.
https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

You can also refer to the following link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

From "size of your index" I meant the index size and not the total document
count alone.
How many segments are there in the index? The more segments, the slower
the search.
What are the Xms and Xmx you are allocating to Solr, and how much is used at
most by your Solr?

I doubt this as the slowness was happening for a long period of time.
I mentioned this point as I have seen GC pauses of 30 seconds and more on
some complex queries.

I am facing delay of 2-3 seconds but previously I
had delays of around 28 seconds.
Is this after you moved to solrcloud?

Regards,
Modassar


On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari 
wrote:

> Here is the current info
>
> How much memory is used?
> Physical memory consumption: 5.48 GB out of 14 GB.
> Swap space consumption: 5.83 GB out of 15.94 GB.
> JVM-Memory consumption: 1.58 GB out of 3.83 GB.
>
> What is your index size?
> I have around 70M documents distributed on 2 shards (so each shard has 35M
> document)
>
> What type of queries are slow?
> I am running normal queries (queries on a field) no faceting or highlights
> are requested. Currently, I am facing delay of 2-3 seconds but previously I
> had delays of around 28 seconds.
>
> Are there GC pauses as they can be a cause of slowness?
> I doubt this as the slowness was happening for a long period of time.
>
> Are document updates/additions happening in parallel?
> No, I have stopped adding/updating documents and doing queries only.
>
> This is what you are already doing. Did you mean that you want to add more
> shards?
> No, what I meant is that I read that previously there was a way to chunk a
> large index into multiple and then do distributed search on that as in this
> article https://wiki.apache.org/solr/DistributedSearch. What I was looking
> for how this is handled in Solr Cloud?
>
>
> Regards,
> Salman
>
>
>
>
>
> On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather 
> wrote:
>
> > What is your index size? How much memory is used? What type of queries
> are
> > slow?
> > Are there GC pauses as they can be a cause of slowness?
> > Are document updates/additions happening in parallel?
> >
> > The queries are very slow to run so I was thinking to distribute
> > the indexes into multiple indexes and consequently distributed search.
> Can
> > anyone guide me to some sources (articles) that discuss this in Solr
> Cloud?
> >
> > This is what you are already doing. Did you mean that you want to add
> more
> > shards?
> >
> > Regards,
> > Modassar
> >
> > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari 
> > wrote:
> >
> > > Hi,
> > >
> > > I am using Solr cloud and I have created a single index that host
> around
> > > 70M documents distributed into 2 shards (each having 35M documents)
> and 2
> > > replicas. The queries are very slow to run so I was thinking to
> > distribute
> > > the indexes into multiple indexes and consequently distributed search.
> > Can
> > > anyone guide me to some sources (articles) that discuss this in Solr
> > Cloud?
> > >
> > > Appreciate your feedback regarding this.
> > >
> > > Regards,
> > > Salman
> > >
> >
>


Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
On 5 November 2015 at 11:22, Shawn Heisey  wrote:
> As far as I know, there are no currently available books covering
> version 5, but I believe there is at least one on the horizon.

Rafal's book is "compatible" with Solr 5:
http://solr.pl/solr-cookbook-third-edition/ . But the number of
features and changes introduced in 5.1, 5.2, AND 5.3 makes writing any
book on the topic quite hard. Speaking from experience.

Regards,
   Alex.
P.s. My last book of course targeted the latest and greatest 4.3 :-) I
no longer recommend people buy it. The concepts might all still be
valid, but the step-by-step guides would be quite broken.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: Invalid parsing with solr edismax operators

2015-11-05 Thread Jack Krupansky
Great. Now, we'll have to see if any enterprising committers will step up
and take a look.

-- Jack Krupansky

On Thu, Nov 5, 2015 at 4:46 AM, Mahmoud Almokadem 
wrote:

> Thanks Jack. I have reported it as a bug on JIRA
>
> https://issues.apache.org/jira/browse/SOLR-8237 <
> https://issues.apache.org/jira/browse/SOLR-8237>
>
> Mahmoud
>
> > On Nov 4, 2015, at 5:30 PM, Jack Krupansky 
> wrote:
> >
> > I think you should go ahead and file a Jira ticket for this as a bug
> since
> > either it is an actual bug or some behavior nuance that needs to be
> > documented better.
> >
> > -- Jack Krupansky
> >
> > On Wed, Nov 4, 2015 at 8:24 AM, Mahmoud Almokadem <
> prog.mahm...@gmail.com>
> > wrote:
> >
> >> I removed the q.op=“AND” and added mm=2
> >> when searching for (public libraries) I got 19 with
> >> "parsedquery_toString": "+(((Title:public^200.0 | TotalField:public^0.1)
> >> (Title:libraries^200.0 | TotalField:libraries^0.1))~2)",
> >>
> >> and when adding + and searching for +(public libraries) I got 1189 with
> >> "parsedquery_toString": "+(+((Title:public^200.0 |
> TotalField:public^0.1)
> >> (Title:libraries^200.0 | TotalField:libraries^0.1)))",
> >>
> >>
> >> I think when adding + before parentheses I got all terms mandatory
> despite
> >> the value of mm=2 in the two cases.
> >>
> >> Mahmoud
> >>
> >>
> >>
> >>> On Nov 4, 2015, at 3:04 PM, Alessandro Benedetti <
> abenede...@apache.org>
> >> wrote:
> >>>
> >>> Here we go :
> >>>
> >>> Title^200 TotalField^1
> >>>
> >>> + Jack explanation and you have the parsed query explained !
> >>>
> >>> Cheers
> >>>
> >>> On 4 November 2015 at 12:56, Mahmoud Almokadem  >
> >>> wrote:
> >>>
>  Thank you Alessandro for your reply.
> 
>  Here is the request handler
> 
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">TotalField</str>
>     <str name="q.op">AND</str>
>     <str name="defType">edismax</str>
>     <str name="qf">Title^200 TotalField^1</str>
>   </lst>
> </requestHandler>
>  Mahmoud
> 
> 
> > On Nov 4, 2015, at 2:43 PM, Alessandro Benedetti <
> >> abenede...@apache.org>
>  wrote:
> >
> > Hi Mahmoud,
> > can you send us the solrconfig.xml snippet of your request handler
>  please ?
> >
> > It's kinda strange you get a boost factor for the Title field and
> that
> > parsing query, according to your config.
> >
> > Cheers
> >
> > On 4 November 2015 at 08:39, Mahmoud Almokadem <
> prog.mahm...@gmail.com
> >>>
> > wrote:
> >
> >> Hello,
> >>
> >> I'm using solr 4.8.1. Using edismax as the parser we got the
> >> undesirable
> >> parsed queries and results. The following is two different cases
> with
> >> strange behavior: Searching with these parameters
> >>
> >> "mm":"2",
> >> "df":"TotalField",
> >> "debug":"true",
> >> "indent":"true",
> >> "fl":"Title",
> >> "start":"0",
> >> "q.op":"AND",
> >> "fq":"",
> >> "rows":"10",
> >> "wt":"json"
> >> and the query is
> >>
> >> "q":"+(public libraries)",
> >> Retrieve 502 documents with these parsed query
> >>
> >> "rawquerystring":"+(public libraries)",
> >> "querystring":"+(public libraries)",
> >> "parsedquery":"(+(+(DisjunctionMaxQuery((Title:public^200.0 |
> >> TotalField:public^0.1)) DisjunctionMaxQuery((Title:libraries^200.0 |
> >> TotalField:libraries^0.1)/no_coord",
> >> "parsedquery_toString":"+(+((Title:public^200.0 |
> >> TotalField:public^0.1)
> >> (Title:libraries^200.0 | TotalField:libraries^0.1)))"
> >> and if the query is
> >>
> >> "q":" (public libraries) "
> >> then it retrieves 8 documents with these parsed query
> >>
> >> "rawquerystring":" (public libraries) ",
> >> "querystring":" (public libraries) ",
> >> "parsedquery":"(+((DisjunctionMaxQuery((Title:public^200.0 |
> >> TotalField:public^0.1)) DisjunctionMaxQuery((Title:libraries^200.0 |
> >> TotalField:libraries^0.1)))~2))/no_coord",
> >> "parsedquery_toString":"+(((Title:public^200.0 |
> >> TotalField:public^0.1)
> >> (Title:libraries^200.0 | TotalField:libraries^0.1))~2)"
> >> So the results of adding "+" to get all tokens before the
> parenthesis
> >> retrieve more results than removing it.
> >>
> >> Is this a bug on this version or there are something missing?
> >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> 
> 
> >>>
> >>>
> >>> --
> >>> --
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card : http://about.me/alessandro_benedetti
> >>>
> 

Re: Solr Features

2015-11-05 Thread Shawn Heisey
On 11/5/2015 8:38 AM, Jack Krupansky wrote:
> It's unfortunate, but the official Solr reference guide does not have a
> table of contents:
> http://mirror.olnevhost.net/pub/apache/lucene/solr/ref-guide/apache-solr-ref-guide-5.3.pdf
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

While it's true that there is no table of contents included in the
reference guide text, Acrobat Reader will automatically generate a table
of contents from markers within the guide for navigation purposes.

See the left side of this window:

https://www.dropbox.com/s/6foaz7xeq11vyuy/solr-ref-guide-toc.png?dl=0

> My Solr 4.4 Deep Dive is now a little outdated (since 4.4) and even then
> was not complete (no SolrCloud or DIH), but its table of contents would
> probably give you a fair view of the sheer magnitude of the number of Solr
> features:
> http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
>
> It probably still has the most in-depth coverage and examples for token
> analysis and update processors, even though more recent Solr changes are
> not covered.

I have not seen your book.  I bet it's awesome, and for $10 I should
just go ahead and buy it.

The recent title "Solr In Action" covers Solr pretty well, though it is
somewhat pricy.  I have not read all of it.

https://www.manning.com/books/solr-in-action?a_bid=39472865_aid=1

As far as I know, there are no currently available books covering
version 5, but I believe there is at least one on the horizon.

Thanks,
Shawn



Re: highlighting on child document

2015-11-05 Thread Yangrui Guo
So if child document highlighting doesn't work, how can I get Solr to tell
me which child document and which of its fields matched?

On Wednesday, November 4, 2015, Mikhail Khludnev 
wrote:

> Hello,
>
> Highlighter for block join hasn't been implemented. So, far you can call
> highlighter with children query also passing fq={!child
> ..}parent-id:.
>
> On Wed, Nov 4, 2015 at 7:57 PM, Yangrui Guo  > wrote:
>
> > Hi
> >
> > I want to highlight matched terms on child documents because I need to
> > determine which field matched the search terms. However when I use block
> > join solr returned empty highlight fields. How can I use highlight with
> > nested document? Or is there anyway to tell which field matched the query
> > terms?
> >
> > Yangrui
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> >
>


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
thanks!

but it is silly that I can't seem to escape the {!sum=true} properly to make
it work in my curl :-(

 time curl -d
'q=*:*&rows=0&shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2&stats=true&stats.field={!sum=true}myfieldname'
http://localhost:8080/solr/413-1/select/? | xmllint --format -

Double quotes or single quotes, escaping only the ! or escaping both { and !:
nothing will make it work.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Alexandre Rafalovitch
Ah, Unix. Isn't it wonderful (it is, but):
http://unix.stackexchange.com/questions/3051/how-to-echo-a-bang

Try single quotes and backslash before the bang. Or disable history characters.
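
For example (a sketch reusing the host/core/field names from your message;
exact behaviour depends on your shell):

# option 1: disable bash history expansion, then single quotes suffice
set +H

# option 2: let curl URL-encode the local-params syntax for you
curl 'http://localhost:8080/solr/413-1/select/' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'stats=true' \
  --data-urlencode 'stats.field={!sum=true}myfieldname'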

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 14:20, Renee Sun  wrote:
> thanks!
>
> but it is silly that I can seem to escape the {!sum=true} properly to make
> it work in my curl :-(
>
>  time curl -d
> 'q=*:*&rows=0&shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2&stats=true&stats.field={!sum=true}myfieldname'
> http://localhost:8080/solr/413-1/select/? | xmllint --format -
>
> double quote or single quote, only escape ! or escape all { and !, nothing
> will make it work.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238478.html
> Sent from the Solr - User mailing list archive at Nabble.com.


how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Hi -
I have been using stats to get the sum of an int field like:

stats=true&stats.field=my_field_name&rows=0

It works fine, but when the index has hundreds of millions of messages on
sharded indices, it takes a long time.

I noticed the 'stats' component gives out more information than I need (just
the sum); I suspect the min/max/mean etc. are the ones that cost the time.

Is there a simple way I can get just the sum without the other statistics,
and run it in a way that is faster and less stressful to the Solr server?

Thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Yonik Seeley
You can also try the new JSON Facet API if you are on a recent version of Solr.

json.facet={x:"sum(myfield)"}

http://yonik.com/solr-facet-functions/
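
A minimal example (collection and field names are placeholders):

curl http://localhost:8983/solr/mycollection/query -d 'q=*:*&rows=0&json.facet={x:"sum(myfield)"}'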

-Yonik


On Thu, Nov 5, 2015 at 1:14 PM, Renee Sun  wrote:
> Hi -
> I have been using stats to get the sum of a field data (int) like:
>
> stats=true&stats.field=my_field_name&rows=0
>
> It works fine but when the index has hundreds million messages on a sharded
> indices, it take long time.
>
> I noticed the 'stats' give out more information than I needed (just sum), I
> suspect the min/max/mean etc are the ones that caused the time.
>
> Is there a simple way I can just get the sum without other things, and run
> it on a faster and less stressed to the solr server manner?
>
> Thanks
> Renee
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
I did try single quotes with a backslash before the bang,
and also tried disabling history chars...

It did not work for me.

Unfortunately, we are using Solr 3.5, which probably does not support the
JSON format?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238497.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Chris Hostetter
: stats=true&stats.field=my_field_name&rows=0
...
: I noticed the 'stats' give out more information than I needed (just sum), I
: suspect the min/max/mean etc are the ones that caused the time. 
: 
: Is there a simple way I can just get the sum without other things, and run
: it on a faster and less stressed to the solr server manner?

Yes...

  stats.field={!sum=true}my_field_name

https://cwiki.apache.org/confluence/display/solr/The+Stats+Component#TheStatsComponent-StatisticsSupported
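
For example, a full request would look something like this (remember to
URL-encode the braces when sending it over HTTP):

q=*:*&rows=0&stats=true&stats.field={!sum=true}my_field_name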


-Hoss
http://www.lucidworks.com/


Re: Child document and parent document with same key

2015-11-05 Thread Mikhail Khludnev
On Fri, Oct 16, 2015 at 10:41 PM, Jamie Johnson  wrote:

> Is this expected to work?


I think it is. I'm still not sure I understand the question. But let me
bring some details from SOLR-3076:
- Solr's uniqueKey backs on Lucene's "deleteTerm", which is supplied to
indexWriter.updateDocument();
- when a parent document has children, uniqueKey is not the deleteTerm;
instead, its value is used as the "deleteTerm" for the field "_root_", see
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L251
- thus for block updates the uniqueKey is (almost) meaningless.
It lacks elegance, but that's it.
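
To illustrate, a minimal sketch of the scenario from the original question,
assuming "key" is the uniqueKey field (standard nested-document XML; Solr
populates _root_ itself):

<add>
  <doc>
    <field name="key">doc-1</field>    <!-- parent -->
    <doc>
      <field name="key">doc-1</field>  <!-- child with the same key -->
    </doc>
  </doc>
</add>

Both documents end up in the index because the delete term for the block is
derived from _root_, not from each document's own key.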

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Solr Search: Access Control / Role based security

2015-11-05 Thread Susheel Kumar
Hi,

I have seen a couple of use cases / needs where we want to restrict the
results of a search based on the role of a user. For example:

- if user role is admin, any document from the search result will be
returned
- if user role is manager, only documents intended for managers will be
returned
- if user role is worker, only documents intended for workers will be
returned

The typical practice is to tag the documents with the roles (using a
multi-valued field) during indexing and then, during search, append a filter
query to restrict the results based on roles.
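
For example, a minimal sketch assuming a hypothetical multi-valued "roles"
field:

At index time, each document carries the roles allowed to see it:
  roles: ["admin", "manager"]

At query time, the application (not the end user) appends a filter
built from the authenticated user's roles:
  q=quarterly report&fq=roles:(admin OR manager)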

Wondering if there is any other better way out there and if this common
requirement should be added as a Solr feature/plugin.

The current security plugins are more towards making Solr APIs/resources
secure, not towards securing/controlling data during search.
https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins


Please share your thoughts.

Thanks,
Susheel


Re: tikaparser docx file fails with exception

2015-11-05 Thread Alexandre Rafalovitch
It is quite clear actually that the problem is this:
Caused by: java.io.CharConversionException: Characters larger than 4
bytes are not supported: byte 0xb7 implies a length of more than 4
bytes
  at 
org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
  at 
org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
  at 
org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
  at 
org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3477)

If you search for something like: PiccoloLexer.yy_refill Characters
larger than 4 bytes are not supported:
you get lots of various matches in different forums for different
(java-based? tika-based?) software. Most likely Tika found something
obscure in the document that there is no implementation for yet - e.g.
an image inside a text field inside a footer section, just as an
example.

I would basically try standalone Tika and look for the most expressive
debug flag. It should tell you which file inside the zip (which is what a
docx actually is) caused the problem. That should give you some hint.
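
For example, with the standalone tika-app jar (the version number is a
placeholder; -t extracts plain text, -m dumps metadata only):

java -jar tika-app-1.10.jar -t failing.docx
java -jar tika-app-1.10.jar -m failing.docx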

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 17:36, Aswath Srinivasan (TMS)
 wrote:
> Thank you for attempting to answer. I will try out with solrj and standalone 
> java with tika parser. I completely understand that a bad document could 
> cause this, however, when I opened up the document I couldn't find anything 
> suspicious except for some binary images/pictures embedded into the document.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, November 04, 2015 4:33 PM
> To: solr-user 
> Subject: Re: tikaparser docx file fails with exception
>
> Possibly a corrupt file? Tika does its best, but bad data is...bad data.
>
> You can experiment a bit with using Tika in Java, that might give you a 
> better idea of what's really going on, here's a SolrJ example:
>
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
> On Wed, Nov 4, 2015 at 3:49 PM, Aswath Srinivasan (TMS) 
>  wrote:
>>
>> Trying to index a document. A docx file. Ending up with the below exception. 
>> Not sure why it is erroring out. When I opened the docx I was able to see 
>> lots of binary data like embedded pictures etc., Is there a possible 
>> solution to this or is it a bug? Only one such file fails. Rest of the files 
>> are smoothly indexed.
>>
>> 2015-11-04 23:16:11.549 INFO  (coreLoadExecutor-6-thread-1) [   x:tika] 
>> o.a.s.c.CoreContainer registering core: tika
>> 2015-11-04 23:16:11.549 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore 
>> QuerySenderListener sending requests to Searcher@1eb69b2[tika] 
>> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
>> 2015-11-04 23:16:11.585 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] 
>> o.a.s.c.S.Request [tika] webapp=null path=null 
>> params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
>>  hits=0 status=0 QTime=34
>> 2015-11-04 23:16:11.586 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore 
>> QuerySenderListener done.
>> 2015-11-04 23:16:11.586 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] 
>> o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: default
>> 2015-11-04 23:16:11.586 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] 
>> o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: wordbreak
>> 2015-11-04 23:16:11.586 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] 
>> o.a.s.h.c.SuggestComponent buildOnStartup: mySuggester
>> 2015-11-04 23:16:11.586 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] 
>> o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
>> 2015-11-04 23:16:11.605 INFO  
>> (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore 
>> [tika] Registered new searcher Searcher@1eb69b2[tika] 
>> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
>> 2015-11-04 23:16:25.923 INFO  (qtp7980742-16) [   x:tika] 
>> o.a.s.h.d.DataImporter Loading DIH Configuration: tika-data-config.xml
>> 2015-11-04 23:16:25.937 INFO  (qtp7980742-16) [   x:tika] 
>> o.a.s.h.d.DataImporter Data Configuration loaded successfully
>> 2015-11-04 23:16:25.947 INFO  (qtp7980742-16) [   x:tika] o.a.s.c.S.Request 
>> [tika] webapp=/solr path=/dataimport 
>> params={debug=false=false=true=true=true=json=full-import=false}
>>  status=0 QTime=28
>> 2015-11-04 23:16:25.948 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DataImporter 
>> Starting Full Import
>> 2015-11-04 23:16:25.961 INFO  (Thread-17) [   

Re: [Newbie question] in SOLR 5, would I have a "master-to-slave" relationship for two servers?

2015-11-05 Thread Erick Erickson
To pile on to Chris' comment. In the M/S situation
you describe, all the query traffic goes to the slave.

True, this relieves the slave from doing the work of
indexing, but it _also_ prevents the master from
answering queries. So going to SolrCloud trades off
doing the indexing on _both_ machines against also being
able to query on _both_ machines.

And this doesn't even take into account the issues
involved in recovering if one or the other (especially
the master) goes down, which is automatically
handled in SolrCloud.

Add to that the fact that memory management is
_very_ significantly improved starting with Solr
4x (see: 
https://lucidworks.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/)
and my claim is that you are _far_ better off
using SolrCloud than M/S in 5x.

As always, YMMV of course.

Best,
Erick


On Thu, Nov 5, 2015 at 1:12 PM, Chris Hostetter
 wrote:
>
> : The database of server 2 is considered the "master" and it is replicated
> : regularly to server 1, the "slave".
> :
> : The advantage is the responsiveness of server 1 is not impacted with server
> : 2 gets busy with lots of indexing.
> :
> : QUESTION: When deploying a SOLR 5 setup, do I set things up the same way?
> : Or do I cluster bother servers together into one "cloud"?   That is, in
> : SOLR 5, how do I ensure the indexing process will not impact the
> : performance of the web app?
>
> There is nothing preventing you from using a master slave setup with Solr
> 5...
>
> https://cwiki.apache.org/confluence/display/solr/Index+Replication
>
> ...however if you do so you have to take responsibility for the same
> risks/tradeoffs that existed with this type of setup in Solr 3...
>
> 1) if the "query slave" goes down, you can't serve quiers w/o manually
> redirecting traffic to your "indexing master"
>
> 2) if the "indexing master" goes down you can't accept index updates w/o
> manually redirecting update to your "query slave" -- and manually
> rectifying the descrepencies if/when your master comes back online.
>
>
> When using a cloud based setup these types of problems go away because
> there is no single "master", clients can send updates/queries to any node
> (and if you use SolrJ your clients will be "ZK aware" and know
> automatically if/when a node is down or new nodes are added) ...
> many people concerned about performance/reliability consider these
> benefits more important then the risks/tradeoffs of performance impacts of
> indexing directy to nodes that are serving queries -- especially with
> other NRT (Near Real Time) related improvements to Solr over the years
> (Soft Commits, DocValues instead of FieldCache, etc...)
>
>
> -Hoss
> http://www.lucidworks.com/


RE: tikaparser docx file fails with exception

2015-11-05 Thread Aswath Srinivasan (TMS)
Thank you for attempting to answer. I will try out solrj and standalone
Java with the Tika parser. I completely understand that a bad document could
cause this; however, when I opened up the document I couldn't find anything
suspicious except for some binary images/pictures embedded into the document.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, November 04, 2015 4:33 PM
To: solr-user 
Subject: Re: tikaparser docx file fails with exception

Possibly a corrupt file? Tika does its best, but bad data is...bad data.

You can experiment a bit with using Tika in Java, that might give you a better 
idea of what's really going on, here's a SolrJ example:

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Nov 4, 2015 at 3:49 PM, Aswath Srinivasan (TMS) 
 wrote:
>
> Trying to index a document. A docx file. Ending up with the below exception. 
> Not sure why it is erroring out. When I opened the docx I was able to see 
> lots of binary data like embedded pictures etc., Is there a possible solution 
> to this or is it a bug? Only one such file fails. Rest of the files are 
> smoothly indexed.
>
> 2015-11-04 23:16:11.549 INFO  (coreLoadExecutor-6-thread-1) [   x:tika] 
> o.a.s.c.CoreContainer registering core: tika
> 2015-11-04 23:16:11.549 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore QuerySenderListener sending requests to 
> Searcher@1eb69b2[tika] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 2015-11-04 23:16:11.585 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.S.Request [tika] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
>  hits=0 status=0 QTime=34
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore QuerySenderListener done.
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for 
> spellchecker: default
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for 
> spellchecker: wordbreak
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.h.c.SuggestComponent buildOnStartup: mySuggester
> 2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
> 2015-11-04 23:16:11.605 INFO  (searcherExecutor-7-thread-1-processing-x:tika) 
> [   x:tika] o.a.s.c.SolrCore [tika] Registered new searcher 
> Searcher@1eb69b2[tika] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> 2015-11-04 23:16:25.923 INFO  (qtp7980742-16) [   x:tika] 
> o.a.s.h.d.DataImporter Loading DIH Configuration: tika-data-config.xml
> 2015-11-04 23:16:25.937 INFO  (qtp7980742-16) [   x:tika] 
> o.a.s.h.d.DataImporter Data Configuration loaded successfully
> 2015-11-04 23:16:25.947 INFO  (qtp7980742-16) [   x:tika] o.a.s.c.S.Request 
> [tika] webapp=/solr path=/dataimport 
> params={debug=false=false=true=true=true=json=full-import=false}
>  status=0 QTime=28
> 2015-11-04 23:16:25.948 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DataImporter 
> Starting Full Import
> 2015-11-04 23:16:25.961 INFO  (Thread-17) [   x:tika] 
> o.a.s.h.d.SimplePropertiesWriter Read dataimport.properties
> 2015-11-04 23:16:25.966 INFO  (qtp7980742-14) [   x:tika] o.a.s.c.S.Request 
> [tika] webapp=/solr path=/dataimport 
> params={indent=true=json=status&_=1446678985952} status=0 QTime=1
> 2015-11-04 23:16:25.998 INFO  (Thread-17) [   x:tika] o.a.s.c.SolrCore [tika] 
> REMOVING ALL DOCUMENTS FROM INDEX
> 2015-11-04 23:16:26.728 ERROR (Thread-17) [   x:tika] 
> o.a.s.h.d.EntityProcessorWrapper Exception in entity : 
> documentImport:org.apache.solr.handler.dataimport.DataImportHandlerException: 
> Unable to read content Processing Document # 1
>
>   at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
> hrow(DataImportHandlerException.java:70)
>
>   at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEnt
> ityProcessor.java:168)
>
>   at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Enti
> tyProcessorWrapper.java:243)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder
> .java:475)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder
> .java:514)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder
> .java:414)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.ja
> va:329)
>
>   at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:
> 232)
>
>   at 
> 

SolrSpatial conversion error

2015-11-05 Thread Gangl, Michael E (398H)
I’m processing some satellite coverage data and storing it in Solr to search
by geographical region. I can create the correct WKT, and it passes ‘invalid’
tests when created, but when it is output to WKT and then ingested into Solr,
it looks like some string-to-digit conversion errors are happening:

2015-11-05 23:24:03.272 ERROR (qtp1125757038-18) [   x:l2ssCore] 
o.a.s.c.SolrCore org.apache.solr.common.SolrException: Couldn't parse shape 
'POLYGON ((39.42654 86.82489, -22.74477 87.94481, -51.87799 87.34623, -70.80492 
86.02579, -80.82939 84.22955, -87.55906 81.48592, -91.99886 77.37768, -94.95214 
71.18504, -109.15262 71.1237, -122.03073 70.07185, -132.71886 68.30231, 
-143.40538 65.33532, -159.34148 70.66631, -180 73.53569, -180 90, 180 90, 180 
73.53569, 157.67432 73.89309, 154.67627 78.65489, 149.71222 82.05602, 142.35925 
84.34942, 131.24057 85.93911, 89.5779 87.4869, 39.42654 86.82489))' because: 
com.vividsolutions.jts.geom.TopologyException: side location conflict [ 
(39.426539, 86.82489, NaN) ]

The conflict point (39.426539, 86.82489, NaN) isn't in the original
WKT, so it looks like it is being created or synthesized somewhere within
Solr. Has anyone run into this issue before? Are there configuration options
that can help prevent this situation?


Full stack trace:

l2ss-solr_1| 2015-11-05 23:24:03.270 INFO  (qtp1125757038-18) [   
x:l2ssCore] o.a.s.u.p.LogUpdateProcessor [l2ssCore] webapp=/solr path=/update 
params={wt=javabin=2} {} 0 157
l2ss-solr_1| 2015-11-05 23:24:03.272 ERROR (qtp1125757038-18) [   
x:l2ssCore] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Couldn't 
parse shape 'POLYGON ((39.42654 86.82489, -22.74477 87.94481, -51.87799 
87.34623, -70.80492 86.02579, -80.82939 84.22955, -87.55906 81.48592, -91.99886 
77.37768, -94.95214 71.18504, -109.15262 71.1237, -122.03073 70.07185, 
-132.71886 68.30231, -143.40538 65.33532, -159.34148 70.66631, -180 73.53569, 
-180 90, 180 90, 180 73.53569, 157.67432 73.89309, 154.67627 78.65489, 
149.71222 82.05602, 142.35925 84.34942, 131.24057 85.93911, 89.5779 87.4869, 
39.42654 86.82489))' because: com.vividsolutions.jts.geom.TopologyException: 
side location conflict [ (39.426539, 86.82489, NaN) ]
l2ss-solr_1| at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:236)
l2ss-solr_1| at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:201)
l2ss-solr_1| at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
l2ss-solr_1| at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
l2ss-solr_1| at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:83)
l2ss-solr_1| at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
l2ss-solr_1| at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
l2ss-solr_1| at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:79)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
l2ss-solr_1  
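
One way to catch shapes like this before they reach Solr is to validate (and,
where possible, repair) the geometry with the same JTS library Solr is calling
into here. A minimal sketch, assuming JTS is on the classpath; the
self-intersecting "bowtie" polygon below is a hypothetical stand-in for an
invalid shape:

import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.io.WKTReader;

public class PolygonCheck {
    public static void main(String[] args) throws Exception {
        // A self-intersecting "bowtie" polygon (hypothetical example).
        String wkt = "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))";
        Geometry geom = new WKTReader().read(wkt);
        if (!geom.isValid()) {
            // buffer(0) is the usual JTS idiom for re-noding a geometry;
            // it often repairs self-intersections, but verify the result.
            Geometry repaired = geom.buffer(0);
            System.out.println("valid after repair? " + repaired.isValid());
        }
    }
}

Note that the failing polygon in the error wraps the pole and touches the
dateline, so even a JTS-valid version may still need dateline/pole-aware
handling on the Solr side; the sketch only covers basic JTS validity.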

[Newbie question] in SOLR 5, would I have a "master-to-slave" relationship for two servers?

2015-11-05 Thread Robert Hume
Hi,

In my SOLR 3 deployment (inherited it), I have (1) one SOLR server that is
used by my web application, and (2) a second SOLR server that is used to
index documents via a customer datasource.

The database of server 2 is considered the "master" and it is replicated
regularly to server 1, the "slave".

The advantage is that the responsiveness of server 1 is not impacted when
server 2 gets busy with lots of indexing.

QUESTION: When deploying a SOLR 5 setup, do I set things up the same way?
Or do I cluster both servers together into one "cloud"?   That is, in
SOLR 5, how do I ensure the indexing process will not impact the
performance of the web app?

Any help is greatly appreciated!!

Rob


Re: collection API timeout

2015-11-05 Thread lboutros
Hi Julien,

just one additional thing,
if you developed some plugins/filters, you will have to adapt and compile
them for the Solr 5 API.

Ludovic.



-
Jouve
France.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Also Yonik, out of curiosity... when I run stats on a large msg set (such as
200 million msgs), it tends to use a lot of memory; this should be expected,
correct?

If I were able to use !sum=true to only get the sum, a clever algorithm should
be able to tell that only the sum is required and avoid the memory overhead; is
it implemented that way?

Anyway, I was only trying to avoid running these stats on thousands of
customers, which kills our Solr servers.

thanks
Renee
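
For reference, the classic StatsComponent request (present since well before
3.5) returns sum along with the other stats; a sketch, with a hypothetical
field name:

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=amount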





Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Now I think that with Solr 3.5 (which we are using), !sum=true (overriding the
default) is probably not supported yet :-(

thanks
Renee





Re: how to efficiently get sum of an int field

2015-11-05 Thread Yonik Seeley
On Thu, Nov 5, 2015 at 4:55 PM, Renee Sun  wrote:
> Also Yonik, out of curiosity... when I run stats on a large msg set (such as
> 200 million msgs), it tends to use a lot of memory; this should be expected,
> correct?

With the stats component, yeah.

> If I were able to use !sum=true to only get the sum, a clever algorithm should
> be able to tell that only the sum is required and avoid the memory overhead; is
> it implemented that way?

I think so,  but I'm not an expert on the stats component.  I looked
at it when I wanted to implement the new JSON Facet API and decided we
were probably better off starting fresh and re-architecting some
things for better performance.


-Yonik
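
For anyone on a recent release, a minimal sketch of the JSON Facet API
aggregation Yonik describes (Solr 5.x and later; the collection and field
names are hypothetical; curl's -g disables URL globbing so the braces pass
through):

curl -g 'http://localhost:8983/solr/collection1/query?q=*:*&rows=0&json.facet={total:"sum(amount)"}'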


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Thanks Yonik... I bet with Solr 3.5 we do not have JSON Facet API support
yet ...





Re: how to efficiently get sum of an int field

2015-11-05 Thread Chris Hostetter

: On Thu, Nov 5, 2015 at 4:55 PM, Renee Sun  wrote:
: > Also Yonik, out of curiosity... when I run stats on a large msg set (such as
: > 200 million msgs), it tends to use a lot of memory; this should be expected,
: > correct?
: 
: With the stats component, yeah.

the amount of RAM needed by the stats component for the default stats is 
fixed -- regardless of how many docs/values there are.  So in Solr 3.x, 
unless you are explicitly asking for "calcDistinct", the amount of RAM is 
fixed.

What you are probably seeing using a lot of RAM is the FieldCache -- used 
(and shared) under the covers for lots of things like StatsComponent, 
sorting, faceting, etc. ... in modern versions of Solr you can use 
DocValues instead.


-Hoss
http://www.lucidworks.com/
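
As a concrete sketch of the DocValues alternative Hoss mentions, in schema.xml
(field and type names hypothetical):

<!-- docValues stores a column-oriented copy of the values at index time,
     avoiding the uninverted FieldCache at query time -->
<field name="amount" type="tint" indexed="true" stored="true" docValues="true"/>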


[Newbie question] what is a "core" and are they different from 3.x to 5.x ?

2015-11-05 Thread Robert Hume
Trying to learn about SOLR.

I can see there is something called a "core" ... it appears there can be
many cores for a single SOLR server.

Can someone "explain like I'm five" -- what is a core?

And how do "cores" differ from 3.x to 5.x.

Any pointers in the right direction are helpful!

Thanks!
Rob


Re: [Newbie question] what is a "core" and are they different from 3.x to 5.x ?

2015-11-05 Thread Chris Hostetter

: I can see there is something called a "core" ... it appears there can be
: many cores for a single SOLR server.
: 
: Can someone "explain like I'm five" -- what is a core?

https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml

"In Solr, the term core is used to refer to a single index and associated 
transaction log and configuration files (including schema.xml and 
solrconfig.xml, among others). Your Solr installation can have multiple 
cores if needed, which allows you to index data with different structures 
in the same server, and maintain more control over how your data is 
presented to different audiences."

: And how do "cores" differ from 3.x to 5.x.


The only fundamental differences between "cores" in Solr 3.x vs 5.x are:

1) in 3.x there was a concept known as the "default core" (if you didn't 
explicitly use multiple cores) ... with 5.x every request (updates or 
queries) must be made to an explicit core (or collection)

2) when using SolrCloud in 5.x, you should think (logically) in terms of 
the higher level concept of "collections" which (depending on the settings 
when the collection is created) may be *implemented* by multiple cores 
that are managed under the covers for you...

https://cwiki.apache.org/confluence/display/solr/SolrCloud
https://cwiki.apache.org/confluence/display/solr/Nodes%2C+Cores%2C+Clusters+and+Leaders


-Hoss
http://www.lucidworks.com/
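
As a concrete starting point, in 5.x a core is created from the command line
(the core name is hypothetical):

bin/solr create -c mycore

On disk, each core is a directory containing a core.properties file, a conf/
directory (solrconfig.xml and the schema), and a data/ directory holding the
index itself.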


Re: [Newbie question] in SOLR 5, would I have a "master-to-slave" relationship for two servers?

2015-11-05 Thread Chris Hostetter

: The database of server 2 is considered the "master" and it is replicated
: regularly to server 1, the "slave".
: 
: The advantage is that the responsiveness of server 1 is not impacted when
: server 2 gets busy with lots of indexing.
: 
: QUESTION: When deploying a SOLR 5 setup, do I set things up the same way?
: Or do I cluster both servers together into one "cloud"?   That is, in
: SOLR 5, how do I ensure the indexing process will not impact the
: performance of the web app?

There is nothing preventing you from using a master slave setup with Solr 
5...

https://cwiki.apache.org/confluence/display/solr/Index+Replication

...however if you do so you have to take responsibility for the same 
risks/tradeoffs that existed with this type of setup in Solr 3...

1) if the "query slave" goes down, you can't serve queries w/o manually 
redirecting traffic to your "indexing master"

2) if the "indexing master" goes down you can't accept index updates w/o 
manually redirecting updates to your "query slave" -- and manually 
rectifying the discrepancies if/when your master comes back online.


When using a cloud based setup these types of problems go away because 
there is no single "master", clients can send updates/queries to any node 
(and if you use SolrJ your clients will be "ZK aware" and know 
automatically if/when a node is down or new nodes are added) ... 
many people concerned about performance/reliability consider these 
benefits more important than the risks/tradeoffs of the performance impact of 
indexing directly to nodes that are serving queries -- especially with 
other NRT (Near Real Time) related improvements to Solr over the years 
(Soft Commits, DocValues instead of FieldCache, etc...)


-Hoss
http://www.lucidworks.com/
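
For completeness, a sketch of the classic wiring from the Index Replication
page above; the host name, core name, and poll interval are hypothetical. The
master block goes in the master's solrconfig.xml, the slave block in the
slave's:

<!-- on the indexing master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on the query slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>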


Re: MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-05 Thread Jack Krupansky
I vaguely recall some discussion concerning removal of the field cache in
Lucene.

-- Jack Krupansky

On Thu, Nov 5, 2015 at 10:38 PM, wei  wrote:

> We are running our search on solr4.7 and I am evaluating whether to upgrade
> to solr5.3.1. I found MatchAllDocsQuery is much slower in solr5.3.1. Anyone
> know why?
>
> We have a lot of queries without any query keyword, but we apply filters on
> the query. Load testing shows those queries are much slower in solr5.3.1
> compared to 4.7. If we load test with queries with search keywords, we can
> see the queries are much faster in solr5.3.1 compared to solr4.7.
> here is sample debug info:
> (in solr 4.7: QTime 86; the debug timing section reports 86.0 ms total, with
> the query component taking 85.0 ms and debug 1.0 ms)
>
> (in solr 5.3.1: QTime 313; the debug timing section reports 313.0 ms total,
> with the query component taking 311.0 ms)
>
> [full debug output quoted in the original message below]
> Thanks,
> Wei
>


Exception in grouping with docValues enabled field.

2015-11-05 Thread Modassar Ather
Hi,

I have the following docValues enabled field:

*Field : *
*Type:  *

When I am grouping on this field I am getting the following exception. Kindly
let me know if I am missing something or it is an issue.

  org.apache.solr.common.SolrException; java.lang.NullPointerException
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:346)
at org.apache.solr.schema.FieldType.toObject(FieldType.java:355)
at
org.apache.solr.search.grouping.endresulttransformer.GroupedEndResultTransformer.transform(GroupedEndResultTransformer.java:72)
at
org.apache.solr.handler.component.QueryComponent.groupedFinishStage(QueryComponent.java:810)
at
org.apache.solr.handler.component.QueryComponent.finishStage(QueryComponent.java:768)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:394)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Thanks,
Modassar


Trying to apply patch for SOLR-7036

2015-11-05 Thread r b
I just wanted to double check that my steps were not too off base.

I am trying to apply the patch from 8/May/15 and it seems to be
slightly off. The working revision recorded inside the patch is 1658487,
so I checked that out from svn. This is what I did.

svn checkout
http://svn.apache.org/repos/asf/lucene/dev/trunk@1658487 lucene_trunk
cd lucene_trunk/solr
curl 
https://issues.apache.org/jira/secure/attachment/12731517/SOLR-7036.patch
| patch -p0

But `patch` still fails on a few hunks. I figured this patch was made
with `svn diff` so it should apply smoothly to that same revision,
shouldn't it?

-renning
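
A dry run can help confirm the strip level before touching the working copy
(same patch URL as above; a sketch):

curl -sL https://issues.apache.org/jira/secure/attachment/12731517/SOLR-7036.patch -o SOLR-7036.patch
patch -p0 --dry-run < SOLR-7036.patch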


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Modassar Ather
Thanks for your response. I have already gone through those documents
before. My point was that if I am using Solr Cloud the only way to
distribute my indexes is by adding shards? and I don't have to do anything
manually (because all the distributed search is handled by Solr Cloud).

Yes as per my knowledge.

How do I check how many segments are there in the index?
You can look into the index folder manually. Which version of Solr are you
using? I don't remember exactly which version it started in, but in the latest
releases and in Solr-5.2.1 there is a "Segments info" link available where you
can see the number of segments.

Regards,
Modassar
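
If the admin UI link isn't available in your version, Lucene's CheckIndex tool
also reports per-segment details from the command line (jar and index paths
hypothetical):

java -cp lucene-core-5.2.1.jar org.apache.lucene.index.CheckIndex /var/solr/data/collection1/index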

On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari 
wrote:

> Thanks for your response. I have already gone through those documents
> before. My point was that if I am using Solr Cloud the only way to
> distribute my indexes is by adding shards? and I don't have to do anything
> manually (because all the distributed search is handled by Solr Cloud).
>
> What is the Xms and Xmx you are allocating to Solr and how much max is
> used by
> your solr?
> Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB
>
> How many segments are there in the index? The more the segment the slower
> is
> the search.
> How do I check how many segments are there in the index?
>
> Is this after you moved to solrcloud?
> I have been using SolrCloud from the beginning.
>
> Regards,
> Salman
>
>
> On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather 
> wrote:
>
> > SolrCloud makes the distributed search easier. You can find details about
> > it under following link.
> > https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
> >
> > You can also refer to following link:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> >
> > By size of your index I meant the index size on disk, and not the total
> > document count alone.
> > How many segments are there in the index? The more the segment the slower
> > is the search.
> > What is the Xms and Xmx you are allocating to Solr and how much max is
> used
> > by your solr?
> >
> > I doubt this as the slowness was happening for a long period of time.
> > I mentioned this point as I have seen gc pauses of 30 seconds and more in
> > some complex queries.
> >
> > I am facing a delay of 2-3 seconds but previously I
> > had delays of around 28 seconds.
> > Is this after you moved to solrcloud?
> >
> > Regards,
> > Modassar
> >
> >
> > On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari 
> > wrote:
> >
> > > Here is the current info
> > >
> > > How much memory is used?
> > > Physical memory consumption: 5.48 GB out of 14 GB.
> > > Swap space consumption: 5.83 GB out of 15.94 GB.
> > > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> > >
> > > What is your index size?
> > > I have around 70M documents distributed on 2 shards (so each shard has
> > 35M
> > > document)
> > >
> > > What type of queries are slow?
> > > I am running normal queries (queries on a field) no faceting or
> > highlights
> > > are requested. Currently, I am facing a delay of 2-3 seconds but
> > previously I
> > > had delays of around 28 seconds.
> > >
> > > Are there GC pauses as they can be a cause of slowness?
> > > I doubt this as the slowness was happening for a long period of time.
> > >
> > > Are document updates/additions happening in parallel?
> > > No, I have stopped adding/updating documents and doing queries only.
> > >
> > > This is what you are already doing. Did you mean that you want to add
> > more
> > > shards?
> > > No, what I meant is that I read that previously there was a way to
> chunk
> > a
> > > large index into multiple indexes and then do distributed search on that as in
> > this
> > > article https://wiki.apache.org/solr/DistributedSearch. What I was
> > looking
> > > for how this is handled in Solr Cloud?
> > >
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <
> modather1...@gmail.com>
> > > wrote:
> > >
> > > > What is your index size? How much memory is used? What type of
> queries
> > > are
> > > > slow?
> > > > Are there GC pauses as they can be a cause of slowness?
> > > > Are document updates/additions happening in parallel?
> > > >
> > > > The queries are very slow to run so I was thinking to distribute
> > > > the indexes into multiple indexes and consequently distributed
> search.
> > > Can
> > > > anyone guide me to some sources (articles) that discuss this in Solr
> > > Cloud?
> > > >
> > > > This is what you are already doing. Did you mean that you want to add
> > > more
> > > > shards?
> > > >
> > > > Regards,
> > > > Modassar
> > > >
> > > > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari <
> salman.rah...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am using Solr cloud and I have created a single index that host
> > > around
> > > > > 70M documents distributed into 2 

MatchAllDocsQuery is much slower in solr5.3.1 compare to solr4.7

2015-11-05 Thread wei
We are running our search on solr4.7 and I am evaluating whether to upgrade
to solr5.3.1. I found MatchAllDocsQuery is much slower in solr5.3.1. Anyone
know why?

We have a lot of queries without any query keyword, but we apply filters on
the query. Load testing shows those queries are much slower in solr5.3.1
compared to 4.7. If we load test with queries with search keywords, we can
see the queries are much faster in solr5.3.1 compared to solr4.7.
here is sample debug info:
(in solr 4.7)

responseHeader: status=0, QTime=86
params: fl=id, start=0, q=*:*, debugQuery=true, fq=+categoryIdsPath:1001, rows=2
docs returned: id=36652255, id=36651884
rawquerystring / querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
explain (both docs): 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm
QParser: LuceneQParser
filter_queries / parsed_filter_queries: +categoryIdsPath:1001
timing: 86.0 ms total; prepare 0.0; process 86.0 (query 85.0, debug 1.0,
all other components 0.0)

(in solr 5.3.1)

responseHeader: status=0, QTime=313
params: fl=id, start=0, q=*:*, debugQuery=true, fq=+categoryIdsPath:1001, rows=2
docs returned: id=36652255, id=36651884
rawquerystring / querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
explain (both docs): 1.0 = *:*, product of: 1.0 = boost, 1.0 = queryNorm
QParser: LuceneQParser
filter_queries / parsed_filter_queries: +categoryIdsPath:1001
timing: 313.0 ms total; prepare 0.0; process 311.0 (query 311.0,
all other components 0.0)

Thanks,
Wei


No live SolrServers available to handle this request

2015-11-05 Thread wilanjar .
Hi All,

I'm very new to handling SolrCloud.
I've changed the schema.xml, adding a field to index, but after reloading the
collection we got this error in the logs: "No live SolrServers available to
handle this request".

I have checked SolrCloud on each node from localhost and it is running well.
I'm using Solr version 4.10.4, Lucene version 4.10.4,
Tomcat 8.0.27,
ZooKeeper 3.4.6.

I have already googled but have not found a solution yet.

Thank you.
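
A common cause after a schema edit is that the changed schema.xml was never
uploaded to ZooKeeper, so the cores fail to come back up on reload. A sketch
of the usual 4.10.x sequence (ZooKeeper host, paths, and names hypothetical):

./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
curl 'http://localhost:8080/solr/admin/collections?action=RELOAD&name=mycollection'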


Re: Securing field level access permission by filtering the query itself

2015-11-05 Thread Scott Stults
Good to hear! Depending on how far you want to take it, you can then scan
the initial request coming in from the client (and the final response) for
raw Solr field names -- those shouldn't appear in either. I've used mod_security as a
general-purpose application firewall and would recommend it.

k/r,
Scott
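
As a concrete sketch of the aliasing approach discussed in the quoted messages
below, the proxy can request internal fields under public names in fl (all
field names hypothetical):

fl=id,name:prod_name_s,price:price_f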

On Wed, Nov 4, 2015 at 1:40 PM, Douglas McGilvray  wrote:

>
> Thanks Alessandro, I had overlooked the highlighting component.
>
> I will also add a reminder to exclude these fields from spellcheck fields,
> (or maintain different spellcheck fields for different roles).
>
> @Scott - Once I started planning my code the penny finally dropped
> regarding your point about aliasing the fields - it removes the need for
> calculating which fields to request in the app itself.
>
> Regards,
> D
>
>
> > On 4 Nov 2015, at 14:53, Alessandro Benedetti 
> wrote:
> >
> > Of course it depends of all the query parameter you use and you process
> in
> > the response.
> > The list you wrote should be ok if you use only those components.
> >
> > For example if you use highlight, it's not ok and you need to take care
> of
> > the highlighted fields as well.
> >
> > Cheers
> >
> > On 30 October 2015 at 14:51, Douglas McGilvray  wrote:
> >
> >>
> >> Scott thanks for the reply. I like the idea of mapping all the
> fieldnames
> >> internally, adding security through obscurity. My question therefore
> would
> >> be what is the definitive list of query parameters that one must filter
> to
> >> ensure a particular field is not exposed in the query response? Am I
> >> missing in the following?
> >>
> >> fl
> >> facet.field
> >> facet.pivot
> >> json.facet
> >> terms.fl
> >>
> >>
> >> kr
> >> Douglas
> >>
> >>
> >>> On 30 Oct 2015, at 07:37, Scott Stults <
> >> sstu...@opensourceconnections.com> wrote:
> >>>
> >>> Douglas,
> >>>
> >>> Managing a per-user-group whitelist of fields outside of Solr seems the
> >>> best approach. When the query comes in you can then filter out any
> fields
> >>> not contained in the whitelist before you send the request to Solr. The
> >>> easy part will be to do that on URL parameters like fl. Depending on
> how
> >>> your app generates the actual query string, you may want to also scan
> >> that
> >>> for fielded query clauses (eg "badfield:value") and localParams (eg
> >>> "{!dismax qf=badfield}value").
> >>>
> >>> Secondly, you can map internal Solr fields to aliases using this syntax
> >> in
> >>> the fl parameter: "display_name:real_solr_name". So when the request
> >> comes
> >>> in from your app, first you'll map from the requested field alias names
> >> to
> >>> internal Solr names (while enforcing the whitelist), and then in the fl
> >>> parameter supply the aliases you want sent in the response.
> >>>
> >>>
> >>> k/r,
> >>> Scott
> >>>
> >>> On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray 
> >> wrote:
> >>>
>  Hi all,
> 
>  First I’d like to say the nested facets and the json facet api in
>  particular have made my world much better, I thank everyone involved,
> >> you
>  are all awesome.
> 
>  My implementation has much of the Solr query building happening in
> the
>  browser; Solr is behind a PHP server which acts as “proxy” and
> doorman,
>  filtering at the document level according to user role and supplying
> >> some
>  sensible maximums …
> 
>  However we now wish to filter just one or two potentially sensitive
> >> fields
>  in one document type according to user role (as determined in the php
>  proxy). Duplicating documents (or cores) seems like overkill for just
> >> two
>  fields in one document type .. I wondered if it would be feasible (in
> >> the
>  interests of preventing malicious activity) to filter the query itself
>  whether it be parameters (fl, facet.fields, terms, etc) … or even deny
> >> any
>  request in which fieldname occurs …
> 
>  Is there someway someone might obscure a fieldname in a request?
> 
>  Kind Regards & thanks in advance,
>  Douglas
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> >> LLC
> >>> | 434.409.2780
> >>> http://www.opensourceconnections.com
> >>
> >>
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Walter Underwood
The elevation component will be a ton of manual work. Instead, use edismax and 
the boost parameter.

Add a field that is true for paid documents, then boost for paid:true. It might 
be easier to use a boost query (bq) to do this. The extra boost will be a 
tiebreaker for documents that would have the same score.

Use this in your solrconfig.xml:

<str name="bq">paid:true</str>

You can add weight to that if it isn’t boosting the paid content enough. Like 
this:

<str name="bq">paid:true^8</str>

It is slightly better to do this with the boost parameter and a function query, 
because that bypasses idf, but I think this approach is nice and clear.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
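
Expressed as a request rather than as solrconfig.xml defaults, a sketch
(collection and field names hypothetical):

curl 'http://localhost:8983/solr/recipes/select?q=chocolate+cake+with+hazelnuts&defType=edismax&qf=title+ingredients&bq=paid:true^8'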


> On Nov 5, 2015, at 3:33 AM, Alessandro Benedetti  
> wrote:
> 
> Hi Christian,
> there are several ways :
> 
> 1) Elevation query component - it should be your winner :
> https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
> 
> 2) Play with boosting according to your requirements
> 
> Cheers
> 
> On 5 November 2015 at 10:52,  wrote:
> 
>> Hi everyone, I'm building a food recipe search engine based on Solr.
>> 
>> I need to boost documents score for the recipes that their authors paid
>> for in order to have them returned first when somebody searches for
>> "chocolate cake with hazelnuts". So those recipes that match the query
>> terms and their authors paid to be listed first need to be returned first,
>> ahead of the unpaid ones that match the query.
>> 
>> How do I do that in Solr?
>> PLEASE HELP!
>> Regards,
>> Christian
>> 
>> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: Solr Features

2015-11-05 Thread Jack Krupansky
It's unfortunate, but the official Solr reference guide does not have a
table of contents:
http://mirror.olnevhost.net/pub/apache/lucene/solr/ref-guide/apache-solr-ref-guide-5.3.pdf
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

My Solr 4.4 Deep Dive is now a little outdated (since 4.4) and even then
was not complete (no SolrCloud or DIH), but its table of contents would
probably give you a fair view of the sheer magnitude of the number of Solr
features:
http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

It probably still has the most in-depth coverage and examples for token
analysis and update processors, even though more recent Solr changes are
not covered.



-- Jack Krupansky

On Thu, Nov 5, 2015 at 9:18 AM, Alexandre Rafalovitch 
wrote:

> Glad you liked it.
>
> The problem with your request is that it is not clear what you already
> know and in which direction you are trying to go. Cloud is a big topic
> all on its own. Relevancy - another one. Crafting schema to best
> represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
> party client - a fourth. Multilingual content - a fifth. And so on.
>
> But if you want high level guidelines, I would pick a couple of Solr
> books and look at their Tables of Contents. Then, do the same for the
> Reference Guide. This should be a good mid-level overview of issues.
>
> Regards,
> Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 08:43, Salman Ansari 
> wrote:
> > Thanks Alex for your response. Much appreciated effort! For sure, I will
> > need to look for all those details and information to fully understand
> Solr
> > but I don't have that much time in my hand. That's why I was thinking
> > instead of reading everything from the beginning is to start with a
> feature
> > list that briefly explains what each feature does and then dig deeper if
> I
> > need more information. I will appreciate any comments/feedback regarding
> > this.
> >
> > Regards,
> > Salman
> >
> > On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Well, I've started to answer, but it hit a nerve and turned into a
> >> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
> >> Admitting you have a problem).
> >>
> >> I hope this is helpful:
> >> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
> >>
> >> Regards,
> >>Alex.
> >> 
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 5 November 2015 at 01:08, Salman Ansari 
> >> wrote:
> >> > Hi,
> >> >
> >> > I am in the process of looking for a comprehensive list of Solr
> features
> >> in
> >> > order to assess how much have we implemented, what are some features
> that
> >> > we were unaware of that we can utilize etc. I have looked at the
> >> following
> >> > link for Solr features http://lucene.apache.org/solr/features.html
> but
> >> it
> >> > looks like it highlights the main features. I also looked at this page
> >> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> >> > details and I am looking for more of such list and possibly a
> >> comprehensive
> >> > list that combines them all.
> >> >
> >> > Regards,
> >> > Salman
> >>
>


Re: Securing field level access permission by filtering the query itself

2015-11-05 Thread Alessandro Benedetti
Be careful with the suggester as well. You don't want to show suggestions
coming from sensitive fields.

Cheers

On 5 November 2015 at 15:28, Scott Stults  wrote:

> Good to hear! Depending on how far you want to take it, you can then scan
> the initial request coming in from the client (and the final response) for
> raw Solr field names -- those shouldn't appear in either. I've used mod_security as a
> general-purpose application firewall and would recommend it.
>
> k/r,
> Scott
>
> On Wed, Nov 4, 2015 at 1:40 PM, Douglas McGilvray  wrote:
>
> >
> > Thanks Alessandro, I had overlooked the highlighting component.
> >
> > I will also add a reminder to exclude these fields from spellcheck
> fields,
> > (or maintain different spellcheck fields for different roles).
> >
> > @Scott - Once I started planning my code the penny finally dropped
> > regarding your point about aliasing the fields - it removes the need for
> > calculating which fields to request in the app itself.
> >
> > Regards,
> > D
> >
> >
> > > On 4 Nov 2015, at 14:53, Alessandro Benedetti 
> > wrote:
> > >
> > > Of course it depends of all the query parameter you use and you process
> > in
> > > the response.
> > > The list you wrote should be ok if you use only those components.
> > >
> > > For example if you use highlight, it's not ok and you need to take care
> > of
> > > the highlighted fields as well.
> > >
> > > Cheers
> > >
> > > On 30 October 2015 at 14:51, Douglas McGilvray 
> wrote:
> > >
> > >>
> > >> Scott thanks for the reply. I like the idea of mapping all the
> > fieldnames
> > >> internally, adding security through obscurity. My question therefore
> > would
> > >> be what is the definitive list of query parameters that one must
> filter
> > to
> > >> ensure a particular field is not exposed in the query response? Am I
> > >> missing in the following?
> > >>
> > >> fl
> > >> facet.field
> > >> facet.pivot
> > >> json.facet
> > >> terms.fl
> > >>
> > >>
> > >> kr
> > >> Douglas
> > >>
> > >>
> > >>> On 30 Oct 2015, at 07:37, Scott Stults <
> > >> sstu...@opensourceconnections.com> wrote:
> > >>>
> > >>> Douglas,
> > >>>
> > >>> Managing a per-user-group whitelist of fields outside of Solr seems
> the
> > >>> best approach. When the query comes in you can then filter out any
> > fields
> > >>> not contained in the whitelist before you send the request to Solr.
> The
> > >>> easy part will be to do that on URL parameters like fl. Depending on
> > how
> > >>> your app generates the actual query string, you may want to also scan
> > >> that
> > >>> for fielded query clauses (eg "badfield:value") and localParams (eg
> > >>> "{!dismax qf=badfield}value").
> > >>>
> > >>> Secondly, you can map internal Solr fields to aliases using this
> syntax
> > >> in
> > >>> the fl parameter: "display_name:real_solr_name". So when the request
> > >> comes
> > >>> in from your app, first you'll map from the requested field alias
> names
> > >> to
> > >>> internal Solr names (while enforcing the whitelist), and then in the
> fl
> > >>> parameter supply the aliases you want sent in the response.
> > >>>
> > >>>
> > >>> k/r,
> > >>> Scott
> > >>>
> > >>> On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray 
> > >> wrote:
> > >>>
> >  Hi all,
> > 
> >  First I’d like to say the nested facets and the json facet api in
> >  particular have made my world much better, I thank everyone
> involved,
> > >> you
> >  are all awesome.
> > 
> >  My implementation has much of the Solr query building happening in
> > the
> >  browser; Solr is behind a PHP server which acts as “proxy” and
> > doorman,
> >  filtering at the document level according to user role and supplying
> > >> some
> >  sensible maximums …
> > 
> >  However we now wish to filter just one or two potentially sensitive
> > >> fields
> >  in one document type according to user role (as determined in the
> php
> >  proxy). Duplicating documents (or cores) seems like overkill for
> just
> > >> two
> >  fields in one document type .. I wondered if it would be feasible
> (in
> > >> the
> >  interests of preventing malicious activity) to filter the query
> itself
> >  whether it be parameters (fl, facet.fields, terms, etc) … or even
> deny
> > >> any
> >  request in which fieldname occurs …
> > 
> >  Is there someway someone might obscure a fieldname in a request?
> > 
> >  Kind Regards & thanks in advance,
> >  Douglas
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Scott Stults | Founder & Solutions Architect | OpenSource
> Connections,
> > >> LLC
> > >>> | 434.409.2780
> > >>> http://www.opensourceconnections.com
> > >>
> > >>
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > 

Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
Glad you liked it.

The problem with your request is that it is not clear what you already
know and in which direction you are trying to go. Cloud is a big topic
all on its own. Relevancy - another one. Crafting schema to best
represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
party client - a fourth. Multilingual content - a fifth. And so on.

But if you want high level guidelines, I would pick a couple of Solr
books and look at their Tables of Contents. Then, do the same for the
Reference Guide. This should be a good mid-level overview of issues.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 November 2015 at 08:43, Salman Ansari  wrote:
> Thanks Alex for your response. Much appreciated effort! For sure, I will
> need to look for all those details and information to fully understand Solr
> but I don't have that much time in my hand. That's why I was thinking
> instead of reading everything from the beginning is to start with a feature
> list that briefly explains what each feature does and then dig deeper if I
> need more information. I will appreciate any comments/feedback regarding
> this.
>
> Regards,
> Salman
>
> On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch 
> wrote:
>
>> Well, I've started to answer, but it hit a nerve and turned into a
>> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
>> Admitting you have a problem).
>>
>> I hope this is helpful:
>> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 5 November 2015 at 01:08, Salman Ansari 
>> wrote:
>> > Hi,
>> >
>> > I am in the process of looking for a comprehensive list of Solr features
>> in
>> > order to assess how much have we implemented, what are some features that
>> > we were unaware of that we can utilize etc. I have looked at the
>> following
>> > link for Solr features http://lucene.apache.org/solr/features.html but
>> it
>> > looks like it highlights the main features. I also looked at this page
>> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
>> > details and I am looking for more of such list and possibly a
>> comprehensive
>> > list that combines them all.
>> >
>> > Regards,
>> > Salman
>>


Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Doug Turnbull
Funny I'm editing a chapter about boosting for a book :)
http://manning.com/turnbull

Anyway, I've been told by others that this blog post I wrote was really
useful in teaching them how to carefully boost documents. Maybe it would
help you?
http://opensourceconnections.com/blog/2013/07/21/improve-search-relevancy-by-telling-solr-exactly-what-you-want/

This post by John Berryman is also a nice companion
http://opensourceconnections.com/blog/2013/11/22/parameterizing-and-organizing-solr-boosts/

-Doug

On Thu, Nov 5, 2015 at 9:12 AM, Paul Libbrecht  wrote:

> Alessandro,
>
> none of them seems to match what I'd expect to be done: given an extra param
> that indicates the author, add an extra boost for each query.
>
> Christian,
> I used to do that with a query component (in java) but I think that
> nowadays you can do that with the bq parameter of edismax.
>
> paul
>
>
>
> > Alessandro Benedetti 
> > 5 novembre 2015 12:33
> > Hi Christian,
> > there are several ways :
> >
> > 1) Elevation query component - it should be your winner :
> >
> https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
> >
> > 2) Play with boosting according to your requirements
> >
> > Cheers
> >
> >
> >
> > liviuchrist...@yahoo.com.INVALID  liviuchrist...@yahoo.com.INVALID>
> > 5 novembre 2015 11:52
> > Hi everyone, I'm building a food recipe search engine based on Solr.
> >
> > I need to boost documents score for the recipes that their authors
> > paid for in order to have them returned first when somebody searches
> > for "chocolate cake with hazelnuts". So those recipes that match the
> > query terms and their authors paid to be listed first need to be
> > returned first, ahead of the unpaid ones that match the query.
> >
> > How do I do that in Solr?
> > PLEASE HELP!
> > Regards,
> > Christian
> >
> >
>
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: collection API timeout

2015-11-05 Thread Julien David

Seems I'll need to upgrade to 5.3.1.

Is it possible to upgrade from 4.9 to 5.3, or do I need to deploy all 
intermediate versions?


Thks


Re: collection API timeout

2015-11-05 Thread Erick Erickson
You should be able to go straight to 5.3.1

Solr/Lucene tries to guarantee that the indexes are readable
one major version back, i.e. 5.x Solr/Lucene code should
be able to read any 4.x index. New segments are written in the
most modern format, so over time all the 4x remnants should
disappear.

You can also use the IndexUpgrader tool to upgrade
the index, see:
https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/index/IndexUpgrader.html
take a backup first of course ;)
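
A sketch of the invocation (jar versions and index path hypothetical; the
backward-codecs jar is needed so 5.x code can read the 4.x segments):

java -cp lucene-core-5.3.1.jar:lucene-backward-codecs-5.3.1.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /var/solr/data/collection1/index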

You can also optimize after upgrading to 5.3.1, which will rewrite
all the segments too, but personally I'd run the IndexUpgrader
by preference, unless it's an index that doesn't change very often, if
at all.

All that said, you should simply be able to install 5.3.1 and start
running; running
the IndexUpgrader tool or optimizing isn't actually necessary.

Best,
Erick

On Thu, Nov 5, 2015 at 1:28 AM, Julien David  wrote:
> Seems I'll need to upgrade to 5.3.1.
>
> Is it possible to upgrade from 4.9 to 5.3, or do I need to deploy all
> intermediate versions?
>
> Thks
>
>
>
>>
>
>
>


Re: Solr Features

2015-11-05 Thread Erick Erickson
I agree with Alexandre, the question is far too broad.

Better to pick something you want to _do_ and ask
how to accomplish that. Define a use case that's useful
for your user base (actual or future) and see if Solr
can do that.

See the "books" section here for a number of resources
that people have spent inordinate amounts of time
creating to allow you to use your time wisely:
http://lucene.apache.org/solr/resources.html#documentation

Best,
Erick

On Thu, Nov 5, 2015 at 6:18 AM, Alexandre Rafalovitch
 wrote:
> Glad you liked it.
>
> The problem with your request is that it is not clear what you already
> know and in which direction you are trying to go. Cloud is a big topic
> all on its own. Relevancy - another one. Crafting schema to best
> represent your data - a third. Loading data with DIH vs. SolrJ vs. 3rd
> party client - a fourth. Multilingual content - a fifth. And so on.
>
> But if you want high level guidelines, I would pick a couple of Solr
> books and look at their Tables of Contents. Then, do the same for the
> Reference Guide. This should be a good mid-level overview of issues.
>
> Regards,
> Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 08:43, Salman Ansari  wrote:
>> Thanks Alex for your response. Much appreciated effort! For sure, I will
>> need to look for all those details and information to fully understand Solr
>> but I don't have that much time in my hand. That's why I was thinking
>> instead of reading everything from the beginning is to start with a feature
>> list that briefly explains what each feature does and then dig deeper if I
>> need more information. I will appreciate any comments/feedback regarding
>> this.
>>
>> Regards,
>> Salman
>>
>> On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch 
>> wrote:
>>
>>> Well, I've started to answer, but it hit a nerve and turned into a
>>> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
>>> Admitting you have a problem).
>>>
>>> I hope this is helpful:
>>> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>>>
>>> Regards,
>>>Alex.
>>> 
>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 5 November 2015 at 01:08, Salman Ansari 
>>> wrote:
>>> > Hi,
>>> >
>>> > I am in the process of looking for a comprehensive list of Solr features
>>> in
>>> > order to assess how much have we implemented, what are some features that
>>> > we were unaware of that we can utilize etc. I have looked at the
>>> following
>>> > link for Solr features http://lucene.apache.org/solr/features.html but
>>> it
>>> > looks like it highlights the main features. I also looked at this page
>>> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
>>> > details and I am looking for more of such list and possibly a
>>> comprehensive
>>> > list that combines them all.
>>> >
>>> > Regards,
>>> > Salman
>>>


Re: Solr Features

2015-11-05 Thread Salman Ansari
Thanks Alex for your response. Much appreciated effort! For sure, I will
need to look at all those details and information to fully understand Solr,
but I don't have that much time on my hands. That's why I was thinking,
instead of reading everything from the beginning, to start with a feature
list that briefly explains what each feature does and then dig deeper if I
need more information. I will appreciate any comments/feedback regarding
this.

Regards,
Salman

On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch 
wrote:

> Well, I've started to answer, but it hit a nerve and turned into a
> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
> Admitting you have a problem).
>
> I hope this is helpful:
> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 01:08, Salman Ansari 
> wrote:
> > Hi,
> >
> > I am in the process of looking for a comprehensive list of Solr features
> in
> > order to assess how much have we implemented, what are some features that
> > we were unaware of that we can utilize etc. I have looked at the
> following
> > link for Solr features http://lucene.apache.org/solr/features.html but
> it
> > looks like it highlights the main features. I also looked at this page
> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> > details and I am looking for more of such list and possibly a
> comprehensive
> > list that combines them all.
> >
> > Regards,
> > Salman
>


Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Paul Libbrecht
Alessandro,

none of them seems to match what I'd expect to be done: given an extra param
that indicates the author, add an extra boost for each query.

Christian,
I used to do that with a query component (in java) but I think that
nowadays you can do that with the bq parameter of edismax.

paul



> Alessandro Benedetti 
> 5 novembre 2015 12:33
> Hi Christian,
> there are several ways :
>
> 1) Elevation query component - it should be your winner :
> https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
>
> 2) Play with boosting according to your requirements
>
> Cheers
>
>
>
> liviuchrist...@yahoo.com.INVALID 
> 5 novembre 2015 11:52
> Hi everyone, I'm building a food recipe search engine based on Solr.
>
> I need to boost documents score for the recipes that their authors
> paid for in order to have them returned first when somebody searches
> for "chocolate cake with hazelnuts". So those recipes that match the
> query terms and their authors paid to be listed first need to be
> returned first, ahead of the unpaid ones that match the query.
>
> How do I do that in Solr?
> PLEASE HELP!
> Regards,
> Christian
>
>