Re: Problem with numeric "math" types and the dataimport handler

2015-05-19 Thread Shawn Heisey
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
> Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
> fixed in 4.10. Can you try a newer release?

I can't upgrade yet.  I am using a plugin that hasn't been verified
against anything newer than 4.9.  When a new version becomes available,
I will begin testing 5.x.

The patch does look like it will fix the issue perfectly ... so I am
very likely to patch 4.9.1 and build a custom war.

Thanks,
Shawn



Re: Problem with numeric "math" types and the dataimport handler

2015-05-19 Thread Shalin Shekhar Mangar
Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
fixed in 4.10. Can you try a newer release?

On Wed, May 20, 2015 at 6:51 AM, Shawn Heisey  wrote:

> An unusual problem is happening with the DIH on a field that is an
> unsigned BIGINT in the MySQL database.  This is Solr 4.9.1 without
> SolrCloud, running on OpenJDK 7u79.
>
> During actual import, everything is fine.  The problem comes when I
> restart Solr and the transaction logs are replayed.  I get the following
> exception for every document replayed:
>
> WARN  - 2015-05-19 18:52:44.461;
> org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
> reading log
> org.apache.solr.common.SolrException: ERROR: [doc=getty26025060] Error
> adding field 'file_size'='java.math.BigInteger:5934053' msg=For input
> string: "java.math.BigInteger:5934053"
>
> I believe I need one of two things to solve this problem:
>
> 1) A connection parameter for the MySQL JDBC driver that will force the
> use of java.lang.* objects and exclude the java.math.* classes.
>
> 2) Write the actual imported value into the transaction log rather than
> include the class name in the string representation.  Testing shows that
> the toString() method on BigInteger does *NOT* include the class name,
> so I am confused about why the class name is being recorded in the
> transaction log.
>
> For the first solution, I've been looking for a MySQL connection
> parameter to change the Java object types that get used, but so far I
> haven't found one.  For the second, I should probably open an issue in
> Jira, but I wanted to run it by everyone before taking that step.
>
> I have another index (building from a different database) where this
> isn't happening, because the MySQL column is *NOT* unsigned, which
> causes the JDBC driver to use java.lang.Long instead of
> java.math.BigInteger.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: JSON

2015-05-19 Thread William Bell
thanks

On Tue, May 19, 2015 at 11:38 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Someone just opened https://issues.apache.org/jira/browse/SOLR-7574 which
> is exactly what you experienced.
>
> On Tue, May 19, 2015 at 8:34 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > That sounds reasonable. Please open a Jira issue.
> >
> > On Sat, May 16, 2015 at 9:22 AM, William Bell 
> wrote:
> >
> >> Can we get this one fixed? If the body is empty, don't throw a
> >> NullPointerException?
> >>
> >> Thanks
> >>
> >> > http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation
> >> > -H "Content-type:application/json"
> >>
> >> You're telling Solr the body encoding is JSON, but then you don't send
> any
> >> body.
> >> We could catch that error earlier perhaps, but it still looks like an
> >> error?
> >>
> >> -Yonik
> >>
> >> --
> >> Bill Bell
> >> billnb...@gmail.com
> >> cell 720-256-8076
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: JSON

2015-05-19 Thread Shalin Shekhar Mangar
Someone just opened https://issues.apache.org/jira/browse/SOLR-7574 which
is exactly what you experienced.

On Tue, May 19, 2015 at 8:34 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> That sounds reasonable. Please open a Jira issue.
>
> On Sat, May 16, 2015 at 9:22 AM, William Bell  wrote:
>
>> Can we get this one fixed? If the body is empty, don't throw a
>> NullPointerException?
>>
>> Thanks
>>
>> > http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation
>> > -H "Content-type:application/json"
>>
>> You're telling Solr the body encoding is JSON, but then you don't send any
>> body.
>> We could catch that error earlier perhaps, but it still looks like an
>> error?
>>
>> -Yonik
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


Push ExternalFileField to Solr

2015-05-19 Thread Floyd Wu
Hi, I have two (physical) servers that run my application and Solr. I use an
external file field to do some search result ranking.

According to the wiki page, the external file field data needs to reside in
the {solr}\data directory. The EFF data is generated by my application. How
can I push this file to Solr? Is there any API, Solr web service, or other
mechanism that helps with this?

floyd


Re: Wildcard/Regex Searching with Decimal Fields

2015-05-19 Thread Todd Long
Sounds good. Thank you for the synonym (definitely will work on this) and
padding suggestions.

- Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-Regex-Searching-with-Decimal-Fields-tp4206015p4206421.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with numeric "math" types and the dataimport handler

2015-05-19 Thread Shawn Heisey
An unusual problem is happening with the DIH on a field that is an
unsigned BIGINT in the MySQL database.  This is Solr 4.9.1 without
SolrCloud, running on OpenJDK 7u79.

During actual import, everything is fine.  The problem comes when I
restart Solr and the transaction logs are replayed.  I get the following
exception for every document replayed:

WARN  - 2015-05-19 18:52:44.461;
org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
reading log
org.apache.solr.common.SolrException: ERROR: [doc=getty26025060] Error
adding field 'file_size'='java.math.BigInteger:5934053' msg=For input
string: "java.math.BigInteger:5934053"

I believe I need one of two things to solve this problem:

1) A connection parameter for the MySQL JDBC driver that will force the
use of java.lang.* objects and exclude the java.math.* classes.

2) Write the actual imported value into the transaction log rather than
include the class name in the string representation.  Testing shows that
the toString() method on BigInteger does *NOT* include the class name,
so I am confused about why the class name is being recorded in the
transaction log.

For the first solution, I've been looking for a MySQL connection
parameter to change the Java object types that get used, but so far I
haven't found one.  For the second, I should probably open an issue in
Jira, but I wanted to run it by everyone before taking that step.

I have another index (building from a different database) where this
isn't happening, because the MySQL column is *NOT* unsigned, which
causes the JDBC driver to use java.lang.Long instead of
java.math.BigInteger.
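
A rough, untested sketch (not part of the original thread) of a third option:
a custom update processor that coerces java.math.BigInteger values to Long
before they reach the transaction log. The field name "file_size" is taken
from the log message above; the class name and chain placement are
assumptions, and this is only safe while the values fit in a long.

import java.io.IOException;
import java.math.BigInteger;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class BigIntegerToLongProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new BigIntegerToLongProcessor(next);
  }

  static class BigIntegerToLongProcessor extends UpdateRequestProcessor {

    BigIntegerToLongProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      // "file_size" is the field from the stack trace above; adjust as needed.
      Object value = doc.getFieldValue("file_size");
      if (value instanceof BigInteger) {
        // Store a plain Long so the transaction log never records the
        // "java.math.BigInteger:..." string form. Unsigned BIGINT values above
        // Long.MAX_VALUE would still need special handling.
        doc.setField("file_size", ((BigInteger) value).longValue());
      }
      super.processAdd(cmd);
    }
  }
}

Registered in the updateRequestProcessorChain used by the DIH handler, ahead
of the default log/run processors, this should keep the tlog free of
BigInteger values without touching the JDBC driver or patching Solr.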

Thanks,
Shawn



Re: Suggestion on field type

2015-05-19 Thread Walter Underwood
A field type based on BigDecimal could be useful, but that would be a fair 
amount more work.

Double is usually sufficient for big data analysis, especially if you are doing 
simple aggregates (which is most of what Solr can do). 

If you want to do something fancier, you’ll need a database, not a search 
engine. As I usually do, I’ll recommend MarkLogic, which is pretty awesome 
stuff. Solr would not be in my top handful of solutions for big data analysis.

Personally, I’d stuff it all in JSON in Amazon S3 and run map-reduce against 
it. If you need to do something like that, you could store a JSON blob in Solr 
with the exact values, and use approximate fields to narrow things down. Of 
course, MarkLogic has a graceful interface to Hadoop.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On May 19, 2015, at 4:09 PM, Erick Erickson  wrote:

> Well, double is all you've got, so that's what you have to work with.
> _Every_ float is an approximation when you get out to some number of
> decimal places, so you don't really have any choice. Of course it'll
> affect the result. The question is whether it affects the result
> enough to matter which is application-specific.
> 
> Best,
> Erick
> 
> On Tue, May 19, 2015 at 12:05 PM, Vishal Swaroop  wrote:
>> Also 10481.5711458735456*79* indexes to 10481.571145873546 using double
>> > positionIncrementGap="0" omitNorms="false"/>
>> 
>> On Tue, May 19, 2015 at 2:57 PM, Vishal Swaroop 
>> wrote:
>> 
>>> Thanks Erick... I can ignore the trailing zeros
>>> 
>>> I am indexing data from Vertica database... Though *double *is very close
>>> but it SOLR indexes 14 digits after decimal
>>> e.g. actual db value is 15 digits after decimal i.e. 249.81735425382405*2*
>>> 
>>> SOLR indexes 14 digits after decimal i.e. 249.81735425382405
>>> 
>>> As these values will be used for big data analysis, so I am wondering if
>>> it might impact the result.
>>> >> positionIncrementGap="0" omitNorms="false"/>
>>> 
>>> Any suggestions ?
>>> 
>>> Regards
>>> 
>>> 
>>> On Tue, May 19, 2015 at 1:41 PM, Erick Erickson 
>>> wrote:
>>> 
 Why do you want to keep trailing zeros? The original input is
 preserved in the "stored" portion and will be returned if you specify
 the field in your "fl" list. I'm assuming here that you're looking at
 the actual indexed terms, and don't really understand why the trailing
 zeros are important
 
 Do not use strings.
 
 Best
 Erick
 
 On Tue, May 19, 2015 at 10:22 AM, Vishal Swaroop 
 wrote:
> Thank you John and Jack...
> 
> Looks like double is much closer... it removes trailing zeros...
> a) Is there a way to keep trailing zeros
> double : 194.846189733028000 indexes to 194.846189733028
>  positionIncrementGap="0" omitNorms="false"/>
> 
> b) If I use "String" then will there be issue doing range query
> 
> float
>  positionIncrementGap="0" omitNorms="false"/>
> 277.677836785372000 indexes to 277.67783
> 
> 
> 
> On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky <
 jack.krupan...@gmail.com>
> wrote:
> 
>> "double" (solr.TrieDoubleField) gives more precision
>> 
>> See:
>> 
>> 
 https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
>> 
>> -- Jack Krupansky
>> 
>> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop  
>> wrote:
>> 
>>> Please suggest which numeric field type to use so that I can get
 complete
>>> value.
>>> 
>>> e.g value in database is : 194.846189733028000
>>> 
>>> If I index it as float SOLR indexes it as 194.84619 where as I need
>>> complete value i.e 194.846189733028000
>>> I will also be doing range query on this field.
>>> 
>>> >> positionIncrementGap="0"/>
>>> 
>>> >> multiValued="false" />
>>> 
>>> Regards
>>> 
>> 
 
>>> 
>>> 



Re: Wildcard/Regex Searching with Decimal Fields

2015-05-19 Thread Erick Erickson
Then it seems like you can just index the raw strings as a string
field and suggest with that but fire the actual query against the
numeric type.

Best,
Erick

On Tue, May 19, 2015 at 3:25 PM, Todd Long  wrote:
> Erick Erickson wrote
>> But I _really_ have to go back to one of my original questions: What's
>> the use-case?
>
> The use-case is with autocompleting fields. The user might know a frequency
> starts with 2 so we want to limit those results (e.g. 2, 23, 214, etc.). We
> would still index/store the numeric-type but maintain an additional string
> index for autocompleting (and regular expressions). We can throw away the
> "contains" but will at least need the "starts with" behavior.
>
> - Todd
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Wildcard-Regex-Searching-with-Decimal-Fields-tp4206015p4206398.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggestion on field type

2015-05-19 Thread Erick Erickson
Well, double is all you've got, so that's what you have to work with.
_Every_ float is an approximation when you get out to some number of
decimal places, so you don't really have any choice. Of course it'll
affect the result. The question is whether it affects the result
enough to matter which is application-specific.
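
For illustration only (not from the thread): the truncation described above is
plain IEEE 754 double behaviour and can be reproduced in a few lines of Java,
using the values quoted in this thread.

public class DoublePrecisionDemo {
  public static void main(String[] args) {
    // A double carries roughly 15-17 significant decimal digits, so the extra
    // digits are lost at parse time, before Solr ever stores the value, and
    // trailing zeros are simply not represented.
    System.out.println(Double.parseDouble("194.846189733028000"));   // 194.846189733028
    System.out.println(Double.parseDouble("249.817354253824052"));   // 249.81735425382405 (per this thread)
    System.out.println(Double.parseDouble("10481.571145873545679")); // 10481.571145873546 (per this thread)
  }
}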

Best,
Erick

On Tue, May 19, 2015 at 12:05 PM, Vishal Swaroop  wrote:
> Also 10481.5711458735456*79* indexes to 10481.571145873546 using double
>  positionIncrementGap="0" omitNorms="false"/>
>
> On Tue, May 19, 2015 at 2:57 PM, Vishal Swaroop 
> wrote:
>
>> Thanks Erick... I can ignore the trailing zeros
>>
>> I am indexing data from Vertica database... Though *double *is very close
>> but it SOLR indexes 14 digits after decimal
>> e.g. actual db value is 15 digits after decimal i.e. 249.81735425382405*2*
>>
>> SOLR indexes 14 digits after decimal i.e. 249.81735425382405
>>
>> As these values will be used for big data analysis, so I am wondering if
>> it might impact the result.
>> > positionIncrementGap="0" omitNorms="false"/>
>>
>> Any suggestions ?
>>
>> Regards
>>
>>
>> On Tue, May 19, 2015 at 1:41 PM, Erick Erickson 
>> wrote:
>>
>>> Why do you want to keep trailing zeros? The original input is
>>> preserved in the "stored" portion and will be returned if you specify
>>> the field in your "fl" list. I'm assuming here that you're looking at
>>> the actual indexed terms, and don't really understand why the trailing
>>> zeros are important
>>>
>>> Do not use strings.
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, May 19, 2015 at 10:22 AM, Vishal Swaroop 
>>> wrote:
>>> > Thank you John and Jack...
>>> >
>>> > Looks like double is much closer... it removes trailing zeros...
>>> > a) Is there a way to keep trailing zeros
>>> > double : 194.846189733028000 indexes to 194.846189733028
>>> > >> > positionIncrementGap="0" omitNorms="false"/>
>>> >
>>> > b) If I use "String" then will there be issue doing range query
>>> >
>>> > float
>>> > >> > positionIncrementGap="0" omitNorms="false"/>
>>> > 277.677836785372000 indexes to 277.67783
>>> >
>>> >
>>> >
>>> > On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com>
>>> > wrote:
>>> >
>>> >> "double" (solr.TrieDoubleField) gives more precision
>>> >>
>>> >> See:
>>> >>
>>> >>
>>> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
>>> >>
>>> >> -- Jack Krupansky
>>> >>
>>> >> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop >> >
>>> >> wrote:
>>> >>
>>> >> > Please suggest which numeric field type to use so that I can get
>>> complete
>>> >> > value.
>>> >> >
>>> >> > e.g value in database is : 194.846189733028000
>>> >> >
>>> >> > If I index it as float SOLR indexes it as 194.84619 where as I need
>>> >> > complete value i.e 194.846189733028000
>>> >> > I will also be doing range query on this field.
>>> >> >
>>> >> > >> >> > positionIncrementGap="0"/>
>>> >> >
>>> >> > >> >> >  multiValued="false" />
>>> >> >
>>> >> > Regards
>>> >> >
>>> >>
>>>
>>
>>


Re: Escaping special chars when exporting result sets

2015-05-19 Thread Joel Bernstein
This should be considered a bug in the /export handler. Please create a
jira ticket for this.

Thanks

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 19, 2015 at 2:32 PM, Angelo Veltens 
wrote:

> Hi all!
>
> We use Solr 4.10.3 and have configured an /export SearchHandler in
> addition to the default SearchHandler /select.
>
> 
>   
> {!xport}
> xsort
> false
> username,description
>
>   
> query
>   
> 
>
> The handler follows the example from section "Exporting Result Sets" of
> the user guide.
>
> The description field may contain line breaks (\n) and quotation marks (").
>
> When using the /select handler those characters are properly escaped. E.g.
> we get following value in the response:
>
> "description":"Lorem ipsum\n\ndolor sit amet. \"hello\" world \" test",
>
> BUT when using the /export handler those characters are NOT escaped. We
> get:
>
> "description":"Lorem ipsum
>
> dolor sit amet. "hello" world " test"
>
> The latter is not valid JSON in our understanding.
>
> What are we doing wrong? How can we get properly escaped JSON?
>
> Thanks in advance!
>
> Best regards,
> Angelo Veltens
>
>
>
>
>
>
>


Re: Wildcard/Regex Searching with Decimal Fields

2015-05-19 Thread Todd Long
Erick Erickson wrote
> But I _really_ have to go back to one of my original questions: What's
> the use-case?

The use-case is with autocompleting fields. The user might know a frequency
starts with 2 so we want to limit those results (e.g. 2, 23, 214, etc.). We
would still index/store the numeric-type but maintain an additional string
index for autocompleting (and regular expressions). We can throw away the
"contains" but will at least need the "starts with" behavior.

- Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-Regex-Searching-with-Decimal-Fields-tp4206015p4206398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Looking up arrays in a sub-entity

2015-05-19 Thread rumford
I have an entity which extracts records from a MySQL data source. One of the
fields is meant to be a multi-value field, except, this data source does not
store the values. Rather, it stores their ids in a single column as a
pipe-delimited string. The values themselves are in a separate table, in an
entirely different database, on a different server.

I have written a transformer to make an array out of this delimited string,
but after that I'm at a loss. Can I iterate over an array in a sub-entity? I
need to query that second data source for each of the IDs that I find in
each record of the first data source.

Other people who have asked similar questions have been able to solve their
issue with a join, but in my case I cannot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Looking-up-arrays-in-a-sub-entity-tp4206380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggestion on field type

2015-05-19 Thread Vishal Swaroop
Also 10481.5711458735456*79* indexes to 10481.571145873546 using double


On Tue, May 19, 2015 at 2:57 PM, Vishal Swaroop 
wrote:

> Thanks Erick... I can ignore the trailing zeros
>
> I am indexing data from Vertica database... Though *double *is very close
> but it SOLR indexes 14 digits after decimal
> e.g. actual db value is 15 digits after decimal i.e. 249.81735425382405*2*
>
> SOLR indexes 14 digits after decimal i.e. 249.81735425382405
>
> As these values will be used for big data analysis, so I am wondering if
> it might impact the result.
>  positionIncrementGap="0" omitNorms="false"/>
>
> Any suggestions ?
>
> Regards
>
>
> On Tue, May 19, 2015 at 1:41 PM, Erick Erickson 
> wrote:
>
>> Why do you want to keep trailing zeros? The original input is
>> preserved in the "stored" portion and will be returned if you specify
>> the field in your "fl" list. I'm assuming here that you're looking at
>> the actual indexed terms, and don't really understand why the trailing
>> zeros are important
>>
>> Do not use strings.
>>
>> Best
>> Erick
>>
>> On Tue, May 19, 2015 at 10:22 AM, Vishal Swaroop 
>> wrote:
>> > Thank you John and Jack...
>> >
>> > Looks like double is much closer... it removes trailing zeros...
>> > a) Is there a way to keep trailing zeros
>> > double : 194.846189733028000 indexes to 194.846189733028
>> > > > positionIncrementGap="0" omitNorms="false"/>
>> >
>> > b) If I use "String" then will there be issue doing range query
>> >
>> > float
>> > > > positionIncrementGap="0" omitNorms="false"/>
>> > 277.677836785372000 indexes to 277.67783
>> >
>> >
>> >
>> > On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky <
>> jack.krupan...@gmail.com>
>> > wrote:
>> >
>> >> "double" (solr.TrieDoubleField) gives more precision
>> >>
>> >> See:
>> >>
>> >>
>> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
>> >>
>> >> -- Jack Krupansky
>> >>
>> >> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop > >
>> >> wrote:
>> >>
>> >> > Please suggest which numeric field type to use so that I can get
>> complete
>> >> > value.
>> >> >
>> >> > e.g value in database is : 194.846189733028000
>> >> >
>> >> > If I index it as float SOLR indexes it as 194.84619 where as I need
>> >> > complete value i.e 194.846189733028000
>> >> > I will also be doing range query on this field.
>> >> >
>> >> > > >> > positionIncrementGap="0"/>
>> >> >
>> >> > > >> >  multiValued="false" />
>> >> >
>> >> > Regards
>> >> >
>> >>
>>
>
>


Re: Suggestion on field type

2015-05-19 Thread Vishal Swaroop
Thanks Erick... I can ignore the trailing zeros

I am indexing data from a Vertica database... Though *double* is very close,
SOLR indexes only 14 digits after the decimal point.
e.g. the actual db value has 15 digits after the decimal, i.e. 249.81735425382405*2*,
but SOLR indexes 14 digits after the decimal, i.e. 249.81735425382405

As these values will be used for big data analysis, I am wondering if this
might impact the result.


Any suggestions ?

Regards


On Tue, May 19, 2015 at 1:41 PM, Erick Erickson 
wrote:

> Why do you want to keep trailing zeros? The original input is
> preserved in the "stored" portion and will be returned if you specify
> the field in your "fl" list. I'm assuming here that you're looking at
> the actual indexed terms, and don't really understand why the trailing
> zeros are important
>
> Do not use strings.
>
> Best
> Erick
>
> On Tue, May 19, 2015 at 10:22 AM, Vishal Swaroop 
> wrote:
> > Thank you John and Jack...
> >
> > Looks like double is much closer... it removes trailing zeros...
> > a) Is there a way to keep trailing zeros
> > double : 194.846189733028000 indexes to 194.846189733028
> >  > positionIncrementGap="0" omitNorms="false"/>
> >
> > b) If I use "String" then will there be issue doing range query
> >
> > float
> >  > positionIncrementGap="0" omitNorms="false"/>
> > 277.677836785372000 indexes to 277.67783
> >
> >
> >
> > On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> >> "double" (solr.TrieDoubleField) gives more precision
> >>
> >> See:
> >>
> >>
> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
> >>
> >> -- Jack Krupansky
> >>
> >> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop 
> >> wrote:
> >>
> >> > Please suggest which numeric field type to use so that I can get
> complete
> >> > value.
> >> >
> >> > e.g value in database is : 194.846189733028000
> >> >
> >> > If I index it as float SOLR indexes it as 194.84619 where as I need
> >> > complete value i.e 194.846189733028000
> >> > I will also be doing range query on this field.
> >> >
> >> >  >> > positionIncrementGap="0"/>
> >> >
> >> >  >> >  multiValued="false" />
> >> >
> >> > Regards
> >> >
> >>
>


Re: Issue with German search

2015-05-19 Thread Doug Turnbull
No prob, easy mistake to make :)

On Tue, May 19, 2015 at 2:14 PM, shamik  wrote:

> Thanks a ton Doug, I should have figured this out, pretty stupid of me.
>
> Appreciate your help.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206357.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search  from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: Issue with German search

2015-05-19 Thread shamik
Thanks a ton Doug, I should have figured this out, pretty stupid of me.

Appreciate your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206357.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue with German search

2015-05-19 Thread Doug Turnbull
I see the problem. Instead of using debugQuery directly, you might want to
check out Splainer (http://splainer.io); it can help detect these sorts of
relevance problems. For example, in your case you just need to wrap the
search in parentheses.

There's a difference between text:foo bar and text:(foo bar). Without the
parentheses, the "bar" goes to the default search field. Instead of
q=title_deu:Software%20und%20Downloads try
q=title_deu:(Software%20und%20Downloads)
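
A small illustration (not from the thread) of the same behaviour with the
classic Lucene query parser; "text" stands in for the default field. With
edismax the unfielded terms are spread over the qf fields instead, but the
effect is the same: "und" and "Downloads" are not searched in title_deu.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.util.Version;

public class FieldedQueryDemo {
  public static void main(String[] args) throws Exception {
    QueryParser parser = new QueryParser(Version.LUCENE_4_9, "text",
        new StandardAnalyzer(Version.LUCENE_4_9));

    // Only the first term is bound to title_deu; the rest fall back to "text".
    System.out.println(parser.parse("title_deu:Software und Downloads"));
    // -> title_deu:software text:und text:downloads

    // With parentheses, all three terms are searched in title_deu.
    System.out.println(parser.parse("title_deu:(Software und Downloads)"));
    // -> title_deu:software title_deu:und title_deu:downloads
  }
}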

This example shows your problem using a different Solr instance, you can
see "hunting" matches are happening in text, not the catch_line field

http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Dcatch_line:deer%20hunting

wrap in parens and voilà, both matches go to the catch_line field:

http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Dcatch_line:(deer%20hunting)

Hope that helps
-Doug

On Tue, May 19, 2015 at 1:45 PM, shamik  wrote:

> Thanks Doug. I'm using eDismax
>
> Here's my Solr query :
>
>
> http://localhost:8983/solr/testhandlerdeu?debugQuery=true&q=title_deu:Software%20und%20Downloads
>
> Here's my request handler.
>
> 
> 
> explicit
> 0.01
> velocity
> browse
>  name="v.contentType">text/html;charset=UTF-8
> layout
> testhandler
> Test Request Handler German
>
> edismax
> AND
>
> *:*
> 15
> *,score
> name_deu^1.2  title_deu^10.0
> description_deu^5.0 
> text_deu
>
>
> on
> 1
> -1
> index
> enum
> cat
> manu_exact
> content_type
> author
>
>
> true
> 
> 
> name subject description_deu
> name_deu title_deu
> html
> 20
> 20
> 20
> false
> true
> breakIterator
> SENTENCE
>
>
> true
> default
> true
> false
> false
> 1
>
> 
> 
> spellcheck
> 
> 
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206341.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search  from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: Issue with German search

2015-05-19 Thread shamik
Thanks Doug. I'm using eDismax

Here's my Solr query :

http://localhost:8983/solr/testhandlerdeu?debugQuery=true&q=title_deu:Software%20und%20Downloads

Here's my request handler.



explicit
0.01
velocity
browse
text/html;charset=UTF-8 
  
layout
testhandler
Test Request Handler German

edismax
AND

*:*
15
*,score
name_deu^1.2  title_deu^10.0 
description_deu^5.0 
text_deu   


on
1
-1
index
enum
cat
manu_exact
content_type
author


true


name subject description_deu name_deu 
title_deu
html
20
20
20
false
true
breakIterator
SENTENCE


true
default
true
false
false
1



spellcheck







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206341.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggestion on field type

2015-05-19 Thread Erick Erickson
Why do you want to keep trailing zeros? The original input is
preserved in the "stored" portion and will be returned if you specify
the field in your "fl" list. I'm assuming here that you're looking at
the actual indexed terms, and don't really understand why the trailing
zeros are important

Do not use strings.

Best
Erick

On Tue, May 19, 2015 at 10:22 AM, Vishal Swaroop  wrote:
> Thank you John and Jack...
>
> Looks like double is much closer... it removes trailing zeros...
> a) Is there a way to keep trailing zeros
> double : 194.846189733028000 indexes to 194.846189733028
>  positionIncrementGap="0" omitNorms="false"/>
>
> b) If I use "String" then will there be issue doing range query
>
> float
>  positionIncrementGap="0" omitNorms="false"/>
> 277.677836785372000 indexes to 277.67783
>
>
>
> On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky 
> wrote:
>
>> "double" (solr.TrieDoubleField) gives more precision
>>
>> See:
>>
>> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
>>
>> -- Jack Krupansky
>>
>> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop 
>> wrote:
>>
>> > Please suggest which numeric field type to use so that I can get complete
>> > value.
>> >
>> > e.g value in database is : 194.846189733028000
>> >
>> > If I index it as float SOLR indexes it as 194.84619 where as I need
>> > complete value i.e 194.846189733028000
>> > I will also be doing range query on this field.
>> >
>> > > > positionIncrementGap="0"/>
>> >
>> > > >  multiValued="false" />
>> >
>> > Regards
>> >
>>


Re: Suggestion on field type

2015-05-19 Thread Vishal Swaroop
Thank you John and Jack...

Looks like double is much closer... it removes trailing zeros...
a) Is there a way to keep trailing zeros
double : 194.846189733028000 indexes to 194.846189733028


b) If I use "String" then will there be issue doing range query

float

277.677836785372000 indexes to 277.67783



On Tue, May 19, 2015 at 11:56 AM, Jack Krupansky 
wrote:

> "double" (solr.TrieDoubleField) gives more precision
>
> See:
>
> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html
>
> -- Jack Krupansky
>
> On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop 
> wrote:
>
> > Please suggest which numeric field type to use so that I can get complete
> > value.
> >
> > e.g value in database is : 194.846189733028000
> >
> > If I index it as float SOLR indexes it as 194.84619 where as I need
> > complete value i.e 194.846189733028000
> > I will also be doing range query on this field.
> >
> >  > positionIncrementGap="0"/>
> >
> >  >  multiValued="false" />
> >
> > Regards
> >
>


Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Erick Erickson
Just to pile on:

How's your CPU utilization? That's the first place to look. The very first
question to answer is:
"Is Solr the bottleneck or the rest of the infrastructure?". One _very_
quick measure is
CPU utilization. If it's running along at 100% then you need to improve
your queries or add
more Solr nodes. If it's not, then you have more detective work to do
because it could be a lot
of things, I/O contention, your container backing up queries, etc.

Look at the admin UI, the stats section for a core in question. There
you'll see query times for
various percentiles (95th, 99th, etc). Or analyze the QTimes in the Solr
logs. That should give you
a sense of how fast Solr is serving queries.
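
A throwaway sketch (not from the thread) of that kind of QTime analysis: it
pulls QTime values out of a Solr request log (the "QTime=" pattern matches the
standard log lines) and prints a few percentiles. The log path is whatever you
pass as the first argument.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QTimeSummary {
  public static void main(String[] args) throws Exception {
    Pattern qtime = Pattern.compile("QTime=(\\d+)");
    List<Integer> times = new ArrayList<Integer>();
    for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
      Matcher m = qtime.matcher(line);
      if (m.find()) {
        times.add(Integer.parseInt(m.group(1)));
      }
    }
    if (times.isEmpty()) {
      System.out.println("No QTime entries found");
      return;
    }
    Collections.sort(times);
    System.out.println(times.size() + " queries");
    for (double p : new double[] {0.50, 0.95, 0.99}) {
      // Simple nearest-rank percentile; good enough for a quick look.
      int idx = Math.min(times.size() - 1, (int) Math.floor(p * times.size()));
      System.out.printf("p%.0f QTime = %d ms%n", p * 100, times.get(idx));
    }
  }
}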

All that said, you are running about 80 QPS, which is humming right along;
my first suspicion is that you need more Solr instances, but check first.

Best,
Erick

On Tue, May 19, 2015 at 7:30 AM, Luis Cappa Banda 
wrote:

> Hi there,
>
> Unfortunately I don' t agree with Shawn when he suggest to update
> server.xml configuration up to 1 in maxThreads. If Tomcat (due to the
> concurrent overload you' re suffering, the type of the queries you' re
> handling, etc.) cannot manage the requested queries what could happen is
> that Tomcat internal request queue fills and and Out of Memory may appear
> to say hello to you.
>
> Solr is multithreaded and Tomcat also it is, but those Tomcat threads are
> managed by an internal thread pool with a queue. What Tomcat does is to
> dispatch requests as much it cans over the web applications that are
> deployed in it (in this case, Solr). If Tomcat receives more requests that
> it can answer its internal queue starts to be filled.
>
> Those timeouts from the client side you explained seems to be due to
> Tomcat thread pool and its queue is starting to fill up. You can check it
> monitoring its memory and thread usage and I' m sure you' ll see how it
> grows correlated with the number of concurrent requests they receive. Then,
> for sure you' ll se a more or less horizontal line from memory usage and
> those timeouts will appear from the cliente side.
>
> Basically I think that our scenarios are:
>
>- Queries are slow. You should check and try to improve them, because
>maybe they are bad formed and that queries are destroying your performance.
>Also, check your index configuration (segments number, etc.).
>- Queries are OK, but you receive more queries that you can handle.
>Your configuration and everything is well done, but you are trying to
>consume more requests that you can dispatch and answer.
>
> If you cannot improve your queries, or your queries are OK but you receive
> more requests that the ones you can handle, the only solution you have is
> to scale horizontally and startup new Tomcat + Solrs from 4 to N nodes.
>
>
> Best,
>
>
> - Luis Cappa
>
> 2015-05-19 15:57 GMT+02:00 Michael Della Bitta :
>
>> Are you sure the requests are getting queued because the LB is detecting
>> that Solr won't handle them?
>>
>> The reason why I'm asking is I know that ELB doesn't handle bursts well.
>> The load balancer needs to "warm up," which essentially means it might be
>> underpowered at the beginning of a burst. It will spool up more resources
>> if the average load over the last minute is high. But for that minute it
>> will definitely not be able to handle a burst.
>>
>> If you're testing infrastructure using a benchmarking tool that doesn't
>> slowly ramp up traffic, you're definitely encountering this problem.
>>
>> Michael
>>
>>   Jani, Vrushank 
>>  2015-05-19 at 03:51
>>
>> Hello,
>>
>> We have production SOLR deployed on AWS Cloud. We have currently 4 live
>> SOLR servers running on m3xlarge EC2 server instances behind ELB (Elastic
>> Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat container which
>> is sitting behind Apache httpd. Apache httpd is using prefork mpm and the
>> request flows from ELB to Apache Httpd Server to Tomcat (via AJP).
>>
>> Last few days, we are seeing increase in the requests around 2
>> requests minute hitting the LB. In effect we see ELB Surge Queue Length
>> continuously being around 100.
>> Surge Queue Length: represents the total number of request pending
>> submission to the instances, queued by the load balancer;
>>
>> This is causing latencies and time outs from Client applications. Our
>> first reaction was that we don't have enough max connections set either in
>> HTTPD or Tomcat. What we saw, the servers are very lightly loaded with very
>> low CPU and memory utilisation. Apache preform settings are as below on
>> each servers with keep-alive turned off.
>>
>> 
>> StartServers 8
>> MinSpareServers 5
>> MaxSpareServers 20
>> ServerLimit 256
>> MaxClients 256
>> MaxRequestsPerChild 4000
>> 
>>
>>
>> Tomcat server.xml has following settings.
>>
>> > maxThreads="500" connectionTimeout="6"/>
>> For HTTPD – we see that there are lots of TIME_WAIT connections on the Apache
>> port, around 7000+, but ESTABLISHED connections are around 20.

Re: Wildcard/Regex Searching with Decimal Fields

2015-05-19 Thread Erick Erickson
No cleaner ways spring to mind, although you might get some
mileage out of normalizing
_everything_ rather than indexing different forms. Perhaps all numbers
are stored left-padded
with zeros to 16 places to the left of the decimal point and
right-padded 16 places to the right
of the decimal point. Which incidentally allows you to do range
queries and other numeric-type
comparisons.
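
A rough sketch (not from the thread) of that padding idea; the 16/16 widths
are arbitrary, signs and negative numbers are not handled, and the same
normalization would have to be applied at both index and query time.

public final class DecimalPadder {

  private static final int INT_DIGITS = 16;
  private static final int FRAC_DIGITS = 16;

  // Pad the integer part on the left and the fraction part on the right so
  // that lexicographic order on the resulting strings matches numeric order.
  public static String pad(String value) {
    String[] parts = value.split("\\.", 2);
    String intPart = parts[0];
    String fracPart = parts.length > 1 ? parts[1] : "";
    StringBuilder sb = new StringBuilder();
    for (int i = intPart.length(); i < INT_DIGITS; i++) sb.append('0');
    sb.append(intPart).append('.').append(fracPart);
    for (int i = fracPart.length(); i < FRAC_DIGITS; i++) sb.append('0');
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(pad("123.45")); // 0000000000000123.4500000000000000
    System.out.println(pad("23"));     // 0000000000000023.0000000000000000
  }
}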


But I _really_ have to go back to one of my original questions: What's
the use-case? You've
outlined _how_ users would like to use regexes  and wildcards over
numeric data, but not _why_.
You've accepted as a given that "contains" are necessary. Before
investing any more time
and effort, please, please, please figure out whether this is just
something somebody threw
in and is valueless or whether it's actually something that would
provide value _to the end user_.

This is where I really have to dig in my heels and have the product
manager explain, in very
concrete terms, the _value_ the user gets out of this. Don't get me
wrong, there may be perfectly
valid reasons. Just make sure they're well thought out before
straining to provide functionality
that implements a half-baked use-case that nobody then uses. Is this
more valuable than not being
able to do any statistics like sum, average, etc?

When having this discussion, have the range queries in your back
pocket and see if anything
that the PM brings up can't be satisfied by numeric searches rather
than string searches. Maybe
even bring in a user and ask "is this useful?".

I've just spent too much of my life implementing useless features to
not question something
like this ;)



Best,
Erick

On Tue, May 19, 2015 at 7:19 AM, Todd Long  wrote:
> I see what you're saying and that should do the trick. I could index 123 with
> an index synonym 123.0. Then my regex query "/123/" should hit along with a
> boolean query "123.0 OR 123.00*". Is there a cleaner approach to breaking
> apart the boolean query in this case? Right now, outside of Solr, I'm just
> looking for any extraneous zeros and wildcards to get the exact value (e.g.
> 123.0) and OR'ing that with the original user input.
>
> Thank you for your help.
>
> - Todd
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Wildcard-Regex-Searching-with-Decimal-Fields-tp4206015p4206288.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud: No live SolrServers available

2015-05-19 Thread Erick Erickson
What you've done _looks_ correct at a glance. Take a look at the Solr
logs. Don't bother trying to index things unless and until your nodes
are "active", it won't happen.

My first guess is that you have some error in your schema or
solrconfig.xml files, syntax errors, typos, class names that are
mis-typed, jars that are missing, whatever.

If that's true, the Solr log (or the screen if you're just running
from the command line) will show big ugly stack traces.

If nothing shows up in the logs then I'm puzzled, but what you
describe is consistent with what I've seen in terms of having bad
configs and trying to create a collection.

Best,
Erick

On Tue, May 19, 2015 at 4:33 AM, Chetan Vora  wrote:
> Hi all
>
> We have a cluster of standalone Solr cores (Solr 4.3) for which we had
> built  some custom plugins. I'm now trying to prototype converting the
> cluster to a Solr Cloud cluster. This is how I am trying to deploy the
> cores (in 4.7.2).
>
>1.
>
>Start solr with zookeeper embedded.
>
>java -DzkRun -Djetty.port=8985 -jar start.jar
>2.
>
>upload a config into Zookeeper (same config as the standalone cores)
>
>zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig
>-confname myconfig
>3.
>
>Create a new collection (mycollection) of 2 shards using the Collections
>API
>
> http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig
>
> So at this point I have two shards under my solr directory with the
> appropriate core.properties
>
> But when I go to http://localhost:8985/solr/#/~cloud, I see that the two
> shards' status is "Down" when they are supposed to be active by default.
>
> And when I try to index documents in them using SolrJ (via CloudSolrServer
> API) , I get the error "No live SolrServers available to handle this
> request". I restarted Solr but same issue.
>
> private CloudSolrServer cloudSolr;
> cloudSolr = new CloudSolrServer(zkHOST);
> cloudSolr.setZkClientTimeout(zkClientTimeout);
> cloudSolr.setDefaultCollection(collectionName);
> cloudSolr.connect();
> cloudSolr.add(doc);
>
> What am I doing wrong? I did a lot of digging around and saw an old Jira
> bug saying that Solr Cloud shards won't be active until there are some
> documents in the index. If that is the reason, that's kind of like a
> catch-22 isn't it?
>
> So anyways, I also tried adding some test documents manually and committed
> to see if things improved. Now on the shard statistics page, it correctly
> gives me the Numdocs count but when I try to query it says "no servers
> hosting shard". I next tried passing in shards.tolerant=true as a query
> parameter and search, but no cigar. It says 0 documents found.
>
> Any help would be appreciated. My main objective is to rebuilt the old
> standalone cores using SolrCloud and test to see if our custom
> requesthandlers still work as expected. And at this point, I can't index
> documents inside of the 4.7 Solr Cloud collection I have created. I am
> trying to use a 4.x SolrCloud release as it seems the internal APIs have
> changed quite a bit for the 5.x releases and our custom requesthandlers
> don't work anymore as expected.
>
> Thanks and Regards


Re: Suggestion on field type

2015-05-19 Thread Jack Krupansky
"double" (solr.TrieDoubleField) gives more precision

See:
https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html

-- Jack Krupansky

On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop 
wrote:

> Please suggest which numeric field type to use so that I can get complete
> value.
>
> e.g value in database is : 194.846189733028000
>
> If I index it as float SOLR indexes it as 194.84619 where as I need
> complete value i.e 194.846189733028000
> I will also be doing range query on this field.
>
>  positionIncrementGap="0"/>
>
>   multiValued="false" />
>
> Regards
>


Re: soft commit through leader

2015-05-19 Thread Erick Erickson
In a word "yes". The Solr servers are independently keeping their own
timers and one could trip on replica X while an update was in
transmission from the leader say. Or any one of a zillion other timing
conditions. In fact, this is why the indexes will have different
segments on replicas in a slice, the hard commit can be triggered at
different wall-clock times.

But do note that this isn't as much an issue as you might think. The
timer is started when the first update is sent to Solr. So, in the
scenario where you start up all your nodes, the timer starts when you
issue the first commit, i.e. probably within a few milliseconds of
each other. This might still be an issue, but the gap isn't that wide.
Solr promises _eventual_ consistency

If you need to control this, if you issue a soft commit from a client
(URL, SolrJ client, curl, etc) then it _is_ distributed to all
replicas in a collection at that point in time.
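
For reference (not from the thread), an explicit soft commit from SolrJ looks
roughly like this; the ZooKeeper address and collection name are placeholders.

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class SoftCommitExample {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
    server.setDefaultCollection("mycollection");
    server.connect();

    // waitFlush=true, waitSearcher=true, softCommit=true: the commit is sent
    // to the collection and applied on all replicas at this point in time.
    server.commit(true, true, true);

    server.shutdown();
  }
}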

Best,
Erick

On Tue, May 19, 2015 at 3:43 AM, Gopal Jee  wrote:
> Hi,
> I wanted to know: when we do a soft commit through configuration in
> solrconfig.xml, will different replicas commit at different points in time,
> depending on when each replica started, or will the leader send a commit to
> all replicas at the same time, as per the commit interval set in solrconfig?
>
> thanks
> gopal


Re: Suggestion on field type

2015-05-19 Thread John Blythe
I think the omitNorms option will normalize your field length. Try setting
it to false (it defaults to true for floats) and see if it helps.

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop 
wrote:

> Please suggest which numeric field type to use so that I can get complete
> value.
>
> e.g value in database is : 194.846189733028000
>
> If I index it as float SOLR indexes it as 194.84619 where as I need
> complete value i.e 194.846189733028000
> I will also be doing range query on this field.
>
>  positionIncrementGap="0"/>
>
>   multiValued="false" />
>
> Regards
>


Suggestion on field type

2015-05-19 Thread Vishal Swaroop
Please suggest which numeric field type to use so that I can get complete
value.

e.g value in database is : 194.846189733028000

If I index it as float, SOLR indexes it as 194.84619, whereas I need the
complete value, i.e. 194.846189733028000.
I will also be doing range query on this field.





Regards


Re: Issue with German search

2015-05-19 Thread Doug Turnbull
How are you searching shamik? What query parser are you using? Perhaps you
could share a sample Solr URL?

Cheers,
-Doug

On Tue, May 19, 2015 at 11:11 AM, shamik  wrote:

> Anyone ?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206306.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search  from Manning
Publications
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: Issue with German search

2015-05-19 Thread shamik
Anyone ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-German-search-tp4206104p4206306.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JSON

2015-05-19 Thread Shalin Shekhar Mangar
That sounds reasonable. Please open a Jira issue.

On Sat, May 16, 2015 at 9:22 AM, William Bell  wrote:

> Can we get this one fixed? If the body is empty, don't throw a
> NullPointerException?
>
> Thanks
>
> > http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation
> > -H "Content-type:application/json"
>
> You're telling Solr the body encoding is JSON, but then you don't send any
> body.
> We could catch that error earlier perhaps, but it still looks like an
> error?
>
> -Yonik
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Help/Guidance Needed : To reload kstem protword hash without full core reload

2015-05-19 Thread Aman Tandon
That link you provided is exactly I want to do. Thanks Ahmet.

With Regards
Aman Tandon

On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan 
wrote:

> Hi Aman,
>
> Changing protected words without reindexing makes little or no sense.
> Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory.
>
> Instead I suggest you to work on a more general issue:
> https://issues.apache.org/jira/browse/SOLR-1307
> Ahmet
>
>
> On Tuesday, May 19, 2015 3:16 AM, Aman Tandon 
> wrote:
> Please help or I am not clear here?
>
> With Regards
> Aman Tandon
>
>
> On Mon, May 18, 2015 at 9:47 PM, Aman Tandon 
> wrote:
>
> > Hi,
> >
> > *Problem Statement:* I want to reload the hash of protwords created by the
> > kstem filter without reloading the whole index core.
> >
> > *My Thought:* I am thinking of reloading the hash by passing a parameter
> > like *&r=1* to the analysis URL request. By changing IndexSchema.java I
> > might be able to pass this parameter through my analyzer chain to
> > KStemFilter, which would call the initializeDictionary function to rebuild
> > the protwords hash from the file if *r=1*, instead of making a full core
> > reload request.
> >
> > Please guide me. I know the question might be stupid, but the thought came
> > to my mind and I want to share it and ask for suggestions here. Is it
> > possible or not, and how can I achieve the same?
> >
> > I will be thankful for guidance.
> >
> > With Regards
> > Aman Tandon
> >
>


Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Luis Cappa Banda
Hi there,

Unfortunately I don't agree with Shawn when he suggests updating the
server.xml configuration up to 1 in maxThreads. If Tomcat (due to the
concurrent overload you're suffering, the type of queries you're handling,
etc.) cannot manage the requested queries, what can happen is that Tomcat's
internal request queue fills up and an Out of Memory error may appear to say
hello to you.

Solr is multithreaded and so is Tomcat, but those Tomcat threads are managed
by an internal thread pool with a queue. What Tomcat does is dispatch
requests as fast as it can over the web applications deployed in it (in this
case, Solr). If Tomcat receives more requests than it can answer, its
internal queue starts to fill.

The timeouts from the client side you describe seem to be due to the Tomcat
thread pool and its queue starting to fill up. You can check it by monitoring
memory and thread usage, and I'm sure you'll see how it grows in correlation
with the number of concurrent requests received. Then, for sure, you'll see a
more or less horizontal line in memory usage, and those timeouts will appear
on the client side.

Basically I think the scenarios are:

   - Queries are slow. You should check and try to improve them, because
   maybe they are badly formed and are destroying your performance. Also,
   check your index configuration (number of segments, etc.).
   - Queries are OK, but you receive more queries than you can handle. Your
   configuration and everything else is well done, but you are trying to
   consume more requests than you can dispatch and answer.

If you cannot improve your queries, or your queries are OK but you receive
more requests than you can handle, the only solution is to scale horizontally
and start up new Tomcat + Solr nodes, from 4 to N.


Best,


- Luis Cappa

2015-05-19 15:57 GMT+02:00 Michael Della Bitta :

> Are you sure the requests are getting queued because the LB is detecting
> that Solr won't handle them?
>
> The reason why I'm asking is I know that ELB doesn't handle bursts well.
> The load balancer needs to "warm up," which essentially means it might be
> underpowered at the beginning of a burst. It will spool up more resources
> if the average load over the last minute is high. But for that minute it
> will definitely not be able to handle a burst.
>
> If you're testing infrastructure using a benchmarking tool that doesn't
> slowly ramp up traffic, you're definitely encountering this problem.
>
> Michael
>
>   Jani, Vrushank 
>  2015-05-19 at 03:51
>
> Hello,
>
> We have production SOLR deployed on AWS Cloud. We have currently 4 live
> SOLR servers running on m3xlarge EC2 server instances behind ELB (Elastic
> Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat container which
> is sitting behind Apache httpd. Apache httpd is using prefork mpm and the
> request flows from ELB to Apache Httpd Server to Tomcat (via AJP).
>
> Last few days, we are seeing increase in the requests around 2
> requests minute hitting the LB. In effect we see ELB Surge Queue Length
> continuously being around 100.
> Surge Queue Length: represents the total number of request pending
> submission to the instances, queued by the load balancer;
>
> This is causing latencies and time outs from Client applications. Our
> first reaction was that we don't have enough max connections set either in
> HTTPD or Tomcat. What we saw, the servers are very lightly loaded with very
> low CPU and memory utilisation. Apache preform settings are as below on
> each servers with keep-alive turned off.
>
> 
> StartServers 8
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 256
> MaxClients 256
> MaxRequestsPerChild 4000
> 
>
>
> Tomcat server.xml has following settings.
>
>  maxThreads="500" connectionTimeout="6"/>
> For HTTPD – we see that there are lots of TIME_WAIT connections Apache
> port around 7000+ but ESTABLISHED connections are around 20.
> For Tomact – we see about 60 ESTABLISHED connections on tomcat AJP port.
>
> So the servers and connections doesn't look like fully utilised to the
> capacity. There is no visible stress anywhere. However we still get
> requests being queued up on LB because they can not be served from
> underlying servers.
>
> Can you please help me resolving this issue? Can you see any apparent
> problem here? Am I missing any configuration or settings for SOLR?
>
> Your help will be truly appreciated.
>
> Regards
> VJ
>
>
>
>
>
>
> Vrushank Jani
> Senior Java Developer
> T 02 8312 1625 | E vrushank.j...@truelocal.com.au

Re: Wildcard/Regex Searching with Decimal Fields

2015-05-19 Thread Todd Long
I see what you're saying and that should do the trick. I could index 123 with
an index synonym 123.0. Then my regex query "/123/" should hit along with a
boolean query "123.0 OR 123.00*". Is there a cleaner approach to breaking
apart the boolean query in this case? Right now, outside of Solr, I'm just
looking for any extraneous zeros and wildcards to get the exact value (e.g.
123.0) and OR'ing that with the original user input.

Thank you for your help.

- Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-Regex-Searching-with-Decimal-Fields-tp4206015p4206288.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Michael Della Bitta
Are you sure the requests are getting queued because the LB is detecting 
that Solr won't handle them?


The reason why I'm asking is I know that ELB doesn't handle bursts well. 
The load balancer needs to "warm up," which essentially means it might 
be underpowered at the beginning of a burst. It will spool up more 
resources if the average load over the last minute is high. But for that 
minute it will definitely not be able to handle a burst.


If you're testing infrastructure using a benchmarking tool that doesn't 
slowly ramp up traffic, you're definitely encountering this problem.


Michael


Jani, Vrushank 
2015-05-19 at 03:51

Hello,

We have production SOLR deployed on AWS Cloud. We have currently 4 
live SOLR servers running on m3xlarge EC2 server instances behind ELB 
(Elastic Load Balancer) on AWS cloud. We run Apache SOLR in Tomcat 
container which is sitting behind Apache httpd. Apache httpd is using 
prefork mpm and the request flows from ELB to Apache Httpd Server to 
Tomcat (via AJP).


Last few days, we are seeing increase in the requests around 2 
requests minute hitting the LB. In effect we see ELB Surge Queue 
Length continuously being around 100.
Surge Queue Length: represents the total number of request pending 
submission to the instances, queued by the load balancer;


This is causing latencies and time outs from Client applications. Our 
first reaction was that we don't have enough max connections set 
either in HTTPD or Tomcat. What we saw is that the servers are very lightly
loaded, with very low CPU and memory utilisation. The Apache prefork
settings are as below on each server, with keep-alive turned off.



StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000



Tomcat server.xml has following settings.

maxThreads="500" connectionTimeout="6"/>
For HTTPD – we see that there are lots of TIME_WAIT connections on the Apache
port, around 7000+, but ESTABLISHED connections are around 20.

For Tomcat – we see about 60 ESTABLISHED connections on the Tomcat AJP port.

So the servers and connections don't look fully utilised to capacity. There
is no visible stress anywhere. However, we still get requests being queued up
on the LB because they cannot be served by the underlying servers.


Can you please help me resolving this issue? Can you see any apparent 
problem here? Am I missing any configuration or settings for SOLR?


Your help will be truly appreciated.

Regards
VJ






Vrushank Jani
Senior Java Developer
T 02 8312 1625
E vrushank.j...@truelocal.com.au






Escaping special chars when exporting result sets

2015-05-19 Thread Angelo Veltens

Hi all!

We use Solr 4.10.3 and have configured an /export SearchHandler in 
addition to the default SearchHandler /select.



  
<requestHandler name="/export" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="rq">{!xport}</str>
    <str name="wt">xsort</str>
    <str name="distrib">false</str>
    <str name="fl">username,description</str>
  </lst>
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>


The handler follows the example from section "Exporting Result Sets" of 
the user guide.
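
For reference, a request against this handler would then look something like the
following (assuming a core named collection1 and that username is a docValues
field that can be sorted on; with fl fixed in the handler config, the field list
does not need to be repeated on the request):

http://localhost:8983/solr/collection1/export?q=*:*&sort=username+asc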


The description field may contain line breaks (\n) and quotation marks (").

When using the /select handler those characters are properly escaped. 
E.g. we get following value in the response:


"description":"Lorem ipsum\n\ndolor sit amet. \"hello\" world \" test",

BUT when using the /export handler those characters are NOT escaped. We get:

"description":"Lorem ipsum

dolor sit amet. "hello" world " test"

The latter is not valid JSON in our understanding.

What are we doing wrong? How can we get properly escaped JSON?

Thanks in advance!

Best regards,
Angelo Veltens








Re: Deduplication

2015-05-19 Thread Jack Krupansky
Shawn, I was going to say the same thing, but... then I was thinking about
SolrCloud and the fact that update processors are invoked before the
document is sent to its target node, so there wouldn't be a reliable way to
tell whether the input document's field value already exists on the target
node rather than the current node.

Or does the update processing only occur on the leader node after being
forwarded from the originating node? Is the doc clear on this detail?

My understanding was that the distributed update processor is near the end
of the chain, so that running of user update processors occurs before the
distribution step, but is that distribution to the leader, or distribution
from leader to replicas for a shard?
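
For reference, this is the kind of chain layout in question; the custom factory
name is hypothetical, and the comments reflect the commonly documented behavior
(worth verifying): processors listed before DistributedUpdateProcessorFactory run
on the node that first receives the update, those after it run on the leader and
each replica.

<updateRequestProcessorChain name="dedupe">
  <!-- runs on whichever node first receives the client request -->
  <processor class="com.example.RejectDuplicateUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- these run after the document has been forwarded to its target shard -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>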


-- Jack Krupansky

On Tue, May 19, 2015 at 9:01 AM, Shawn Heisey  wrote:

> On 5/19/2015 3:02 AM, Bram Van Dam wrote:
> > I'm looking for a way to have Solr reject documents if a certain field
> > value is duplicated (reject, not overwrite). There doesn't seem to be
> > any kind of unique option in schema fields.
> >
> > The de-duplication feature seems to make this (somewhat) possible, but I
> > would like it to provide the unique value myself, without having the
> > deduplicator create a hash of field values.
> >
> > Am I missing an obvious (or less obvious) way of accomplishing this?
>
> Write a custom update processor and include it in your update chain.
> You will then have the ability to do anything you want with the entire
> input document before it hits the code to actually do the indexing.
>
> A script update processor included with Solr allows you to write your
> processor in a language other than Java, such as JavaScript.
>
>
> https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
>
> Here's how to discard a document in an update processor written in Java:
>
>
> http://stackoverflow.com/questions/27108200/how-to-cancel-indexing-of-a-solr-document-using-update-request-processor
>
> The javadoc that I linked above describes the ability to return "false"
> in other languages to discard the document.
>
> Thanks,
> Shawn
>
>


Re: Java upgrade for solr in master-slave configuration

2015-05-19 Thread Shawn Heisey
On 5/19/2015 12:21 AM, Kamal Kishore Aggarwal wrote:
> I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. The solr
> configuration has a slave & master architecture. I am looking to
> upgrade Java from 1.7 to 1.8 in order to take advantage of the memory
> optimizations done in the latest version.
> 
> So, I am confused whether I should upgrade Java first on the master server and
> then on the slave server, or the other way round. What should be the ideal
> steps, so that the existing Solr index and other things do not get corrupted?
> Please suggest.

I am not aware of any changes in index format resulting from changing
your Java version.  It should not matter which machines you upgrade first.

Thanks,
Shawn



Re: Deduplication

2015-05-19 Thread Shawn Heisey
On 5/19/2015 3:02 AM, Bram Van Dam wrote:
> I'm looking for a way to have Solr reject documents if a certain field
> value is duplicated (reject, not overwrite). There doesn't seem to be
> any kind of unique option in schema fields.
> 
> The de-duplication feature seems to make this (somewhat) possible, but I
> would like it to provide the unique value myself, without having the
> deduplicator create a hash of field values.
> 
> Am I missing an obvious (or less obvious) way of accomplishing this?

Write a custom update processor and include it in your update chain.
You will then have the ability to do anything you want with the entire
input document before it hits the code to actually do the indexing.

A script update processor included with Solr allows you to write your
processor in a language other than Java, such as JavaScript.

https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Here's how to discard a document in an update processor written in Java:

http://stackoverflow.com/questions/27108200/how-to-cancel-indexing-of-a-solr-document-using-update-request-processor

The javadoc that I linked above describes the ability to return "false"
in other languages to discard the document.
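
For illustration, a rough Java sketch of such a processor, assuming the field to
keep unique is named my_unique_field (a made-up name) and accepting that this
simple check only sees already-committed documents, so it is a starting point
rather than a complete solution:

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class RejectDuplicateUpdateProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object value = doc.getFieldValue("my_unique_field"); // hypothetical field name
        if (value != null) {
          SolrIndexSearcher searcher = cmd.getReq().getSearcher();
          // getFirstMatch returns -1 when no committed document contains the term
          if (searcher.getFirstMatch(new Term("my_unique_field", value.toString())) != -1) {
            return; // duplicate found: drop the document by not calling super.processAdd()
          }
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml ahead of RunUpdateProcessorFactory.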

Thanks,
Shawn



Re: Deduplication

2015-05-19 Thread Alessandro Benedetti
Hi Bram,
what do you mean by:
"I would like it to provide the unique value myself, without having the
deduplicator create a hash of field values"?

This is not de-duplication, but simple document filtering based on a
constraint.
In case you do want de-duplication (which is what the first part of your
mail suggested), you can find a lot of info here:

https://cwiki.apache.org/confluence/display/solr/De-Duplication

Let me know for more detailed requirements!

2015-05-19 10:02 GMT+01:00 Bram Van Dam :

> Hi folks,
>
> I'm looking for a way to have Solr reject documents if a certain field
> value is duplicated (reject, not overwrite). There doesn't seem to be
> any kind of unique option in schema fields.
>
> The de-duplication feature seems to make this (somewhat) possible, but I
> would like it to provide the unique value myself, without having the
> deduplicator create a hash of field values.
>
> Am I missing an obvious (or less obvious) way of accomplishing this?
>
> Thanks,
>
>  - Bram
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Shawn Heisey
On 5/19/2015 1:51 AM, Jani, Vrushank wrote:
> We have production SOLR deployed on AWS Cloud. We have currently 4 live SOLR 
> servers running on m3xlarge EC2 server instances behind ELB (Elastic Load 
> Balancer) on AWS cloud. We run Apache SOLR in Tomcat container which is 
> sitting behind Apache httpd. Apache httpd is using prefork mpm and the 
> request flows from ELB to Apache Httpd Server to Tomcat (via AJP).
> 
> Last few days, we are seeing increase in the requests around 2 requests 
> minute hitting the LB. In effect we see ELB Surge Queue Length continuously 
> being around 100.
> Surge Queue Length: represents the total number of request pending submission 
> to the instances, queued by the load balancer;
> 
> This is causing latencies and time outs from Client applications. Our first 
> reaction was that we don't have enough max connections set either in HTTPD or 
> Tomcat. What we saw is that the servers are very lightly loaded with very low CPU 
> and memory utilisation. Apache prefork settings are as below on each server, 
> with keep-alive turned off.
> 
> 
> StartServers 8
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 256
> MaxClients 256
> MaxRequestsPerChild 4000
> 
> 
> 
> Tomcat server.xml has following settings.
> 
> <Connector ... maxThreads="500" connectionTimeout="6"/>
> For HTTPD – we see that there are lots of TIME_WAIT connections on the Apache port 
> (around 7000+), but ESTABLISHED connections are around 20.
> For Tomcat – we see about 60 ESTABLISHED connections on the Tomcat AJP port.
> 
> So the servers and connections don't look fully utilised. There is no visible 
> stress anywhere. However, we still get requests queued up on the LB because they 
> cannot be served by the underlying servers.
> 
> Can you please help me resolving this issue? Can you see any apparent problem 
> here? Am I missing any configuration or settings for SOLR?

I'm curious about why you have Apache sitting in front of Tomcat.  About
the only reason I can think of to require that step is that you are
using it to require authentication or to deny access to things like the
admin UI.   If you are not doing anything in Apache other than proxying
the traffic, then drop the middleman and use the container directly with
its own HTTP connector.  Or even better, use the Jetty included with Solr.

You should set maxThreads to 10000 in your Tomcat configuration,
effectively removing the limit.  Solr is a multi-threaded Java servlet,
with background threads as well as request-based threads.  Tomcat
requires threads for handling connections, but Solr also requires
threads for its own operation.  The maxThreads limit counts *all* of
those threads, not just the Tomcat threads.
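
For illustration, the relevant server.xml line would end up looking something like
this (the port, protocol, and timeout values are whatever your existing AJP
connector already uses; only maxThreads is the point here):

<Connector port="8009" protocol="AJP/1.3" maxThreads="10000" connectionTimeout="60000"/>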

Thanks,
Shawn



Re: Relevancy Scoring

2015-05-19 Thread John Blythe
Awesome, following it now!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, May 18, 2015 at 8:21 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Glad you figured things out and found splainer useful! Pull requests, bugs,
> feature requests welcome!
>
> https://github.com/o19s/splainer
>
> Doug
>
> On Monday, May 18, 2015, John Blythe  wrote:
>
> > Doug,
> >
> > very very cool tool you've made there. thanks so much for sharing!
> >
> > i ended up removing the shinglefilterfactory and voila! things are back
> in
> > good, working order with some great matching. i'm not 100% certain as to
> > why shingling was so ineffective. i'm guessing the stacked terms created
> > lower relevancy due to IDF on the *joint *terms/token?
> >
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com 
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
> > On Mon, May 18, 2015 at 4:57 PM, John Blythe  > > wrote:
> >
> > > Doug,
> > >
> > > A couple things quickly:
> > > - I'll check in to that. How would you go about testing things, direct
> > > URL? If so, how would you compose one of the examples above?
> > > - yup, I used it extensively before testing scores to ensure that I was
> > > getting things parsed appropriately (segmenting off the unit of measure
> > > [mm] whilst still maintaining the decimal instead of breaking it up was
> > my
> > > largest concern as of late)
> > > - to that point, though, it looks like one of my blunders was in the
> > > synonyms file. i just referenced /analysis/ again and realized "CANN"
> was
> > > being transposed to "cannula" instead of "cannulated" #facepalm
> > > - i'll be GLAD to use that! i'd been trying to use
> > http://explain.solr.pl/
> > > previously but it kept error'ing out on me :\
> > >
> > > thanks again, will report back!
> > >
> > > --
> > > *John Blythe*
> > > Product Manager & Lead Developer
> > >
> > > 251.605.3071 | j...@curvolabs.com 
> > > www.curvolabs.com
> > >
> > > 58 Adams Ave
> > > Evansville, IN 47713
> > >
> > > On Mon, May 18, 2015 at 4:47 PM, Doug Turnbull <
> > > dturnb...@opensourceconnections.com > wrote:
> > >
> > >> Hey John,
> > >>
> > >> I think you likely do need to think about escaping the query
> operators.
> > I
> > >> doubt the Solr admin could tell the difference.
> > >>
> > >> For analysis, have you looked at the handy analysis tool in the Solr
> > Admin
> > >> UI? Its pretty indespensible for figuring out if an analyzed query
> > matches
> > >> an analyzed field.
> > >>
> > >> Outside of that, I can selfishly plug Splainer (http://splainer.io)
> > that
> > >> gives you more insight into the Solr relevance explain. You would
> paste
> > in
> > >> something like
> > >>
> http://solr.quepid.com/solr/statedecoded/select?q=text:(deer%20hunting)
> > .
> > >>
> > >> Cheers!
> > >> -Doug
> > >>
> > >> On Mon, May 18, 2015 at 3:02 PM, John Blythe  > > wrote:
> > >>
> > >> > Thanks again for the speediness, Doug.
> > >> >
> > >> > Good to know on some of those things, not least of all the +
> > indicating
> > >> a
> > >> > mandatory field and the parentheses. It seems like the escaping is
> > >> pretty
> > >> > robust in light of the product number.
> > >> >
> > >> > I'm thinking it has to be largely related to the analyzer. Check
> this
> > >> out,
> > >> > this time with more of a real world case for us. Searching for
> > >> "descript2:
> > >> > CANN SCREW PT 3.5X50MM" produces a top result that has "Cannulated
> > >> screw PT
> > >> > 4.0x40mm" as its description. There is a document, though, that has
> > the
> > >> > description of "Cannulated screw PT 3.5x50mm"—the exact same thing
> > >> (minus
> > >> > lowercases) rendering that the analyzer is producing (per the
> > /analysis
> > >> > page). Why would 4.0x40 come up first?  The top four results have
> > >> > 4.0x[Something]. It's not till the fifth result that you see a 3.5
> > >> > something: "Cannulated screw PT 3.5x105mm" at which point I'm saying
> > >> WTF.
> > >> > So close, but then it ignores the "50" for a "105" instead.
> > >> >
> > >> > Further, adding parenthesis around the phrase—"descript2: (CANN
> SCREW
> > PT
> > >> > 3.5X50MM)"—produces top results that have the correct
> > >> dimensions—3.5x50—but
> > >> > the wrong type. Instead of "cannulated" screws we see "cortical."
> I'm
> > >> > convinced Solr is trolling me at this point :p
> > >> >
> > >> > --
> > >> > *John Blythe*
> > >> > Product Manager & Lead Developer
> > >> >
> > >> > 251.605.3071 | j...@curvolabs.com 
> > >> > www.curvolabs.com
> > >> >
> > >> > 58 Adams Ave
> > >> > Evansville, IN 47713
> > >> >
> > >> > On Mon, May 18, 2015 at 2:34 PM, Doug Turnbull <
> > >> > dturnb...@opensourceconnections.com > wrote:
> > >> >
> > >> > > You might just need some syntax help. Not sure what the Solr admin
> > >> > escapes,
> > >> > > but many

Re: Help/Guidance Needed : To reload kstem protword hash without full core reload

2015-05-19 Thread Ahmet Arslan
Hi Aman,

Changing protected words without reindexing makes little or no sense. 
Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory.
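
For illustration, a minimal analyzer sketch of that approach; the field type name
and file name are made up, and note that protwords.txt is still only re-read when
the core is reloaded, which is why the more general issue below matters:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokens listed in protwords.txt are marked as keywords and skipped by the stemmer -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>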

Instead, I suggest you work on the more general issue:
https://issues.apache.org/jira/browse/SOLR-1307
Ahmet


On Tuesday, May 19, 2015 3:16 AM, Aman Tandon  wrote:
Please help. Or am I not being clear here?

With Regards
Aman Tandon


On Mon, May 18, 2015 at 9:47 PM, Aman Tandon 
wrote:

> Hi,
>
> *Problem Statement: *I want to reload a hash of protwords created by the
> kstem filter without reloading the whole index core.
>
> *My Thought: *I am thinking of reloading the hash by passing a parameter
> like *&r=1* to the analysis URL request (to somehow pass the parameter via
> the URL). By changing IndexSchema.java I might be able to pass this parameter
> through my analyzer chain to KStemFilter, where I would call the
> initializeDictionary function to rebuild the protwords hash from the file if
> *r=1*, instead of making a full core reload request.
>
> Please guide me. I know the question might be stupid; the thought came to my
> mind and I wanted to share it and ask for suggestions here. Is it possible or
> not, and how can I achieve it?
>
> I will be thankful for guidance.
>
> With Regards
> Aman Tandon
>

 




Solr Cloud: No live SolrServers available

2015-05-19 Thread Chetan Vora
Hi all

We have a cluster of standalone Solr cores (Solr 4.3) for which we had
built  some custom plugins. I'm now trying to prototype converting the
cluster to a Solr Cloud cluster. This is how I am trying to deploy the
cores (in 4.7.2).

1. Start Solr with ZooKeeper embedded:

   java -DzkRun -Djetty.port=8985 -jar start.jar

2. Upload a config into ZooKeeper (same config as the standalone cores):

   zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig

3. Create a new collection (mycollection) of 2 shards using the Collections API:

   http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig

So at this point I have two shards under my solr directory with the
appropriate core.properties

But when I go to http://localhost:8985/solr/#/~cloud, I see that the two
shards' status is "Down" when they are supposed to be active by default.

And when I try to index documents in them using SolrJ (via CloudSolrServer
API) , I get the error "No live SolrServers available to handle this
request". I restarted Solr but same issue.

private CloudSolrServer cloudSolr;
cloudSolr = new CloudSolrServer(zkHOST);
cloudSolr.setZkClientTimeout(zkClientTimeout);
cloudSolr.setDefaultCollection(collectionName);
cloudSolr.connect();
cloudSolr.add(doc);

What am I doing wrong? I did a lot of digging around and saw an old Jira
bug saying that Solr Cloud shards won't be active until there are some
documents in the index. If that is the reason, that's kind of like a
catch-22 isn't it?

So anyways, I also tried adding some test documents manually and committed
to see if things improved. Now on the shard statistics page, it correctly
gives me the Numdocs count but when I try to query it says "no servers
hosting shard". I next tried passing in shards.tolerant=true as a query
parameter and search, but no cigar. It says 0 documents found.

Any help would be appreciated. My main objective is to rebuild the old
standalone cores using SolrCloud and test to see if our custom
requesthandlers still work as expected. And at this point, I can't index
documents inside of the 4.7 Solr Cloud collection I have created. I am
trying to use a 4.x SolrCloud release as it seems the internal APIs have
changed quite a bit for the 5.x releases and our custom requesthandlers
don't work anymore as expected.

Thanks and Regards


Solr 4.10.2 - highlighter : Random Error on retrieving document with Highlighter enabled

2015-05-19 Thread ariya bala
Hi,

I am facing an issue (Solr 4.10.2) when we try to retrieve a document
and highlight the hits.
Below is the exception; it happens in a random fashion.
If we try to reload the same document that threw this exception, it
loads without exception.

Any help would be appreciated.

*Exception:*

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 0.8
exceeds length of provided text sized 9971

at
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
~solr-solrj-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:56:22


at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
~solr-solrj-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:56:22


at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
~solr-solrj-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:56:22



-- 
Cheers
*Ariya *


soft commit through leader

2015-05-19 Thread Gopal Jee
Hi,
I wanted to know: when we do soft commits through configuration in
solrconfig.xml, will different replicas commit at different points in time
depending on when each replica started, or will the leader send a commit to all
replicas at the same time, as per the commit interval set in solrconfig?
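
For reference, the kind of configuration being asked about, in the updateHandler
section of solrconfig.xml (the 5000 ms value is just an example). As far as the
auto commit trackers go (worth confirming), each core starts its own timer when
it receives its first uncommitted update, so the soft commit fires per replica
rather than being sent out by the leader:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <!-- open a new searcher at most this many milliseconds after an update arrives -->
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>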

thanks
gopal


Re: Solr 5.0, Jetty and WAR

2015-05-19 Thread Bram Van Dam
> My organization has issues with Jetty (some customers don't want Jetty on
> their boxes, but are OK with WebSphere or Tomcat) so I'm trying to figure
> out: how to get Solr on WebSphere / Tomcat without using WAR knowing that
> the WAR will go away.

I understand that some customers are irrational. Doesn't mean you (or
Solr) should cater to them. I've heard the objections, and they're all
nonsense. Jetty is slow? WebSphere is easier to manage? Tomcat doesn't
support X/Y/Z. It's all nonsense.

We're currently in the process of migrating our applications away from
WARs for many of the same reasons as Solr. Whether or not we use Jetty
internally to handle HTTP requests isn't anyone's concern.

The best way to explain it to your irrational customers is that you're
running Solr, instead of confusing them with useless details. It doesn't
matter that Solr uses Jetty internally.

As for running Solr on WebSphere/Tomcat without a WAR...that's not going
to happen. Unless you want to fork Solr and keep the WAR...

 - Bram


Deduplication

2015-05-19 Thread Bram Van Dam
Hi folks,

I'm looking for a way to have Solr reject documents if a certain field
value is duplicated (reject, not overwrite). There doesn't seem to be
any kind of unique option in schema fields.

The de-duplication feature seems to make this (somewhat) possible, but I
would like it to provide the unique value myself, without having the
deduplicator create a hash of field values.

Am I missing an obvious (or less obvious) way of accomplishing this?

Thanks,

 - Bram


Issue serving concurrent requests to SOLR on PROD

2015-05-19 Thread Jani, Vrushank


Hello,

We have production SOLR deployed on AWS Cloud. We have currently 4 live SOLR 
servers running on m3xlarge EC2 server instances behind ELB (Elastic Load 
Balancer) on AWS cloud. We run Apache SOLR in Tomcat container which is sitting 
behind Apache httpd. Apache httpd is using prefork mpm and the request flows 
from ELB to Apache Httpd Server to Tomcat (via AJP).

Last few days, we are seeing increase in the requests around 2 requests 
minute hitting the LB. In effect we see ELB Surge Queue Length continuously 
being around 100.
Surge Queue Length: represents the total number of request pending submission 
to the instances, queued by the load balancer;

This is causing latencies and time outs from Client applications. Our first 
reaction was that we don't have enough max connections set either in HTTPD or 
Tomcat. What we saw is that the servers are very lightly loaded with very low CPU and 
memory utilisation. Apache prefork settings are as below on each server, with 
keep-alive turned off.


StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000



Tomcat server.xml has following settings.


For HTTPD – we see that there are lots of TIME_WAIT connections on the Apache port 
(around 7000+), but ESTABLISHED connections are around 20.
For Tomcat – we see about 60 ESTABLISHED connections on the Tomcat AJP port.

So the servers and connections don't look fully utilised. There is no visible 
stress anywhere. However, we still get requests queued up on the LB because they 
cannot be served by the underlying servers.

Can you please help me resolving this issue? Can you see any apparent problem 
here? Am I missing any configuration or settings for SOLR?

Your help will be truly appreciated.

Regards
VJ






Vrushank Jani
Senior Java Developer
T 02 8312 1625
E vrushank.j...@truelocal.com.au