run multiple queries at the same time

2014-07-09 Thread Lee Chunki
Hi,

Is there any way to run multiple queries at the same time?

The situation is:
1. a query comes in
2. check synonyms
3. get search results for all synonym queries and the original query

I can get search results by looping over the searcher, but as you know, that is
time consuming.
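
A minimal SolrJ sketch of one approach (untested; the URL, core name, and
synonym terms are illustrative): since HttpSolrServer is thread-safe, the
original query and each synonym query can be fired from their own threads so
they run concurrently.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ParallelQueries {
    public static void main(String[] args) throws Exception {
        // One shared client; HttpSolrServer is thread-safe.
        final HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Original query plus its synonyms (illustrative values).
        List<String> queries = Arrays.asList("laptop", "notebook", "portable computer");

        ExecutorService pool = Executors.newFixedThreadPool(queries.size());
        List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
        for (final String q : queries) {
            futures.add(pool.submit(new Callable<QueryResponse>() {
                public QueryResponse call() throws Exception {
                    return solr.query(new SolrQuery(q));
                }
            }));
        }
        for (Future<QueryResponse> f : futures) {
            // Blocks until each response arrives; total wall time
            // is roughly that of the slowest single query.
            System.out.println(f.get().getResults().getNumFound());
        }
        pool.shutdown();
    }
}

Alternatively, the synonyms can often be OR'ed into one Boolean query
(q=laptop OR notebook), which avoids the extra round trips entirely.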

Thanks,
Chunki.




Re: Complement of {!join}

2014-07-09 Thread Alexandre Rafalovitch
Well, even JIRA and the release notes concentrate on a replacement of
_query_ with {!}, but not on having multiple of them. Was it
possible to have multiple _query_ segments in one 'q' query? I was not
aware of that either.

Basically, I am suggesting that somebody who knows this in depth
should write an article. I feel it is a powerful feature of Solr, but
I was even hesitant to use it in my own config because all the online
examples showed only a single use.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jul 10, 2014 at 11:18 AM, Jack Krupansky
 wrote:
> From the Solr 4.1 release notes:
>
> * Solr QParsers may now be directly invoked in the lucene query syntax
>  via localParams and without the _query_ magic field hack.
>  Example: foo AND {!term f=myfield v=$qq}
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Krupansky
> Sent: Thursday, July 10, 2014 12:14 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Complement of {!join}
>
> I think this is the Jira that implemented that feature:
> SOLR-4093 - localParams syntax for standard query parser
> https://issues.apache.org/jira/browse/SOLR-4093
>
> Yeah, I don't think this is fully documented anywhere, other than the Jira
> and the patch itself.
>
> I think I had finished my query parser doc in my e-book before 4.1 came out.
> This was the point where the "divorce" between the Lucene and Solr query
> parsers took place, because the feature needed to be added to the query
> parser grammar, but the Lucene guys objected to this "Solr feature."
>
> -- Jack Krupansky
>
> -Original Message- From: Alexandre Rafalovitch
> Sent: Wednesday, July 9, 2014 9:10 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Complement of {!join}
>
> Ok, so cannot be eDisMax at the top.
>
> However, the point I really am trying to make does not seem to be in
> those links. All the examples of local parameters I have seen use them
> at the start of the query as a standalone component. I haven't seen
> examples where a query string contains several of them together and
> uses different query parsers. The only example I do remember seeing
> multiple query parsers used together was when each one of them was
> done separately in 'fq' clauses.
>
> Additionally, even now I don't know how the end of the content after
> the local parameter closing brace is determined. I used line breaks
> for my example, also (brackets) seem to work. But I don't remember
> seeing the exact rules.
>
> So, I still think the world could benefit from a very visible example
> showing multi-clause query with different sub-clauses using different
> query parsers. Perhaps even on that same linked page on Wiki. And/Or a
> presentation on "Did you know this about Solr?" at the next big
> conference.
>
> Regards,
>   Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Thu, Jul 10, 2014 at 7:53 AM, Chris Hostetter
>  wrote:
>>
>> :
>> : Somebody (with more knowledge) should write up an in-depth article on
>> : this issue and whether the parent parser has to be default (lucene) or
>> : whatever.
>>
>> It's a feature of Solr's standard query parser...
>>
>> https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
>> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
>>
>> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>
>


Re: Complement of {!join}

2014-07-09 Thread Jack Krupansky

From the Solr 4.1 release notes:


* Solr QParsers may now be directly invoked in the lucene query syntax
 via localParams and without the _query_ magic field hack.
 Example: foo AND {!term f=myfield v=$qq}

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Thursday, July 10, 2014 12:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Complement of {!join}

I think this is the Jira that implemented that feature:
SOLR-4093 - localParams syntax for standard query parser
https://issues.apache.org/jira/browse/SOLR-4093

Yeah, I don't think this is fully documented anywhere, other than the Jira
and the patch itself.

I think I had finished my query parser doc in my e-book before 4.1 came out.
This was the point where the "divorce" between the Lucene and Solr query
parsers took place, because the feature needed to be added to the query
parser grammar, but the Lucene guys objected to this "Solr feature."

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Wednesday, July 9, 2014 9:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Complement of {!join}

Ok, so cannot be eDisMax at the top.

However, the point I really am trying to make does not seem to be in
those links. All the examples of local parameters I have seen use them
at the start of the query as a standalone component. I haven't seen
examples where a query string contains several of them together and
uses different query parsers. The only example I do remember seeing
multiple query parsers used together was when each one of them was
done separately in 'fq' clauses.

Additionally, even now I don't know how the end of the content after
the local parameter closing brace is determined. I used line breaks
for my example, also (brackets) seem to work. But I don't remember
seeing the exact rules.

So, I still think the world could benefit from a very visible example
showing multi-clause query with different sub-clauses using different
query parsers. Perhaps even on that same linked page on Wiki. And/Or a
presentation on "Did you know this about Solr?" at the next big
conference.

Regards,
  Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr
proficiency


On Thu, Jul 10, 2014 at 7:53 AM, Chris Hostetter
 wrote:

:
: Somebody (with more knowledge) should write up an in-depth article on
: this issue and whether the parent parser has to be default (lucene) or
: whatever.

It's a feature of Solr's standard query parser...

https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser



-Hoss
http://www.lucidworks.com/ 




Re: Lower/UpperCase Issue

2014-07-09 Thread Erick Erickson
Side note: putting LowercaseFilter in front of
WordDelimiterFilterFactory is usually a poor
choice. One of the purposes of WDFF is that
it breaks lower->upper case transitions into
separate tokens.

NOTE: This is _not_ germane to your problem
IMO.

But it _is_ an indication that you might want to
spend some time with the admin/analysis page
to understand the effects of the filters on various
inputs.

Also, add &debug=all to your query to see exactly
why things were returned in the order they were. In
this case, as Shawn says, all queries will be scored
the same.

Actually, I'd be very interested in the output of
adding &debug=all to the two queries. Theoretically,
since all the scores are the same, I'd expect the
returns to be consistently ordered unless the filter
query is parsing Balancer and BALANCER differently.
I'm going to guess that the parsing of the fq clause is
different somehow, but that's only a guess.
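
For example (illustrative, reusing the query URLs from earlier in the thread):

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&debug=all
http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&debug=all

The "explain" and parsed-filter sections of the two responses can then be
diffed directly.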

Best,
Erick

On Wed, Jul 9, 2014 at 4:20 PM, Shawn Heisey  wrote:
> On 7/9/2014 2:02 PM, EXTERNAL Taminidi Ravi (ETI,
> Automotive-Service-Solutions) wrote:
>> Here is the schema part.
>>
>> [schema XML stripped by the list archive]
>
> Your query is *:*, which is a constant score query.  You also have a
> filter, which does not affect scoring.
>
> Since there is no score difference between different documents with your
> query results, the lack of a sort parameter means that you will most
> likely get the results in the order that Lucene returns them, which is
> completely indeterminate.
>
> There's probably some minute difference between the two queries at the
> Lucene level, possibly because the stemmer behaves differently with
> different case or just because the internal matching happens
> differently, which makes Lucene return the results in a different order.
>
> If you want to be absolutely in control of your result order when the
> query results in a constant score, you'll need to specify a sort parameter.
>
> Thanks,
> Shawn
>


Re: Complement of {!join}

2014-07-09 Thread Jack Krupansky

I think this is the Jira that implemented that feature:
SOLR-4093 - localParams syntax for standard query parser
https://issues.apache.org/jira/browse/SOLR-4093

Yeah, I don't think this is fully documented anywhere, other than the Jira 
and the patch itself.


I think I had finished my query parser doc in my e-book before 4.1 came out. 
This was the point where the "divorce" between the Lucene and Solr query 
parsers took place, because the feature needed to be added to the query 
parser grammar, but the Lucene guys objected to this "Solr feature."


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Wednesday, July 9, 2014 9:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Complement of {!join}

Ok, so cannot be eDisMax at the top.

However, the point I really am trying to make does not seem to be in
those links. All the examples of local parameters I have seen use them
at the start of the query as a standalone component. I haven't seen
examples where a query string contains several of them together and
uses different query parsers. The only example I do remember seeing
multiple query parsers used together was when each one of them was
done separately in 'fq' clauses.

Additionally, even now I don't know how the end of the content after
the local parameter closing brace is determined. I used line breaks
for my example, also (brackets) seem to work. But I don't remember
seeing the exact rules.

So, I still think the world could benefit from a very visible example
showing multi-clause query with different sub-clauses using different
query parsers. Perhaps even on that same linked page on Wiki. And/Or a
presentation on "Did you know this about Solr?" at the next big
conference.

Regards,
  Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr 
proficiency



On Thu, Jul 10, 2014 at 7:53 AM, Chris Hostetter
 wrote:

:
: Somebody (with more knowledge) should write up an in-depth article on
: this issue and whether the parent parser has to be default (lucene) or
: whatever.

It's a feature of Solr's standard query parser...

https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser



-Hoss
http://www.lucidworks.com/ 




Re: Solr performance improved under heavy load

2014-07-09 Thread Erick Erickson
I'm pretty much lost, please add some details:
1> 27-50 rpm. Queries? Updates?
2> what kinds of updates are happening if <1> is queries?
3> The various mail systems often strip screenshots, I don't see it.
4> What are you measuring anyway? QTime? Time for response to
 come back?
5> are your logs showing any commits? How about opening new
searchers in the two situations?
6> I expect that 6ms response times (assuming queries) pretty
much means you're hitting the queryResultCache. The admin
page should help you figure this out, see the cache hit ratio.

Best,
Erick

On Wed, Jul 9, 2014 at 4:10 PM, Utkarsh Sengar  wrote:
> I run a small solr cloud cluster (4.5) of 3 nodes, 3 collections with 3
> shards each. Total index size per node is about 20GB with about 70M
> documents.
>
> In regular traffic (27-50 rpm) the performance is ok and response time
> ranges from 100 to 500ms.
> But when I start loading (overwriting) 70M documents again via curl + csv,
> the performance drastically improves. I see a 6ms response time
> (screenshot attached).
>
> So I am just curious about this: intuitively Solr should perform better
> under low traffic and slow down as traffic goes up. So what is the reason
> for this? Efficient memory management with more data?
>
> --
> Thanks,
> -Utkarsh


Phrase Slop relevance tuning

2014-07-09 Thread Greg Pendlebury
I've received a request from our business area to take a look at
emphasising ~0 phrase matches over ~1 (and greater) more than they already
are. I can't see any doco on the subject, and I'd like to ask if anyone
else has played in this area? Or at least is willing to sanity-check my
reasoning before I rush in and code a solution, when I may be reinventing
the wheel?

Looking through the codebase, I can only find hardcoded weightings in a
couple of places, using the formula: "return 1.0f / (distance + 1);" which
results in ~0 getting a weight of 1, and ~1 getting a weight of 0.5.

There are a number of ways I've already considered, but the most flexible
seems to be to expose those two numbers via configuration.

We are considering adjusting them in sync with each other (using 1/3
instead of 1 in both places), which has the impact of altering the overall
distribution of the weightings graph, but retaining the scale between 1 and
0.

Additionally, we are considering increasing the numerator to increase the
upper scale above 1. Not sure if this is a dumb idea though. Our hope was to
use something like "return 2.0f / (distance + 0.33f);" to give ~0 matches a
real (^2) boost in comparison to other weighting factors, and retain the ~1
(and greater) matches at around their current weight. This remains a
completely untested theory though, since I may be misunderstanding how the
output gets combined outside this method.
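
As a sketch of what exposing those two numbers could look like (untested; in
Lucene 4.x the hardcoded formula lives in the Similarity, so a custom
subclass is one low-risk place to hang the configuration; the class name and
values are illustrative):

import org.apache.lucene.search.similarities.DefaultSimilarity;

// Replaces the stock sloppyFreq() -- "return 1.0f / (distance + 1);" --
// with a configurable numerator and offset, e.g. 2.0f and 0.33f as above.
public class ConfigurableSlopSimilarity extends DefaultSimilarity {
    private final float numerator;
    private final float offset;

    public ConfigurableSlopSimilarity(float numerator, float offset) {
        this.numerator = numerator;
        this.offset = offset;
    }

    @Override
    public float sloppyFreq(int distance) {
        return numerator / (distance + offset);
    }
}

The constants would then come from a SimilarityFactory's init params in
schema.xml rather than a code change.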

The real technical change though would be to simply get those two numbers
from config. Any advice or suggestions about other ideas we haven't even
considered? The larger picture here is that we are using edismax and the pf
fields are all covered by ps=5.

Ta,
Greg


Re: Solr enterprise tech support in Brazil

2014-07-09 Thread Otis Gospodnetic
Hello,

Sematext would be happy to help.  Please see signature.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



> On Jul 9, 2014, at 4:15 PM, "Jefferson Olyntho Neto (STI)" 
>  wrote:
>
> Dear all,
>
> I would like some recommendations of companies that provide enterprise 
> technical support for Solr in Brazil. Could someone help me?
>
> Thanks!
>
> Jefferson Olyntho Neto
> jefferson.olyn...@unimedbh.com.br


Re: Complement of {!join}

2014-07-09 Thread Alexandre Rafalovitch
Ok, so cannot be eDisMax at the top.

However, the point I really am trying to make does not seem to be in
those links. All the examples of local parameters I have seen use them
at the start of the query as a standalone component. I haven't seen
examples where a query string contains several of them together and
uses different query parsers. The only example I do remember seeing
multiple query parsers used together was when each one of them was
done separately in 'fq' clauses.

Additionally, even now I don't know how the end of the content after
the local parameter closing brace is determined. I used line breaks
for my example, also (brackets) seem to work. But I don't remember
seeing the exact rules.

So, I still think the world could benefit from a very visible example
showing multi-clause query with different sub-clauses using different
query parsers. Perhaps even on that same linked page on Wiki. And/Or a
presentation on "Did you know this about Solr?" at the next big
conference.
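
For what it's worth, here is the shape of example I have in mind (an untested
sketch; field names illustrative). Putting each clause's text in the v local
parameter makes the end of every sub-query unambiguous, which sidesteps the
closing-brace question above, and starting with a plain clause avoids a
leading {!...} being taken as local params for the whole query:

q=title:solr AND {!term f=category v='books'} AND {!field f=author v='John Smith'}

Each {!...} picks its own parser; the surrounding AND structure is handled by
the standard lucene parser.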

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jul 10, 2014 at 7:53 AM, Chris Hostetter
 wrote:
> :
> : Somebody (with more knowledge) should write up an in-depth article on
> : this issue and whether the parent parser has to be default (lucene) or
> : whatever.
>
> It's a feature of Solr's standard query parser...
>
> https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser
>
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Complement of {!join}

2014-07-09 Thread Chris Hostetter
: 
: Somebody (with more knowledge) should write up an in-depth article on
: this issue and whether the parent parser has to be default (lucene) or
: whatever.

It's a feature of Solr's standard query parser...

https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser



-Hoss
http://www.lucidworks.com/


Re: Complement of {!join}

2014-07-09 Thread Alexandre Rafalovitch
I think any sub-clause can use the local params syntax and branch off into
different query parsers. I could not find any examples of it either,
but I really needed to do an advanced search and came up with this:


   {!switch case='*:*' default=$q_lastName v=$lastName}
   AND {!switch case='*:*' default=$q_firstName v=$firstName}
   AND {!switch case='*:*' default=$q_organizationalUnit v=$organizationalUnit}
   ...

Full example: https://gist.github.com/arafalov/5e04884e5aefaf46678c

Somebody (with more knowledge) should write up an in-depth article on
this issue and whether the parent parser has to be default (lucene) or
whatever.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Wed, Jul 9, 2014 at 11:26 PM, Bruce Johnson  wrote:
> Thank you so much for the quick reply, Erik. And wow: I didn't realize you
> could use join that fluidly. Very nice.
>
> Is there some trove of Solr doc that I'm missing where this natural syntax
> is explained? I wouldn't have asked such a basic question except that I
> found no evidence that this was possible.
>
>
>
>
> On Wed, Jul 9, 2014 at 12:07 PM, Erik Hatcher 
> wrote:
>
>> Maybe something like q=*:* AND NOT {!join … } would do the trick?  (it’ll
>> depend on your version of Solr for support of the {!…} more natural nested
>> queries)
>>
>> Erik
>>
>> On Jul 9, 2014, at 11:24 AM, Bruce Johnson 
>> wrote:
>>
>> > === Short-version ===
>> > Is there a way to join on the complement of a query? I want the only the
>> > Solr documents for which the nested join query does not match.
>> >
>> > === Longer-version ===
>> > Query-time joins with {!join} are great at modeling the SQL equivalent of
>> > patterns like this:
>> >
>> > SELECT book_name FROM books WHERE id
>> > IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
>> >
>> > This would find the name of books having chapters entitled "Foo".
>> (Assuming
>> > the chapters table has the column 'book_id' that points back to the book
>> > record containing them.)
>> >
>> > That's great.
>> >
>> > Is there a way in Solr to query for the complement of that? In SQL terms,
>> > this:
>> >
>> > SELECT book_name FROM books WHERE id
>> > NOT IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
>> >
>> > This would find books that do not have chapters entitled "Foo".
>> >
>> > It isn't the same as querying (in Solr terms) for something like
>> >
>> > {!join to=id from=book_id}-chapter_title:"Foo" // note the negation
>> >
>> > because it would still match other chapters in the same book that are not
>> > entitled "Foo", causing the join to still identify the book based on its
>> > other non-Foo chapters.
>> >
>> > Any advice would be greatly appreciated. I'm also open to other ways of
>> > thinking about the problem. Perhaps there are alternative indexing
>> patterns
>> > that could accomplish the same goal.
>> >
>> > Many thanks,
>> > Bruce
>>
>>


Re: Lower/UpperCase Issue

2014-07-09 Thread Shawn Heisey
On 7/9/2014 2:02 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) wrote:
> Here is the schema part.
>
> [schema XML stripped by the list archive]

Your query is *:*, which is a constant score query.  You also have a
filter, which does not affect scoring.

Since there is no score difference between different documents with your
query results, the lack of a sort parameter means that you will most
likely get the results in the order that Lucene returns them, which is
completely indeterminate.

There's probably some minute difference between the two queries at the
Lucene level, possibly because the stemmer behaves differently with
different case or just because the internal matching happens
differently, which makes Lucene return the results in a different order.

If you want to be absolutely in control of your result order when the
query results in a constant score, you'll need to specify a sort parameter.
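
For example (sort field illustrative), adding an explicit sort to the queries
from earlier in the thread makes the order deterministic:

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&sort=id+asc&wt=json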

Thanks,
Shawn



Solr performance improved under heavy load

2014-07-09 Thread Utkarsh Sengar
I run a small solr cloud cluster (4.5) of 3 nodes, 3 collections with 3
shards each. Total index size per node is about 20GB with about 70M
documents.

In regular traffic (27-50 rpm) the performance is ok and response time
ranges from 100 to 500ms.
But when I start loading (overwriting) 70M documents again via curl + csv,
the performance drastically improves. I see a 6ms response time
(screenshot attached).

So I am just curious about this: intuitively Solr should perform better
under low traffic and slow down as traffic goes up. So what is the reason
for this? Efficient memory management with more data?

-- 
Thanks,
-Utkarsh


Re: Fwd: Language detection for solr 3.6.1

2014-07-09 Thread T. Kuro Kurosaka


On 07/08/2014 03:17 AM, Poornima Jay wrote:

I'm using the Google library which I mentioned in my first mail, saying I'm
using http://code.google.com/p/language-detection/. I have downloaded the jar
file from the below URL:

https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1


Please let me know from where I need to download the correct jar file.

Regards,
I don't think you need to download anything. It's included in the Solr 3.6.1
package.


$ ls contrib/langid/lib
jsonic-1.2.7.jar jsonic-NOTICE.txt langdetect-LICENSE-ASL.txt
jsonic-LICENSE-ASL.txt langdetect-1.1-20120112.jar langdetect-NOTICE.txt

langdetect-1.1-20120112.jar is the one you find on the Google Code site,
which isn't developed by Google, but by a Japanese company,
Cybozu.

I used this some years ago for comparison purposes,
but I don't remember exactly how. You'd have to move the
JARs from contrib/langid/lib to the core's lib directory, and
use
LangDetectLanguageIdentifierUpdateProcessorFactory
instead of
TikaLanguageIdentifierUpdateProcessorFactory
in the commented-out portion of example/solr/conf/solrconfig.xml
(and you need to uncomment that portion, of course).
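
For reference, the uncommented section would look roughly like this (a sketch
based on the stock 3.6.x example config; the field names in langid.fl are
illustrative):

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,text</str>
    <str name="langid.langField">language</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>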

Hope this helps.

--
T. "Kuro" Kurosaka • Senior Software Engineer
Healthline - The Power of Intelligent Health
www.healthline.com  |@Healthline  | @HealthlineCorp



Re: Query in metadata sent to Solr

2014-07-09 Thread Ahmet Arslan
Hi,

The field name sent with literal is Modified. In your screenshot, it is
last_modified. Do you use an fmap setting in solrconfig.xml?

I think it is better to send us the solrconfig.xml file where the Solr Cell
handler is defined.
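
If such a mapping is in play, it would typically sit in the handler's
defaults, something like this (illustrative sketch):

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
  </lst>
</requestHandler>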


On Thursday, July 10, 2014 12:18 AM, Ameya Aware  wrote:
 


Hi,

Please have look at the below part taken from solr.log file.

INFO  - 2014-07-09 15:30:56.243; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract
 
params={literal.deny_token_document=DEAD_AUTHORITY&literal.DocIcon=docx&resource.name=Anarchism-201310091123505625.docx&literal.allow_token_document=S-1-5-21-1482846375-227860-3536682573-500&literal.allow_token_document=S-1-5-21-1482846375-227860-3536682573-68651&literal.FolderChildCount=0&version=2.2&literal.ItemChildCount=0&literal.GUID=Ameya&literal.ParentVersionString=&literal._CopySource=&literal.cat=&literal.FileSizeDisplay=1264155&literal._CheckinComment=&literal.Edit=0&literal.id=http://sharepointten:10800/sites/siteecho/Shared%2520Documents/Anarchism-201310091123505625.docx&literal.LinkFilenameNoMenu=Anarchism-201310091123505625.docx&literal.Created=2014-06-03+11:21:53&literal._UIVersionString=1.0&wt=xml&literal.Title=Anarchism&literal.Modified=2014-06-03+11:21:53&literal.Author=Sharepoint+Backup&literal.FileLeafRef=Anarchism-201310091123505625.docx&literal.LinkFilename=Anarchism-201310091123505625.docx&literal.lcf_metadata_id=81&literal.Editor=Administrator&literal.ParentLeafName=&literal.CheckoutUser=&literal.


In the log, the last_modified date comes out to be 2014-06-03, which is the
correct date.
But when I see it in the Solr UI it appears different.

Please find attached screenshot for it.

Can you please let me know the cause of it?

Is that field being extracted by Solr Cell?

Thanks,
Ameya

Query in metadata sent to Solr

2014-07-09 Thread Ameya Aware
Hi,

Please have look at the below part taken from solr.log file.

INFO  - 2014-07-09 15:30:56.243; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.deny_token_document=DEAD_AUTHORITY&literal.DocIcon=docx&resource.name=Anarchism-201310091123505625.docx&literal.allow_token_document=S-1-5-21-1482846375-227860-3536682573-500&literal.allow_token_document=S-1-5-21-1482846375-227860-3536682573-68651&literal.FolderChildCount=0&version=2.2&literal.ItemChildCount=0&literal.GUID=Ameya&literal.ParentVersionString=&literal._CopySource=&literal.cat=&literal.FileSizeDisplay=1264155&literal._CheckinComment=&literal.Edit=0&literal.id=http://sharepointten:10800/sites/siteecho/Shared%2520Documents/Anarchism-201310091123505625.docx&literal.LinkFilenameNoMenu=Anarchism-201310091123505625.docx&literal.Created=2014-06-03+11:21:53&literal._UIVersionString=1.0&wt=xml&literal.Title=Anarchism&literal.Modified=2014-06-03+11:21:53&literal.Author=Sharepoint+Backup&literal.FileLeafRef=Anarchism-201310091123505625.docx&literal.LinkFilename=Anarchism-201310091123505625.docx&literal.lcf_metadata_id=81&literal.Editor=Administrator&literal.ParentLeafName=&literal.CheckoutUser=&literal.


In the log, the last_modified date comes out to be 2014-06-03, which is the
correct date.
But when I see it in the Solr UI it appears different.

Please find attached screenshot for it.

Can you please let me know the cause of it?

Is that field being extracted by Solr Cell?

Thanks,
Ameya


Re: Lower/UpperCase Issue

2014-07-09 Thread Jack Krupansky
Ahmet is correct: the porter stemmer assumes that your input is lower case, 
so be sure to place the lower case filter before stemming.
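
A minimal analyzer sketch with that ordering (attribute values illustrative;
note WordDelimiterFilter stays ahead of lowercasing, per Erick's point
elsewhere in this thread):

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>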


BTW, this is the kind of detail that I have in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

You could also find this detail down at the level of the Lucene Javadoc, but 
IMHO it's inappropriate to expect Solr users to have to dive down into 
Lucene Javadoc.


See:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html

-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Sent: Wednesday, July 9, 2014 4:03 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: RE: Lower/UpperCase Issue

Do I need to use a different algorithm instead of Porter stemming? Can you
suggest anything you have in mind?


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

Hi,

Analysis admin page will tell you the truth. Just a guess: the porter stem
filter could be "case sensitive" and that may cause the difference. I am
pretty sure porter stemming algorithms are designed to work on lowercase input.


By the way, you have two lowercase filters defined in the index analyzer.

Ahmet



On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:
I have a situation here: when I search with "BALANCER" the results are
different compared to "Balancer", and the order is different. When I search
"BALANCER", the documents with upper case are first in the list, and for
"Balancer" it is in a different order.


I am confused by this behavior. Does someone have the same issue, or am I
missing something?


[fieldType definition stripped by the list archive; the surviving fragments show a stop filter (stopwords.txt), a word delimiter filter (generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"), and a query-side synonym filter (ignoreCase="true" expand="true")]

e.g query

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi 



Solr enterprise tech support in Brazil

2014-07-09 Thread Jefferson Olyntho Neto (STI)
Dear all,

I would like some recommendations of companies that provide enterprise
technical support for Solr in Brazil. Could someone help me?

Thanks!

Jefferson Olyntho Neto
jefferson.olyn...@unimedbh.com.br


RE: Lower/UpperCase Issue

2014-07-09 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Do I need to use a different algorithm instead of Porter stemming? Can you
suggest anything you have in mind?

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

Hi,

Analysis admin page will tell you the truth. Just a guess: the porter stem filter
could be "case sensitive" and that may cause the difference. I am pretty sure
porter stemming algorithms are designed to work on lowercase input.

By the way, you have two lowercase filters defined in the index analyzer.

Ahmet



On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:
I have a situation here: when I search with "BALANCER" the results are
different compared to "Balancer", and the order is different. When I search
"BALANCER", the documents with upper case are first in the list, and for
"Balancer" it is in a different order.

I am confused by this behavior. Does someone have the same issue, or am I
missing something?

[schema XML stripped by the list archive]

e.g query

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi


RE: Lower/UpperCase Issue

2014-07-09 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Here is the schema part.

[schema XML stripped by the list archive]

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, July 09, 2014 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

On 7/9/2014 10:17 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) wrote:
> I have a situation here: when I search with "BALANCER" the results are
> different compared to "Balancer", and the order is different. When I search
> "BALANCER", the documents with upper case are first in the list, and for
> "Balancer" it is in a different order.
>
> I am confused by this behavior. Does someone have the same issue, or am I
> missing something?
>
> [fieldType XML stripped by the list archive]
>
> e.g query
>
> http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=j
> son&indent=true
>
> http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=j
> son&indent=true

What is the full field definition for Name?  You've included a fieldType here, 
but that's only half the picture.

Thanks,
Shawn



Re: Facets on Nested documents

2014-07-09 Thread Mikhail Khludnev
Colleagues,
So far you can either vote or contribute to
https://issues.apache.org/jira/browse/SOLR-5743

Walter,
Usually, index-time tricks lose relationship information, which leads to
wrong counts.


On Tue, Jul 8, 2014 at 2:40 PM, Walter Liguori 
wrote:

> Yes, I also have the same problem.
> In my case I have 2 types (parent and children) in a single collection, and I
> want to retrieve only the parents with a facet on a child field.
> I've seen that it is possible via block join query (available since Solr 4.5).
> I have Solr 1.2 and I've thought about a static facet field calculated during
> indexing time, but I don't see any guide or reference about it.
> Walter
>
> Ing. Walter Liguori
>
>
> 2014-07-07 17:59 GMT+02:00 adfel70 :
>
> > Hi,
> >
> > I indexed different types (with different fields) of child docs for every
> > parent.
> > I want to facet on a field in one type of child doc, and then do another
> > facet on a different type of child doc. It doesn't work.
> >
> > Any idea how i can do something like that?
> >
> > thanks.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Facets-on-Nested-documents-tp4145931.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Mark Miller
I think that’s pretty much a search-time param, though it might end up being
used on the update side as well. In any case, I know it doesn’t affect commit
or optimize.

Also, to my knowledge, SolrCloud optimize support was never explicitly added or 
tested.

--  
Mark Miller
about.me/markrmiller

On July 9, 2014 at 12:00:27 PM, Shawn Heisey (s...@elyograg.org) wrote:
> > I thought a bug had been filed on the distrib=false problem,  



Re: [Solr Schema API] SolrJ Access

2014-07-09 Thread Alessandro Benedetti
I solved the PUT issue using a POST (adding more than one field):

 ContentStreamUpdateRequest contentStreamUpdateRequest =
> new ContentStreamUpdateRequest(
> SCHEMA_SOLR_FIELDS_ENDPOINT);
> SolrServer solrServer = this.getCore( core );
> contentStreamUpdateRequest.addContentStream(new
> ContentStreamBase.ByteArrayStream(jsonFields.getBytes(),"data-binary"));
> UpdateResponse process = contentStreamUpdateRequest.process(
> solrServer );


The EmbeddedSolrServer is still a problem.

Cheers


2014-07-09 15:55 GMT+01:00 Alessandro Benedetti 
:

> mmm wondering how to pass the payload for the PUT using that structure
> with SolrQuery...
>
>
> 2014-07-09 15:42 GMT+01:00 Alessandro Benedetti <
> benedetti.ale...@gmail.com>:
>
> Thanks, Elaine!
>> Worked for the GET Method !
>> I will test soon with the PUT method :)
>>
> One strange thing is that it is working with a real Solr instance but not
> with an EmbeddedSolrServer...
> probably it's a matter of dependencies, I'll let you know...
>>
>> Many thanks
>>
>> Cheers
>>
>>
>> 2014-07-08 21:59 GMT+01:00 Cario, Elaine 
>> :
>>
>> Alessandro,
>>>
>>> I just got this to work myself:
>>>
>>> public static final String DEFINED_FIELDS_API = "/schema/fields";
>>> public static final String DYNAMIC_FIELDS_API =
>>> "/schema/dynamicfields";
>>> ...
>>> // just get a connection to Solr as usual (the factory is mine -
>>> it will use CloudSolrServer or HttpSolrServer depending on if we're using
>>> SolrCloud or not)
>>> SolrClient client =
>>> SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
>>> SolrServer solrConn = client.getConnection(SOLR_URL, collection);
>>>
>>> SolrQuery query = new SolrQuery();
>>> if (dynamicFields)
>>> query.setRequestHandler(DYNAMIC_FIELDS_API);
>>> else
>>> query.setRequestHandler(DEFINED_FIELDS_API);
>>> query.setParam("showDefaults", true);
>>>
>>> QueryResponse response = solrConn.query(query)
>>>
>>> Then you've got to parse the response using NamedList etc.etc.
>>>
>>> -Original Message-
>>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>>> Sent: Tuesday, July 08, 2014 5:54 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: [Solr Schema API] SolrJ Access
>>>
>>> Hi guys,
>>> wondering if there is any proper way to access Schema API via Solrj.
>>>
>>> Of course is possible to reach them in Java with a specific Http
>>> Request, but in this way, using SolrCloud for example we become coupled to
>>> one specific instance ( and we don't want) .
>>>
>>> Code Example :
>>>
>>> HttpResponse httpResponse;
>>> > String url=this.solrBase+"/"+core+
>>> > SCHEMA_SOLR_FIELDS_ENDPOINT
>>> > +fieldName;
>>> > HttpPut httpPut = new HttpPut(url);
>>> > StringEntity entity = new StringEntity(
>>> > "{\"type\":\"text_general\",\"stored\":\"true\"}" ,
>>> > ContentType.APPLICATION_JSON);
>>> >  httpPut.setEntity( entity );
>>> >  HttpClient client=new DefaultHttpClient();
>>> >  response = client.execute(httpPut);
>>>
>>>
>>> Any suggestion ?
>>> In my opinion should be interesting to have some auxiliary method in
>>> SolrServer if it's not there yet.
>>>
>>> Cheers
>>>
>>> --
>>> --
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Getting OutOfMemoryError: Java heap space in Solr

2014-07-09 Thread yuvaraj ponnuswamy
Hi Shawn,

Thanks for your valuable inputs.

For your information we are using SQL Server.

Also, we will try to use the JOIN instead of the cached entity and check it.

Regards
P.Yuvaraj Kumar

On Wed, 9/7/14, Shawn Heisey  wrote:

 Subject: Re: Getting OutOfMemoryError: Java heap space in Solr
 To: solr-user@lucene.apache.org
 Date: Wednesday, 9 July, 2014, 9:24 PM
 
 On 7/9/2014 6:02 AM, yuvaraj ponnuswamy wrote:
 > Hi,
 >
 > I am getting the OutofMemory Error: "java.lang.OutOfMemoryError: Java
 > heap space" often in production due to the particular TreeMap taking
 > more memory in the JVM.
 >
 > When I looked into the config files I am having the entity called
 > UserQryDocument where I am fetching the data from certain tables.
 > Again I have a sub entity called "UserLocation" where I am using the
 > CachedSqlEntityProcessor to get the fields from cache. It seems like
 > it has a total of 2,00,000 records.
 > processor="CachedSqlEntityProcessor" cacheKey="user_pin"
 > cacheLookup="UserQueryDocumentNonAuthor.DocKey">
 >
 > Like this I have some other different entities, and there also I am
 > using this CachedSqlEntityProcessor in the sub entity.
 >
 > But when I looked into the heap dump (java_pid57.hprof) I am able to
 > see the TreeMap is causing the problem.
 >
 > But I am not able to find which entity is causing this issue. I am
 > using the IBM Heap Analyser to look into the dump.
 >
 > Can you please let me know if there is any other way we can find out
 > which entity is causing this issue, or any other tool to analyse and
 > debug the out-of-memory issue to find the exact entity causing it?
 >
 > I have attached the entity in dataconfig.xml and a Heap Analyser
 > screenshot.
 
 JDBC drivers have a habit of loading the entire resultset into RAM.
 Also, you are using the cached processor ... which will effectively do
 the same thing. With millions of DB rows, this is going to require a
 LOT of heap memory. You'll want to change your JDBC connection so that
 it doesn't load the entire result set, and you may also need to turn
 off entity caching in Solr. You didn't mention what database you're
 using. Here's how to fix MySQL and SQL Server so they don't load the
 entire result set. The requirements for another database are likely to
 be different:
 
 https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
 
 The best way to make DIH perform well is to use JOIN so that you can
 get all your data with one entity and one SELECT query. Let the
 database do all the heavy lifting instead of having Solr send millions
 of queries. GROUP_CONCAT on the SQL side and a regexTransformer
 'splitBy' can sometimes be used to get multiple values into a field.
 
 Thanks,
 Shawn
 



Re: Lower/UpperCase Issue

2014-07-09 Thread Ahmet Arslan
Hi,

Analysis admin page will tell you the truth. Just a guess: the porter stem filter
could be "case sensitive" and that may cause the difference.
I am pretty sure porter stemming algorithms are designed to work on lowercase input.

By the way, you have two lowercase filters defined in the index analyzer.

Ahmet



On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, 
Automotive-Service-Solutions)"  wrote:
I have a situation here: when I search with "BALANCER" the results are
different compared to "Balancer", and the order is different. When I search
"BALANCER", the documents with upper case are first in the list, and for
"Balancer" it is in a different order.

I am confused by this behavior. Does someone have the same issue, or am I
missing something?

[schema XML stripped by the list archive]

e.g query

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi


Re: Complement of {!join}

2014-07-09 Thread Bruce Johnson
Thank you so much for the quick reply, Erik. And wow: I didn't realize you
could use join that fluidly. Very nice.

Is there some trove of Solr doc that I'm missing where this natural syntax
is explained? I wouldn't have asked such a basic question except that I
found no evidence that this was possible.




On Wed, Jul 9, 2014 at 12:07 PM, Erik Hatcher 
wrote:

> Maybe something like q=*:* AND NOT {!join … } would do the trick?  (it’ll
> depend on your version of Solr for support of the {!…} more natural nested
> queries)
>
> Erik
>
> On Jul 9, 2014, at 11:24 AM, Bruce Johnson 
> wrote:
>
> > === Short-version ===
> > Is there a way to join on the complement of a query? I want only the
> > Solr documents for which the nested join query does not match.
> >
> > === Longer-version ===
> > Query-time joins with {!join} are great at modeling the SQL equivalent of
> > patterns like this:
> >
> > SELECT book_name FROM books WHERE id
> > IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
> >
> > This would find the name of books having chapters entitled "Foo".
> (Assuming
> > the chapters table has the column 'book_id' that points back to the book
> > record containing them.)
> >
> > That's great.
> >
> > Is there a way in Solr to query for the complement of that? In SQL terms,
> > this:
> >
> > SELECT book_name FROM books WHERE id
> > NOT IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
> >
> > This would find books that do not have chapters entitled "Foo".
> >
> > It isn't the same as querying (in Solr terms) for something like
> >
> > {!join to=id from=book_id}-chapter_title:"Foo" // note the negation
> >
> > because it would still match other chapters in the same book that are not
> > entitled "Foo", causing the join to still identify the book based on its
> > other non-Foo chapters.
> >
> > Any advice would be greatly appreciated. I'm also open to other ways of
> > thinking about the problem. Perhaps there are alternative indexing
> patterns
> > that could accomplish the same goal.
> >
> > Many thanks,
> > Bruce
>
>


Re: Lower/UpperCase Issue

2014-07-09 Thread Shawn Heisey
On 7/9/2014 10:17 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) wrote:
> I have a situation here: when I search with "BALANCER" the results are
> different compared to "Balancer", and the order is different. When I search
> "BALANCER", the documents with upper case are first in the list, and for
> "Balancer" it is in a different order.
>
> I am confused by this behavior. Does someone have the same issue, or am I
> missing something?
>
> [fieldType XML stripped by the list archive]
>
> e.g query
>
> http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true
>
> http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

What is the full field definition for Name?  You've included a fieldType
here, but that's only half the picture.

Thanks,
Shawn



Lower/UpperCase Issue

2014-07-09 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
I have a situation here: when I search with "BALANCER" the results are
different compared to "Balancer", and the order is different. When I search
"BALANCER", the documents with upper case are first in the list, and for
"Balancer" it is in a different order.

I am confused by this behavior. Does someone have the same issue, or am I
missing something?

[schema XML stripped by the list archive]

e.g query

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi




Re: Complement of {!join}

2014-07-09 Thread Erik Hatcher
Maybe something like q=*:* AND NOT {!join … } would do the trick?  (it’ll 
depend on your version of Solr for support of the {!…} more natural nested 
queries)
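
Concretely, with the field names from the message below, that might look like
this (untested; the v local param keeps the join's sub-query clearly
delimited):

q=*:* AND NOT {!join to=id from=book_id v='chapter_title:"Foo"'}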

Erik

On Jul 9, 2014, at 11:24 AM, Bruce Johnson  wrote:

> === Short-version ===
> Is there a way to join on the complement of a query? I want only the
> Solr documents for which the nested join query does not match.
> 
> === Longer-version ===
> Query-time joins with {!join} are great at modeling the SQL equivalent of
> patterns like this:
> 
> SELECT book_name FROM books WHERE id
> IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
> 
> This would find the name of books having chapters entitled "Foo". (Assuming
> the chapters table has the column 'book_id' that points back to the book
> record containing them.)
> 
> That's great.
> 
> Is there a way in Solr to query for the complement of that? In SQL terms,
> this:
> 
> SELECT book_name FROM books WHERE id
> NOT IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")
> 
> This would find books that do not have chapters entitled "Foo".
> 
> It isn't the same as querying (in Solr terms) for something like
> 
> {!join to=id from=book_id}-chapter_title:"Foo" // note the negation
> 
> because it would still match other chapters in the same book that are not
> entitled "Foo", causing the join to still identify the book based on its
> other non-Foo chapters.
> 
> Any advice would be greatly appreciated. I'm also open to other ways of
> thinking about the problem. Perhaps there are alternative indexing patterns
> that could accomplish the same goal.
> 
> Many thanks,
> Bruce



Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Shawn Heisey
On 7/9/2014 8:49 AM, Timothy Potter wrote:
> Hi Modassar,
>
> Have you tried hitting the cores for each replica directly (instead of
> using the collection)? i.e. if you had col_shard1_replica1 on node1,
> then send the optimize command to that core URL directly:
>
> curl -i -v "http://host:port/solr/col_shard1_replica1/update" -H
> 'Content-type:application/xml' \
>   --data-binary "<optimize/>"
>
> I haven't tried this myself but might work ;-)

That doesn't work.  It will optimize the whole collection, one core at a
time.  I thought that sending the optimize with distrib=false would
limit the optimize to just the called core, but that also doesn't work. 
I thought a bug had been filed on the distrib=false problem, but it's
been long enough that I'm no longer sure about that.

Thanks,
Shawn



Re: Getting OutOfMemoryError: Java heap space in Solr

2014-07-09 Thread Shawn Heisey
On 7/9/2014 6:02 AM, yuvaraj ponnuswamy wrote:
> Hi,
>
> I am getting the OutofMemory Error: "java.lang.OutOfMemoryError: Java heap 
> space" often in production due to the particular TreeMap taking more 
> memory in the JVM.
>
> When I looked into the config files I am having the entity called 
> UserQryDocument where I am fetching the data from certain tables.
> Again I have a sub entity called "UserLocation" where I am using the 
> CachedSqlEntityProcessor to get the fields from cache. It seems like it has 
> a total of 2,00,000 records.
> processor="CachedSqlEntityProcessor" cacheKey="user_pin" 
> cacheLookup="UserQueryDocumentNonAuthor.DocKey">
>
> Like this I have some other different entities, and there also I am using 
> this CachedSqlEntityProcessor in the sub entity.
>
> But when I looked into the heap dump (java_pid57.hprof) I am able to see 
> the TreeMap is causing the problem.
>
> But I am not able to find which entity is causing this issue. I am using 
> the IBM Heap Analyser to look into the dump.
>
> Can you please let me know if there is any other way we can find out which 
> entity is causing this issue, or any other tool to analyse and debug the 
> out-of-memory issue to find the exact entity causing it?
>
> I have attached the entity in dataconfig.xml and a Heap Analyser screenshot.

JDBC drivers have a habit of loading the entire resultset into RAM. 
Also, you are using the cached processor ... which will effectively do
the same thing.  With millions of DB rows, this is going to require a
LOT of heap memory.  You'll want to change your JDBC connection so that
it doesn't load the entire result set, and you may also need to turn off
entity caching in Solr.  You didn't mention what database you're using. 
Here's how to fix MySQL and SQL Server so they don't load the entire
result set.  The requirements for another database are likely to be
different:

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

The best way to make DIH perform well is to use JOIN so that you can get
all your data with one entity and one SELECT query.  Let the database do
all the heavy lifting instead of having Solr send millions of queries. 
GROUP_CONCAT on the SQL side and a regexTransformer 'splitBy' can
sometimes be used to get multiple values into a field.
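
A sketch of that shape (untested; MySQL GROUP_CONCAT syntax, and the table
and column names are illustrative; SQL Server would need a different
aggregation):

<entity name="UserQryDocument" transformer="RegexTransformer"
        query="SELECT u.user_pin, u.name, GROUP_CONCAT(l.location) AS locations
               FROM users u
               LEFT JOIN user_locations l ON l.user_pin = u.user_pin
               GROUP BY u.user_pin, u.name">
  <field column="locations" splitBy=","/>
</entity>

One entity, one SELECT; the RegexTransformer then splits the concatenated
column into a multi-valued field.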

Thanks,
Shawn



Complement of {!join}

2014-07-09 Thread Bruce Johnson
=== Short-version ===
Is there a way to join on the complement of a query? I want only the
Solr documents for which the nested join query does not match.

=== Longer-version ===
Query-time joins with {!join} are great at modeling the SQL equivalent of
patterns like this:

SELECT book_name FROM books WHERE id
IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")

This would find the name of books having chapters entitled "Foo". (Assuming
the chapters table has the column 'book_id' that points back to the book
record containing them.)

That's great.

Is there a way in Solr to query for the complement of that? In SQL terms,
this:

SELECT book_name FROM books WHERE id
NOT IN (SELECT book_id FROM chapters WHERE chapter_title = "Foo")

This would find books that do not have chapters entitled "Foo".

It isn't the same as querying (in Solr terms) for something like

{!join to=id from=book_id}-chapter_title:"Foo" // note the negation

because it would still match other chapters in the same book that are not
entitled "Foo", causing the join to still identify the book based on its
other non-Foo chapters.

Any advice would be greatly appreciated. I'm also open to other ways of
thinking about the problem. Perhaps there are alternative indexing patterns
that could accomplish the same goal.

Many thanks,
Bruce


Re: Add a new replica to SolrCloud

2014-07-09 Thread Erick Erickson
Here's a blog on the topic of creating cores on particular nodes.
http://heliosearch.org/solrcloud-assigning-nodes-machines/

Himanshu:
What you wrote works perfectly well. FYI, this can also be done
with the Collections API. The Collections API is evolving
though, so what commands are available depends on the
version of Solr you're using.

Best,
Erick

On Tue, Jul 8, 2014 at 10:14 PM, Himanshu Mehrotra
 wrote:
> Yes, there is a way.
>
> One node on which replica needs to be created hit
>
> curl '
> http://localhost:8983/solr/admin/cores?action=CREATE&name=<core name>&collection=<collection name>&shard=<shard id>'
>
> For example
>
> curl '
> http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2
> '
>
>
> see http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin
> for details.
>
>
> Thanks,
>
> Himanshu
>
>
>
> On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta  wrote:
>
>> Hi,
>>
>> I am currently using Solr 4.7.2 and have SolrCloud setup running on 2
>> servers with number of shards as 2, replication factor as 2 and mas shards
>> per node as 4.
>>
>> Now, I want to add another server to the SolrCloud as a replica. I can see
>> Collection API to add a new replica but that was added in Solr 4.8. Is
>> there some way to add a new replica in Solr 4.7.2?
>>
>> --
>> Thanks
>> Varun Gupta
>>


Re: Synchronising two masters

2014-07-09 Thread Erick Erickson
1> stop all indexing.
2> stop Solr on M1
3> delete M1's data directory
4> temporarily make M1 a slave of M2 and wait for it to sync.
5> make M1 a master again.
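
As a variant of step 4 that avoids a config change, the replication handler
can be told to pull from M2 once, assuming /replication is enabled on both
masters (host and core names illustrative):

curl "http://M1:8983/solr/core1/replication?command=fetchindex&masterUrl=http://M2:8983/solr/core1/replication"

Once the fetch completes, M1's index matches M2's and the normal roles can
resume.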

But really, this isn't a very good setup. You're wasting a machine that
you could be using. What I'd do is set up a single master and 3 slaves.
If the master goes down, just take one of the slaves and make it the
master. Now re-index everything that's changed to the new master. That
way you get some value out of the second master: you're using it to
serve queries.

Of course this depends upon the ability to re-index from some point in time.
Say your master goes down at 9:00 AM. Say your polling interval is 30 minutes.
Once I'd re-configured things, I'd just re-index from, say, 7:30. It also
depends on your system being able to cope with being slightly stale for
a while.

FWIW,
Erick

On Tue, Jul 8, 2014 at 9:38 PM, Prasi S  wrote:
> Hi ,
> Our solr setup consists of 2 Masters and 2Slaves. The slaves would point to
> any one of the Masters through a load balancer and replicate the data.
>
> Master1(M1) is the primary indexer. I send data to M1. In case M1 fails, i
> have a failover master, M2 and that would be indexing the data. The problem
> is, once the Master1 comes up, how to synchronize M1 and M2? SolrCloud
> would be the option rather than going with this setup. But, currently we want
> it to be implemented in Master-Slave mode.
>
> Any suggestions?
> Thanks,
> Prasi


Re: Planning ahead for Solr Cloud and Scaling

2014-07-09 Thread Timothy Potter
Hi Zane,

re 1: as an alternative to shard splitting, you can just overshard the
collection from the start and then migrate existing shards to new
hardware as needed. The migrate can happen online, see collection API
ADDREPLICA. Once the new replica is online on the new hardware, you
can unload the older replica on your original hardware. There are
other benefits to oversharding, such as increased parallelism during
indexing and query execution (provided you have the CPU capacity,
which is typically the case on modern hardware).
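
Roughly, that migrate sequence in SolrJ would look like the sketch below
(untested; collection, core, and node names are invented, and you would wait
for the new replica to report "active" before the unload):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MigrateShard {
    public static void main(String[] args) throws Exception {
        // 1) ADDREPLICA: put a copy of shard1 on the new node (Solr 4.8+).
        ModifiableSolrParams add = new ModifiableSolrParams();
        add.set("action", "ADDREPLICA");
        add.set("collection", "mycoll");
        add.set("shard", "shard1");
        add.set("node", "newhost:8983_solr");
        QueryRequest addReq = new QueryRequest(add);
        addReq.setPath("/admin/collections");
        new HttpSolrServer("http://newhost:8983/solr").request(addReq);

        // 2) Once the new replica is active, UNLOAD the old core.
        ModifiableSolrParams unload = new ModifiableSolrParams();
        unload.set("action", "UNLOAD");
        unload.set("core", "mycoll_shard1_replica1");
        QueryRequest unloadReq = new QueryRequest(unload);
        unloadReq.setPath("/admin/cores");
        new HttpSolrServer("http://oldhost:8983/solr").request(unloadReq);
    }
}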

re 2: mainly depends on how the Java GC and heap are affected by
colocating the cores on the same JVM ... if heap is stable and the GC
is keeping up and qps / latency times are acceptable, I wouldn't
change it.

re 3: read Trey's chapter 14 in Solr in Action ;-)

Cheers,
Tim

On Tue, Jul 8, 2014 at 10:09 PM, Zane Rockenbaugh  wrote:
> I'm working on a product hosted with AWS that uses Elastic Beanstalk
> auto-scaling to good effect and we are trying to set up similar (more or
> less) runtime scaling support with Solr. I think I understand how to set
> this up, and wanted to check I was on the right track.
>
> We currently run 3 cores on a single host / Solr server / shard. This is
> just fine for now, and we have overhead for the near future. However, I
> need to have a plan, and then test, for a higher capacity future.
>
> 1) I gather that if I set up SolrCloud, and then later load increases, I
> can spin up a second host / Solr server, create a new shard, and then split
> the first shard:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
>
> And doing this, we no longer have to commit to shards out of the gate.
>
> 2) I'm not clear whether there's a big advantage splitting up the cores or
> not. Two of the three cores will have about the same number of documents,
> though only one contains large amounts of text. The third core is much
> smaller in both bytes and documents (2 orders of magnitude).
>
> 3) We are also looking at moving multi-lingual. The current plan is to
> store the localized text in fields within the same core. The languages will
> be added over time. We can update the schema (as each will be optional).
> This seems easier than adding a core for each language. Is there a downside?
>
> Thanks for any pointers.


Re: [Solr Schema API] SolrJ Access

2014-07-09 Thread Alessandro Benedetti
mmm wondering how to pass the payload for the PUT using that structure with
SolrQuery...
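
One workaround I can think of (completely untested): skip SolrQuery for the PUT
and instead pick a live node from CloudSolrServer's ZkStateReader, so at least
the target instance is not hard-coded. Host, collection and field names below
are invented:

import java.util.Set;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class SchemaPutViaZk {
    public static void main(String[] args) throws Exception {
        CloudSolrServer cloud = new CloudSolrServer("zkhost:2181");
        cloud.connect();
        // Any live node will do; node names look like "host:8983_solr".
        Set<String> live = cloud.getZkStateReader().getClusterState().getLiveNodes();
        String node = live.iterator().next();
        String baseUrl = "http://" + node.replace("_", "/");  // naive name -> URL

        HttpPut put = new HttpPut(baseUrl + "/collection1/schema/fields/myfield");
        put.setEntity(new StringEntity(
                "{\"type\":\"text_general\",\"stored\":\"true\"}",
                ContentType.APPLICATION_JSON));
        HttpResponse rsp = new DefaultHttpClient().execute(put);
        System.out.println(rsp.getStatusLine());
        cloud.shutdown();
    }
}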


2014-07-09 15:42 GMT+01:00 Alessandro Benedetti 
:

> Thanks, Elaine!
> It worked for the GET method!
> I will test soon with the PUT method :)
>
> One strange thing is that it works with a real Solr instance but not
> with an EmbeddedSolrServer...
> probably it's a matter of dependencies; I'll let you know...
>
> Many thanks
>
> Cheers
>
>
> 2014-07-08 21:59 GMT+01:00 Cario, Elaine :
>
> Alessandro,
>>
>> I just got this to work myself:
>>
>> public static final String DEFINED_FIELDS_API = "/schema/fields";
>> public static final String DYNAMIC_FIELDS_API =
>> "/schema/dynamicfields";
>> ...
>> // just get a connection to Solr as usual (the factory is mine -
>> it will use CloudSolrServer or HttpSolrServer depending on if we're using
>> SolrCloud or not)
>> SolrClient client =
>> SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
>> SolrServer solrConn = client.getConnection(SOLR_URL, collection);
>>
>> SolrQuery query = new SolrQuery();
>> if (dynamicFields)
>> query.setRequestHandler(DYNAMIC_FIELDS_API);
>> else
>> query.setRequestHandler(DEFINED_FIELDS_API);
>> query.setParam("showDefaults", true);
>>
>> QueryResponse response = solrConn.query(query);
>>
>> Then you've got to parse the response using NamedList etc.etc.
>>
>> -Original Message-
>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>> Sent: Tuesday, July 08, 2014 5:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: [Solr Schema API] SolrJ Access
>>
>> Hi guys,
>> wondering if there is any proper way to access Schema API via Solrj.
>>
>> Of course it is possible to reach them in Java with a specific HTTP request,
>> but in this way, using SolrCloud for example, we become coupled to one
>> specific instance (and we don't want that).
>>
>> Code Example :
>>
>> HttpResponse httpResponse;
>> > String url=this.solrBase+"/"+core+
>> > SCHEMA_SOLR_FIELDS_ENDPOINT
>> > +fieldName;
>> > HttpPut httpPut = new HttpPut(url);
>> > StringEntity entity = new StringEntity(
>> > "{\"type\":\"text_general\",\"stored\":\"true\"}" ,
>> > ContentType.APPLICATION_JSON);
>> >  httpPut.setEntity( entity );
>> >  HttpClient client=new DefaultHttpClient();
>> >  httpResponse = client.execute(httpPut);
>>
>>
>> Any suggestion ?
>> In my opinion it would be interesting to have some auxiliary method in
>> SolrServer if it's not there yet.
>>
>> Cheers
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Timothy Potter
Hi Modassar,

Have you tried hitting the cores for each replica directly (instead of
using the collection)? i.e. if you had col_shard1_replica1 on node1,
then send the optimize command to that core URL directly:

curl -i -v "http://host:port/solr/col_shard1_replica1/update" \
  -H 'Content-type:application/xml' \
  --data-binary "<optimize/>"
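
And a SolrJ equivalent of that curl, one thread per shard core, with
distrib=false so each optimize stays local to its core (untested sketch; the
core URLs are invented):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ParallelOptimize {
    public static void main(String[] args) throws Exception {
        String[] coreUrls = {
            "http://node1:8983/solr/col_shard1_replica1",
            "http://node2:8983/solr/col_shard2_replica1",
            "http://node3:8983/solr/col_shard3_replica1"
        };
        ExecutorService pool = Executors.newFixedThreadPool(coreUrls.length);
        for (final String url : coreUrls) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        HttpSolrServer core = new HttpSolrServer(url);
                        UpdateRequest optimize = new UpdateRequest();
                        optimize.setAction(
                                AbstractUpdateRequest.ACTION.OPTIMIZE, true, true);
                        optimize.setParam("distrib", "false"); // stay on this core
                        optimize.process(core);
                        core.shutdown();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.HOURS); // optimizes can take a while
    }
}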

I haven't tried this myself but might work ;-)

Tim

On Wed, Jul 9, 2014 at 12:59 AM, Modassar Ather  wrote:
> Hi All,
>
> Thanks for your kind suggestions and inputs.
>
> We have been going the optimize way and it has helped. There have been
> testing and benchmarking already done around memory and performance.
> So while optimizing we see a scope of improvement on it by doing it
> parallel so kindly suggest in what way it can be achieved.
>
> Thanks,
> Modassar
>
>
> On Wed, Jul 9, 2014 at 11:48 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Hi Walter,
>>
>> I wonder why you think SolrCloud isn't necessary if you're indexing once
>> per week. Aren't the automatic failover and auto-sharding still useful? One
>> can also do custom sharding with SolrCloud if necessary.
>>
>>
>> On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood 
>> wrote:
>>
>> > More memory or faster disks will make a much bigger improvement than a
>> > forced merge.
>> >
>> > What are you measuring? If it is average query time, that is not a good
>> > measure. Look at 90th or 95th percentile. Test with queries from logs.
>> >
>> > No user can see a 10% or 20% difference. If your managers are watching
>> > that, they are watching the wrong thing.
>> >
>> > If you are indexing once per week, you don't really need the complexity
>> of
>> > Solr Cloud. You can do manual sharding.
>> >
>> > wunder
>> >
>> > On Jul 8, 2014, at 10:55 PM, Modassar Ather 
>> > wrote:
>> >
>> > > Our index has almost 100M documents running on SolrCloud of 3 shards
>> and
>> > > each shard has an index size of about 700GB (for the record, we are not
>> > > using stored fields - our documents are pretty large). We perform a
>> full
>> > > indexing every weekend and during the week there are no updates made to
>> > the
>> > > index. Most of the queries that we run are pretty complex with hundreds
>> > of
>> > > terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts
>> etc.
>> > > and take many minutes to execute. A difference of 10-20% is also a big
>> > > advantage for us.
>> > >
>> > > We have been optimizing the index after indexing for years and it has
>> > > worked well for us. Every once in a while, we upgrade Solr to the
>> latest
>> > > version and try without optimizing so that we can save the many hours
>> it
>> > > takes to optimize such a huge index, but it does not work well.
>> > >
>> > > Kindly provide your suggestion.
>> > >
>> > > Thanks,
>> > > Modassar
>> > >
>> > >
>> > > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <
>> wun...@wunderwood.org
>> > >
>> > > wrote:
>> > >
>> > >> I seriously doubt that you are required to force merge.
>> > >>
>> > >> How much improvement? And is the big performance cost also OK?
>> > >>
>> > >> I have worked on search engines that do automatic merges and offer
>> > forced
>> > >> merges for over fifteen years. For all that time, forced merges have
>> > >> usually caused problems.
>> > >>
>> > >> Stop doing forced merges.
>> > >>
>> > >> wunder
>> > >>
>> > >> On Jul 8, 2014, at 10:09 PM, Modassar Ather 
>> > >> wrote:
>> > >>
>> > >>> Thanks Walter for your inputs.
>> > >>>
>> > >>> Our use case and performance benchmark requires us to invoke
>> optimize.
>> > >>>
>> > >>> Here we see a chance of improvement in performance of optimize() if
>> > >> invoked
>> > >>> in parallel.
>> > >>> I found that if *distrib=false* is used, the optimization will happen
>> > in
>> > >>> parallel.
>> > >>>
>> > >>> But I could not find a way to set it using
>> > >> HttpSolrServer/CloudSolrServer.
>> > >>> Also with the parameter setting as given in my mail above does not
>> > seems
>> > >> to
>> > >>> work.
>> > >>>
>> > >>> Please let me know in what ways I can achieve the parallel optimize
>> on
>> > >>> SolrCloud.
>> > >>>
>> > >>> Thanks,
>> > >>> Modassar
>> > >>>
>> > >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <
>> > wun...@wunderwood.org>
>> > >>> wrote:
>> > >>>
>> >  You probably do not need to force merge (mistakenly called
>> "optimize")
>> >  your index.
>> > 
>> >  Solr does automatic merges, which work just fine.
>> > 
>> >  There are only a few situations where a forced merge is even a good
>> > >> idea.
>> >  The most common one is a replicated (non-cloud) setup with a full
>> > >> reindex
>> >  every night.
>> > 
>> >  If you need Solr Cloud, I cannot think of a situation where you
>> would
>> > >> want
>> >  a forced merge.
>> > 
>> >  wunder
>> > 
>> >  On Jul 8, 2014, at 2:01 AM, Modassar Ather 
>> > >> wrote:
>> > 
>> > > Hi,
>> > >
>> > > Need to optimize index created

Re: [Solr Schema API] SolrJ Access

2014-07-09 Thread Alessandro Benedetti
Thanks, Elaine!
It worked for the GET method!
I will test soon with the PUT method :)

One strange thing is that it works with a real Solr instance but not with
an EmbeddedSolrServer...
probably it's a matter of dependencies; I'll let you know...

Many thanks

Cheers


2014-07-08 21:59 GMT+01:00 Cario, Elaine :

> Alessandro,
>
> I just got this to work myself:
>
> public static final String DEFINED_FIELDS_API = "/schema/fields";
> public static final String DYNAMIC_FIELDS_API =
> "/schema/dynamicfields";
> ...
> // just get a connection to Solr as usual (the factory is mine -
> it will use CloudSolrServer or HttpSolrServer depending on if we're using
> SolrCloud or not)
> SolrClient client =
> SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
> SolrServer solrConn = client.getConnection(SOLR_URL, collection);
>
> SolrQuery query = new SolrQuery();
> if (dynamicFields)
> query.setRequestHandler(DYNAMIC_FIELDS_API);
> else
> query.setRequestHandler(DEFINED_FIELDS_API);
> query.setParam("showDefaults", true);
>
> QueryResponse response = solrConn.query(query);
>
> Then you've got to parse the response using NamedList etc.etc.
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: Tuesday, July 08, 2014 5:54 AM
> To: solr-user@lucene.apache.org
> Subject: [Solr Schema API] SolrJ Access
>
> Hi guys,
> wondering if there is any proper way to access Schema API via Solrj.
>
> Of course it is possible to reach them in Java with a specific HTTP request,
> but in this way, using SolrCloud for example, we become coupled to one
> specific instance (and we don't want that).
>
> Code Example :
>
> HttpResponse httpResponse;
> > String url=this.solrBase+"/"+core+
> > SCHEMA_SOLR_FIELDS_ENDPOINT
> > +fieldName;
> > HttpPut httpPut = new HttpPut(url);
> > StringEntity entity = new StringEntity(
> > "{\"type\":\"text_general\",\"stored\":\"true\"}" ,
> > ContentType.APPLICATION_JSON);
> >  httpPut.setEntity( entity );
> >  HttpClient client=new DefaultHttpClient();
> >  httpResponse = client.execute(httpPut);
>
>
> Any suggestion ?
> In my opinion it would be interesting to have some auxiliary method in
> SolrServer if it's not there yet.
>
> Cheers
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr atomic updates question

2014-07-09 Thread Steve McKay
Right. Without atomic updates, the client needs to fetch the document (or 
rebuild it from the system of record), apply changes, and send the entire 
document to Solr, including fields that haven't changed. With atomic updates, 
the client sends a list of changes to Solr and the server handles the 
read/modify/write steps internally. That's the closest Solr can get to updating 
a doc in place.
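
For reference, the SolrJ idiom for an atomic update is a map from operation to
value. A minimal sketch (untested; the URL and the salary field are invented):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("employeeId", "05991");  // unique key: selects the doc
        // "set" replaces just this field; the other fields are carried over,
        // which requires them to be stored.
        doc.addField("salary", Collections.singletonMap("set", 95000));
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}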

Steve

On Jul 8, 2014, at 10:42 PM, Bill Au  wrote:

> I see what you mean now.  Thanks for the example.  It makes things very
> clear.
> 
> I have been thinking more about the explanation in the original response.
> According to that, both a regular update with the entire doc and an atomic
> update involve a delete by id followed by an add.  But the Solr reference doc
> (
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
> says that:
> 
> "The first is *atomic updates*. This approach allows changing only one or
> more fields of a document without having to re-index the entire document."
> 
> But since Solr is doing a delete by id followed by an add, does "without
> having to re-index the entire document" apply to the client side only?  On
> the server side the add means that the entire document is re-indexed, right?
> 
> Bill
> 
> 
> On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay  wrote:
> 
>> Take a look at this update XML:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="name">Steve McKay</field>
>>     <field name="city" update="set">Walla Walla</field>
>>     <field name="skills" update="add">Python</field>
>>   </doc>
>> </add>
>> 
>> Let's say employeeId is the key. If there's a fourth field, salary, on the
>> existing doc, should it be deleted or retained? With this update it will
>> obviously be deleted:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="name">Steve McKay</field>
>>   </doc>
>> </add>
>> 
>> With this XML it will be retained:
>> 
>> <add>
>>   <doc>
>>     <field name="employeeId">05991</field>
>>     <field name="city" update="set">Walla Walla</field>
>>     <field name="skills" update="add">Python</field>
>>   </doc>
>> </add>
>> 
>> I'm not willing to guess what will happen in the case where non-atomic and
>> atomic updates are present on the same add because I haven't looked at that
>> code since 4.0, but I think I could make a case for retaining salary or for
>> discarding it. That by itself reeks--and it's also not well documented.
>> Relying on iffy, poorly-documented behavior is asking for pain at upgrade
>> time.
>> 
>> Steve
>> 
>> On Jul 8, 2014, at 7:02 PM, Bill Au  wrote:
>> 
>>> Thanks for that under-the-cover explanation.
>>> 
>>> I am not sure what you mean by "mix atomic updates with regular field
>>> values".  Can you give an example?
>>> 
>>> Thanks.
>>> 
>>> Bill
>>> 
>>> 
>>> On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay  wrote:
>>> 
 Atomic updates fetch the doc with RealTimeGet, apply the updates to the
 fetched doc, then reindex. Whether you use atomic updates or send the
 entire doc to Solr, it has to deleteById then add. The perf difference
 between the atomic updates and "normal" updates is likely minimal.
 
 Atomic updates are for when you have changes and want to apply them to a
 document without affecting the other fields. A regular add will replace
>> an
 existing document completely. AFAIK Solr will let you mix atomic updates
 with regular field values, but I don't think it's a good idea.
 
 Steve
 
 On Jul 8, 2014, at 5:30 PM, Bill Au  wrote:
 
> Solr atomic update allows for changing only one or more fields of a
> document without having to re-index the entire document.  But what
>> about
> the case where I am sending in the entire document?  In that case the
 whole
> document will be re-indexed anyway, right?  So I assume that there will
 be
> no saving.  I am actually thinking that there will be a performance
 penalty
> since atomic update requires Solr to first retrieve all the fields
>> first
> before updating.
> 
> Bill
 
 
>> 
>> 



Getting OutOfMemoryError: Java heap space in Solr

2014-07-09 Thread yuvaraj ponnuswamy
Hi,

I am frequently getting an out-of-memory error ("java.lang.OutOfMemoryError:
Java heap space") in production because a particular TreeMap is taking up more
and more memory in the JVM.

When I looked into the config files, I found an entity called UserQryDocument
where I am fetching the data from certain tables. It has a sub-entity called
"UserLocation" which uses the CachedSqlEntityProcessor to get the fields from
the cache. It seems to hold about 200,000 records in total:

<entity name="UserLocation" processor="CachedSqlEntityProcessor" cacheKey="user_pin"
cacheLookup="UserQueryDocumentNonAuthor.DocKey">

Like this I have several other entities that also use the
CachedSqlEntityProcessor in a sub-entity.

When I looked into the heap dump (java_pid57.hprof) I can see that a TreeMap is
causing the problem, but I am not able to tell which entity it belongs to. I am
using the IBM Heap Analyser to look into the dump.

Can you please let me know whether there is any other way, or any other tool,
to analyse and debug the out-of-memory issue and find the exact entity that is
causing it?

I have attached the entity definition from dataconfig.xml and a heap Analyser screenshot.


Thanks
P.Yuvaraj Kumar 


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Modassar Ather
Hi All,

Thanks for your kind suggestions and inputs.

We have been going the optimize way and it has helped. There have been
testing and benchmarking already done around memory and performance.
So while optimizing we see a scope of improvement on it by doing it
parallel so kindly suggest in what way it can be achieved.

Thanks,
Modassar


On Wed, Jul 9, 2014 at 11:48 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Walter,
>
> I wonder why you think SolrCloud isn't necessary if you're indexing once
> per week. Aren't the automatic failover and auto-sharding still useful? One
> can also do custom sharding with SolrCloud if necessary.
>
>
> On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood 
> wrote:
>
> > More memory or faster disks will make a much bigger improvement than a
> > forced merge.
> >
> > What are you measuring? If it is average query time, that is not a good
> > measure. Look at 90th or 95th percentile. Test with queries from logs.
> >
> > No user can see a 10% or 20% difference. If your managers are watching
> > that, they are watching the wrong thing.
> >
> > If you are indexing once per week, you don't really need the complexity
> of
> > Solr Cloud. You can do manual sharding.
> >
> > wunder
> >
> > On Jul 8, 2014, at 10:55 PM, Modassar Ather 
> > wrote:
> >
> > > Our index has almost 100M documents running on SolrCloud of 3 shards
> and
> > > each shard has an index size of about 700GB (for the record, we are not
> > > using stored fields - our documents are pretty large). We perform a
> full
> > > indexing every weekend and during the week there are no updates made to
> > the
> > > index. Most of the queries that we run are pretty complex with hundreds
> > of
> > > terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts
> etc.
> > > and take many minutes to execute. A difference of 10-20% is also a big
> > > advantage for us.
> > >
> > > We have been optimizing the index after indexing for years and it has
> > > worked well for us. Every once in a while, we upgrade Solr to the
> latest
> > > version and try without optimizing so that we can save the many hours
> it
> > > takes to optimize such a huge index, but it does not work well.
> > >
> > > Kindly provide your suggestion.
> > >
> > > Thanks,
> > > Modassar
> > >
> > >
> > > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <
> wun...@wunderwood.org
> > >
> > > wrote:
> > >
> > >> I seriously doubt that you are required to force merge.
> > >>
> > >> How much improvement? And is the big performance cost also OK?
> > >>
> > >> I have worked on search engines that do automatic merges and offer
> > forced
> > >> merges for over fifteen years. For all that time, forced merges have
> > >> usually caused problems.
> > >>
> > >> Stop doing forced merges.
> > >>
> > >> wunder
> > >>
> > >> On Jul 8, 2014, at 10:09 PM, Modassar Ather 
> > >> wrote:
> > >>
> > >>> Thanks Walter for your inputs.
> > >>>
> > >>> Our use case and performance benchmark requires us to invoke
> optimize.
> > >>>
> > >>> Here we see a chance of improvement in performance of optimize() if
> > >> invoked
> > >>> in parallel.
> > >>> I found that if *distrib=false* is used, the optimization will happen
> > in
> > >>> parallel.
> > >>>
> > >>> But I could not find a way to set it using
> > >> HttpSolrServer/CloudSolrServer.
> > >>> Also with the parameter setting as given in my mail above does not
> > seems
> > >> to
> > >>> work.
> > >>>
> > >>> Please let me know in what ways I can achieve the parallel optimize
> on
> > >>> SolrCloud.
> > >>>
> > >>> Thanks,
> > >>> Modassar
> > >>>
> > >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <
> > wun...@wunderwood.org>
> > >>> wrote:
> > >>>
> >  You probably do not need to force merge (mistakenly called
> "optimize")
> >  your index.
> > 
> >  Solr does automatic merges, which work just fine.
> > 
> >  There are only a few situations where a forced merge is even a good
> > >> idea.
> >  The most common one is a replicated (non-cloud) setup with a full
> > >> reindex
> >  every night.
> > 
> >  If you need Solr Cloud, I cannot think of a situation where you
> would
> > >> want
> >  a forced merge.
> > 
> >  wunder
> > 
> >  On Jul 8, 2014, at 2:01 AM, Modassar Ather 
> > >> wrote:
> > 
> > > Hi,
> > >
> > > Need to optimize index created using CloudSolrServer APIs under
> > >> SolrCloud
> > > setup of 3 instances on separate machines. Currently it optimizes
> > > sequentially if I invoke cloudSolrServer.optimize().
> > >
> > > To make it parallel I tried making three separate HttpSolrServer
> >  instances
> > > and invoked httpSolrServer.optimize() on them in parallel but still it
> > >> seems
> > > to be doing optimization sequentially.
> > >
> > > I tried invoking optimize directly using HttpPost with following
> url
> > >> and
> > > parameters but still it seems to be sequential.
> > 

Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem

2014-07-09 Thread Harald Kirsch
Good point. I will see if I can get the necessary access rights on this 
machine to run tcpdump.


Thanks for the suggestion,
Harald.

On 09.07.2014 00:32, Steve McKay wrote:

Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts 
behaving strangely in a socket-related way. Knowing exactly what's happening at 
the transport level is worth a month of guessing and poking.

On Jul 8, 2014, at 3:53 AM, Harald Kirsch  wrote:


Hi all,

This is what happens when I run a regular wget query to log the current number 
of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound="5720168"
2014-07-08:07:24:28 QTime=12 numFound="5721126"
2014-07-08:07:25:28 QTime=19 numFound="5721126"
2014-07-08:07:27:18 QTime=50071 numFound="5721126"
2014-07-08:07:29:08 QTime=50058 numFound="5724494"
2014-07-08:07:30:58 QTime=50033 numFound="5730710"
2014-07-08:07:31:58 QTime=13 numFound="5730710"
2014-07-08:07:33:48 QTime=50065 numFound="5734069"
2014-07-08:07:34:48 QTime=16 numFound="5737742"
2014-07-08:07:36:38 QTime=50037 numFound="5737742"
2014-07-08:07:37:38 QTime=12 numFound="5738190"
2014-07-08:07:38:38 QTime=23 numFound="5741208"
2014-07-08:07:40:29 QTime=50034 numFound="5742067"
2014-07-08:07:41:29 QTime=12 numFound="5742067"
2014-07-08:07:42:29 QTime=17 numFound="5742067"
2014-07-08:07:43:29 QTime=20 numFound="5745497"
2014-07-08:07:44:29 QTime=13 numFound="5745981"
2014-07-08:07:45:29 QTime=23 numFound="5746420"

As you can see, the QTime is just over 50 seconds at irregular intervals.

This happens independently of whether I am indexing documents at around 20 dps
or not. First I thought about a dependence on the auto-commit of 5 minutes, but
the 50-second hits are too irregular.

Furthermore, and this is *really strange*: when hooking strace onto the solr
process, the 50-second QTimes disappear completely and consistently --- a real
Heisenbug.

Nevertheless, strace shows that there is a socket timeout of 50 seconds defined 
in calls like this:

[pid  1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 50000) = 1 
([{fd=96, revents=POLLIN}]) <0.40>

where the fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), 
sin_addr=inet_addr("ip address of local host")}, [16]) = 96 <0.54>

where again fd=122 is the TCP listening socket on which solr was started.

My hunch is that this is communication between the cores of solr.

I tried to search the internet for such a strange connection between socket
timeouts and strace, but could not find anything (the stackoverflow entry from
yesterday is my own) :-(


This smells a bit like a race condition/deadlock kind of thing which is broken 
up by timing differences introduced by stracing the process.

Any hints appreciated.

For completeness, here is my setup:
- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load


Harald.