Re: OOM Error

2016-10-25 Thread Toke Eskildsen
On Tue, 2016-10-25 at 15:04 -0400, Susheel Kumar wrote:
> Thanks, Toke.  Analyzing GC logs helped to determine that it was a
> sudden
> death.  

> The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9

Peaks yes, but there is a pattern of:

1) Stable memory use
2) Temporary doubling of the memory used and a lot of GC
3) Increased (relative to last stable period) but stable memory use
4) Goto 2

If I were to guess, I would say that you are running ingests in batches,
which temporarily causes 2 searchers to be open at the same time. That
is step 2 in the list above. After the batch ingest, the baseline moves up,
presumably because you have added quite a lot of documents relative to
the overall number of documents.
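For reference, that overlap is governed by searcher warming in
solrconfig.xml; a minimal sketch with illustrative values (standard
elements, but tune them to your own setup):

    <!-- limits how many searchers may warm concurrently -->
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <!-- autowarmCount controls how much cache the new searcher rebuilds
         while the old one is still serving queries -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
                 autowarmCount="128"/>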


The temporary doubling of the baseline is hard to avoid, but I am
surprised at the amount of heap that you need in the stable periods.
Just to be clear: This is from a Solr with 8GB of heap handling only 1
shard of 20GB and you are using DocValues? How many documents do you
have in such a shard?

- Toke Eskildsen, State and University Library, Denmark


Re: Facet behavior

2016-10-25 Thread Bastien Latard | MDPI AG

Hi Guys,

Could any of you tell me if I'm right?
Thanks in advance.

kr,
Bast



-------- Forwarded Message --------
Subject:Re: Facet behavior
Date:   Thu, 20 Oct 2016 14:45:23 +0200
From:   Bastien Latard | MDPI AG 
To: solr-user@lucene.apache.org



Hi Yonik,

Thanks for your answer!
I'm not quite sure I understood everything... please see my comments below.



On Wed, Oct 19, 2016 at 6:23 AM, Bastien Latard | MDPI AG wrote:

I just had a question about facets.
*==> Is the facet run on all documents (to pre-process/cache the data) or
only on returned documents?*

Yes ;-)

There are sometimes per-field data structures that are cached to
support faceting.  This can make the first facet request after a new
searcher take longer.  If you're using docValues, though, that cost is
much lower.

So how to force it to use docValues? Simply:
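(The original snippet was stripped by the archive; a declaration along
these lines, with a hypothetical field name, is what I have in mind:)

    <field name="journal" type="string" indexed="true" stored="true" docValues="true"/>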

Are there other advantages/disadvantages?


Then there are per-request data structures (like a count array) that
are O(field_cardinality) and not O(matching_docs).
But then for default field-cache faceting, the actual counting part is
O(matching_docs).
So yes, at the end of the day we only facet on the matching
documents... but what the total field looks like certainly matters.

This would only be the case if I were using docValues, right?

If I have such a field declaration (a dedicated facet field, without
stemming), what would be the best setting?


Kind regards,
Bastien



Re: OOM Error

2016-10-25 Thread Shawn Heisey
On 10/25/2016 8:03 PM, Susheel Kumar wrote:
> Agree, Pushkar.  I had docValues for sorting / faceting fields from the
> beginning (since I set up Solr 6.0).  So good on that side. I am going to
> analyze the queries to find any potential issues. Two questions I am
> puzzling over:
>
> a) Should the below JVM parameter be included for Prod to get heap dump
>
> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"

A heap dump can take a very long time to complete, and there may not be
enough memory in the machine to start another instance of Solr until the
first one has finished the heap dump.  Also, I do not know whether Java
would release the listening port before the heap dump finishes.  If not,
then a new instance would not be able to start immediately.

If a different heap dump file is created each time, that might lead to
problems with disk space after repeated dumps.  I don't know how the
option works.
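For what it's worth, a minimal sketch for solr.in.sh if you do enable
it (the dump directory is an assumption; point it at a volume with
enough free space):

    # when HeapDumpPath is a directory, the JVM typically writes
    # java_pid<pid>.hprof into it
    SOLR_OPTS="$SOLR_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/dumps"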

> b) Currently OOM script just kills the Solr instance. Shouldn't it be
> enhanced to wait and restart Solr instance

As long as there is a problem causing OOMs, it seems rather pointless to
start Solr right back up, as another OOM is likely.  The safest thing to
do is kill Solr (since its operation would be unpredictable after OOM)
and let the admin sort the problem out.

Thanks,
Shawn



Re: OOM Error

2016-10-25 Thread Erick Erickson
Off the top of my head:

a) Should the below JVM parameter be included for Prod to get heap dump

Makes sense. It may produce quite a large dump file, but then this is
an extraordinary situation so that's probably OK.

b) Currently OOM script just kills the Solr instance. Shouldn't it be
enhanced to wait and restart Solr instance

Personally I don't think so. IMO there's no real point in restarting
Solr; you have to address the underlying issue, as this situation is
likely to recur. Restarting Solr may hide this very serious problem:
how would you even know to look? It could potentially lead to a long,
involved process of wondering why selected queries seem to fail, with
nobody noticing that the OOM script killed Solr. Having the default
_not_ restart Solr forces you to notice.

If you have to change the script to restart Solr, you also know that
you made the change and you should _really_ notify ops that they
should monitor this situation.
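For example, a hypothetical one-line addition to bin/oom_solr.sh
(assuming the stock script's $SOLR_PORT variable and a working local
mail command) makes the kill visible to ops:

    # alert ops alongside the kill so the OOM is never silent
    echo "Solr on port $SOLR_PORT was OOM-killed at $(date)" | \
      mail -s "Solr OOM on $(hostname)" ops@example.com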

I admit this can be argued either way; personally, I'd rather "fail
fast and often".

Best,
Erick

On Tue, Oct 25, 2016 at 7:03 PM, Susheel Kumar  wrote:
> Agree, Pushkar.  I had docValues for sorting / faceting fields from the
> beginning (since I set up Solr 6.0).  So good on that side. I am going to
> analyze the queries to find any potential issues. Two questions I am
> puzzling over:
>
> a) Should the below JVM parameter be included for Prod to get heap dump
>
> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"
>
> b) Currently OOM script just kills the Solr instance. Shouldn't it be
> enhanced to wait and restart Solr instance
>
> Thanks,
> Susheel
>
>
>
>
> On Tue, Oct 25, 2016 at 7:35 PM, Pushkar Raste 
> wrote:
>
>> You should look into using docValues.  docValues are stored off heap and
>> hence you would be better off than just bumping up the heap.
>>
>> Don't enable docValues on existing fields unless you plan to reindex data
>> from scratch.
>>
>> On Oct 25, 2016 3:04 PM, "Susheel Kumar"  wrote:
>>
>> > Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
>> > death.  The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9
>> >
>> > Will look into the queries more closely and also adjust the cache
>> sizing.
>> >
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen 
>> > wrote:
>> >
>> > > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
>> > > > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
>> > > > today. So far our Solr cluster has been running fine, but suddenly
>> > > > today many of the VMs' Solr instances got killed.
>> > >
>> > > As you have the GC-logs, you should be able to determine if it was a
>> > > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
>> > > grouping or faceting on a large new non-DocValued field).
>> > >
>> > > Try plotting the GC logs with time on the x-axis and free memory after
>> > > GC on the y-axis. If it happens to be a sudden death, the last lines in
>> > > solr.log might hold a clue after all.
>> > >
>> > > - Toke Eskildsen, State and University Library, Denmark
>> > >
>> >
>>


Re: OOM Error

2016-10-25 Thread Susheel Kumar
Agree, Pushkar.  I had docValues for sorting / faceting fields from the
beginning (since I set up Solr 6.0).  So good on that side. I am going to
analyze the queries to find any potential issues. Two questions I am
puzzling over:

a) Should the below JVM parameter be included for Prod to get heap dump

"-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"

b) Currently OOM script just kills the Solr instance. Shouldn't it be
enhanced to wait and restart Solr instance

Thanks,
Susheel




On Tue, Oct 25, 2016 at 7:35 PM, Pushkar Raste wrote:

> You should look into using docValues.  docValues are stored off heap and
> hence you would be better off than just bumping up the heap.
>
> Don't enable docValues on existing fields unless you plan to reindex data
> from scratch.
>
> On Oct 25, 2016 3:04 PM, "Susheel Kumar"  wrote:
>
> > Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
> > death.  The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9
> >
> > Will look into the queries more closely and also adjust the cache
> sizing.
> >
> >
> > Thanks,
> > Susheel
> >
> > On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen 
> > wrote:
> >
> > > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> > > > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
> > > > today. So far our Solr cluster has been running fine, but suddenly
> > > > today many of the VMs' Solr instances got killed.
> > >
> > > As you have the GC-logs, you should be able to determine if it was a
> > > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
> > > grouping or faceting on a large new non-DocValued field).
> > >
> > > Try plotting the GC logs with time on the x-axis and free memory after
> > > GC on the y-axis. If it happens to be a sudden death, the last lines in
> > > solr.log might hold a clue after all.
> > >
> > > - Toke Eskildsen, State and University Library, Denmark
> > >
> >
>


Re: Does _version_ field in schema need to be indexed and/or stored?

2016-10-25 Thread Yonik Seeley
On Tue, Oct 25, 2016 at 6:41 PM, Brent  wrote:
> I know that in the sample config sets, the _version_ field is indexed and not
> stored, like so:
>
> <field name="_version_" type="long" indexed="true" stored="false"/>
>
> Is there any reason it needs to be indexed?

It may depend on your solr version, but the starting configsets
currently only have docvalues:

./solr/server/solr/configsets/basic_configs/conf/managed-schema:

<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>

-Yonik


Re: Does _version_ field in schema need to be indexed and/or stored?

2016-10-25 Thread Alexandre Rafalovitch
Did you try using optimistic concurrency or SolrCloud? If I understand
what's going on correctly, those should NOT work.

And if you don't index, don't store, and don't docValue a field, you
don't actually have that field active. That's how the */false/false/false
dynamicField works to avoid unknown fields causing exceptions during
indexing.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 25 October 2016 at 17:41, Brent  wrote:
> I know that in the sample config sets, the _version_ field is indexed and not
> stored, like so:
>
> <field name="_version_" type="long" indexed="true" stored="false"/>
>
> Is there any reason it needs to be indexed? I'm able to create collections
> and use them with it not indexed, but I wonder if it negatively impacts
> performance.
>
>
>


Re: Combine Data from PDF + XML

2016-10-25 Thread Erick Erickson
First you need to define the problem:

What do you mean by "combine"? Do the XML files
contain, say, metadata about an associated PDF file?

Or are these entirely orthogonal documents that
you need to index into the same collection?
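If it's the latter, a minimal sketch using the stock bin/post tool
(collection name and paths are hypothetical; PDFs go through the
extracting handler via Tika, while the XML must already be in Solr's
update-XML format or be transformed first):

    bin/post -c mycollection /data/pdfs
    bin/post -c mycollection /data/xml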

Best,
Erick

On Tue, Oct 25, 2016 at 4:18 PM, tesm...@gmail.com  wrote:
> Hi,
>
> I am new to Apache Solr and am developing a search project. The source
> data is coming from two sources:
>
> 1) XML Files
>
> 2) PDF Files
>
>
> I need to combine these two sources for search.  I couldn't find an
> example of combining these two sources. Any help is appreciated.
>
>
> Regards,


Re: OOM Error

2016-10-25 Thread Pushkar Raste
You should look into using docValues.  docValues are stored off heap and
hence you would be better off than just bumping up the heap.

Don't enable docValues on existing fields unless you plan to reindex data
from scratch.

On Oct 25, 2016 3:04 PM, "Susheel Kumar"  wrote:

> Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
> death.  The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9
>
> Will look into the queries more closely and also adjust the cache sizing.
>
>
> Thanks,
> Susheel
>
> On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen 
> wrote:
>
> > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> > > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
> > > today. So far our Solr cluster has been running fine, but suddenly
> > > today many of the VMs' Solr instances got killed.
> >
> > As you have the GC-logs, you should be able to determine if it was a
> > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
> > grouping or faceting on a large new non-DocValued field).
> >
> > Try plotting the GC logs with time on the x-axis and free memory after
> > GC on the y-axis. If it happens to be a sudden death, the last lines in
> > solr.log might hold a clue after all.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
>


Re: Solr 5.3.1 - Synonym is not working as expected

2016-10-25 Thread Ahmet Arslan
Hi,

If your index is pure Chinese, I would do the expansion at query time only.
Simply replace the English query term with its Chinese translations.

Ahmet



On Tuesday, October 25, 2016 12:30 PM, soundarya wrote:
We are using Solr 5.3.1 as our search engine. This setup is provided by
the Bitnami cloud, and the Amazon AMI is ami-50a47e23.

We have a website whose content is in Chinese. We use the Nutch crawler
to crawl the entire website and index it into the Solr collection. We
have configured a few fields, including a text field with Chinese
tokenizers. When users search with Chinese characters, we see the
relevant results. We wanted the same results when a user types English
or Pinyin characters, so we added a synonym file and the corresponding
tokenizer to the schema.xml file. After these changes we are not able
to get any results. Below is the configuration we made in schema.xml.
The synonym file is a mapping of each Chinese word to its equivalent
English and Pinyin words.
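(For illustration, a synonyms.txt line for this mapping might look like
the following; the Pinyin transliteration is hypothetical:)

    内舒拿, nasonex, neishuna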

[schema.xml fieldType and synonym-filter configuration stripped by the list
archive]
The query debug output is below. The synonym configured for the English word
is actually picked up, but we still see no results:

"rawquerystring":"nasonex",
"querystring":"nasonex",
"parsedquery":"(text:nasonex text:内舒拿)/no_coord",
"parsedquery_toString":"text:nasonex text:内舒拿",
"QParser":"LuceneQParser"


Below is the output when we try to use the analysis tool, reconstructed
as a table (token types stripped by the archive are shown as "-"):

Stage | text    | raw_bytes                    | start | end | posLength | type    | position
ST    | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
SF    | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
SF    | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
CJKWF | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
CJKWF | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
LCF   | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
LCF   | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
CJKBF | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
CJKBF | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1

Please help us regarding this issue. 






Combine Data from PDF + XML

2016-10-25 Thread tesm...@gmail.com
Hi,

I am new to Apache Solr and am developing a search project. The source data
is coming from two sources:

1) XML Files

2) PDF Files


I need to combine these two sources for search.  I couldn't find an example
of combining these two sources. Any help is appreciated.


Regards,


Re: Graph Traversal Question

2016-10-25 Thread Joel Bernstein
Because the edges are unique on the subject->object pair, there isn't
currently a way to capture the relationship. Aggregations can be rolled up
on numeric fields, and as Yonik mentioned, you can track the ancestor.

It would be fairly easy to track the relationship by adding a relationships
array that would correspond with the ancestors array, for example:

{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"relationships":["author"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"relationships":["author"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Oct 25, 2016 at 6:26 PM, Yonik Seeley  wrote:

> You can get the nodes that you came from by adding trackTraversal=true
>
> A cut'n'paste example from my Lucene/Solr Revolution slides:
>
> curl $URL -d 'expr=gatherNodes(reviews,
>search(reviews, q="user_s:Yonik AND rating_i:5",
>   fl="book_s,user_s,rating_i",sort="user_s asc"),
>walk="book_s->book_s",
>gather="user_s",
>fq="rating_i:[4 TO *] -user_s:Yonik",
>trackTraversal=true )'
>
> {"result-set":{"docs":[
> {"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
> {"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
> {"EOF":true,"RESPONSE_TIME":22}]}}
>
> -Yonik
>
>
> On Tue, Oct 25, 2016 at 5:57 PM, Grant Ingersoll 
> wrote:
> > Hi,
> >
> > I'm playing around with the new Graph Traversal/GatherNodes capabilities
> in
> > Solr 6.  I've been indexing Yago facts (
> > http://www.mpi-inf.mpg.de/departments/databases-and-
> information-systems/research/yago-naga/yago/downloads/)
> > which give me triples of something like subject-relationship-object
> (United
> > States -> hasCapital -> Washington DC)
> >
> > My documents look like:
> > subject: string
> > relationship: string
> > object: string
> >
> > I can do a simple gatherNodes like
> > http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
> > walk="United_States->subject", gather="object") and get back the objects
> > that relate to the subject.  However, I don't see any way to capture what
> > the relationship is in the response.  IOW, the request above would just
> > return a node of "Washington DC", but it doesn't tell me the relationship
> > (i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
> > any way to expand the "gather" or otherwise mark up the nodes returned
> with
> > additional field attributes or maybe get additional graph info back?
> >
> > Thanks,
> > Grant
>


Does _version_ field in schema need to be indexed and/or stored?

2016-10-25 Thread Brent
I know that in the sample config sets, the _version_ field is indexed and not
stored, like so:

<field name="_version_" type="long" indexed="true" stored="false"/>

Is there any reason it needs to be indexed? I'm able to create collections
and use them with it not indexed, but I wonder if it negatively impacts
performance.





Re: Graph Traversal Question

2016-10-25 Thread Yonik Seeley
You can get the nodes that you came from by adding trackTraversal=true

A cut'n'paste example from my Lucene/Solr Revolution slides:

curl $URL -d 'expr=gatherNodes(reviews,
   search(reviews, q="user_s:Yonik AND rating_i:5",
  fl="book_s,user_s,rating_i",sort="user_s asc"),
   walk="book_s->book_s",
   gather="user_s",
   fq="rating_i:[4 TO *] -user_s:Yonik",
   trackTraversal=true )'

{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}

-Yonik


On Tue, Oct 25, 2016 at 5:57 PM, Grant Ingersoll  wrote:
> Hi,
>
> I'm playing around with the new Graph Traversal/GatherNodes capabilities in
> Solr 6.  I've been indexing Yago facts (
> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
> which give me triples of something like subject-relationship-object (United
> States -> hasCapital -> Washington DC)
>
> My documents look like:
> subject: string
> relationship: string
> object: string
>
> I can do a simple gatherNodes like
> http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
> walk="United_States->subject", gather="object") and get back the objects
> that relate to the subject.  However, I don't see any way to capture what
> the relationship is in the response.  IOW, the request above would just
> return a node of "Washington DC", but it doesn't tell me the relationship
> (i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
> anyway to expand the "gather" or otherwise mark up the nodes returned with
> additional field attributes or maybe get additional graph info back?
>
> Thanks,
> Grant


Graph Traversal Question

2016-10-25 Thread Grant Ingersoll
Hi,

I'm playing around with the new Graph Traversal/GatherNodes capabilities in
Solr 6.  I've been indexing Yago facts (
http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
which give me triples of something like subject-relationship-object (United
States -> hasCapital -> Washington DC)

My documents look like:
subject: string
relationship: string
object: string

I can do a simple gatherNodes like
http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
walk="United_States->subject", gather="object") and get back the objects
that relate to the subject.  However, I don't see any way to capture what
the relationship is in the response.  IOW, the request above would just
return a node of "Washington DC", but it doesn't tell me the relationship
(i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
any way to expand the "gather" or otherwise mark up the nodes returned with
additional field attributes or maybe get additional graph info back?

Thanks,
Grant


Re: Related Search

2016-10-25 Thread Grant Ingersoll
Hi Rick,

I typically do this by searching a separate collection that I build
offline: I analyze the query logs, index the related queries, and then
search that collection at serving time.
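For example (the collection and field names here are hypothetical; the
update and select calls are stock Solr APIs):

    # one doc per logged query, with co-occurring queries mined offline
    curl 'http://localhost:8983/solr/related/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"1","query_s":"solr faceting","related_ss":["solr docvalues","solr json facet api"]}]'

    # at serving time, look up the user query and return the related ones
    curl 'http://localhost:8983/solr/related/select?q=query_s:"solr+faceting"&fl=related_ss'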

On Mon, Oct 24, 2016 at 8:32 PM Rick Leir  wrote:

> Hi all,
>
> There is an issue 'Create a Related Search Component' which has been
> open for some years now.
>
> It has a priority: major.
>
> https://issues.apache.org/jira/browse/SOLR-2080
>
>
> I discovered it linked from Lucidworks' very useful blog on ecommerce:
>
>
> https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/
>
>
> Did people find a better way to accomplish Related Search? Perhaps MLT
> http://wiki.apache.org/solr/MoreLikeThis ?
>
> cheers -- Rick
>
>
>


Re: OOM Error

2016-10-25 Thread Susheel Kumar
Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
death.  The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9

Will look into the queries more closely and also adjust the cache sizing.


Thanks,
Susheel

On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen wrote:

> On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
> > today. So far our Solr cluster has been running fine, but suddenly
> > today many of the VMs' Solr instances got killed.
>
> As you have the GC-logs, you should be able to determine if it was a
> slow death (e.g. caches gradually being filled) or a sudden one (e.g.
> grouping or faceting on a large new non-DocValued field).
>
> Try plotting the GC logs with time on the x-axis and free memory after
> GC on the y-axis. If it happens to be a sudden death, the last lines in
> solr.log might hold a clue after all.
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: OOM Error

2016-10-25 Thread William Bell
I would also caution that 8GB is cutting it close for a Java 8 JVM running
Solr. We use 12GB and have had issues with 8GB. But your mileage may vary.
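(If you try a bigger heap, solr.in.sh is the usual place to set it;
SOLR_HEAP is a standard variable there, and 12g below just mirrors the
suggestion above:)

    SOLR_HEAP="12g"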

On Tue, Oct 25, 2016 at 1:37 AM, Toke Eskildsen wrote:

> On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
> > today. So far our Solr cluster has been running fine, but suddenly
> > today many of the VMs' Solr instances got killed.
>
> As you have the GC-logs, you should be able to determine if it was a
> slow death (e.g. caches gradually being filled) or a sudden one (e.g.
> grouping or faceting on a large new non-DocValued field).
>
> Try plotting the GC logs with time on the x-axis and free memory after
> GC on the y-axis. If it happens to be a sudden death, the last lines in
> solr.log might hold a clue after all.
>
> - Toke Eskildsen, State and University Library, Denmark
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: CachedSqlEntityProcessor with delta-import

2016-10-25 Thread Erick Erickson
Why not use delete-by-id rather than delete-by-query? It'll be more
efficient.

Probably not a big deal, though.
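A minimal sketch of the delete-by-id form (collection name and ID are
placeholders):

    curl 'http://localhost:8983/solr/collection1/update?commit=true' \
      -H 'Content-Type: text/xml' -d '<delete><id>1234</id></delete>'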

On Tue, Oct 25, 2016 at 1:47 AM, Aniket Khare  wrote:
> Hi Sowmya,
>
> In my case I have implemented the data indexing suggested by James, and for
> deleting the records I have created my own data indexing job, which calls
> the delete API periodically, passing the list of unique IDs.
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
>
> http://localhost:8983/solr/update?stream.body=<delete><query>id:1234</query></delete>&commit=true
>
> Thanks,
> Aniket S. Khare
>
> On Tue, Oct 25, 2016 at 1:32 AM, Mohan, Sowmya  wrote:
>
>> Thanks James. That's what I was using before. But I also wanted to perform
>> deletes using deletedPkQuery and hence switched to delta imports. The
>> problem with using deletedPkQuery with the full import is that
>> dataimporter.last_index_time is no longer accurate.
>>
>> Below is an example of my deletedPkQuery. If I run the full-import for a
>> differential index, that would update the last index time. Running the
>> delta import to remove the deleted records then wouldn't do anything since
>> nothing changed since the last index time.
>>
>>
>>  deletedPkQuery="SELECT id
>> FROM content
>> WHERE active = 1 AND lastUpdate >
>> '${dataimporter.last_index_time}'"
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Dyer, James [mailto:james.d...@ingramcontent.com]
>> Sent: Friday, October 21, 2016 4:23 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: CachedSqlEntityProcessor with delta-import
>>
>> Sowmya,
>>
>> My memory is that the cache feature does not work with Delta Imports.  In
>> fact, I believe that nearly all DIH features except straight JDBC imports
>> do not work with Delta Imports.  My advice is to not use the Delta Import
>> feature at all as the same result can (often more-efficiently) be
>> accomplished following the approach outlined here:
>> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
>>
>> James Dyer
>> Ingram Content Group
>>
>> -Original Message-
>> From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com]
>> Sent: Tuesday, October 18, 2016 10:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: CachedSqlEntityProcessor with delta-import
>>
>> Good morning,
>>
>> Can CachedSqlEntityProcessor be used with delta-import? In my setup when
>> running a delta-import with CachedSqlEntityProcessor, the child entity
>> values are not correctly updated for the parent record. I am on Solr 4.3.
>> Has anyone experienced this and if so how to resolve it?
>>
>> Thanks,
>> Sowmya.
>>
>>
>
>
> --
> Regards,
>
> Aniket S. Khare


Re: how is calculated score in group queries?

2016-10-25 Thread Erick Erickson
bq: I'm still using Solr 4.8.1; any chance this behavior is changed/fixed
in more recent releases?

What do you want it changed to? IIUC this is the intended behavior.

As for the rest, I'll defer to others.

Best,
Erick

On Tue, Oct 25, 2016 at 2:19 AM, Vincenzo D'Amore  wrote:
> Hi all,
>
> I have a couple of questions about grouping, I hope you can help me.
>
> I'm trying to understand how the group score is calculated in group queries.
>
> So, I did my homework, and it seems the group score is taken from the score
> of the first document in each group found, i.e.:
>
> <lst name="grouped">
>   <lst name="...">
>     <int name="matches">3653</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">/xx/yyy/</str>
>         <result name="doclist" numFound="..." start="0" maxScore="0.775318">
>           <doc>
>             <str name="title">Document title</str>
>             <float name="score">0.775318</float>
>           </doc>
> [omissis]
>
> So for all the groups, what I see is that maxScore is the same as the
> score of the first document in the returned group. Am I right?
>
> I'm still using Solr 4.8.1; any chance this behavior is changed/fixed in
> more recent releases?
>
> I ask this because, reading the documentation, I see we should now prefer
> "Collapse and Expand", even though it cannot be used in all cases... it's
> not clear when.
>
> So what's the difference? Can it be explained with a real-world example?
>
> Best regards and thanks for all the time you will spend with me,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: Solr Cloud A/B Deployment Issue

2016-10-25 Thread jimtronic
Also, if we issue a delete by query where the query is "_version_:0", it also
creates a transaction log and then has no trouble transferring leadership
between old and new nodes.
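For reference, that delete-by-query looks like this (collection name is
a placeholder; since real _version_ values are effectively never 0, it
matches nothing but still writes a transaction-log entry):

    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: text/xml' -d '<delete><query>_version_:0</query></delete>'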

Still, it seems like when we ADDREPLICA, some sort of transaction log should
be started. 

Jim





Re: Solr Cloud A/B Deployment Issue

2016-10-25 Thread jimtronic
Interestingly, if I simply add one document to the full cluster after all 6
nodes are active, this entire problem goes away. This appears to be because
a transaction log entry is created, which in turn prevents the new nodes
from going into full replication recovery upon leader change.

Adding a document is a hacky solution, however. It seems like new nodes that
were added via ADDREPLICA should know more about versions than they
currently do.







RE: Solr 6.0 Highlighting Not Working

2016-10-25 Thread Teague James
Hi - Thanks for the reply, I'll give that a try.  

-Original Message-
From: jimtronic [mailto:jimtro...@gmail.com] 
Sent: Monday, October 24, 2016 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.0 Highlighting Not Working

Perhaps you need to wrap your inner pre/post highlight tags in a CDATA
structure?








Solr 5.3.1 - Synonym is not working as expected

2016-10-25 Thread soundarya
We are using Solr 5.3.1 as our search engine. This setup is provided by
the Bitnami cloud, and the Amazon AMI is ami-50a47e23.

We have a website whose content is in Chinese. We use the Nutch crawler
to crawl the entire website and index it into the Solr collection. We
have configured a few fields, including a text field with Chinese
tokenizers. When users search with Chinese characters, we see the
relevant results. We wanted the same results when a user types English
or Pinyin characters, so we added a synonym file and the corresponding
tokenizer to the schema.xml file. After these changes we are not able
to get any results. Below is the configuration we made in schema.xml.
The synonym file is a mapping of each Chinese word to its equivalent
English and Pinyin words.

[schema.xml fieldType and synonym-filter configuration stripped by the list
archive]
The query debug output is below. The synonym configured for the English word
is actually picked up, but we still see no results:

"rawquerystring":"nasonex",
"querystring":"nasonex",
"parsedquery":"(text:nasonex text:内舒拿)/no_coord",
"parsedquery_toString":"text:nasonex text:内舒拿",
"QParser":"LuceneQParser"


Below is the output when we try to use the analysis tool, reconstructed
as a table (token types stripped by the archive are shown as "-"):

Stage | text    | raw_bytes                    | start | end | posLength | type    | position
ST    | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
SF    | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
SF    | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
CJKWF | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
CJKWF | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
LCF   | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
LCF   | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1
CJKBF | nasonex | [6e 61 73 6f 6e 65 78]       | 0     | 7   | 1         | -       | 1
CJKBF | 内舒拿  | [e5 86 85 e8 88 92 e6 8b bf] | 0     | 7   | 1         | SYNONYM | 1

Please help us regarding this issue. 






how is calculated score in group queries?

2016-10-25 Thread Vincenzo D'Amore
Hi all,

I have a couple of questions about grouping, I hope you can help me.

I'm trying to understand how the group score is calculated in group queries.

So, I did my homework, and it seems the group score is taken from the score
of the first document in each group found, i.e.:

<lst name="grouped">
  <lst name="...">
    <int name="matches">3653</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">/xx/yyy/</str>
        <result name="doclist" numFound="..." start="0" maxScore="0.775318">
          <doc>
            <str name="title">Document title</str>
            <float name="score">0.775318</float>
          </doc>
[omissis]

So for all the groups, what I see is that maxScore is the same as the
score of the first document in the returned group. Am I right?

I'm still using Solr 4.8.1; any chance this behavior is changed/fixed in
more recent releases?

I ask this because, reading the documentation, I see we should now prefer
"Collapse and Expand", even though it cannot be used in all cases... it's
not clear when.

So what's the difference? Can it be explained with a real-world example?

Best regards and thanks for all the time you will spend with me,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: CachedSqlEntityProcessor with delta-import

2016-10-25 Thread Aniket Khare
Hi Sowmya,

In my case I have implemented the data indexing suggested by James, and for
deleting the records I have created my own data indexing job, which calls
the delete API periodically, passing the list of unique IDs.
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers

http://localhost:8983/solr/update?stream.body=<delete><query>id:1234</query></delete>&commit=true

Thanks,
Aniket S. Khare

On Tue, Oct 25, 2016 at 1:32 AM, Mohan, Sowmya  wrote:

> Thanks James. That's what I was using before. But I also wanted to perform
> deletes using deletedPkQuery and hence switched to delta imports. The
> problem with using deletedPkQuery with the full import is that
> dataimporter.last_index_time is no longer accurate.
>
> Below is an example of my deletedPkQuery. If I run the full-import for a
> differential index, that would update the last index time. Running the
> delta import to remove the deleted records then wouldn't do anything since
> nothing changed since the last index time.
>
>
>  deletedPkQuery="SELECT id
> FROM content
> WHERE active = 1 AND lastUpdate >
> '${dataimporter.last_index_time}'"
>
>
>
>
>
>
> -Original Message-
> From: Dyer, James [mailto:james.d...@ingramcontent.com]
> Sent: Friday, October 21, 2016 4:23 PM
> To: solr-user@lucene.apache.org
> Subject: RE: CachedSqlEntityProcessor with delta-import
>
> Sowmya,
>
> My memory is that the cache feature does not work with Delta Imports.  In
> fact, I believe that nearly all DIH features except straight JDBC imports
> do not work with Delta Imports.  My advice is to not use the Delta Import
> feature at all as the same result can (often more-efficiently) be
> accomplished following the approach outlined here:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com]
> Sent: Tuesday, October 18, 2016 10:07 AM
> To: solr-user@lucene.apache.org
> Subject: CachedSqlEntityProcessor with delta-import
>
> Good morning,
>
> Can CachedSqlEntityProcessor be used with delta-import? In my setup when
> running a delta-import with CachedSqlEntityProcessor, the child entity
> values are not correctly updated for the parent record. I am on Solr 4.3.
> Has anyone experienced this and if so how to resolve it?
>
> Thanks,
> Sowmya.
>
>


-- 
Regards,

Aniket S. Khare


Re: OOM Error

2016-10-25 Thread Toke Eskildsen
On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
> I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our VMs
> today. So far our Solr cluster has been running fine, but suddenly
> today many of the VMs' Solr instances got killed.

As you have the GC-logs, you should be able to determine if it was a
slow death (e.g. caches gradually being filled) or a sudden one (e.g.
grouping or faceting on a large new non-DocValued field).

Try plotting the GC logs with time on the x-axis and free memory after
GC on the y-axis. If it happens to be a sudden death, the last lines in
solr.log might hold a clue after all.

- Toke Eskildsen, State and University Library, Denmark