Clob transformer not working in DIH

2016-12-08 Thread Kamal Kishore Aggarwal
Hi,

I am using Solr 5.4.1, with the DataImportHandler indexing data from SQL
Server.

I am using ClobTransformer to convert CLOB values to strings. Indexing itself
works fine, but the CLOB transformation does not: the expected string value
never appears for the CLOB column, and there is no error or exception in the log.

Here is the configuration:

[The data-config XML did not survive the archive.]

I tried RegexTransformer and it worked, but ClobTransformer does not. Please
assist.
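For reference, a minimal data-config sketch of the usual ClobTransformer wiring (driver, table, and field names here are illustrative placeholders, not taken from Kamal's setup). Two common pitfalls: the `clob="true"` attribute must be present on the field, and the `column` value must match the case the JDBC driver reports (often upper case):

```xml
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" transformer="ClobTransformer"
            query="SELECT ID, DESCRIPTION FROM ITEMS">
      <field column="ID" name="id"/>
      <!-- clob="true" is what triggers the transformer; without it the
           CLOB object is passed through untouched -->
      <field column="DESCRIPTION" name="description" clob="true"/>
    </entity>
  </document>
</dataConfig>
```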

Regards
Kamal


Re: Very long young generation stop the world GC pause

2016-12-08 Thread Greg Harris
Your gun (not quite smoking yet, we still need the fingerprints) is this:

[Times: user=0.00 sys=94.28, real=97.19 secs]

Normal GC pauses are generally almost entirely user CPU, very short, and
spread across processors. Something else is happening with either the JVM
or the OS that is causing this process to be single-threaded, blocked, and
executed in system CPU -- i.e., lower-level processing unrelated to a
normal GC of Solr. This is abnormal GC behavior and probably not going to
be fixed by any JVM parameter changes (but you can always try). Any unusual
disk or network I/O? You need to understand what else is going on with your
system at the moment in time this happens. You might see if you can run
strace on the process to see what it's trying to do.
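Greg's diagnostic -- a long pause whose time is almost all system CPU rather than user CPU -- can be checked mechanically across a whole GC log. A small sketch, assuming the standard `[Times: user=... sys=..., real=... secs]` line format:

```python
import re

TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>[\d.]+) sys=(?P<sys>[\d.]+), real=(?P<real>[\d.]+) secs\]"
)

def suspicious_pauses(gc_log_lines, min_real_secs=1.0):
    """Return (user, sys, real) tuples for pauses that are long and
    dominated by system CPU -- the pattern Greg describes, which points
    at the OS (swapping, transparent huge pages, slow IO) rather than
    normal GC work."""
    hits = []
    for line in gc_log_lines:
        m = TIMES_RE.search(line)
        if not m:
            continue
        user, sy, real = (float(m.group(g)) for g in ("user", "sys", "real"))
        if real >= min_real_secs and sy > user:
            hits.append((user, sy, real))
    return hits

log = [
    "[Times: user=0.31 sys=0.01, real=0.02 secs]",    # normal: short, user CPU
    "[Times: user=0.00 sys=94.28, real=97.19 secs]",  # the gun from this thread
]
print(suspicious_pauses(log))  # -> [(0.0, 94.28, 97.19)]
```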

On Thu, Dec 8, 2016 at 12:06 AM, forest_soup  wrote:

> As you can see in the GC log, the long GC pause is not a full GC; it's a
> young-generation GC instead.
> In our case, full GC is fast and it is young GC that gets some long STW pauses.
> Do you have any comments on that? We usually believe a full GC may cause a
> longer pause, but the young generation should be OK.
>
> 2016-11-22T20:43:16.463+: 2942054.509: Total time for which application
> threads were stopped: 0.0029195 seconds, Stopping threads took: 0.804
> seconds
> {Heap before GC invocations=2246 (full 0):
>  garbage-first heap   total 26673152K, used 4683965K [0x7f0c1000,
> 0x7f0c108065c0, 0x7f141000)
>   region size 8192K, 162 young (1327104K), 17 survivors (139264K)
>  Metaspace   used 56487K, capacity 57092K, committed 58368K, reserved
> 59392K
> 2016-11-22T20:43:16.555+: 2942054.602: [GC pause (G1 Evacuation Pause)
> (young)
> Desired survivor size 88080384 bytes, new threshold 15 (max 15)
> - age   1:   28176280 bytes,   28176280 total
> - age   2:5632480 bytes,   33808760 total
> - age   3:9719072 bytes,   43527832 total
> - age   4:6219408 bytes,   49747240 total
> - age   5:4465544 bytes,   54212784 total
> - age   6:3417168 bytes,   57629952 total
> - age   7:5343072 bytes,   62973024 total
> - age   8:2784808 bytes,   65757832 total
> - age   9:6538056 bytes,   72295888 total
> - age  10:6368016 bytes,   78663904 total
> - age  11: 695216 bytes,   79359120 total
> , 97.2044320 secs]
>[Parallel Time: 19.8 ms, GC Workers: 18]
>   [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max:
> 2942054612.7, Diff: 10.6]
>   [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7,
> Sum:
> 43.5]
>   [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum:
> 54.0]
>  [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum:
> 192]
>   [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
>   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>   [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum:
> 165.9]
>   [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum:
> 44.1]
>  [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27]
>   [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum:
> 0.6]
>   [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6,
> Sum: 308.7]
>   [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max:
> 2942054621.8, Diff: 0.0]
>[Code Root Fixup: 0.1 ms]
>[Code Root Purge: 0.0 ms]
>[Clear CT: 0.2 ms]
>[Other: 97184.3 ms]
>   [Choose CSet: 0.0 ms]
>   [Ref Proc: 8.5 ms]
>   [Ref Enq: 0.2 ms]
>   [Redirty Cards: 0.2 ms]
>   [Humongous Register: 0.1 ms]
>   [Humongous Reclaim: 0.1 ms]
>   [Free CSet: 0.4 ms]
>[Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap:
> 4574.2M(25.4G)->3450.8M(26.8G)]
> Heap after GC invocations=2247 (full 0):
>  garbage-first heap   total 28049408K, used 3533601K [0x7f0c1000,
> 0x7f0c10806b00, 0x7f141000)
>   region size 8192K, 21 young (172032K), 21 survivors (172032K)
>  Metaspace   used 56487K, capacity 57092K, committed 58368K, reserved
> 59392K
> }
>  [Times: user=0.00 sys=94.28, real=97.19 secs]
> 2016-11-22T20:44:53.760+: 2942151.806: Total time for which application
> threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373
> seconds
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Very-long-young-generation-stop-the-world-GC-
> pause-tp4308911p4308912.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: The state of Solr 5. Is it in maintenance mode only?

2016-12-08 Thread Chris Hostetter

: On the 5.x front I wasn't expecting 5.6 release now that we have 6.x but
: was simply surprised to see fix for 4.x and not for 5.x.

As Shawn mentioned: JIRA issues might have incorrect fixVersion info if 
people don't pay enough attention when resolving (especially with 
dups/invalid), but what really matters is what gets committed/released.

Whether or not there are any future 4.x "bug fix" releases depends entirely 
on the severity of the bug and the demand from users for fixes -- 
particularly for security-related bugs, you might see more effort put into 
backporting "farther back" in the release timeline and releasing a 4.10.x 
release.

: As for adoption levels, it was my subjective feel reading this list. Do
: we have community survey on that subject? That would be really
: interesting to see.

As a developer, my focus is on building features and fixing any bugs found 
in the current major version release branch, without worrying too much about 
bugs that may only affect older major versions.  If someone finds 
a bug in 5.x, and that bug no longer exists in 6.x, I have less 
interest/motivation to look into that bug than into something else that 
*does* affect 6.x, because there is already a fix/workaround available...

Upgrade to the latest version.

While some users might have (completely understandable) mitigating factors 
preventing them from upgrading, that doesn't really affect my 
interest/motivation in fixing bugs on older branches, because users who 
have reasons preventing them from upgrading to recent major versions 
frequently tend to have one thing in common: they have things preventing 
them from upgrading at all.

So even if I put in the effort to find/diagnose/fix an old bug, and even 
if the project as a whole goes to the effort to build/test/release from an 
"older" major version dev branch, the return on investment for that work 
is lower than putting the same amount of effort into bug fixes on a 
"newer" major version dev branch.

For example: let's say hypothetically the Solr user base were divided 
evenly into thirds: 1/3 using 6.x.0, 1/3 using 5.y.0, 1/3 using 4.z.0.  In 
theory, if 3 different bugs affect each of those 3 versions to the same degree, 
then the number of users impacted by a 4.z.1 bug fix would be the same as the 
number of users impacted by a 5.y.1 or a 6.x.1 bug fix -- but in practice, 
the number of 4.z.0 users who are likely to upgrade to 4.z.1 is much lower 
than the number of 5.y.0 users who would upgrade to 5.y.1, which is lower 
still than the number of 6.x.0 users who will upgrade to 6.x.1.


-Hoss
http://www.lucidworks.com/


"on deck" searcher vs warming searcher

2016-12-08 Thread Brent
Is there a difference between an "on deck" searcher and a warming searcher?
From what I've read, they sound like the same thing.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/on-deck-searcher-vs-warming-searcher-tp4309021.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr seems to reserve facet.limit results

2016-12-08 Thread Toke Eskildsen
Markus Jelsma  wrote:
> I tried the overrequest ratio/count and set them to 1.0/0. Oddly enough,
> with these settings, high facet.limit and extremely high facet.limit are
> both up to twice as slow as with the 1.5/10 settings.

Not sure if it is the right explanation for your "extremely high 
facet.limit"-case, but here goes...


The two phases in distributed simple String faceting in Solr are very different 
from each other:

The first phase allocates a counter structure, iterates the query hits and 
increments the counters, then extracts the top-X facet terms and returns them.

The second phase receives a list of facet terms to count. The terms are those 
that the shard did not deliver in phase 1. 
An example might help here: For phase 1, shard 1 returns [a:5 b:3 c:3], while 
shard 2 returns [d:2 e:2 c:1]. This is merged to [a:5 c:4 b:3]. Since shard 2 
did not return counts for the terms a and b, these counts are requested from 
shard 2 in phase 2.
In the current implementation, the term counts in the second phase are 
calculated in the same way as enum faceting: Basically one tiny search for each 
term with the query facetfield:term. This does not scale well, so it does not 
take many terms before phase 2 gets _slower_ than phase 1 (you can see for 
yourself in the solr.log). So we want to keep the number of phase 2 term-counts 
down, even if it means that phase 1 gets a bit slower.
This is where over-requesting comes into play: The more you over-request, the 
slower phase 1 gets, but it also means that the chance of the merger having to 
ask for extra term-counts gets lower as they were probably returned in phase 1.
I wrote a bit about the phenomenon in 
https://sbdevel.wordpress.com/2014/09/11/even-sparse-faceting-is-limited/
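Toke's two phases can be sketched in a few lines of Python; `merge_phase1` reproduces the merge in the example above, and `phase2_refinements` lists the terms each shard would then be asked to count one by one (this is a simulation of the protocol, not Solr code):

```python
from collections import Counter

def merge_phase1(shard_results, limit):
    """Merge per-shard top terms (phase 1) and return the provisional top-N."""
    merged = Counter()
    for counts in shard_results.values():
        merged.update(counts)
    return merged.most_common(limit)

def phase2_refinements(shard_results, top_terms):
    """For each shard, the top terms it did NOT report in phase 1.
    Each of these costs roughly one tiny facetfield:term search."""
    return {
        shard: [t for t, _ in top_terms if t not in counts]
        for shard, counts in shard_results.items()
    }

# The example from the message above
shards = {
    "shard1": {"a": 5, "b": 3, "c": 3},
    "shard2": {"d": 2, "e": 2, "c": 1},
}
top = merge_phase1(shards, limit=3)
print(top)                              # [('a', 5), ('c', 4), ('b', 3)]
print(phase2_refinements(shards, top))  # shard2 must be asked for 'a' and 'b'
```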

- Toke Eskildsen


RE: prefix query help

2016-12-08 Thread Kris Musshorn
I think this will work. I'll try it tomorrow and let you know.
Thanks for the help, Erik and Shawn.
Kris

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, December 8, 2016 2:43 PM
To: solr-user@lucene.apache.org
Subject: Re: prefix query help

It’s hard to tell how _exact_ to be here, but if you’re indexing those strings 
and your queries are literally always YYYY-MM, then do the truncation of the 
actual data into that format, or use analysis techniques to index only the 
YYYY-MM piece of the incoming string.  

But given what you’ve got so far, using what the prefix examples I provided 
below, your two queries would be this:

   q={!prefix f=metatag.date v=‘2016-06'}

and

   q=({!prefix f=metatag.date v=‘2016-06’} OR {!prefix f=metatag.date 
v=‘2014-04’} )

Does that work for you?

It really should work to do this q=metatag.date:(2016-06* OR 2014-04*) as 
you’ve got it, but you said that sort of thing wasn’t working (debug output 
would help suss that issue out).

If you did index those strings cleaner as YYYY-MM to accommodate the types of 
query you’ve shown, then you could do q=metatag.date:(2016-06 OR 2014-04), or 
q={!terms f=metatag.date}2016-06,2014-04

Erik




> On Dec 8, 2016, at 11:34 AM, KRIS MUSSHORN  wrote:
> 
> yes I did attach rather than paste sorry. 
>   
> Ok, here's an actual, truncated example of the metatag.date field contents in 
> solr. 
> NONE-NN-NN is the default setting. 
>   
> doc 1 
> " metatag.date ": [ 
>   "2016-06-15T14:51:04Z" ,
>   "2016-06-15T14:51:04Z" 
> ] 
>   
> doc 2 
> " metatag.date ": [ 
>   "2016-06-15" 
> ] 
> doc 3 
> " metatag.date ": [ 
>   "NONE-NN-NN" 
> ] 
> doc 4 
> " metatag.date ": [ 
>   "yyyy-mm-dd" 
> ] 
>   
> doc 5 
> " metatag.date ": [ 
>   "2016-07-06" 
> ] 
> 
> doc 6 
> " metatag.date ": [ 
>   "2014-04-15T14:51:06Z" , 
>   "2014-04-15T14:51:06Z" 
> ] 
>   
> q=2016-06 should return doc 2 and 1 
> q=2016-06 OR 2014-04 should return docs 1, 2 and 6 
>   
> yes I know it's wonky but it's what I have to deal with until the content is 
> cleaned up. 
> I can't use the date type... that would make my life too easy. 
>   
> TIA again 
> Kris 
> 
> - Original Message -
> 
> From: "Erik Hatcher"  
> To: solr-user@lucene.apache.org 
> Sent: Thursday, December 8, 2016 12:36:26 PM 
> Subject: Re: prefix query help 
> 
> Kris - 
> 
> To chain multiple prefix queries together: 
> 
> q=({!prefix f=field1 v=‘prefix1'} {!prefix f=field2 v=‘prefix2’}) 
> 
> The leading paren is needed to ensure it’s being parsed with the lucene 
> qparser (be sure not to have defType set, or a variant would be needed) and 
> that allows multiple {!…} expressions to be parsed.  The outside-the-curlys 
> value for the prefix shouldn’t be attempted with multiples, so the `v` is the 
> way to go, either inline or $referenced. 
> 
> If you do have defType set, say to edismax, then do something like this 
> instead: 
> q={!lucene v=$prefixed_queries} 
> prefixed_queries={!prefix f=field1 v=‘prefix1'} {!prefix f=field2 
> v=‘prefix2’} 
>// I don’t think parens are needed with prefixed_queries, but maybe.  
>  
> 
> debug=query (or debug=true) is your friend - see how things are parsed.  I 
> presume in your example that didn’t work that the dash didn’t work as you 
> expected?   or… not sure.  What’s the parsed_query output in debug on that 
> one? 
> 
> Erik 
> 
> p.s. did you really just send a Word doc to the list that could have been 
> inlined in text?  :)   
> 
> 
> 
>> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN  wrote: 
>> 
>> I'm indexing data from Nutch into Solr 5.4.1. 
>> I've got a date metatag that I have to store as a text type because the data 
>> stinks. 
>> It's stored in Solr as the field metatag.date. 
>> At the source the dates are formatted (when they are entered correctly) as 
>> YYYY-MM-DD 
>>   
>> q=metatag.date:2016-01* does not produce the correct results and returns 
>> undesirable matches (2016-05-01, for example). 
>> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
>> month/year. 
>>   
>> My question is how do I chain n prefix queries together? 
>> i.e. 
>> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>>   
>> TIA, 
>> Kris 
>>   
> 
> 



Re: prefix query help

2016-12-08 Thread Erik Hatcher
It’s hard to tell how _exact_ to be here, but if you’re indexing those strings 
and your queries are literally always YYYY-MM, then do the truncation of the 
actual data into that format, or use analysis techniques to index only the 
YYYY-MM piece of the incoming string.  

But given what you’ve got so far, using what the prefix examples I provided 
below, your two queries would be this:

   q={!prefix f=metatag.date v=‘2016-06'}

and

   q=({!prefix f=metatag.date v=‘2016-06’} OR {!prefix f=metatag.date 
v=‘2014-04’} )

Does that work for you?

It really should work to do this q=metatag.date:(2016-06* OR 2014-04*) as 
you’ve got it, but you said that sort of thing wasn’t working (debug output 
would help suss that issue out).

If you did index those strings cleaner as YYYY-MM to accommodate the types of 
query you’ve shown, then you could do q=metatag.date:(2016-06 OR 2014-04), or 
q={!terms f=metatag.date}2016-06,2014-04

Erik
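As a sanity check on the thread's expected results, here is a tiny Python simulation of Erik's chained prefix query and of prefix matching over Kris's sample docs (it only mimics what `{!prefix}` does on a single untokenized string value; it is not Solr itself):

```python
def build_prefix_q(field, prefixes):
    """Chained {!prefix} clauses, OR'ed and wrapped in parens as Erik shows."""
    clauses = " OR ".join(
        "{!prefix f=%s v='%s'}" % (field, p) for p in prefixes
    )
    return "(" + clauses + ")"

def matching_docs(docs, prefixes):
    """Doc ids with at least one field value starting with any prefix."""
    return sorted(
        doc_id
        for doc_id, values in docs.items()
        if any(v.startswith(p) for v in values for p in prefixes)
    )

# Kris's sample metatag.date values from the quoted message
docs = {
    1: ["2016-06-15T14:51:04Z", "2016-06-15T14:51:04Z"],
    2: ["2016-06-15"],
    3: ["NONE-NN-NN"],
    4: ["yyyy-mm-dd"],
    5: ["2016-07-06"],
    6: ["2014-04-15T14:51:06Z", "2014-04-15T14:51:06Z"],
}

print(build_prefix_q("metatag.date", ["2016-06", "2014-04"]))
print(matching_docs(docs, ["2016-06", "2014-04"]))  # [1, 2, 6]
```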




> On Dec 8, 2016, at 11:34 AM, KRIS MUSSHORN  wrote:
> 
> yes I did attach rather than paste sorry. 
>   
> Ok, here's an actual, truncated example of the metatag.date field contents in 
> solr. 
> NONE-NN-NN is the default setting. 
>   
> doc 1 
> " metatag.date ": [ 
>   "2016-06-15T14:51:04Z" ,
>   "2016-06-15T14:51:04Z" 
> ] 
>   
> doc 2 
> " metatag.date ": [ 
>   "2016-06-15" 
> ] 
> doc 3 
> " metatag.date ": [ 
>   "NONE-NN-NN" 
> ] 
> doc 4 
> " metatag.date ": [ 
>   "yyyy-mm-dd" 
> ] 
>   
> doc 5 
> " metatag.date ": [ 
>   "2016-07-06" 
> ] 
> 
> doc 6 
> " metatag.date ": [ 
>   "2014-04-15T14:51:06Z" , 
>   "2014-04-15T14:51:06Z" 
> ] 
>   
> q=2016-06 should return doc 2 and 1 
> q=2016-06 OR 2014-04 should return docs 1, 2 and 6 
>   
> yes I know it's wonky but it's what I have to deal with until the content is 
> cleaned up. 
> I can't use the date type... that would make my life too easy. 
>   
> TIA again 
> Kris 
> 
> - Original Message -
> 
> From: "Erik Hatcher"  
> To: solr-user@lucene.apache.org 
> Sent: Thursday, December 8, 2016 12:36:26 PM 
> Subject: Re: prefix query help 
> 
> Kris - 
> 
> To chain multiple prefix queries together: 
> 
> q=({!prefix f=field1 v=‘prefix1'} {!prefix f=field2 v=‘prefix2’}) 
> 
> The leading paren is needed to ensure it’s being parsed with the lucene 
> qparser (be sure not to have defType set, or a variant would be needed) and 
> that allows multiple {!…} expressions to be parsed.  The outside-the-curlys 
> value for the prefix shouldn’t be attempted with multiples, so the `v` is the 
> way to go, either inline or $referenced. 
> 
> If you do have defType set, say to edismax, then do something like this 
> instead: 
> q={!lucene v=$prefixed_queries} 
> prefixed_queries={!prefix f=field1 v=‘prefix1'} {!prefix f=field2 
> v=‘prefix2’} 
>// I don’t think parens are needed with prefixed_queries, but maybe.  
>  
> 
> debug=query (or debug=true) is your friend - see how things are parsed.  I 
> presume in your example that didn’t work that the dash didn’t work as you 
> expected?   or… not sure.  What’s the parsed_query output in debug on that 
> one? 
> 
> Erik 
> 
> p.s. did you really just send a Word doc to the list that could have been 
> inlined in text?  :)   
> 
> 
> 
>> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN  wrote: 
>> 
>> I'm indexing data from Nutch into Solr 5.4.1. 
>> I've got a date metatag that I have to store as a text type because the data 
>> stinks. 
>> It's stored in Solr as the field metatag.date. 
>> At the source the dates are formatted (when they are entered correctly) as 
>> YYYY-MM-DD 
>>   
>> q=metatag.date:2016-01* does not produce the correct results and returns 
>> undesirable matches (2016-05-01, for example). 
>> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
>> month/year. 
>>   
>> My question is how do I chain n prefix queries together? 
>> i.e. 
>> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>>   
>> TIA, 
>> Kris 
>>   
> 
> 



Re: prefix query help

2016-12-08 Thread KRIS MUSSHORN
yes I did attach rather than paste sorry. 
  
Ok, here's an actual, truncated example of the metatag.date field contents in 
solr. 
NONE-NN-NN is the default setting. 
  
doc 1 
" metatag.date ": [ 
  "2016-06-15T14:51:04Z" , 
  "2016-06-15T14:51:04Z" 
    ] 
  
doc 2 
" metatag.date ": [ 
  "2016-06-15" 
    ] 
doc 3 
" metatag.date ": [ 
  "NONE-NN-NN" 
    ] 
doc 4 
" metatag.date ": [ 
  "yyyy-mm-dd" 
    ] 
  
doc 5 
" metatag.date ": [ 
  "2016-07-06" 
    ] 

doc 6 
" metatag.date ": [ 
  "2014-04-15T14:51:06Z" , 
  "2014-04-15T14:51:06Z" 
    ] 
  
q=2016-06 should return doc 2 and 1 
q=2016-06 OR 2014-04 should return docs 1, 2 and 6 
  
yes I know it's wonky but it's what I have to deal with until the content is 
cleaned up. 
I can't use the date type... that would make my life too easy. 
  
TIA again 
Kris 

- Original Message -

From: "Erik Hatcher"  
To: solr-user@lucene.apache.org 
Sent: Thursday, December 8, 2016 12:36:26 PM 
Subject: Re: prefix query help 

Kris - 

To chain multiple prefix queries together: 

    q=({!prefix f=field1 v=‘prefix1'} {!prefix f=field2 v=‘prefix2’}) 

The leading paren is needed to ensure it’s being parsed with the lucene qparser 
(be sure not to have defType set, or a variant would be needed) and that allows 
multiple {!…} expressions to be parsed.  The outside-the-curlys value for the 
prefix shouldn’t be attempted with multiples, so the `v` is the way to go, 
either inline or $referenced. 

If you do have defType set, say to edismax, then do something like this 
instead: 
    q={!lucene v=$prefixed_queries} 
    prefixed_queries={!prefix f=field1 v=‘prefix1'} {!prefix f=field2 
v=‘prefix2’} 
       // I don’t think parens are needed with prefixed_queries, but maybe.   

debug=query (or debug=true) is your friend - see how things are parsed.  I 
presume in your example that didn’t work that the dash didn’t work as you 
expected?   or… not sure.  What’s the parsed_query output in debug on that one? 

Erik 

p.s. did you really just send a Word doc to the list that could have been 
inlined in text?  :)   



> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN  wrote: 
> 
> I'm indexing data from Nutch into Solr 5.4.1. 
> I've got a date metatag that I have to store as a text type because the data 
> stinks. 
> It's stored in Solr as the field metatag.date. 
> At the source the dates are formatted (when they are entered correctly) as 
> YYYY-MM-DD 
>   
> q=metatag.date:2016-01* does not produce the correct results and returns 
> undesirable matches (2016-05-01, for example). 
> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
> month/year. 
>   
> My question is how do I chain n prefix queries together? 
> i.e. 
> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>   
> TIA, 
> Kris 
>   




Re: prefix query help

2016-12-08 Thread Shawn Heisey
On 12/8/2016 10:02 AM, KRIS MUSSHORN wrote:
>
> Here is how I have the field defined... see attachment.

You're using a tokenized field type.

For the kinds of queries you asked about here, you want to use StrField,
not TextField -- StrField cannot have an analysis chain, and it indexes the
input as a single token that is completely unchanged.  Note that if you
also want to do other kinds of queries (like 2016), then StrField would
break those.  You will have to reindex after making this change.

Since your input data is consistently YYYY-MM-DD, you should consider
using the solr.DateRangeField class instead of a string or text type. 
This allows queries like "2016" or "2016-07" to work as you would expect
them to.

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates#WorkingwithDates-DateRangeFormatting
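A sketch of what that schema change could look like (field names mirror the thread; untested, and note that garbage values such as NONE-NN-NN would fail to index as dates and must be cleaned or skipped first):

```xml
<!-- schema.xml: a range-capable date type -->
<fieldType name="dateRange" class="solr.DateRangeField"/>

<!-- multiValued because some docs in the thread carry the date twice -->
<field name="metatag.date" type="dateRange"
       indexed="true" stored="true" multiValued="true"/>
```

With this in place, q=metatag.date:2016-06 matches the whole month and q=metatag.date:[2016-06 TO 2016-07] covers a range, with no prefix tricks needed.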

Thanks,
Shawn



Re: prefix query help

2016-12-08 Thread Erik Hatcher
Kris -

To chain multiple prefix queries together:

q=({!prefix f=field1 v=‘prefix1'} {!prefix f=field2 v=‘prefix2’})

The leading paren is needed to ensure it’s being parsed with the lucene qparser 
(be sure not to have defType set, or a variant would be needed) and that allows 
multiple {!…} expressions to be parsed.  The outside-the-curlys value for the 
prefix shouldn’t be attempted with multiples, so the `v` is the way to go, 
either inline or $referenced.

If you do have defType set, say to edismax, then do something like this instead:
q={!lucene v=$prefixed_queries}
prefixed_queries={!prefix f=field1 v=‘prefix1'} {!prefix f=field2 
v=‘prefix2’} 
   // I don’t think parens are needed with prefixed_queries, but maybe.  

debug=query (or debug=true) is your friend - see how things are parsed.  I 
presume in your example that didn’t work that the dash didn’t work as you 
expected?   or… not sure.  What’s the parsed_query output in debug on that one?

Erik

p.s. did you really just send a Word doc to the list that could have been 
inlined in text?  :)  



> On Dec 8, 2016, at 7:18 AM, KRIS MUSSHORN  wrote:
> 
> I'm indexing data from Nutch into Solr 5.4.1. 
> I've got a date metatag that I have to store as a text type because the data 
> stinks. 
> It's stored in Solr as the field metatag.date. 
> At the source the dates are formatted (when they are entered correctly) as 
> YYYY-MM-DD 
>   
> q=metatag.date:2016-01* does not produce the correct results and returns 
> undesirable matches (2016-05-01, for example). 
> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
> month/year. 
>   
> My question is how do I chain n prefix queries together? 
> i.e. 
> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
>   
> TIA, 
> Kris 
>   



Re: prefix query help

2016-12-08 Thread KRIS MUSSHORN

Here is how I have the field defined... see attachment. 
  
  
- Original Message -

From: "Erick Erickson"  
To: "solr-user"  
Sent: Thursday, December 8, 2016 10:44:08 AM 
Subject: Re: prefix query help 

You'd probably be better off indexing it as a "string" type given your 
expectations. Depending on the analysis chain (do take a look at 
admin/analysis for the field in question) the tokenization can be tricky 
to get right. 

Best, 
Erick 

On Thu, Dec 8, 2016 at 7:18 AM, KRIS MUSSHORN  wrote: 
> I'm indexing data from Nutch into Solr 5.4.1. 
> I've got a date metatag that I have to store as a text type because the data 
> stinks. 
> It's stored in Solr as the field metatag.date. 
> At the source the dates are formatted (when they are entered correctly) as 
> YYYY-MM-DD 
> 
> q=metatag.date:2016-01* does not produce the correct results and returns 
> undesirable matches (2016-05-01, for example). 
> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
> month/year. 
> 
> My question is how do I chain n prefix queries together? 
> i.e. 
> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
> 
> TIA, 
> Kris 
> 



field name.docx
Description: MS-Word 2007 document


Re: IndexWriter exception

2016-12-08 Thread Susheel Kumar
I believe you may want to look into your commit frequency, as pointed out by
Erick, to resolve this issue. If you are committing too often, Solr may keep
opening multiple searchers and run into race conditions.

Thanks,
Susheel
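The usual remedy for commit-storm symptoms like this is to stop committing from the client and let autocommit handle it; a solrconfig.xml sketch (the intervals here are illustrative, not taken from the thread):

```xml
<!-- solrconfig.xml: the client sends documents with commit=false;
     Solr commits on its own schedule instead -->
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s -->
  <openSearcher>false</openSearcher> <!-- flush without opening a searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>           <!-- new searcher at most every 15s -->
</autoSoftCommit>
```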

On Thu, Dec 8, 2016 at 10:49 AM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> I checked my source control history and "6" was the original value that
> was checked-in.  I'll investigate lowering this value in our next iteration.
>
> Thanks for the hint.
>
> Alexandre Drouin
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: December 6, 2016 5:42 PM
> To: solr-user 
> Subject: Re: IndexWriter exception
> Importance: High
>
> bq: maxWarmingSearchers is set to 6
>
> Red flag ref. If this was done to avoid the warning in the logs about too
> many warming searchers, it's a clear indication that you're committing far
> too often. Let's see exactly what you're using to post when you say you're
> "using the REST API". My bet: each one does a commit. If this is the
> post.jar tool there's an option to _not_ commit and I'd be sure to set that
> and let your autocommit settings handle committing.
>
> My guess is that you are adding more and more documents to Solr, thus
> making it more likely that you are opening a bunch of searchers at once
> (see above) and running into a race condition. So the first thing I'd do is
> straighten that out and see if the problem goes away.
>
> Best,
> Erick
>
> On Tue, Dec 6, 2016 at 1:34 PM, Alexandre Drouin <
> alexandre.dro...@orckestra.com> wrote:
> > Hello,
> >
> > I have an error that has been popping up randomly since 3 weeks ago and
> the randomness of the issue makes it hard to troubleshoot.
> >
> > I have a service that uses the REST API to index documents (1000 docs at
> > a time), and in this process I often call the core status API
> > (/solr/admin/cores?action=STATUS) to get the statuses of the different
> > cores.  This process had been working flawlessly since 2014; however, it has
> > been failing recently with the exception "this IndexWriter is closed".
> >
> > I did a few searches on Google for this exception but did not find
> > anything relevant.  Does anyone have an idea how to troubleshoot/fix this
> > issue?
> >
> > This is my configuration:
> > - Solr 4.10.2 on Windows.  I am not using SolrCloud.
> > - Java 1.7.0_79 24.79-b02
> > - useColdSearcher is set to true
> > - maxWarmingSearchers is set to 6
> > - I changed my Solr configuration about 2-3 months ago: I disabled HTTPS
> and enabled the logging (INFO level) but I do not think this could cause
> the issue.
> >
> > Relevant stack trace:
> >
> > INFO  - 2016-12-06 18:43:23.854; org.apache.solr.update.CommitTracker;
> > Hard AutoCommit: if uncommited for 9ms; if 75000 uncommited docs
> > INFO  - 2016-12-06 18:43:23.856; org.apache.solr.update.CommitTracker;
> > Soft AutoCommit: if uncommited for 15000ms; INFO  - 2016-12-06
> > 18:43:23.929; org.apache.solr.update.processor.LogUpdateProcessor;
> > [coreENCA] webapp=/solr path=/update params={commit=false}
> > {add=[Global_44235 (1552993270510911488), Global_44236Pony
> > (1552993270516154368), Global_44236Magnum (1552993270518251520),
> > Global_44237Pony (1552993270519300096), Global_44237Split
> > (1552993270521397249), Global_44237Standard (1552993270523494401),
> > Global_44238Pony (1552993270525591553), Global_44238Standard
> > (1552993270527688704), Global_44238Magnum (1552993270529785856),
> > Global_44239Standard (1552993270531883008), ... (2102 adds)]} 0 8292
> > INFO  - 2016-12-06 18:43:23.933; org.apache.solr.core.SolrCore;
> > [coreENCA]  CLOSING SolrCore org.apache.solr.core.SolrCore@5730eaaf
> > INFO  - 2016-12-06 18:43:23.935;
> > org.apache.solr.update.DirectUpdateHandler2; closing
> > DirectUpdateHandler2{commits=0,autocommit maxDocs=75000,autocommit
> > maxTime=9ms,autocommits=0,soft autocommit maxTime=15000ms,soft
> > autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=417
> > 6,adds=4176,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=41
> > 76,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_err
> > ors=0,transaction_logs_total_size=84547858,transaction_logs_total_numb
> > er=1} INFO  - 2016-12-06 18:43:23.936; org.apache.solr.core.SolrCore;
> > [coreENCA] Closing main searcher on request.
> > INFO  - 2016-12-06 18:43:24.044;
> > org.apache.solr.search.SolrIndexSearcher; Opening 
> > Searcher@73a40fbb[coreENCA]
> main ERROR - 2016-12-06 18:43:24.045; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Error handling 'status' action
> > at org.apache.solr.handler.admin.CoreAdminHandler.
> handleStatusAction(CoreAdminHandler.java:710)
> > at org.apache.solr.handler.admin.CoreAdminHandler.
> handleRequestInternal(CoreAdminHandler.java:214)
> > at org.apache.solr.handler.admin.CoreAdminHandler.
> handleRequestBody(CoreAdminHandler.java:188)
> > at 

RE: Solr seems to reserve facet.limit results

2016-12-08 Thread Markus Jelsma
Thanks Chris, Toke,

I tried the overrequest ratio/count and set them to 1.0/0. Oddly enough, with 
these settings both a high facet.limit and an extremely high facet.limit are 
up to twice as slow as with the 1.5/10 settings.

Even successive calls don't seem to 'warm anything up'. 

Does anyone have an explanation for this? It is counterintuitive, to me at 
least.

Thanks,
Markus
 
-Original message-
> From:Chris Hostetter 
> Sent: Tuesday 6th December 2016 1:47
> To: solr-user@lucene.apache.org
> Subject: RE: Solr seems to reserve facet.limit results
> 
> 
> 
> I think what you're seeing might be a result of the overrequesting done
> in phase #1 of a distributed facet query.
> 
> The purpose of overrequesting is to mitigate the possibility of a 
> constraint which should be in the topN for the collection as a whole, but 
> just outside the topN on every shard -- so they never make it to the 
> second phase of the distributed calculation.
> 
> The amount of overrequest is, by default, a multiplicative function of the 
> user-specified facet.limit with a fudge factor (IIRC: 10+(1.5*facet.limit))
> 
> If you're using an explicitly high facet.limit, you can try setting the 
> overrequest ratio/count to 1.0/0 respectively to force Solr to only 
> request the # of constraints you've specified from each shard, and then 
> aggregate them...
> 
> https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
> https://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT
> 
> 
> 
> One side note related to the work around you suggested...
> 
> : One simple solution, in my case would be, now just thinking of it, run 
> : the query with no facets and no rows, get the numFound, and set that as 
> : facet.limit for the actual query.
> 
> ...that assumes that the number of facet constraints returned is limited 
> by the total number of documents matching the query -- in general there is 
> no such guarantee because of multivalued fields (or faceting on tokenized 
> fields), so this type of approach isn't a good idea as a generalized 
> solution
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/
> 


Re: Solr node not found in ZK live_nodes

2016-12-08 Thread Susheel Kumar
This sometimes happens: one of the nodes goes down but then stays
registered as Leader/Active.  Does the Cloud View show anything about this
node (Recovering/Down/Recovery Failed etc.), and are you able to perform a
query against just this shard/node directly?

Susheel
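One way to check the second point — querying just that shard/node directly — is to hit the core's select handler with distrib=false, so the request is answered from the local index only instead of being fanned out. A minimal sketch (the host and core names are hypothetical; substitute your own):

```python
from urllib.parse import urlencode

# Hypothetical host and core name from the thread's "s16" node.
# distrib=false keeps the request on this one core.
base = "http://s16.example.com:8983/solr/collection1_shard1_replica1/select"
params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
url = base + "?" + params
print(url)
```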

On Wed, Dec 7, 2016 at 10:13 PM, Mark Miller  wrote:

> That already happens. The ZK client itself will reconnect when it can and
> trigger everything to be set up as when the cluster first starts up,
> including a live node and leader election, etc.
>
> You may have hit a bug or something else missing from this conversation,
> but reconnecting after losing the ZK connection is a basic feature from day
> one.
>
> Mark
> On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada 
> wrote:
>
> > Thanks Erick! Should I create a JIRA issue for the same?
> >
> > Regarding the logs, I have changed the log level to WARN. That may be the
> > reason, I couldn't get anything from it.
> >
> > Thanks,
> > Manohar
> >
> > On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson 
> > wrote:
> >
> > > Most likely reason is that the Solr node in question,
> > > was not reachable thus it was removed from
> > > live_nodes. Perhaps due to temporary network
> > > glitch, long GC pause or the like. If you're rolling
> > > your logs over it's quite possible that any illuminating
> > > messages were lost. The default 4M size for each
> > > log is quite low at INFO level...
> > >
> > > It does seem possible for a Solr node to periodically
> > > check its status and re-insert itself into live_nodes,
> > > go through recovery and all that. So far most of that
> > > registration logic is baked into startup code. What
> > > do others think? Worth a JIRA?
> > >
> > > Erick
> > >
> > > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada 
> > > wrote:
> > > > We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper
> (3.4.6).
> > > >
> > > > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> > > setup
> > > > was done 3 months back. Suddenly, few days back our search started
> > > failing
> > > > because one of the solr node(consider s16) was not seen in Zookeeper,
> > > i.e.,
> > > > when we checked for *"ls /live_nodes"*, *s16 *solr node was not
> found.
> > > > However, the corresponding Solr process was up and running.
> > > >
> > > > To my surprise, I couldn't find any errors or warnings in solr or
> > > zookeeper
> > > > logs related to this. I have few questions -
> > > >
> > > > 1. Is there any reason why this registration to ZK was lost? I know
> > logs
> > > > should provide some information, but, it didn't. Did anyone
> encountered
> > > > similar issue, if so, what can be the root cause?
> > > > 2. Shouldn't Solr be clever enough to detect that the registration to
> > ZK
> > > > was lost (for some reason) and should try to re-register again?
> > > >
> > > > PS: The issue is resolved by restarting the Solr node. However, I am
> > > > curious to know why it happened in the first place.
> > > >
> > > > Thanks
> > >
> >
> --
> - Mark
> about.me/markrmiller
>


RE: IndexWriter exception

2016-12-08 Thread Alexandre Drouin
I checked my source control history and "6" was the original value that was 
checked in.  I'll investigate lowering this value in our next iteration.

Thanks for the hint.  

Alexandre Drouin


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: December 6, 2016 5:42 PM
To: solr-user 
Subject: Re: IndexWriter exception
Importance: High

bq: maxWarmingSearchers is set to 6

Red flag here. If this was done to avoid the warning in the logs about too many 
warming searchers, it's a clear indication that you're committing far too 
often. Let's see exactly what you're using to post when you say you're "using 
the REST API". My bet: each call does a commit. If this is the post.jar tool 
there's an option to _not_ commit and I'd be sure to set that and let your 
autocommit settings handle committing.
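For reference, "letting your autocommit settings handle committing" normally means a solrconfig.xml block along these lines (the values here are illustrative, not recommendations):

```xml
<!-- Illustrative values only. The hard commit flushes to disk without
     opening a new searcher; the soft commit controls how quickly newly
     added docs become visible to searches. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>
```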

My guess is that you are adding more and more documents to Solr, thus making it 
more likely that you are opening a bunch of searchers at once (see above) and 
running into a race condition. So the first thing I'd do is straighten that out 
and see if the problem goes away.

Best,
Erick

On Tue, Dec 6, 2016 at 1:34 PM, Alexandre Drouin 
 wrote:
> Hello,
>
> I have an error that has been popping up randomly for the past three weeks, 
> and the randomness of the issue makes it hard to troubleshoot.
>
> I have a service that uses the REST API to index documents (1000 docs at a 
> time) and in this process I often call the core status API 
> (/solr/admin/cores?action=STATUS) to get the statuses of the different cores. 
>  This process has been working flawlessly since 2014; however, it has been 
> failing recently with the exception: "this IndexWriter is closed".
>
> I did a few searches on Google for this exception but I did not see anything 
> relevant.  Does anyone have an idea how to troubleshoot/fix this issue?
>
> This is my configuration:
> - Solr 4.10.2 on Windows.  I am not using SolrCloud.
> - Java 1.7.0_79 24.79-b02
> - useColdSearcher is set to true
> - maxWarmingSearchers is set to 6
> - I changed my Solr configuration about 2-3 months ago: I disabled HTTPS and 
> enabled the logging (INFO level) but I do not think this could cause the 
> issue.
>
> Relevant stack trace:
>
> INFO  - 2016-12-06 18:43:23.854; org.apache.solr.update.CommitTracker; 
> Hard AutoCommit: if uncommited for 9ms; if 75000 uncommited docs 
> INFO  - 2016-12-06 18:43:23.856; org.apache.solr.update.CommitTracker; 
> Soft AutoCommit: if uncommited for 15000ms; INFO  - 2016-12-06 
> 18:43:23.929; org.apache.solr.update.processor.LogUpdateProcessor; 
> [coreENCA] webapp=/solr path=/update params={commit=false} 
> {add=[Global_44235 (1552993270510911488), Global_44236Pony 
> (1552993270516154368), Global_44236Magnum (1552993270518251520), 
> Global_44237Pony (1552993270519300096), Global_44237Split 
> (1552993270521397249), Global_44237Standard (1552993270523494401), 
> Global_44238Pony (1552993270525591553), Global_44238Standard 
> (1552993270527688704), Global_44238Magnum (1552993270529785856), 
> Global_44239Standard (1552993270531883008), ... (2102 adds)]} 0 8292 
> INFO  - 2016-12-06 18:43:23.933; org.apache.solr.core.SolrCore; 
> [coreENCA]  CLOSING SolrCore org.apache.solr.core.SolrCore@5730eaaf
> INFO  - 2016-12-06 18:43:23.935; 
> org.apache.solr.update.DirectUpdateHandler2; closing 
> DirectUpdateHandler2{commits=0,autocommit maxDocs=75000,autocommit 
> maxTime=9ms,autocommits=0,soft autocommit maxTime=15000ms,soft 
> autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=417
> 6,adds=4176,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=41
> 76,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_err
> ors=0,transaction_logs_total_size=84547858,transaction_logs_total_numb
> er=1} INFO  - 2016-12-06 18:43:23.936; org.apache.solr.core.SolrCore; 
> [coreENCA] Closing main searcher on request.
> INFO  - 2016-12-06 18:43:24.044; 
> org.apache.solr.search.SolrIndexSearcher; Opening Searcher@73a40fbb[coreENCA] 
> main ERROR - 2016-12-06 18:43:24.045; org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: Error handling 'status' action
> at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:710)
> at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:214)
> at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at 
> 

Re: prefix query help

2016-12-08 Thread Erick Erickson
You'd probably be better off indexing it as a "string" type given your
expectations. Depending on the analysis chain (do take a look at
admin/analysis for the field in question) the tokenization can be tricky
to get right.

Best,
Erick
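For the original question of chaining the prefixes: once the field is indexed as a string type, the three prefixes can simply be OR'ed as wildcard clauses in a single query. A sketch of building that q parameter (field name taken from the thread):

```python
# On a "string" field the whole date is one token, so each wildcard
# clause matches from the start of the value, i.e. acts as a prefix.
prefixes = ["2016-01", "2016-07", "2016-10"]
q = "metatag.date:(" + " OR ".join(p + "*" for p in prefixes) + ")"
print(q)  # metatag.date:(2016-01* OR 2016-07* OR 2016-10*)
```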

On Thu, Dec 8, 2016 at 7:18 AM, KRIS MUSSHORN  wrote:
> Im indexing data from Nutch into SOLR 5.4.1.
> I've got a date metatag that I have to store as text type because the data 
> stinks.
> It's stored in SOLR as field metatag.date.
> At the source the dates are formatted (when they are entered correctly) as 
> YYYY-MM-DD
>
> q=metatag.date:2016-01* does not produce the correct results and returns 
> undesirable matches, 2016-05-01 as an example.
> q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
> month/year.
>
> My question is how do I chain n prefix queries together?
> i.e.
> I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10
>
> TIA,
> Kris
>


prefix query help

2016-12-08 Thread KRIS MUSSHORN
Im indexing data from Nutch into SOLR 5.4.1. 
I've got a date metatag that I have to store as text type because the data 
stinks. 
It's stored in SOLR as field metatag.date. 
At the source the dates are formatted (when they are entered correctly) as 
YYYY-MM-DD 
  
q=metatag.date:2016-01* does not produce the correct results and returns 
undesirable matches, 2016-05-01 as an example. 
q={!prefix f=metatag.date}2016-01 gives me exactly what I want for one 
month/year. 
  
My question is how do I chain n prefix queries together? 
i.e. 
I want all docs where metatag.date prefix is 2016-01 or 2016-07 or 2016-10 
  
TIA, 
Kris 
  


Re: Very long young generation stop the world GC pause

2016-12-08 Thread Shawn Heisey
On 12/8/2016 1:06 AM, forest_soup wrote:
> As you can see in the gc log, the long GC pause is not a full GC. It's a
> young generation GC instead.  
> In our case, full gc is fast and young gc got some long stw pause.
> Do you have any comments on that, as we usually believe full gc may cause
> longer pause, but young generation should be ok?

While full GC is *typically* where long pauses happen, it can happen
with *any* collection.

The startup script in Solr 5.0 and later comes with GC tuning, so you
probably should not be messing with it at all.  If you feel that you
must change the GC tuning to G1GC, perhaps you should try these settings
from my personal wiki page:

https://wiki.apache.org/solr/ShawnHeisey#Current_experiments

Also, I would strongly recommend that you drop your max heap to 31GB
instead of 32GB to change the pointer size, and that you investigate
whether you need a heap that large *at all*.  The numbers I saw in your
log on the Jira issue did not indicate the need for that much heap
memory.  After the long collection, your heap usage was only 3450MB.  In
addition to reducing your heap size to a level that's more appropriate
to your Solr's needs, I would suggest that you try the GC tuning that
Solr has out of the box, which uses CMS.
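A sketch of what that heap change can look like in bin/solr.in.sh (variable names as found in recent start scripts — check your own copy; the value is illustrative):

```shell
# 31g stays below the ~32GB threshold at which the JVM switches to
# uncompressed (8-byte) object pointers.
SOLR_HEAP="31g"
# Some script versions use the explicit form instead:
# SOLR_JAVA_MEM="-Xms31g -Xmx31g"
```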

Thanks,
Shawn



Re: Very long young generation stop the world GC pause

2016-12-08 Thread Pushkar Raste
Disable all the G1GC tuning you are doing except for ParallelRefProcEnabled.

G1GC is an adaptive algorithm and will keep tuning itself to reach the default
pause goal of 200ms, which should be good for most applications.

Can you also tell us how much RAM you have on your machine, and whether you
have swap enabled and in use?

On Dec 8, 2016 8:53 AM, "forest_soup"  wrote:

> Besides, will those JVM options make it better?
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Very-long-young-generation-stop-the-world-GC-
> pause-tp4308911p4308937.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr cannot provide index service after a large GC pause but core state in ZK is still active

2016-12-08 Thread forest_soup
Hi Erick, Mark and Varun,

I'll use this mail thread tracking the issue in
https://issues.apache.org/jira/browse/SOLR-9829 .

@Erick, for your question: 
I'm sure the solr node is still in the live_nodes list. 
The logs are from the solr log, and the root cause I can see here is that the
IndexWriter is closed.

@Mark and Varun, are you sure this issue is a dup of
https://issues.apache.org/jira/browse/SOLR-7956 ?
If yes, I'll try to backport it to 5.3.2.
And also I see Daisy created a similar JIRA:
https://issues.apache.org/jira/browse/SOLR-9830 . Although her root cause is
"too many open files", could you make sure that one is also a dup of
SOLR-7956?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cannot-provide-index-service-after-a-large-GC-pause-but-core-state-in-ZK-is-still-active-tp4308942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Very long young generation stop the world GC pause

2016-12-08 Thread forest_soup
Besides, will those JVM options make it better? 
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4308937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Encryption to Solr stored fields – Using Custom Codec Lucene JIRA -6966

2016-12-08 Thread Mohit Sidana
Hello,


I am trying to experiment with my Solr indexes using the patch open on
Apache JIRA, "Codec for index-level encryption" (LUCENE-6966):
https://issues.apache.org/jira/browse/LUCENE-6966. I am currently trying to
test this custom codec with Solr to encrypt sensitive documents.


I have managed to apply this patch to Lucene-solr trunk (branch 6.3) and
used “ant compile” and “ant jar” to get the jar files.

According to the Solr wiki, a custom postings format can be plugged in per
field using SchemaCodecFactory.


Here are some sample fields defined in my managedschema.xml







In order to use the encrypted Lucene50 postings format I have overridden the
postings format in the field type definitions.
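(The schema XML in this message was stripped by the list archive; a hypothetical reconstruction of such an override — the names and values are my assumptions, not the poster's actual schema — would look like:)

```xml
<!-- Hypothetical reconstruction; the field/type names and the
     postingsFormat value are assumptions for illustration. -->
<fieldType name="text_encrypted" class="solr.TextField" postingsFormat="EncryptedLucene50">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="sensitive_body" type="text_encrypted" indexed="true" stored="true"/>
```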






This is all working as desired: after indexing I get encrypted versions of
the term dictionary and term indexes. But I would also like to encrypt the
stored fields.

I see this codec implementation encrypts stored fields using Lucene's
default CompressingStoredFieldsFormat, and as mentioned in the short
documentation recently added to this patch:

"If the stored fields must be encrypted, the user has to specify with the
method 'storedFieldFormat()' an instance of the
EncryptedLucene50StoredFieldsFormat. This class is an abstract class
itself, and the user can specify which field to encrypt by overriding the
method 'isFieldEncrypted(String field)'."



The DummyEncryptedLucene60Codec class provided with this patch already
overrides the default stored fields format with the encrypted version. However,
I am unable to make use of this function with Solr: after indexing with Solr,
stored fields are not encrypted in my index.



My question is: what might I be doing wrong here, or am I missing something
else needed for this function to be picked up and used with Solr?

I would appreciate any feedback on this.

Thanks.

Mohit


IllegalArgumentException: lucene file does not exist

2016-12-08 Thread Sara Elshobaky
Hi All,

I'm using Solr 6.3.0 to build a large index (around 700 GB).
Everything went well on a normal PC, but when I moved to an HPC (High 
Performance Computing) cluster, Solr generated the following exception:

-   java.lang.IllegalArgumentException: 
/data/solr-6.3.0/server/solr/watr/data/index/_mkq_Lucene54_0.dvd does not exist

After checking Solr, I found it committing normally, so I continued indexing. 
Then, again after a few hours, I got another exception:

-  java.lang.IllegalArgumentException: 
/data/solr-6.3.0/server/solr/watr/data/index/_n32_Lucene50_0.tim does not exist

After repeating the same scenario, I got this one as well:

-  java.lang.IllegalArgumentException: 
/data/solr-6.3.0/server/solr/watr/data/index/_q0q.nvm does not exist

Any idea why those exceptions are raised?
And how can I ensure that my current index is not corrupted or missing any data?

I really appreciate your advice
Sara


---
  Sara El-Shobaky, Ph.D.
   Project Manager
   ICT Sector
   Bibliotheca Alexandrina
   P.O. Box 138, Chatby
   Alexandria 21526, Egypt
   -
   Phone:+(203) 483 ,  Ext.:1413
   Fax: +(203) 482 0405
   E-Mail: 
sara.elshob...@bibalex.org
   Website: www.bibalex.org
   -



Re: Very long young generation stop the world GC pause

2016-12-08 Thread forest_soup
As you can see in the gc log, the long GC pause is not a full GC; it's a
young-generation GC instead.
In our case full GC is fast, but young GC got some long STW pauses.
Do you have any comments on that? We usually assume a full GC may cause
longer pauses, but a young-generation collection should be OK.

2016-11-22T20:43:16.463+: 2942054.509: Total time for which application
threads were stopped: 0.0029195 seconds, Stopping threads took: 0.804
seconds
{Heap before GC invocations=2246 (full 0):
 garbage-first heap   total 26673152K, used 4683965K [0x7f0c1000,
0x7f0c108065c0, 0x7f141000)
  region size 8192K, 162 young (1327104K), 17 survivors (139264K)
 Metaspace   used 56487K, capacity 57092K, committed 58368K, reserved
59392K
2016-11-22T20:43:16.555+: 2942054.602: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 88080384 bytes, new threshold 15 (max 15)
- age   1:   28176280 bytes,   28176280 total
- age   2:5632480 bytes,   33808760 total
- age   3:9719072 bytes,   43527832 total
- age   4:6219408 bytes,   49747240 total
- age   5:4465544 bytes,   54212784 total
- age   6:3417168 bytes,   57629952 total
- age   7:5343072 bytes,   62973024 total
- age   8:2784808 bytes,   65757832 total
- age   9:6538056 bytes,   72295888 total
- age  10:6368016 bytes,   78663904 total
- age  11: 695216 bytes,   79359120 total
, 97.2044320 secs]
   [Parallel Time: 19.8 ms, GC Workers: 18]
  [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max:
2942054612.7, Diff: 10.6]
  [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum:
43.5]
  [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0]
 [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192]
  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
  [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum:
165.9]
  [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1]
 [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum:
0.6]
  [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6,
Sum: 308.7]
  [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max:
2942054621.8, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 97184.3 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 8.5 ms]
  [Ref Enq: 0.2 ms]
  [Redirty Cards: 0.2 ms]
  [Humongous Register: 0.1 ms]
  [Humongous Reclaim: 0.1 ms]
  [Free CSet: 0.4 ms]
   [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap:
4574.2M(25.4G)->3450.8M(26.8G)]
Heap after GC invocations=2247 (full 0):
 garbage-first heap   total 28049408K, used 3533601K [0x7f0c1000,
0x7f0c10806b00, 0x7f141000)
  region size 8192K, 21 young (172032K), 21 survivors (172032K)
 Metaspace   used 56487K, capacity 57092K, committed 58368K, reserved
59392K
}
 [Times: user=0.00 sys=94.28, real=97.19 secs] 
2016-11-22T20:44:53.760+: 2942151.806: Total time for which application
threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373
seconds



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4308912.html
Sent from the Solr - User mailing list archive at Nabble.com.