Re: understanding phonetic matching

2016-03-22 Thread Alexandre Rafalovitch
I'd start by putting LowerCaseFF before the PhoneticFilter.
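
Something along these lines, assuming a Double Metaphone setup (the field-type name, tokenizer, and encoder here are illustrative placeholders, not taken from your schema):

```xml
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase first, so the phonetic encoder always sees normalized tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="true" keeps the original token next to its phonetic code -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
            inject="true" maxCodeLength="5"/>
  </analyzer>
</fieldType>
```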

But then, you say you were using the Analysis screen, and what happened? Do you
get matches when you put your sample text and the query text in the
two boxes in the UI? I am not sure what "look at my solr data" means
in this particular context.
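
On the "how does phonetic matching work" part: the filter reduces each token to a code that is identical for similar-sounding words, and matching then happens on the codes. A toy sketch of the classic Soundex encoding, just to show the idea (Solr's encoders such as Double Metaphone are more sophisticated):

```python
def soundex(name: str) -> str:
    """Encode a word as a Soundex code: first letter plus up to three
    digits for the consonant classes that follow."""
    codes = {}
    for digit, letters in [("1", "bfpv"), ("2", "cgjkqsxz"),
                           ("3", "dt"), ("4", "l"), ("5", "mn"), ("6", "r")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    result = name[0].upper()
    prev = codes.get(name[0], "")
    digits = []
    for ch in name[1:]:
        if ch in "hw":               # h/w do not separate consonant runs
            continue
        code = codes.get(ch, "")     # vowels map to "" and reset the run
        if code and code != prev:
            digits.append(code)
        prev = code
    return (result + "".join(digits) + "000")[:4]

# "john" and "jon" collapse to the same code, which is why they match
print(soundex("john"), soundex("jon"))  # J500 J500
```

With inject="true", both the code and the original token end up in the index, so exact matches still score ahead of purely phonetic ones.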

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 16:27, Jay Potharaju  wrote:
> Hi,
> I am trying to do name matching using the phonetic filter factory. As part
> of that I was analyzing the data using analysis screen in solr UI. If i
> search for john, any documents containing john or jon should be found.
>
> Following is my definition of the custom field that I use for indexing the
> data. When I look at my solr data I dont see any similar sounding names in
> my solr data, even though I have set inject="true". Is that not how it is
> supposed to work?
> Can someone explain how phonetic matching works?
>
> <fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true" maxCodeLength="5"/>
>   </analyzer>
> </fieldType>
>
> --
> Thanks
> Jay


understanding phonetic matching

2016-03-22 Thread Jay Potharaju
Hi,
I am trying to do name matching using the phonetic filter factory. As part
of that I was analyzing the data using the analysis screen in the Solr UI. If I
search for john, any documents containing john or jon should be found.

Following is my definition of the custom field that I use for indexing the
data. When I look at my Solr data I don't see any similar-sounding names,
even though I have set inject="true". Is that not how it is
supposed to work?
Can someone explain how phonetic matching works?

<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true" maxCodeLength="5"/>
  </analyzer>
</fieldType>

-- 
Thanks
Jay


Re: Seasonal searches in SOLR 5.x

2016-03-22 Thread David Smiley
Hi,

I suggest having a "season" field (or whatever you might want to call it)
using DateRangeField but simply use a nominal year value.  So basically all
durations would be within this nominal year.  For some docs that span
new-years, this might mean 2 durations and that's okay.  Also it's okay if
you have multiple values and it's okay if your calculations result in some
that overlap; you needn't make them distinct; it'll all get coalesced in
the index.
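
To make the nominal-year idea concrete, here is a small index-time sketch (plain Python, nothing Solr-specific; the choice of the year 2000 and the helper name are mine) that projects a real collection range onto one nominal year, splitting ranges that cross new-year:

```python
from datetime import date

NOMINAL_YEAR = 2000  # arbitrary fixed leap year, so Feb 29 stays representable

def to_nominal_ranges(start: date, end: date):
    """Map a real [start, end] range onto NOMINAL_YEAR.

    A range covering a full year maps to the whole nominal year; a range
    that spans new-year is split into two nominal ranges (that's fine,
    the field is multi-valued and overlaps get coalesced)."""
    if (end - start).days >= 365:
        return [(date(NOMINAL_YEAR, 1, 1), date(NOMINAL_YEAR, 12, 31))]
    s = date(NOMINAL_YEAR, start.month, start.day)
    e = date(NOMINAL_YEAR, end.month, end.day)
    if s <= e:
        return [(s, e)]
    # spans new-year: one duration up to Dec 31, one from Jan 1
    return [(s, date(NOMINAL_YEAR, 12, 31)), (date(NOMINAL_YEAR, 1, 1), e)]
```

Each tuple would then be indexed as one DateRangeField value, e.g. [2000-11-15 TO 2000-12-31], and a "February" query just searches the nominal February.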

If for some reason you wind up going the route of abusing point data for
durations, I recommend this link:
http://wiki.apache.org/solr/SpatialForTimeDurations
and it most definitely does not require polygons (and thus JTS); I'm not
sure what gave you that impression.  It's all rectangles & points.

~ David

On Mon, Mar 21, 2016 at 1:29 PM Ioannis Kirmitzoglou <
ioanniskirmitzog...@gmail.com> wrote:

> Hi all,
>
> I would like to implement seasonal date searches on date ranges. I’m using
> SOLR 5.4.1 and have indexed date ranges using a DateRangeField (let’s call
> this field date_ranges).
> Each document in SOLR corresponds to a biological sample and each sample
> was collected during a date range that can span from a single day to
> multiple years. For my application it makes sense to enable seasonal
> searches, ie find samples that were collected during a specific period of
> the year (e.g. summer, or February). In this type of search, the year that
> the sample was collected is not relevant, only the days of the year. I’ve
> been all over SOLR documentation and I haven’t been able to find anything
> that will enable me to do that. The closest I got was a post with instructions
> on how to use a spatial field to do date searches (
> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/).
> Using the logic in that post I was able to come up with a solution but it’s
> rather complex and needs polygon searches (which in turn means installing
> the JTS Topology suite).
> Before committing to that I would like to ask for your input and whether
> there’s an easier way to do these types of searches.
>
> Many thanks,
>
> Ioannis
>
> -
> Ioannis Kirmitzoglou, PhD
> Bioinformatician - Scientific Programmer
> Imperial College, London
> www.vectorbase.org
> www.vigilab.org
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Delete by query using JSON?

2016-03-22 Thread Jack Krupansky
See the correct syntax example here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands

Your query is fine.

-- Jack Krupansky

On Tue, Mar 22, 2016 at 3:07 PM, Paul Hoffman  wrote:

> I've been struggling to find the right syntax for deleting by query
> using JSON, where the query includes an fq parameter.
>
> I know how to delete *all* documents, but how would I delete only
> documents with field doctype = "cres"?  I have tried the following along
> with a number of variations, all to no avail:
>
> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
> {
> "delete": { "query": "doctype:cres" }
> }
> EOS
>
> I can identify the documents like this:
>
> curl -s '
> http://localhost:8983/solr/blacklight-core/select?q=*%3A*&fq=doctype%3Acres&wt=json&fl=id
> '
>
> It seems like such a simple thing, but I haven't found any examples that
> use an fq.  Could someone post an example?
>
> Thanks in advance,
>
> Paul.
>
> --
> Paul Hoffman 
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)
>


Re: Delete by query using JSON?

2016-03-22 Thread Walter Underwood
“Why do you care?” might not be the best way to say it, but it is essential to 
understand the difference between selection (filtering) and ranking.

As Solr params:

* q is ranking and filtering
* fq is filtering only
* bq is ranking only

When deleting documents, ordering does not matter, which is why we ask why you 
care about the ordering.

If the response is familiar to you, imagine how the questions sound to people 
who have been working in search for twenty years. But even when we are snippy, 
we still try to help.

Many, many times, the question is wrong. The most common difficulty on this 
list is an “XY problem”, where the poster has problem X and has assumed 
solution Y, which is not the right solution. But they ask about Y. So we will 
tell people that their approach is wrong, because that is the most helpful 
thing we can do.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 22, 2016, at 4:16 PM, Robert Brown  wrote:
> 
> "why do you care? just do this ..."
> 
> I see this a lot on mailing lists these days, it's usually a learning 
> curve/task/question.  I know I fall into these types of questions/tasks 
> regularly.
> 
> Which usually leads to "don't tell me my approach is wrong, just explain 
> what's going on, and why", or "just answer the straight-forward question I 
> asked in the first place.".
> 
> Sorry for rambling, this just sounded familiar...
> 
> :)
> 
> 
> 
> On 22/03/16 22:50, Alexandre Rafalovitch wrote:
>> Why do you care?
>> 
>> The difference between Q and FQ are the scoring. For delete, you
>> delete all of them regardless of scoring and there is no difference.
>> Just chuck them all into Q.
>> 
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>> 
>> 
>> On 23 March 2016 at 06:07, Paul Hoffman  wrote:
>>> I've been struggling to find the right syntax for deleting by query
>>> using JSON, where the query includes an fq parameter.
>>> 
>>> I know how to delete *all* documents, but how would I delete only
>>> documents with field doctype = "cres"?  I have tried the following along
>>> with a number of variations, all to no avail:
>>> 
>>> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
>>> {
>>> "delete": { "query": "doctype:cres" }
>>> }
>>> EOS
>>> 
>>> I can identify the documents like this:
>>> 
>>> curl -s 
>>> 'http://localhost:8983/solr/blacklight-core/select?q=*%3A*&fq=doctype%3Acres&wt=json&fl=id'
>>> 
>>> It seems like such a simple thing, but I haven't found any examples that
>>> use an fq.  Could someone post an example?
>>> 
>>> Thanks in advance,
>>> 
>>> Paul.
>>> 
>>> --
>>> Paul Hoffman 
>>> Systems Librarian
>>> Fenway Libraries Online
>>> c/o Wentworth Institute of Technology
>>> 550 Huntington Ave.
>>> Boston, MA 02115
>>> (617) 442-2384 (FLO main number)
> 



Re: Delete by query using JSON?

2016-03-22 Thread Robert Brown

"why do you care? just do this ..."

I see this a lot on mailing lists these days, it's usually a learning 
curve/task/question.  I know I fall into these types of questions/tasks 
regularly.


Which usually leads to "don't tell me my approach is wrong, just explain 
what's going on, and why", or "just answer the straight-forward question 
I asked in the first place.".


Sorry for rambling, this just sounded familiar...

:)



On 22/03/16 22:50, Alexandre Rafalovitch wrote:

Why do you care?

The difference between Q and FQ are the scoring. For delete, you
delete all of them regardless of scoring and there is no difference.
Just chuck them all into Q.

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 06:07, Paul Hoffman  wrote:

I've been struggling to find the right syntax for deleting by query
using JSON, where the query includes an fq parameter.

I know how to delete *all* documents, but how would I delete only
documents with field doctype = "cres"?  I have tried the following along
with a number of variations, all to no avail:

$ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
{
"delete": { "query": "doctype:cres" }
}
EOS

I can identify the documents like this:

curl -s 'http://localhost:8983/solr/blacklight-core/select?q=*%3A*&fq=doctype%3Acres&wt=json&fl=id'

It seems like such a simple thing, but I haven't found any examples that
use an fq.  Could someone post an example?

Thanks in advance,

Paul.

--
Paul Hoffman
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)




Re: Delete by query using JSON?

2016-03-22 Thread Alexandre Rafalovitch
Why do you care?

The difference between Q and FQ are the scoring. For delete, you
delete all of them regardless of scoring and there is no difference.
Just chuck them all into Q.
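
A client-side sketch of what that looks like (the helper name is mine, not a Solr or SolrJ API):

```python
import json

def delete_body(q, fqs=()):
    """Build a delete-by-query JSON body, folding any fq filters into the
    single query string the delete command accepts. Deletes don't score,
    so `q AND fq1 AND ...` selects exactly the same documents."""
    query = " AND ".join(f"({part})" for part in [q, *fqs])
    return json.dumps({"delete": {"query": query}})

print(delete_body("*:*", ["doctype:cres"]))
# {"delete": {"query": "(*:*) AND (doctype:cres)"}}
```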

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 06:07, Paul Hoffman  wrote:
> I've been struggling to find the right syntax for deleting by query
> using JSON, where the query includes an fq parameter.
>
> I know how to delete *all* documents, but how would I delete only
> documents with field doctype = "cres"?  I have tried the following along
> with a number of variations, all to no avail:
>
> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
> {
> "delete": { "query": "doctype:cres" }
> }
> EOS
>
> I can identify the documents like this:
>
> curl -s 
> 'http://localhost:8983/solr/blacklight-core/select?q=*%3A*&fq=doctype%3Acres&wt=json&fl=id'
>
> It seems like such a simple thing, but I haven't found any examples that
> use an fq.  Could someone post an example?
>
> Thanks in advance,
>
> Paul.
>
> --
> Paul Hoffman 
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)


Delete by query using JSON?

2016-03-22 Thread Paul Hoffman
I've been struggling to find the right syntax for deleting by query 
using JSON, where the query includes an fq parameter.

I know how to delete *all* documents, but how would I delete only 
documents with field doctype = "cres"?  I have tried the following along 
with a number of variations, all to no avail:

$ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
{
"delete": { "query": "doctype:cres" }
}
EOS

I can identify the documents like this:

curl -s 'http://localhost:8983/solr/blacklight-core/select?q=*%3A*&fq=doctype%3Acres&wt=json&fl=id'

It seems like such a simple thing, but I haven't found any examples that
use an fq.  Could someone post an example?

Thanks in advance,

Paul.

--
Paul Hoffman
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-22 Thread Aswath Srinivasan (TMS)
>> Since you've already reproduced it on a small scale, we'll need your entire 
>> Solr logfile.  The mailing list eats attachments, so you'll need to place it 
>> somewhere and provide a URL.  Sites like gist and dropbox are excellent for 
>> sharing large text content.

Sure, I will try and send it across. However, I don't see anything in them. I
have FINE-level logs.

>> Do you literally mean 10 records (a number I can count on my fingers)?
>> How much data is in each of those DB records?  Which configset did you use
>> when creating the index?

Yes, crazy, right? The select query I gave yields only 10 records; the total
record count in the table is 200,000. I restricted the query to reproduce the
issue at a small scale. The issue first appeared in my QA environment, where at
one point two batch jobs happened to issue accidental, closely spaced hard
commits. There is no autocommit set in solrconfig; only the batch jobs send a
commit. I was never able to recover the collection, so I had to delete the data
and reindex to fix it. Hence I decided to reproduce the issue at a very small
scale and try to fix it, because deleting the data and reindexing cannot be the
fix. The DB records are just normal varchars, some 7 columns; I don't think the
data is the problem.
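
(For reference, the kind of commit block currently absent from my solrconfig.xml would look something like this; the values are placeholders, not something we have settled on:)

```xml
<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit for durability... -->
  <openSearcher>false</openSearcher>  <!-- ...without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>30000</maxTime>            <!-- visibility comes from soft commits -->
</autoSoftCommit>
```

With something like that in place, the batch jobs would no longer need to send the explicit commits that overlap.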

I cloned 'solr-5.3.2\example\example-DIH\solr\db', added some additional
fields, and removed unused default fields.

>> You mentioned a 10GB heap and then said the machine has 8GB of RAM.  Is this 
>> correct?  If so, this would be a source of serious performance issues.

Oops, it's a 1 GB heap. That was a typo. The consumed heap is around 300-400 MB.

Thank you,
Aswath NS


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, March 22, 2016 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/22/2016 11:32 AM, Aswath Srinivasan (TMS) wrote:
> Thank you Shawn for taking time and responding.
>
> Unfortunately, this is not the case. My heap is not even going past 
> 50% and I have a heap of 10 GB on an instance that I just installed as 
> a standalone version and was only trying out these,
>
> •   Install a standalone solr 5.3.2 in my PC
> •   Indexed some 10 db records
> •   Hit core reload/call commit frequently in quick intervals
> •   Seeing the  o.a.s.c.SolrCore [db] PERFORMANCE WARNING: Overlapping 
> onDeckSearchers=2
> •   Collection crashes
> •   Only way to recover is to stop solr – delete the data folder – start 
> solr – reindex
>
> In any case, if this is a heap-related issue, a solr restart should help, is 
> what I think.

That shouldn't happen.

Since you've already reproduced it on a small scale, we'll need your entire 
Solr logfile.  The mailing list eats attachments, so you'll need to place it 
somewhere and provide a URL.  Sites like gist and dropbox are excellent for 
sharing large text content.

More questions:

Do you literally mean 10 records (a number I can count on my fingers)? 
How much data is in each of those DB records?  Which configset did you use when 
creating the index?

You mentioned a 10GB heap and then said the machine has 8GB of RAM.  Is this 
correct?  If so, this would be a source of serious performance issues.

Thanks,
Shawn



DIH can't index web addresses

2016-03-22 Thread kostali hassan
I am trying to index rich documents (MS Word and PDF), but when the content of
a document contains multiple links (web addresses) I get an ERROR in the log.
What do I have to add to my tika-config.xml to index web paths?


Solr 5.3: anything similar to ChildDocTransformerFactory that does not flatten the hierarchical structure?

2016-03-22 Thread Alisa Z .
 Hi all, 

Following the example from  https://dzone.com/articles/using-solr-49-new , 
let's say we are given a multi-level nested structure: 


<add>
  <doc>
    <field name="id">1</field>
    <field name="name">I am the parent</field>
    <field name="cat">PARENT</field>
    <doc>
      <field name="id">1.1</field>
      <field name="name">I am the 1st child</field>
      <field name="cat">CHILD</field>
    </doc>
    <doc>
      <field name="id">1.2</field>
      <field name="name">I am the 2nd child</field>
      <field name="cat">CHILD</field>
      <doc>
        <field name="id">1.2.1</field>
        <field name="name">I am a grandchildren</field>
        <field name="cat">GRANDCHILD</field>
      </doc>
    </doc>
  </doc>
</add>





Querying

q={!parent which="cat:PARENT"}name:(I am +child)&fl=id,name,[child parentFilter=cat:PARENT]

will return a flattened structure, where cat:CHILD and cat:GRANDCHILD documents
end up on the same level:

<doc>
  <field name="id">1</field>
  <field name="name">I am the parent</field>
  <field name="cat">PARENT</field>
  <doc>
    <field name="id">1.1</field>
    <field name="name">I am the 1st child</field>
    <field name="cat">CHILD</field>
  </doc>
  <doc>
    <field name="id">1.2</field>
    <field name="name">I am the 2nd child</field>
    <field name="cat">CHILD</field>
  </doc>
  <doc>
    <field name="id">1.2.1</field>
    <field name="name">I am a grandchildren</field>
    <field name="cat">GRANDCHILD</field>
  </doc>
</doc>
  
Indeed, the Javadocs for ChildDocTransformerFactory say: "This
transformer returns all descendants of each parent document in a flat list
nested inside the parent document".

Yet is there any way to preserve the hierarchy in the response? I really need
to keep the nested structure in what comes back.
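
If not, a client-side re-nesting sketch, assuming ids encode the path ("1", "1.1", "1.2.1") as in the example above (that convention comes from the sample data, not anything Solr guarantees):

```python
def nest(docs):
    """Rebuild the parent/child tree from the flat descendant list that
    [child] returns, using the dotted-id convention ("1.2.1" is a child
    of "1.2")."""
    by_id = {d["id"]: dict(d, _children=[]) for d in docs}
    roots = []
    for doc in by_id.values():
        parent_id = doc["id"].rsplit(".", 1)[0] if "." in doc["id"] else None
        if parent_id in by_id:
            by_id[parent_id]["_children"].append(doc)
        else:
            roots.append(doc)
    return roots

tree = nest([{"id": "1"}, {"id": "1.1"}, {"id": "1.2"}, {"id": "1.2.1"}])
print([c["id"] for c in tree[0]["_children"]])  # ['1.1', '1.2']
```

The query itself stays as posted; this only post-processes the returned docs.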

Thank you in advance! 

-- 
Alisa Zhila

Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-22 Thread Shawn Heisey
On 3/22/2016 11:32 AM, Aswath Srinivasan (TMS) wrote:
> Thank you Shawn for taking time and responding.
>
> Unfortunately, this is not the case. My heap is not even going past 50% and I 
> have a heap of 10 GB on an instance that I just installed as a standalone 
> version and was only trying out these,
>
> •   Install a standalone solr 5.3.2 in my PC
> •   Indexed some 10 db records
> •   Hit core reload/call commit frequently in quick intervals
> •   Seeing the  o.a.s.c.SolrCore [db] PERFORMANCE WARNING: Overlapping 
> onDeckSearchers=2
> •   Collection crashes
> •   Only way to recover is to stop solr – delete the data folder – start 
> solr – reindex
>
> In any case, if this is a heap-related issue, a solr restart should help, is 
> what I think.

That shouldn't happen.

Since you've already reproduced it on a small scale, we'll need your
entire Solr logfile.  The mailing list eats attachments, so you'll need
to place it somewhere and provide a URL.  Sites like gist and dropbox
are excellent for sharing large text content.

More questions:

Do you literally mean 10 records (a number I can count on my fingers)? 
How much data is in each of those DB records?  Which configset did you
use when creating the index?

You mentioned a 10GB heap and then said the machine has 8GB of RAM.  Is
this correct?  If so, this would be a source of serious performance issues.

Thanks,
Shawn



Cant access new docs without restarting Solr (java.nio.file.NoSuchFileException)

2016-03-22 Thread Victor D'agostino

Hi

I've set up a Solr Cloud 5.5.0 ensemble with ZooKeeper.

If I post a few docs with curl, it seems OK:
[root@LXLYOSOL30 ~]# curl --noproxy '*' 
http://lxlyosol30:8983/solr/db/update --data-binary 
@/data/conf-cpm3/test.txt -H 'Content-type:application/xml'



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">18</int></lst>
</response>




But when I go to the admin page on my first shard I get:
Luke is not configured
although solrconfig.xml (in ZooKeeper) contains the line
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />



If I restart Solr I can see in the stats that the new docs have been added!

Statistics :

Last Modified: 3 minutes ago
Num Docs:5
Max Doc:5
Heap Memory Usage:-1

Deleted Docs:0
Version:22
Segment Count:2

Instance :

CWD:/data/solr-5.5.0/server
Instance:/data/solr-5.5.0/server/solr/db_shard1_replica1
Data:/data/solr-5.5.0/server/solr/db_shard1_replica1/data
Index:/data/solr-5.5.0/server/solr/db_shard1_replica1/data/index
Impl:org.apache.solr.core.NRTCachingDirectoryFactory



In the logs I can see a java.nio.file.NoSuchFileException :

149746 INFO  (qtp609396627-17) [c:db s:shard1 r:core_node1 
x:db_shard1_replica1] o.a.s.c.S.Request [db_shard1_replica1] 
webapp=/solr path=/admin/file 
params={file=admin-extra.menu-bottom.html&_=1458667795848&contentType=text/html;charset%3Dutf-8} 
status=404 QTime=1
149780 ERROR (qtp609396627-20) [c:db s:shard1 r:core_node1 
x:db_shard1_replica1] o.a.s.h.RequestHandlerBase 
java.nio.file.NoSuchFileException: 
/data/solr-5.5.0/server/solr/db_shard1_replica1/data/index/segments_2
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:97)

at java.nio.file.Files.readAttributes(Files.java:1686)
at java.nio.file.Files.size(Files.java:2275)
at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:127)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:592)
at 
org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:137)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:499)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)

149780 INFO  (qtp609396627-20) [c:db s:shard1 r:core_node1 
x:db_shard1_replica1] o.a.s.c.S.Request [db_shard1_replica1] 
webapp=/solr path=/admin/luke 

RE: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2016-03-22 Thread Aswath Srinivasan (TMS)
>> If you're not actually hitting OutOfMemoryError, then my best guess about
>> what's happening is that you are running right at the edge of the
>> available Java heap memory, so your JVM is constantly running full garbage
>> collections to free up enough memory for normal operation.  In this
>> situation, Solr is actually still running, but is spending most of its time
>> paused for garbage collection.

Thank you Shawn for taking time and responding.

Unfortunately, this is not the case. My heap is not even going past 50% and I 
have a heap of 10 GB on an instance that I just installed as a standalone 
version and was only trying out these,

•   Install a standalone solr 5.3.2 in my PC
•   Indexed some 10 db records
•   Hit core reload/call commit frequently in quick intervals
•   Seeing the  o.a.s.c.SolrCore [db] PERFORMANCE WARNING: Overlapping 
onDeckSearchers=2
•   Collection crashes
•   Only way to recover is to stop solr – delete the data folder – start 
solr – reindex

In any case, if this is a heap-related issue, a solr restart should help, is 
what I think.

>>If I'm wrong about what's happening, then we'll need a lot more details about 
>>your server and your Solr setup.

Nothing really. Just a standalone solr 5.3.2 on a windows 7 machine - 64 bit, 8 
GB RAM. I bet anybody could reproduce the problem if they follow my above steps.

Thank you all for spending time on this. I shall post back my findings if they 
turn out to be useful.

Thank you,
Aswath NS
Mobile  +1 424 345 5340
Office+1 310 468 6729

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, March 21, 2016 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

On 3/21/2016 6:49 PM, Aswath Srinivasan (TMS) wrote:
>>> Thank you for the responses. Collection crashes as in, I'm unable to open 
>>> the core tab in Solr console. Search is not returning. None of the page 
>>> opens in solr admin dashboard.
>>>
>>> I do understand how and why this issue occurs and I'm going to do all it 
>>> takes to avoid this issue. However, on an event of an accidental frequent 
>>> hard commit close to each other which throws this WARN then - I'm just 
>>> trying to figure out a way to make my collection throw results without 
>>> having to delete and re-create the collection or delete the data folder.
>>>
>>> Again, I know how to avoid this issue but if it still happens then what can 
>>> be done to avoid a complete reindexing.

If you're not actually hitting OutOfMemoryError, then my best guess about 
what's happening is that you are running right at the edge of the available 
Java heap memory, so your JVM is constantly running full garbage collections to 
free up enough memory for normal operation.  In this situation, Solr is 
actually still running, but is spending most of its time paused for garbage 
collection.

https://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

The first part of the "GC pause problems" section on the above wiki page talks 
about very large heaps, but there is a paragraph just before "Tools and Garbage 
Collection" that talks about heaps that are a little bit too small.

If I'm right about this, you're going to need to increase your java heap size.  
Exactly how to do this will depend on what version of Solr you're running, how 
you installed it, and how you start it.

For 5.x versions using the included scripts, you can use the "-m" option on the 
"bin/solr" command when you start Solr manually, or you can edit the solr.in.sh 
file (usually found in /etc/default or /var/solr) if you used the service 
installer script on a UNIX/Linux platform.  The default heap size in 5.x 
scripts is 512MB, which is VERY small.

For earlier versions, there's too many install/start options available.
There were no installation scripts included with Solr itself, so I won't know 
anything about the setup.

If I'm wrong about what's happening, then we'll need a lot more details about 
your server and your Solr setup.

Thanks,
Shawn



Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Robert Brown

Thanks Erick and Shawn, a "collection" is indeed what I meant.

I was under the impression the entire Tree view in the admin GUI was 
showing everything in ZK, including things like 
"collections/name/state.json", not just the /configs directory.


The solr.xml file is in ZK too, isn't it? (I added it to ZK as per the docs.) 
It's just a bit confusing to see some files/directories from ZK, and some not.


Thanks for any more insight.



On 03/22/2016 04:57 PM, Shawn Heisey wrote:

On 3/22/2016 6:38 AM, Robert Brown wrote:
Is it safe to create a new cluster but use an existing config set 
that's in zookeeper?  Or does that config set contain the cluster 
status too?


I want to (re)-build a cluster from scratch, with a different amount 
of shards, but not using shard-splitting.


When you say "cluster" what exactly do you mean?

To me, "cluster" in a Solr context means "a bunch of Solr servers."  
If this is what you mean, there is nothing built in to copy things 
from an existing cluster.  You *can* run multiple SolrCloud clusters 
on one Zookeeper ensemble.


If you are actually talking about a *collection* when you say 
"cluster", then what Erick said is 100% correct.


Thanks,
Shawn





Re: java.lang.NullPointerException in json facet hll function

2016-03-22 Thread Yago Riveiro
Nope.

A normal query with wt=json; the q parameter is *:*.

The one peculiar thing about this index is that some docs have the dynamic
field visitor__visitor_id typed as long and others have it typed as string.
(Our indexer tool didn't resolve the type correctly, the result of a bug that
was fixed later.)

In fact, if I add q=visitor__visitor_id_l:* to the query, I get no error.

I think the problem is that I have the field "visitor__visitor_id" with _s and
_l mixed in the index. But that shouldn't be a problem, because they are two
independent fields... aren't they?

--

/Yago Riveiro

> On Mar 22 2016, at 5:00 pm, Yonik Seeley ysee...@gmail.com wrote:  

>

> Hmmm, looks like the "hll" value is missing for some reason. It's not  
clear why that would happen... are you running any custom code?

>

> -Yonik

>

> On Tue, Mar 22, 2016 at 12:54 PM, Yago Riveiro
yago.rive...@gmail.com wrote:  
 Solr version: 5.3.1  
  
 With this query:  
  
 {
   group: {
     type: terms,
     limit: -1,
     field: group,
     sort: { index: asc },
     numBuckets: true,
     facet: {
       col_1_unique_visitors: 'hll(visitor__visitor_id_l)'
     }
   }
 }
  
 visitor__visitor_id_l is a dynamic field.  
  
 Running the query described above I'm hitting this exception.  
  
 java.lang.NullPointerException at  
 org.apache.solr.search.facet.HLLAgg$Merger.merge(HLLAgg.java:86) at  

org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)  
 at  
 org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule
at org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule.java:510)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:488)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:462)
at org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
at org.apache.solr.search.facet.FacetQueryMerger.merge(FacetModule.java:337)
at org.apache.solr.search.facet.FacetModule.handleResponses(FacetModule.java:178)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:410)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


-
Best regards
--
View this message in context:
http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-in-json-facet-hll-function-tp4265378.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.lang.NullPointerException in json facet hll function

2016-03-22 Thread Yonik Seeley
Hmmm, looks like the "hll" value is missing for some reason.  It's not
clear why that would happen... are you running any custom code?

-Yonik

On Tue, Mar 22, 2016 at 12:54 PM, Yago Riveiro  wrote:
> Solr version: 5.3.1
>
> With this query:
>
> group:
> {
> type:terms,
> limit:-1,
> field:group,
> sort:{index:asc},
> numBuckets:true,
> facet:{
> col_1_unique_visitors:'hll(visitor__visitor_id_l)'
> }
> }
> }
>
> visitor__visitor_id_l is a dynamic field.
>
> Running the query described above I'm hitting this exception.
>
> java.lang.NullPointerException at
> org.apache.solr.search.facet.HLLAgg$Merger.merge(HLLAgg.java:86) at
> org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
> at
> org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule.java:510)
> at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:488)
> at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:462)
> at
> org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
> at org.apache.solr.search.facet.FacetQueryMerger.merge(FacetModule.java:337)
> at
> org.apache.solr.search.facet.FacetModule.handleResponses(FacetModule.java:178)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:410)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:499) at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-in-json-facet-hll-function-tp4265378.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Shawn Heisey

On 3/22/2016 6:38 AM, Robert Brown wrote:
Is it safe to create a new cluster but use an existing config set 
that's in zookeeper?  Or does that config set contain the cluster 
status too?


I want to (re)-build a cluster from scratch, with a different amount 
of shards, but not using shard-splitting.


When you say "cluster" what exactly do you mean?

To me, "cluster" in a Solr context means "a bunch of Solr servers."  If 
this is what you mean, there is nothing built in to copy things from an 
existing cluster.  You *can* run multiple SolrCloud clusters on one 
Zookeeper ensemble.


If you are actually talking about a *collection* when you say "cluster", 
then what Erick said is 100% correct.


Thanks,
Shawn
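The configset reuse described in this thread boils down to pointing a new collection at an existing configset via the Collections API CREATE call. A minimal sketch of building that URL follows; the host, collection name, configset name, and shard counts are assumptions, not values from the thread:

```java
public class CreateWithConfigSet {
    // Builds a Collections API CREATE request that reuses a configset
    // already uploaded to ZooKeeper (collection.configName), letting you
    // pick a new shard count without shard-splitting.
    static String createUrl(String solrBase, String collection,
                            int numShards, int replicationFactor, String configSet) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + collection
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + configSet;
    }

    public static void main(String[] args) {
        // Hypothetical names; substitute your own node, collection, and configset.
        System.out.println(createUrl("http://localhost:8983/solr",
                "newcollection", 4, 2, "sharedconf"));
    }
}
```

Issue the printed URL with curl (or any HTTP client) against a live SolrCloud node; the configset in ZooKeeper holds only configuration, not cluster state, so many collections can safely share it.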



java.lang.NullPointerException in json facet hll function

2016-03-22 Thread Yago Riveiro
Solr version: 5.3.1

With this query:

group:
{
type:terms,
limit:-1,
field:group,
sort:{index:asc},
numBuckets:true,
facet:{
col_1_unique_visitors:'hll(visitor__visitor_id_l)'
}
}
}

visitor__visitor_id_l is a dynamic field.

Running the query described above I'm hitting this exception.

java.lang.NullPointerException at
org.apache.solr.search.facet.HLLAgg$Merger.merge(HLLAgg.java:86) at
org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
at
org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule.java:510)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:488)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:462)
at
org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
at org.apache.solr.search.facet.FacetQueryMerger.merge(FacetModule.java:337)
at
org.apache.solr.search.facet.FacetModule.handleResponses(FacetModule.java:178)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:410)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) at
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499) at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745) 



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-in-json-facet-hll-function-tp4265378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Erick Erickson
The whole _point_ of configsets is to re-use them in multiple
collections, so please do!

Best,
Erick

On Tue, Mar 22, 2016 at 5:38 AM, Robert Brown  wrote:
> Hi,
>
> Is it safe to create a new cluster but use an existing config set that's in
> zookeeper?  Or does that config set contain the cluster status too?
>
> I want to (re)-build a cluster from scratch, with a different amount of
> shards, but not using shard-splitting.
>
> Thanks,
> Rob
>


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-22 Thread Erick Erickson
No real plans have been made that I know of for a 5.5.1 release. What
happens is that 5.5 was cut as, potentially, the last 5x release. Some
small fixes are still back-ported "just in case" there's a 5.5.1
release.

As for 6.0, that's something of a moving target currently, we're still
flushing out issues that need to be fixed first. IOW, there's no good
way to predict when it would be, the 4 weeks is just an estimate.
Besides, I always like to be a little cautious about using a X.0
release.

the "bottom line" here is that I'd go with 5.5 for the time being. If
a 5.5.1 comes out it should be a really simple upgrade to do if the
fixes are important to you.

Best,
Erick

On Tue, Mar 22, 2016 at 8:51 AM, Alessandro Benedetti
 wrote:
> Hi gents,
> I am planning a version upgrade, is it possible to know the next upcoming
> version ?
> From the Solr news I see the next one will be  Solr 6.0 in 4 weeks
> approximately.
>
> But from Jira I see also the 5.5.1 with 8 Jira  issues in it.
> Is it possible to have an estimation of the release dates ?
> Is 5.5.1 coming out soon ?
>
> Cheers
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-22 Thread Bram Van Dam
On 22/03/16 15:16, Shawn Heisey wrote:
> This message is not coming from Solr.  It's coming from Jetty.  Solr
> uses Jetty, but uses it completely unchanged.

Ah you're right. Here's the offending code:

https://github.com/eclipse/jetty.project/blob/ac24196b0d341534793308d585161381d5bca4ac/jetty-start/src/main/java/org/eclipse/jetty/start/Main.java#L446

Doesn't look like there's an immediate workaround. Darn.

 - Bram



Re: Explain score is different from score

2016-03-22 Thread Ahmet Arslan


Hi all,

May be it is better to move the discussion into a jira ticket.
I created SOLR-8884 for this. 

aHmet

On Tuesday, March 22, 2016 1:59 PM, Alessandro Benedetti 
 wrote:



I got this problem re-ranking.
But in my short  experience I was not able to reproduce nor fix the bug.
Can I ask you the query parser used and all the components involved in the
query ?

Cheers

On Mon, Mar 21, 2016 at 8:40 PM, Rick Sullivan 
wrote:

> I haven't checked this thread since Friday, but here are my responses to
> the questions that have come up.
>
> 1. How is ranking affected?
>
> Some documents have their scores divided by an integer value in the
> response documents.
>
> 2. Do you see the proper ranking in the explain section?
>
> Yes, the explain section always seems to have consistent values and proper
> rankings.
>
> 3. What about the results?
>
> No, these are ranked according to the sometimes incorrect score.
>
> 4. What version of Solr are you using?
>
> I've produced the problem on SolrCloud 5.5.0 (2 shards on 2 nodes on the
> same machine), Solr 5.5.0 (no sharding), and Solr 5.4.1 (no sharding).
> I've also had trouble reproducing the problem on test data.
>
> Thanks,
> -Rick
>
> 
> > Date: Mon, 21 Mar 2016 14:14:44 +
> > From: iori...@yahoo.com.INVALID
> > To: solr-user@lucene.apache.org
> > Subject: Re: Explain score is different from score
> >
> >
> >
> > Hi Alessandro,
> >
> > OP have different ranking: fl=score and explain's score would have
> retrieve different orders.
> > I wrote test cases using ClassicSimilarity, but it won't re-produce.
> > This is really weird. I wonder what is triggering this.
> >
> > aHmet
> >
> >
> > On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
> >
> >
> >
> > I would like to add a question, how the ranking is affected ?
> > Do you see the proper ranking in the explain section ?
> > And what about the results ? Are they ranked accordingly the correct
> score,
> > or they are ranked by the wrong score ?
> > I got a similar issue, which I am not able to reproduce yet, but it was
> > really really weird (in my case I got also the ranking messed up)
> >
> > Cheers
> >
> >
> > On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:
> >
> >> Hi Ahmet,
> >>
> >> I am using solr 5.5.0. I am running single instance with single core. No
> >> shards
> >>
> >> I have added  to my
> schema
> >> as suggested by Rick Sullivan. Now the scores are same between explain
> and
> >> score field.
> >>
> >> But instead of previous results "Lync - Microsoft Office 365" and
> >> "Microsoft Office 365" I am getting
> >>
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >>
> >> If I try NGram title:(Microsoft Ofice 365)
> >>
> >> The scores are same for top 10 results even though they are differing by
> >> min of 3 characters. I have attached my schema.xml so it can help
> >>
> >> Lync - Microsoft Office 365 : 52.056263
> >> Microsoft Office 365 : 52.056263
> >> Microsoft Office 365 1.0 : 52.056263
> >> Microsoft Office 365 14.0 : 52.056263
> >> Microsoft Office 365 14.3 : 52.056263
> >> Microsoft Office 365 14.4 : 52.056263
> >> Microsoft Office 365 14.5(Mac) : 52.056263
> >> Microsoft Office 365 15.0 : 52.056263
> >> Microsoft Office 365 16.0 : 52.056263
> >> Microsoft Office 365 4.0 : 52.056263
> >> Microsoft Office 365 E4 : 52.056263
> >> Microsoft Mail Protection Reports for Office 365 15.0 : 50.215454
> >>
> >> Thanks
> >> Rajesh
> >>
> >>
> >>
> >> Corporate Executive Board India Private Limited. Registration No:
> >> U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF
> Building
> >> No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.
> >>
> >> This e-mail and/or its attachments are intended only for the use of the
> >> addressee(s) and may contain confidential and legally privileged
> >> information belonging to CEB and/or its subsidiaries, including CEB
> >> subsidiaries that offer SHL Talent Measurement products and services. If
> >> you have received this e-mail in error, please notify the sender and
> >> immediately, destroy all copies of this email and its attachments. The
> >> publication, copying, in whole or in part, or use or dissemination in
> any
> >> other way of this e-mail and attachments by anyone other than the
> intended
> >> person(s) is prohibited.
> >>
> >> -Original Message-
> >> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> >> Sent: Sunday, March 20, 2016 2:10 AM
> >> To: solr-user@lucene.apache.org; G, Rajesh ;
> >> r...@ricksullivan.net
> >> Subject: Re: Explain score is different from score
> >>
> >> Hi Rick and Rajesh,
> >>
> >> I wasn't able to re-produce this neither with lucene nor 

Next Solr Release - 5.5.1 or 6.0 ?

2016-03-22 Thread Alessandro Benedetti
Hi gents,
I am planning a version upgrade, is it possible to know the next upcoming
version ?
From the Solr news I see the next one will be Solr 6.0 in 4 weeks
approximately.

But from Jira I see also the 5.5.1 with 8 Jira  issues in it.
Is it possible to have an estimation of the release dates ?
Is 5.5.1 coming out soon ?

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Paging and cursorMark

2016-03-22 Thread Steve Rowe
Hi Tom,

There is an outstanding JIRA issue to directly support what you want (with a 
patch even!) but no work on it recently: 
.  If you’re so inclined, 
please pitch in: bring the patch up-to-date, test it, contribute improvements, 
etc.

--
Steve
www.lucidworks.com

> On Mar 22, 2016, at 10:27 AM, Tom Evans  wrote:
> 
> Hi all
> 
> With Solr 5.5.0, we're trying to improve our paging performance. When
> we are delivering results using infinite scrolling, cursorMark is
> perfectly fine - one page is followed by the next. However, we also
> offer traditional paging of results, and this is where it gets a
> little tricky.
> 
> Say we have 10 results per page, and a user wants to jump from page 1
> to page 20, and then wants to view page 21, there doesn't seem to be a
> simple way to get the nextCursorMark. We can make an inefficient
> request for page 20 (start=190, rows=10), but we cannot give that
> request a cursorMark=* as it contains start=190.
> 
> Consequently, if the user clicks to page 21, we have to continue along
> using start=200, as we have no cursorMark. The only way I can see to
> get a cursorMark at that point is to omit the start=200, and instead
> say rows=210, and ignore the first 200 results on the client side.
> Obviously, this gets more and more inefficient the deeper we page - I
> know that internally to Solr, using start=200&rows=10 has to do the
> same work as rows=210, but less data is sent over the wire to the
> client.
> 
> As I understand it, the cursorMark is a hash of the sort values of the
> last document returned, so I don't really see why it is forbidden to
> specify start=190&rows=10&cursorMark=* - why is it not possible to
> calculate the nextCursorMark from the last document returned?
> 
> I was also thinking a possible temporary workaround would be to
> request start=190=10, note the last document returned, and then
> make a subsequent query for q=id:""&rows=1&cursorMark=*.
> This seems to work, but means an extra Solr query for no real reason.
> Is there any other problem to doing this?
> 
> Is there some other simple trick I am missing that we can use to get
> both the page of results we want and a nextCursorMark for the
> subsequent page?
> 
> Cheers
> 
> Tom



Re: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Yonik Seeley
Much of the merging / client code in Solr (not just the JSON Facets)
uses things like
((Number)count).longValue()
to handle either int or long values.

-Yonik


On Tue, Mar 22, 2016 at 4:46 AM, Markus Jelsma
 wrote:
> Hello,
>
> Using SolrJ i built a method that consumes output produced by JSON facets, it 
> also checks the count before further processing the output:
>
> 
> "facets": {
>   "count": 49,
>   ...
> }
>
> This is the code reading the count value via SolrJ:
>
> QueryResponse response = sourceClient.query(query);
> NamedList jsonFacets = (NamedList)response.getResponse().get("facets");
> int totalOccurences = (int)jsonFacets.get("count");
>
> The problem is, this code doesn't work in unit tests, it throws a:
> java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Integer!?
>
> But why? It is an integer, right? Anyway, i change the totalOccurences and the 
> cast to a long and the unit tests runs just fine. But when actually running 
> the code, i suddenly get another cast exception at exactly the same line.
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
>
> What is going on? The only difference is that the unit tests runs in cloud 
> mode via AbstractFullDistribZkTestBase, but i run the code in a local dev 
> non-cloud mode. I haven't noticed this behaviour anywhere else although i 
> have many unit tests consuming lots of different pieces of Solr output, and 
> all that code runs fine in non-cloud mode too.
>
> Is this to be expected, normal? Did i catch another bug?
>
> Thanks!
> Markus
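The ((Number)count).longValue() pattern Yonik mentions guards against exactly the mismatch Markus hit: depending on serialization and whether the response was merged in cloud mode, the same count can arrive as an Integer or a Long. A minimal sketch, with plain Maps standing in for SolrJ's NamedList:

```java
import java.util.HashMap;
import java.util.Map;

public class CountCast {
    // Reads a facet count that may be boxed as either Integer or Long,
    // without risking a ClassCastException: widen through Number instead
    // of casting to a concrete boxed type.
    static long readCount(Map<String, Object> facets) {
        return ((Number) facets.get("count")).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> cloud = new HashMap<>();
        cloud.put("count", 49L);   // cloud / merged response: Long
        Map<String, Object> local = new HashMap<>();
        local.put("count", 49);    // local non-cloud response: Integer

        System.out.println(readCount(cloud));   // 49
        System.out.println(readCount(local));   // 49
    }
}
```

The same one-line change in the SolrJ code from the original post (casting jsonFacets.get("count") through Number rather than to int or long) makes it work in both unit tests and the non-cloud dev setup.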


Re: Issue with Auto Suggester Component

2016-03-22 Thread Alessandro Benedetti
Let's try to keep it simple , please provide a query and the expected
ranking of the results.
I find quite difficult to read and intepret in this way :)

Then we can sort a proper autocomplete out :)

Cheers

On Wed, Mar 16, 2016 at 10:52 AM, Manohar Sripada 
wrote:

> Thanks for the response.
> If you see the first 5 results- "*ABC* Corporation", "*ABC*D
> Corporation", "*Abc
> *Tech", "*AbC*orporation", "*ABC*D company". The keyword "*abc*" that I am
> trying to search is part of prefix of all the strings. Sorry, it's not
> entire keyword to be of higher importance like #1, #3 and #6.
> In the 2nd set of results, "The *ABC* Company", "The *ABC*DEF", the keyword
> "*abc*" is not part of prefix of 1st string, but it is part of some other
> string of each result.
>
> Thanks,
> Manohar
>
> On Tue, Mar 15, 2016 at 3:03 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > Hi Manohar,
> > I have not clear what should be your ideal ranking of suggestions.
> >
> > "I want prefix search of
> > entire keyword to be of high preference (#1 to #5 in the below example)
> > followed by prefix part of any other string (the last 2 in the below
> > example). I am not bothered about ordering within 1st and 2nd set.
> >
> > ABC Corporation
> > ABCD Corporation
> > Abc Tech
> > AbCorporation
> > ABCD company
> > The ABC Company
> > The ABCDEF"
> >
> > Could you take the example you posted, show an example of query and the
> > expected sort order ?
> > According to your description of the problem
> > Query : abc
> > 1 Criteria : entire keyword to be of high preference
> > I can't understand why you didn't count #3, #6 but you did #5 .
> >
> > 2 Criteria : followed by prefix part of any other string
> > It is not that clear, probably you mean all the rest.
> > Anyway an infix lookup algorithm with a boost for exact search should do
> > the trick.
> >
> > Please give us some more details !
> >
> > Cheers
> >
> > On Tue, Mar 15, 2016 at 8:19 AM, Manohar Sripada 
> > wrote:
> >
> > > Consider the below company names indexed. I want the below auto
> > suggestions
> > > to be listed when searched for "abc". Basically, I want prefix search
> of
> > > entire keyword to be of high preference (#1 to #5 in the below example)
> > > followed by prefix part of any other string (the last 2 in the below
> > > example). I am not bothered about ordering within 1st and 2nd set.
> > >
> > > ABC Corporation
> > > ABCD Corporation
> > > Abc Tech
> > > AbCorporation
> > > ABCD company
> > > The ABC Company
> > > The ABCDEF
> > >
> > > I am using Suggest feature of solr as mentioned in the wiki
> > > . I used
> > > different Lookup implementations available, but, I couldn't get the
> > result
> > > as above. Here's is one sample config I used with
> > BlendedInfixLookupFactory
> > >
> > >
> > >  **
> > > * businessNameBlendedInfixSuggester1*
> > > * BlendedInfixLookupFactory*
> > > * DocumentDictionaryFactory*
> > > * business_name_suggest*
> > > * id*
> > > *text_suggest*
> > > * business_name*
> > > * linear*
> > > * true*
> > > *  name="indexPath">/app/solrnode/suggest_test_1_blendedinfix1*
> > > * 0*
> > > * true*
> > > * true*
> > > * false*
> > > * *
> > >
> > > Can someone please suggest on how I can achieve this?
> > >
> > > Thanks,
> > > Manohar
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
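This is not Solr code, but the two-tier ordering Manohar describes can be pinned down as a comparator over suggestion strings: tier 0 when the whole name's first token starts with the query, tier 1 when some later token does. An infix lookup (e.g. BlendedInfixLookupFactory) with a boost for exact/prefix matches approximates this tiering; the sketch below just makes the expected ranking explicit:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SuggestRank {
    // Tier 0: query prefixes the first token; tier 1: query prefixes a later
    // token; tier 2: no token starts with the query.
    static int tier(String name, String query) {
        String[] tokens = name.toLowerCase().split("\\s+");
        if (tokens.length > 0 && tokens[0].startsWith(query)) return 0;
        for (String t : tokens) {
            if (t.startsWith(query)) return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "The ABC Company", "ABC Corporation", "The ABCDEF",
                "Abc Tech", "ABCD company"));
        // List.sort is stable, so order within a tier is preserved.
        names.sort(Comparator.comparingInt(n -> tier(n, "abc")));
        System.out.println(names);
    }
}
```

Running it puts the prefix matches (ABC Corporation, Abc Tech, ABCD company) ahead of the infix matches (The ABC Company, The ABCDEF), which is the grouping asked for in the thread.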


RE: Save Number of words in field

2016-03-22 Thread G, Rajesh
It works. Thanks, Jack.




-Original Message-
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Tuesday, March 22, 2016 1:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Save Number of words in field

You can write an Update Request Processor that would count the words in the 
source value for a specified field and generate that count as an integer value 
for another field.

My old Solr 4.x Deep Dive book has an example that uses a sequence (chain) of 
existing update processors to count words in a multi-valued text field.
That's not as efficient as a custom or script update processor, but avoids 
creating a custom processor.

See:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
Look for "regex-count-words".


-- Jack Krupansky

On Mon, Mar 21, 2016 at 12:15 PM, G, Rajesh  wrote:

> Hi,
>
> When indexing sentences I want to store the number of words in the
> sentence in a fields that I can use to with other query later for word
> count match. Please let me know whether it is possible?
>
> Thanks
> Rajesh
>
>
>
>
>
>
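The heart of the update-processor approach Jack describes is just whitespace tokenization. The sketch below shows only that core, outside any Solr API; a custom UpdateRequestProcessor would apply it to the source field's value during processAdd and write the result into an integer field:

```java
public class WordCount {
    // Counts whitespace-separated words in a sentence; empty or
    // whitespace-only input counts as zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Microsoft Office 365"));  // 3
        System.out.println(countWords("   "));                   // 0
    }
}
```

With the count stored as its own indexed int field, later queries can filter or boost on word-count matches without re-analyzing the sentence text.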


NPE when executing clustering query search

2016-03-22 Thread Tim Hearn
Hi everyone,

I am trying to execute a clustering query to my single-core master-slave
solr setup and it is returning a NullPointerException.  I checked the line
in the source code where it is being thrown, and it looks like the null
object is some sort of 'filt' object, which doesn't make sense.  Below is
the query, my schema, solrconfig, and the exception.  If anyone could
please help that would be great!

Thank you!

QUERY:

1510649 [qtp1855032000-20] INFO  org.apache.solr.core.SolrCore  -
[collection1] webapp=/solr
path=/clustering
params={
mlt.minwl=3&
mlt.boost=true&
mlt.fl=textpropertymlt&
sort=score+desc&
carrot.snippet=impnoteplain&
mlt.mintf=1&
qf=concept_name&
mlt.interestingTerms=details&
wt=javabin&
clustering.engine=lingo&
version=2&
rows=500&
mlt.mindf=2&
debugQuery=true&
fl=id,concept_name,impnoteplain&
start=0&
q=id:567065dc658089be9f5c2c0d5670653d658089be9f5c2ae2&
carrot.title=concept_name&
clustering.results=true&
qt=/clustering&
fq=storeid:5670653d658089be9f5c2ae2&
fq={!edismax+v%3D''+qf%3D'textpropertymlt'+mm%3D'2<40%25'}=id=true}
status=500 QTime=217

ERROR:

1510697 [qtp1855032000-20] ERROR org.apache.solr.servlet.SolrDispatchFilter
 - null:java.lang.NullPointerException
at
org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1416)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:235)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)


SCHEMA.XML:

[field, fieldType, and analyzer definitions were stripped by the mail archive]


SOLR CONFIG.XML:

(request handler definition stripped by the mail archive; only the "clustering" component name survived)


Paging and cursorMark

2016-03-22 Thread Tom Evans
Hi all

With Solr 5.5.0, we're trying to improve our paging performance. When
we are delivering results using infinite scrolling, cursorMark is
perfectly fine - one page is followed by the next. However, we also
offer traditional paging of results, and this is where it gets a
little tricky.

Say we have 10 results per page, and a user wants to jump from page 1
to page 20, and then wants to view page 21, there doesn't seem to be a
simple way to get the nextCursorMark. We can make an inefficient
request for page 20 (start=190, rows=10), but we cannot give that
request a cursorMark=* as it contains start=190.

Consequently, if the user clicks to page 21, we have to continue along
using start=200, as we have no cursorMark. The only way I can see to
get a cursorMark at that point is to omit the start=200, and instead
say rows=210, and ignore the first 200 results on the client side.
Obviously, this gets more and more inefficient the deeper we page - I
know that internally to Solr, using start=200&rows=10 has to do the
same work as rows=210, but less data is sent over the wire to the
client.

As I understand it, the cursorMark is a hash of the sort values of the
last document returned, so I don't really see why it is forbidden to
specify start=190&rows=10&cursorMark=* - why is it not possible to
calculate the nextCursorMark from the last document returned?

I was also thinking a possible temporary workaround would be to
request start=190&rows=10, note the last document returned, and then
make a subsequent query for q=id:"<last document's id>"&rows=1&cursorMark=*.
This seems to work, but means an extra Solr query for no real reason.
Is there any other problem to doing this?
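As a sketch, the two-step trick could be reduced to building the follow-up
request for the last document's id (the "id" field name and the sort are
assumptions; with SolrJ the mark would come back via
QueryResponse.getNextCursorMark()):

```java
public class CursorMarkWorkaround {
    // Build the one-row follow-up request for the id of the last document
    // on the deep page. Its response carries a nextCursorMark usable for
    // the following page. Cursors require the uniqueKey in the sort, and
    // the sort must match the original paging query's sort.
    public static String nextMarkParams(String lastDocId) {
        return "q=id:%22" + lastDocId + "%22"   // %22 = URL-encoded quote
             + "&rows=1"
             + "&sort=score+desc,id+asc"
             + "&cursorMark=*";
    }
}
```

Appending CursorMarkWorkaround.nextMarkParams(lastId) to the select URL
should return the mark in the nextCursorMark field of the response.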

Is there some other simple trick I am missing that we can use to get
both the page of results we want and a nextCursorMark for the
subsequent page?

Cheers

Tom


Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-22 Thread Shawn Heisey
On 3/22/2016 6:57 AM, Bram Van Dam wrote:
> Hey folks,
>
> When I start 5.5.0 (on RHEL), the following entry is added to
> server/logs/solr-8983-console.log:
>
> WARNING: System properties and/or JVM args set.  Consider using
> --dry-run or --exec
>
> I can't quite figure out what's causing this. Any clues on how to get
> rid of it?

This message is not coming from Solr.  It's coming from Jetty.  Solr
uses Jetty, but uses it completely unchanged.

Based on what it's saying, it might not be possible to get rid of it. 
The Solr start script uses system properties and JVM arguments, and
that's not going to change.

I can check with the Jetty project to see if they have a way to
eliminate the warning.

Thanks,
Shawn



Antw: Re: Boosting of Join Results

2016-03-22 Thread Alena Dengler
Mikhail, 

Thanks a lot for the suggestion. We now implemented the query as
follows: 
 q=((+geschichte +rom) OR _query_:{!boost b=0.01}{!join from=expandtype
fromIndex=pages to=id score=avg v='pageno_content:(+geschichte +rom)'})
With the factor of 0.01 it seems to work well with our data. 

Best Regards
Alena


>>> Mikhail Khludnev  22.03.2016 12:44 >>>
What if you nest the join into a boost, e.g. q=+foo {!boost ..}{!join ...
v=...}?

see
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser


if it works, you may vote for
https://issues.apache.org/jira/browse/SOLR-7814 

On Tue, Mar 22, 2016 at 12:39 PM, Alena Dengler <
alena.deng...@bsb-muenchen.de> wrote:

> Hello,
>
> we are currently developing a combined index for book metadata and
> fulltexts. Our primary core contains metadata of ~12Mio. books.
~0.5Mio.
> of them have fulltexts; those fulltexts are indexed in a secondary
core.
> This secondary core has one index document per fulltext page.
> We are joining all matching fulltext pages with the bookwise
metadata
> in the primary core. Currently we have the problem that scores for
books
> with matches from the secondary core are not comparable with matches
> from metadata only. So we are trying to normalize fulltext scores to
be
> in the same dimension as the metadata scores for non-digitized
results.
>
> This is a basic query without join using only the primary core
> (metadata):
> http://server/solr/live/select?q=+geschichte&fl=id,score
> Top 10 result scores range from 2.0 to 1.7
>
> For fulltexts, the query is extended with a join:
>
>
> http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score

> Top 10 result scores range from 5.4 to 4.8 (4.7 score points for the
> first hit result from the joined secondary core. We would like to
reduce
> this value. See explain output below [1])
>
> This difference will effectively hide any books without fulltexts
from
> hitlists, which is not our goal.
>
> We tried to add lucene boosts to the join subquery, but they do not
> have any effect on the final scores. E.g. we 'down boost' the
fulltext
> results by a factor of 0.1:
> q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages
> to=id score=max v='pageno_content:(+geschichte)^0.1'})
> But the resulting scores are the same as from the join example
above.
>
> Is this the correct query syntax, or should the boost for the join
> query be put somewhere else?
>
> Thanks for any suggestions.
>
> Best Regards
> Alena
>
> [1] Explain output for the first hit of the join example query
> 5.398742 = sum of:
>   4.816505 = sum of:
> 0.07251295 = max of:
>   0.07251295 = weight(title:geschichte in 10585926)
> [ClassicSimilarity], result of:
> 0.07251295 = score(doc=10585926,freq=1.0), product of:
>   0.037440736 = queryWeight, product of:
> 5.1646385 = idf(docFreq=197504, maxDocs=12713278)
> 0.00724944 = queryNorm
>   1.9367394 = fieldWeight in 10585926, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 5.1646385 = idf(docFreq=197504, maxDocs=12713278)
> 0.375 = fieldNorm(doc=10585926)
>   0.005904072 = weight(free_search:geschichte in 10585926)
> [ClassicSimilarity], result of:
> 0.005904072 = score(doc=10585926,freq=2.0), product of:
>   0.022005465 = queryWeight, product of:
> 3.035471 = idf(docFreq=1660594, maxDocs=12713278)
> 0.00724944 = queryNorm
>   0.26830027 = fieldWeight in 10585926, product of:
> 1.4142135 = tf(freq=2.0), with freq of:
>   2.0 = termFreq=2.0
> 3.035471 = idf(docFreq=1660594, maxDocs=12713278)
> 0.0625 = fieldNorm(doc=10585926)
> 4.743992 = Score based on join value 957245
>   0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity],
> result of:
> 0.58188105 = score(doc=10585926,freq=1.0), product of:
>   0.4592555 = queryWeight, product of:
> 50.0 = boost
> 1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
> 0.00724944 = queryNorm
>   1.2670095 = fieldWeight in 10585926, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
> 1.0 = fieldNorm(doc=10585926)
>   3.5596997E-4 =
>
>
FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)),
> product of:
> 0.00491031 =
>
>
1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
> 0.0724944 = boost
> 1.0 = queryNorm
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Solr 5.5.0: JVM args warning in console logfile.

2016-03-22 Thread Bram Van Dam
Hey folks,

When I start 5.5.0 (on RHEL), the following entry is added to
server/logs/solr-8983-console.log:

WARNING: System properties and/or JVM args set.  Consider using
--dry-run or --exec

I can't quite figure out what's causing this. Any clues on how to get
rid of it?

Thanks,

 - Bram


Creating new cluster with existing config in zookeeper

2016-03-22 Thread Robert Brown

Hi,

Is it safe to create a new cluster but use an existing config set that's 
in zookeeper?  Or does that config set contain the cluster status too?


I want to (re)-build a cluster from scratch, with a different amount of 
shards, but not using shard-splitting.


Thanks,
Rob
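
For what it's worth, a configset in ZooKeeper holds only the config files
(solrconfig.xml, schema), not cluster state, so a new collection can point
at it by name. A sketch of the Collections API CREATE call (collection
name and shard counts are placeholders):

```java
public class CreateCollectionUrl {
    // Build the Collections API CREATE request that reuses an existing
    // configset by name; the shard layout comes from numShards here,
    // not from anything stored in the configset.
    public static String create(String baseUrl, String collection,
                                String configName, int numShards,
                                int replicationFactor) {
        return baseUrl + "/admin/collections?action=CREATE"
             + "&name=" + collection
             + "&numShards=" + numShards
             + "&replicationFactor=" + replicationFactor
             + "&collection.configName=" + configName;
    }
}
```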



Re: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Yago Riveiro
I have a feeling that this is related to the number of nodes in the cluster.

My dev runs in cloud mode but only has one node; production has 12, and the
version is the same.

--
Yago Riveiro

> On Mar 22 2016, at 9:13 am, Markus Jelsma markus.jel...@openindex.io wrote:
>
> I'm now using instanceof as an ugly workaround but i'd prefer a decent
> solution.
> M
  
\-Original message-  
 From:Yago Riveiro yago.rive...@gmail.com  
 Sent: Tuesday 22nd March 2016 9:52  
 To: solr-user solr-user@lucene.apache.org; solr-
u...@lucene.apache.org  
 Subject: Re: JSON facets, count a long or an integer in cloud and non-
cloud modes  
  
 I have the same problem with a custom response writer.  
  
 In production works but in my dev doesn't and are the same version 5.3.1  
  
 \--  
 Yago Riveiro  
  
 On 22 Mar 2016 08:47 +, Markus
Jelsmamarkus.jel...@openindex.io, wrote:  
  Hello,  
   
  Using SolrJ i built a method that consumes output produced by JSON
facets, it also checks the count before further processing the output:  
   
   <result name="response" numFound="49" start="0">
   </result>
   <lst name="facets">
   <int name="count">49</int>
   <lst name="by_day">
   <arr name="buckets">
   <lst>
   
  This is the code reading the count value via SolrJ:  
   
  QueryResponse response = sourceClient.query(query);  
  NamedList jsonFacets =
(NamedList)response.getResponse().get("facets");  
  int totalOccurences = (int)jsonFacets.get("count");  
   
  The problem is, this code doesn't work in unit tests, it throws a:  
  java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.Integer!?  
   
  But why it is an integer right? Anyway, i change the totalOccurences
and the cast to a long and the unit tests runs just fine. But when actually
running the code, i suddenly get another cast exception at exactly the same
line.  
  java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long  
   
  What is going on? The only difference is that the unit tests runs in
cloud mode via AbstractFullDistribZkTestBase, but i run the code in a local
dev non-cloud mode. I haven't noticed this behaviour anywhere else although i
have many unit tests consuming lots of different pieces of Solr output, and
all that code runs fine in non-cloud mode too.  
   
  Is this to be expected, normal? Did i catch another bug?  
   
  Thanks!  
  Markus  




Re: Explain score is different from score

2016-03-22 Thread Alessandro Benedetti
I hit this problem with re-ranking, but in my short experience I was not
able to reproduce or fix the bug. Can I ask which query parser you used
and which components were involved in the query?

Cheers

On Mon, Mar 21, 2016 at 8:40 PM, Rick Sullivan 
wrote:

> I haven't checked this thread since Friday, but here are my responses to
> the questions that have come up.
>
> 1. How is ranking affected?
>
> Some documents have their scores divided by an integer value in the
> response documents.
>
> 2. Do you see the proper ranking in the explain section?
>
> Yes, the explain section always seems to have consistent values and proper
> rankings.
>
> 3. What about the results?
>
> No, these are ranked according to the sometimes incorrect score.
>
> 4. What version of Solr are you using?
>
> I've produced the problem on SolrCloud 5.5.0 (2 shards on 2 nodes on the
> same machine), Solr 5.5.0 (no sharding), and Solr 5.4.1 (no sharding).
> I've also had trouble reproducing the problem on test data.
>
> Thanks,
> -Rick
>
> 
> > Date: Mon, 21 Mar 2016 14:14:44 +
> > From: iori...@yahoo.com.INVALID
> > To: solr-user@lucene.apache.org
> > Subject: Re: Explain score is different from score
> >
> >
> >
> > Hi Alessandro,
> >
> > OP has different rankings: fl=score and explain's score would
> > retrieve different orders.
> > I wrote test cases using ClassicSimilarity, but it won't re-produce.
> > This is really weird. I wonder what is triggering this.
> >
> > aHmet
> >
> >
> > On Monday, March 21, 2016 2:08 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
> >
> >
> >
> > I would like to add a question, how the ranking is affected ?
> > Do you see the proper ranking in the explain section ?
> > And what about the results ? Are they ranked accordingly the correct
> score,
> > or they are ranked by the wrong score ?
> > I got a similar issue, which I am not able to reproduce yet, but it was
> > really really weird ( in my case I got also the ranking messed up_
> >
> > Cheers
> >
> >
> > On Mon, Mar 21, 2016 at 7:30 AM, G, Rajesh  wrote:
> >
> >> Hi Ahmet,
> >>
> >> I am using solr 5.5.0. I am running single instance with single core. No
> >> shards
> >>
> >> I have added  to my
> schema
> >> as suggested by Rick Sullivan. Now the scores are same between explain
> and
> >> score field.
> >>
> >> But instead of previous results "Lync - Microsoft Office 365" and
> >> "Microsoft Office 365" I am getting
> >>
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >> {
> >> "title":"Office 365",
> >> "score":7.471676
> >> },
> >>
> >> If I try NGram title:(Microsoft Ofice 365)
> >>
> >> The scores are same for top 10 results even though they are differing by
> >> min of 3 characters. I have attached my schema.xml so it can help
> >>
> >> 
> >> Lync - Microsoft Office 365
> >> 52.056263
> >> 
> >> Microsoft Office 365
> >> 52.056263
> >> 
> >> Microsoft Office 365 1.0
> >> 52.056263
> >> 
> >> Microsoft Office 365 14.0
> >> 52.056263
> >> 
> >> Microsoft Office 365 14.3
> >> 52.056263
> >> 
> >> Microsoft Office 365 14.4
> >> 52.056263
> >> 
> >> Microsoft Office 365 14.5(Mac)
> >> 52.056263
> >> 
> >> Microsoft Office 365 15.0
> >> 52.056263
> >> 
> >> Microsoft Office 365 16.0
> >> 52.056263
> >> 
> >> Microsoft Office 365 4.0
> >> 52.056263
> >> 
> >> Microsoft Office 365 E4
> >> 52.056263
> >> 
> >> Microsoft Mail Protection Reports for Office 365
> >> 15.0
> >> 50.215454
> >>
> >> Thanks
> >> Rajesh
> >>
> >>
> >>
> >> Corporate Executive Board India Private Limited. Registration No:
> >> U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF
> Building
> >> No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.
> >>
> >> This e-mail and/or its attachments are intended only for the use of the
> >> addressee(s) and may contain confidential and legally privileged
> >> information belonging to CEB and/or its subsidiaries, including CEB
> >> subsidiaries that offer SHL Talent Measurement products and services. If
> >> you have received this e-mail in error, please notify the sender and
> >> immediately, destroy all copies of this email and its attachments. The
> >> publication, copying, in whole or in part, or use or dissemination in
> any
> >> other way of this e-mail and attachments by anyone other than the
> intended
> >> person(s) is prohibited.
> >>
> >> -Original Message-
> >> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> >> Sent: Sunday, March 20, 2016 2:10 AM
> >> To: solr-user@lucene.apache.org; G, Rajesh ;
> >> r...@ricksullivan.net
> >> Subject: Re: Explain score is different from score
> >>
> >> Hi Rick and Rajesh,
> >>
> >> I wasn't able re-produce this neither with lucene nor solr.
> >> What version of solr is this?
> >> Are you using a sharded request?
> >>
> >> @BeforeClass
> >> public static void beforeClass() throws Exception {
> >> initCore("solrconfig.xml", 

Re: Boosting of Join Results

2016-03-22 Thread Mikhail Khludnev
What if you nest the join into a boost, e.g. q=+foo {!boost ..}{!join ... v=...}?

see
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BoostQueryParser

if it works, you may vote for
https://issues.apache.org/jira/browse/SOLR-7814

On Tue, Mar 22, 2016 at 12:39 PM, Alena Dengler <
alena.deng...@bsb-muenchen.de> wrote:

> Hello,
>
> we are currently developing a combined index for book metadata and
> fulltexts. Our primary core contains metadata of ~12Mio. books. ~0.5Mio.
> of them have fulltexts; those fulltexts are indexed in a secondary core.
> This secondary core has one index document per fulltext page.
> We are joining all matching fulltext pages with the bookwise metadata
> in the primary core. Currently we have the problem that scores for books
> with matches from the secondary core are not comparable with matches
> from metadata only. So we are trying to normalize fulltext scores to be
> in the same dimension as the metadata scores for non-digitized results.
>
> This is a basic query without join using only the primary core
> (metadata):
> http://server/solr/live/select?q=+geschichte&fl=id,score
> Top 10 result scores range from 2.0 to 1.7
>
> For fulltexts, the query is extended with a join:
>
> http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score
> Top 10 result scores range from 5.4 to 4.8 (4.7 score points for the
> first hit result from the joined secondary core. We would like to reduce
> this value. See explain output below [1])
>
> This difference will effectively hide any books without fulltexts from
> hitlists, which is not our goal.
>
> We tried to add lucene boosts to the join subquery, but they do not
> have any effect on the final scores. E.g. we 'down boost' the fulltext
> results by a factor of 0.1:
> q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages
> to=id score=max v='pageno_content:(+geschichte)^0.1'})
> But the resulting scores are the same as from the join example above.
>
> Is this the correct query syntax, or should the boost for the join
> query be put somewhere else?
>
> Thanks for any suggestions.
>
> Best Regards
> Alena
>
> [1] Explain output for the first hit of the join example query
> 5.398742 = sum of:
>   4.816505 = sum of:
> 0.07251295 = max of:
>   0.07251295 = weight(title:geschichte in 10585926)
> [ClassicSimilarity], result of:
> 0.07251295 = score(doc=10585926,freq=1.0), product of:
>   0.037440736 = queryWeight, product of:
> 5.1646385 = idf(docFreq=197504, maxDocs=12713278)
> 0.00724944 = queryNorm
>   1.9367394 = fieldWeight in 10585926, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 5.1646385 = idf(docFreq=197504, maxDocs=12713278)
> 0.375 = fieldNorm(doc=10585926)
>   0.005904072 = weight(free_search:geschichte in 10585926)
> [ClassicSimilarity], result of:
> 0.005904072 = score(doc=10585926,freq=2.0), product of:
>   0.022005465 = queryWeight, product of:
> 3.035471 = idf(docFreq=1660594, maxDocs=12713278)
> 0.00724944 = queryNorm
>   0.26830027 = fieldWeight in 10585926, product of:
> 1.4142135 = tf(freq=2.0), with freq of:
>   2.0 = termFreq=2.0
> 3.035471 = idf(docFreq=1660594, maxDocs=12713278)
> 0.0625 = fieldNorm(doc=10585926)
> 4.743992 = Score based on join value 957245
>   0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity],
> result of:
> 0.58188105 = score(doc=10585926,freq=1.0), product of:
>   0.4592555 = queryWeight, product of:
> 50.0 = boost
> 1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
> 0.00724944 = queryNorm
>   1.2670095 = fieldWeight in 10585926, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = termFreq=1.0
> 1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
> 1.0 = fieldNorm(doc=10585926)
>   3.5596997E-4 =
>
> FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)),
> product of:
> 0.00491031 =
>
> 1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
> 0.0724944 = boost
> 1.0 = queryNorm
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Boosting of Join Results

2016-03-22 Thread Alena Dengler
Hello, 

we are currently developing a combined index for book metadata and
fulltexts. Our primary core contains metadata of ~12Mio. books. ~0.5Mio.
of them have fulltexts; those fulltexts are indexed in a secondary core.
This secondary core has one index document per fulltext page. 
We are joining all matching fulltext pages with the bookwise metadata
in the primary core. Currently we have the problem that scores for books
with matches from the secondary core are not comparable with matches
from metadata only. So we are trying to normalize fulltext scores to be
in the same dimension as the metadata scores for non-digitized results.

This is a basic query without join using only the primary core
(metadata): 
http://server/solr/live/select?q=+geschichte&fl=id,score
Top 10 result scores range from 2.0 to 1.7

For fulltexts, the query is extended with a join: 
http://server/solr/live/select?q=%28%28+geschichte%29%20OR%20_query_:{!join%20from=expandtype%20fromIndex=pages%20to=id%20score=max%20v=%27pageno_content:%28+geschichte%29%27}%29&fl=id,score
Top 10 result scores range from 5.4 to 4.8 (4.7 score points for the
first hit result from the joined secondary core. We would like to reduce
this value. See explain output below [1])

This difference will effectively hide any books without fulltexts from
hitlists, which is not our goal. 

We tried to add lucene boosts to the join subquery, but they do not
have any effect on the final scores. E.g. we 'down boost' the fulltext
results by a factor of 0.1:
q=((+geschichte) OR _query_:{!join from=expandtype fromIndex=pages
to=id score=max v='pageno_content:(+geschichte)^0.1'})
But the resulting scores are the same as from the join example above. 

Is this the correct query syntax, or should the boost for the join
query be put somewhere else?

Thanks for any suggestions. 

Best Regards
Alena

[1] Explain output for the first hit of the join example query 
5.398742 = sum of:
  4.816505 = sum of:
0.07251295 = max of:
  0.07251295 = weight(title:geschichte in 10585926)
[ClassicSimilarity], result of:
0.07251295 = score(doc=10585926,freq=1.0), product of:
  0.037440736 = queryWeight, product of:
5.1646385 = idf(docFreq=197504, maxDocs=12713278)
0.00724944 = queryNorm
  1.9367394 = fieldWeight in 10585926, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.1646385 = idf(docFreq=197504, maxDocs=12713278)
0.375 = fieldNorm(doc=10585926)
  0.005904072 = weight(free_search:geschichte in 10585926)
[ClassicSimilarity], result of:
0.005904072 = score(doc=10585926,freq=2.0), product of:
  0.022005465 = queryWeight, product of:
3.035471 = idf(docFreq=1660594, maxDocs=12713278)
0.00724944 = queryNorm
  0.26830027 = fieldWeight in 10585926, product of:
1.4142135 = tf(freq=2.0), with freq of:
  2.0 = termFreq=2.0
3.035471 = idf(docFreq=1660594, maxDocs=12713278)
0.0625 = fieldNorm(doc=10585926)
4.743992 = Score based on join value 957245
  0.58188105 = weight(statusband:F in 10585926) [ClassicSimilarity],
result of:
0.58188105 = score(doc=10585926,freq=1.0), product of:
  0.4592555 = queryWeight, product of:
50.0 = boost
1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
0.00724944 = queryNorm
  1.2670095 = fieldWeight in 10585926, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
1.2670095 = idf(docFreq=9734121, maxDocs=12713278)
1.0 = fieldNorm(doc=10585926)
  3.5596997E-4 =
FunctionQuery(1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)))+1.0)),
product of:
0.00491031 =
1.0/(3.16E-11*float(ms(const(1458638802405),date(freshness)=1813-01-01T00:00:01Z))+1.0)
0.0724944 = boost
1.0 = queryNorm



RE: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Markus Jelsma
I'm now using instanceof as an ugly workaround, but I'd prefer a decent solution.
M
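
A sketch of a cast-agnostic alternative, assuming the value is always some
java.lang.Number whichever type the writer emits:

```java
public class FacetCount {
    // Read the JSON-facets "count" without committing to Integer or Long;
    // both are Numbers, so longValue() covers cloud and non-cloud modes.
    public static long count(Object rawCount) {
        if (!(rawCount instanceof Number)) {
            throw new IllegalArgumentException("unexpected count type: "
                    + (rawCount == null ? "null" : rawCount.getClass()));
        }
        return ((Number) rawCount).longValue();
    }
}
```

The call site would then read:
long totalOccurences = FacetCount.count(jsonFacets.get("count"));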

 
 
-Original message-
> From:Yago Riveiro 
> Sent: Tuesday 22nd March 2016 9:52
> To: solr-user ; solr-user@lucene.apache.org
> Subject: Re: JSON facets, count a long or an integer in cloud and non-cloud 
> modes
> 
> I have the same problem with a custom response writer.
> 
> It works in production but not in my dev, and both are the same version, 5.3.1
> 
> --
> Yago Riveiro
> 
> On 22 Mar 2016 08:47 +, Markus Jelsma, wrote:
> > Hello,
> > 
> > Using SolrJ i built a method that consumes output produced by JSON facets, 
> > it also checks the count before further processing the output:
> > 
> > <result name="response" numFound="49" start="0"></result><lst name="facets"><int name="count">49</int><lst name="by_day"><arr name="buckets"><lst>
> > This is the code reading the count value via SolrJ:
> > 
> > QueryResponse response = sourceClient.query(query);
> > NamedList jsonFacets = (NamedList)response.getResponse().get("facets");
> > int totalOccurences = (int)jsonFacets.get("count");
> > 
> > The problem is, this code doesn't work in unit tests, it throws a:
> > java.lang.ClassCastException: java.lang.Long cannot be cast to 
> > java.lang.Integer!?
> > 
> > But why it is an integer right? Anyway, i change the totalOccurences and 
> > the cast to a long and the unit tests runs just fine. But when actually 
> > running the code, i suddenly get another cast exception at exactly the same 
> > line.
> > java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> > java.lang.Long
> > 
> > What is going on? The only difference is that the unit tests runs in cloud 
> > mode via AbstractFullDistribZkTestBase, but i run the code in a local dev 
> > non-cloud mode. I haven't noticed this behaviour anywhere else although i 
> > have many unit tests consuming lots of different pieces of Solr output, and 
> > all that code runs fine in non-cloud mode too.
> > 
> > Is this to be expected, normal? Did i catch another bug?
> > 
> > Thanks!
> > Markus
> 


Re: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Yago Riveiro
I have the same problem with a custom response writer.

It works in production but not in my dev, and both are the same version, 5.3.1

--
Yago Riveiro

On 22 Mar 2016 08:47 +, Markus Jelsma, wrote:
> Hello,
> 
> Using SolrJ i built a method that consumes output produced by JSON facets, it 
> also checks the count before further processing the output:
> 
> <result name="response" numFound="49" start="0"></result><lst name="facets"><int name="count">49</int><lst name="by_day"><arr name="buckets"><lst>
> This is the code reading the count value via SolrJ:
> 
> QueryResponse response = sourceClient.query(query);
> NamedList jsonFacets = (NamedList)response.getResponse().get("facets");
> int totalOccurences = (int)jsonFacets.get("count");
> 
> The problem is, this code doesn't work in unit tests, it throws a:
> java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Integer!?
> 
> But why it is an integer right? Anyway, i change the totalOccurences and the 
> cast to a long and the unit tests runs just fine. But when actually running 
> the code, i suddenly get another cast exception at exactly the same line.
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> 
> What is going on? The only difference is that the unit tests runs in cloud 
> mode via AbstractFullDistribZkTestBase, but i run the code in a local dev 
> non-cloud mode. I haven't noticed this behaviour anywhere else although i 
> have many unit tests consuming lots of different pieces of Solr output, and 
> all that code runs fine in non-cloud mode too.
> 
> Is this to be expected, normal? Did i catch another bug?
> 
> Thanks!
> Markus


JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Markus Jelsma
Hello,

Using SolrJ i built a method that consumes output produced by JSON facets, it 
also checks the count before further processing the output:

<result name="response" numFound="49" start="0">
</result>
<lst name="facets">
  <int name="count">49</int>
  <lst name="by_day">
    <arr name="buckets">
      <lst>

This is the code reading the count value via SolrJ:

QueryResponse response = sourceClient.query(query);
NamedList jsonFacets = (NamedList)response.getResponse().get("facets");
int totalOccurences = (int)jsonFacets.get("count");

The problem is, this code doesn't work in unit tests, it throws a:
java.lang.ClassCastException: java.lang.Long cannot be cast to 
java.lang.Integer!?

But why it is an integer right? Anyway, i change the totalOccurences and the 
cast to a long and the unit tests runs just fine. But when actually running the 
code, i suddenly get another cast exception at exactly the same line.
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

What is going on? The only difference is that the unit tests runs in cloud mode 
via AbstractFullDistribZkTestBase, but i run the code in a local dev non-cloud 
mode. I haven't noticed this behaviour anywhere else although i have many unit 
tests consuming lots of different pieces of Solr output, and all that code runs 
fine in non-cloud mode too.

Is this to be expected, normal? Did i catch another bug?

Thanks!
Markus


Custom shard key

2016-03-22 Thread Anil
HI,

I am using explicit sharding by creating a custom shard key for my application
using the HBase util MurmurHash (snippet below).

int hash = MurmurHash.getInstance().hash(sharekey.getBytes());
hash = Math.abs(hash);
int routingValue = hash % shards;
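
One thing worth checking in the snippet above: Math.abs(Integer.MIN_VALUE)
is still Integer.MIN_VALUE, so hash % shards can come out negative for that
one hash value. Math.floorMod sidesteps it. A stdlib-only sketch
(String.hashCode stands in here for the HBase MurmurHash):

```java
public class ShardRouter {
    // Map a shard key to a routing value in [0, shards).
    // Math.floorMod is non-negative for a positive divisor, unlike
    // Math.abs(hash) % shards, which fails for Integer.MIN_VALUE.
    public static int routingValue(String shardKey, int shards) {
        int hash = shardKey.hashCode();  // stand-in for MurmurHash
        return Math.floorMod(hash, shards);
    }
}
```

That alone would not explain several empty shards, though; it may also be
worth counting how many distinct shard key values land on each routing value.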

I noticed that only 5 out of 8 shards are used and 3 shards are empty.

Please let me know if you see any issues, or suggest a different shard key
generation mechanism.

Thanks.

Regards,
Anil