Debugging NullPointerException in QueryComponent.mergeIds for cross core search

2016-05-16 Thread Douglas McGilvray

Hi, I am having trouble performing a search across multiple cores on a single 
server running Solr 5.4.0.

I have erased, rebuilt & optimized the indexes, and according to the 
schema browser for both cores, every document has a unique key (id). However, I 
am still getting the same error. I would appreciate any suggestions as to how I 
might resolve this. FWIW I do have child documents in my index, although all of 
these have a unique id too.

Query ran:
http://localhost:8984/solr/som/select?q=*%3A*=localhost:8984/solr/som,localhost:8984/solr/demo=/query




java.lang.NullPointerException
 at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1037)
 at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:757)
 at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:736)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:406)


Full error message: https://justpaste.it/uccw
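
For reference, a distributed request across cores is normally expressed with 
Solr's shards parameter. A minimal Python sketch of building such a URL, 
assuming the host, port and core names from the query above; the helper name 
build_distributed_url and the /select handler path are illustrative only, and 
this shows the request shape rather than a fix for the exception.

```python
from urllib.parse import urlencode


def build_distributed_url(base, shards, query="*:*", **extra):
    """Return a select URL that fans the query out to the given shards."""
    params = {"q": query, "shards": ",".join(shards)}
    params.update(extra)
    # urlencode percent-encodes reserved characters such as ':' '/' and ','
    return f"{base}/select?{urlencode(params)}"


url = build_distributed_url(
    "http://localhost:8984/solr/som",
    ["localhost:8984/solr/som", "localhost:8984/solr/demo"],
)
print(url)
```

Building the URL from a parameter dict keeps every name=value pair explicit, 
which makes it easier to spot a parameter that has gone missing.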

Re: Securing field level access permission by filtering the query itself

2015-11-06 Thread Douglas McGilvray
You know what guys, I have had a change in perspective… 

I previously thought: do I want to index all these documents multiple times 
just to protect 3 fields?
I am now thinking: do I really want to try to parse all the fields in a query 
when there are only 3 roles?

I have only 4k documents and 3 roles, so that's 8k more documents and I doubt I 
will need to cross query with the other documents … 

Until I have more roles, more complex roles, or more protected documents, I 
think multiple cores is the best option … 

Cheers
D


> On 5 Nov 2015, at 12:50, Alessandro Benedetti <abenede...@apache.org> wrote:
> 
> Be careful with the suggester as well. You don't want to show suggestions
> coming from sensitive fields.
> 
> Cheers
> 
> On 5 November 2015 at 15:28, Scott Stults <sstu...@opensourceconnections.com> wrote:
> 
>> Good to hear! Depending on how far you want to take it, you can then scan
>> the initial request coming in from the client (and the final response) for
>> raw Solr fields -- that shouldn't happen. I've used mod_security as a
>> general-purpose application firewall and would recommend it.
>> 
>> k/r,
>> Scott
>> 
>> On Wed, Nov 4, 2015 at 1:40 PM, Douglas McGilvray <d...@weemondo.com> wrote:
>> 
>>> 
>>> Thanks Alessandro, I had overlooked the highlighting component.
>>> 
>>> I will also add a reminder to exclude these fields from spellcheck fields
>>> (or maintain different spellcheck fields for different roles).
>>> 
>>> @Scott - Once I started planning my code the penny finally dropped
>>> regarding your point about aliasing the fields - it removes the need for
>>> calculating which fields to request in the app itself.
>>> 
>>> Regards,
>>> D
>>> 
>>> 
>>>> On 4 Nov 2015, at 14:53, Alessandro Benedetti <abenede...@apache.org> wrote:
>>>> 
>>>> Of course it depends on all the query parameters you use and process in
>>>> the response.
>>>> The list you wrote should be ok if you use only those components.
>>>> 
>>>> For example if you use highlight, it's not ok and you need to take care of
>>>> the highlighted fields as well.
>>>> 
>>>> Cheers
>>>> 
>>>> On 30 October 2015 at 14:51, Douglas McGilvray <d...@weemondo.com> wrote:
>>>> 
>>>>> 
>>>>> Scott thanks for the reply. I like the idea of mapping all the fieldnames
>>>>> internally, adding security through obscurity. My question therefore would
>>>>> be what is the definitive list of query parameters that one must filter to
>>>>> ensure a particular field is not exposed in the query response? Am I
>>>>> missing anything in the following?
>>>>> 
>>>>> fl
>>>>> facet.field
>>>>> facet.pivot
>>>>> json.facet
>>>>> terms.fl
>>>>> 
>>>>> 
>>>>> kr
>>>>> Douglas
>>>>> 
>>>>> 
>>>>>> On 30 Oct 2015, at 07:37, Scott Stults <sstu...@opensourceconnections.com> wrote:
>>>>>> 
>>>>>> Douglas,
>>>>>> 
>>>>>> Managing a per-user-group whitelist of fields outside of Solr seems the
>>>>>> best approach. When the query comes in you can then filter out any fields
>>>>>> not contained in the whitelist before you send the request to Solr. The
>>>>>> easy part will be to do that on URL parameters like fl. Depending on how
>>>>>> your app generates the actual query string, you may want to also scan that
>>>>>> for fielded query clauses (eg "badfield:value") and localParams (eg
>>>>>> "{!dismax qf=badfield}value").
>>>>>> 
>>>>>> Secondly, you can map internal Solr fields to aliases using this syntax in
>>>>>> the fl parameter: "display_name:real_solr_name". So when the request comes
>>>>>> in from your app, first you'll map from the requested field alias names to
>>>>>> internal Solr names (while enforcing the whitelist), and then in the fl
>>>>>> parameter supply the aliases you want sen

Re: Securing field level access permission by filtering the query itself

2015-11-04 Thread Douglas McGilvray

Thanks Alessandro, I had overlooked the highlighting component. 

I will also add a reminder to exclude these fields from spellcheck fields, (or 
maintain different spellcheck fields for different roles).

@Scott - Once I started planning my code the penny finally dropped regarding 
your point about aliasing the fields - it removes the need for calculating 
which fields to request in the app itself. 
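
A minimal Python sketch of the aliasing idea, using the 
"display_name:real_solr_name" form of the fl parameter described in Scott's 
reply. The alias table, internal field names, role names and the build_fl 
helper are all hypothetical, made up for illustration.

```python
# Hypothetical alias table: public display name -> internal Solr field name.
ALIASES = {"title": "f_0017_s", "salary": "f_0042_i"}

# Hypothetical per-role whitelist of public display names.
ROLE_FIELDS = {"admin": {"title", "salary"}, "guest": {"title"}}


def build_fl(requested, role):
    """Translate requested display names into an aliased fl value for a role."""
    allowed = ROLE_FIELDS.get(role, set())
    parts = [
        f"{name}:{ALIASES[name]}"
        for name in requested
        if name in allowed and name in ALIASES
    ]
    if not parts:
        # An empty fl makes Solr fall back to its default field list,
        # so refuse the request instead of sending nothing.
        raise ValueError("no permitted fields requested")
    return ",".join(parts)


print(build_fl(["title", "salary"], "guest"))  # title:f_0017_s
```

The client only ever sees and sends display names; the proxy owns the mapping, 
so the whitelist check and the translation happen in one place.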

Regards,
D


> On 4 Nov 2015, at 14:53, Alessandro Benedetti <abenede...@apache.org> wrote:
> 
> Of course it depends on all the query parameters you use and process in
> the response.
> The list you wrote should be ok if you use only those components.
> 
> For example if you use highlight, it's not ok and you need to take care of
> the highlighted fields as well.
> 
> Cheers
> 
> On 30 October 2015 at 14:51, Douglas McGilvray <d...@weemondo.com> wrote:
> 
>> 
>> Scott thanks for the reply. I like the idea of mapping all the fieldnames
>> internally, adding security through obscurity. My question therefore would
>> be what is the definitive list of query parameters that one must filter to
>> ensure a particular field is not exposed in the query response? Am I
>> missing anything in the following?
>> 
>> fl
>> facet.field
>> facet.pivot
>> json.facet
>> terms.fl
>> 
>> 
>> kr
>> Douglas
>> 
>> 
>>> On 30 Oct 2015, at 07:37, Scott Stults <sstu...@opensourceconnections.com> wrote:
>>> 
>>> Douglas,
>>> 
>>> Managing a per-user-group whitelist of fields outside of Solr seems the
>>> best approach. When the query comes in you can then filter out any fields
>>> not contained in the whitelist before you send the request to Solr. The
>>> easy part will be to do that on URL parameters like fl. Depending on how
>>> your app generates the actual query string, you may want to also scan that
>>> for fielded query clauses (eg "badfield:value") and localParams (eg
>>> "{!dismax qf=badfield}value").
>>> 
>>> Secondly, you can map internal Solr fields to aliases using this syntax in
>>> the fl parameter: "display_name:real_solr_name". So when the request comes
>>> in from your app, first you'll map from the requested field alias names to
>>> internal Solr names (while enforcing the whitelist), and then in the fl
>>> parameter supply the aliases you want sent in the response.
>>> 
>>> 
>>> k/r,
>>> Scott
>>> 
>>> On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray <d...@weemondo.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> First I’d like to say the nested facets and the json facet api in
>>>> particular have made my world much better, I thank everyone involved, you
>>>> are all awesome.
>>>> 
>>>> My implementation has much of the Solr query building done in the
>>>> browser; Solr is behind a PHP server which acts as “proxy” and doorman,
>>>> filtering at the document level according to user role and supplying some
>>>> sensible maximums …
>>>> 
>>>> However we now wish to filter just one or two potentially sensitive fields
>>>> in one document type according to user role (as determined in the PHP
>>>> proxy). Duplicating documents (or cores) seems like overkill for just two
>>>> fields in one document type .. I wondered if it would be feasible (in the
>>>> interests of preventing malicious activity) to filter the query itself,
>>>> whether it be parameters (fl, facet.field, terms, etc) … or even deny any
>>>> request in which the fieldname occurs …
>>>> 
>>>> Is there some way someone might obscure a fieldname in a request?
>>>> 
>>>> Kind Regards & thanks in advance,
>>>> Douglas
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
>>> | 434.409.2780
>>> http://www.opensourceconnections.com
>> 
>> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: Securing field level access permission by filtering the query itself

2015-10-30 Thread Douglas McGilvray

Scott, thanks for the reply. I like the idea of mapping all the fieldnames 
internally, adding security through obscurity. My question therefore would be: 
what is the definitive list of query parameters that one must filter to ensure 
a particular field is not exposed in the query response? Am I missing anything 
in the following?

fl
facet.field
facet.pivot
json.facet
terms.fl
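
A minimal Python sketch of filtering the parameters listed above against a 
per-role whitelist before the request is forwarded. It is an illustration 
under stated assumptions, not a complete filter: the filter_params helper and 
the field names are hypothetical, json.facet is simply dropped because it 
needs a real JSON walk, and parameters outside this list (the query string 
itself, for instance) are passed through untouched.

```python
# Parameters whose values are comma-separated lists of field names.
FIELD_LIST_PARAMS = {"fl", "facet.field", "facet.pivot", "terms.fl"}
# Parameters this sketch cannot inspect safely, so it drops them.
REJECTED_PARAMS = {"json.facet"}


def filter_params(params, allowed):
    """params maps a parameter name to a list of values; returns a filtered copy."""
    cleaned = {}
    for name, values in params.items():
        if name in REJECTED_PARAMS:
            continue
        if name not in FIELD_LIST_PARAMS:
            cleaned[name] = list(values)
            continue
        kept = []
        for value in values:
            fields = [f.strip() for f in value.split(",") if f.strip()]
            if name == "facet.pivot":
                # A pivot is only kept when every field in it is allowed.
                if fields and all(f in allowed for f in fields):
                    kept.append(",".join(fields))
            else:
                permitted = [f for f in fields if f in allowed]
                if permitted:
                    kept.append(",".join(permitted))
        if kept:
            cleaned[name] = kept
    # Without fl Solr returns its default field list, so always set one.
    if "fl" not in cleaned:
        cleaned["fl"] = [",".join(sorted(allowed))]
    return cleaned


request = {"q": ["*:*"], "fl": ["id,name,salary"], "facet.field": ["salary", "name"]}
print(filter_params(request, {"id", "name"}))
```

Anything the sketch does not recognise as a plain allowed field name (an 
alias, a glob, a local-params prefix) is dropped, which errs on the side of 
hiding too much.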


kr
Douglas


> On 30 Oct 2015, at 07:37, Scott Stults <sstu...@opensourceconnections.com> 
> wrote:
> 
> Douglas,
> 
> Managing a per-user-group whitelist of fields outside of Solr seems the
> best approach. When the query comes in you can then filter out any fields
> not contained in the whitelist before you send the request to Solr. The
> easy part will be to do that on URL parameters like fl. Depending on how
> your app generates the actual query string, you may want to also scan that
> for fielded query clauses (eg "badfield:value") and localParams (eg
> "{!dismax qf=badfield}value").
> 
> Secondly, you can map internal Solr fields to aliases using this syntax in
> the fl parameter: "display_name:real_solr_name". So when the request comes
> in from your app, first you'll map from the requested field alias names to
> internal Solr names (while enforcing the whitelist), and then in the fl
> parameter supply the aliases you want sent in the response.
> 
> 
> k/r,
> Scott
> 
> On Wed, Oct 28, 2015 at 6:58 PM, Douglas McGilvray <d...@weemondo.com> wrote:
> 
>> Hi all,
>> 
>> First I’d like to say the nested facets and the json facet api in
>> particular have made my world much better, I thank everyone involved, you
>> are all awesome.
>> 
>> My implementation has much of the Solr query building done in the
>> browser; Solr is behind a PHP server which acts as “proxy” and doorman,
>> filtering at the document level according to user role and supplying some
>> sensible maximums …
>> 
>> However we now wish to filter just one or two potentially sensitive fields
>> in one document type according to user role (as determined in the PHP
>> proxy). Duplicating documents (or cores) seems like overkill for just two
>> fields in one document type .. I wondered if it would be feasible (in the
>> interests of preventing malicious activity) to filter the query itself,
>> whether it be parameters (fl, facet.field, terms, etc) … or even deny any
>> request in which the fieldname occurs …
>> 
>> Is there some way someone might obscure a fieldname in a request?
>> 
>> Kind Regards & thanks in advance,
>> Douglas
> 
> 
> 
> 
> -- 
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com



Securing field level access permission by filtering the query itself

2015-10-28 Thread Douglas McGilvray
Hi all,

First I’d like to say the nested facets and the json facet api in particular 
have made my world much better, I thank everyone involved, you are all awesome.

My implementation has much of the Solr query building done in the browser; 
Solr is behind a PHP server which acts as “proxy” and doorman, 
filtering at the document level according to user role and supplying some 
sensible maximums …

However we now wish to filter just one or two potentially sensitive fields in 
one document type according to user role (as determined in the PHP proxy). 
Duplicating documents (or cores) seems like overkill for just two fields in one 
document type .. I wondered if it would be feasible (in the interests of 
preventing malicious activity) to filter the query itself, whether it be 
parameters (fl, facet.field, terms, etc) … or even deny any request in which 
the fieldname occurs … 

Is there some way someone might obscure a fieldname in a request?
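
A minimal Python sketch of the "deny any request in which the fieldname 
occurs" idea. It URL-decodes every parameter before scanning, since 
percent-encoding is one obvious way to disguise a name. The protected field 
names and the mentions_protected_field helper are hypothetical.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical protected field names.
PROTECTED = {"salary", "home_address"}

# Match a protected name only when it is not part of a longer identifier.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    + "|".join(re.escape(name) for name in sorted(PROTECTED))
    + r")(?![A-Za-z0-9_])"
)


def mentions_protected_field(raw_query_string):
    """True if any decoded parameter name or value names a protected field."""
    # parse_qsl percent-decodes names and values, so "sal%61ry" is seen as "salary".
    for name, value in parse_qsl(raw_query_string, keep_blank_values=True):
        if _PATTERN.search(name) or _PATTERN.search(value):
            return True
    return False


print(mentions_protected_field("q=sal%61ry%3A1"))        # True
print(mentions_protected_field("q=name%3Abob&fl=id"))    # False
```

A scan like this is not sufficient by itself: fl accepts globs, so a request 
for fl=sal* or fl=* never spells out the name. It works better as a second 
check alongside a whitelist than as the only defence.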

Kind Regards & thanks in advance,
Douglas

Re: Drill down facet for multi valued groups of fields

2015-10-05 Thread Douglas McGilvray
Hi Alessandro, thanks for the reply!

I wasn’t aware of nested documents; as you say, it seems to be precisely what 
I need .. in fact it looks like I plagiarised that article while writing my 
description hehe. Upgrading from 4.10 is a bit of work, but it might just be 
worth it. 
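
A minimal Python sketch of how the company/location/year groups might be 
modelled as nested documents, assuming Solr's _childDocuments_ JSON key and 
the {!parent} block join query parser. The doc_type field, its values, the id 
scheme and both helper names are hypothetical.

```python
import json


def to_nested_doc(doc_id, groups):
    """Wrap each group as a child document under one parent document."""
    children = [
        dict(id=f"{doc_id}-{i}", doc_type="employment", **group)
        for i, group in enumerate(groups, 1)
    ]
    return {"id": doc_id, "doc_type": "person", "_childDocuments_": children}


def parent_query(**child_filters):
    """Block join query: parents with ONE child matching all the filters."""
    clauses = " ".join(f'+{field}:"{value}"' for field, value in child_filters.items())
    return f'{{!parent which="doc_type:person"}}{clauses}'


doc = to_nested_doc(
    "doc1",
    [
        {"company": "Bolts", "location": "NY", "year": 2002},
        {"company": "Nuts", "location": "SF", "year": 2010},
    ],
)
print(json.dumps(doc, indent=2))
print(parent_query(company="Bolts", location="SF"))
```

Because both clauses must match within a single child, Bolts and SF no longer 
match across different groups of the same parent.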

Many Thanks!


Douglas


> On 5 Oct 2015, at 07:46, Alessandro Benedetti <benedetti.ale...@gmail.com> 
> wrote:
> 
> Hi Douglas !
> Your use case is a really good fit for Nested Objects in Solr[1]
> After you model your problem in nested objects, you should play a little
> bit with faceting at different levels ( parent/children).
> Pivot faceting can be good in some scenarios, probably not in yours.
> I would suggest explaining in more detail how you want to search over your
> documents.
> After that you can think about how to facet over the children.
> 
> Cheers
> 
> [1] http://yonik.com/solr-nested-objects/
> 
> 2015-10-02 17:48 GMT+01:00 Douglas McGilvray <d...@weemondo.com>:
> 
>> Hi everyone, my first post to the list! I tried and failed to explain this
>> on IRC, I hope I can do a better job here.
>> 
>> My document has a group of text fields: company, location, year. The group
>> can have multiple values and I would like to facet (drill down) beginning
>> with any of the three fields. The order of the groups is not important.
>> 
>> Example Doc1:
>> {company1: Bolts, location1: NY, year1: 2002}
>> {company2: Nuts,  location2: SF, year2: 2010}
>> 
>> If I select two filters: fq=company:Bolts && fq=location:SF, Doc1 should
>> not be in the results, because although the two individual values occur in
>> the document, they are not within the same group.
>> 
>> Following the instructions for facet.prefix based drill down (the link
>> will explain this far better than I can)
>> 
>> https://wiki.apache.org/solr/HierarchicalFaceting#A.27facet.prefix.27__Based_Drill_Down
>> I can create a custom field, let's call it cly, which represents a
>> drill-down hierarchy company > location > year
>> So for the document above it would contain the following:
>> 
>> 0:Bolts
>> 1:Bolts>NY
>> 2:Bolts>NY>2002
>> 0:Nuts
>> 1:Nuts>SF
>> 2:Nuts>SF>2010
>> 
>> I can retrieve the facets for the Company using: facet.field={!key=company
>> facet.prefix="0:"}cly
>> 
>> If the user selects the company Bolts, I can filter the values using:
>> fq=cly:"0:Bolts"
>> And I can retrieve the facets for the location using
>> facet.field={!key=location facet.prefix="1:Bolts"}cly
>> 
>> This is fine if I want to drill down company > location > year, but what if,
>> after selecting company, I now want to select year? I would have to make a
>> field for each ordering of the three values: cly, cyl, lyc …
>> 
>> If the user selects Bolts, I can now retrieve the facets for year using
>> facet.field={!key=year facet.prefix="1:Bolts"}cyl (NB the order of the
>> letters here)
>> 
>> I hope the above makes sense, even if the idea itself is completely crazy.
>> Obviously the number of extra fields is factorial. I can't believe I am the
>> first person to want to do this type of search, which makes me think there
>> is probably another (better) way to do this. Is there?
>> 
>> Kind Regards and many thanks in advance,
>> Douglas
> 
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Drill down facet for multi valued groups of fields

2015-10-02 Thread Douglas McGilvray
Hi everyone, my first post to the list! I tried and failed to explain this on 
IRC, I hope I can do a better job here.   

My document has a group of text fields: company, location, year. The group can 
have multiple values and I would like to facet (drill down) beginning with any 
of the three fields. The order of the groups is not important.

Example Doc1: 
{company1: Bolts, location1: NY, year1: 2002}
{company2: Nuts,  location2: SF, year2: 2010}

If I select two filters: fq=company:Bolts && fq=location:SF, Doc1 should not be 
in the results, because although the two individual values occur in the 
document, they are not within the same group.  
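
A small Python sketch of the problem, assuming the two groups of Doc1 are 
flattened into independent multivalued fields. The function names are made up 
for illustration; no Solr is involved, the sketch only models the matching 
logic.

```python
doc1_groups = [
    {"company": "Bolts", "location": "NY", "year": 2002},
    {"company": "Nuts", "location": "SF", "year": 2010},
]


def flat_match(groups, **filters):
    """Each filter may be satisfied by a different group (flattened fields)."""
    return all(
        any(group[field] == value for group in groups)
        for field, value in filters.items()
    )


def grouped_match(groups, **filters):
    """All filters must be satisfied by the same group (the desired behaviour)."""
    return any(
        all(group[field] == value for field, value in filters.items())
        for group in groups
    )


print(flat_match(doc1_groups, company="Bolts", location="SF"))     # True
print(grouped_match(doc1_groups, company="Bolts", location="SF"))  # False
```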

Following the instructions for facet.prefix based drill down (the link will 
explain this far better than I can)
https://wiki.apache.org/solr/HierarchicalFaceting#A.27facet.prefix.27__Based_Drill_Down
I can create a custom field, let's call it cly, which represents a drill-down 
hierarchy company > location > year
So for the document above it would contain the following:

0:Bolts
1:Bolts>NY
2:Bolts>NY>2002
0:Nuts
1:Nuts>SF
2:Nuts>SF>2010
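
A minimal Python sketch of generating these values at index time, one 
depth-prefixed token per level of the hierarchy. The hierarchy_tokens helper 
is a hypothetical name; the "depth:path" format and the ">" separator are 
taken from the listing above.

```python
def hierarchy_tokens(group, order, sep=">"):
    """Return one 'depth:path' token for each level of the given field order."""
    values = [str(group[field]) for field in order]
    return [
        f"{depth}:{sep.join(values[:depth + 1])}"
        for depth in range(len(values))
    ]


groups = [
    {"company": "Bolts", "location": "NY", "year": 2002},
    {"company": "Nuts", "location": "SF", "year": 2010},
]
cly = [
    token
    for group in groups
    for token in hierarchy_tokens(group, ("company", "location", "year"))
]
print(cly)
```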

I can retrieve the facets for the Company using: facet.field={!key=company 
facet.prefix="0:"}cly

If the user selects the company Bolts, I can filter the values using: 
fq=cly:"0:Bolts"
And I can retrieve the facets for the location using facet.field={!key=location 
facet.prefix="1:Bolts"}cly
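
A minimal Python sketch of assembling these two parameters for any step of the 
drill down. The drilldown_params helper is hypothetical, and it assumes values 
contain neither double quotes nor the ">" separator. One deliberate difference 
from the examples above: the prefix ends with the separator, so that "Bolts" 
does not also match a longer company name beginning with "Bolts".

```python
def drilldown_params(field, selected, next_key, sep=">"):
    """Build fq and facet.field pairs for the next level of the drill down.

    selected is the list of values already chosen, in hierarchy order.
    """
    params = []
    depth = len(selected)
    if selected:
        path = sep.join(selected)
        # Restrict results to documents containing the chosen path.
        params.append(("fq", f'{field}:"{depth - 1}:{path}"'))
        prefix = f"{depth}:{path}{sep}"
    else:
        prefix = "0:"
    # Ask for the facet values one level below the chosen path.
    params.append(
        ("facet.field", f'{{!key={next_key} facet.prefix="{prefix}"}}{field}')
    )
    return params


print(drilldown_params("cly", [], "company"))
print(drilldown_params("cly", ["Bolts"], "location"))
```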

This is fine if I want to drill down company > location > year, but what if, 
after selecting company, I now want to select year? I would have to make a 
field for each ordering of the three values: cly, cyl, lyc …
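
A short Python sketch enumerating the fields this would need, one per ordering 
of the three group fields. The naming scheme (first letter of each field) 
follows the cly/cyl/lyc examples; the ordering_field_names helper is 
hypothetical.

```python
from itertools import permutations

FIELDS = ("company", "location", "year")


def ordering_field_names(fields=FIELDS):
    """Map each hierarchy field name (e.g. 'cly') to its field ordering."""
    return {
        "".join(name[0] for name in order): order
        for order in permutations(fields)
    }


orderings = ordering_field_names()
print(sorted(orderings))  # ['cly', 'cyl', 'lcy', 'lyc', 'ycl', 'ylc']
print(len(orderings))     # 6, i.e. 3 factorial
```

With n fields in the group this yields n! hierarchy fields, each holding n 
tokens per group, so a fourth field would already mean 24 fields.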

If the user selects Bolts, I can now retrieve the facets for year using 
facet.field={!key=year facet.prefix="1:Bolts"}cyl (NB the order of the letters 
here)

I hope the above makes sense, even if the idea itself is completely crazy. 
Obviously the number of extra fields grows factorially with the number of 
fields in the group. I can't believe I am the first person to want to do this 
type of search, which makes me think there is probably another (better) way to 
do this. Is there?

Kind Regards and many thanks in advance,
Douglas