Error when using block-join filters in the JSON API

2015-10-08 Thread Iana Bondarska
Hello,
I'm trying to use the block join feature with the JSON API. I get the
following error when I add a query with a "which parent" or "child of"
prefix to a query facet.
My query is :

{!parent which="state:Idaho"} AND category:Books

If I remove the block-join prefixes, the query runs without errors. Are such
filters supported by the JSON API now?

Best Regards,
Iana


Re: Instant Page Previews

2015-10-08 Thread Charlie Hull

On 08/10/2015 09:00, Paul Libbrecht wrote:

This is a very nice start Charlie,


Thanks! I just hope it's not too elderly to serve as a basis.


I'd warn a bit, however, about the value of such previews: automated
previews of web pages can be quite far from what users might
remember a page looking like. In particular, tool pages
typically show a quite "empty" or "initial" state in such automatic
previewers.


This wasn't for webpages, but rather for content in an enterprise search 
application - Office files, PDFs etc. It's a common feature in closed 
source enterprise search engines.


For i2geo.net, I searched for such a solution (a tick longer than 6
years ago!) and failed to find a successful one. Instead, we built in a
signed applet (yes, this is old) where users could screenshot previews.
To my taste, this allows a far far better feeling, but of course, it
requires a community approach.


Yes...and in an enterprise situation, this will depend on users spending 
time working on enhancing content, which is a battle seldom won :)


Charlie



Maybe both are needed if there's an infinite budget...

Paul


Charlie Hull 
8 October 2015 09:48

Hi Lewin,

We built this feature for another search engine (based on Xapian,
which I doubt many people have heard of) a long while ago. It's
standalone and open source though so should be applicable:
https://github.com/flaxsearch/flaxcode/tree/master/flax_basic/libs/previewgen

It uses a headless version of Open Office under the hood to generate
thumbnail previews for various common file types, plus some
ImageMagick for PDF, all wrapped up in Python. Bear in mind this is 6
years old so some updating might be required!

Cheers

Charlie


Lewin Joy (TMS) 
7 October 2015 19:49
Hi,

Is there any way we can implement instant page previews in Solr?
Just saw that Google Search Appliance has this out of the box.
Just like what google.com had previously. We need to display the
content of the result record when hovering over the link.

Thanks,
Lewin

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread John Smith
The ids are all different: they're unique numbers followed by a couple
of keywords. I've made a test with a small collection of 10 documents to
make sure I can manage them manually: all ids are confirmed as different.

I also dumped the exact command, here's one example:

<add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>

It's sent as the body of a POST request to
http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
Content-Type: text/xml header. I still noted the consistent loss of
another document with the update above.
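
(For reference, the same update as a curl test, using the URL and body
above:)

curl 'http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true' \
  -H 'Content-Type: text/xml' \
  -d '<add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>'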

John


On 08/10/15 00:38, Upayavira wrote:
> What ID are you using? Are you possibly using the same ID field for
> both, so the second document you visit causes the first to be
> overwritten?
>
> Upayavira
>
> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>> This certainly should not be happening. I'd
>> take a careful look at what you actually send.
>> My _guess_ is that you're not sending the update
>> command you think you are
>>
>> As a test you could just curl (or use post.jar) to
>> send these types of commands up individually.
>>
>> Perhaps looking at the solr log would help too...
>>
>> Best,
>> Erick
>>
>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
>> wrote:
>>> Hi,
>>>
>>> I'm bumping on the following problem with update XML messages. The idea
>>> is to record the number of clicks for a document: each time, a message
>>> is sent to .../update such as this one:
>>>
>>> <add>
>>> <doc>
>>> <field name="id">abc</field>
>>> <field name="Clicks" update="set">1</field>
>>> <field name="Boost" update="set">1.05</field>
>>> </doc>
>>> </add>
>>>
>>> (Clicks is an int field; Boost is a float field, it's updated to reflect
>>> the change in popularity using a formula based on the number of clicks).
>>>
>>> At the moment in the dev environment, changes are committed immediately.
>>>
>>>
>>> When a document is updated, the changes are indeed reflected in the
>>> search results. If I click on the same document again, all goes well.
>>> But when I click on another document, the latter gets updated as
>>> expected but the former is plainly deleted. It can no longer be found
>>> and the admin core Overview page counts 1 document less. If I click on a
>>> 3rd document, so goes the 2nd one.
>>>
>>>
>>> The schema is the default one amended to remove unneeded fields and add
>>> new ones, nothing fancy. All fields are stored="true" and there's no
>>> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>>> the same outcome. It looks like a bug to me but I might have overlooked
>>> something? This is my first attempt at atomic updates.
>>>
>>> Thanks,
>>> John.
>>>



Re: Exclude documents having same data in two fields

2015-10-08 Thread NutchDev
One option could be creating another boolean field, field1_equals_field2, and
setting it to true at index time for documents where the two fields match. Then
use this field as a filter criterion when querying Solr.
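
A sketch of what that could look like (field names as above; the equality
flag is computed by the indexing client):

<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="field1">foo</field>
    <field name="field2">foo</field>
    <field name="field1_equals_field2">true</field>
  </doc>
</add>

Then exclude the matches at query time with fq=-field1_equals_field2:true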



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233411.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fuzzy search for names and phrases

2015-10-08 Thread NutchDev
WordDelimiterFilterFactory can handle cases like,

wi-fi ==> wifi
SD500 ==> sd 500
PowerShot ==> Power Shot

you can get more information at wiki page here,
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
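
For illustration, a minimal fieldType wiring the filter in might look like
this (the surrounding analyzer choices are assumptions, adjust to taste):

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>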



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fuzzy-search-for-names-and-phrases-tp4233209p4233413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread Upayavira
Look for the DedupUpdateProcessor in an update chain.

that is there, but commented out IIRC in the techproducts sample
configs.

Perhaps you uncommented it to use your own update processors, but didn't
remove that component?
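
For reference, the sample chain in solrconfig.xml looks roughly like this
(quoted from memory, so treat it as a sketch; note overwriteDupes, which,
when true, deletes earlier documents sharing the same signature):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>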

On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> INFO level, the update request just gets mentioned. No exception. I
> reran it with the DEBUG level, but most of the log was related to jetty.
> Here's a line I noticed though:
> 
> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> {wt=json&commit=true&update.chain=dedupe}
> 
> The update.chain parameter wasn't part of the original request, and
> "dedupe" looks suspicious to me. Perhaps I should investigate further
> there?
> 
> Thanks,
> John.
> 
> 
> On 08/10/15 08:25, John Smith wrote:
> > The ids are all different: they're unique numbers followed by a couple
> > of keywords. I've made a test with a small collection of 10 documents to
> > make sure I can manage them manually: all ids are confirmed as different.
> >
> > I also dumped the exact command, here's one example:
> >
> > <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
> >
> > It's sent as the body of a POST request to
> > http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> > Content-Type: text/xml header. I still noted the consistent loss of
> > another document with the update above.
> >
> > John
> >
> >
> > On 08/10/15 00:38, Upayavira wrote:
> >> What ID are you using? Are you possibly using the same ID field for
> >> both, so the second document you visit causes the first to be
> >> overwritten?
> >>
> >> Upayavira
> >>
> >> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> >>> This certainly should not be happening. I'd
> >>> take a careful look at what you actually send.
> >>> My _guess_ is that you're not sending the update
> >>> command you think you are
> >>>
> >>> As a test you could just curl (or use post.jar) to
> >>> send these types of commands up individually.
> >>>
> >>> Perhaps looking at the solr log would help too...
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
> >>> wrote:
>  Hi,
> 
>  I'm bumping on the following problem with update XML messages. The idea
>  is to record the number of clicks for a document: each time, a message
>  is sent to .../update such as this one:
> 
>  <add>
>  <doc>
>  <field name="id">abc</field>
>  <field name="Clicks" update="set">1</field>
>  <field name="Boost" update="set">1.05</field>
>  </doc>
>  </add>
> 
>  (Clicks is an int field; Boost is a float field, it's updated to reflect
>  the change in popularity using a formula based on the number of clicks).
> 
>  At the moment in the dev environment, changes are committed immediately.
> 
> 
>  When a document is updated, the changes are indeed reflected in the
>  search results. If I click on the same document again, all goes well.
>  But when I click on another document, the latter gets updated as
>  expected but the former is plainly deleted. It can no longer be found
>  and the admin core Overview page counts 1 document less. If I click on a
>  3rd document, so goes the 2nd one.
> 
> 
>  The schema is the default one amended to remove unneeded fields and add
>  new ones, nothing fancy. All fields are stored="true" and there's no
>  <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>  the same outcome. It looks like a bug to me but I might have overlooked
>  something? This is my first attempt at atomic updates.
> 
>  Thanks,
>  John.
> 
> >
> 


Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread John Smith
After some further investigation, for those interested: the
SignatureUpdateProcessorFactory fields were somehow mis-configured (I
guess copied over from another collection). The initial import had been
made using a data import handler: I suppose the update chain isn't
called in this process and no signature field is created - am I right?

The first time a document was updated, a signature field with value
"" was added. The next time, the same signature was
generated for the new update, which triggered the deletion of all
documents with the same signature (i.e. the first one) as overwriteDupes
was set to true. Correct behavior but quite tricky...

So my conclusion here (please correct me if I'm wrong) is of course to
fix the signature configuration problem, but also to manage calling the
update chain (or maybe a simplified one, e.g. by skipping logging) in
the data import handler. Is there an easy way to do this? Conceptually,
shouldn't the update chain be callable from the data import process -
maybe it is?
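
(A sketch of what that might look like, assuming the DataImportHandler
honors the update.chain parameter like the other update handlers - I
haven't verified this:)

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>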

John


On 08/10/15 09:43, Upayavira wrote:
> Yay!
>
> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>> Yes indeed, the update chain had been activated... I commented it out
>> again and the problem vanished.
>>
>> Good job, thanks Erick and Upayavira!
>> John
>>
>>
>> On 08/10/15 08:58, Upayavira wrote:
>>> Look for the DedupUpdateProcessor in an update chain.
>>>
>>> that is there, but commented out IIRC in the techproducts sample
>>> configs.
>>>
>>> Perhaps you uncommented it to use your own update processors, but didn't
>>> remove that component?
>>>
>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
 Oh, I forgot Erick's mention of the logs: there's nothing unusual in
 INFO level, the update request just gets mentioned. No exception. I
 reran it with the DEBUG level, but most of the log was related to jetty.
 Here's a line I noticed though:

 org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
 {wt=json&commit=true&update.chain=dedupe}

 The update.chain parameter wasn't part of the original request, and
 "dedupe" looks suspicious to me. Perhaps I should investigate further
 there?

 Thanks,
 John.


 On 08/10/15 08:25, John Smith wrote:
> The ids are all different: they're unique numbers followed by a couple
> of keywords. I've made a test with a small collection of 10 documents to
> make sure I can manage them manually: all ids are confirmed as different.
>
> I also dumped the exact command, here's one example:
>
> <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
>
> It's sent as the body of a POST request to
> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> Content-Type: text/xml header. I still noted the consistent loss of
> another document with the update above.
>
> John
>
>
> On 08/10/15 00:38, Upayavira wrote:
>> What ID are you using? Are you possibly using the same ID field for
>> both, so the second document you visit causes the first to be
>> overwritten?
>>
>> Upayavira
>>
>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>> This certainly should not be happening. I'd
>>> take a careful look at what you actually send.
>>> My _guess_ is that you're not sending the update
>>> command you think you are
>>>
>>> As a test you could just curl (or use post.jar) to
>>> send these types of commands up individually.
>>>
>>> Perhaps looking at the solr log would help too...
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
>>> wrote:
 Hi,

 I'm bumping on the following problem with update XML messages. The idea
 is to record the number of clicks for a document: each time, a message
 is sent to .../update such as this one:

 <add>
 <doc>
 <field name="id">abc</field>
 <field name="Clicks" update="set">1</field>
 <field name="Boost" update="set">1.05</field>
 </doc>
 </add>

 (Clicks is an int field; Boost is a float field, it's updated to 
 reflect
 the change in popularity using a formula based on the number of 
 clicks).

 At the moment in the dev environment, changes are committed 
 immediately.


 When a document is updated, the changes are indeed reflected in the
 search results. If I click on the same document again, all goes well.
 But when I click on another document, the latter gets updated as
 expected but the former is plainly deleted. It can no longer be found
 and the admin core Overview page counts 1 document less. If I click on 
 a
 3rd document, so goes the 2nd one.


 The schema is the default one amended to remove unneeded fields and add
 new ones, nothing fancy. All 

Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread John Smith
Yes indeed, the update chain had been activated... I commented it out
again and the problem vanished.

Good job, thanks Erick and Upayavira!
John


On 08/10/15 08:58, Upayavira wrote:
> Look for the DedupUpdateProcessor in an update chain.
>
> that is there, but commented out IIRC in the techproducts sample
> configs.
>
> Perhaps you uncommented it to use your own update processors, but didn't
> remove that component?
>
> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>> INFO level, the update request just gets mentioned. No exception. I
>> reran it with the DEBUG level, but most of the log was related to jetty.
>> Here's a line I noticed though:
>>
>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>> {wt=json&commit=true&update.chain=dedupe}
>>
>> The update.chain parameter wasn't part of the original request, and
>> "dedupe" looks suspicious to me. Perhaps I should investigate further
>> there?
>>
>> Thanks,
>> John.
>>
>>
>> On 08/10/15 08:25, John Smith wrote:
>>> The ids are all different: they're unique numbers followed by a couple
>>> of keywords. I've made a test with a small collection of 10 documents to
>>> make sure I can manage them manually: all ids are confirmed as different.
>>>
>>> I also dumped the exact command, here's one example:
>>>
>>> <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
>>>
>>> It's sent as the body of a POST request to
>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>> Content-Type: text/xml header. I still noted the consistent loss of
>>> another document with the update above.
>>>
>>> John
>>>
>>>
>>> On 08/10/15 00:38, Upayavira wrote:
 What ID are you using? Are you possibly using the same ID field for
 both, so the second document you visit causes the first to be
 overwritten?

 Upayavira

 On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> This certainly should not be happening. I'd
> take a careful look at what you actually send.
> My _guess_ is that you're not sending the update
> command you think you are
>
> As a test you could just curl (or use post.jar) to
> send these types of commands up individually.
>
> Perhaps looking at the solr log would help too...
>
> Best,
> Erick
>
> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
> wrote:
>> Hi,
>>
>> I'm bumping on the following problem with update XML messages. The idea
>> is to record the number of clicks for a document: each time, a message
>> is sent to .../update such as this one:
>>
>> <add>
>> <doc>
>> <field name="id">abc</field>
>> <field name="Clicks" update="set">1</field>
>> <field name="Boost" update="set">1.05</field>
>> </doc>
>> </add>
>>
>> (Clicks is an int field; Boost is a float field, it's updated to reflect
>> the change in popularity using a formula based on the number of clicks).
>>
>> At the moment in the dev environment, changes are committed immediately.
>>
>>
>> When a document is updated, the changes are indeed reflected in the
>> search results. If I click on the same document again, all goes well.
>> But when I click on another document, the latter gets updated as
>> expected but the former is plainly deleted. It can no longer be found
>> and the admin core Overview page counts 1 document less. If I click on a
>> 3rd document, so goes the 2nd one.
>>
>>
>> The schema is the default one amended to remove unneeded fields and add
>> new ones, nothing fancy. All fields are stored="true" and there's no
>> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>> the same outcome. It looks like a bug to me but I might have overlooked
>> something? This is my first attempt at atomic updates.
>>
>> Thanks,
>> John.
>>



Re: Instant Page Previews

2015-10-08 Thread Paul Libbrecht
This is a very nice start Charlie,

I'd warn a bit, however, about the value of such previews: automated
previews of web pages can be quite far from what users might
remember a page looking like. In particular, tool pages
typically show a quite "empty" or "initial" state in such automatic
previewers.

For i2geo.net, I searched for such a solution (a tick longer than 6
years ago!) and failed to find a successful one. Instead, we built in a
signed applet (yes, this is old) where users could screenshot previews.
To my taste, this allows a far far better feeling, but of course, it
requires a community approach.

Maybe both are needed if there's an infinite budget...

Paul

> Charlie Hull 
> 8 October 2015 09:48
>
> Hi Lewin,
>
> We built this feature for another search engine (based on Xapian,
> which I doubt many people have heard of) a long while ago. It's
> standalone and open source though so should be applicable:
> https://github.com/flaxsearch/flaxcode/tree/master/flax_basic/libs/previewgen
>
> It uses a headless version of Open Office under the hood to generate
> thumbnail previews for various common file types, plus some
> ImageMagick for PDF, all wrapped up in Python. Bear in mind this is 6
> years old so some updating might be required!
>
> Cheers
>
> Charlie
>
>
> Lewin Joy (TMS) 
> 7 October 2015 19:49
> Hi,
>
> Is there any way we can implement instant page previews in Solr?
> Just saw that Google Search Appliance has this out of the box.
> Just like what google.com had previously. We need to display the
> content of the result record when hovering over the link.
>
> Thanks,
> Lewin
>
>
>
>



Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread John Smith
Oh, I forgot Erick's mention of the logs: there's nothing unusual in
INFO level, the update request just gets mentioned. No exception. I
reran it with the DEBUG level, but most of the log was related to jetty.
Here's a line I noticed though:

org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
{wt=json&commit=true&update.chain=dedupe}

The update.chain parameter wasn't part of the original request, and
"dedupe" looks suspicious to me. Perhaps I should investigate further there?

Thanks,
John.


On 08/10/15 08:25, John Smith wrote:
> The ids are all different: they're unique numbers followed by a couple
> of keywords. I've made a test with a small collection of 10 documents to
> make sure I can manage them manually: all ids are confirmed as different.
>
> I also dumped the exact command, here's one example:
>
> <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
>
> It's sent as the body of a POST request to
> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> Content-Type: text/xml header. I still noted the consistent loss of
> another document with the update above.
>
> John
>
>
> On 08/10/15 00:38, Upayavira wrote:
>> What ID are you using? Are you possibly using the same ID field for
>> both, so the second document you visit causes the first to be
>> overwritten?
>>
>> Upayavira
>>
>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>> This certainly should not be happening. I'd
>>> take a careful look at what you actually send.
>>> My _guess_ is that you're not sending the update
>>> command you think you are
>>>
>>> As a test you could just curl (or use post.jar) to
>>> send these types of commands up individually.
>>>
>>> Perhaps looking at the solr log would help too...
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
>>> wrote:
 Hi,

 I'm bumping on the following problem with update XML messages. The idea
 is to record the number of clicks for a document: each time, a message
 is sent to .../update such as this one:

 <add>
 <doc>
 <field name="id">abc</field>
 <field name="Clicks" update="set">1</field>
 <field name="Boost" update="set">1.05</field>
 </doc>
 </add>

 (Clicks is an int field; Boost is a float field, it's updated to reflect
 the change in popularity using a formula based on the number of clicks).

 At the moment in the dev environment, changes are committed immediately.


 When a document is updated, the changes are indeed reflected in the
 search results. If I click on the same document again, all goes well.
 But when I click on another document, the latter gets updated as
 expected but the former is plainly deleted. It can no longer be found
 and the admin core Overview page counts 1 document less. If I click on a
 3rd document, so goes the 2nd one.


 The schema is the default one amended to remove unneeded fields and add
 new ones, nothing fancy. All fields are stored="true" and there's no
 <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
 the same outcome. It looks like a bug to me but I might have overlooked
 something? This is my first attempt at atomic updates.

 Thanks,
 John.

>



Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread Upayavira
Yay!

On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
> Yes indeed, the update chain had been activated... I commented it out
> again and the problem vanished.
> 
> Good job, thanks Erick and Upayavira!
> John
> 
> 
> On 08/10/15 08:58, Upayavira wrote:
> > Look for the DedupUpdateProcessor in an update chain.
> >
> > that is there, but commented out IIRC in the techproducts sample
> > configs.
> >
> > Perhaps you uncommented it to use your own update processors, but didn't
> > remove that component?
> >
> > On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> >> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> >> INFO level, the update request just gets mentioned. No exception. I
> >> reran it with the DEBUG level, but most of the log was related to jetty.
> >> Here's a line I noticed though:
> >>
> >> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> >> {wt=json&commit=true&update.chain=dedupe}
> >>
> >> The update.chain parameter wasn't part of the original request, and
> >> "dedupe" looks suspicious to me. Perhaps I should investigate further
> >> there?
> >>
> >> Thanks,
> >> John.
> >>
> >>
> >> On 08/10/15 08:25, John Smith wrote:
> >>> The ids are all different: they're unique numbers followed by a couple
> >>> of keywords. I've made a test with a small collection of 10 documents to
> >>> make sure I can manage them manually: all ids are confirmed as different.
> >>>
> >>> I also dumped the exact command, here's one example:
> >>>
> >>> <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
> >>>
> >>> It's sent as the body of a POST request to
> >>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> >>> Content-Type: text/xml header. I still noted the consistent loss of
> >>> another document with the update above.
> >>>
> >>> John
> >>>
> >>>
> >>> On 08/10/15 00:38, Upayavira wrote:
>  What ID are you using? Are you possibly using the same ID field for
>  both, so the second document you visit causes the first to be
>  overwritten?
> 
>  Upayavira
> 
>  On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> > This certainly should not be happening. I'd
> > take a careful look at what you actually send.
> > My _guess_ is that you're not sending the update
> > command you think you are
> >
> > As a test you could just curl (or use post.jar) to
> > send these types of commands up individually.
> >
> > Perhaps looking at the solr log would help too...
> >
> > Best,
> > Erick
> >
> > On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
> > wrote:
> >> Hi,
> >>
> >> I'm bumping on the following problem with update XML messages. The idea
> >> is to record the number of clicks for a document: each time, a message
> >> is sent to .../update such as this one:
> >>
> >> <add>
> >> <doc>
> >> <field name="id">abc</field>
> >> <field name="Clicks" update="set">1</field>
> >> <field name="Boost" update="set">1.05</field>
> >> </doc>
> >> </add>
> >>
> >> (Clicks is an int field; Boost is a float field, it's updated to 
> >> reflect
> >> the change in popularity using a formula based on the number of 
> >> clicks).
> >>
> >> At the moment in the dev environment, changes are committed 
> >> immediately.
> >>
> >>
> >> When a document is updated, the changes are indeed reflected in the
> >> search results. If I click on the same document again, all goes well.
> >> But when I click on another document, the latter gets updated as
> >> expected but the former is plainly deleted. It can no longer be found
> >> and the admin core Overview page counts 1 document less. If I click on 
> >> a
> >> 3rd document, so goes the 2nd one.
> >>
> >>
> >> The schema is the default one amended to remove unneeded fields and add
> >> new ones, nothing fancy. All fields are stored="true" and there's no
> >> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
> >> the same outcome. It looks like a bug to me but I might have overlooked
> >> something? This is my first attempt at atomic updates.
> >>
> >> Thanks,
> >> John.
> >>
> 


Re: Lose Solr config on zookeeper when it is restarted

2015-10-08 Thread Upayavira
Are all instances of Solr the same version? Mixing versions could cause
what Erick describes.

On Thu, Oct 8, 2015, at 03:19 AM, Erick Erickson wrote:
> Sounds like you're somehow mixing old and new versions of the ZK state
> when you restart. I have no idea how that would be happening, but...
> 
> It would be consistent if you're somehow creating collections with the new
> format, where state.json is kept per collection, but on restart somehow
> defaulting to the old format, where there was one
> gigantic clusterstate.json.
> 
> One deceptive thing is that with the new format, the clusterstate.json
> node will exist but be empty, and underneath the collections node
> there'll be a state.json for that collection.
> 
> Best,
> Erick
> 
> On Wed, Oct 7, 2015 at 6:31 PM, CrazyDiamond 
> wrote:
> > ZK is stand-alone. But I think the Solr node is ephemeral.
> >
> >
> >
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Lose-Solr-config-on-zookeeper-when-it-is-restarted-tp421p4233376.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
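
One way to check which format the cluster is actually using is with
ZooKeeper's own zkCli.sh (a sketch; paths assume no chroot and the
collection name is hypothetical):

ls /collections/mycollection    # new format: contains a state.json node
get /clusterstate.json          # new format: node exists but is empty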


Re: Instant Page Previews

2015-10-08 Thread Charlie Hull

On 07/10/2015 18:49, Lewin Joy (TMS) wrote:

Hi,

Is there any way we can implement instant page previews in Solr?
Just saw that Google Search Appliance has this out of the box.
Just like what google.com had previously. We need to display the content of the 
result record when hovering over the link.

Thanks,
Lewin


Hi Lewin,

We built this feature for another search engine (based on Xapian, which 
I doubt many people have heard of) a long while ago. It's standalone and 
open source though so should be applicable:

https://github.com/flaxsearch/flaxcode/tree/master/flax_basic/libs/previewgen
It uses a headless version of Open Office under the hood to generate 
thumbnail previews for various common file types, plus some ImageMagick 
for PDF, all wrapped up in Python. Bear in mind this is 6 years old so 
some updating might be required!


Cheers

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



Re: Error when using block-join filters in the JSON API

2015-10-08 Thread Mikhail Khludnev
Hello, Yana!

It's not clear what happens. I'd appreciate it if you posted the exact
queries (with values obfuscated if needed) and the exceptions or actual
results (and expectations); sample data is also useful.
What I can note so far: user filters can't be used as the parent mask in
_which_ and _of_. See
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
In q={!parent which=<allParents>}<someChildren>, the allParents parameter
is a filter that matches only parent documents.
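
In other words, something like this (a sketch, assuming a marker field
such as doc_type:parent that matches every parent document):

q={!parent which="doc_type:parent"}c_gender:female    (parents having a matching child)
q={!child of="doc_type:parent"}state:Idaho            (children of matching parents)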



On Thu, Oct 8, 2015 at 11:46 AM, Iana Bondarska  wrote:

> Hello,
> I'm trying to use the block join feature with the JSON API. I get the
> following error when I add a query with a "which parent" or "child of"
> prefix to a query facet.
> My query is :
>
> {!parent which="state:Idaho"} AND category:Books
>
> If I remove the block-join prefixes, the query runs without errors. Are such
> filters supported by the JSON API now?
>
> Best Regards,
> Iana
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Exclude documents having same data in two fields

2015-10-08 Thread Aman Tandon
But I want to do it at run time, without indexing an extra field.

With Regards
Aman Tandon

On Thu, Oct 8, 2015 at 11:55 AM, NutchDev  wrote:

> One option could be creating another boolean field, field1_equals_field2,
> and setting it to true at index time for documents where the two fields
> match. Then use this field as a filter criterion when querying Solr.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233411.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to show some documents ahead of others

2015-10-08 Thread NutchDev
Hi Christian,

You can take a look at Solr's QueryElevationComponent.

It will allow you to configure the top results for a given query regardless
of the normal Lucene scoring. You can also specify an exclude list to
drop certain results for a particular query.
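
For illustration, an elevate.xml entry might look like this (query text
and ids are hypothetical):

<elevate>
  <query text="chocolate cake">
    <doc id="recipe-123" />
    <doc id="recipe-456" />
    <doc id="recipe-789" exclude="true" />
  </query>
</elevate>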





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: faceting is unusably slow since upgrade to 5.3.0

2015-10-08 Thread Uwe Reh

Sorry for the delay. I had an ugly flu.

SOLR-7730 seems to work fine. Using docValues with Solr 
5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again. 
(90ms vs. 2ms) :-)
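
(For anyone following along: docValues is just a field attribute in
schema.xml - field name and type are hypothetical - and adding it to an
existing field requires reindexing:)

<field name="subject_facet" type="string" indexed="true" stored="false" docValues="true" />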


Thanks
Uwe



On 27.09.2015 at 20:32, Mikhail Khludnev wrote:

On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh  wrote:


When 5.4 with SOLR-7730 is released, I will start to use docValues.
Going this way seems more straightforward to me.



Sure. Given your answers, docValues facets have a really good chance to
perform well in your index after SOLR-7730. It's really interesting to see
performance numbers on early 5.4 builds:
https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/





Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread Upayavira
You can either specify the update chain via an update.chain request
parameter, or you can configure a new request handler with its own URL
and a separate update.chain value.

I have no idea how you would then reference that in the DIH - I've never
really used it.
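
Concretely, the two options might look like this (handler name is
hypothetical): per request, .../update?commit=true&update.chain=dedupe, or
a dedicated handler in solrconfig.xml:

<requestHandler name="/update/dedupe" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>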

Upayavira

On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
> After some further investigation, for those interested: the
> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
> guess copied over from another collection). The initial import had been
> made using a data import handler: I suppose the update chain isn't
> called in this process and no signature field is created - am I right?
> 
> The first time a document was updated, a signature field with value
> "" was added. The next time, the same signature was
> generated for the new update, which triggered the deletion of all
> documents with the same signature (i.e. the first one) as overwriteDupes
> was set to true. Correct behavior but quite tricky...
> 
> So my conclusion here (please correct me if I'm wrong) is of course to
> fix the signature configuration problem, but also to manage calling the
> update chain (or maybe a simplified one, e.g. by skipping logging) in
> the data import handler. Is there an easy way to do this? Conceptually,
> shouldn't the update chain be callable from the data import process -
> maybe it is?
> 
> John
> 
> 
> On 08/10/15 09:43, Upayavira wrote:
> > Yay!
> >
> > On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
> >> Yes indeed, the update chain had been activated... I commented it out
> >> again and the problem vanished.
> >>
> >> Good job, thanks Erick and Upayavira!
> >> John
> >>
> >>
> >> On 08/10/15 08:58, Upayavira wrote:
> >>> Look for the DedupUpdateProcessor in an update chain.
> >>>
> >>> that is there, but commented out IIRC in the techproducts sample
> >>> configs.
> >>>
> >>> Perhaps you uncommented it to use your own update processors, but didn't
> >>> remove that component?
> >>>
> >>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>  Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>  INFO level, the update request just gets mentioned. No exception. I
>  reran it with the DEBUG level, but most of the log was related to jetty.
>  Here's a line I noticed though:
> 
>  org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>  {wt=json&commit=true&update.chain=dedupe}
> 
>  The update.chain parameter wasn't part of the original request, and
>  "dedupe" looks suspicious to me. Perhaps I should investigate further
>  there?
> 
>  Thanks,
>  John.
> 
> 
>  On 08/10/15 08:25, John Smith wrote:
> > The ids are all different: they're unique numbers followed by a couple
> > of keywords. I've made a test with a small collection of 10 documents to
> > make sure I can manage them manually: all ids are confirmed as 
> > different.
> >
> > I also dumped the exact command, here's one example:
> >
> > <add><doc><field name="id">101084385_Sebago_ sebago shoes</field><field name="Clicks" update="set">1</field><field name="Boost" update="set">1.8701925463775</field></doc></add>
> >
> > It's sent as the body of a POST request to
> > http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> > Content-Type: text/xml header. I still noted the consistent loss of
> > another document with the update above.
> >
> > John
> >
> >
> > On 08/10/15 00:38, Upayavira wrote:
> >> What ID are you using? Are you possibly using the same ID field for
> >> both, so the second document you visit causes the first to be
> >> overwritten?
> >>
> >> Upayavira
> >>
> >> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> >>> This certainly should not be happening. I'd
> >>> take a careful look at what you actually send.
> >>> My _guess_ is that you're not sending the update
> >>> command you think you are
> >>>
> >>> As a test you could just curl (or use post.jar) to
> >>> send these types of commands up individually.
> >>>
> >>> Perhaps looking at the solr log would help too...
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
> >>> wrote:
>  Hi,
> 
>  I'm bumping on the following problem with update XML messages. The 
>  idea
>  is to record the number of clicks for a document: each time, a 
>  message
>  is sent to .../update such as this one:
> 
>  <add>
>  <doc>
>  <field name="id">abc</field>
>  <field name="Clicks" update="set">1</field>
>  <field name="Boost" update="set">1.05</field>
>  </doc>
>  </add>
> 
>  (Clicks is an int field; Boost is a float field, it's updated to 
>  reflect
>  the change in popularity using a formula based on the number of 
>  clicks).
> 
>  At the moment in the dev environment, changes are committed 
>  immediately.

How to show some documents ahead of others

2015-10-08 Thread liviuchristian
Hi everybody, 
I'm building a recipe search engine based on solr. 
Paid postings must be listed on the front page, ahead of non-paid postings. 
When a user performs a query based on some keywords, Solr returns documents in 
decreasing order of their score. However, I don't know how to make paid 
postings that match the query appear ahead of the un-paid postings that 
match the query. 
How do I give paid postings an extra score so that I can list them on the 
first page? What else could be done?

Please advise, 

Much obliged, 
Christian 


Re: Error when using block-join filters in the JSON API

2015-10-08 Thread Iana Bondarska
Hello Mikhail,

here are json.facet parameters that I tried:
c_gender, c_window belong to child documents, rest - to parent.

1. This returns no results. Can we combine filters from different levels in
one query?

 { high_popularity : {
type : query,
q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
facet :{top_genres:{type: terms,field: c_gender}}
}
}

2. This triggers a full-text search: I get the error "undefined field:
\"Review_Text\"". It's true that I have a mistake in the configuration, but
I didn't request a full-text search in the query.

 { high_popularity : {
type : query,
q : "{!child of=city:Auburn}city:Auburn AND {child
of=state:Washingthon}state:Washingthon",
facet :{top_genres:{type: terms,field: c_gender}}
}
}
logs for 2nd case:

org.apache.solr.common.SolrException: undefined field: "Review_Text"
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1229)
at 
org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:769)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at 
org.apache.solr.search.join.BlockJoinParentQParser.parse(BlockJoinParentQParser.java:70)
at 
org.apache.solr.search.join.BlockJoinChildQParser.parse(BlockJoinChildQParser.java:25)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at 
org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
at 
org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
at 
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)
at 
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:229)
at 
org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:179)
at 
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:427)
at 
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:416)
at 
org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


2015-10-08 12:42 GMT+03:00 Mikhail Khludnev :

> Hello, Yana!
>
> 

Re: Exclude documents having same data in two fields

2015-10-08 Thread NutchDev
Hi Aman,

Have a look at this , it has query time approach also using Solr function
query,

http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields
http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
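
From those threads, the query-time approach boils down to a function-query
filter like the one below (a sketch assuming field1 and field2 are
single-valued string fields present on every document; strdist returns 1.0
for identical values, so this keeps only documents where the fields differ):

fq={!frange l=0 u=0.99}strdist(field1,field2,edit)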



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233489.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: faceting is unusably slow since upgrade to 5.3.0

2015-10-08 Thread Mikhail Khludnev
Uwe, it's good to know! I mean that you've recovered. Take care!

On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh  wrote:

> Sorry for the delay. I had an ugly flu.
>
> SOLR-7730 seems to work fine. Using docValues with Solr
> 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> (90ms vs. 2ms) :-)
>
> Thanks
> Uwe
>
>
>
>
> On 27.09.2015 at 20:32, Mikhail Khludnev wrote:
>
>> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
>> wrote:
>>
>> When 5.4 with SOLR-7730 is released, I will start to use docValues.
>>> Going this way seems more straightforward to me.
>>>
>>
>>
>> Sure. Given your answers, docValues facets have a really good chance to
>> perform well in your index after SOLR-7730. It's really interesting to see
>> performance numbers on early 5.4 builds:
>>
>> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Error when using block-join filters in the JSON API

2015-10-08 Thread Iana Bondarska
Sorry, I missed the example input data:
child document:
{ "c_gender": "female", "c_window": "seaview", "_root_": 1673891436 }

parent document:
{ "_id": 1673891436, "county_code": "26021", "city": "Auburn", "year": 2012,
"county": "Berrien", "Sales": 112808, "state": "Washington", "product_group":
"Books", "sku": "ZD111588", "income_bracket": "$25000 to $5", "_version_":
1513636454494896000, "_root_": 1673891436 },

2015-10-08 15:00 GMT+03:00 Iana Bondarska :

> Hello Mikhail,
>
> here are json.facet parameters that I tried:
> c_gender, c_window belong to child documents, rest - to parent.
>
> 1. This returns no results. Can we combine filters from different levels in
> one query?
>
>  { high_popularity : {
> type : query,
> q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> facet :{top_genres:{type: terms,field: c_gender}}
> }
> }
>
> 2. This triggers a full-text search: I get the error "undefined field:
> \"Review_Text\"". It's true that I have a mistake in the configuration, but
> I didn't request a full-text search in the query.
>
>  { high_popularity : {
> type : query,
> q : "{!child of=city:Auburn}city:Auburn AND {child
> of=state:Washingthon}state:Washingthon",
> facet :{top_genres:{type: terms,field: c_gender}}
> }
> }
> logs for 2nd case:
>
> org.apache.solr.common.SolrException: undefined field: "Review_Text"
>   at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1229)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:769)
>   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
>   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
>   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
>   at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
>   at 
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
>   at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
>   at 
> org.apache.solr.search.join.BlockJoinParentQParser.parse(BlockJoinParentQParser.java:70)
>   at 
> org.apache.solr.search.join.BlockJoinChildQParser.parse(BlockJoinChildQParser.java:25)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
>   at 
> org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
>   at 
> org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
>   at 
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)
>   at 
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:229)
>   at 
> org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:179)
>   at 
> org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:427)
>   at 
> org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:416)
>   at 
> org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at 

Re: Error when using block-join filters in the JSON API

2015-10-08 Thread Mikhail Khludnev
>
>  { high_popularity : {
> type : query,
> q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> facet :{top_genres:{type: terms,field: c_gender}}
> }
> }


I'm not sure about facets, but the query isn't correct; it should be something
like
  q : "+c_window:seaview +{!child of="_id:[* TO *]"}city:Auburn",
The "of" filter should match all parent docs! It's convenient to index a
parent:true marker field.

q : "{!child of=city:Auburn}city:Auburn AND {child
of=state:Washingthon}state:Washingthon",

it should be rewritten as (I'm ashamed to say why)

q : "+{!child of="_id:[* TO *]"}city:Auburn +{!child
of="_id:[* TO *]"}state:Washingthon",
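
Put back into a json.facet request, the corrected query facet might look
like this (a sketch using the field names from this thread):

 { high_popularity : {
     type : query,
     q : "+c_window:seaview +{!child of='_id:[* TO *]'}city:Auburn",
     facet : { top_genres : { type : terms, field : c_gender } }
   }
 }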

I don't believe that stack trace is caused by the shown request, because it
hiccups on query facet, which you didn't show:

at
org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
at
org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
at
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)

see
  public Object parseFacetOrStat(String key, String type, Object args)
throws SyntaxError {
// TODO: a place to register all these facet types?

if ("field".equals(type) || "terms".equals(type)) {
  return parseFieldFacet(key, args);
} else if ("query".equals(type)) {
  return parseQueryFacet(key, args);
} else if ("range".equals(type)) {
  return parseRangeFacet(key, args);
}




On Thu, Oct 8, 2015 at 4:08 PM, Iana Bondarska  wrote:

> sorry,missed example input data:
> child document:
> { "c_gender": "female", "c_window": "seaview", "_root_": 1673891436 }
>
> parent document:
> { "_id": 1673891436, "county_code": "26021", "city": "Auburn", "year":
> 2012,
> "county": "Berrien", "Sales": 112808, "state": "Washington",
> "product_group":
> "Books", "sku": "ZD111588", "income_bracket": "$25000 to $5",
> "_version_":
> 1513636454494896000, "_root_": 1673891436 },
>
> 2015-10-08 15:00 GMT+03:00 Iana Bondarska :
>
> > Hello Mikhail,
> >
> > here are json.facet parameters that I tried:
> > c_gender, c_window belong to child documents, rest - to parent.
> >
> > 1. This returns no results. Can we combine filters from different levels in
> > one query?
> >
> >  { high_popularity : {
> > type : query,
> > q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> > facet :{top_genres:{type: terms,field: c_gender}}
> > }
> > }
> >
> > 2. This triggers a full-text search: I get the error "undefined field:
> > \"Review_Text\"". It's true that I have a mistake in the configuration, but
> > I didn't request a full-text search in the query.
> >
> >  { high_popularity : {
> > type : query,
> > q : "{!child of=city:Auburn}city:Auburn AND {child
> > of=state:Washingthon}state:Washingthon",
> > facet :{top_genres:{type: terms,field: c_gender}}
> > }
> > }
> > logs for 2nd case:
> >
> > org.apache.solr.common.SolrException: undefined field: "Review_Text"
> >   at
> org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1229)
> >   at
> org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:769)
> >   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
> >   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
> >   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
> >   at
> org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
> >   at
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
> >   at
> org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
> >   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
> >   at
> org.apache.solr.search.join.BlockJoinParentQParser.parse(BlockJoinParentQParser.java:70)
> >   at
> org.apache.solr.search.join.BlockJoinChildQParser.parse(BlockJoinChildQParser.java:25)
> >   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
> >   at
> org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
> >   at
> org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
> >   at
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)
> >   at
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:229)
> >   at
> org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:179)
> >   at
> org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:427)
> >   at
> org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:416)
> >   at
> org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
> >   at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
> >   at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> >   at 

Re: how to deploy another web project into the jetty server (built into Solr)

2015-10-08 Thread Mugeesh Husain
Thank you, Daniel Collins.

The client is not providing Tomcat or any other server; that's why I was
looking for this. May I ask again about the server installation?



Thanks,
Mugeesh Husain 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-deployed-another-web-project-into-jetty-server-solr-inbuilt-tp4233288p4233504.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to show some documents ahead of others

2015-10-08 Thread Upayavira
Or just have a field in your index - 

paid: true/false

Then sort=paid desc, score desc

(you may need to sort paid asc, not sure which way a boolean would sort)

Question is whether you want to show ALL paid posts, or just a set of
them. For the latter you could use result grouping on the paid field.
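
For example ('paid' being a boolean field, and with the caveat above about
sort direction):

q=pasta&sort=paid desc,score desc
q=pasta&group=true&group.field=paid&group.limit=5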

Upayavira

On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> Hi Christian,
> 
> You can take a look at Solr's QueryElevationComponent.
> 
> It will allow you to configure the top results for a given query
> regardless of the normal Lucene scoring. You can also specify an
> exclude list to drop certain results for a particular query.
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread John Smith
Well, every day we update a lot of documents (usually several millions)
so the DIH is a good fit.

Calling the update chain would make sense there: after all a data import
is just a batch update. Otherwise, the same operations would have to be
made upfront, possibly in another environment and/or language. That's
probably what I'm gonna do anyway.

Thanks for your help!
John


On 08/10/15 13:39, Upayavira wrote:
> You can either specify the update chain via an update.chain request
> parameter, or you can configure a new request handler with its own URL
> and a separate update.chain value.
>
> I have no idea how you would then reference that in the DIH - I've never
> really used it.
>
> Upayavira
>
> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
>> After some further investigation, for those interested: the
>> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
>> guess copied over from another collection). The initial import had been
>> made using a data import handler: I suppose the update chain isn't
>> called in this process and no signature field is created - am I right?.
>>
>> The first time a document was updated, a signature field with value
>> "" was added. The next time, the same signature was
>> generated for the new udpate, which triggered the deletion of all
>> documents with the same signature (i.e. the first one) as overwriteDupes
>> was set to true. Correct behavior but quite tricky...
>>
>> So my conclusion here (please correct me if I'm wrong) is of course to
>> fix the signature configuration problem, but also to manage calling the
>> update chain (or maybe a simplified one, e.g. by skipping logging) in
>> the data import handler. Is there an easy way to do this? Conceptually,
>> shouldn't the update chain be callable from the data import process -
>> maybe it is?
>>
>> John
>>
>>
>> On 08/10/15 09:43, Upayavira wrote:
>>> Yay!
>>>
>>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
 Yes indeed, the update chain had been activated... I commented it out
 again and the problem vanished.

 Good job, thanks Erick and Upayavira!
 John


 On 08/10/15 08:58, Upayavira wrote:
> Look for the DedupUpdateProcessor in an update chain.
>
> that is there, but commented out IIRC in the techproducts sample
> configs.
>
> Perhaps you uncommented it to use your own update processors, but didn't
> remove that component?
>
> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>> INFO level, the update request just gets mentioned. No exception. I
>> reran it with the DEBUG level, but most of the log was related to jetty.
>> Here's a line I noticed though:
>>
>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>> {wt=json&commit=true&update.chain=dedupe}
>>
>> The update.chain parameter wasn't part of the original request, and
>> "dedupe" looks suspicious to me. Perhaps should I investigate further
>> there?
>>
>> Thanks,
>> John.
>>
>>
>> On 08/10/15 08:25, John Smith wrote:
>>> The ids are all different: they're unique numbers followed by a couple
>>> of keywords. I've made a test with a small collection of 10 documents to
>>> make sure I can manage them manually: all ids are confirmed as 
>>> different.
>>>
>>> I also dumped the exact command, here's one example:
>>>
>>> 101084385_Sebago_ sebago shoes>> name="Clicks" update="set">1>> update="set">1.8701925463775
>>>
>>> It's sent as the body of a POST request to
>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>> Content-Type: text/xml header. I still noted the consistent loss of
>>> another document with the update above.
>>>
>>> John
>>>
>>>
>>> On 08/10/15 00:38, Upayavira wrote:
 What ID are you using? Are you possibly using the same ID field for
 both, so the second document you visit causes the first to be
 overwritten?

 Upayavira

 On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> This certainly should not be happening. I'd
> take a careful look at what you actually send.
> My _guess_ is that you're not sending the update
> command you think you are
>
> As a test you could just curl (or use post.jar) to
> send these types of commands up individually.
>
> Perhaps looking at the solr log would help too...
>
> Best,
> Erick
>
> On Wed, Oct 7, 2015 at 6:32 AM, John Smith 
> wrote:
>> Hi,
>>
>> I'm bumping on the following problem with update XML messages. The 
>> idea
>> is to record the number of clicks for a document: each time, a 
>> message
>> is 

Re: How to show some documents ahead of others

2015-10-08 Thread Andrea Roggerone
Hi guys,
I don't think that sorting is a good solution in this case, as it doesn't
allow any meaningful customization. I believe that the advised
QueryElevationComponent is one of the viable alternatives. Another one would
be to boost a particular field at query time, for instance paid. That
would allow you to assign different boosts to different values using a
function.

On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:

> Or just have a field in your index -
>
> paid: true/false
>
> Then sort=paid desc, score desc
>
> (you may need to sort paid asc, not sure which way a boolean would sort)
>
> Question is whether you want to show ALL paid posts, or just a set of
> them. For the latter you could use result grouping on the paid field.
>
> Upayavira
>
> On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> > Hi Christian,
> >
> > You can take a look at Solr's  QueryElevationComponent
> >   .
> >
> > It will allow you to configure the top results for a given query
> > regardless
> > of the normal lucene scoring. Also you can specify exclude document list
> > to
> > exclude certain results for a particular query.
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to show some documents ahead of others

2015-10-08 Thread Alessandro Benedetti
Is it possible to explain better what is meant by "as it doesn't
allow any meaningful customization"?

Cheers

On 8 October 2015 at 15:27, Andrea Roggerone  wrote:

> Hi guys,
> I don't think that sorting is a good solution in this case as it doesn't
> allow any meaningful customization.I believe that the advised
> QueryElevationComponent is one of the viable alternative. Another one would
> be to boost at query time a particular field, like for instance paid. That
> would allow you to assign different boosts to different values using a
> function.
>
> On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
>
> > Or just have a field in your index -
> >
> > paid: true/false
> >
> > Then sort=paid desc, score desc
> >
> > (you may need to sort paid asc, not sure which way a boolean would sort)
> >
> > Question is whether you want to show ALL paid posts, or just a set of
> > them. For the latter you could use result grouping on the paid field.
> >
> > Upayavira
> >
> > On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> > > Hi Christian,
> > >
> > > You can take a look at Solr's  QueryElevationComponent
> > >   .
> > >
> > > It will allow you to configure the top results for a given query
> > > regardless
> > > of the normal lucene scoring. Also you can specify exclude document
> list
> > > to
> > > exclude certain results for a particular query.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Mugeesh Husain
A good way is using SolrJ with a thread pool executor framework; increase the
number of threads as per your requirements.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-Indexing-Approaches-To-max-the-throughput-tp4232740p4233513.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to show some documents ahead of others

2015-10-08 Thread Walter Underwood
Sorting all paid above all unpaid will give bad results when there are many 
matches. It will show 1000 paid items, including all the barely relevant ones, 
before it shows the first highly relevant unpaid recipe. What if that was the 
only correct result?

Two approaches that work:

1. Boost paid items using the “boost” parameter in edismax. Adjust it to be a 
tiebreaker between documents with similar score (see the sketch after this list).

2. Show two lists, one with the five most relevant paid, the next with the five 
most relevant unpaid.
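A sketch of approach 1 (the query fields and the 1.2 weight are illustrative,
and a boolean "paid" field is assumed):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("apple pie");
q.set("defType", "edismax");
q.set("qf", "title body");                  // assumed query fields
q.set("boost", "if(paid,1.2,1.0)");         // mild multiplicative boost: a tiebreaker, not a hard sort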

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti  
> wrote:
> 
> Is it possible to understand better this : "as it doesn't
> allow any meaningful customization " ?
> 
> Cheers
> 
> On 8 October 2015 at 15:27, Andrea Roggerone > wrote:
> 
>> Hi guys,
>> I don't think that sorting is a good solution in this case as it doesn't
>> allow any meaningful customization.I believe that the advised
>> QueryElevationComponent is one of the viable alternative. Another one would
>> be to boost at query time a particular field, like for instance paid. That
>> would allow you to assign different boosts to different values using a
>> function.
>> 
>> On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
>> 
>>> Or just have a field in your index -
>>> 
>>> paid: true/false
>>> 
>>> Then sort=paid desc, score desc
>>> 
>>> (you may need to sort paid asc, not sure which way a boolean would sort)
>>> 
>>> Question is whether you want to show ALL paid posts, or just a set of
>>> them. For the latter you could use result grouping on the paid field.
>>> 
>>> Upayavira
>>> 
>>> On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
 Hi Christian,
 
 You can take a look at Solr's  QueryElevationComponent
   .
 
 It will allow you to configure the top results for a given query
 regardless
 of the normal lucene scoring. Also you can specify exclude document
>> list
 to
>  exclude certain results for a particular query.
 
 
 
 
 
 --
 View this message in context:
 
>>> 
>> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
 Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: Error when use block-join filters in json api

2015-10-08 Thread Iana Bondarska
 Thanks for the help, I managed to get some results with json.facet:
{ high_popularity : {
type : query,
q : "+{!child of=state:*}state:Michigan+{!child
of=city:*}city:'Benton'",
facet :{top_genres:{type: terms,field: c_gender}}
}
}

but in this case the operator linking the conditions is defined by the
solrQueryParser defaultOperator param in schema.xml.
Is there any way to define which operator I want to use in the query?

Also it seems that it's impossible to specify numeric conditions; such a query
leads to an error (I have c_age in the child documents too):
{ high_popularity : {
type : query,
q : "+{!child of=state:*}state:Michigan+{!child
of=city:*}city:'Benton'+c_age:[10:40]",
facet :{top_genres:{type: terms,field: c_gender}}
}
}



Regarding the error with the default text field -- it was definitely the result
of that query; I also got a 400, undefined field Review_Text, in the response (it
was defined as the default query field in solrconfig.xml). After adding the
field, I get results, but again, they are linked by an OR condition.


2015-10-08 16:53 GMT+03:00 Mikhail Khludnev :

> >
> >  { high_popularity : {
> > type : query,
> > q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> > facet :{top_genres:{type: terms,field: c_gender}}
> > }
> > }
>
>
> I'm not sure about facets, but the query isn't correct; it should be something
> like
>   q : "+c_window:seaview +{!child of="_id:[* TO *]"}city:Auburn",
> of filter should match all parents docs! it's convenient to index
> parent:true field.
>
> q : "{!child of=city:Auburn}city:Auburn AND {child
> of=state:Washingthon}state:Washingthon",
>
> it should be rewritten as (I'm ashamed to say why)
>
> q : "+{!child of="_id:[* TO *]"}city:Auburn +{child
> of="_id:[* TO *]"}state:Washingthon",
>
> I don't believe that stack trace is caused by the shown request, because it
> hiccups on query facet, which you didn't show:
>
> at
> org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
> at
>
> org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
> at
>
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)
>
> see
>   public Object parseFacetOrStat(String key, String type, Object args)
> throws SyntaxError {
> // TODO: a place to register all these facet types?
>
> if ("field".equals(type) || "terms".equals(type)) {
>   return parseFieldFacet(key, args);
> } else if ("query".equals(type)) {
>   return parseQueryFacet(key, args);
> } else if ("range".equals(type)) {
>   return parseRangeFacet(key, args);
> }
>
>
>
>
> On Thu, Oct 8, 2015 at 4:08 PM, Iana Bondarska  wrote:
>
> > sorry,missed example input data:
> > child document:
> > { "c_gender": "female", "c_window": "seaview", "_root_": 1673891436 }
> >
> > parent document:
> > { "_id": 1673891436, "county_code": "26021", "city": "Auburn", "year":
> > 2012,
> > "county": "Berrien", "Sales": 112808, "state": "Washington",
> > "product_group":
> > "Books", "sku": "ZD111588", "income_bracket": "$25000 to $5",
> > "_version_":
> > 1513636454494896000, "_root_": 1673891436 },
> >
> > 2015-10-08 15:00 GMT+03:00 Iana Bondarska :
> >
> > > Hello Mikhail,
> > >
> > > here are json.facet parameters that I tried:
> > > c_gender, c_window belong to child documents, rest - to parent.
> > >
> > > 1. returns no results, can we combine filters from different levels in
> > > queries
> > >
> > >  { high_popularity : {
> > > type : query,
> > > q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> > > facet :{top_genres:{type: terms,field: c_gender}}
> > > }
> > > }
> > >
> > > 2.triggers full text search, I get error "undefined field:
> > > \"Review_Text\"" , that's true, I have mistake in configuration,but I
> > > didn't request fulltext search in the query
> > >
> > >  { high_popularity : {
> > > type : query,
> > > q : "{!child of=city:Auburn}city:Auburn AND {child
> > > of=state:Washingthon}state:Washingthon",
> > > facet :{top_genres:{type: terms,field: c_gender}}
> > > }
> > > }
> > > logs for 2nd case:
> > >
> > > org.apache.solr.common.SolrException: undefined field: "Review_Text"
> > >   at
> > org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1229)
> > >   at
> >
> org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:769)
> > >   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
> > >   at
> org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
> > >   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:139)
> > >   at
> > org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
> > >   at
> >
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
> > >   at
> > org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
> > >   at 

Re: Unexpected delayed document deletion with atomic updates

2015-10-08 Thread Alessandro Benedetti
Not related to the deletion problem, just a curiosity about your use case:

<field name="Clicks" update="set">1</field>

Have I misunderstood your use case, or should you be using:

<field name="Clicks" update="inc">1</field>

Increments a numeric value by a specific amount.

Must be specified as a single numeric value.

Basically, every time there is a click you always set the value of that field to
"1". So a document with 1 click will be considered equal to one with 1000 clicks.
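For reference, the same increment via SolrJ's atomic-update map (the client URL
is assumed; "Clicks" is the field from the thread):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

HttpSolrClient client = new HttpSolrClient("http://127.0.0.1:8080/solr/ato_test");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "101084385_Sebago_");
doc.addField("Clicks", Collections.singletonMap("inc", 1));   // add 1 to the stored value
client.add(doc);
client.commit();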
My 2 cents

Cheers

On 8 October 2015 at 14:10, John Smith  wrote:

> Well, every day we update a lot of documents (usually several millions)
> so the DIH is a good fit.
>
> Calling the update chain would make sense there: after all a data import
> is just a batch update. Otherwise, the same operations would have to be
> made upfront, possibly in another environment and/or language. That's
> probably what I'm gonna do anyway.
>
> Thanks for your help!
> John
>
>
> On 08/10/15 13:39, Upayavira wrote:
> > You can either specify the update chain via an update.chain request
> > parameter, or you can configure a new request parameter with its own URL
> > and separate update.chain value.
> >
> > I have no idea how you would then reference that in the DIH - I've never
> > really used it.
> >
> > Upayavira
> >
> > On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
> >> After some further investigation, for those interested: the
> >> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
> >> guess copied over from another collection). The initial import had been
> >> made using a data import handler: I suppose the update chain isn't
> >> called in this process and no signature field is created - am I right?.
> >>
> >> The first time a document was updated, a signature field with value
> >> "" was added. The next time, the same signature was
> >> generated for the new udpate, which triggered the deletion of all
> >> documents with the same signature (i.e. the first one) as overwriteDupes
> >> was set to true. Correct behavior but quite tricky...
> >>
> >> So my conclusion here (please correct me if I'm wrong) is of course to
> >> fix the signature configuration problem, but also to manage calling the
> >> update chain (or maybe a simplified one, e.g. by skipping logging) in
> >> the data import handler. Is there an easy way to do this? Conceptually,
> >> shouldn't the update chain be callable from the data import process -
> >> maybe it is?
> >>
> >> John
> >>
> >>
> >> On 08/10/15 09:43, Upayavira wrote:
> >>> Yay!
> >>>
> >>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>  Yes indeed, the update chain had been activated... I commented it out
>  again and the problem vanished.
> 
>  Good job, thanks Erick and Upayavira!
>  John
> 
> 
>  On 08/10/15 08:58, Upayavira wrote:
> > Look for the DedupUpdateProcessor in an update chain.
> >
> > that is there, but commented out IIRC in the techproducts sample
> > configs.
> >
> > Perhaps you uncommented it to use your own update processors, but
> didn't
> > remove that component?
> >
> > On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> >> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> >> INFO level, the update request just gets mentioned. No exception. I
> >> reran it with the DEBUG level, but most of the log was related to
> jetty.
> >> Here's a line I noticed though:
> >>
> >> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> >> {wt=json&commit=true&update.chain=dedupe}
> >>
> >> The update.chain parameter wasn't part of the original request, and
> >> "dedupe" looks suspicious to me. Perhaps should I investigate
> further
> >> there?
> >>
> >> Thanks,
> >> John.
> >>
> >>
> >> On 08/10/15 08:25, John Smith wrote:
> >>> The ids are all different: they're unique numbers followed by a
> couple
> >>> of keywords. I've made a test with a small collection of 10
> documents to
> >>> make sure I can manage them manually: all ids are confirmed as
> different.
> >>>
> >>> I also dumped the exact command, here's one example:
> >>>
> >>> 101084385_Sebago_ sebago
> shoes >>> name="Clicks" update="set">1 >>> update="set">1.8701925463775
> >>>
> >>> It's sent as the body of a POST request to
> >>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true,
> with a
> >>> Content-Type: text/xml header. I still noted the consistent loss of
> >>> another document with the update above.
> >>>
> >>> John
> >>>
> >>>
> >>> On 08/10/15 00:38, Upayavira wrote:
>  What ID are you using? Are you possibly using the same ID field
> for
>  both, so the second document you visit causes the first to be
>  overwritten?
> 
>  Upayavira
> 
>  On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> > This certainly should not be happening. I'd
> > take 

Re: Fuzzy search for names and phrases

2015-10-08 Thread Alessandro Benedetti
Am I the only one who sees these messages out of context in the mailing
list? Is this the expected behaviour?

Cheers

On 8 October 2015 at 07:37, NutchDev  wrote:

> WordDelimiterFilterFactory can handle cases like,
>
> wi-fi ==> wifi
> SD500 ==> sd 500
> PowerShot ==> Power Shot
>
> you can get more information at wiki page here,
>
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fuzzy-search-for-names-and-phrases-tp4233209p4233413.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Please add me to ContributorsGroup of the Solr Wiki

2015-10-08 Thread Nikola Smolenski
Hello,

Could you please add me to the ContributorsGroup of the Solr Wiki? I have
made Serbian analyzer for Solr [
https://issues.apache.org/jira/browse/LUCENE-6053] and would now like to
write about some Serbian search considerations.

My wiki username is NikolaSmolenski.

-- 
Nikola Smolenski

University of Belgrade
University library ''Svetozar Markovic''


Re: Please add me to ContributorsGroup of the Solr Wiki

2015-10-08 Thread Nikola Smolenski
Sorry, I have somehow not seen your initial response. I can log in and edit.

On Thu, Oct 8, 2015 at 5:32 PM, Erick Erickson 
wrote:

> I think I did this a few days ago, your name has been in the auth file
> since 5-Oct.
>
> So...
> 1> you haven't checked
> or
> 2> I messed it up somehow.
> or
> 3> You really need access to the _Lucene_ contributor's group
>  rather than the Solr contributor's group, they're separate
>  auth lists. Let me know if that's the case.
>
> And let me  know if you're still unable to edit the Solr Wiki
> pages and that's really what you need.
>
> Erick
>
> On Thu, Oct 8, 2015 at 8:13 AM, Nikola Smolenski 
> wrote:
> > Hello,
> >
> > Could you please add me to the ContributorsGroup of the Solr Wiki? I have
> > made Serbian analyzer for Solr [
> > https://issues.apache.org/jira/browse/LUCENE-6053] and would now like to
> > write about some Serbian search considerations.
> >
> > My wiki username is NikolaSmolenski.
> >
> > --
> > Nikola Smolenski
> >
> > University of Belgrade
> > University library ''Svetozar Markovic''
>



-- 
Nikola Smolenski

University of Belgrade
University library ''Svetozar Markovic''


Re: Exclude documents having same data in two fields

2015-10-08 Thread Alessandro Benedetti
Hi, I agree with Nutch:
using the Function Range Query Parser should do the trick:

https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
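For example (the field names are placeholders; both assumed single-valued
strings): strdist returns 1.0 only for identical values, so an frange filter
that excludes the upper bound keeps just the documents where the fields differ.

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("*:*");
// keep docs where field_a != field_b; incu=false excludes strdist == 1.0 (equal values)
q.addFilterQuery("{!frange l=0 u=1 incu=false}strdist(field_a,field_b,edit)");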

Cheers

On 8 October 2015 at 13:31, NutchDev  wrote:

> Hi Aman,
>
> Have a look at this , it has query time approach also using Solr function
> query,
>
>
> http://stackoverflow.com/questions/15927893/how-to-check-equality-of-two-solr-fields
>
> http://stackoverflow.com/questions/16258605/query-for-document-that-two-fields-are-equal
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exclude-documents-having-same-data-in-two-fields-tp4233408p4233489.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: How to show some documents ahead of others

2015-10-08 Thread Andrea Roggerone
Sure. Let's say that, as Upayavira suggested, you have in your index:

"paid: true/false
Then sort=paid desc, score desc"

In that case, paid=true and higher score would come up first.
After that you decide that you want to add a set of offers:
Offer 1: cost 1000 euros
Offer 2: cost 100 euros
Offer 3: cost 10 euros
and you expect that user 1 (who pays more) appears before users 2 and 3. In
such a case the true/false field won't be enough, as you don't have any way to
sort users so that offer 1 comes before offer 2.

Let's say for the sake of conversation that you decide to replace "paid" with a
numeric value, paid=1, 2 or 3. This solution would work better, until you
decide to improve relevancy... at that point the new solution wouldn't suit
you anymore. So "as it doesn't allow any meaningful customization" meant that
such a solution is too rigid. Hope it makes sense.
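A hedged sketch of the function-boost idea: assuming a numeric paid_tier field
(a made-up name, 0 for unpaid), an edismax multiplicative boost can scale
relevance per tier instead of hard-sorting:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("wedding venues");
q.set("defType", "edismax");
// tier 0 keeps its score; tiers 1..3 get progressively larger multipliers
q.set("boost", "sum(1,product(0.2,field(paid_tier)))");

The 0.2 step is arbitrary; tune it so paid tiers act as graded boosts rather
than overriding relevance entirely.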



On Thu, Oct 8, 2015 at 3:39 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Is it possible to understand better this : "as it doesn't
> allow any meaningful customization " ?
>
> Cheers
>
> On 8 October 2015 at 15:27, Andrea Roggerone <
> andrearoggerone.o...@gmail.com
> > wrote:
>
> > Hi guys,
> > I don't think that sorting is a good solution in this case as it doesn't
> > allow any meaningful customization.I believe that the advised
> > QueryElevationComponent is one of the viable alternative. Another one
> would
> > be to boost at query time a particular field, like for instance paid.
> That
> > would allow you to assign different boosts to different values using a
> > function.
> >
> > On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
> >
> > > Or just have a field in your index -
> > >
> > > paid: true/false
> > >
> > > Then sort=paid desc, score desc
> > >
> > > (you may need to sort paid asc, not sure which way a boolean would
> sort)
> > >
> > > Question is whether you want to show ALL paid posts, or just a set of
> > > them. For the latter you could use result grouping on the paid field.
> > >
> > > Upayavira
> > >
> > > On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> > > > Hi Christian,
> > > >
> > > > You can take a look at Solr's  QueryElevationComponent
> > > >   .
> > > >
> > > > It will allow you to configure the top results for a given query
> > > > regardless
> > > > of the normal lucene scoring. Also you can specify exclude document
> > list
> > > > to
> > > > exclude certain results for a particular query.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > >
> > >
> >
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Please add me to ContributorsGroup of the Solr Wiki

2015-10-08 Thread Erick Erickson
I think I did this a few days ago, your name has been in the auth file
since 5-Oct.

So...
1> you haven't checked
or
2> I messed it up somehow.
or
3> You really need access to the _Lucene_ contributor's group
 rather than the Solr contributor's group, they're separate
 auth lists. Let me know if that's the case.

And let me know if you're still unable to edit the Solr Wiki
pages and that's really what you need.

Erick

On Thu, Oct 8, 2015 at 8:13 AM, Nikola Smolenski  wrote:
> Hello,
>
> Could you please add me to the ContributorsGroup of the Solr Wiki? I have
> made Serbian analyzer for Solr [
> https://issues.apache.org/jira/browse/LUCENE-6053] and would now like to
> write about some Serbian search considerations.
>
> My wiki username is NikolaSmolenski.
>
> --
> Nikola Smolenski
>
> University of Belgrade
> University library ''Svetozar Markovic''


Re: How to show some documents ahead of others

2015-10-08 Thread Alessandro Benedetti
Thanks Andrea, I agree with you.
It seems much like the classic "relevancy biased by date" scenario,
but instead of favouring new docs we favour paying docs.
Probably a boost function can be helpful, as already said.

Cheers

On 8 October 2015 at 17:03, Upayavira  wrote:

> Hence the suggestion to group by the paid field - would give you two
> lists of the number you ask for.
>
> What I'm trying to say is that the QueryElevationComponent might do it,
> but it is also relatively clunky, so a pure search solution might do it.
>
> However, the thing we lack right now is a full take on the requirements,
> e.g. how should paid results be sorted, how many paid results do you
> show, etc, etc. Without these details we're all guessing.
>
> Upayavira
>
>
> On Thu, Oct 8, 2015, at 04:45 PM, Walter Underwood wrote:
> > Sorting all paid above all unpaid will give bad results when there are
> > many matches. It will show 1000 paid items, include all the barely
> > relevant ones, before it shows the first highly relevant unpaid recipe.
> > What if that was the only correct result?
> >
> > Two approaches that work:
> >
> > 1. Boost paid items using the “boost” parameter in edismax. Adjust it to
> > be a tiebreaker between documents with similar score.
> >
> > 2. Show two lists, one with the five most relevant paid, the next with
> > the five most relevant unpaid.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> > >
> > > Is it possible to understand better this : "as it doesn't
> > > allow any meaningful customization " ?
> > >
> > > Cheers
> > >
> > > On 8 October 2015 at 15:27, Andrea Roggerone <
> andrearoggerone.o...@gmail.com
> > >> wrote:
> > >
> > >> Hi guys,
> > >> I don't think that sorting is a good solution in this case as it
> doesn't
> > >> allow any meaningful customization.I believe that the advised
> > >> QueryElevationComponent is one of the viable alternative. Another one
> would
> > >> be to boost at query time a particular field, like for instance paid.
> That
> > >> would allow you to assign different boosts to different values using a
> > >> function.
> > >>
> > >> On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
> > >>
> > >>> Or just have a field in your index -
> > >>>
> > >>> paid: true/false
> > >>>
> > >>> Then sort=paid desc, score desc
> > >>>
> > >>> (you may need to sort paid asc, not sure which way a boolean would
> sort)
> > >>>
> > >>> Question is whether you want to show ALL paid posts, or just a set of
> > >>> them. For the latter you could use result grouping on the paid field.
> > >>>
> > >>> Upayavira
> > >>>
> > >>> On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> >  Hi Christian,
> > 
> >  You can take a look at Solr's  QueryElevationComponent
> >    .
> > 
> >  It will allow you to configure the top results for a given query
> >  regardless
> >  of the normal lucene scoring. Also you can specify exclude document
> > >> list
> >  to
> >  exclude certain results for a particular query.
> > 
> > 
> > 
> > 
> > 
> >  --
> >  View this message in context:
> > 
> > >>>
> > >>
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> >  Sent from the Solr - User mailing list archive at Nabble.com.
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card - http://about.me/alessandro_benedetti
> > > Blog - http://alexbenedetti.blogspot.co.uk
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: How to show some documents ahead of others

2015-10-08 Thread Upayavira
Hence the suggestion to group by the paid field - it would give you two
lists of the size you ask for.

What I'm trying to say is that the QueryElevationComponent might do it,
but it is also relatively clunky, so a pure search solution might serve better.

However, the thing we lack right now is a full take on the requirements,
e.g. how should paid results be sorted, how many paid results do you
show, etc, etc. Without these details we're all guessing.
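For the two-list variant, a grouping sketch (the query and sizes are
illustrative):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("pasta");
q.set("group", true);
q.set("group.field", "paid");    // one group of paid docs, one of unpaid
q.set("group.limit", 5);         // five most relevant docs per group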

Upayavira


On Thu, Oct 8, 2015, at 04:45 PM, Walter Underwood wrote:
> Sorting all paid above all unpaid will give bad results when there are
> many matches. It will show 1000 paid items, include all the barely
> relevant ones, before it shows the first highly relevant unpaid recipe.
> What if that was the only correct result?
> 
> Two approaches that work:
> 
> 1. Boost paid items using the “boost” parameter in edismax. Adjust it to
> be a tiebreaker between documents with similar score.
> 
> 2. Show two lists, one with the five most relevant paid, the next with
> the five most relevant unpaid.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> > On Oct 8, 2015, at 7:39 AM, Alessandro Benedetti 
> >  wrote:
> > 
> > Is it possible to understand better this : "as it doesn't
> > allow any meaningful customization " ?
> > 
> > Cheers
> > 
> > On 8 October 2015 at 15:27, Andrea Roggerone  >> wrote:
> > 
> >> Hi guys,
> >> I don't think that sorting is a good solution in this case as it doesn't
> >> allow any meaningful customization.I believe that the advised
> >> QueryElevationComponent is one of the viable alternative. Another one would
> >> be to boost at query time a particular field, like for instance paid. That
> >> would allow you to assign different boosts to different values using a
> >> function.
> >> 
> >> On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:
> >> 
> >>> Or just have a field in your index -
> >>> 
> >>> paid: true/false
> >>> 
> >>> Then sort=paid desc, score desc
> >>> 
> >>> (you may need to sort paid asc, not sure which way a boolean would sort)
> >>> 
> >>> Question is whether you want to show ALL paid posts, or just a set of
> >>> them. For the latter you could use result grouping on the paid field.
> >>> 
> >>> Upayavira
> >>> 
> >>> On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
>  Hi Christian,
>  
>  You can take a look at Solr's  QueryElevationComponent
>    .
>  
>  It will allow you to configure the top results for a given query
>  regardless
>  of the normal lucene scoring. Also you can specify exclude document
> >> list
>  to
>  exclude certain results for a particular query.
>  
>  
>  
>  
>  
>  --
>  View this message in context:
>  
> >>> 
> >> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
>  Sent from the Solr - User mailing list archive at Nabble.com.
> >>> 
> >> 
> > 
> > 
> > 
> > -- 
> > --
> > 
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> > 
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> > 
> > William Blake - Songs of Experience -1794 England
> 


Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Alessandro Benedetti
This depends on the number of active producers, but ideally it's OK.
Different threads will access the thread-safe ConcurrentUpdateSolrClient and
send the documents in batches.

Or did you mean something different?


On 8 October 2015 at 16:00, Mugeesh Husain  wrote:

> Good way Using SolrJ with Thread pool executor framework, increase number
> of
> Thread as per your requirement
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-Indexing-Approaches-To-max-the-throughput-tp4232740p4233513.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Error when use block-join filters in json api

2015-10-08 Thread Mikhail Khludnev
Iana,
Such complex structured queries are really hard to forge in Solr (here
Elastic's DSL comes out well ahead).
I suggest checking
http://blog.griddynamics.com/2013/09/solr-block-join-support.html and
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
to get to know the corner cases.

So far, regarding:
"+{!child of=state:*}state:Michigan+{!child
of=city:*}city:'Benton'",
different 'of' filters seem doubtful to me, beware! You might exceed BJQ's
capabilities. The absence of a space before the second plus is also suspicious
to me. I suggest looking at the parsed query in the debugQuery=true output.

Oh, I just realized that you are working with a query facet!!
 type : query,

Thus, make sure that the query works as a plain q parameter first; it may
yield a lot of surprises.
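For example, assuming every parent document is indexed with a flag like
is_parent:true (not in the schema shown, so an assumption), both child filters
matching the full parent set would look like:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery();
// both 'of' filters match the complete set of parents, as recommended above
q.setQuery("+{!child of=\"is_parent:true\"}state:Michigan +{!child of=\"is_parent:true\"}city:Benton");
q.set("debugQuery", true);    // inspect the parsed query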

On Thu, Oct 8, 2015 at 6:41 PM, Iana Bondarska  wrote:

>  thanks for help, I managed to get some results with json.facet:
> { high_popularity : {
> type : query,
> q : "+{!child of=state:*}state:Michigan+{!child
> of=city:*}city:'Benton'",
> facet :{top_genres:{type: terms,field: c_gender}}
> }
> }
>
> but in this case operator to link conditions is defined by solrQueryParser,
> defaultOperator param in schema.xml.
> Are there any way to define which operator I want to use in the query?
>
> Also seems that  it's impossible to specify numeric conditions, such query
> leads to error (I have c_age in child documents too)
> { high_popularity : {
> type : query,
> q : "+{!child of=state:*}state:Michigan+{!child
> of=city:*}city:'Benton'+c_age:[10:40]",
> facet :{top_genres:{type: terms,field: c_gender}}
> }
> }
>
>
>
> Regarding error with default text field -- it was definitely result of that
> query, I also got 400, undefined field Review_Text in response(it was
> defined as default query field in solrconfig.xml ) . After add field, I
> results, but again, they are linked by OR condition
>
>
> 2015-10-08 16:53 GMT+03:00 Mikhail Khludnev :
>
> > >
> > >  { high_popularity : {
> > > type : query,
> > > q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> > > facet :{top_genres:{type: terms,field: c_gender}}
> > > }
> > > }
> >
> >
> > I'm not sure about facets, but the query isn't correct; it should be something
> > like
> >   q : "+c_window:seaview +{!child of="_id:[* TO *]"}city:Auburn",
> > of filter should match all parents docs! it's convenient to index
> > parent:true field.
> >
> > q : "{!child of=city:Auburn}city:Auburn AND {child
> > of=state:Washingthon}state:Washingthon",
> >
> > it should be rewritten as (I'm ashamed to say why)
> >
> > q : "+{!child of="_id:[* TO *]"}city:Auburn +{child
> > of="_id:[* TO *]"}state:Washingthon",
> >
> > I don't believe that stack trace is caused by the shown request, because
> it
> > hiccups on query facet, which you didn't show:
> >
> > at
> >
> org.apache.solr.search.facet.FacetQueryParser.parse(FacetRequest.java:473)
> > at
> >
> >
> org.apache.solr.search.facet.FacetParser.parseQueryFacet(FacetRequest.java:255)
> > at
> >
> >
> org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:238)
> >
> > see
> >   public Object parseFacetOrStat(String key, String type, Object args)
> > throws SyntaxError {
> > // TODO: a place to register all these facet types?
> >
> > if ("field".equals(type) || "terms".equals(type)) {
> >   return parseFieldFacet(key, args);
> > } else if ("query".equals(type)) {
> >   return parseQueryFacet(key, args);
> > } else if ("range".equals(type)) {
> >   return parseRangeFacet(key, args);
> > }
> >
> >
> >
> >
> > On Thu, Oct 8, 2015 at 4:08 PM, Iana Bondarska 
> wrote:
> >
> > > sorry,missed example input data:
> > > child document:
> > > { "c_gender": "female", "c_window": "seaview", "_root_": 1673891436 }
> > >
> > > parent document:
> > > { "_id": 1673891436, "county_code": "26021", "city": "Auburn", "year":
> > > 2012,
> > > "county": "Berrien", "Sales": 112808, "state": "Washington",
> > > "product_group":
> > > "Books", "sku": "ZD111588", "income_bracket": "$25000 to $5",
> > > "_version_":
> > > 1513636454494896000, "_root_": 1673891436 },
> > >
> > > 2015-10-08 15:00 GMT+03:00 Iana Bondarska :
> > >
> > > > Hello Mikhail,
> > > >
> > > > here are json.facet parameters that I tried:
> > > > c_gender, c_window belong to child documents, rest - to parent.
> > > >
> > > > 1. returns no results, can we combine filters from different levels
> in
> > > > queries
> > > >
> > > >  { high_popularity : {
> > > > type : query,
> > > > q : "{!child of=city:Auburn}city:Auburn AND c_window:seaview",
> > > > facet :{top_genres:{type: terms,field: c_gender}}
> > > > }
> > > > }
> > > >
> > > > 2.triggers full text search, I get error "undefined field:
> > > > \"Review_Text\"" , that's true, I have mistake in configuration,but I
> > > > didn't request fulltext search in the 

Re: tlog replay

2015-10-08 Thread Rallavagu

As a follow-up.

Eventually the tlog file disappeared (I could not track the time it 
took to clear out completely). However, the following messages were noticed 
in the follower's log.


5120638 [recoveryExecutor-14-thread-2] WARN 
org.apache.solr.update.UpdateLog  – Starting log replay tlog


On 10/7/15 8:29 PM, Erick Erickson wrote:

The only way I can account for such a large file off the top of my
head is if, for some reason,
the Solr on the node somehow was failing to index documents and kept
adding them to the
log for a long time. But how that would happen without the
node being in recovery
mode I'm not sure. I mean the Solr instance would have to be healthy
otherwise but just not
able to index docs which makes no sense.

The usual question here is whether there were any messages in the solr
log file indicating
problems while this built up.

tlogs will build up to very large sizes if there are very long hard
commit intervals, but I don't
see how that interval would be different on the leader and follower.

So color me puzzled.

Best,
Erick

On Wed, Oct 7, 2015 at 8:09 PM, Rallavagu  wrote:

Thanks Erick.

Eventually, followers caught up but the 14G tlog file still persists and
they are healthy. Is there anything to look for? Will monitor and see how
long will it take before it disappears.

Evaluating move to Solr 5.3.

On 10/7/15 7:51 PM, Erick Erickson wrote:


Uhm, that's very weird. Updates are not applied from the tlog. Rather the
raw doc is forwarded to the replica which both indexes the doc and
writes it to the local tlog. So having a 14G tlog on a follower but a
small
tlog on the leader is definitely strange, especially if it persists over
time.

I assume the follower is healthy? And does this very large tlog disappear
after a while? I'd expect it to be aged out after a few commits of > 100
docs.

All that said, there have been a LOT of improvements since 4.6, so it
might
be something that's been addressed in the intervening time.

Best,
Erick



On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu  wrote:


Solr 4.6.1, single shard, 4 node cloud, 3 node zk

Like to understand the behavior better when large number of updates
happen
on leader and it generates huge tlog (14G sometimes in my case) on other
nodes. At the same time leader's tlog is few KB. So, what is the rate at
which the changes from transaction log are applied at nodes? The
autocommit
interval is set to 15 seconds after going through

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks


Re: Scramble data

2015-10-08 Thread Susheel Kumar
Like Erick said, would something like using a replace function on the individual
sensitive fields in the fl param work? Replacing them with something like
REDACTED, etc.
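A rough sketch of the DocTransformer route Erick mentions below (the field
names are made up, and this is written from memory against the 5.x API, so
verify the signatures before relying on it):

import java.io.IOException;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

// register in solrconfig.xml, then request fl=*,[redact]
public class RedactTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(final String name, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override public String getName() { return name; }
      @Override public void transform(SolrDocument doc, int docid) throws IOException {
        doc.setField("customer_name", "REDACTED");    // overwrite sensitive fields in the response only
        doc.setField("customer_phone", "REDACTED");
      }
    };
  }
}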

On Thu, Oct 8, 2015 at 2:58 PM, Tarala, Magesh  wrote:

> I already have the data ingested and it takes several days to do that. I
> was trying to avoid re-ingesting the data.
>
> Thanks,
> Magesh
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, October 07, 2015 9:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Scramble data
>
> Probably sanitize the data on the front end? Something simple like put
> "REDACTED" for all of the customer-sensitive fields.
>
> You might also write a DocTransformer plugin, all you have to do is
> implement subclass DocTransformer and override one very simple "transform"
> method,
>
> Best,
> Erick
>
> On Wed, Oct 7, 2015 at 5:09 PM, Tarala, Magesh  wrote:
> > Folks,
> > I have a strange question. We have a Solr implementation that we would
> like to demo to external customers. But we don't want to display the real
> data, which contains our customer information and so is sensitive data.
> What's the best way to scramble the data of the Solr Query results? By best
> I mean the simplest way with least amount of work. BTW, we have a .NET
> front end application.
> >
> > Thanks,
> > Magesh
> >
> >
> >
>


Re: Scramble data

2015-10-08 Thread Uwe Reh

Hi,

my suggestions are probably too simple, because they are not real 
protection of privacy. But maybe one fits your needs.


Most simple:
Declare your 'hidden' fields just as "indexed=true stored=false"; the 
data will be used for searching, but the fields are not listed in the 
query response.
Cons: The terms of the fields can still be examined by advanced users. 
For example, they could use the field as a facet.


Very simple
Use a PhoneticFilter for indexing and searching. The encoding 
"ColognePhonetic" generates a numeric hash for each term. The name 
"Breschnew" will be saved as "17863".
Cons: Phonetic similarities will lead to false hits. This hashing is 
really only scrambling and not appropriate as a security feature.


Simple
Declare a special SearchHandler in your solrconfig.xml and define an 
invariant fieldList (fl) parameter. This should contain just the public 
subset of your fields.

Cons: I'm not really sure about this.

Still quite simple
Write your own filter, which generates real cryptographic hashes.
Cons: If the entropy of your data is poor, you may need additional 
tricks like padding the data. This filter may slow down your system.



Last but not least, be aware that searching could be a way to 
recover hidden information. If a query for "billionaire" gets just one 
hit, it's obvious that "billionaire" is an attribute of the document 
even if it is not listed in the result.


Uwe


Re: Scramble data

2015-10-08 Thread Roman Chyla
Or you could also apply XSL to returned records:
https://wiki.apache.org/solr/XsltResponseWriter


On Thu, Oct 8, 2015 at 5:06 PM, Uwe Reh  wrote:
> Hi,
>
> my suggestions are probably to simple, because they are not a real
> protection of privacy. But maybe one fits to your needs.
>
> Most simple:
> Declare your 'hidden' fields just as "indexed=true stored=false", the data
> will be used for searching, but the fields are not listed in the query
> response.
> Cons: The Terms of the fields can be still examined by advanced users. As
> example they could use the field as facet.
>
> Very simple
> Use a PhoneticFilter for indexing and searching. The encoding
> "ColognePhonetic" generates a numeric hash for each term. The name
> "Breschnew" will be saved as "17863".
> Cons: Phonetic similaritys will lead to false hits. This hashing is really
> only scrambling and not appropriate as security feature.
>
> Simple
> Declare a special SearchHandlers in your solrconfig.xml and define an
> invariant fieldList parameter. This should contain just the public subset of
> your fields.
> Cons: I'm not really sure, about this.
>
> Still quite simple
> Write a own Filter, which generates real cryptographic hashes
> Cons: If the entropy of your data is poor, you may need additional tricks
> like padding the data. This filter may slow down your system.
>
>
> Last but not least be aware, that the searching could be a way to restore
> hidden informations. If a query for "billionaire" just get one hit, it's
> obvious that "billionaire" is an attribute of the document even if it is not
> listed in the result.
>
> Uwe


Re: No live SolrServers available to handle this request

2015-10-08 Thread Mark Miller
Your Lucene and Solr versions must match.

On Thu, Oct 8, 2015 at 4:02 PM Steve  wrote:

> I've loaded the Films data into a 4 node cluster.  Indexing went well, but
> when I issue a query, I get this:
>
> "error": {
> "msg": "org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers available to handle this request:
> [
>
> http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
> ,
>
>
> http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
> ,
>
>
> http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
> ]",
> ...
>
> and further down in the stacktrace:
>
> Server Error
> Caused by:
> java.lang.NoSuchMethodError:
>
> org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
> ...
>
>
> I'm using:
>
> solr version 5.3.1
>
> lucene 5.2.1
>
> zookeeper version 3.4.6
>
> indexing with:
>
>cd /opt/solr/example/films;
>
> /opt/solr/bin/post -c CollectionFilms -port 8081  films.json
>
>
>
> thx,
> .strick
>
-- 
- Mark
about.me/markrmiller


Re: Scramble data

2015-10-08 Thread Doug Turnbull
Can you just generate a fake data set for testing? There are numerous
libraries that create fake names, phone numbers, etc that you can use to
create mock data. Faker is one we have used in sensitive situations

https://github.com/joke2k/faker

I think this is going to be a better long-term solution than trying to play
around with possibly sensitive info.
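On the JVM side, a similar sketch with the Java port, javafaker (assumed
dependency com.github.javafaker:javafaker; not mentioned in the thread):

import com.github.javafaker.Faker;
import org.apache.solr.common.SolrInputDocument;

Faker faker = new Faker();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "demo-1");
doc.addField("customer_name", faker.name().fullName());            // fake but realistic-looking name
doc.addField("customer_phone", faker.phoneNumber().phoneNumber());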

-Doug

On Wednesday, October 7, 2015, Tarala, Magesh  wrote:

> Folks,
> I have a strange question. We have a Solr implementation that we would
> like to demo to external customers. But we don't want to display the real
> data, which contains our customer information and so is sensitive data.
> What's the best way to scramble the data of the Solr Query results? By best
> I mean the simplest way with least amount of work. BTW, we have a .NET
> front end application.
>
> Thanks,
> Magesh
>
>
>
>

-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: tlog replay

2015-10-08 Thread Rallavagu

Erick,

Actually, autocommit is configured to 15 seconds and openSearcher is set to 
false. Neither <2> nor <3> happened. However, softCommit is set to 10 min.



<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

I'm working on upgrading to 5.3, which will take a bit of time, and am trying 
to get this under control until then.


On 10/8/15 5:28 PM, Erick Erickson wrote:

right, so the scenario is
1> somehow you didn't do a hard commit (openSearcher=true or false
doesn't matter) for a really long time while indexing.
2> Solr abnormally terminated.
3> When Solr started back up it replayed the entire log.

How <1> happened is the mystery though. With a hard commit
(autocommit) interval of 15 seconds that's weird.

The message indicates something like that happened. In very recent
Solr versions, the log will have
progress messages printed that'll help see this is happening.

Best,
Erick

On Thu, Oct 8, 2015 at 12:23 PM, Rallavagu  wrote:

As a follow up.

Eventually the tlog file is disappeared (could not track the time it took to
clear out completely). However, following messages were noticed in
follower's log.

5120638 [recoveryExecutor-14-thread-2] WARN org.apache.solr.update.UpdateLog
– Starting log replay tlog

On 10/7/15 8:29 PM, Erick Erickson wrote:


The only way I can account for such a large file off the top of my
head is if, for some reason,
the Solr on the node somehow was failing to index documents and kept
adding them to the
log for a long time. But how that would happen without the
node being in recovery
mode I'm not sure. I mean the Solr instance would have to be healthy
otherwise but just not
able to index docs which makes no sense.

The usual question here is whether there were any messages in the solr
log file indicating
problems while this built up.

tlogs will build up to very large sizes if there are very long hard
commit intervals, but I don't
see how that interval would be different on the leader and follower.

So color me puzzled.

Best,
Erick

On Wed, Oct 7, 2015 at 8:09 PM, Rallavagu  wrote:


Thanks Erick.

Eventually, followers caught up but the 14G tlog file still persists and
they are healthy. Is there anything to look for? Will monitor and see how
long will it take before it disappears.

Evaluating move to Solr 5.3.

On 10/7/15 7:51 PM, Erick Erickson wrote:



Uhm, that's very weird. Updates are not applied from the tlog. Rather
the
raw doc is forwarded to the replica which both indexes the doc and
writes it to the local tlog. So having a 14G tlog on a follower but a
small
tlog on the leader is definitely strange, especially if it persists over
time.

I assume the follower is healthy? And does this very large tlog
disappear
after a while? I'd expect it to be aged out after a few commits of > 100
docs.

All that said, there have been a LOT of improvements since 4.6, so it
might
be something that's been addressed in the intervening time.

Best,
Erick



On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu  wrote:



Solr 4.6.1, single shard, 4 node cloud, 3 node zk

Like to understand the behavior better when large number of updates
happen
on leader and it generates huge tlog (14G sometimes in my case) on
other
nodes. At the same time leader's tlog is few KB. So, what is the rate
at
which the changes from transaction log are applied at nodes? The
autocommit
interval is set to 15 seconds after going through


https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks


Re: Lose Solr config on zookeeper when it is restarted

2015-10-08 Thread CrazyDiamond
I have one instance of Solr. The thing is, when I create a collection the
running Solr is used, but when I upload the config I use zkcli.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lose-Solr-config-on-zookeeper-when-it-is-restarted-tp421p4233626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: tlog replay

2015-10-08 Thread Erick Erickson
right, so the scenario is
1> somehow you didn't do a hard commit (openSearcher=true or false
doesn't matter) for a really long time while indexing.
2> Solr abnormally terminated.
3> When Solr started back up it replayed the entire log.

How <1> happened is the mystery though. With a hard commit
(autocommit) interval of 15 seconds that's weird.

The message indicates something like that happened. In very recent
Solr versions, the log will have progress messages printed that will
help you see this happening.

Best,
Erick

On Thu, Oct 8, 2015 at 12:23 PM, Rallavagu  wrote:
> As a follow up.
>
> Eventually the tlog file is disappeared (could not track the time it took to
> clear out completely). However, following messages were noticed in
> follower's log.
>
> 5120638 [recoveryExecutor-14-thread-2] WARN org.apache.solr.update.UpdateLog
> – Starting log replay tlog
>
> On 10/7/15 8:29 PM, Erick Erickson wrote:
>>
>> The only way I can account for such a large file off the top of my
>> head is if, for some reason,
>> the Solr on the node somehow was failing to index documents and kept
>> adding them to the
>> log for a long time. But how that would happen without the
>> node being in recovery
>> mode I'm not sure. I mean the Solr instance would have to be healthy
>> otherwise but just not
>> able to index docs which makes no sense.
>>
>> The usual question here is whether there were any messages in the solr
>> log file indicating
>> problems while this built up.
>>
>> tlogs will build up to very large sizes if there are very long hard
>> commit intervals, but I don't
>> see how that interval would be different on the leader and follower.
>>
>> So color me puzzled.
>>
>> Best,
>> Erick
>>
>> On Wed, Oct 7, 2015 at 8:09 PM, Rallavagu  wrote:
>>>
>>> Thanks Erick.
>>>
>>> Eventually, followers caught up but the 14G tlog file still persists and
>>> they are healthy. Is there anything to look for? Will monitor and see how
>>> long will it take before it disappears.
>>>
>>> Evaluating move to Solr 5.3.
>>>
>>> On 10/7/15 7:51 PM, Erick Erickson wrote:


>>>> Uhm, that's very weird. Updates are not applied from the tlog. Rather
>>>> the raw doc is forwarded to the replica which both indexes the doc and
>>>> writes it to the local tlog. So having a 14G tlog on a follower but a
>>>> small tlog on the leader is definitely strange, especially if it
>>>> persists over time.
>>>>
>>>> I assume the follower is healthy? And does this very large tlog
>>>> disappear after a while? I'd expect it to be aged out after a few
>>>> commits of > 100 docs.
>>>>
>>>> All that said, there have been a LOT of improvements since 4.6, so it
>>>> might be something that's been addressed in the intervening time.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu  wrote:
>>>>>
>>>>> Solr 4.6.1, single shard, 4 node cloud, 3 node zk
>>>>>
>>>>> Like to understand the behavior better when large number of updates
>>>>> happen on leader and it generates huge tlog (14G sometimes in my case)
>>>>> on other nodes. At the same time leader's tlog is few KB. So, what is
>>>>> the rate at which the changes from transaction log are applied at
>>>>> nodes? The autocommit interval is set to 15 seconds after going
>>>>> through
>>>>>
>>>>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>>>
>>>>> Thanks

which one is faster, synonym_edismax or edismax?

2015-10-08 Thread Aman Tandon
Hi,

Currently we are using the *synonym_edismax query parser* plugin to handle
multi-word synonyms. I want to know which is faster, *edismax* or
*synonym_edismax*.

As we have only a small number of multi-word entries in our dictionary, we
are thinking of moving to the standard edismax query parser (the two
request styles are sketched below).
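
For illustration, a sketch of the two requests being compared. The
synonyms=true flag and the qf fields are assumptions based on common usage
of the hon-lucene-synonyms plugin (which provides synonym_edismax); verify
them against your setup:

  # current: query-time multi-word synonym expansion
  /select?q=dress+shoes&defType=synonym_edismax&synonyms=true&qf=title+description

  # candidate: standard edismax, synonyms left to the analysis chain
  /select?q=dress+shoes&defType=edismax&qf=title+description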

Any suggestions or observations will be helpful.

With Regards
Aman Tandon


Re: how to deploy another web project into the jetty server (built into solr)

2015-10-08 Thread Upayavira


On Thu, Oct 8, 2015, at 03:21 PM, Mugeesh Husain wrote:
> Thank you Daniel Collins.
> 
> The client is not providing Tomcat or any other server; that's why I
> was looking into this. May I ask again about the server installation?

There is good reason for what Daniel told you. Sure, you can work out
how to install your app within Solr's jetty. Then, you try to upgrade
and find that Solr isn't using Jetty anymore, or uses it in an
incompatible way.

You would be best to advise your client that installing your app within
Solr is not advised, and that you will need to install the app in
another way.

Upayavira


Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Susheel Kumar
ConcurrentUpdateSolrClient is not cloud-aware and does not take a zkHost
string as input. So the only option is to use CloudSolrClient with SolrJ
and a thread pool executor framework; a minimal sketch follows.
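
For illustration, a minimal sketch assuming Solr 5.x SolrJ. The zkHost,
collection name, thread count, batch size, and field names are all
placeholders:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    // One thread-safe, cloud-aware client shared by all workers.
    try (CloudSolrClient client =
             new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
      client.setDefaultCollection("mycollection");
      ExecutorService pool = Executors.newFixedThreadPool(8);
      for (int t = 0; t < 8; t++) {
        final int worker = t;
        pool.submit(() -> {
          List<SolrInputDocument> batch = new ArrayList<>();
          for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", worker + "-" + i);
            doc.addField("title_s", "document " + i + " from worker " + worker);
            batch.add(doc);
            if (batch.size() == 500) {   // send batches, not single docs
              client.add(batch);
              batch.clear();
            }
          }
          if (!batch.isEmpty()) {
            client.add(batch);
          }
          return null;   // Callable, so checked exceptions are allowed
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      client.commit();   // one explicit commit at the end
    }
  }
}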

On Thu, Oct 8, 2015 at 12:50 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> This depends on the number of active producers, but ideally it's OK.
> Different threads will access the thread-safe ConcurrentUpdateSolrClient
> and send the documents in batches.
>
> Or did you mean something different?
>
>
> On 8 October 2015 at 16:00, Mugeesh Husain  wrote:
>
> > A good way is using SolrJ with a thread pool executor framework;
> > increase the number of threads as per your requirement.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Best-Indexing-Approaches-To-max-the-throughput-tp4232740p4233513.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


RE: Scramble data

2015-10-08 Thread Tarala, Magesh
I already have the data ingested and it takes several days to do that. I was 
trying to avoid re-ingesting the data. 

Thanks,
Magesh

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, October 07, 2015 9:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Scramble data

Probably sanitize the data on the front end? Something simple like putting
"REDACTED" in all of the customer-sensitive fields.

You might also write a DocTransformer plugin; all you have to do is
subclass DocTransformer and override one very simple "transform" method
(a sketch follows below).

Best,
Erick
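
For illustration, a minimal sketch of that approach, assuming the Solr
4.x/5.x transformer API (verify class and method signatures against your
version; the field name "customer_name" is a placeholder). It would be
registered in solrconfig.xml, e.g.
<transformer name="redact" class="com.example.RedactTransformerFactory"/>,
and requested with fl=*,[redact]:

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class RedactTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(String field, SolrParams params,
                               SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() {
        return field;   // the name used in the fl parameter, e.g. [redact]
      }

      @Override
      public void transform(SolrDocument doc, int docid) {
        // Overwrite the sensitive field before the response is written.
        if (doc.containsKey("customer_name")) {
          doc.setField("customer_name", "REDACTED");
        }
      }
    };
  }
}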

On Wed, Oct 7, 2015 at 5:09 PM, Tarala, Magesh  wrote:
> Folks,
> I have a strange question. We have a Solr implementation that we would like 
> to demo to external customers. But we don't want to display the real data, 
> which contains our customer information and so is sensitive data. What's the 
> best way to scramble the data of the Solr Query results? By best I mean the 
> simplest way with least amount of work. BTW, we have a .NET front end 
> application.
>
> Thanks,
> Magesh
>
>
>


No live SolrServers available to handle this request

2015-10-08 Thread Steve
I've loaded the Films data into a 4 node cluster.  Indexing went well, but
when I issue a query, I get this:

"error": {
"msg": "org.apache.solr.client.solrj.SolrServerException: No live
SolrServers available to handle this request:
[
http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
,

http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
,

http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
]",
...

and further down in the stacktrace:

Server Error
Caused by:
java.lang.NoSuchMethodError:
org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
...


I'm using:

solr version 5.3.1

lucene 5.2.1

zookeeper version 3.4.6

indexing with:

  cd /opt/solr/example/films
  /opt/solr/bin/post -c CollectionFilms -port 8081 films.json



thx,
.strick