Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Kevin Lee
The restart issues aside, I’m trying to lock down usage of the Collections API, 
but that does not seem to be working either.

Here is my security.json.  I’m using the “collection-admin-edit” permission and 
assigning it to the “adminRole”.  However, after uploading the new 
security.json and restarting the web browser, it doesn’t seem to be requiring 
credentials when calling the RELOAD action on the Collections API.  The only 
thing that seems to work is the custom permission “browse”, which requires 
authentication before allowing me to pull up the page.  Am I using the 
permissions correctly for the RuleBasedAuthorizationPlugin?

{
"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials": {
"admin”:” ",
"user": ” "
}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions": [
{
"name":"security-edit", 
"role":"adminRole"
},
{
"name":"collection-admin-edit”,
"role":"adminRole"
},
{
"name":"browse", 
"collection": "inventory", 
"path": "/browse", 
"role":"browseRole"
}
],
   "user-role": {
"admin": [
"adminRole",
"browseRole"
],
"user": [
"browseRole"
]
}
}
}

I also tried adding the permission using the Authorization API, but it had no 
effect; the Collections API can still be invoked without a username/password.  
I do see in the Solr logs that the updates are picked up, because it outputs 
the messages "Updating /security.json …", "Security node changed", 
"Initializing authorization plugin: solr.RuleBasedAuthorizationPlugin" and 
"Authentication plugin class obtained from ZK: solr.BasicAuthPlugin".
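The Authorization API attempt described above can be sketched as follows; the admin credentials, host, and port are placeholder assumptions, not values from this thread:

```shell
# Hedged sketch: grant "collection-admin-edit" to adminRole via the
# Authorization API. The user/password and host below are placeholders.
payload='{"set-permission": {"name":"collection-admin-edit", "role":"adminRole"}}'
# curl --user admin:PASSWORD -H 'Content-type:application/json' \
#      -d "$payload" http://localhost:8983/solr/admin/authorization
echo "$payload"
```

If the permission were active, an unauthenticated RELOAD call against /solr/admin/collections should then be rejected.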

Thanks,
Kevin

> On Sep 1, 2015, at 12:31 AM, Noble Paul  wrote:
> 
> I'm investigating why restarts or first time start does not read the
> security.json
> 
> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>> I removed that statement
>> 
>> "If activating the authorization plugin doesn't protect the admin ui,
>> how does one protect access to it?"
>> 
>> One does not need to protect the admin UI. You only need to protect
>> the relevant API calls . I mean it's OK to not protect the CSS and
>> HTML stuff.  But if you perform an action to create a core or do a
>> query through admin UI , it automatically will prompt you for
>> credentials (if those APIs are protected)
>> 
>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>>> Thanks for the clarification!
>>> 
>>> So is the wiki page incorrect at
>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>>  which says that the admin ui will require authentication once the 
>>> authorization plugin is activated?
>>> 
>>> "An authorization plugin is also available to configure Solr with 
>>> permissions to perform various activities in the system. Once activated, 
>>> access to the Solr Admin UI and all requests will need to be authenticated 
>>> and users will be required to have the proper authorization for all 
>>> requests, including using the Admin UI and making any API calls."
>>> 
>>> If activating the authorization plugin doesn't protect the admin ui, how 
>>> does one protect access to it?
>>> 
>>> Also, the issue I'm having is not just at restart.  According to the docs 
>>> security.json should be uploaded to Zookeeper before starting any of the 
>>> Solr instances.  However, I tried to upload security.json before starting 
>>> any of the Solr instances, but it would not pick up the security config 
>>> until after the Solr instances are already running and then uploading the 
>>> security.json again.  I can see in the logs at startup that the Solr 
>>> instances don't see any plugin enabled even though security.json is already 
>>> in zookeeper and then after they are started and the security.json is 
>>> uploaded again I see it reconfigure to use the plugin.
>>> 
>>> Thanks,
>>> Kevin
>>> 
 On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
 
 Admin UI is not protected by any of these permissions. Only if you try
 to perform a protected operation , it asks for a password.
 
 I'll investigate the restart problem and report my  findings
 
> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee 

Re: DataImportHandler scheduling

2015-09-01 Thread Troy Edwards
My initial thought was to use scheduling built with DIH:
http://wiki.apache.org/solr/DataImportHandler#Scheduling

But I think just a cron job should do the same for me.
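A cron-based setup could look like the crontab fragment below; the collection name, host, and schedule are assumptions for illustration, not details from this thread:

```
# Run a DIH full-import at 02:00, Monday through Friday.
# m h dom mon dow  command
0 2 * * 1-5  curl -s "http://localhost:8983/solr/mycollection/dataimport?command=full-import&clean=true" >/dev/null 2>&1
```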

Thanks

On Tue, Sep 1, 2015 at 8:51 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> On 8/31/2015 11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
> Troy, is your question how to use scheduled tasks?   Shawn pointed you in
> the right direction.   I thought it more likely that you want to schedule a
> cron task to run on any of your servers running SolrCloud, and you want the
> job to run even if the cluster is degraded.
>
> Here's an idea - schedule your job Monday on node 1, Tuesday on node 2,
> etc.   That way, if the cluster is degraded (a node is down),
> re-indexing/delta indexing still happens; it just happens more slowly.  You
> can certainly write a zookeeper client to make each cron job compete to see
> who does the job - questions on how to do this should be directed to a
> zookeeper users' mailing list.
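Daniel's day-per-node rotation could be sketched as a small shell helper that each node calls from cron; the node hostnames and helper name are illustrative assumptions:

```shell
# Map an ISO weekday (1=Mon .. 5=Fri) to the node responsible for that
# day's indexing run. Hostnames are placeholders.
pick_node() {
  nodes="solr1 solr2 solr3 solr4 solr5"
  dow="$1"
  if [ "$dow" -ge 1 ] && [ "$dow" -le 5 ]; then
    # Word-splitting on $nodes is intentional here.
    set -- $nodes
    shift $((dow - 1))
    echo "$1"
  fi
}

# Each node's cron job would then run the import only on "its" day,
# e.g. (run_delta_import is a hypothetical helper):
# [ "$(pick_node "$(date +%u)")" = "$(hostname -s)" ] && run_delta_import
```

This gives at most a one-day delay if a single node is down, matching the degraded-cluster behavior described above.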
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Monday, August 31, 2015 7:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler scheduling
>
> On 8/31/2015 11:26 AM, Troy Edwards wrote:
> > I am having a hard time finding documentation on DataImportHandler
> > scheduling in SolrCloud. Can someone please post a link to that? I
> > have a requirement that the DIH should be initiated at a specific time
> > Monday through Friday.
>
> Every modern operating system (and most of the previous versions of every
> modern OS) has a built-in task scheduling system.  For Windows, it's
> literally called Task Scheduler.  For most other operating systems, it's
> called cron.
>
> Including dataimport scheduling capability in Solr has been discussed, and
> I think someone even wrote a working version ... but since every OS already
> has scheduling capability that has had years of time to mature, why should
> Solr reinvent the wheel and take the risk that the implementation will have
> bugs?
>
> Currently virtually all updates to Solr's index must be initiated outside
> of Solr, and there is good reason to make sure that Solr doesn't ever
> modify the index without outside input.  The only thing I know of right now
> that can update the index automatically is Document Expiration, but the
> expiration time is decided when the document is indexed, and the original
> indexing action is external to Solr.
>
> https://lucidworks.com/blog/document-expiration/
>
> Thanks,
> Shawn
>
>


Re: DataImportHandler scheduling

2015-09-01 Thread Shawn Heisey
On 9/1/2015 11:45 AM, Troy Edwards wrote:
> My initial thought was to use scheduling built with DIH:
> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>
> But I think just a cron job should do the same for me.

The dataimport scheduler does not exist in any Solr version.  This is a
proposed feature, with the enhancement issue open for more than four years:

https://issues.apache.org/jira/browse/SOLR-2305

I have updated the wiki page to state that the scheduler is a proposed
improvement, not a usable feature.

Thanks,
Shawn



Solr cloud hangs, log4j contention issue observed

2015-09-01 Thread Arnon Yogev
We have a Solr cloud (4.7) consisting of 5 servers.
At some point we noticed that one of the servers had a very high CPU and
was not responding. A few minutes later, the other 4 servers were
responding very slowly. A restart was required.
Looking at the Solr logs, we mainly saw symptoms, i.e. errors that occurred
a few minutes after the high CPU started (connection timeouts, etc.).

When looking at the javacore of the problematic server, we found that one
thread was waiting on a log4j method, and 538 threads (!) were waiting on
the same lock.
The thread's stack trace is:

3XMTHREADINFO  "http-bio-8443-exec-37460" J9VMThread:0x7FED88044600, j9thread_t:0x7FE73E4D04A0, java/lang/Thread:0x7FF267995468, state:CW, prio=5
3XMJAVALTHREAD  (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1  (native thread ID:0x17F8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2  (native stack address range from:0x7FEA9487B000, to:0x7FEA948BC000, size:0x41000)
3XMCPUTIME  CPU usage total: 55.216798962 secs
3XMHEAPALLOC  Heap bytes allocated since last GC cycle=3176200 (0x307708)
3XMTHREADINFO3  Java callstack:
4XESTACKTRACE  at org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE  at org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE  at org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE  at org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE  at org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE  at org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled Code))
4XESTACKTRACE  at org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE  at org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled Code))
4XESTACKTRACE  at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled Code))
4XESTACKTRACE  at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled
...

Our logging is done to a local file.
After searching the web, we found similar problems:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
https://dzone.com/articles/log4j-thread-deadlock-case

However, it seems the fixes were made for log4j 2.x, while Solr uses log4j
1.2.x (even the new Solr 5.3.0, from what I've seen).

Is this a known problem?
Is it possible to upgrade Solr log4j version to 2.X?

Thanks,
Arnon


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Kevin Lee
Thanks for the clarification!  

So is the wiki page incorrect at 
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
which says that the admin ui will require authentication once the authorization 
plugin is activated?

"An authorization plugin is also available to configure Solr with permissions 
to perform various activities in the system. Once activated, access to the Solr 
Admin UI and all requests will need to be authenticated and users will be 
required to have the proper authorization for all requests, including using the 
Admin UI and making any API calls."

If activating the authorization plugin doesn't protect the admin ui, how does 
one protect access to it?

Also, the issue I'm having is not just at restart.  According to the docs, 
security.json should be uploaded to Zookeeper before starting any of the Solr 
instances.  However, when I uploaded security.json before starting any of the 
Solr instances, the security config was not picked up until the Solr instances 
were already running and security.json was uploaded again.  I can see in the 
logs at startup that the Solr instances don't see any plugin enabled even 
though security.json is already in Zookeeper; after they are started and 
security.json is uploaded again, I see them reconfigure to use the plugin.

Thanks,
Kevin

> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
> 
> Admin UI is not protected by any of these permissions. Only if you try
> to perform a protected operation , it asks for a password.
> 
> I'll investigate the restart problem and report my  findings
> 
>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
>> Anyone else running into any issues trying to get the authentication and 
>> authorization plugins in 5.3 working?
>> 
>>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>> 
>>> Hi,
>>> 
>>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>>> seem to be working quite right.  Not sure if I’m missing steps or there is 
>>> a bug.  I am able to get it to protect access to a URL under a collection, 
>>> but am unable to get it to secure access to the Admin UI.  In addition, 
>>> after stopping the Solr and Zookeeper instances, the security.json is still 
>>> in Zookeeper, however Solr is allowing access to everything again like the 
>>> security configuration isn’t in place.
>>> 
>>> Contents of security.json taken from wiki page, but edited to produce valid 
>>> JSON.  Had to move comma after 3rd from last “}” up to just after the last 
>>> “]”.
>>> 
>>> {
>>> "authentication":{
>>> "class":"solr.BasicAuthPlugin",
>>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>>> },
>>> "authorization":{
>>> "class":"solr.RuleBasedAuthorizationPlugin",
>>> "permissions":[{"name":"security-edit",
>>>"role":"admin"}],
>>> "user-role":{"solr":"admin"}
>>> }}
>>> 
>>> Here are the steps I followed:
>>> 
>>> Upload security.json to zookeeper
>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>> /security.json ~/solr/security.json
>>> 
>>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
>>> /security.json.  It is there and looks like what was originally uploaded.
>>> 
>>> Start Solr Instances
>>> 
>>> Attempt to create a permission, however get the following error:
>>> {
>>> "responseHeader":{
>>>  "status":400,
>>>  "QTime":0},
>>> "error":{
>>>  "msg":"No authorization plugin configured",
>>>  "code":400}}
>>> 
>>> Upload security.json again.
>>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>>> /security.json ~/solr/security.json
>>> 
>>> Issue the following to try to create the permission again and this time 
>>> it’s successful.
>>> // Create a permission for mysearch endpoint
>>>  curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
>>> '{"set-permission": {"name":"mycollection-search","collection": 
>>> "mycollection","path":"/mysearch","role": "search-user"}}' 
>>> http://localhost:8983/solr/admin/authorization
>>> 
>>>  {
>>>"responseHeader":{
>>>  "status":0,
>>>  "QTime":7}}
>>> 
>>> Issue the following commands to add users
>>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>>> -H 'Content-type:application/json' -d '{"set-user": {"admin" : "password" 
>>> }}'
>>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>>> -H 'Content-type:application/json' -d '{"set-user": {"user" : "password" }}'
>>> 
>>> Issue the following command to add permission to users
>>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>>> "set-user-role" : {"admin": ["search-user", "admin"]}}' 
>>> http://localhost:8983/solr/admin/authorization
>>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>>> "set-user-role" : 

Re: Get distinct results in Solr

2015-09-01 Thread Upayavira
You are attempting to write your signature to your ID field. That's not
a good idea. You are generating your signature from the content field,
which seems okay. Change your signatureField to a dedicated 'signature'
field instead of id, and something different will happen :-)

Upayavira
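Upayavira's fix, sketched as a solrconfig.xml fragment; the 'signature' field name and the surrounding chain are assumptions based on the standard de-duplication setup, not config taken from this thread:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- write the hash to a dedicated field, not the uniqueKey -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```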

On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> I tried to follow the de-duplication guide, but after I configured it in
> solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
> no error message. I'm using SimplePostTool to index rich-text documents.
> 
> Below are my configurations:
> 
> In solrconfig.xml
> 
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">dedupe</str>
>   </lst>
> </requestHandler>
> 
> <updateRequestProcessorChain name="dedupe">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">id</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">content</str>
>     <str name="signatureClass">solr.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> In schema.xml
> 
>   multiValued="false" />
> 
> 
> Is there anything which I might have missed out or done wrongly?
> 
> Regards,
> Edwin
> 
> 
> On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo 
> wrote:
> 
> > Thank you for your advice Alexandre.
> >
> > Will try out the de-duplication from the link you gave.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 10:34, Alexandre Rafalovitch 
> > wrote:
> >
> >> Re-read the question. You want to de-dupe on the full text-content.
> >>
> >> I would actually try to use the dedupe chain as per the link I gave
> >> but put results into a separate string field. Then, you group on that
> >> field. You cannot actually group on the long text field, that would
> >> kill any performance. So a signature is your proxy.
> >>
> >> Regards,
> >>Alex
> >> 
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo 
> >> wrote:
> >> > Hi Alexandre,
> >> >
> >> > Will treating it as String affect the search or other functions like
> >> > highlighting?
> >> >
> >> > Yes, the content must be in my index, unless I do a copyField to do
> >> > de-duplication on that field.. Will that help?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch  >> >
> >> > wrote:
> >> >
> >> >> Can't you just treat it as String?
> >> >>
> >> >> Also, do you actually want those documents in your index in the first
> >> >> place? If not, have you looked at De-duplication:
> >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >> >>
> >> >> Regards,
> >> >>Alex.
> >> >> 
> >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> >> http://www.solr-start.com/
> >> >>
> >> >>
> >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo 
> >> >> wrote:
> >> >> > Thanks Jan.
> >> >> >
> >> >> > But I read that the field that is being collapsed on must be a single
> >> >> > valued String, Int or Float. As I'm required to get the distinct
> >> results
> >> >> > from "content" field that was indexed from a rich text document, I
> >> got
> >> >> the
> >> >> > following error:
> >> >> >
> >> >> >   "error":{
> >> >> > "msg":"java.io.IOException: 64 bit numeric collapse fields are
> >> not
> >> >> > supported",
> >> >> > "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit
> >> >> > numeric collapse fields are not supported\r\n\tat
> >> >> >
> >> >> >
> >> >> > Is it possible to collapsed on fields which has a long integer of
> >> data,
> >> >> > like content from a rich text document?
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
> >> >> >
> >> >> >
> >> >> > On 31 August 2015 at 18:59, Jan Høydahl 
> >> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> Check out the CollapsingQParser (
> >> >> >>
> >> >>
> >> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> >> >> ).
> >> >> >> As long as you have a field that will be the same for all
> >> duplicates,
> >> >> you
> >> >> >> can “collapse” on that field. If you not have a “group id”, you can
> >> >> create
> >> >> >> one using e.g. an MD5 signature of the identical body text (
> >> >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication).
> >> >> >>
> >> >> >> --
> >> >> >> Jan Høydahl, search solution architect
> >> >> >> Cominvent AS - www.cominvent.com
> >> >> >>
> >> >> >> > 31. aug. 2015 kl. 12.03 skrev Zheng Lin Edwin Yeo <
> >> >> edwinye...@gmail.com
> >> >> >> >:
> >> >> >> >
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I'm using Solr 5.2.1, and I would like to find out, what is the
> >> best
> >> >> way
> >> >> >> to
> >> >> >> > get Solr to return only distinct results?
> >> >> >> >
> >> >> >> > Currently, I've indexed several exact similar documents into Solr,
> >> >> with
> >> >> >> > just different id and title, but the content is exactly the same.
> >> >> When I
> >> >> >> do
> >> >> >> > a search, Solr will return all these documents several time in the
> >> >> list.
> >> >> >> >
> >> >> >> > What is the 

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
The Admin UI is not protected by any of these permissions. Only if you try
to perform a protected operation does it ask for a password.

I'll investigate the restart problem and report my findings.

On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
> Anyone else running into any issues trying to get the authentication and 
> authorization plugins in 5.3 working?
>
>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>
>> Hi,
>>
>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem 
>> to be working quite right.  Not sure if I’m missing steps or there is a bug. 
>>  I am able to get it to protect access to a URL under a collection, but am 
>> unable to get it to secure access to the Admin UI.  In addition, after 
>> stopping the Solr and Zookeeper instances, the security.json is still in 
>> Zookeeper, however Solr is allowing access to everything again like the 
>> security configuration isn’t in place.
>>
>> Contents of security.json taken from wiki page, but edited to produce valid 
>> JSON.  Had to move comma after 3rd from last “}” up to just after the last 
>> “]”.
>>
>> {
>> "authentication":{
>>   "class":"solr.BasicAuthPlugin",
>>   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>> },
>> "authorization":{
>>   "class":"solr.RuleBasedAuthorizationPlugin",
>>   "permissions":[{"name":"security-edit",
>>  "role":"admin"}],
>>   "user-role":{"solr":"admin"}
>> }}
>>
>> Here are the steps I followed:
>>
>> Upload security.json to zookeeper
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> /security.json ~/solr/security.json
>>
>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
>> /security.json.  It is there and looks like what was originally uploaded.
>>
>> Start Solr Instances
>>
>> Attempt to create a permission, however get the following error:
>> {
>>  "responseHeader":{
>>"status":400,
>>"QTime":0},
>>  "error":{
>>"msg":"No authorization plugin configured",
>>"code":400}}
>>
>> Upload security.json again.
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> /security.json ~/solr/security.json
>>
>> Issue the following to try to create the permission again and this time it’s 
>> successful.
>> // Create a permission for mysearch endpoint
>>curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
>> '{"set-permission": {"name":"mycollection-search","collection": 
>> "mycollection","path":"/mysearch","role": "search-user"}}' 
>> http://localhost:8983/solr/admin/authorization
>>
>>{
>>  "responseHeader":{
>>"status":0,
>>"QTime":7}}
>>
>> Issue the following commands to add users
>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>> -H 'Content-type:application/json' -d '{"set-user": {"admin" : "password" }}'
>> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication 
>> -H 'Content-type:application/json' -d '{"set-user": {"user" : "password" }}'
>>
>> Issue the following command to add permission to users
>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>> "set-user-role" : {"admin": ["search-user", "admin"]}}' 
>> http://localhost:8983/solr/admin/authorization
>> curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{ 
>> "set-user-role" : {"user": ["search-user"]}}' 
>> http://localhost:8983/solr/admin/authorization
>>
>> After executing the above, access to /mysearch is protected until I restart 
>> the Solr and Zookeeper instances.  However, the admin UI is never protected 
>> like the Wiki page says it should be once activated.
>>
>> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
>>  
>> 
>>
>> Why does the authentication and authorization plugin not stay activated 
>> after restart and why is the Admin UI never protected?  Am I missing any 
>> steps?
>>
>> Thanks,
>> Kevin



-- 
-
Noble Paul


Re: Custom merge logic in SolrCloud.

2015-09-01 Thread Upayavira
Take a step back. *why* do you need a blend? Can you adjust the scores
on your shards to make the normal algorithm work better for you?

Upayavira 

On Mon, Aug 31, 2015, at 08:47 PM, Mohan gupta wrote:
> Hi Folks,
> 
> I need to merge docs received from multiple shards via custom logic; a
> straightforward score-based priority queue doesn't work for my scenario
> (I need to maintain a blend/distribution of docs).
> 
> How can I plugin my custom merge logic? One way might be to fully
> implement
> the QueryComponent but that seems like a lot of work, is there a simpler
> way?
> 
> I need my custom logic to kick in only in very specific cases, and most
> cases can still use the default QueryComponent. Was there a reason to make
> the merge functionality private (non-overridable) in the QueryComponent
> class?
> 
> -- 
> Regards ,
> Mohan Gupta


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
I removed that statement

"If activating the authorization plugin doesn't protect the admin ui,
how does one protect access to it?"

One does not need to protect the admin UI; you only need to protect the
relevant API calls. It's OK not to protect the CSS and HTML assets. But if
you perform an action such as creating a core or running a query through
the admin UI, it will automatically prompt you for credentials (if those
APIs are protected).

On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
> Thanks for the clarification!
>
> So is the wiki page incorrect at
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
> which says that the admin ui will require authentication once the 
> authorization plugin is activated?
>
> "An authorization plugin is also available to configure Solr with permissions 
> to perform various activities in the system. Once activated, access to the 
> Solr Admin UI and all requests will need to be authenticated and users will 
> be required to have the proper authorization for all requests, including 
> using the Admin UI and making any API calls."
>
> If activating the authorization plugin doesn't protect the admin ui, how does 
> one protect access to it?
>
> Also, the issue I'm having is not just at restart.  According to the docs 
> security.json should be uploaded to Zookeeper before starting any of the Solr 
> instances.  However, I tried to upload security.json before starting any of 
> the Solr instances, but it would not pick up the security config until after 
> the Solr instances are already running and then uploading the security.json 
> again.  I can see in the logs at startup that the Solr instances don't see 
> any plugin enabled even though security.json is already in zookeeper and then 
> after they are started and the security.json is uploaded again I see it 
> reconfigure to use the plugin.
>
> Thanks,
> Kevin
>
>> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
>>
>> Admin UI is not protected by any of these permissions. Only if you try
>> to perform a protected operation , it asks for a password.
>>
>> I'll investigate the restart problem and report my  findings
>>
>>> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  wrote:
>>> Anyone else running into any issues trying to get the authentication and 
>>> authorization plugins in 5.3 working?
>>>
 On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:

 Hi,

 I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
 seem to be working quite right.  Not sure if I’m missing steps or there is 
 a bug.  I am able to get it to protect access to a URL under a collection, 
 but am unable to get it to secure access to the Admin UI.  In addition, 
 after stopping the Solr and Zookeeper instances, the security.json is 
 still in Zookeeper, however Solr is allowing access to everything again 
 like the security configuration isn’t in place.

 Contents of security.json taken from wiki page, but edited to produce 
 valid JSON.  Had to move comma after 3rd from last “}” up to just after 
 the last “]”.

 {
 "authentication":{
 "class":"solr.BasicAuthPlugin",
 "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
 Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
 },
 "authorization":{
 "class":"solr.RuleBasedAuthorizationPlugin",
 "permissions":[{"name":"security-edit",
"role":"admin"}],
 "user-role":{"solr":"admin"}
 }}

 Here are the steps I followed:

 Upload security.json to zookeeper
 ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
 /security.json ~/solr/security.json

 Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper at 
 /security.json.  It is there and looks like what was originally uploaded.

 Start Solr Instances

 Attempt to create a permission, however get the following error:
 {
 "responseHeader":{
  "status":400,
  "QTime":0},
 "error":{
  "msg":"No authorization plugin configured",
  "code":400}}

 Upload security.json again.
 ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
 /security.json ~/solr/security.json

 Issue the following to try to create the permission again and this time 
 it’s successful.
 // Create a permission for mysearch endpoint
  curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
 '{"set-permission": {"name":"mycollection-search","collection": 
 "mycollection","path":"/mysearch","role": "search-user"}}' 
 http://localhost:8983/solr/admin/authorization

  {
"responseHeader":{
  "status":0,
  "QTime":7}}

 Issue the following commands to add users
 curl --user solr:SolrRocks 

Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
I'm investigating why restarts, or a first-time start, do not read the
security.json

On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
> I removed that statement
>
> "If activating the authorization plugin doesn't protect the admin ui,
> how does one protect access to it?"
>
> One does not need to protect the admin UI. You only need to protect
> the relevant API calls . I mean it's OK to not protect the CSS and
> HTML stuff.  But if you perform an action to create a core or do a
> query through admin UI , it automatically will prompt you for
> credentials (if those APIs are protected)
>
> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>> Thanks for the clarification!
>>
>> So is the wiki page incorrect at
>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
>> which says that the admin ui will require authentication once the 
>> authorization plugin is activated?
>>
>> "An authorization plugin is also available to configure Solr with 
>> permissions to perform various activities in the system. Once activated, 
>> access to the Solr Admin UI and all requests will need to be authenticated 
>> and users will be required to have the proper authorization for all 
>> requests, including using the Admin UI and making any API calls."
>>
>> If activating the authorization plugin doesn't protect the admin ui, how 
>> does one protect access to it?
>>
>> Also, the issue I'm having is not just at restart.  According to the docs 
>> security.json should be uploaded to Zookeeper before starting any of the 
>> Solr instances.  However, I tried to upload security.json before starting 
>> any of the Solr instances, but it would not pick up the security config 
>> until after the Solr instances are already running and then uploading the 
>> security.json again.  I can see in the logs at startup that the Solr 
>> instances don't see any plugin enabled even though security.json is already 
>> in zookeeper and then after they are started and the security.json is 
>> uploaded again I see it reconfigure to use the plugin.
>>
>> Thanks,
>> Kevin
>>
>>> On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:
>>>
>>> Admin UI is not protected by any of these permissions. Only if you try
>>> to perform a protected operation , it asks for a password.
>>>
>>> I'll investigate the restart problem and report my  findings
>>>
 On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  
 wrote:
 Anyone else running into any issues trying to get the authentication and 
 authorization plugins in 5.3 working?

> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>
> Hi,
>
> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
> seem to be working quite right.  Not sure if I’m missing steps or there 
> is a bug.  I am able to get it to protect access to a URL under a 
> collection, but am unable to get it to secure access to the Admin UI.  In 
> addition, after stopping the Solr and Zookeeper instances, the 
> security.json is still in Zookeeper, however Solr is allowing access to 
> everything again like the security configuration isn’t in place.
>
> Contents of security.json taken from wiki page, but edited to produce 
> valid JSON.  Had to move comma after 3rd from last “}” up to just after 
> the last “]”.
>
> {
> "authentication":{
> "class":"solr.BasicAuthPlugin",
> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> },
> "authorization":{
> "class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[{"name":"security-edit",
>"role":"admin"}],
> "user-role":{"solr":"admin"}
> }}
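Kevin mentions having had to edit the wiki's example into valid JSON, and the config at the top of this thread still shows curly quotes. Stray smart quotes from copy-paste are the usual culprit, and Solr then silently ignores the file. A local parse check before the zkcli.sh upload catches this; the sketch below is generic Python, not part of Solr's tooling.

```python
import json

def is_valid_json(text):
    """Return True if text parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# A curly quote pasted from a word processor breaks the key string,
# while the straight-quote version parses fine.
bad = '{"credentials": {"admin”: "secret"}}'
good = '{"credentials": {"admin": "secret"}}'
```

Running such a check over security.json before `zkcli.sh -cmd putfile` avoids uploading a file the plugins cannot parse.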
>
> Here are the steps I followed:
>
> Upload security.json to zookeeper
> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
> /security.json ~/solr/security.json
>
> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper 
> at /security.json.  It is there and looks like what was originally 
> uploaded.
>
> Start Solr Instances
>
> Attempt to create a permission, however get the following error:
> {
> "responseHeader":{
>  "status":400,
>  "QTime":0},
> "error":{
>  "msg":"No authorization plugin configured",
>  "code":400}}
>
> Upload security.json again.
> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
> /security.json ~/solr/security.json
>
> Issue the following to try to create the permission again and this time 
> it’s successful.
> // Create a permission for mysearch endpoint
>  curl --user solr:SolrRocks -H 'Content-type:application/json' -d 
> '{"set-permission": {"name":"mycollection-search","collection": 

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Upayavira
I wonder if this resolves it [1]. It has been applied to trunk, but not
to the 5.x release branch.

If you needed it in 5.x, I wonder if there's a way that particular
choice could be made configurable.

Upayavira

[1] https://issues.apache.org/jira/browse/LUCENE-6711
On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote:
> Hi Upayavira
> 
> In fact we are using optimize currently but was advised to use expunge 
> deletes as it is less resource intensive.
> So expunge deletes will only remove deleted documents, it will not merge 
> all index segments into one?
> 
> If we don't use optimize, the deleted documents in the index will affect 
> the scores (with docFreq=2) of the matched documents which will affect 
> the relevancy of the search result.
> 
> Derek
> 
> On 9/1/2015 12:05 AM, Upayavira wrote:
> > If you really must expunge deletes, use optimize. That will merge all
> > index segments into one, and in the process will remove any deleted
> > documents.
> >
> > Why do you need to expunge deleted documents anyway? It is generally
> > done in the background for you, so you shouldn't need to worry about it.
> >
> > Upayavira
> >
> > On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote:
> >> Hi,
> >>
> >> The below curl command worked without error, you can try.
> >>
> >> curl http://localhost:8983/solr/techproducts/update?commit=true -H
> >> "Content-Type: text/xml" --data-binary ' >> expungeDeletes="true"/>'
> >>
> >> However, after executing this, I could still see same deleted counts on
> >> dashboard.  Deleted Docs:6
> >> I am not sure whether that means,  the command did not take effect or it
> >> took effect but did not reflect on dashboard view.
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh 
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> I tried doing an expungeDeletes=true with the following but get the message
> >>> 'missing content stream'. What am I missing? Do I need to provide additional
> >>> parameters?
> >>>
> >>> curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true
> >>> ';
> >>>
> >>> Thanks,
> >>> Derek
> >>>
> >>> --
> >>> CONFIDENTIALITY NOTICE
> >>> This e-mail (including any attachments) may contain confidential and/or
> >>> privileged information. If you are not the intended recipient or have
> >>> received this e-mail in error, please inform the sender immediately and
> >>> delete this e-mail (including any attachments) from your computer, and you
> >>> must not use, disclose to anyone else or copy this e-mail (including any
> >>> attachments), whether in whole or in part.
> >>> This e-mail and any reply to it may be monitored for security, legal,
> >>> regulatory compliance and/or other appropriate reasons.
> >>>
> >>>
> >
> 
> 


Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Derek Poh

Erick

Yes, we see documents changing their position in the list due to having 
deleted docs.
In our search result, we apply a higher boost (bq) to a group of matched
documents to have them display at the top tier of the result.
At times 1 or 2 of these documents are not returned in the top tier; they
are relegated down to the lower tier of the result. We discovered that
these documents have a lower score due to docFreq=2.
After we do an optimize, these 1-2 documents are back in the top tier
result order and their docFreq is 1.




On 9/1/2015 11:40 PM, Erick Erickson wrote:

Derek:

Why do you care? What evidence do you have that this matters _practically_?

If you've looked at scoring with a small number of documents, you'll see
significant differences due to deleted documents. In most cases, as you get a
larger number of documents, the difference in ranking between an index with no
deletions vs. one that has deletions is usually not noticeable.

I'm suggesting that this is a red herring. Your specific situation may
be different
of course, but since scoring is really only about ranking docs
relative to each other,
unless the relative positions change enough to be noticeable it's not a problem.

Note that I'm saying "relative rankings", NOT "absolute score". Document scores
have no meaning outside comparisons to other docs _in the same query_. So
unless you see documents changing their position in the list due to
having deleted
docs, it's not worth spending time on IMO.

Best,
Erick
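The docFreq effect Derek describes can be made concrete with a little arithmetic. The sketch below uses Lucene's classic idf formula from ClassicSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)); the document counts are invented for illustration, but they show why a not-yet-merged deleted duplicate (docFreq=2 instead of 1) lowers a term's contribution.

```python
import math

def classic_idf(num_docs, doc_freq):
    # Lucene ClassicSimilarity idf; deleted docs still count toward
    # docFreq until their segments are merged away.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

idf_after_merge = classic_idf(1_000_000, 1)   # duplicate physically removed
idf_with_delete = classic_idf(1_000_000, 2)   # deleted copy still counted
```

The gap shrinks as the index grows, which is Erick's point: whether it changes the relative order depends on how close the competing documents' scores already are.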

On Tue, Sep 1, 2015 at 12:45 AM, Upayavira  wrote:

I wonder if this resolves it [1]. It has been applied to trunk, but not
to the 5.x release branch.

If you needed it in 5.x, I wonder if there's a way that particular
choice could be made configurable.

Upayavira

[1] https://issues.apache.org/jira/browse/LUCENE-6711
On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote:

Hi Upayavira

In fact we are using optimize currently but was advised to use expunge
deletes as it is less resource intensive.
So expunge deletes will only remove deleted documents, it will not merge
all index segments into one?

If we don't use optimize, the deleted documents in the index will affect
the scores (with docFreq=2) of the matched documents which will affect
the relevancy of the search result.

Derek

On 9/1/2015 12:05 AM, Upayavira wrote:

If you really must expunge deletes, use optimize. That will merge all
index segments into one, and in the process will remove any deleted
documents.

Why do you need to expunge deleted documents anyway? It is generally
done in the background for you, so you shouldn't need to worry about it.

Upayavira

On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote:

Hi,

The below curl command worked without error, you can try.

curl http://localhost:8983/solr/techproducts/update?commit=true -H
"Content-Type: text/xml" --data-binary ''

However, after executing this, I could still see same deleted counts on
dashboard.  Deleted Docs:6
I am not sure whether that means,  the command did not take effect or it
took effect but did not reflect on dashboard view.





On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh 
wrote:


Hi

I tried doing an expungeDeletes=true with the following but get the message
'missing content stream'. What am I missing? Do I need to provide additional
parameters?

curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true
';

Thanks,
Derek











Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Yonik,

Thanks for pointing out the difference.

I've made modification and tried with this below command for JSON Facet,
but it is still having a QTime of 410, as compared to the Legacy Facet
QTime of 22:
http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{field:content}}&rows=0

Is this the same as the Legacy Facet query of
http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0 ?


Regards,
Edwin
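For clarity, the two requests being compared can be built programmatically so that every parameter is explicit. The parameter names below are standard Solr (facet, facet.field, json.facet, rows); the values are my reconstruction of what was sent, not a verbatim copy.

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/collection1/select"

# Legacy field faceting on the content field.
legacy = base + "?" + urlencode({
    "q": "paint", "facet": "true", "facet.field": "content", "rows": 0})

# JSON Facet API equivalent: a plain terms facet with no extra statistics,
# which is what makes it comparable to the legacy request above.
json_facet = base + "?" + urlencode({
    "q": "paint", "json.facet": "{f:{field:content}}", "rows": 0})
```

Adding a stat such as `hll(id)` to the JSON facet, as in the earlier test, asks for extra per-bucket work, so those two requests are not an apples-to-apples timing comparison.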


On 1 September 2015 at 23:24, Yonik Seeley  wrote:

> They aren't doing the same thing...
>
> The first URL is doing a straight facet on the content field.
> The second URL is doing a facet on the content field and asking for an
> additional statistic for each bucket.
>
> -Yonik
>
>
> On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
>  wrote:
> > I've tried the following commands and I found that the Legacy Faceting is
> > actually much faster than JSON Faceting. Not sure why is this so, when
> the
> > document from this link http://yonik.com/solr-count-distinct/ states
> that
> > JSON Facets has a much lower request latency.
> >
> > (For Legacy Facet) - QTime: 22
> >
> > -
> >
> > http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
> >
> > (For JSON Facet) - QTime: 1128
> >
> > -
> >
> > http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
> >
> >
> > Is there any problem with my URL for the JSON Facet?
> >
> >
> > Regards,
> >
> > Edwin
> >
> >
> >
> > On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> I'm using Solr 5.2.1, and I would like to find out, what is the
> difference
> >> between Legacy Facets and JSON Facets in Solr? I was told that JSON
> Facets
> >> has a much lesser Request Latency, but I couldn't find any major
> difference
> >> in speed. Or must we have a larger index in order to have any
> significant
> >> difference?
> >>
> >> Is there any significant advantage to use JSON Faceting command instead
> of
> >> Legacy Faceting command?
> >>
> >> Regards,
> >> Edwin
> >>
>


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread shamik
Hi Kevin,

  Were you able to get a workaround / fix for your problem ? I'm also
looking to secure Collection and Update APIs by upgrading to 5.3. Just
wondering if it's worth the upgrade or should I wait for the next version,
which will probably address this.

Regards,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-Using-Solr-5-3-Authentication-and-Authorization-Plugins-tp4226011p4226552.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler scheduling

2015-09-01 Thread William Bell
We should add a simple scheduler in the UI. It is very useful. To schedule
various actions:

- Full index
- Delta Index
- Replicate




On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey  wrote:

> On 9/1/2015 11:45 AM, Troy Edwards wrote:
> > My initial thought was to use scheduling built with DIH:
> > http://wiki.apache.org/solr/DataImportHandler#Scheduling
> >
> > But I think just a cron job should do the same for me.
>
> The dataimport scheduler does not exist in any Solr version.  This is a
> proposed feature, with the enhancement issue open for more than four years:
>
> https://issues.apache.org/jira/browse/SOLR-2305
>
> I have updated the wiki page to state the fact that the scheduler is a
> proposed improvement, not a usable feature.
>
> Thanks,
> Shawn
>
>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: DataImportHandler scheduling

2015-09-01 Thread Kevin Lee
While it may be useful to have a scheduler for simple cases, I think there are 
too many variables to make it useful for everyone's case.  For example, I 
recently wrote a script that uses the data import handler api to get the 
status, kick off the import, etc.  However, before allowing it to just kick 
off, I needed to query the database where the data was coming from to make sure 
it had finished its daily load and then, if it hadn't finished, wait for a while
to see if it would; then the script could do the load.  After the load is
finished it does another check to ensure the expected number of docs was 
actually loaded by Solr based on the data from the database.

If a scheduler were built into Solr it probably would only cover the simple 
case and for production you'd probably need to write your own scripts and use 
your own scheduler anyways to ensure the loads are starting/completing as 
expected.
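Kevin's wrap-the-API orchestration can be sketched roughly as below. Only the URL building and status parsing are shown; the database readiness check and retry loop from his description are omitted, and the handler path and the `status` values (`idle`/`busy`) are the conventional DataImportHandler ones, not taken from his actual script.

```python
import json

def dih_command_url(base_url, command):
    # e.g. http://localhost:8983/solr/core1/dataimport?command=full-import
    return f"{base_url}/dataimport?command={command}&wt=json"

def import_is_running(status_response):
    """Parse a DIH status response body; 'busy' means an import is active."""
    return json.loads(status_response).get("status") == "busy"

# A minimal status body of the shape DIH returns when nothing is running.
sample = '{"status": "idle", "importResponse": ""}'
```

A cron-driven script would poll `command=status` with this kind of parsing until the import finishes, then compare document counts against the source database.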

> On Sep 1, 2015, at 1:09 PM, William Bell  wrote:
> 
> We should add a simple scheduler in the UI. It is very useful. To schedule
> various actions:
> 
> - Full index
> - Delta Index
> - Replicate
> 
> 
> 
> 
>> On Tue, Sep 1, 2015 at 12:41 PM, Shawn Heisey  wrote:
>> 
>>> On 9/1/2015 11:45 AM, Troy Edwards wrote:
>>> My initial thought was to use scheduling built with DIH:
>>> http://wiki.apache.org/solr/DataImportHandler#Scheduling
>>> 
>>> But I think just a cron job should do the same for me.
>> 
>> The dataimport scheduler does not exist in any Solr version.  This is a
>> proposed feature, with the enhancement issue open for more than four years:
>> 
>> https://issues.apache.org/jira/browse/SOLR-2305
>> 
>> I have updated the wiki page to state the fact that the scheduler is a
>> proposed improvement, not a usable feature.
>> 
>> Thanks,
>> Shawn
> 
> 
> -- 
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: plz help me

2015-09-01 Thread sara hajili
I'm really confused. :|
I'm really anxious about the cost of updating the like count.

And as you said, you indexed the like_count field, and I think it costs a lot
to update and re-index docs, because the like count changes more and more.
So isn't it better to set indexed="false" on this field?

On Tue, Sep 1, 2015 at 3:08 AM, Upayavira  wrote:

> you don't need to use a dynamic field, just a normal field will work for
> you. But, you *will* want to index it, and you may benefit from
> docValues, so:
>
>  docValues="true"/>
>
> Upayavira
>
> On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
> > my solr version is 5.2.1
> > i have a question.
> > if i create 2 core .one for post and one for like . i must index like
> > count?
> > i mean in schema for like core i must write:
> >  > stored="true"/>
> >
> > am i true?
> >
> > On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
> >
> > > So you want to be able to sort by the "number of likes" value for a
> > > post?
> > >
> > > What version of Solr are you using? How many posts do you have?
> > >
> > > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
> > > 5.3) called score joins. Using that you can have two cores, one
> > > containing your posts, and another containing your likes.
> > >
> > > You cannot *sort* on these values, but you can include your likes into
> > > the score, which might even be better.
> > >
> > > If this sounds good, I can dig up some syntax for such a query.
> > >
> > > Upayavira
> > >
> > > On Tue, Sep 1, 2015, at 10:36 AM, sara hajili wrote:
> > > > hi.
> > > > at first i.m sorry for my bad english!
> > > > i have a social app.i want to use solr for searching in this app.
> > > > i have many document (in my case people text that posted on my social
> > > > app).
> > > > and i indexed this.
> > > > but i'm have 1 issue and it is :
> > > >
> > > > i have very doc(post) and they have a property "like" is it  good
> > > > approach
> > > > to index like count (people can like eachother post in my social
> app)?
> > > > likecount change more and more in one day.(so as i know it must be
> set
> > > > dynamic field)
> > > > and if i indexed it ,i think  it costs alot , to update and index
> > > > likecount
> > > > more and more even i use bach update.
> > > > so is it approach to didn't index one field in solr but i could sort
> my
> > > > search result according to that unindexed field?
> > > >
> > > > tnx
> > >
>


Re: Lucene/Solr 5.0 and custom FieldCache implementation

2015-09-01 Thread Jamie Johnson
No worries, thanks again. I'll begin tracking this

On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe 
wrote:

> Sorry Jamie, I totally missed this email. There was no Jira that I could
> find. I created SOLR-7996
>
> On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson  wrote:
>
> > This sounds like a good idea, I'm assuming I'd need to make my own
> > UnInvertingReader (or subclass) to do this right?  Is there a way to do
> > this on the 5.x codebase or would I still need the solrindexer factory
> work
> > that Tomás mentioned previously?
> >
> > Tomás, is there a ticket for the SolrIndexer factory?  I'd like to follow
> > its work to know what version of 5.x (or later) I should be looking for
> > this in.
> >
> > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley  wrote:
> >
> > > UnInvertingReader makes indexed fields look like docvalues fields.
> > > The caching itself is still done in FieldCache/FieldCacheImpl
> > > but you could perhaps wrap what is cached there to either screen out
> > > stuff or construct a new entry based on the user.
> > >
> > > -Yonik
> > >
> > >
> > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson 
> > wrote:
> > > > I think a custom UnInvertingReader would work as I could skip the
> > process
> > > > of putting things in the cache.  Right now in Solr 4.x though I am
> > > caching
> > > > based but including the users authorities in the key of the cache so
> > > we're
> > > > not rebuilding the UnivertedField on every request.  Where in 5.x is
> > the
> > > > object actually cached?  Will this be possible in 5.x?
> > > >
> > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley 
> > > wrote:
> > > >
> > > >> The FieldCache has become implementation rather than interface, so I
> > > >> don't think you're going to see plugins at that level (it's all
> > > >> package protected now).
> > > >>
> > > >> One could either subclass or re-implement UnInvertingReader though.
> > > >>
> > > >> -Yonik
> > > >>
> > > >>
> > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson 
> > > wrote:
> > > >> > Also in this vein I think that Lucene should support factories for
> > the
> > > >> > cache creation as described @
> > > >> > https://issues.apache.org/jira/browse/LUCENE-2394.  I'm not
> > endorsing
> > > >> the
> > > >> > patch that is provided (I haven't even looked at it) just the
> > concept
> > > in
> > > >> > general.
> > > >> >
> > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson <
> jej2...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> >> That makes sense, then I could extend the SolrIndexSearcher by
> > > creating
> > > >> a
> > > >> >> different factory class that did whatever magic I needed.  If you
> > > >> create a
> > > >> >> Jira ticket for this please link it here so I can track it!
> Again
> > > >> thanks
> > > >> >>
> > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe <
> > > >> >> tomasflo...@gmail.com> wrote:
> > > >> >>
> > > >> >>> I don't think there is a way to do this now. Maybe we should
> > > separate
> > > >> the
> > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving
> this
> > > logic
> > > >> >>> away from SolrCore is already a win, plus it will make it easier
> > to
> > > >> unit
> > > >> >>> test and extend for advanced use cases.
> > > >> >>>
> > > >> >>> Tomás
> > > >> >>>
> > > >> >>> On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson <
> jej2...@gmail.com
> > >
> > > >> wrote:
> > > >> >>>
> > > >> >>> > Sorry to poke this again but I'm not following the last
> comment
> > of
> > > >> how I
> > > >> >>> > could go about extending the solr index searcher and have the
> > > >> extension
> > > >> >>> > used.  Is there an example of this?  Again thanks
> > > >> >>> >
> > > >> >>> > Jamie
> > > >> >>> > On Aug 25, 2015 7:18 AM, "Jamie Johnson" 
> > > wrote:
> > > >> >>> >
> > > >> >>> > > I had seen this as well, if I over wrote this by extending
> > > >> >>> > > SolrIndexSearcher how do I have my extension used?  I didn't
> > > see a
> > > >> way
> > > >> >>> > that
> > > >> >>> > > could be plugged in.
> > > >> >>> > > On Aug 25, 2015 7:15 AM, "Mikhail Khludnev" <
> > > >> >>> mkhlud...@griddynamics.com>
> > > >> >>> > > wrote:
> > > >> >>> > >
> > > >> >>> > >> On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson <
> > > jej2...@gmail.com
> > > >> >
> > > >> >>> > wrote:
> > > >> >>> > >>
> > > >> >>> > >> > Thanks Mikhail.  If I'm reading the SimpleFacets class
> > > >> correctly,
> > > >> >>> out
> > > >> >>> > >> > delegates to DocValuesFacets when facet method is FC,
> what
> > > used
> > > >> to
> > > >> >>> be
> > > >> >>> > >> > FieldCache I believe.  DocValuesFacets either uses
> > DocValues
> > > or
> > > >> >>> builds
> > > >> >>> > >> then
> > > >> >>> > >> > using the UninvertingReader.
> > > >> >>> > >> >
> > > >> >>> > >>
> > > >> >>> > >> Ah.. got it. Thanks for reminding this details.It 

Re: Connect and sync two solr server

2015-09-01 Thread shahper

Hi,

In the link which you have sent I cannot see how to connect to SolrCloud
for synchronization of indexes.


From the description, this is straightforward SolrCloud where you
have replicas on the separate machines, see:
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

A different way of accomplishing this would be the master/slave style, see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication
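For the master/slave route in that second link, the heart of the setup is a ReplicationHandler block on each side of solrconfig.xml. This is the standard shape from the referenced page; the master host, port, and core name are placeholders to adapt.

```xml
<!-- On the master (solrconfig.xml): publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the slave: poll the master for new index versions -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>
```

In SolrCloud mode, by contrast, no such handler config is needed; replicas sync automatically through ZooKeeper.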
--
Shahper Jamil

System Administrator

Tel: +91 124 4548383 Ext- 1033
UK: +44 845 0047 142 Ext- 5133

Techblue Software Pvt. Ltd
The Palms, Plot No 73, Sector 5, IMT Manesar,
Gurgaon- 122050 (Hr.)

www.techbluesoftware.co.in 




Re: Get distinct results in Solr

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Upayavira,

I've tried to change the id to the signature field, but nothing is indexed into
Solr either. Is that what you mean?

Besides that, I've also included a copyField to copy the content field into
the signature field. Both versions (with and without copyField) have
nothing indexed into Solr.

Regards,
Edwin


On 1 September 2015 at 15:48, Upayavira  wrote:

> you are attempting to write your signature to your ID field. That's not
> a good idea. You are generating your signature from the content field,
> which seems okay. Change your id to be
> your 'signature' field instead of id, and something different will
> happen :-)
>
> Upayavira
>
> On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > I tried to follow the de-duplication guide, but after I configured it in
> > solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
> > no error message. I'm using SimplePostTool to index rich-text documents.
> >
> > Below are my configurations:
> >
> > In solrconfig.xml
> >
> > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >   <lst name="defaults">
> >     <str name="update.chain">dedupe</str>
> >   </lst>
> > </requestHandler>
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">id</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </processor>
> > </updateRequestProcessorChain>
> >
> >
> > In schema.xml
> >
> > <field name="signature" type="string" stored="true" indexed="true" multiValued="false" />
> >
> >
> > Is there anything which I might have missed out or done wrongly?
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo 
> > wrote:
> >
> > > Thank you for your advice Alexandre.
> > >
> > > Will try out the de-duplication from the link you gave.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > > wrote:
> > >
> > >> Re-read the question. You want to de-dupe on the full text-content.
> > >>
> > >> I would actually try to use the dedupe chain as per the link I gave
> > >> but put results into a separate string field. Then, you group on that
> > >> field. You cannot actually group on the long text field, that would
> > >> kill any performance. So a signature is your proxy.
> > >>
> > >> Regards,
> > >>Alex
> > >> 
> > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > >> http://www.solr-start.com/
> > >>
> > >>
> > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo  >
> > >> wrote:
> > >> > Hi Alexandre,
> > >> >
> > >> > Will treating it as String affect the search or other functions like
> > >> > highlighting?
> > >> >
> > >> > Yes, the content must be in my index, unless I do a copyField to do
> > >> > de-duplication on that field.. Will that help?
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> arafa...@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> >> Can't you just treat it as String?
> > >> >>
> > >> >> Also, do you actually want those documents in your index in the
> first
> > >> >> place? If not, have you looked at De-duplication:
> > >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> > >> >>
> > >> >> Regards,
> > >> >>Alex.
> > >> >> 
> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > >> >> http://www.solr-start.com/
> > >> >>
> > >> >>
> > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > >> >> wrote:
> > >> >> > Thanks Jan.
> > >> >> >
> > >> >> > But I read that the field that is being collapsed on must be a
> single
> > >> >> > valued String, Int or Float. As I'm required to get the distinct
> > >> results
> > >> >> > from "content" field that was indexed from a rich text document,
> I
> > >> got
> > >> >> the
> > >> >> > following error:
> > >> >> >
> > >> >> >   "error":{
> > >> >> > "msg":"java.io.IOException: 64 bit numeric collapse fields
> are
> > >> not
> > >> >> > supported",
> > >> >> > "trace":"java.lang.RuntimeException: java.io.IOException: 64
> bit
> > >> >> > numeric collapse fields are not supported\r\n\tat
> > >> >> >
> > >> >> >
> > >> >> > Is it possible to collapsed on fields which has a long integer of
> > >> data,
> > >> >> > like content from a rich text document?
> > >> >> >
> > >> >> > Regards,
> > >> >> > Edwin
> > >> >> >
> > >> >> >
> > >> >> > On 31 August 2015 at 18:59, Jan Høydahl 
> > >> wrote:
> > >> >> >
> > >> >> >> Hi
> > >> >> >>
> > >> >> >> Check out the CollapsingQParser (
> > >> >> >>
> > >> >>
> > >>
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > >> >> ).
> > >> >> >> As long as you have a field that will be the same for all
> > >> duplicates,
> > >> >> you
> > >> >> >> can “collapse” on that field. If you not have a “group id”, you
> can
> > >> >> create
> > >> >> >> one using e.g. an MD5 signature of the identical body text (
> > >> >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> ).
> > >> >> >>
> > >> >> >> --
> > >> >> >> Jan Høydahl, search solution architect
> > >> >> >> Cominvent 
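Jan's MD5-signature idea can be illustrated outside Solr. Inside Solr, the SignatureUpdateProcessorFactory computes the signature at index time; this standalone sketch only shows why a hash of the body text works as a single-valued string "group id" to collapse on.

```python
import hashlib

def signature(text):
    # Identical bodies hash to the same value, so the hash serves as a
    # compact collapse/group key in place of the long text field itself.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

doc_a = "Exact same rich-text content"
doc_b = "Exact same rich-text content"
doc_c = "Different content"
```

Collapsing on this 32-character hex string sidesteps the "64 bit numeric collapse fields are not supported" error Edwin hit when pointing CollapsingQParser at the raw content field.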

Re: plz help me

2015-09-01 Thread Upayavira
So you want to be able to sort by the "number of likes" value for a
post?

What version of Solr are you using? How many posts do you have?

There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
5.3) called score joins. Using that you can have two cores, one
containing your posts, and another containing your likes.

You cannot *sort* on these values, but you can include your likes into
the score, which might even be better.

If this sounds good, I can dig up some syntax for such a query.

Upayavira
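Since the thread never shows the syntax Upayavira offers to dig up, here is a guess at the shape of a cross-core score join from that Solr 5.x era, using the `score` local parameter on `{!join}`. The core and field names (`likes`, `post_id`, `weight`) are invented for illustration and would need to match a real schema.

```python
from urllib.parse import urlencode

# Hypothetical setup: a "posts" core (being queried) and a "likes" core
# with one doc per like, carrying post_id and a numeric weight. The join
# folds the like-side scores into each matching post's score, rather than
# sorting posts on a stored-but-unindexed counter field.
q = "{!join fromIndex=likes from=post_id to=id score=total}weight:[1 TO *]"
params = urlencode({"q": q, "fl": "id,title,score"})
```

The resulting query string would be sent to the posts core's /select handler; sorting by score then reflects the aggregated likes.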

On Tue, Sep 1, 2015, at 10:36 AM, sara hajili wrote:
> hi.
> at first i.m sorry for my bad english!
> i have a social app.i want to use solr for searching in this app.
> i have many document (in my case people text that posted on my social
> app).
> and i indexed this.
> but i'm have 1 issue and it is :
> 
> i have very doc(post) and they have a property "like" is it  good
> approach
> to index like count (people can like eachother post in my social app)?
> likecount change more and more in one day.(so as i know it must be set
> dynamic field)
> and if i indexed it ,i think  it costs alot , to update and index
> likecount
> more and more even i use bach update.
> so is it approach to didn't index one field in solr but i could sort my
> search result according to that unindexed field?
> 
> tnx


Re: plz help me

2015-09-01 Thread Upayavira
you don't need to use a dynamic field, just a normal field will work for
you. But, you *will* want to index it, and you may benefit from
docValues, so:



Upayavira

On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
> my solr version is 5.2.1
> i have a question.
> if i create 2 core .one for post and one for like . i must index like
> count?
> i mean in schema for like core i must write:
>  stored="true"/>
> 
> am i true?
> 
> On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
> 
> > So you want to be able to sort by the "number of likes" value for a
> > post?
> >
> > What version of Solr are you using? How many posts do you have?
> >
> > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
> > 5.3) called score joins. Using that you can have two cores, one
> > containing your posts, and another containing your likes.
> >
> > You cannot *sort* on these values, but you can include your likes into
> > the score, which might even be better.
> >
> > If this sounds good, I can dig up some syntax for such a query.
> >
> > Upayavira
> >
> >


Re: Sorting parent documents based on a field from children

2015-09-01 Thread Florin Mandoc

Hi,

I have tried the solution from your blog with my schema and with the 
example from the blog post, with solr-5.3.0 and with solr-5.4.0-2015-08-12, 
and I get this error:


"responseHeader":{
"status":500,
"QTime":32},
  "error":{
"msg":"child query must only match non-parent docs, but parent docID=2 matched 
childScorer=class org.apache.lucene.search.DisjunctionSumScorer",
"trace":"java.lang.IllegalStateException: child query must only match non-parent 
docs, but parent docID=2 matched childScorer=class 
org.apache.lucene.search.DisjunctionSumScorer\n\tat 
org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:311)\n\tat
 org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:216)\n\tat 
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:169)\n\tat 
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)\n\tat 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:772)\n\tat 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)\n\tat 
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)\n\tat
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1682)\n\tat 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)\n\tat 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)\n\tat 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)\n\tat 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)\n\tat 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat 
org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat 
java.lang.Thread.run(Thread.java:745)\n",
"code":500}

The query, from your example, is:
http://localhost:8983/solr/testscore/select?q={!parent%20which=type_s:product%20score=max}+color_s:Red^=0%20{!func}price_i&wt=json&indent=true&fl=score,*,[docid]

Do you have any idea why i get this error?

Thank you


On 31.08.2015 15:48, Mikhail Khludnev wrote:

Florin,

I disclose some details in the recent post
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html.
Let me know if you have further questions afterwards.
I also notice that you use the "obvious" syntax BuyerID=83, but that is hardly
ever valid. There is a good habit of using debugQuery=true, which lets you
verify how the query was interpreted.

On Mon, Aug 31, 2015 at 2:40 PM, Florin Mandoc  wrote:


Hi,

I am trying to model am index from a relational database and i have 3 main
entity types: products, buyers and sellers.
I am using nested documents for sellers and buyers, as i have many sellers
and many buyers for one product:

{ "Active" : "true",
   "CategoryID" : 59,
   "CategoryName" : "Produce",
   "Id" : "227686",
   "ManufacturerID" : 322,
   "ManufacturerName" : "---",
   "Name" : "product name",
   "ProductID" : "227686",
   "SKU" : 

RE: testing with EmbeddedSolrServer

2015-09-01 Thread Moen Endre
Mikhail,

The purpose of using EmbeddedSolrServer is for testing, not for running as 
main().

Is there a best practice for doing integration-testing of solr? Or of 
validating that queries to solr returns the expected result?

E.g. I have this bit of production code:
private String getStartAndStopDateIntersectsRange( Date beginDate, Date 
EndDate) {
...
  dateQuery = "( (Start_Date:[* TO "+ endDate +"] AND Stop_Date:["+beginDate+" 
TO *])"+
   " OR (Start_Date:[* TO "+ endDate +"] AND !Stop_Date:[* TO *])" +
   " OR (!Start_Date:[* TO *] AND Stop_Date:["+beginDate+" TO *]) )";
..
}

And I would like to write a test-case that only returns the records that 
intersects a given daterange.
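The production snippet above builds a classic interval-overlap filter. As a sketch (the class name, and the use of plain strings for the date bounds, are assumptions, not Endre's actual code), the string-building logic can be extracted into a standalone method whose output can be asserted on without a running Solr:

```java
// Sketch: the dateQuery builder from the message above, extracted into a
// static method so its output can be unit-tested directly.
public class DateRangeQuery {

    // A stored interval [Start_Date, Stop_Date] intersects the requested
    // range [beginDate, endDate] when both bounds overlap, or when the
    // stored interval is open-ended on one side (field has no value).
    static String intersectsRange(String beginDate, String endDate) {
        return "( (Start_Date:[* TO " + endDate + "] AND Stop_Date:[" + beginDate + " TO *])"
             + " OR (Start_Date:[* TO " + endDate + "] AND !Stop_Date:[* TO *])"
             + " OR (!Start_Date:[* TO *] AND Stop_Date:[" + beginDate + " TO *]) )";
    }

    public static void main(String[] args) {
        System.out.println(intersectsRange("2015-01-01T00:00:00Z", "2015-12-31T23:59:59Z"));
    }
}
```

A test case can then assert on the three overlap clauses individually before ever involving an index.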


Cheers
Endre




-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: 31. august 2015 15:02
To: solr-user
Subject: Re: testing with EmbeddedSolrServer

Endre,

As I suggested before, consider avoiding the test framework; just put all the
code interacting with EmbeddedSolrServer into a main() method.

On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre  wrote:

> Hi Mikhail,
>
> Im trying to read 7-8 xml files of data that contain realistic data 
> from our production server. Then I would like to read this data into 
> EmbeddedSolrServer to test for edge cases for our custom date search. 
> The use of EmbeddedSolrServer is purely to separate the data testing 
> from any environment that might change over time.
>
> I would also like to avoid writing plumbing-code to import each field 
> from the xml since I already have a working DIH.
>
> I tried adding synchronous=true but it doesn’t look like it makes solr 
> complete the import before doing a search.
>
> Looking at the log it doesn’t seem process the import request:
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> =firstSearcher}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  org.apache.solr.core.CoreContainer - registering core: 
> nmdc
> 10:48:31.613
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.apache.solr.core.SolrCore.Request - [nmdc] 
> webapp=null
> path=/dataimport2
> params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchro
> nous%3Dtrue}
> status=0 QTime=1
>
> {responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-conf
> ig.xml}},command=full-import=true=true,status=idle,i
> mportResponse=,statusMessages={}} 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.apache.solr.core.SolrCore.Request - [nmdc] 
> webapp=null path=/select params={q=*%3A*} 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.a.s.h.component.QueryComponent - process:
> q=*:*=text=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.a.s.h.component.QueryComponent - process:
> q=static+firstSearcher+warming+in+solrconfig.xml=false=text
> =firstSearcher=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.a.s.search.stats.LocalStatsCache - ## GET 
> {q=static+firstSearcher+warming+in+solrconfig.xml=false=tex
> t=firstSearcher=10=explicit}
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> =firstSearcher}
> hits=0 status=0 QTime=36
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> org.apache.solr.core.SolrCore - QuerySenderListener done.
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> org.apache.solr.core.SolrCore - [nmdc] Registered new searcher 
> Searcher@28be2785[nmdc] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  org.apache.solr.update.SolrCoreState - Closing 
> SolrCoreState 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.a.solr.update.DefaultSolrCoreState - SolrCoreState 
> ref count has reached 0 - closing IndexWriter 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.a.solr.update.DefaultSolrCoreState - closing 
> IndexWriter with IndexWriterCloser 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.apache.solr.update.SolrIndexWriter - Closing Writer
> DirectUpdateHandler2
>
> Cheers
> Endre
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: 25. august 2015 19:43
> To: solr-user
> Subject: Re: testing with EmbeddedSolrServer
>
> Hello,
>
> I'm trying to guess what are you doing. It's not clear so far.
> I found http://stackoverflow.com/questions/11951695/embedded-solr-dih
> My conclusion, if 

plz help me

2015-09-01 Thread sara hajili
hi.
first of all, i'm sorry for my bad english!
i have a social app and i want to use solr for searching in it.
i have many documents (in my case, the text that people post on my social app),
and i have indexed them.
but i have one issue:

my posts have a "like" property (people can like each other's posts in my
social app). is it a good approach to index the like count?
the like count changes many times a day (so, as far as i know, it would have to
be a dynamic field), and if i index it, i think it will cost a lot to update
and re-index the like count so often, even if i use batch updates.
so is there an approach where i don't index a field in solr, but can still sort
my search results by that unindexed field?

tnx


Re: Lucene/Solr 5.0 and custom FieldCache implementation

2015-09-01 Thread Jamie Johnson
Tracking not teaching... Auto complete is fun...

On Tue, Sep 1, 2015, 6:34 AM Jamie Johnson  wrote:

> No worries, thanks again I'll begin teaching this
>
> On Mon, Aug 31, 2015, 5:16 PM Tomás Fernández Löbbe 
> wrote:
>
>> Sorry Jamie, I totally missed this email. There was no Jira that I could
>> find. I created SOLR-7996
>>
>> On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson  wrote:
>>
>> > This sounds like a good idea, I'm assuming I'd need to make my own
>> > UnInvertingReader (or subclass) to do this right?  Is there a way to do
>> > this on the 5.x codebase or would I still need the solrindexer factory
>> work
>> > that Tomás mentioned previously?
>> >
>> > Tomás, is there a ticket for the SolrIndexer factory?  I'd like to
>> follow
>> > it's work to know what version of 5.x (or later) I should be looking for
>> > this in.
>> >
>> > On Thu, Aug 27, 2015 at 1:06 PM, Yonik Seeley 
>> wrote:
>> >
>> > > UnInvertingReader makes indexed fields look like docvalues fields.
>> > > The caching itself is still done in FieldCache/FieldCacheImpl
>> > > but you could perhaps wrap what is cached there to either screen out
>> > > stuff or construct a new entry based on the user.
>> > >
>> > > -Yonik
>> > >
>> > >
>> > > On Thu, Aug 27, 2015 at 12:55 PM, Jamie Johnson 
>> > wrote:
>> > > > I think a custom UnInvertingReader would work as I could skip the
>> > process
>> > > > of putting things in the cache.  Right now in Solr 4.x though I am
>> > > caching
>> > > > based but including the users authorities in the key of the cache so
>> > > we're
>> > > > not rebuilding the UnivertedField on every request.  Where in 5.x is
>> > the
>> > > > object actually cached?  Will this be possible in 5.x?
>> > > >
>> > > > On Thu, Aug 27, 2015 at 12:32 PM, Yonik Seeley 
>> > > wrote:
>> > > >
>> > > >> The FieldCache has become implementation rather than interface, so
>> I
>> > > >> don't think you're going to see plugins at that level (it's all
>> > > >> package protected now).
>> > > >>
>> > > >> One could either subclass or re-implement UnInvertingReader though.
>> > > >>
>> > > >> -Yonik
>> > > >>
>> > > >>
>> > > >> On Thu, Aug 27, 2015 at 12:09 PM, Jamie Johnson > >
>> > > wrote:
>> > > >> > Also in this vein I think that Lucene should support factories
>> for
>> > the
>> > > >> > cache creation as described @
>> > > >> > https://issues.apache.org/jira/browse/LUCENE-2394.  I'm not
>> > endorsing
>> > > >> the
>> > > >> > patch that is provided (I haven't even looked at it) just the
>> > concept
>> > > in
>> > > >> > general.
>> > > >> >
>> > > >> > On Thu, Aug 27, 2015 at 12:01 PM, Jamie Johnson <
>> jej2...@gmail.com>
>> > > >> wrote:
>> > > >> >
>> > > >> >> That makes sense, then I could extend the SolrIndexSearcher by
>> > > creating
>> > > >> a
>> > > >> >> different factory class that did whatever magic I needed.  If
>> you
>> > > >> create a
>> > > >> >> Jira ticket for this please link it here so I can track it!
>> Again
>> > > >> thanks
>> > > >> >>
>> > > >> >> On Thu, Aug 27, 2015 at 11:59 AM, Tomás Fernández Löbbe <
>> > > >> >> tomasflo...@gmail.com> wrote:
>> > > >> >>
>> > > >> >>> I don't think there is a way to do this now. Maybe we should
>> > > separate
>> > > >> the
>> > > >> >>> logic of creating the SolrIndexSearcher to a factory. Moving
>> this
>> > > logic
>> > > >> >>> away from SolrCore is already a win, plus it will make it
>> easier
>> > to
>> > > >> unit
>> > > >> >>> test and extend for advanced use cases.
>> > > >> >>>
>> > > >> >>> Tomás
>> > > >> >>>
>> > > >> >>> On Wed, Aug 26, 2015 at 8:10 PM, Jamie Johnson <
>> jej2...@gmail.com
>> > >
>> > > >> wrote:
>> > > >> >>>
>> > > >> >>> > Sorry to poke this again but I'm not following the last
>> comment
>> > of
>> > > >> how I
>> > > >> >>> > could go about extending the solr index searcher and have the
>> > > >> extension
>> > > >> >>> > used.  Is there an example of this?  Again thanks
>> > > >> >>> >
>> > > >> >>> > Jamie
>> > > >> >>> > On Aug 25, 2015 7:18 AM, "Jamie Johnson" 
>> > > wrote:
>> > > >> >>> >
>> > > >> >>> > > I had seen this as well, if I over wrote this by extending
>> > > >> >>> > > SolrIndexSearcher how do I have my extension used?  I
>> didn't
>> > > see a
>> > > >> way
>> > > >> >>> > that
>> > > >> >>> > > could be plugged in.
>> > > >> >>> > > On Aug 25, 2015 7:15 AM, "Mikhail Khludnev" <
>> > > >> >>> mkhlud...@griddynamics.com>
>> > > >> >>> > > wrote:
>> > > >> >>> > >
>> > > >> >>> > >> On Tue, Aug 25, 2015 at 2:03 PM, Jamie Johnson <
>> > > jej2...@gmail.com
>> > > >> >
>> > > >> >>> > wrote:
>> > > >> >>> > >>
>> > > >> >>> > >> > Thanks Mikhail.  If I'm reading the SimpleFacets class
>> > > >> correctly,
>> > > >> >>> out
>> > > >> >>> > >> > delegates to DocValuesFacets when facet method is FC,
>> what
>> > > used
>> > > >> to
>> > > >> >>> be
>> > > 

Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.2.1, and I would like to find out: what is the difference
between Legacy Facets and JSON Facets in Solr? I was told that JSON Facets
have much lower request latency, but I couldn't find any major difference
in speed. Must we have a larger index in order to see a significant
difference?

Is there any significant advantage to using the JSON Faceting command instead
of the Legacy Faceting command?

Regards,
Edwin
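For concreteness, the two faceting styles under discussion look roughly like this; the field name is a placeholder and the parameters are shown un-encoded:

```text
# Legacy faceting, as flat request parameters:
/select?q=*:*&facet=true&facet.field=category&facet.limit=10

# JSON Facet API equivalent, passed in the json.facet parameter:
/select?q=*:*&json.facet={categories:{type:terms,field:category,limit:10}}
```

The JSON form also nests sub-facets and aggregation functions more naturally, which is often the bigger practical advantage than raw latency.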


Re: plz help me

2015-09-01 Thread sara hajili
my solr version is 5.2.1
i have a question.
if i create 2 cores, one for posts and one for likes, must i index the like count?
i mean, in the schema for the likes core, must i write:


am i right?

On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:

> So you want to be able to sort by the "number of likes" value for a
> post?
>
> What version of Solr are you using? How many posts do you have?
>
> There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
> 5.3) called score joins. Using that you can have two cores, one
> containing your posts, and another containing your likes.
>
> You cannot *sort* on these values, but you can include your likes into
> the score, which might even be better.
>
> If this sounds good, I can dig up some syntax for such a query.
>
> Upayavira
>
>


Re: plz help me

2015-09-01 Thread Alexandre Rafalovitch
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
shows how to keep updates in a separate core. Notice that it is an
intermediate-level article for query syntax.

For Persian text analysis, there is a pre-built analyser definition in
the techproducts example, start from that. It is in the schema.xml in
server/solr/configsets/sample_techproducts_configs/conf and is one of
the example configsets.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
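The stock text_fa chain being pointed to looks roughly like this; it is reproduced from memory of the sample schema, so treat the exact filter list as an approximation rather than authoritative:

```xml
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- maps the Persian zero-width non-joiner to a space -->
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt"/>
  </analyzer>
</fieldType>
```

Note that this chain normalizes and removes stopwords but does not stem, which matches sara's later complaint in this thread.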


On 1 September 2015 at 08:07, sara hajili  wrote:
> and another question is:
> my docs are persian and i use text_fa for fieldType but i wanna to have a
> persian textfield that handle search problem such as stemming.
> word distance,synonyms etc
> like english types.
> as i said i handle "میخواهم " and "خواستن"  and so on.
> can you suggest me a fieldtype for handle this issues in persian field.
> tnx
>
> On Tue, Sep 1, 2015 at 3:16 AM, sara hajili  wrote:
>
>> i'm really confused:|
>> i'm really anxious about cost of update like count.
>> and as you said:
>> > docValues="true"/>
>> you indexed like_count field .and i think it cost alot to update and index
>> again docs.
>> because like count change more and more
>> so isn't better to indede="false" that this field name??!!
>>
>> On Tue, Sep 1, 2015 at 3:08 AM, Upayavira  wrote:
>>
>>> you don't need to use a dynamic field, just a normal field will work for
>>> you. But, you *will* want to index it, and you may benefit from
>>> docValues, so:
>>>
>>> >> docValues="true"/>
>>>
>>> Upayavira
>>>
>>> On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
>>> > my solr version is 5.2.1
>>> > i have a question.
>>> > if i create 2 core .one for post and one for like . i must index like
>>> > count?
>>> > i mean in schema for like core i must write:
>>> > >> > stored="true"/>
>>> >
>>> > am i true?
>>> >
>>> > On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
>>> >
>>> > > So you want to be able to sort by the "number of likes" value for a
>>> > > post?
>>> > >
>>> > > What version of Solr are you using? How many posts do you have?
>>> > >
>>> > > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
>>> > > 5.3) called score joins. Using that you can have two cores, one
>>> > > containing your posts, and another containing your likes.
>>> > >
>>> > > You cannot *sort* on these values, but you can include your likes into
>>> > > the score, which might even be better.
>>> > >
>>> > > If this sounds good, I can dig up some syntax for such a query.
>>> > >
>>> > > Upayavira
>>> > >
>>>
>>
>>


Re: Sorting parent documents based on a field from children

2015-09-01 Thread Mikhail Khludnev
I suspect URL encoding might mess with the MUST (+) clause. Can you post
debugQuery=true output to make sure that the query parsed right?
Then make sure that the child query and parent filter are fully orthogonal, e.g.
+type_s:product +color_s:Red returns no results.
Last thing to check: make sure you don't have deleted documents in the
index. You can check at SolrAdmin.
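The orthogonality check described above can be run as a plain query: if any document matches both the parent filter and the child query, that document is simultaneously flagged as parent and child, which is exactly what triggers the "child query must only match non-parent docs" error. A sketch using the field names from this thread:

```text
# Should return numFound=0 when child query and parent filter are orthogonal:
/solr/testscore/select?q=%2Btype_s:product%20%2Bcolor_s:Red

# un-encoded form of the query:  +type_s:product +color_s:Red
```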

On Tue, Sep 1, 2015 at 1:42 PM, Florin Mandoc  wrote:

> Hi,
>
> I have tried the solution from your blog with my schema and with the
> example from the blog post, with solr-5.3.0 and with solr-5.4.0-2015-08-12
> and i get this error:
>
> "responseHeader":{
> "status":500,
> "QTime":32},
>   "error":{
> "msg":"child query must only match non-parent docs, but parent docID=2
> matched childScorer=class org.apache.lucene.search.DisjunctionSumScorer",
> "code":500}
>
> The query, from your example, is:
>
> http://localhost:8983/solr/testscore/select?q={!parent%20which=type_s:product%20score=max}+color_s:Red^=0%20{!func}price_i&wt=json&indent=true&fl=score,*,[docid]
>
> Do you have any idea why i get this error?
>
> Thank you
>
>
>
> On 31.08.2015 15:48, Mikhail Khludnev wrote:
>
>> Florin,
>>
>> I disclosure some details in the recent post
>> http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html.
>> Let me know if you have further questions afterwards.
>> I also notice that you use "obvious" syntax: BuyerID=83 but 

Re: testing with EmbeddedSolrServer

2015-09-01 Thread Mikhail Khludnev
Endre,
Here is the problem: SolrTestCaseJ4 already brings up a Solr core/container and
a sort of server, orchestrated by a complex harness. Thus, adding
EmbeddedSolrServer makes things quite complicated; it becomes challenging to
understand which one misbehaves. Given that you need to debug a DIH config,
I suggest you look at the short
org.apache.solr.handler.dataimport.TestNestedChildren and use it as a
sample to start from.
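For the integration-testing question, a SolrTestCaseJ4-style test typically has the shape below. This is a sketch only: it requires the solr-test-framework dependency (so it is not runnable standalone), and the config paths, field values, and class name are assumptions, not taken from TestNestedChildren.

```java
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

// Sketch of a date-range integration test built on Solr's own test harness.
public class DateRangeSearchTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Loads a core from the given config/schema (paths are assumptions)
        initCore("solrconfig.xml", "schema.xml");
    }

    @Test
    public void intersectingRangeMatches() {
        assertU(adoc("id", "1",
                     "Start_Date", "2015-03-01T00:00:00Z",
                     "Stop_Date", "2015-06-01T00:00:00Z"));
        assertU(commit());
        // Assert on the XML response via XPath
        assertQ(req("q", "Start_Date:[* TO 2015-12-31T23:59:59Z]"
                       + " AND Stop_Date:[2015-01-01T00:00:00Z TO *]"),
                "//result[@numFound='1']");
    }
}
```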


On Tue, Sep 1, 2015 at 11:54 AM, Moen Endre  wrote:

> Mikhail,
>
> The purpose of using EmbeddedSolrServer is for testing, not for running as
> main().
>
> Is there a best practice for doing integration-testing of solr? Or of
> validating that queries to solr returns the expected result?
>
> E.g. I have this bit of production code:
> private String getStartAndStopDateIntersectsRange( Date beginDate, Date
> EndDate) {
> ...
>   dateQuery = "( (Start_Date:[* TO "+ endDate +"] AND
> Stop_Date:["+beginDate+" TO *])"+
>" OR (Start_Date:[* TO "+ endDate +"] AND !Stop_Date:[* TO *])" +
>" OR (!Start_Date:[* TO *] AND Stop_Date:["+beginDate+" TO *]) )";
> ..
> }
>
> And I would like to write a test-case that only returns the records that
> intersects a given daterange.
>
>
> Cheers
> Endre
>
>
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: 31. august 2015 15:02
> To: solr-user
> Subject: Re: testing with EmbeddedSolrServer
>
> Endre,
>
> As I suggested before, consider to avoid test framework, just put all code
> interacting with EmbeddedSolrServer into main() method.
>
> On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre  wrote:
>
> > Hi Mikhail,
> >
> > Im trying to read 7-8 xml files of data that contain realistic data
> > from our production server. Then I would like to read this data into
> > EmbeddedSolrServer to test for edge cases for our custom date search.
> > The use of EmbeddedSolrServer is purely to separate the data testing
> > from any environment that might change over time.
> >
> > I would also like to avoid writing plumbing-code to import each field
> > from the xml since I already have a working DIH.
> >
> > I tried adding synchronous=true but it doesn’t look like it makes solr
> > complete the import before doing a search.
> >
> > Looking at the log it doesn’t seem process the import request:
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> > params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> > =firstSearcher}
> > ...
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  org.apache.solr.core.CoreContainer - registering core:
> > nmdc
> > 10:48:31.613
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  o.apache.solr.core.SolrCore.Request - [nmdc]
> > webapp=null
> > path=/dataimport2
> > params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchro
> > nous%3Dtrue}
> > status=0 QTime=1
> >
> > {responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-conf
> > ig.xml}},command=full-import=true=true,status=idle,i
> > mportResponse=,statusMessages={}}
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] DEBUG o.apache.solr.core.SolrCore.Request - [nmdc]
> > webapp=null path=/select params={q=*%3A*}
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] DEBUG o.a.s.h.component.QueryComponent - process:
> > q=*:*&df=text&rows=10&echoParams=explicit
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.a.s.h.component.QueryComponent - process:
> > q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&df=text
> > &event=firstSearcher&rows=10&echoParams=explicit
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.a.s.search.stats.LocalStatsCache - ## GET
> > {q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&df=tex
> > t&event=firstSearcher&rows=10&echoParams=explicit}
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> > o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> > params={q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false
> > &event=firstSearcher}
> > hits=0 status=0 QTime=36
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> > org.apache.solr.core.SolrCore - QuerySenderListener done.
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> > org.apache.solr.core.SolrCore - [nmdc] Registered new searcher
> > Searcher@28be2785[nmdc]
> > main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> > ...
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  org.apache.solr.update.SolrCoreState - Closing
> > SolrCoreState
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  o.a.solr.update.DefaultSolrCoreState - SolrCoreState
> > ref count has reached 0 - closing IndexWriter
> > 
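For what it's worth, the log suggests the extra flags were URL-encoded into the `command` value, and DIH's full-import returns immediately in any case, so the follow-up search races the import. The robust pattern is to poll the handler's status until it reports idle. A minimal sketch of the status check (the exact JSON shape is an assumption based on typical DIH responses):

```python
import json

def import_finished(dih_status_json: str) -> bool:
    # DIH full-import returns at once; clients poll
    # /dataimport2?command=status&wt=json until "status" is "idle".
    return json.loads(dih_status_json).get("status") == "idle"

# Trimmed examples of a status payload during and after an import:
busy = '{"responseHeader": {"status": 0}, "status": "busy"}'
idle = '{"responseHeader": {"status": 0}, "status": "idle"}'
print(import_finished(busy), import_finished(idle))  # False True
```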

Re: Sorting parent documents based on a field from children

2015-09-01 Thread Alexandre Rafalovitch
On 1 September 2015 at 08:29, Mikhail Khludnev
 wrote:
> One last thing to check: make sure that you don't have deleted documents in
> the index. You can check in SolrAdmin.

What's the significance of that particular advice? Is something in the
join including deleted documents as well?

P.S. Great article. If I may suggest, it would help to link/mention
the ^= constant-scoring feature where it first appears. Not many
people know about it, and it may help to disambiguate the syntax.

Regards,
   Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: plz help me

2015-09-01 Thread sara hajili
i used the pre-built persian analyzer but it isn't enough for me.
it only works for searching exactly the indexed word!
like searching 'go' only matches 'go'.
as i said, in english we have stemming, synonym, etc. filters that help us
have flexible search,
and i want similar filters for persian.
the pre-built text_fa doesn't satisfy me. do you have a better persian
filter than that, or a solution to get such filters for persian?
tnx.

On Tue, Sep 1, 2015 at 5:21 AM, Alexandre Rafalovitch 
wrote:

> http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
> shows how to keep updates in a separate core. Notice that it is an
> intermediate-level article for query syntax.
>
> For persian text analysis, there is a pre-built analyser defiition in
> the techproducts example, start from that. It is in the schema.xml in
> server/solr/configsets/sample_techproducts_configs/conf and is one of
> the example configsets.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 1 September 2015 at 08:07, sara hajili  wrote:
> > and another question is:
> > my docs are persian and i use text_fa for fieldType but i wanna to have a
> > persian textfield that handle search problem such as stemming.
> > word distance,synonyms etc
> > like english types.
> > as i said i handle "میخواهم " and "خواستن"  and so on.
> > can you suggest me a fieldtype for handle this issues in persian field.
> > tnx
> >
> > On Tue, Sep 1, 2015 at 3:16 AM, sara hajili 
> wrote:
> >
> >> i'm really confused:|
> >> i'm really anxious about cost of update like count.
> >> and as you said:
> >>  >> docValues="true"/>
> >> you indexed like_count field .and i think it cost alot to update and
> index
> >> again docs.
> >> because like count change more and more
> >> so isn't better to indede="false" that this field name??!!
> >>
> >> On Tue, Sep 1, 2015 at 3:08 AM, Upayavira  wrote:
> >>
> >>> you don't need to use a dynamic field, just a normal field will work
> for
> >>> you. But, you *will* want to index it, and you may benefit from
> >>> docValues, so:
> >>>
> >>>  >>> docValues="true"/>
> >>>
> >>> Upayavira
> >>>
> >>> On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
> >>> > my solr version is 5.2.1
> >>> > i have a question.
> >>> > if i create 2 core .one for post and one for like . i must index like
> >>> > count?
> >>> > i mean in schema for like core i must write:
> >>> >  >>> > stored="true"/>
> >>> >
> >>> > am i true?
> >>> >
> >>> > On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
> >>> >
> >>> > > So you want to be able to sort by the "number of likes" value for a
> >>> > > post?
> >>> > >
> >>> > > What version of Solr are you using? How many posts do you have?
> >>> > >
> >>> > > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there,
> not
> >>> > > 5.3) called score joins. Using that you can have two cores, one
> >>> > > containing your posts, and another containing your likes.
> >>> > >
> >>> > > You cannot *sort* on these values, but you can include your likes
> into
> >>> > > the score, which might even be better.
> >>> > >
> >>> > > If this sounds good, I can dig up some syntax for such a query.
> >>> > >
> >>> > > Upayavira
> >>> > >
> >>> > > On Tue, Sep 1, 2015, at 10:36 AM, sara hajili wrote:
> >>> > > > hi.
> >>> > > > at first i.m sorry for my bad english!
> >>> > > > i have a social app.i want to use solr for searching in this app.
> >>> > > > i have many document (in my case people text that posted on my
> >>> social
> >>> > > > app).
> >>> > > > and i indexed this.
> >>> > > > but i'm have 1 issue and it is :
> >>> > > >
> >>> > > > i have very doc(post) and they have a property "like" is it  good
> >>> > > > approach
> >>> > > > to index like count (people can like eachother post in my social
> >>> app)?
> >>> > > > likecount change more and more in one day.(so as i know it must
> be
> >>> set
> >>> > > > dynamic field)
> >>> > > > and if i indexed it ,i think  it costs alot , to update and index
> >>> > > > likecount
> >>> > > > more and more even i use bach update.
> >>> > > > so is it approach to didn't index one field in solr but i could
> >>> sort my
> >>> > > > search result according to that unindexed field?
> >>> > > >
> >>> > > > tnx
> >>> > >
> >>>
> >>
> >>
>
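The score-join syntax Upayavira offers to dig up (quoted above) folds values from one core into another core's scores. A hedged sketch of building such a request for Solr 5.2+/5.3; the core names (`posts`, `likes`) and join fields are illustrative assumptions, not taken from the thread:

```python
from urllib.parse import urlencode

# Cross-core score join: score each post by aggregating the matching
# "like" docs from a separate likes core, instead of sorting on an
# unindexed field in the posts core.
join_query = "{!join from=post_id to=id fromIndex=likes score=total}*:*"

url = ("http://localhost:8983/solr/posts/select?"
       + urlencode({"q": join_query, "fl": "id,score", "rows": 10}))
print(url)
```

Keeping likes in their own small core means only tiny like documents are rewritten when counts change, while the large post documents stay untouched.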


Re: Get distinct results in Solr

2015-09-01 Thread Alexandre Rafalovitch
Do you mean that normally you do get stuff indexed but when you make
any of these changes the indexing stops working and you get empty
index? If so, you probably misconfigured something and should be
getting error messages.

If, on the other hand, you see no changes, check that you are actually
using that URP chain. It needs to be declared in the search handler to
be used. Or it can be passed as a URL parameter too. The documentation
has the details.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 September 2015 at 04:46, Zheng Lin Edwin Yeo  wrote:
> Hi Upayavira,
>
> I've tried to change id to be <str
> name="signatureField">signature</str>, but nothing is indexed into Solr as
> well. Is that what you mean?
>
> Besides that, I've also included a copyField to copy the content field into
> the signature field. Both versions (with and without copyField) have
> nothing indexed into Solr.
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 15:48, Upayavira  wrote:
>
>> you are attempting to write your signature to your ID field. That's not
>> a good idea. You are generating your signature from the content field,
>> which seems okay. Change your <str name="signatureField">id</str> to be
>> your 'signature' field instead of id, and something different will
>> happen :-)
>>
>> Upayavira
>>
>> On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
>> > I tried to follow the de-duplication guide, but after I configured it in
>> > solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
>> > no error message. I'm using SimplePostTool to index rich-text documents.
>> >
>> > Below are my configurations:
>> >
>> > In solrconfig.xml
>> >
>> >   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>> >  <lst name="defaults">
>> > <str name="update.chain">dedupe</str>
>> >  </lst>
>> >   </requestHandler>
>> >
>> > <updateRequestProcessorChain name="dedupe">
>> >  <processor class="solr.processor.SignatureUpdateProcessorFactory">
>> > <bool name="enabled">true</bool>
>> > <str name="signatureField">id</str>
>> > <bool name="overwriteDupes">false</bool>
>> > <str name="fields">content</str>
>> > <str name="signatureClass">solr.processor.Lookup3Signature</str>
>> >  </processor>
>> > </updateRequestProcessorChain>
>> >
>> >
>> > In schema.xml
>> >
>> > <field name="signature" type="string" stored="true" indexed="true"
>> > multiValued="false" />
>> >
>> >
>> > Is there anything which I might have missed out or done wrongly?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo 
>> > wrote:
>> >
>> > > Thank you for your advice Alexandre.
>> > >
>> > > Will try out the de-duplication from the link you gave.
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >
>> > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > > wrote:
>> > >
>> > >> Re-read the question. You want to de-dupe on the full text-content.
>> > >>
>> > >> I would actually try to use the dedupe chain as per the link I gave
>> > >> but put results into a separate string field. Then, you group on that
>> > >> field. You cannot actually group on the long text field, that would
>> > >> kill any performance. So a signature is your proxy.
>> > >>
>> > >> Regards,
>> > >>Alex
>> > >> 
>> > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> > >> http://www.solr-start.com/
>> > >>
>> > >>
>> > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo > >
>> > >> wrote:
>> > >> > Hi Alexandre,
>> > >> >
>> > >> > Will treating it as String affect the search or other functions like
>> > >> > highlighting?
>> > >> >
>> > >> > Yes, the content must be in my index, unless I do a copyField to do
>> > >> > de-duplication on that field.. Will that help?
>> > >> >
>> > >> > Regards,
>> > >> > Edwin
>> > >> >
>> > >> >
>> > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
>> arafa...@gmail.com
>> > >> >
>> > >> > wrote:
>> > >> >
>> > >> >> Can't you just treat it as String?
>> > >> >>
>> > >> >> Also, do you actually want those documents in your index in the
>> first
>> > >> >> place? If not, have you looked at De-duplication:
>> > >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>> > >> >>
>> > >> >> Regards,
>> > >> >>Alex.
>> > >> >> 
>> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> > >> >> http://www.solr-start.com/
>> > >> >>
>> > >> >>
>> > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > >> >> wrote:
>> > >> >> > Thanks Jan.
>> > >> >> >
>> > >> >> > But I read that the field that is being collapsed on must be a
>> single
>> > >> >> > valued String, Int or Float. As I'm required to get the distinct
>> > >> results
>> > >> >> > from "content" field that was indexed from a rich text document,
>> I
>> > >> got
>> > >> >> the
>> > >> >> > following error:
>> > >> >> >
>> > >> >> >   "error":{
>> > >> >> > "msg":"java.io.IOException: 64 bit numeric collapse fields
>> are
>> > >> not
>> > >> >> > supported",
>> > >> >> > "trace":"java.lang.RuntimeException: java.io.IOException: 64
>> bit
>> > >> >> > numeric collapse fields are not supported\r\n\tat
>> > >> >> >
>> > >> >> >
>> > >> >> > Is it possible to collapsed on fields which has a long integer of
>> > >> data,
>> > >> >> > like content from a rich text document?
>> 
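Alexandre's suggestion above, hashing the long content into a short signature field and grouping on that instead of the text field, can be sketched outside Solr. Here md5 stands in for Lookup3Signature (an assumption; any stable hash of the normalized content works as a grouping key):

```python
import hashlib

def signature(text: str) -> str:
    # Stand-in for Solr's Lookup3Signature: hash the normalized
    # content into a short, groupable string.
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

docs = [
    {"id": "1", "content": "The quick brown fox"},
    {"id": "2", "content": "the quick  brown fox"},  # same text, noisier form
    {"id": "3", "content": "something else entirely"},
]

# Group on the short signature, as you would with group.field=signature
# in Solr, rather than collapsing on the long text field itself.
groups = {}
for doc in docs:
    groups.setdefault(signature(doc["content"]), []).append(doc["id"])

print(sorted(groups.values()))  # [['1', '2'], ['3']]
```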

Re: plz help me

2015-09-01 Thread sara hajili
and another question:
my docs are persian and i use text_fa for the fieldType, but i want to have
a persian text field that handles search problems such as stemming,
word distance, synonyms, etc.,
like the english types do.
as i said, i need to handle "میخواهم" and "خواستن" and so on.
can you suggest a fieldtype to handle these issues for a persian field?
tnx

On Tue, Sep 1, 2015 at 3:16 AM, sara hajili  wrote:

> i'm really confused:|
> i'm really anxious about cost of update like count.
> and as you said:
> <field name="like_count" type="int" indexed="true" stored="true"
> docValues="true"/>
> you indexed the like_count field, and i think it costs a lot to update and
> index the docs again,
> because like count changes more and more,
> so isn't it better to set indexed="false" on that field??!!
>
> On Tue, Sep 1, 2015 at 3:08 AM, Upayavira  wrote:
>
>> you don't need to use a dynamic field, just a normal field will work for
>> you. But, you *will* want to index it, and you may benefit from
>> docValues, so:
>>
>> <field name="like_count" type="int" indexed="true" stored="true"
>> docValues="true"/>
>>
>> Upayavira
>>
>> On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
>> > my solr version is 5.2.1
>> > i have a question.
>> > if i create 2 core .one for post and one for like . i must index like
>> > count?
>> > i mean in schema for like core i must write:
>> > <field name="like_count" type="int" indexed="true"
>> > stored="true"/>
>> >
>> > am i true?
>> >
>> > On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
>> >
>> > > So you want to be able to sort by the "number of likes" value for a
>> > > post?
>> > >
>> > > What version of Solr are you using? How many posts do you have?
>> > >
>> > > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there, not
>> > > 5.3) called score joins. Using that you can have two cores, one
>> > > containing your posts, and another containing your likes.
>> > >
>> > > You cannot *sort* on these values, but you can include your likes into
>> > > the score, which might even be better.
>> > >
>> > > If this sounds good, I can dig up some syntax for such a query.
>> > >
>> > > Upayavira
>> > >
>> > > On Tue, Sep 1, 2015, at 10:36 AM, sara hajili wrote:
>> > > > hi.
>> > > > at first i.m sorry for my bad english!
>> > > > i have a social app.i want to use solr for searching in this app.
>> > > > i have many document (in my case people text that posted on my
>> social
>> > > > app).
>> > > > and i indexed this.
>> > > > but i'm have 1 issue and it is :
>> > > >
>> > > > i have very doc(post) and they have a property "like" is it  good
>> > > > approach
>> > > > to index like count (people can like eachother post in my social
>> app)?
>> > > > likecount change more and more in one day.(so as i know it must be
>> set
>> > > > dynamic field)
>> > > > and if i indexed it ,i think  it costs alot , to update and index
>> > > > likecount
>> > > > more and more even i use bach update.
>> > > > so is it approach to didn't index one field in solr but i could
>> sort my
>> > > > search result according to that unindexed field?
>> > > >
>> > > > tnx
>> > >
>>
>
>


Re: plz help me

2015-09-01 Thread Alexandre Rafalovitch
If Solr's Persian configuration is not sufficient, you could look into
the commercial implementation from Basis Tech (I haven't tested it):
http://www.basistech.com/text-analytics/rosette/base-linguistics/for-arabic/
(it says it supports Persian at the bottom of the page).

I would also open a JIRA with examples of what works and what does not.
Even if it does not get implemented very soon, it would be a clear
record of what to aim for.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 September 2015 at 08:27, sara hajili  wrote:
> i used pre-built persian analyzer but it isn't enough for me.
> this work just for search exacly indexed work!
> like search 'go' and search result is 'go'
> as i said in english we have stemming,synonym,etc filters that help us to
> have flexible search.
> and i want some filter for persian.
> that pre-built text_fa doesn't satisfied me.have you better perisan filter
> than that?or a soulotion to have this filter in persian?
> tnx.
>
> On Tue, Sep 1, 2015 at 5:21 AM, Alexandre Rafalovitch 
> wrote:
>
>> http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
>> shows how to keep updates in a separate core. Notice that it is an
>> intermediate-level article for query syntax.
>>
>> For persian text analysis, there is a pre-built analyser defiition in
>> the techproducts example, start from that. It is in the schema.xml in
>> server/solr/configsets/sample_techproducts_configs/conf and is one of
>> the example configsets.
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 1 September 2015 at 08:07, sara hajili  wrote:
>> > and another question is:
>> > my docs are persian and i use text_fa for fieldType but i wanna to have a
>> > persian textfield that handle search problem such as stemming.
>> > word distance,synonyms etc
>> > like english types.
>> > as i said i handle "میخواهم " and "خواستن"  and so on.
>> > can you suggest me a fieldtype for handle this issues in persian field.
>> > tnx
>> >
>> > On Tue, Sep 1, 2015 at 3:16 AM, sara hajili 
>> wrote:
>> >
>> >> i'm really confused:|
>> >> i'm really anxious about cost of update like count.
>> >> and as you said:
>> >> > >> docValues="true"/>
>> >> you indexed like_count field .and i think it cost alot to update and
>> index
>> >> again docs.
>> >> because like count change more and more
>> >> so isn't better to indede="false" that this field name??!!
>> >>
>> >> On Tue, Sep 1, 2015 at 3:08 AM, Upayavira  wrote:
>> >>
>> >>> you don't need to use a dynamic field, just a normal field will work
>> for
>> >>> you. But, you *will* want to index it, and you may benefit from
>> >>> docValues, so:
>> >>>
>> >>> > >>> docValues="true"/>
>> >>>
>> >>> Upayavira
>> >>>
>> >>> On Tue, Sep 1, 2015, at 10:59 AM, sara hajili wrote:
>> >>> > my solr version is 5.2.1
>> >>> > i have a question.
>> >>> > if i create 2 core .one for post and one for like . i must index like
>> >>> > count?
>> >>> > i mean in schema for like core i must write:
>> >>> > > >>> > stored="true"/>
>> >>> >
>> >>> > am i true?
>> >>> >
>> >>> > On Tue, Sep 1, 2015 at 2:42 AM, Upayavira  wrote:
>> >>> >
>> >>> > > So you want to be able to sort by the "number of likes" value for a
>> >>> > > post?
>> >>> > >
>> >>> > > What version of Solr are you using? How many posts do you have?
>> >>> > >
>> >>> > > There's a neat feature in Solr 5.2.1 (I'm pretty sure it is there,
>> not
>> >>> > > 5.3) called score joins. Using that you can have two cores, one
>> >>> > > containing your posts, and another containing your likes.
>> >>> > >
>> >>> > > You cannot *sort* on these values, but you can include your likes
>> into
>> >>> > > the score, which might even be better.
>> >>> > >
>> >>> > > If this sounds good, I can dig up some syntax for such a query.
>> >>> > >
>> >>> > > Upayavira
>> >>> > >
>> >>> > > On Tue, Sep 1, 2015, at 10:36 AM, sara hajili wrote:
>> >>> > > > hi.
>> >>> > > > at first i.m sorry for my bad english!
>> >>> > > > i have a social app.i want to use solr for searching in this app.
>> >>> > > > i have many document (in my case people text that posted on my
>> >>> social
>> >>> > > > app).
>> >>> > > > and i indexed this.
>> >>> > > > but i'm have 1 issue and it is :
>> >>> > > >
>> >>> > > > i have very doc(post) and they have a property "like" is it  good
>> >>> > > > approach
>> >>> > > > to index like count (people can like eachother post in my social
>> >>> app)?
>> >>> > > > likecount change more and more in one day.(so as i know it must
>> be
>> >>> set
>> >>> > > > dynamic field)
>> >>> > > > and if i indexed it ,i think  it costs alot , to update and index
>> >>> > > > likecount
>> >>> > > > more and more even i use bach update.
>> >>> > > > so is it 

Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Zheng Lin Edwin Yeo
No, I've tested it several times after committing it.

I've tried executing it several times. The QTime for the JSON Facet is
always between the range of 400 to 600, while the QTime for the Legacy
Facet is usually between 15 to 30.

I have indexed about 1GB of rich-text data into the collection with a large
content section, but I don't think this is a problem, since the QTime for
Legacy Facet is relatively fast?

Both the JSON Facet and Legacy Facet return the same results. The JSON
Facet will only show the top 10 results by default, while the Legacy Facet
will show all of them (there are about 100 in my case). This is what I
find strange as well: since the JSON Facet is only showing 10 results,
shouldn't it be faster?


Regards,
Edwin
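For reference, the two requests being compared can be built like this (localhost URL assumed). The legacy form spreads options across `facet.*` parameters, while the JSON form packs the equivalent terms facet into a single `json.facet` parameter; both count terms of the tokenized `content` field, which is expensive with either API:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/collection1/select"  # host/core assumed

# Legacy faceting: one facet.* parameter per option.
legacy = base + "?" + urlencode({
    "q": "paint",
    "facet": "true",
    "facet.field": "content",
    "rows": 0,
})

# JSON faceting: the whole facet request as one json.facet parameter.
json_facet = base + "?" + urlencode({
    "q": "paint",
    "json.facet": "{f:{type:terms, field:content}}",
    "rows": 0,
})

print(legacy)
print(json_facet)
```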


On 2 September 2015 at 11:39, Yonik Seeley  wrote:

> That's pretty strange...
> There can be caching differences.  Is this the first time the request
> is executed after a commit?
> What does executing it again show?
>
> -Yonik
>
> On Tue, Sep 1, 2015 at 9:47 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi Yonik,
> >
> > Thanks for pointing out the difference.
> >
> > I've made modification and tried with this below command for JSON Facet,
> > but it is still having a QTime of 410, as compared to the Legacy Facet
> > QTime of 22:
> >
> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{field:content}}&rows=0
> >
> > Is this the same as the Legacy Facet query of
> >
> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
> >  ?
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 23:24, Yonik Seeley  wrote:
> >
> >> They aren't doing the same thing...
> >>
> >> The first URL is doing a straight facet on the content field.
> >> The second URL is doing a facet on the content field and asking for an
> >> additional statistic for each bucket.
> >>
> >> -Yonik
> >>
> >>
> >> On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
> >>  wrote:
> >> > I've tried the following commands and I found that the Legacy
> Faceting is
> >> > actually much faster than JSON Faceting. Not sure why is this so, when
> >> the
> >> > document from this link http://yonik.com/solr-count-distinct/ states
> >> that
> >> > JSON Facets has a much lower request latency.
> >> >
> >> > (For Legacy Facet) - QTime: 22
> >> >
> >> > -
> >> >
> >>
> >> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
> >> >
> >> > (For JSON Facet) - QTime: 1128
> >> >
> >> > -
> >> >
> >>
> >> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
> >> >
> >> >
> >> > Is there any problem with my URL for the JSON Facet?
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Edwin
> >> >
> >> >
> >> >
> >> > On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I'm using Solr 5.2.1, and I would like to find out, what is the
> >> difference
> >> >> between Legacy Facets and JSON Facets in Solr? I was told that JSON
> >> Facets
> >> >> has a much lesser Request Latency, but I couldn't find any major
> >> difference
> >> >> in speed. Or must we have a larger index in order to have any
> >> significant
> >> >> difference?
> >> >>
> >> >> Is there any significant advantage to use JSON Faceting command
> instead
> >> of
> >> >> Legacy Faceting command?
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >>
>


Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Erick Erickson
How many documents total are in your corpus? And how many do you
intend to have?

My point is that if you are testing this with a small corpus, the results
are very likely different than when you test on a reasonable corpus.
So if you expect your "real" index will contain many more docs than
what you're testing, this is likely a red herring.

But something isn't making a lot of sense here. You say you've traced it
to having a docfreq of 2 that changes to 1. But that means that the
value is unique in your entire corpus, which kind of indicates you're
trying to boost on unique values which is unusual.

If you're confident in your model though, the only way to guarantee
what you want is to optimize/expungeDeletes.

Best,
Erick
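The docFreq effect debated in this thread follows from classic TF-IDF scoring, where deleted documents still count toward docFreq until their segments are merged away. A small sketch (the idf formula is Lucene's classic similarity; the numbers are illustrative, not from the thread):

```python
import math

def classic_idf(num_docs: int, doc_freq: int) -> float:
    # Lucene's classic idf: 1 + ln(numDocs / (docFreq + 1)).
    # Deleted docs still inflate doc_freq until an
    # optimize/expungeDeletes merges them away.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

# A term that is really unique, but whose deleted duplicate still
# lingers in the index, scores slightly lower:
with_deleted = classic_idf(num_docs=1000, doc_freq=2)
after_optimize = classic_idf(num_docs=1000, doc_freq=1)
print(with_deleted < after_optimize)  # True
```

As Erick notes, in a large corpus this small idf shift rarely changes relative rankings; it only matters when documents are separated by razor-thin score margins, as in Derek's boosted top tier.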

On Tue, Sep 1, 2015 at 7:51 PM, Derek Poh  wrote:
> Erick
>
> Yes, we see documents changing their position in the list due to having
> deleted docs.
> In our search result, we apply a higher boost (bq) to a group of matched
> documents to have them display at the top tier of the result.
> At times 1 or 2 of these documents are not returned in the top tier; they
> are relegated down to the lower tier of the result. We discovered that
> these documents have a lower score due to docFreq=2.
> After we do an optimize, these 1-2 documents are back in the top tier
> result order and their docFreq is 1.
>
>
>
> On 9/1/2015 11:40 PM, Erick Erickson wrote:
>>
>> Derek:
>>
>> Why do you care? What evidence do you have that this matters
>> _practically_?
>>
>> If you've look at scoring with a small number of documents, you'll see
>> significant
>> differences due to deleted documents. In most cases, as you get a larger
>> number
>> of documents the ranking of documents in an index with no deletions .vs.
>> indexes
>> that have deletions is usually not noticeable.
>>
>> I'm suggesting that this is a red herring. Your specific situation may
>> be different
>> of course, but since scoring is really only about ranking docs
>> relative to each other,
>> unless the relative positions change enough to be noticeable it's not a
>> problem.
>>
>> Note that I'm saying "relative rankings", NOT "absolute score". Document
>> scores
>> have no meaning outside comparisons to other docs _in the same query_. So
>> unless you see documents changing their position in the list due to
>> having deleted
>> docs, it's not worth spending time on IMO.
>>
>> Best,
>> Erick
>>
>> On Tue, Sep 1, 2015 at 12:45 AM, Upayavira  wrote:
>>>
>>> I wonder if this resolves it [1]. It has been applied to trunk, but not
>>> to the 5.x release branch.
>>>
>>> If you needed it in 5.x, I wonder if there's a way that particular
>>> choice could be made configurable.
>>>
>>> Upayavira
>>>
>>> [1] https://issues.apache.org/jira/browse/LUCENE-6711
>>> On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote:

 Hi Upayavira

 In fact we are using optimize currently but was advised to use expunge
 deletes as it is less resource intensive.
 So expunge deletes will only remove deleted documents, it will not merge
 all index segments into one?

 If we don't use optimize, the deleted documents in the index will affect
 the scores (with docFreq=2) of the matched documents which will affect
 the relevancy of the search result.

 Derek

 On 9/1/2015 12:05 AM, Upayavira wrote:
>
> If you really must expunge deletes, use optimize. That will merge all
> index segments into one, and in the process will remove any deleted
> documents.
>
> Why do you need to expunge deleted documents anyway? It is generally
> done in the background for you, so you shouldn't need to worry about
> it.
>
> Upayavira
>
> On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote:
>>
>> Hi,
>>
>> The below curl command worked without error, you can try.
>>
>> curl http://localhost:8983/solr/techproducts/update?commit=true -H
>> "Content-Type: text/xml" --data-binary '<commit expungeDeletes="true"/>'
>>
>> However, after executing this, I could still see same deleted counts
>> on
>> dashboard.  Deleted Docs:6
>> I am not sure whether that means,  the command did not take effect or
>> it
>> took effect but did not reflect on dashboard view.
>>
>>
>>
>>
>>
>> On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh 
>> wrote:
>>
>>> Hi
>>>
>>> I tried doing a expungeDeletes=true with the following but get the
>>> message
>>> 'missing content stream'. What am I missing? I need to provide
>>> additional
>>> parameters?
>>>
>>> curl
>>> 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true
>>> ';
>>>
>>> Thanks,
>>> Derek
>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential
>>> and/or
>>> 

Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
That's pretty strange...
There can be caching differences.  Is this the first time the request
is executed after a commit?
What does executing it again show?

-Yonik

On Tue, Sep 1, 2015 at 9:47 PM, Zheng Lin Edwin Yeo
 wrote:
> Hi Yonik,
>
> Thanks for pointing out the difference.
>
> I've made modification and tried with this below command for JSON Facet,
> but it is still having a QTime of 410, as compared to the Legacy Facet
> QTime of 22:
> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{field:content}}&rows=0
>
> Is this the same as the Legacy Facet query of
> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
>  ?
>
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 23:24, Yonik Seeley  wrote:
>
>> They aren't doing the same thing...
>>
>> The first URL is doing a straight facet on the content field.
>> The second URL is doing a facet on the content field and asking for an
>> additional statistic for each bucket.
>>
>> -Yonik
>>
>>
>> On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
>>  wrote:
>> > I've tried the following commands and I found that the Legacy Faceting is
>> > actually much faster than JSON Faceting. Not sure why is this so, when
>> the
>> > document from this link http://yonik.com/solr-count-distinct/ states
>> that
>> > JSON Facets has a much lower request latency.
>> >
>> > (For Legacy Facet) - QTime: 22
>> >
>> > -
>> >
>> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
>> >
>> > (For JSON Facet) - QTime: 1128
>> >
>> > -
>> >
>> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
>> >
>> >
>> > Is there any problem with my URL for the JSON Facet?
>> >
>> >
>> > Regards,
>> >
>> > Edwin
>> >
>> >
>> >
>> > On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo 
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm using Solr 5.2.1, and I would like to find out, what is the
>> difference
>> >> between Legacy Facets and JSON Facets in Solr? I was told that JSON
>> Facets
>> >> has a much lesser Request Latency, but I couldn't find any major
>> difference
>> >> in speed. Or must we have a larger index in order to have any
>> significant
>> >> difference?
>> >>
>> >> Is there any significant advantage to use JSON Faceting command instead
>> of
>> >> Legacy Faceting command?
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>>


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Zheng Lin Edwin Yeo
The type of field is text_general.

I found that the problem mainly happens in the content field of the
collections with rich-text documents.
It works fine for other files, and also for collections indexed with CSV
documents, even if the fieldType is text_general.

Regards,
Edwin


On 2 September 2015 at 12:12, Yonik Seeley  wrote:

> On Tue, Sep 1, 2015 at 11:51 PM, Zheng Lin Edwin Yeo
>  wrote:
> > No, I've tested it several times after committing it.
>
> Hmmm, well something is really wrong for this orders of magnitude
> difference.  I've never seen anything like that and we should
> definitely try to get to the bottom of it.
> What is the type of the field?
>
> -Yonik
>


custom shard or auto shard for SolrCloud?

2015-09-01 Thread Scott Chu
I posted this question on Stack Overflow and would like some suggestions:

solr - Custom sharding or auto Sharding on SolrCloud? - Stack Overflow
http://stackoverflow.com/questions/32343813/custom-sharding-or-auto-sharding-on-solrcloud


Scott Chu,scott@udngroup.com
2015/9/2 


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
On Tue, Sep 1, 2015 at 11:51 PM, Zheng Lin Edwin Yeo
 wrote:
> No, I've tested it several times after committing it.

Hmmm, well something is really wrong for this orders of magnitude
difference.  I've never seen anything like that and we should
definitely try to get to the bottom of it.
What is the type of the field?

-Yonik


Re: Solr cloud hangs, log4j contention issue observed

2015-09-01 Thread Shawn Heisey
On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that one
> thread was waiting on a log4j method, and 538 threads (!) were waiting on
> the same lock.
> The thread's stack trace is:



> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, seems like the fixes were made for log4j 2.X. And Solr uses log4j
> 1.2.X (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr log4j version to 2.X?

We have an issue open to upgrade log4j.  I know because I'm the one that
opened it.  I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.

https://issues.apache.org/jira/browse/SOLR-7887

What container are you running in?  The stack trace was not complete
enough for me to figure that out myself.  What is that container's
maxThreads setting?  The thread name including "http-bio-8443" makes me
think it's probably Tomcat, not the Jetty included in the example found
in the download, which makes the maxThreads parameter particularly relevant.

I do not see any mention of locks in the information that you included,
either held or waiting.  If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.

Thanks,
Shawn
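
As a rough way to act on the lock-hunting advice above, the dump can be
scanned for the thread that holds the monitor everyone else is waiting on.
This is only a sketch: the thread names, lock address, and one-line dump
format below are simplified assumptions, not taken from the actual javacore.

```python
import re

# Simplified stand-in for a jstack/javacore dump (one line per thread).
dump = '''"http-bio-8443-exec-1" ... waiting to lock <0xc01abc10>
"http-bio-8443-exec-2" ... waiting to lock <0xc01abc10>
"log-writer" ... locked <0xc01abc10>'''

holders, waiters = [], []
for line in dump.splitlines():
    m = re.search(r'^"([^"]+)".*?(waiting to lock|locked) <(0x[0-9a-f]+)>', line)
    if not m:
        continue
    name, state, lock = m.groups()
    # Separate the thread holding the monitor from the threads queued on it.
    (waiters if state == "waiting to lock" else holders).append((name, lock))

assert holders == [("log-writer", "0xc01abc10")]
assert len(waiters) == 2
```

In a real dump the holder is usually far from the waiters, so grepping by
lock address like this is faster than reading 538 stack traces by hand.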



Re: Sorting parent documents based on a field from children

2015-09-01 Thread Alexandre Rafalovitch
On 1 September 2015 at 09:10, Mikhail Khludnev
 wrote:
>> Not many
>> people know about it, may help to disambiguate the syntax.
>>
> Oh. C'mon! it's announced for ages http://yonik.com/solr/query-syntax/

Not everybody reads and keeps track of every feature of Solr.
Especially for newbies, it helps to disambiguate more obscure
features. And, based on my experience, local params, URPs and
alternative parsers are obscure features; you can guess how newbies feel
about everything else.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Zheng Lin Edwin Yeo
I've tried the following commands and found that Legacy Faceting is
actually much faster than JSON Faceting. Not sure why this is so, when the
document at this link http://yonik.com/solr-count-distinct/ states that
JSON Facets have a much lower request latency.

(For Legacy Facet) - QTime: 22

-
http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0



(For JSON Facet) - QTime: 1128

-
http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0



Is there any problem with my URL for the JSON Facet?
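
As a sanity check, the two requests above can be rebuilt with explicit
parameter encoding; this is only a sketch, using the host, collection, and
field names already quoted in this thread:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/collection1/select"

# Legacy faceting: facet=true plus a facet.field parameter.
legacy = base + "?" + urlencode({
    "q": "paint", "rows": 0,
    "facet": "true", "facet.field": "content",
})

# JSON faceting: the whole facet spec travels in one json.facet parameter,
# which must be URL-encoded as a unit.
json_facet = base + "?" + urlencode({
    "q": "paint", "rows": 0,
    "json.facet": '{f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}',
})

assert "facet.field=content" in legacy
assert "json.facet=" in json_facet
```

If the json.facet value is pasted into a browser without encoding, the
braces and quotes can be mangled, which is one possible source of the odd
timing numbers.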


Regards,

Edwin



On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm using Solr 5.2.1, and I would like to find out, what is the difference
> between Legacy Facets and JSON Facets in Solr? I was told that JSON Facets
> has a much lesser Request Latency, but I couldn't find any major difference
> in speed. Or must we have a larger index in order to have any significant
> difference?
>
> Is there any significant advantage to use JSON Faceting command instead of
> Legacy Faceting command?
>
> Regards,
> Edwin
>


RE: Custom merge logic in SolrCloud.

2015-09-01 Thread Markus Jelsma
Hello, I have had this issue as well. I patched QueryComponent and some other 
files that are used by QueryComponent so that it is finally possible to extend 
QueryComponent.
https://issues.apache.org/jira/browse/SOLR-7968

 
 
-Original message-
> From:Mohan gupta 
> Sent: Tuesday 1st September 2015 17:09
> To: solr-user@lucene.apache.org
> Subject: Re: Custom merge logic in SolrCloud.
> 
> *Bump*
> 
> On Tue, Sep 1, 2015 at 1:17 AM, Mohan gupta  wrote:
> 
> > Hi Folks,
> >
> > I need to merge docs received from multiple shards via a custom logic, a
> > straightforward score based priority queue doesn't work for my scenario (I
> > need to maintain a blend/distribution of docs).
> >
> > How can I plugin my custom merge logic? One way might be to fully
> > implement the QueryComponent but that seems like a lot of work, is there a
> > simpler way?
> >
> > I need my custom logic to kick-in in very specific cases and most of the
> > cases can still use default QueryComponent, was there a reason to make
> > merge functionality private (non-overridable) in the  QueryComponent class?
> >
> > --
> > Regards ,
> > Mohan Gupta
> >
> 
> 
> 
> -- 
> Regards ,
> Mohan Gupta
> 


Re: Get distinct results in Solr

2015-09-01 Thread Zheng Lin Edwin Yeo
Yes, here it is. These are in solrconfig.xml:

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature</str>
      <bool name="overwriteDupes">false</bool>
      <str name="fields">content</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str>
    </processor>
  </updateRequestProcessorChain>
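
Conceptually, what the dedupe chain above does is derive a compact signature
from the long "content" field and store it in a separate field, so that
grouping or collapsing can run on the signature instead of the full text.
A minimal sketch of that idea; note Solr's Lookup3Signature is a 64-bit
hash, and MD5 here is only a stand-in for illustration:

```python
import hashlib

def add_signature(doc, source_field="content", signature_field="signature"):
    # Normalize the text a little, then hash it into a short, groupable value.
    text = doc.get(source_field, "").strip().lower()
    doc[signature_field] = hashlib.md5(text.encode("utf-8")).hexdigest()
    return doc

a = add_signature({"id": "1", "content": "Solr rocks"})
b = add_signature({"id": "2", "content": "Solr rocks"})
assert a["signature"] == b["signature"]   # duplicates share a signature
assert a["id"] != b["id"]                 # but stay distinct documents
```

With overwriteDupes=false, Solr keeps both documents and only fills the
signature field, which is exactly what grouping on that field needs.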


Regards,
Edwin


On 1 September 2015 at 22:26, Upayavira  wrote:

> Can you repeat the config you have for the dedup update chain?
>
> Thx
>
> On Tue, Sep 1, 2015, at 02:57 PM, Zheng Lin Edwin Yeo wrote:
> > Hi Upayavira,
> >
> > Yes, I tried with a completely new index. I found that once I added the
> > line below to my /update handler in solrconfig.xml, the indexing doesn't
> > work anymore.
> > <str name="update.chain">dedupe</str>
> >
> > Besides that, it is also not able to do any deletion to the index when
> > this
> > line is added.
> >
> > Regards,
> > Edwin
> >
> >
> >
> >
> > On 1 September 2015 at 21:15, Upayavira  wrote:
> >
> > > Have you tried with a completely clean index? Are you deduping, or just
> > > calculating the signature? Is it possible dedup is preventing your
> > > documents from indexing (because it thinks they are dups)?
> > >
> > > On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote:
> > > > Hi Upayavira,
> > > >
> > > > I've tried to change <str name="signatureField">id</str> to be
> > > > <str name="signatureField">signature</str>, but nothing is indexed into
> > > > Solr as well. Is that what you mean?
> > > >
> > > > Besides that, I've also included a copyField to copy the content
> field
> > > > into
> > > > the signature field. Both versions (with and without copyField) have
> > > > nothing indexed into Solr.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 1 September 2015 at 15:48, Upayavira  wrote:
> > > >
> > > > > you are attempting to write your signature to your ID field. That's not
> > > > > a good idea. You are generating your signature from the content field,
> > > > > which seems okay. Change your <str name="signatureField">id</str> to be
> > > > > your 'signature' field instead of id, and something different will
> > > > > happen :-)
> > > > >
> > > > > Upayavira
> > > > >
> > > > > On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > > > > > I tried to follow the de-duplication guide, but after I
> configured
> > > it in
> > > > > > solrconfig.xml and schema.xml, nothing is indexed into Solr, and
> > > there is
> > > > > > no error message. I'm using SimplePostTool to index rich-text
> > > documents.
> > > > > >
> > > > > > Below are my configurations:
> > > > > >
> > > > > > In solrconfig.xml
> > > > > >
> > > > > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > > > > >   <lst name="defaults">
> > > > > >     <str name="update.chain">dedupe</str>
> > > > > >   </lst>
> > > > > > </requestHandler>
> > > > > >
> > > > > > <updateRequestProcessorChain name="dedupe">
> > > > > >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> > > > > >     <bool name="enabled">true</bool>
> > > > > >     <str name="signatureField">id</str>
> > > > > >     <bool name="overwriteDupes">false</bool>
> > > > > >     <str name="fields">content</str>
> > > > > >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > > > >   </processor>
> > > > > > </updateRequestProcessorChain>
> > > > > >
> > > > > >
> > > > > > In schema.xml
> > > > > >
> > > > > >  <field name="signature" type="string" stored="true" indexed="true"
> > > > > > multiValued="false" />
> > > > > >
> > > > > >
> > > > > > Is there anything which I might have missed out or done wrongly?
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for your advice Alexandre.
> > > > > > >
> > > > > > > Will try out the de-duplication from the link you gave.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Edwin
> > > > > > >
> > > > > > >
> > > > > > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> > > > > arafa...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Re-read the question. You want to de-dupe on the full
> > > text-content.
> > > > > > >>
> > > > > > >> I would actually try to use the dedupe chain as per the link I
> > > gave
> > > > > > >> but put results into a separate string field. Then, you group
> on
> > > that
> > > > > > >> field. You cannot actually group on the long text field, that
> > > would
> > > > > > >> kill any performance. So a signature is your proxy.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >>Alex
> > > > > > >> 
> > > > > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a
> newsletter:
> > > > > > >> http://www.solr-start.com/
> > > > > > >>
> > > > > > >>
> > > > > > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com
> > > > > >
> > > > > > >> wrote:
> > > > > > >> > Hi Alexandre,
> > > > > > >> >
> > > > > > >> > Will treating it as String affect the search or other
> functions
> > > like
> > > > > > >> > highlighting?
> > > > > > >> >
> > > > > > >> > Yes, the content must be in my index, unless I do a
> copyField
> > > to do
> > > > > > >> > de-duplication on that field.. Will that help?
> > > > > > >> >
> > > > > > >> > Regards,
> > > > > > >> > Edwin
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> > > > > arafa...@gmail.com
> > > > > > >> >
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> >> Can't 

Re: Custom merge logic in SolrCloud.

2015-09-01 Thread Mohan gupta
*Bump*

On Tue, Sep 1, 2015 at 1:17 AM, Mohan gupta  wrote:

> Hi Folks,
>
> I need to merge docs received from multiple shards via a custom logic, a
> straightforward score based priority queue doesn't work for my scenario (I
> need to maintain a blend/distribution of docs).
>
> How can I plugin my custom merge logic? One way might be to fully
> implement the QueryComponent but that seems like a lot of work, is there a
> simpler way?
>
> I need my custom logic to kick-in in very specific cases and most of the
> cases can still use default QueryComponent, was there a reason to make
> merge functionality private (non-overridable) in the  QueryComponent class?
>
> --
> Regards ,
> Mohan Gupta
>



-- 
Regards ,
Mohan Gupta
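
The "blend/distribution" merge Mohan describes could be sketched as a
round-robin interleave across shard result lists, instead of draining one
global score-ordered priority queue. This is purely illustrative pseudologic
in Python, not Solr's QueryComponent API:

```python
def blended_merge(shard_results, rows):
    # shard_results: one score-descending list of (score, doc_id) per shard.
    # Take one doc from each shard in turn, so every shard keeps a share
    # of the final page regardless of raw score order.
    out, iters = [], [iter(r) for r in shard_results]
    while len(out) < rows and iters:
        alive = []
        for it in iters:
            item = next(it, None)
            if item is None:
                continue  # this shard is exhausted
            out.append(item)
            alive.append(it)
            if len(out) == rows:
                break
        iters = alive
    return out[:rows]

s1 = [(0.9, "a1"), (0.2, "a2")]
s2 = [(0.8, "b1"), (0.7, "b2")]
merged = blended_merge([s1, s2], 3)
assert [d for _, d in merged] == ["a1", "b1", "a2"]
```

In Solr itself this logic would live in the merge step of a QueryComponent
subclass, which is what SOLR-7968 aims to make overridable.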


RE: DataImportHandler scheduling

2015-09-01 Thread Davis, Daniel (NIH/NLM) [C]
On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Troy, is your question how to use scheduled tasks?  Shawn pointed you in the
right direction.  I thought it more likely that you want to schedule a cron
task to run on any of your servers running SolrCloud, and you want the job to
run even if the cluster is degraded.

Here's an idea - schedule your job Monday on node 1, Tuesday on node 2, etc.   
That way, if the cluster is degraded (a node is down), re-indexing/delta 
indexing still happens, it just happens slower.You can certainly write a 
zookeeper client to make each cron job compete to see who does the job - 
questions on how to do this should be directed to a zookeeper users' mailing 
list.
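
The rotation idea above can be sketched simply: each node only fires the DIH
request on the weekdays that "belong" to it, so one dead node costs one day's
run rather than the whole schedule. The node count and names below are
assumptions for illustration:

```python
import datetime

def responsible_node(day, nodes):
    # Monday=0 ... Friday=4; weekdays rotate across the node list, so the
    # cron job on each node can check whether today is its turn.
    return nodes[day.weekday() % len(nodes)]

nodes = ["solr1", "solr2", "solr3"]
# 2015-09-01 was a Tuesday (weekday 1), so the second node is responsible.
assert responsible_node(datetime.date(2015, 9, 1), nodes) == "solr2"
```

A zookeeper-based lock would be more robust, but this needs nothing beyond
cron and a few lines in the wrapper script.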

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, August 31, 2015 7:50 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler scheduling

On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler 
> scheduling in SolrCloud. Can someone please post a link to that? I 
> have a requirement that the DIH should be initiated at a specific time 
> Monday through Friday.

Every modern operating system (and most of the previous versions of every 
modern OS) has a built-in task scheduling system.  For Windows, it's literally 
called Task Scheduler.  For most other operating systems, it's called cron.

Including dataimport scheduling capability in Solr has been discussed, and I 
think someone even wrote a working version ... but since every OS already has 
scheduling capability that has had years of time to mature, why should Solr 
reinvent the wheel and take the risk that the implementation will have bugs?

Currently virtually all updates to Solr's index must be initiated outside of 
Solr, and there is good reason to make sure that Solr doesn't ever modify the 
index without outside input.  The only thing I know of right now that can 
update the index automatically is Document Expiration, but the expiration time 
is decided when the document is indexed, and the original indexing action is 
external to Solr.

https://lucidworks.com/blog/document-expiration/

Thanks,
Shawn



Re: Get distinct results in Solr

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Alexandre,

Yes, the indexing works fine previously until the following line is added
to my /update handler in solrconfig.xml.

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>

Regards,
Edwin


On 1 September 2015 at 20:25, Alexandre Rafalovitch 
wrote:

> Do you mean that normally you do get stuff indexed but when you make
> any of these changes the indexing stops working and you get empty
> index? If so, you probably misconfigured something and should be
> getting error messages.
>
> If, on the other hand, you see no changes, check that you are actually
> using that URP chain. It needs to be declared in the search handler to
> be used. Or it can be passed as a URL parameter too. The documentation
> has the details.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 1 September 2015 at 04:46, Zheng Lin Edwin Yeo 
> wrote:
> > Hi Upayavira,
> >
> > I've tried to change <str name="signatureField">id</str> to be
> > <str name="signatureField">signature</str>, but nothing is indexed into
> > Solr as well. Is that what you mean?
> >
> > Besides that, I've also included a copyField to copy the content field
> into
> > the signature field. Both versions (with and without copyField) have
> > nothing indexed into Solr.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 15:48, Upayavira  wrote:
> >
> >> you are attempting to write your signature to your ID field. That's not
> >> a good idea. You are generating your signature from the content field,
> >> which seems okay. Change your <str name="signatureField">id</str> to be
> >> your 'signature' field instead of id, and something different will
> >> happen :-)
> >>
> >> Upayavira
> >>
> >> On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> >> > I tried to follow the de-duplication guide, but after I configured it
> in
> >> > solrconfig.xml and schema.xml, nothing is indexed into Solr, and
> there is
> >> > no error message. I'm using SimplePostTool to index rich-text
> documents.
> >> >
> >> > Below are my configurations:
> >> >
> >> > In solrconfig.xml
> >> >
> >> > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >> >   <lst name="defaults">
> >> >     <str name="update.chain">dedupe</str>
> >> >   </lst>
> >> > </requestHandler>
> >> >
> >> > <updateRequestProcessorChain name="dedupe">
> >> >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> >> >     <bool name="enabled">true</bool>
> >> >     <str name="signatureField">id</str>
> >> >     <bool name="overwriteDupes">false</bool>
> >> >     <str name="fields">content</str>
> >> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >> >   </processor>
> >> > </updateRequestProcessorChain>
> >> >
> >> >
> >> > In schema.xml
> >> >
> >> >  <field name="signature" type="string" stored="true" indexed="true"
> >> > multiValued="false" />
> >> >
> >> >
> >> > Is there anything which I might have missed out or done wrongly?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >> > wrote:
> >> >
> >> > > Thank you for your advice Alexandre.
> >> > >
> >> > > Will try out the de-duplication from the link you gave.
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> > >
> >> > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> >> arafa...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Re-read the question. You want to de-dupe on the full text-content.
> >> > >>
> >> > >> I would actually try to use the dedupe chain as per the link I gave
> >> > >> but put results into a separate string field. Then, you group on
> that
> >> > >> field. You cannot actually group on the long text field, that would
> >> > >> kill any performance. So a signature is your proxy.
> >> > >>
> >> > >> Regards,
> >> > >>Alex
> >> > >> 
> >> > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > >> http://www.solr-start.com/
> >> > >>
> >> > >>
> >> > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >> >
> >> > >> wrote:
> >> > >> > Hi Alexandre,
> >> > >> >
> >> > >> > Will treating it as String affect the search or other functions
> like
> >> > >> > highlighting?
> >> > >> >
> >> > >> > Yes, the content must be in my index, unless I do a copyField to
> do
> >> > >> > de-duplication on that field.. Will that help?
> >> > >> >
> >> > >> > Regards,
> >> > >> > Edwin
> >> > >> >
> >> > >> >
> >> > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> >> arafa...@gmail.com
> >> > >> >
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Can't you just treat it as String?
> >> > >> >>
> >> > >> >> Also, do you actually want those documents in your index in the
> >> first
> >> > >> >> place? If not, have you looked at De-duplication:
> >> > >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >> > >> >>
> >> > >> >> Regards,
> >> > >> >>Alex.
> >> > >> >> 
> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > >> >> http://www.solr-start.com/
> >> > >> >>
> >> > >> >>
> >> > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> > >> >> wrote:
> >> > >> >> > Thanks Jan.
> >> > >> >> >
> >> > >> >> > But I read that the field that is being collapsed on must be a
> >> single
> >> > >> >> > valued String, Int or Float. As I'm required to get the
> distinct
> >> > >> results
> >> > >> >> > from "content" field that was 

Re: Get distinct results in Solr

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Upayavira,

Yes, I tried with a completely new index. I found that once I added the
line below to my /update handler in solrconfig.xml, the indexing doesn't
work anymore.
<str name="update.chain">dedupe</str>

Besides that, it is also not able to do any deletion to the index when this
line is added.

Regards,
Edwin




On 1 September 2015 at 21:15, Upayavira  wrote:

> Have you tried with a completely clean index? Are you deduping, or just
> calculating the signature? Is it possible dedup is preventing your
> documents from indexing (because it thinks they are dups)?
>
> On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote:
> > Hi Upayavira,
> >
> > I've tried to change <str name="signatureField">id</str> to be
> > <str name="signatureField">signature</str>, but nothing is indexed into
> > Solr as well. Is that what you mean?
> >
> > Besides that, I've also included a copyField to copy the content field
> > into
> > the signature field. Both versions (with and without copyField) have
> > nothing indexed into Solr.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 15:48, Upayavira  wrote:
> >
> > > you are attempting to write your signature to your ID field. That's not
> > > a good idea. You are generating your signature from the content field,
> > > which seems okay. Change your <str name="signatureField">id</str> to be
> > > your 'signature' field instead of id, and something different will
> > > happen :-)
> > >
> > > Upayavira
> > >
> > > On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > > > I tried to follow the de-duplication guide, but after I configured
> it in
> > > > solrconfig.xml and schema.xml, nothing is indexed into Solr, and
> there is
> > > > no error message. I'm using SimplePostTool to index rich-text
> documents.
> > > >
> > > > Below are my configurations:
> > > >
> > > > In solrconfig.xml
> > > >
> > > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > > >   <lst name="defaults">
> > > >     <str name="update.chain">dedupe</str>
> > > >   </lst>
> > > > </requestHandler>
> > > >
> > > > <updateRequestProcessorChain name="dedupe">
> > > >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> > > >     <bool name="enabled">true</bool>
> > > >     <str name="signatureField">id</str>
> > > >     <bool name="overwriteDupes">false</bool>
> > > >     <str name="fields">content</str>
> > > >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > >   </processor>
> > > > </updateRequestProcessorChain>
> > > >
> > > >
> > > > In schema.xml
> > > >
> > > >  <field name="signature" type="string" stored="true" indexed="true"
> > > > multiValued="false" />
> > > >
> > > >
> > > > Is there anything which I might have missed out or done wrongly?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you for your advice Alexandre.
> > > > >
> > > > > Will try out the de-duplication from the link you gave.
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Re-read the question. You want to de-dupe on the full
> text-content.
> > > > >>
> > > > >> I would actually try to use the dedupe chain as per the link I
> gave
> > > > >> but put results into a separate string field. Then, you group on
> that
> > > > >> field. You cannot actually group on the long text field, that
> would
> > > > >> kill any performance. So a signature is your proxy.
> > > > >>
> > > > >> Regards,
> > > > >>Alex
> > > > >> 
> > > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > > >> http://www.solr-start.com/
> > > > >>
> > > > >>
> > > > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > > >
> > > > >> wrote:
> > > > >> > Hi Alexandre,
> > > > >> >
> > > > >> > Will treating it as String affect the search or other functions
> like
> > > > >> > highlighting?
> > > > >> >
> > > > >> > Yes, the content must be in my index, unless I do a copyField
> to do
> > > > >> > de-duplication on that field.. Will that help?
> > > > >> >
> > > > >> > Regards,
> > > > >> > Edwin
> > > > >> >
> > > > >> >
> > > > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> > > arafa...@gmail.com
> > > > >> >
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Can't you just treat it as String?
> > > > >> >>
> > > > >> >> Also, do you actually want those documents in your index in the
> > > first
> > > > >> >> place? If not, have you looked at De-duplication:
> > > > >> >>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> > > > >> >>
> > > > >> >> Regards,
> > > > >> >>Alex.
> > > > >> >> 
> > > > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
> newsletter:
> > > > >> >> http://www.solr-start.com/
> > > > >> >>
> > > > >> >>
> > > > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > > >> >> wrote:
> > > > >> >> > Thanks Jan.
> > > > >> >> >
> > > > >> >> > But I read that the field that is being collapsed on must be
> a
> > > single
> > > > >> >> > valued String, Int or Float. As I'm required to get the
> distinct
> > > > >> results
> > > > >> >> > from "content" field that was indexed from a rich text
> document,
> > > I
> > > > >> got
> > > > >> >> the
> > > > >> >> > following error:
> > > > >> >> >
> > > > >> >> >   "error":{
> > > > >> >> >

Re: Get distinct results in Solr

2015-09-01 Thread Upayavira
Can you repeat the config you have for the dedup update chain?

Thx

On Tue, Sep 1, 2015, at 02:57 PM, Zheng Lin Edwin Yeo wrote:
> Hi Upayavira,
> 
> Yes, I tried with a completely new index. I found that once I added the
> line below to my /update handler in solrconfig.xml, the indexing doesn't
> work anymore.
> <str name="update.chain">dedupe</str>
> 
> Besides that, it is also not able to do any deletion to the index when
> this
> line is added.
> 
> Regards,
> Edwin
> 
> 
> 
> 
> On 1 September 2015 at 21:15, Upayavira  wrote:
> 
> > Have you tried with a completely clean index? Are you deduping, or just
> > calculating the signature? Is it possible dedup is preventing your
> > documents from indexing (because it thinks they are dups)?
> >
> > On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote:
> > > Hi Upayavira,
> > >
> > > I've tried to change <str name="signatureField">id</str> to be
> > > <str name="signatureField">signature</str>, but nothing is indexed into
> > > Solr as well. Is that what you mean?
> > >
> > > Besides that, I've also included a copyField to copy the content field
> > > into
> > > the signature field. Both versions (with and without copyField) have
> > > nothing indexed into Solr.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 1 September 2015 at 15:48, Upayavira  wrote:
> > >
> > > > you are attempting to write your signature to your ID field. That's not
> > > > a good idea. You are generating your signature from the content field,
> > > > which seems okay. Change your <str name="signatureField">id</str> to be
> > > > your 'signature' field instead of id, and something different will
> > > > happen :-)
> > > >
> > > > Upayavira
> > > >
> > > > On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > > > > I tried to follow the de-duplication guide, but after I configured
> > it in
> > > > > solrconfig.xml and schema.xml, nothing is indexed into Solr, and
> > there is
> > > > > no error message. I'm using SimplePostTool to index rich-text
> > documents.
> > > > >
> > > > > Below are my configurations:
> > > > >
> > > > > In solrconfig.xml
> > > > >
> > > > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > > > >   <lst name="defaults">
> > > > >     <str name="update.chain">dedupe</str>
> > > > >   </lst>
> > > > > </requestHandler>
> > > > >
> > > > > <updateRequestProcessorChain name="dedupe">
> > > > >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> > > > >     <bool name="enabled">true</bool>
> > > > >     <str name="signatureField">id</str>
> > > > >     <bool name="overwriteDupes">false</bool>
> > > > >     <str name="fields">content</str>
> > > > >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > > >   </processor>
> > > > > </updateRequestProcessorChain>
> > > > >
> > > > >
> > > > > In schema.xml
> > > > >
> > > > >  <field name="signature" type="string" stored="true" indexed="true"
> > > > > multiValued="false" />
> > > > >
> > > > >
> > > > > Is there anything which I might have missed out or done wrongly?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you for your advice Alexandre.
> > > > > >
> > > > > > Will try out the de-duplication from the link you gave.
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> > > > arafa...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Re-read the question. You want to de-dupe on the full
> > text-content.
> > > > > >>
> > > > > >> I would actually try to use the dedupe chain as per the link I
> > gave
> > > > > >> but put results into a separate string field. Then, you group on
> > that
> > > > > >> field. You cannot actually group on the long text field, that
> > would
> > > > > >> kill any performance. So a signature is your proxy.
> > > > > >>
> > > > > >> Regards,
> > > > > >>Alex
> > > > > >> 
> > > > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > > > >> http://www.solr-start.com/
> > > > > >>
> > > > > >>
> > > > > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >> > Hi Alexandre,
> > > > > >> >
> > > > > >> > Will treating it as String affect the search or other functions
> > like
> > > > > >> > highlighting?
> > > > > >> >
> > > > > >> > Yes, the content must be in my index, unless I do a copyField
> > to do
> > > > > >> > de-duplication on that field.. Will that help?
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Edwin
> > > > > >> >
> > > > > >> >
> > > > > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> > > > arafa...@gmail.com
> > > > > >> >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Can't you just treat it as String?
> > > > > >> >>
> > > > > >> >> Also, do you actually want those documents in your index in the
> > > > first
> > > > > >> >> place? If not, have you looked at De-duplication:
> > > > > >> >>
> > https://cwiki.apache.org/confluence/display/solr/De-Duplication
> > > > > >> >>
> > > > > >> >> Regards,
> > > > > >> >>Alex.
> > > > > >> >> 
> > > > > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
> > newsletter:
> > > > > >> >> http://www.solr-start.com/
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
> > > > edwinye...@gmail.com>
> > > > > >> >> 

Re: Sorting parent documents based on a field from children

2015-09-01 Thread Mikhail Khludnev
On Tue, Sep 1, 2015 at 3:44 PM, Alexandre Rafalovitch 
wrote:

> On 1 September 2015 at 08:29, Mikhail Khludnev
>  wrote:
> > Last thing to check: make sure that you don't have deleted documents in
> > the index. You can check it in the Solr Admin UI.
>
> What's the significance of that particular advice? Is something in the
> join including deleted documents as well?
>
I just don't remember this side case: the orthogonality between the children
query and the parent might or might not be expected, depending on deleted
documents. Removing this variable makes the equation easier to solve. After a
proof case works, we can think about what needs to be deleted.
Note that deletes might be caused by updates.


>
> P.s. Great article. If I may suggest, it would help to link/mention
> the ^= constant scoring feature when it is shown first.

Thanks. Added link to wiki.


> Not many
> people know about it, may help to disambiguate the syntax.
>
Oh. C'mon! it's announced for ages http://yonik.com/solr/query-syntax/


>
> Regards,
>Alex.
>
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Get distinct results in Solr

2015-09-01 Thread Upayavira
Have you tried with a completely clean index? Are you deduping, or just
calculating the signature? Is it possible dedup is preventing your
documents from indexing (because it thinks they are dups)?

On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote:
> Hi Upayavira,
> 
> I've tried to change <str name="signatureField">id</str> to be
> <str name="signatureField">signature</str>, but nothing is indexed into
> Solr as well. Is that what you mean?
> 
> Besides that, I've also included a copyField to copy the content field
> into
> the signature field. Both versions (with and without copyField) have
> nothing indexed into Solr.
> 
> Regards,
> Edwin
> 
> 
> On 1 September 2015 at 15:48, Upayavira  wrote:
> 
> > you are attempting to write your signature to your ID field. That's not
> > a good idea. You are generating your signature from the content field,
> > which seems okay. Change your <str name="signatureField">id</str> to be
> > your 'signature' field instead of id, and something different will
> > happen :-)
> >
> > Upayavira
> >
> > On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > > I tried to follow the de-duplication guide, but after I configured it in
> > > solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
> > > no error message. I'm using SimplePostTool to index rich-text documents.
> > >
> > > Below are my configurations:
> > >
> > > In solrconfig.xml
> > >
> > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > >   <lst name="defaults">
> > >     <str name="update.chain">dedupe</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > > <updateRequestProcessorChain name="dedupe">
> > >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> > >     <bool name="enabled">true</bool>
> > >     <str name="signatureField">id</str>
> > >     <bool name="overwriteDupes">false</bool>
> > >     <str name="fields">content</str>
> > >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > >   </processor>
> > > </updateRequestProcessorChain>
> > >
> > >
> > > In schema.xml
> > >
> > >  <field name="signature" type="string" stored="true" indexed="true"
> > > multiValued="false" />
> > >
> > >
> > > Is there anything which I might have missed out or done wrongly?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo 
> > > wrote:
> > >
> > > > Thank you for your advice Alexandre.
> > > >
> > > > Will try out the de-duplication from the link you gave.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > > > wrote:
> > > >
> > > >> Re-read the question. You want to de-dupe on the full text-content.
> > > >>
> > > >> I would actually try to use the dedupe chain as per the link I gave
> > > >> but put results into a separate string field. Then, you group on that
> > > >> field. You cannot actually group on the long text field, that would
> > > >> kill any performance. So a signature is your proxy.
> > > >>
> > > >> Regards,
> > > >>Alex
> > > >> 
> > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > >> http://www.solr-start.com/
> > > >>
> > > >>
> > > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo  > >
> > > >> wrote:
> > > >> > Hi Alexandre,
> > > >> >
> > > >> > Will treating it as String affect the search or other functions like
> > > >> > highlighting?
> > > >> >
> > > >> > Yes, the content must be in my index, unless I do a copyField to do
> > > >> > de-duplication on that field.. Will that help?
> > > >> >
> > > >> > Regards,
> > > >> > Edwin
> > > >> >
> > > >> >
> > > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <
> > arafa...@gmail.com
> > > >> >
> > > >> > wrote:
> > > >> >
> > > >> >> Can't you just treat it as String?
> > > >> >>
> > > >> >> Also, do you actually want those documents in your index in the
> > first
> > > >> >> place? If not, have you looked at De-duplication:
> > > >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> > > >> >>
> > > >> >> Regards,
> > > >> >>Alex.
> > > >> >> 
> > > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > >> >> http://www.solr-start.com/
> > > >> >>
> > > >> >>
> > > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > > >> >> wrote:
> > > >> >> > Thanks Jan.
> > > >> >> >
> > > >> >> > But I read that the field that is being collapsed on must be a
> > > >> >> > single valued String, Int or Float. As I'm required to get the
> > > >> >> > distinct results from "content" field that was indexed from a rich
> > > >> >> > text document, I got the following error:
> > > >> >> >
> > > >> >> >   "error":{
> > > >> >> > "msg":"java.io.IOException: 64 bit numeric collapse fields are
> > > >> >> > not supported",
> > > >> >> > "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit
> > > >> >> > numeric collapse fields are not supported\r\n\tat
> > > >> >> >
> > > >> >> >
> > > >> >> > Is it possible to collapse on fields which have long integer
> > > >> >> > data, like content from a rich text document?
> > > >> >> >
> > > >> >> > Regards,
> > > >> >> > Edwin
> > > >> >> >
> > > >> >> >
> > > >> >> > On 31 August 2015 at 18:59, Jan Høydahl 
> > > >> wrote:
> > > >> >> >
> > > >> >> >> Hi
> > > >> >> >>
> > > >> >> >> Check 

Re: Difference between Legacy Facets and JSON Facets

2015-09-01 Thread Yonik Seeley
They aren't doing the same thing...

The first URL is doing a straight facet on the content field.
The second URL is doing a facet on the content field and asking for an
additional statistic for each bucket.

-Yonik
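
For an apples-to-apples timing comparison, the JSON Facet request without the per-bucket hll statistic (a sketch following the collection and field names in the URLs quoted below) would be:

```
http://localhost:8983/solr/collection1/select?q=paint&rows=0&json.facet={f:{type:terms,field:content}}
```

That is the request to time against the plain facet.field version; adding hll(id) per bucket is extra work that the legacy request never does.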


On Tue, Sep 1, 2015 at 11:08 AM, Zheng Lin Edwin Yeo
 wrote:
> I've tried the following commands and I found that Legacy Faceting is
> actually much faster than JSON Faceting. Not sure why this is so, when the
> document at this link http://yonik.com/solr-count-distinct/ states that
> JSON Facets have a much lower request latency.
>
> (For Legacy Facet) - QTime: 22
>
> -
> http://localhost:8983/solr/collection1/select?q=paint&facet=true&facet.field=content&rows=0
> 
>
> (For JSON Facet) - QTime: 1128
>
> -
> http://localhost:8983/solr/collection1/select?q=paint&json.facet={f:{type:terms,field:content,facet:{stat1:"hll(id)"}}}&rows=0
> 
>
>
> Is there any problem with my URL for the JSON Facet?
>
>
> Regards,
>
> Edwin
>
>
>
> On 1 September 2015 at 16:51, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi,
>>
>> I'm using Solr 5.2.1, and I would like to find out what the difference is
>> between Legacy Facets and JSON Facets in Solr. I was told that JSON Facets
>> have a much lower request latency, but I couldn't find any major difference
>> in speed. Or must we have a larger index in order to see any significant
>> difference?
>>
>> Is there any significant advantage to use JSON Faceting command instead of
>> Legacy Faceting command?
>>
>> Regards,
>> Edwin
>>


Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Erick Erickson
Derek:

Why do you care? What evidence do you have that this matters _practically_?

If you look at scoring with a small number of documents, you'll see
significant differences due to deleted documents. In most cases, as you get
a larger number of documents, the difference in ranking between an index
with no deletions and indexes that have deletions is usually not noticeable.

I'm suggesting that this is a red herring. Your specific situation may
be different
of course, but since scoring is really only about ranking docs
relative to each other,
unless the relative positions change enough to be noticeable it's not a problem.

Note that I'm saying "relative rankings", NOT "absolute score". Document scores
have no meaning outside comparisons to other docs _in the same query_. So
unless you see documents changing their position in the list due to
having deleted
docs, it's not worth spending time on IMO.
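
As an illustration of why one deleted-but-unmerged document matters at small scale and not at large scale, here is a sketch using the classic Lucene TF-IDF idf formula, 1 + ln(numDocs / (docFreq + 1)); the document counts below are invented for illustration:

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    # Classic Lucene TFIDFSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    # Deleted-but-unmerged documents still count toward both numbers.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Tiny index: one deleted duplicate noticeably shifts idf.
small_with_deletes = idf(doc_freq=2, num_docs=10)
small_purged = idf(doc_freq=1, num_docs=9)

# Large index: the same one-document difference is negligible.
big_with_deletes = idf(doc_freq=1001, num_docs=1_000_000)
big_purged = idf(doc_freq=1000, num_docs=999_999)

print(abs(small_with_deletes - small_purged))  # noticeable shift
print(abs(big_with_deletes - big_purged))      # tiny shift
```

Since idf feeds every matching term's score the same way for every document, a uniformly tiny shift leaves relative rankings untouched, which is Erick's point.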

Best,
Erick

On Tue, Sep 1, 2015 at 12:45 AM, Upayavira  wrote:
> I wonder if this resolves it [1]. It has been applied to trunk, but not
> to the 5.x release branch.
>
> If you needed it in 5.x, I wonder if there's a way that particular
> choice could be made configurable.
>
> Upayavira
>
> [1] https://issues.apache.org/jira/browse/LUCENE-6711
> On Tue, Sep 1, 2015, at 02:43 AM, Derek Poh wrote:
>> Hi Upayavira
>>
>> In fact we are using optimize currently but was advised to use expunge
>> deletes as it is less resource intensive.
>> So expunge deletes will only remove deleted documents, it will not merge
>> all index segments into one?
>>
>> If we don't use optimize, the deleted documents in the index will affect
>> the scores (with docFreq=2) of the matched documents which will affect
>> the relevancy of the search result.
>>
>> Derek
>>
>> On 9/1/2015 12:05 AM, Upayavira wrote:
>> > If you really must expunge deletes, use optimize. That will merge all
>> > index segments into one, and in the process will remove any deleted
>> > documents.
>> >
>> > Why do you need to expunge deleted documents anyway? It is generally
>> > done in the background for you, so you shouldn't need to worry about it.
>> >
>> > Upayavira
>> >
>> > On Mon, Aug 31, 2015, at 06:46 AM, davidphilip cherian wrote:
>> >> Hi,
>> >>
>> >> The curl command below worked without error; you can try it.
>> >>
>> >> curl http://localhost:8983/solr/techproducts/update?commit=true -H
>> >> "Content-Type: text/xml" --data-binary '<commit expungeDeletes="true"/>'
>> >>
>> >> However, after executing this, I could still see the same deleted count
>> >> on the dashboard: Deleted Docs: 6.
>> >> I am not sure whether that means the command did not take effect, or it
>> >> took effect but did not reflect in the dashboard view.
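
Derek's 'missing content stream' error (quoted below) happens because his request sends no body at all. A hedged sketch of an equivalent call against the JSON endpoint, assuming the handler honors expungeDeletes as a request parameter alongside commit=true and that an empty JSON body satisfies the content-stream requirement (host and core names follow his message):

```
curl 'http://127.0.0.1:8983/solr/supplier/update?commit=true&expungeDeletes=true' \
  -H 'Content-Type: application/json' \
  --data-binary '{}'
```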
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Aug 31, 2015 at 8:51 AM, Derek Poh 
>> >> wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> I tried doing an expungeDeletes=true with the following but get the
>> >>> message 'missing content stream'. What am I missing? Do I need to
>> >>> provide additional parameters?
>> >>>
>> >>> curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true
>> >>> ';
>> >>>
>> >>> Thanks,
>> >>> Derek
>> >>>
>> >>> --
>> >>> CONFIDENTIALITY NOTICE
>> >>> This e-mail (including any attachments) may contain confidential and/or
>> >>> privileged information. If you are not the intended recipient or have
>> >>> received this e-mail in error, please inform the sender immediately and
>> >>> delete this e-mail (including any attachments) from your computer, and 
>> >>> you
>> >>> must not use, disclose to anyone else or copy this e-mail (including any
>> >>> attachments), whether in whole or in part.
>> >>> This e-mail and any reply to it may be monitored for security, legal,
>> >>> regulatory compliance and/or other appropriate reasons.
>> >>>
>> >>>
>> >
>>
>>
>> --
>> CONFIDENTIALITY NOTICE
>>
>> This e-mail (including any attachments) may contain confidential and/or
>> privileged information. If you are not the intended recipient or have
>> received this e-mail in error, please inform the sender immediately and
>> delete this e-mail (including any attachments) from your computer, and
>> you must not use, disclose to anyone else or copy this e-mail (including
>> any attachments), whether in whole or in part.
>>
>> This e-mail and any reply to it may be monitored for security, legal,
>> regulatory compliance and/or other appropriate reasons.


Re: Connect and sync two solr server

2015-09-01 Thread Alexandre Rafalovitch
Is this for multi-datacenter? If so, you may want to review Apple's
presentation at the last Solr Revolution:
https://www.youtube.com/watch?v=_Erkln5WWLw&index=2&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 September 2015 at 04:09, shahper  wrote:
> Hi,
>
> In the link which you have sent I cannot see how to connect to SolrCloud
> for synchronization of indexes.
>
> From the description, this is straight forward SolrCloud where you
> have replicas on the separate machines, see:
> https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
>
> A different way of accomplishing this would be the master/slave style, see:
> https://cwiki.apache.org/confluence/display/solr/Index+Replication
> --
> Shahper Jamil
>
> System Administrator
>
> Tel: +91 124 4548383 Ext- 1033
> UK: +44 845 0047 142 Ext- 5133
>
> TBS Website 
> Techblue Software Pvt. Ltd
> The Palms, Plot No 73, Sector 5, IMT Manesar,
> Gurgaon- 122050 (Hr.)
>
> www.techbluesoftware.co.in 
>
>
> TBS Facebook
> 
> TBS Twitter  TBS Google+
>  TBS Linked In
> 
>
> TBS Branding 
>


Re: Issue Using Solr 5.3 Authentication and Authorization Plugins

2015-09-01 Thread Noble Paul
Looks like there is a bug in that. On start/restart the security.json
is not loaded.
I shall open a ticket.

https://issues.apache.org/jira/browse/SOLR-8000
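
A quick way to check whether a Collections API call is actually protected, per Noble's explanation further down that protected APIs prompt for credentials (the username, password, and collection name here are placeholders):

```
curl -u admin:password 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory'
```

Once the collection-admin-edit permission is in effect, the same request without `-u` (or with wrong credentials) should come back with an HTTP 401 rather than executing the RELOAD.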

On Tue, Sep 1, 2015 at 1:01 PM, Noble Paul  wrote:
> I'm investigating why restarts or first time start does not read the
> security.json
>
> On Tue, Sep 1, 2015 at 1:00 PM, Noble Paul  wrote:
>> I removed that statement
>>
>> "If activating the authorization plugin doesn't protect the admin ui,
>> how does one protect access to it?"
>>
>> One does not need to protect the admin UI. You only need to protect
>> the relevant API calls. I mean it's OK to not protect the CSS and
>> HTML stuff.  But if you perform an action to create a core or do a
>> query through the admin UI, it automatically will prompt you for
>> credentials (if those APIs are protected)
>>
>> On Tue, Sep 1, 2015 at 12:41 PM, Kevin Lee  wrote:
>>> Thanks for the clarification!
>>>
>>> So is the wiki page incorrect at
>>> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
>>>  which says that the admin ui will require authentication once the 
>>> authorization plugin is activated?
>>>
>>> "An authorization plugin is also available to configure Solr with 
>>> permissions to perform various activities in the system. Once activated, 
>>> access to the Solr Admin UI and all requests will need to be authenticated 
>>> and users will be required to have the proper authorization for all 
>>> requests, including using the Admin UI and making any API calls."
>>>
>>> If activating the authorization plugin doesn't protect the admin ui, how 
>>> does one protect access to it?
>>>
>>> Also, the issue I'm having is not just at restart.  According to the docs 
>>> security.json should be uploaded to Zookeeper before starting any of the 
>>> Solr instances.  However, I tried to upload security.json before starting 
>>> any of the Solr instances, but it would not pick up the security config 
>>> until after the Solr instances are already running and then uploading the 
>>> security.json again.  I can see in the logs at startup that the Solr 
>>> instances don't see any plugin enabled even though security.json is already 
>>> in zookeeper and then after they are started and the security.json is 
>>> uploaded again I see it reconfigure to use the plugin.
>>>
>>> Thanks,
>>> Kevin
>>>
 On Aug 31, 2015, at 11:22 PM, Noble Paul  wrote:

 Admin UI is not protected by any of these permissions. Only when you try
 to perform a protected operation does it ask for a password.

 I'll investigate the restart problem and report my  findings

> On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee  
> wrote:
> Anyone else running into any issues trying to get the authentication and 
> authorization plugins in 5.3 working?
>
>> On Aug 29, 2015, at 2:30 AM, Kevin Lee  wrote:
>>
>> Hi,
>>
>> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t 
>> seem to be working quite right.  Not sure if I’m missing steps or there 
>> is a bug.  I am able to get it to protect access to a URL under a 
>> collection, but am unable to get it to secure access to the Admin UI.  
>> In addition, after stopping the Solr and Zookeeper instances, the 
>> security.json is still in Zookeeper, however Solr is allowing access to 
>> everything again like the security configuration isn’t in place.
>>
>> Contents of security.json taken from wiki page, but edited to produce 
>> valid JSON.  Had to move comma after 3rd from last “}” up to just after 
>> the last “]”.
>>
>> {
>> "authentication":{
>> "class":"solr.BasicAuthPlugin",
>> "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
>> },
>> "authorization":{
>> "class":"solr.RuleBasedAuthorizationPlugin",
>> "permissions":[{"name":"security-edit",
>>"role":"admin"}],
>> "user-role":{"solr":"admin"}
>> }}
>>
>> Here are the steps I followed:
>>
>> Upload security.json to zookeeper
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> /security.json ~/solr/security.json
>>
>> Use zkCli.sh from Zookeeper to ensure the security.json is in Zookeeper 
>> at /security.json.  It is there and looks like what was originally 
>> uploaded.
>>
>> Start Solr Instances
>>
>> Attempt to create a permission, however get the following error:
>> {
>> "responseHeader":{
>>  "status":400,
>>  "QTime":0},
>> "error":{
>>  "msg":"No authorization plugin configured",
>>  "code":400}}
>>
>> Upload security.json again.
>> ./zkcli.sh -z localhost:2181,localhost:2182,localhost:2183 -cmd putfile 
>> 

Re: Connect and sync two solr server

2015-09-01 Thread Erick Erickson
I don't really understand your problem here. The whole point
behind SolrCloud is that you simply create a collection where
each shard has replicas and the rest is automatic.

So, assuming you have one collection already running under
SolrCloud (i.e. it's all being coordinated through Zookeeper),
all you have to do is to add more replicas is use the
Collections API to execute an ADDREPLICA command. This
should be done for every shard in your collection. See:
https://cwiki.apache.org/confluence/display/solr/Collections+API
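
A sketch of such an ADDREPLICA call (the collection, shard, and node values are placeholders to adapt to your cluster):

```
curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=192.168.1.21:8983_solr'
```

Repeat per shard; if `node` is omitted, Solr picks a node for the new replica itself.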

Replicas do not have to be on the same machine; putting them all on
one machine would be useless. All the getting-started material is
intentionally run on a single machine just to illustrate the process
without forcing someone to create a network of machines first. There's
nothing special you need to do to use separate machines. The
whole point of Zookeeper is to coordinate across multiple
machines.

It'll automatically connect to the proper leader, replicate the
index and start serving queries.

If this isn't clear, please describe what you've actually tried and
what the problems you're seeing are.

Best,
Erick

On Tue, Sep 1, 2015 at 1:09 AM, shahper  wrote:
> Hi,
>
> In the link which you have sent I cannot see how to connect to SolrCloud
> for synchronization of indexes.
>
> From the description, this is straight forward SolrCloud where you
> have replicas on the separate machines, see:
> https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
>
> A different way of accomplishing this would be the master/slave style, see:
> https://cwiki.apache.org/confluence/display/solr/Index+Replication
> --
> Shahper Jamil
>
> System Administrator
>