Re: pivoting with json facet api

2016-04-21 Thread Yangrui Guo
Thanks so much! Are you also contributing to Solr development?

On Thu, Apr 21, 2016 at 3:33 PM, Alisa Z.  wrote:

>  Hi Yangrui,
>
> I have summarized some experiments about Solr's nesting capabilities
> (it does not cover pivoting precisely, but rather faceting up to
> parents and down to children with some statistics), so maybe you could find
> an idea there:
>
>
> https://medium.com/@alisazhila/solr-s-nesting-on-solr-s-capabilities-to-handle-deeply-nested-document-structures-50eeaaa4347a#.dbxdv3zdp
>
>
> Please let me know in the comments if it was useful. You could also specify
> your problem a bit more if you don't find the answer.
>
> Cheers,
> Alisa
>
>
>
> >Thursday, April 21, 2016, 1:01 -04:00 from Yangrui Guo:
> >
> >Hi
> >
> >I am trying to facet results on my nested documents. The Solr documentation
> >does not say much about how to pivot with the JSON API on nested documents.
> >Could someone show me some examples? Thanks very much.
> >
> >Yangrui
>
>



Re: Making managed schema unmutable correctly?

2016-04-21 Thread Boman
Thanks @Erick. You are right. That collection is not using a managed-schema.

Works now!





RE: Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-21 Thread Karthik Ramachandran
We feel the issue is in RealTimeGetComponent.getInputDocument(SolrCore core,
BytesRef idBytes), where Solr calls getNonStoredDVs and adds those fields to the
original document without excluding the copyField targets.



We made changes to send the filteredList to searcher.decorateDocValueFields and 
it started working.



Attached is the modified file.


With Thanks & Regards
Karthik Ramachandran
CommVault
Please don't print this e-mail unless you really need to



-----Original Message-----
From: Karthik Ramachandran [mailto:mrk...@gmail.com]
Sent: Friday, April 22, 2016 12:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr5.5:DocValues/CopyField does not work with Atomic updates

We are trying to update Field A.

-Karthik

On Thu, Apr 21, 2016 at 10:36 PM, John Bickerstaff  wrote:

> Which field do you try to atomically update?  A or B or some other?
> On Apr 21, 2016 8:29 PM, "Tirthankar Chatterjee" <
> tchatter...@commvault.com>
> wrote:
>
> > Hi,
> > Here is the scenario for SOLR5.5:
> >
> > FieldA type= stored=true indexed=true
> >
> > FieldB type= stored=false indexed=true docValues=true
> > useDocValuesAsStored=false
> >
> > FieldA copyTo FieldB
> >
> > Try an atomic update and we are getting this error:
> >
> > possible analysis error: DocValuesField "mtmround" appears more than
> > once in this document (only one value is allowed per field)
> >
> > How do we resolve this?




Re: Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-21 Thread Karthik Ramachandran
We are trying to update Field A.


-Karthik

On Thu, Apr 21, 2016 at 10:36 PM, John Bickerstaff  wrote:

> Which field do you try to atomically update?  A or B or some other?
> On Apr 21, 2016 8:29 PM, "Tirthankar Chatterjee" <
> tchatter...@commvault.com>
> wrote:
>
> > Hi,
> > Here is the scenario for SOLR5.5:
> >
> > FieldA type= stored=true indexed=true
> >
> > FieldB type= stored=false indexed=true docValues=true
> > useDocValuesAsStored=false
> >
> > FieldA copyTo FieldB
> >
> > Try an Atomic update and we are getting this error:
> >
> > possible analysis error: DocValuesField "mtmround" appears more than once
> > in this document (only one value is allowed per field)
> >
> > How do we resolve this.
> >
> >
> >
>


Re: Re[2]: Traversal of documents through network

2016-04-21 Thread vidya
(1) So, displaying the content (traversal of documents) depends on my
pagination? If I specify that all 500 documents are to be displayed, with the
first 10 on the first page and the rest on the following pages, does that imply
that all documents traverse the network?

(2) In my application, the UI front end is developed by one team and the
indexing back end is done by another team. How do we implement the highlighting
feature in Solr? By developing a request handler with the highlighting
parameters turned on and with the term vector component, can I highlight the
queried term? Or is any special configuration needed?

Please help me on these two.

Thanks
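
For reference, both paging and highlighting are controlled by plain request
parameters, so only the documents of the requested page travel over the
network. A sketch, with collection, field, and query as placeholders:

  curl "http://localhost:8983/solr/mycollection/select?q=text:solr&start=0&rows=10&hl=true&hl.fl=content"

Here rows=10 limits the response to ten documents, and hl=true with hl.fl
names the field whose matching terms come back as highlighted snippets.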





Re: Making managed schema unmutable correctly?

2016-04-21 Thread Erick Erickson
You're mixing managed and non-managed schema terms here.

"schema.xml" is the old default and is (usually) _not_
editable by the managed schema stuff.

Managed schemas are usually named just that: "managed-schema".
You can hand edit this if you want, but when you do I'd recommend that all
the Solr nodes be shut down. As to where it lives in ZK, it should be under
configsets/the_name_you_gave_it.

Of course you have to edit it somewhere and push it up to ZK, Solr 5.5
has options
to do this in bin/solr, the "zk" command.

But from the error, you are not configured to use managed schema, which
is enabled in solrconfig.xml. See:
https://cwiki.apache.org/confluence/display/solr/Schema+API
and
https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig

Best,
Erick

On Thu, Apr 21, 2016 at 3:02 PM, Boman  wrote:
> Where do I find the schema.xml to hand edit? I can't find it on my node
> running ZK.
>
> I'm not sure what's happening, but when I try to add a field to the schema
> for one of the collections (I am running in SolrCloud mode), I get:
>
> curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field"
> : { "name":"newtestfield__c", "type":"string", "stored":true}}'
> http://localhost:8983/solr/00D6100HHnj/schema
> {
>   "responseHeader":{
> "status":0,
> "QTime":2},
>   "errors":[{"errorMessages":"schema is not editable"}]}
>
> Thanks.
>
>
>
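
For reference, whether the Schema API may modify the schema is controlled by
the schemaFactory definition in solrconfig.xml; a typical read-only managed
schema setup, per the Schema Factory documentation linked above, looks like:

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">false</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

With mutable set to false, add-field calls like the one quoted above return
the "schema is not editable" error; with ClassicIndexSchemaFactory, schema.xml
is used and the Schema API cannot edit it at all.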


Re: Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-21 Thread John Bickerstaff
Which field do you try to atomically update?  A or B or some other?
On Apr 21, 2016 8:29 PM, "Tirthankar Chatterjee" 
wrote:

> Hi,
> Here is the scenario for SOLR5.5:
>
> FieldA type= stored=true indexed=true
>
> FieldB type= stored=false indexed=true docValues=true
> useDocValuesAsStored=false
>
> FieldA copyTo FieldB
>
> Try an Atomic update and we are getting this error:
>
> possible analysis error: DocValuesField "mtmround" appears more than once
> in this document (only one value is allowed per field)
>
> How do we resolve this.
>
>
>


Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-21 Thread Tirthankar Chatterjee
Hi,
Here is the scenario for SOLR5.5:

FieldA type= stored=true indexed=true

FieldB type= stored=false indexed=true docValues=true
useDocValuesAsStored=false

FieldA copyTo FieldB

Try an Atomic update and we are getting this error:

possible analysis error: DocValuesField "mtmround" appears more than once in 
this document (only one value is allowed per field)

How do we resolve this.
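
For clarity, the setup described above corresponds to schema entries roughly
like the following; the string type is an assumption, only the flags come from
the report:

  <field name="FieldA" type="string" stored="true" indexed="true"/>
  <field name="FieldB" type="string" stored="false" indexed="true"
         docValues="true" useDocValuesAsStored="false"/>
  <copyField source="FieldA" dest="FieldB"/>

An atomic update touching FieldA would then be sent as, e.g.:

  curl http://localhost:8983/solr/mycollection/update -H 'Content-type:application/json' \
    -d '[{"id":"1","FieldA":{"set":"new value"}}]'

Atomic updates rebuild the document from stored and docValues fields before
re-running copyField directives, which is where a copyField target with
docValues can end up receiving the same value twice.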




Re: Storing different collection on different hard disk

2016-04-21 Thread Zheng Lin Edwin Yeo
Yes, that works as well.
Thank you!

Regards,
Edwin

On 21 April 2016 at 19:00, Bram Van Dam  wrote:

> On 21/04/16 03:56, Zheng Lin Edwin Yeo wrote:
> > This is the working one:
> > dataDir=D:/collection1/data
>
> Ah yes. Backslashes are escape characters in properties files.
> C:\\collection1\\data would probably work as well.
>
>  - bram
>


Re: Cross collection join in Solr 5.x

2016-04-21 Thread Susmit Shukla
I have done it by extending the Solr join plugin. I needed to override two
methods from the join plugin and it works.

Thanks,
Susmit

On Thu, Apr 21, 2016 at 12:01 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> There has not been much progress on
> https://issues.apache.org/jira/browse/SOLR-8297,
> although it's really achievable.
>
> On Thu, Apr 21, 2016 at 7:52 PM, Shikha Somani 
> wrote:
>
> > Greetings,
> >
> >
> > Background: Our application is using Solr 4.10 and has multiple
> > collections all of them sharded equally on Solr. These collections were
> > joined to support complex queries.
> >
> >
> > Problem: We are trying to upgrade to Solr 5.x. However, from Solr 5.2
> > onward, joining two collections requires that the secondary
> > collection be single-sharded and replicated wherever the primary collection
> > is. But our collections are very large and need to be sharded for
> performance.
> >
> >
> > Query: Is there any way in Solr 5.x to join two collections both of which
> > are equally sharded, i.e., the secondary collection is sharded the same way
> > as the primary?
> >
> >
> > Thanks,
> > Shikha
> >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re[2]: how to restrict phrase to appear in same child document

2016-04-21 Thread Alisa Z .
I'm afraid that if the queries are given in such a loose natural language
form, the only way to handle it is to introduce a natural language
processing stage that would form the right query (which is actually a working
strategy; IBM does so).

If your document structure is fixed (i.e., you know the types of nested documents
and what fields they contain exactly), you can try to introduce some basic NLP
that will detect the entities or nouns, e.g., "driver" and "car" (try the
AlchemyLanguage API, http://www.alchemyapi.com/products/demo/alchemylanguage,
for this), and you will also need a syntactic parser to connect black+driver
and white+mercedes correctly.



>Wednesday, April 20, 2016, 15:31 -04:00 from Yangrui Guo:
>
>Hi, thanks for answering. My problem is that users do not specify which
>field a color belongs to in the query. For example, in "which black driver
>has a white mercedes", it is difficult to tell which color belongs
>to which field, because there can be thousands of car brands and
>professions. Is there any way to achieve the feature I stated before?
>
>On Wednesday, April 20, 2016, Alisa Z. < prol...@mail.ru > wrote:
>
>>  Yangrui,
>>
>> First, have you indexed your documents with proper nested document
>> structure [
>>  
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
>>  ]?
>> From the peice of data you showed, it seems that you just put it right as
>> it is and it all got flattened.
>>
>> Then, you'll probably want to introduce a distinguishing
>> "type"/"category"/"path" fields into your data, so it would look like this:
>>
>> {
>> type:top
>> id:
>> {
>> type:car_color
>> car:
>> color:
>> }
>> {
>>   type:driver_color
>> driver:
>> color:
>> }
>> }
>>
>>
>> >Wed, 20 Apr 2016 -3:28:33 -0400 from Yangrui Guo < guoyang...@gmail.com
>> >:
>> >
>> >hello
>> >
>> >I have a nested document type in my index. Here's the structure of my
>> >document:
>> >
>> >{
>> >id:
>> >{
>> >car:
>> >color:
>> >}
>> >{
>> >driver:
>> >color:
>> >}
>> >}
>> >
>> >However, when I use the query q={!parent
>> >which="content_type:parent"}+(black AND driver)&fq={!parent
>> >which="content_type:parent"}+(white AND mercedes), the result also
>> >contained white driver with black mercedes. I know I can put fields before
>> >terms but it is not always easy to do this. Users might just enter one
>> >string. How can I modify my query to require that the terms between two
>> >parentheses must appear in the same child document, or boost those that meet
>> >the criteria? Thanks
>>
>>
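
For what it's worth, once the children carry a type field as sketched above,
each attribute pair can be constrained to a single child document by giving
every child clause its own block-join parent query (field values are
hypothetical; the q parameter is wrapped here for readability):

  q=+{!parent which="type:top" v='+type:driver_color +color:black'}
    +{!parent which="type:top" v='+type:car_color +car:mercedes +color:white'}

Each {!parent} clause matches parents having at least one child that satisfies
the whole clause, so "black" can no longer match in the car's child while
"mercedes" matches in another.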



Re: pivoting with json facet api

2016-04-21 Thread Alisa Z .
 Hi Yangrui, 

I have summarized some experiments about Solr's nesting capabilities (it does
not cover pivoting precisely, but rather faceting up to parents and down to
children with some statistics), so maybe you could find an idea there:

https://medium.com/@alisazhila/solr-s-nesting-on-solr-s-capabilities-to-handle-deeply-nested-document-structures-50eeaaa4347a#.dbxdv3zdp
   

Please let me know in the comments if it was useful. You could also specify your
problem a bit more if you don't find the answer.

Cheers,
Alisa 



>Thursday, April 21, 2016, 1:01 -04:00 from Yangrui Guo:
>
>Hi
>
>I am trying to facet results on my nested documents. The Solr documentation
>does not say much about how to pivot with the JSON API on nested documents.
>Could someone show me some examples? Thanks very much.
>
>Yangrui
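
As a concrete starting point, faceting from parents down to children with the
JSON Facet API uses a blockChildren domain; a sketch against block-indexed
documents, where the collection, fields, and parent filter are all
hypothetical:

  curl http://localhost:8983/solr/mycollection/query -d 'q=content_type:parent&json.facet=
  {
    child_categories: {
      type: terms,
      field: category,
      domain: { blockChildren: "content_type:parent" },
      facet: { avg_price: "avg(price)" }
    }
  }'

The domain switch maps each parent in the result set to its children before
the terms facet and the nested avg statistic are computed.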



Re[2]: Traversal of documents through network

2016-04-21 Thread Alisa Z .
Well, it took me 7 milliseconds to index a 100 MB dataset on a local Solr. So
you could assume that for 1 GB it would take 70 ms = 0.07 s, which is still
pretty fast.
Yet dealing with network delays is a separate issue.

100 wikipedia article-size documents shouldn't be a big problem. 


>Thursday, April 21, 2016, 0:57 -04:00 from vidya:
>
>OK, I understand that. So you would say documents traverse the network.
>If I specify some 100 docs to be displayed on my first page, will it affect
>performance? While the docs get transferred, will there be any high-volume
>traffic that affects the performance of the application?
>
>
>And what is the time Solr takes to index 1 GB of data, in general?
>
>
>Thanks
>
>
>



Re: Making managed schema unmutable correctly?

2016-04-21 Thread Boman
Where do I find the schema.xml to hand edit? I can't find it on my node
running ZK.

I'm not sure what's happening, but when I try to add a field to the schema
for one of the collections (I am running in SolrCloud mode), I get:

curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field"
: { "name":"newtestfield__c", "type":"string", "stored":true}}'
http://localhost:8983/solr/00D6100HHnj/schema
{
  "responseHeader":{
"status":0,
"QTime":2},
  "errors":[{"errorMessages":"schema is not editable"}]}

Thanks.





Re: complete cluster shutdown

2016-04-21 Thread John Bickerstaff
I guess errors like "fsync-ing the write ahead log in SyncThread:5 took
7268ms which will adversely effect operation latency."

and: "likely client has closed socket"

make me wonder if something went wrong in terms of running out of disk
space for logs (thus giving your OS no space for necessary functions)  or
if you ran into memory issues, or if something changed your network /
firewall settings to prevent communication on ports that used to work...?

I'm not an expert on the code, but those kinds of external problems are where
I'd start looking if I saw errors like this.

Were all the VM's up and running or were they down too?

On Wed, Apr 20, 2016 at 10:06 PM, Zap Org  wrote:

> I have 5 ZooKeeper and 2 Solr machines, and after a month or two the whole
> cluster shuts down and I don't know why. The ZooKeeper logs I get are attached
> below; otherwise I don't get any error. All of this is based on Linux VMs.
>
> 2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:5 took 7268ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2016-03-11 16:50:18,161 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x4535f00ee370001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
> 2016-03-11 16:50:18,163 [myid:5] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@1007] - Closed socket connection for
> client /localhost which had sessionid 0x4535f00ee370001
> 2016-03-11 16:50:18,166 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x2535ef744dd0005, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>


Re: NoSuchFileException errors common on version 5.5.0

2016-04-21 Thread Jay Potharaju
Hi.
I am seeing a lot of these errors in my current 5.5.0 dev install. Would it
make sense to use 5.5 in production, or is a different version recommended?
I am using DIH; not sure if that matters in this case.

Thanks


On Fri, Mar 11, 2016 at 3:57 AM, Shai Erera  wrote:

> Hey Shawn,
>
> I added segments file information (name and size) to Core admin status API.
> Turns out that you might get into NoSuchFileException if indexing happens
> and the commit point has changed, but the IndexReader LukeRequestHandler
> receives hasn't picked up the new commit yet, in which case the old
> segments_N file was deleted and computing its size resulted in that
> exception.
>
> I pushed a fix for it which will be released in any one of future releases,
> including 5.5.1 if we'll have any. The fix includes logging the exception
> and returning -1 as the file size.
>
> Shai
>
> On Fri, Mar 11, 2016 at 12:21 AM Shawn Heisey  wrote:
>
> > On 3/10/2016 12:18 PM, Shawn Heisey wrote:
> > > I pulled down branch_5_5 and installed a 5.5.1 snapshot.  Had to edit
> > > lucene/version.properties to get it to be 5.5.1.  I also had to edit
> the
> > > SolrIdentifierValidator class to allow hyphens, since I have them in
> > > some of my core names.  The NoSuchFileException errors are gone now.
> >
> > Spoke too soon.
> >
> > The log message did change a little bit.  Now it's only one log entry on
> > LukeRequestHandler instead of two separate log entries, and it's a WARN
> > instead of ERROR.
> >
> > 2016-03-10 14:35:00.038 WARN  (qtp1012570586-11405) [   x:spark3live]
> > org.apache.solr.handler.admin.LukeRequestHandler Error getting file
> > length for [segments_c5t]
> > java.nio.file.NoSuchFileException:
> > /index/solr5/data/data/spark3_0/index/segments_c5t
> > at
> > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> > at
> > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> > at
> > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> > at
> >
> >
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> > at
> >
> >
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> > at
> >
> >
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> > at java.nio.file.Files.readAttributes(Files.java:1737)
> > at java.nio.file.Files.size(Files.java:2332)
> > at
> > org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
> >
> > Something else to note:  It wasn't 5.5.0 that I had installed, it was
> > 5.5.0-SNAPSHOT -- I installed it some time before 5.5.0 was released.
> > Looks like I did the install of that version on January 29th.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks
Jay Potharaju


Re: Cross collection join in Solr 5.x

2016-04-21 Thread Mikhail Khludnev
Hello,

There has not been much progress on https://issues.apache.org/jira/browse/SOLR-8297,
although it's really achievable.

On Thu, Apr 21, 2016 at 7:52 PM, Shikha Somani  wrote:

> Greetings,
>
>
> Background: Our application is using Solr 4.10 and has multiple
> collections all of them sharded equally on Solr. These collections were
> joined to support complex queries.
>
>
> Problem: We are trying to upgrade to Solr 5.x. However, from Solr 5.2
> onward, joining two collections requires that the secondary
> collection be single-sharded and replicated wherever the primary collection
> is. But our collections are very large and need to be sharded for performance.
>
>
> Query: Is there any way in Solr 5.x to join two collections both of which
> are equally sharded, i.e., the secondary collection is sharded the same way
> as the primary?
>
>
> Thanks,
> Shikha
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-21 Thread jimi.hullegard
Hi Ahmet,

Yes, I have also come to the conclusion that I need to do one of those things
if I want this functionality, since Solr/Lucene is lacking in this area. Although
after some discussion with my coworkers, we decided to simply disable norms for 
the title field, and not do anything more, for now. Hopefully all the other 
boosting logic we use will give a reasonable user experience even without a 
length norm for the title.

Thanks for your help. :)

/Jimi

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Thursday, April 21, 2016 7:10 PM
To: solr-user@lucene.apache.org; Hullegård, Jimi 

Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

Hi Jimi,

Please do either :

1) write your own similarity that saves document length (docValues) in a 
lossless way and implement whatever punishment/algorithm you want.

or

2) disable norms altogether, add an integer field (title_length), and populate it
(outside Solr) with the number of words in the title field, and use some
function query to influence the score, e.g.
q=something&bf=someFunctionQuery(title_length)
https://cwiki.apache.org/confluence/display/solr/Function+Queries

Ahmet



On Thursday, April 21, 2016 9:37 AM, "jimi.hulleg...@svensktnaringsliv.se" 
 wrote:
Yes, it definitely seems to be the main problem for us. I did some simple tests
of the encoding and decoding calculations in DefaultSimilarity, and my findings 
are:

* For input between 1.0 and 0.5, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.125 depending if it is an edge case or not
* For input between 0.5 and 0.25, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.0625
* For input between 0.25 and 0.125, a difference of 0.01 in the input causes 
the output to change by a value of 0 or 0.015625
* And so on, with smaller and smaller differences in the output value for edge 
cases

I would say that the main problem is for input values between 1.0 and 0.5. So 
if one could tweak the SweetSpotSimilarity to start its "raw" (i.e., not encoded)
lengthNorm values at 0.5 instead of 1.0, it would solve my problem for the 
title field. This would of course worsen the precision for longer text values, 
but since this is a title field that is not a problem.

So, is there a way to configure SweetSpotSimilarity to use 0.5 as its highest
lengthNorm value, instead of 1.0?

/Jimi



From: Ahmet Arslan 
Sent: Thursday, April 21, 2016 2:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

Hi Jim,

fieldNorm encode/decode thing cause some precision loss.
This may be a problem when dealing with very short documents.
You can find many discussions on this topic.

ahmet



On Thursday, April 21, 2016 3:10 AM, "jimi.hulleg...@svensktnaringsliv.se" 
 wrote:
Ok sure, I can try and give some examples :)

Let's say that we have the following documents:

Id: 1
Title: John Doe

Id: 2
Title: John Doe Jr.

Id: 3
Title: John Lennon: The Life

Id: 4
Title: John Thompson's Modern Course for the Piano: First Grade Book

Id: 5
Title: I Rode With Stonewall: Being Chiefly The War Experiences of the Youngest 
Member of Jackson's Staff from John Brown's Raid to the Hanging of Mrs. Surratt


And in general, when a search word matches the title, I would like to have the 
length of the title field influence the score, so that matching documents with 
shorter title get a higher score than documents with longer title, all else 
considered equal.

So, when a user searches for "John", I would like the results to be pretty much 
in the order presented above. Though, it is not crucial that for example 
document 1 comes before document 2. But I would surely want document 1-3 to 
come before document 4 and 5.

In my mind, the fieldNorm is a perfect solution for this. At least in theory. 
In practice, the encoding of the fieldNorm seems to make this function much 
less useful for this use case. Unless I have missed something.

Is there another way to achieve something like this? Note that I don't want a
general boost on documents with short titles, I only want to boost them if the 
title field actually matched the query.

/Jimi



From: Jack Krupansky 
Sent: Thursday, April 21, 2016 1:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

I'm not sure I fully follow what distinction you're trying to focus on. I mean, 
traditionally length normalization has simply tried to distinguish a title 
field (rarely more than a dozen words) from a full body of text, or maybe an 
abstract, not things like exactly how many words 

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-21 Thread Ahmet Arslan
Hi Jimi,

Please do either :

1) write your own similarity that saves document length (docValues) in a 
lossless way and implement whatever punishment/algorithm you want.

or

2) disable norms altogether, add an integer field (title_length), and populate it
(outside Solr) with the number of words in the title field, and use some
function query to influence the score, e.g.
q=something&bf=someFunctionQuery(title_length)
https://cwiki.apache.org/confluence/display/solr/Function+Queries

Ahmet
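
As a sketch of option 2, assuming an integer field named title_length
populated at index time, a multiplicative edismax boost could look like this
(parameter values are illustrative):

  q=john&defType=edismax&qf=title&boost=recip(field(title_length),1,10,10)

recip(x,m,a,b) computes a/(m*x+b), so a 2-word title gets a multiplier of
10/12 while a 12-word title gets 10/22, favoring shorter titles smoothly
rather than in the coarse steps of the encoded norm.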



On Thursday, April 21, 2016 9:37 AM, "jimi.hulleg...@svensktnaringsliv.se" 
 wrote:
Yes, it definitely seems to be the main problem for us. I did some simple tests
of the encoding and decoding calculations in DefaultSimilarity, and my findings 
are:

* For input between 1.0 and 0.5, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.125 depending if it is an edge case or not
* For input between 0.5 and 0.25, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.0625
* For input between 0.25 and 0.125, a difference of 0.01 in the input causes 
the output to change by a value of 0 or 0.015625
* And so on, with smaller and smaller differences in the output value for edge 
cases

I would say that the main problem is for input values between 1.0 and 0.5. So 
if one could tweak the SweetSpotSimilarity to start its "raw" (i.e., not encoded)
lengthNorm values at 0.5 instead of 1.0, it would solve my problem for the 
title field. This would of course worsen the precision for longer text values, 
but since this is a title field that is not a problem.

So, is there a way to configure SweetSpotSimilarity to use 0.5 as its highest
lengthNorm value, instead of 1.0?

/Jimi



From: Ahmet Arslan 
Sent: Thursday, April 21, 2016 2:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

Hi Jim,

fieldNorm encode/decode thing cause some precision loss.
This may be a problem when dealing with very short documents.
You can find many discussions on this topic.

ahmet



On Thursday, April 21, 2016 3:10 AM, "jimi.hulleg...@svensktnaringsliv.se" 
 wrote:
Ok sure, I can try and give some examples :)

Let's say that we have the following documents:

Id: 1
Title: John Doe

Id: 2
Title: John Doe Jr.

Id: 3
Title: John Lennon: The Life

Id: 4
Title: John Thompson's Modern Course for the Piano: First Grade Book

Id: 5
Title: I Rode With Stonewall: Being Chiefly The War Experiences of the Youngest 
Member of Jackson's Staff from John Brown's Raid to the Hanging of Mrs. Surratt


And in general, when a search word matches the title, I would like to have the 
length of the title field influence the score, so that matching documents with 
shorter title get a higher score than documents with longer title, all else 
considered equal.

So, when a user searches for "John", I would like the results to be pretty much 
in the order presented above. Though, it is not crucial that for example 
document 1 comes before document 2. But I would surely want document 1-3 to 
come before document 4 and 5.

In my mind, the fieldNorm is a perfect solution for this. At least in theory. 
In practice, the encoding of the fieldNorm seems to make this function much 
less useful for this use case. Unless I have missed something.

Is there another way to achieve something like this? Note that I don't want a
general boost on documents with short titles, I only want to boost them if the 
title field actually matched the query.

/Jimi



From: Jack Krupansky 
Sent: Thursday, April 21, 2016 1:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

I'm not sure I fully follow what distinction you're trying to focus on. I
mean, traditionally length normalization has simply tried to distinguish a
title field (rarely more than a dozen words) from a full body of text, or
maybe an abstract, not things like exactly how many words were in a title.
Or, as another example, a short newswire article of a few paragraphs vs. a
feature-length article, paper, or even book. IOW, traditionally it was more
of a boolean than a broad range of values. Sure, yes, you absolutely can
define a custom similarity with a custom norm that supports a wide range of
lengths, but you'll have to decide what you really want  to achieve to tune
it.

Maybe you could give a couple examples of field values that you feel should
be scored differently based on length.

-- Jack Krupansky

On Wed, Apr 20, 2016 at 7:17 PM, 
wrote:

> I am talking about the title field. And for the title field, a sweetspot
> interval of 1 to 50 makes very little sense. I want to have a fieldNorm

Cross collection join in Solr 5.x

2016-04-21 Thread Shikha Somani
Greetings,


Background: Our application is using Solr 4.10 and has multiple collections, all
of them sharded equally on Solr. These collections were joined to support
complex queries.


Problem: We are trying to upgrade to Solr 5.x. However, from Solr 5.2 onward,
joining two collections requires that the secondary collection be
single-sharded and replicated wherever the primary collection is. But our
collections are very large and need to be sharded for performance.


Query: Is there any way in Solr 5.x to join two collections both of which are
equally sharded, i.e., the secondary collection is sharded the same way as the
primary?


Thanks,
Shikha
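
For reference, the join in question is the query-time join with a fromIndex
pointing at the secondary collection, e.g. (collection and field names are
hypothetical):

  q={!join from=order_id to=id fromIndex=orders}status:open

From Solr 5.2 onward this form requires the fromIndex collection to have a
single shard with a replica on every node hosting the primary collection,
which is exactly the restriction described above.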










Re: complete cluster shutdown

2016-04-21 Thread Shawn Heisey
On 4/20/2016 10:06 PM, Zap Org wrote:
> I have 5 zookeeper and 2 solr machines and after a month or two whole
> clustre shutdown i dont know why. The logs i get in zookeeper are attached
> below. otherwise i dont get any error. All this is based on linux VM.
>
> 2016-03-11 16:50:18,159 [myid:5] - WARN  [SyncThread:5:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:5 took 7268ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2016-03-11 16:50:18,161 [myid:5] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2185:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x4535f00ee370001, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)

You'll need to further describe exactly what "whole cluster shutdown"
means.  I cannot tell from the logs, and there are very few situations I
can imagine where Solr would just die.  I will need to know which
version of Solr you are using.  If zookeeper is separate from Solr, that
version will also be needed.

The logs you have included are all WARN and INFO entries (no
ERROR), and say that the zookeeper client disconnected.  Assuming that
this zookeeper is only used for this one SolrCloud, the zookeeper client
might be Solr, an instance of CloudSolrClient, or it might be the zkcli
script.

One of the later log entries said "/localhost" which suggests that this
is not set up the way I would recommend setting up a production
SolrCloud deployment.  I recommend each Solr running on a separate
machine using the same port number, each Zookeeper running on a separate
machine using the same port number, and everything using an identical
zkHost string.  In that setup, Zookeeper and Solr might share machines,
but none of the machines will be running more than one of each kind of
process.  If you are running that kind of setup, you will never be using
"localhost" or "127.0.0.1" for connecting to zookeeper.

There are no Solr logs included here, so if something is happening with
Solr, I cannot tell what it is.

Thanks,
Shawn



Re: DIH Schedule Solr 6

2016-04-21 Thread Shawn Heisey
On 4/21/2016 5:25 AM, Mahmoud Almokadem wrote:
> We have a cluster of Solr 4.8.1 installed on the Tomcat servlet container and
> we're able to use the DIH scheduler by adding these lines to web.xml in the
> installation directory:
>
>   <listener>
>     <listener-class>
>       org.apache.solr.handler.dataimport.scheduler.ApplicationListener
>     </listener-class>
>   </listener>
>
> Now we are planning to migrate to Solr 6, and we have already installed it as a
> service. The question is how to install the DIH scheduler on Solr 6 as a service?

The dataimport scheduler is not an official part of Solr.  There's a
patch in Jira, and somebody has turned it into an external jar.  You
might be able to do exactly what you did for 4.8.1 with 6.0.0, but I
would not be surprised to find it's not compatible with 6.x.  Also, it
will not be officially supported here on this list.  You are on your
own.  You might be able to get some help from the person who wrote the
dataimport scheduler, but I have no idea what they are willing to do.

Instead of using something that has no official support: Your operating
system contains a scheduling program -- "cron" on most operating
systems, "Task Scheduler" on Windows.  Write a small script that calls
for an import with a program like curl and ask your operating system to
run it at certain times on the days you need.  If you want to get more
creative and write a more traditional program with HTTP capability, go
for it.

Thanks,
Shawn
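
A minimal version of that cron approach might look like this (paths, core
name, and schedule are placeholders):

  #!/bin/sh
  # dih-import.sh -- trigger a DIH delta import; clean=false keeps existing docs
  curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false" > /dev/null

  # crontab entry: run the script every day at 02:00
  # 0 2 * * * /opt/scripts/dih-import.sh

The dataimport handler runs the import asynchronously, so the script returns
immediately; progress can be checked with command=status.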




solrcloud no active slice servicing hash code error

2016-04-21 Thread 李宏伟
Hello, what should I do about this issue?
http://stackoverflow.com/questions/33073960/solr-no-active-slice-servicing-hash-code
Thanks.



_________________________________
  Li Hongwei (李宏伟)  |  KaolaFM Technology Department

 Mobile: 15801483916
 QQ: 153563985
 Email: l...@kaolafm.com
 Website: www.autoradio.cn
 Address: 5/F, Tower D, Yonggui Center, 43 Guangqumennei Street, Dongcheng District, Beijing, 100062
 


Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-21 Thread Rajesh Hazari
Hi Li,

Do you see timeouts like "CLUSTERSTATUS the collection time out:180s"?
If that's the case, this may be related to
https://issues.apache.org/jira/browse/SOLR-7940,
and I would say either use the patch file or upgrade.


Thanks,
Rajesh,
8328789519,
If I don't answer your call please leave a voicemail with your contact
info,
will return your call ASAP.

On Thu, Apr 21, 2016 at 6:02 AM, YouPeng Yang 
wrote:

> Hi
>We have used Solr 4.6 for 2 years. If you post more logs, maybe we can
> fix it.
>
> 2016-04-21 6:50 GMT+08:00 Li Ding :
>
> > Hi All,
> >
> > We are using SolrCloud 4.6.1.  We have observed following behaviors
> > recently.  A Solr node in a Solrcloud cluster is up but some of the cores
> > on the nodes are marked as down in Zookeeper.  If the cores are parts of
> a
> > multi-sharded collection with one replica,  the queries to that
> collection
> > will fail.  However, when this happened, if we issue queries to the core
> > directly, it returns 200 and correct info.  But once Solr got into the
> > state, the core will be marked down forever unless we do a restart on
> Solr.
> >
> > Has anyone seen this behavior before?  Is there any way to get out of the
> > state on its own?
> >
> > Thanks,
> >
> > Li
> >
>


DIH Schedule Solr 6

2016-04-21 Thread Mahmoud Almokadem
Hello, 

We have a cluster of Solr 4.8.1 installed on the Tomcat servlet container, and we're
able to use the DIH scheduler by adding these lines to web.xml in the installation
directory:

  <listener>
    <listener-class>
      org.apache.solr.handler.dataimport.scheduler.ApplicationListener
    </listener-class>
  </listener>



Now we are planning to migrate to Solr 6, and we have already installed it as a
service. The question is how to install the DIH scheduler on Solr 6 as a service?

Thanks,
Mahmoud 



Re: Storing different collection on different hard disk

2016-04-21 Thread Bram Van Dam
On 21/04/16 03:56, Zheng Lin Edwin Yeo wrote:
> This is the working one:
> dataDir=D:/collection1/data

Ah yes. Backslashes are escape characters in properties files.
C:\\collection1\\data would probably work as well.

 - bram
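
In other words, in core.properties either of these lines resolves to the same
path, with the forward-slash form sidestepping the escaping issue entirely:

  dataDir=D:/collection1/data
  dataDir=D:\\collection1\\data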


Running out of disk space for Solr, a proposed solution

2016-04-21 Thread Charlie Hull

Hi all,

Here's something we've just released as open source to help cope with 
running out of disk space on a Solr node or cluster. It's pretty early 
so we'd welcome contributions and feedback. Although conceived 
originally for an Elasticsearch project, it's also targeted at Solr:

https://github.com/flaxsearch/harahachibu

There's a blog post explaining how and why we built it at 
http://www.flax.co.uk/blog/2016/04/21/running-disk-space-elasticsearch-solr/


Cheers

Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-21 Thread YouPeng Yang
Hi
   We have used Solr 4.6 for 2 years. If you post more logs, maybe we can
fix it.

2016-04-21 6:50 GMT+08:00 Li Ding :

> Hi All,
>
> We are using SolrCloud 4.6.1.  We have observed following behaviors
> recently.  A Solr node in a Solrcloud cluster is up but some of the cores
> on the nodes are marked as down in Zookeeper.  If the cores are parts of a
> multi-sharded collection with one replica,  the queries to that collection
> will fail.  However, when this happened, if we issue queries to the core
> directly, it returns 200 and correct info.  But once Solr got into the
> state, the core will be marked down forever unless we do a restart on Solr.
>
> Has anyone seen this behavior before?  Is there any way to get out of the state
> on its own?
>
> Thanks,
>
> Li
>


concat 2 fields

2016-04-21 Thread vrajesh
I am trying to concatenate two fields to use them as one field, following
http://grokbase.com/t/lucene/solr-user/138vr75hvj/concat-2-fields-in-another-field,
but the solution given there is not working for me when I try it. Please help
me with it.
 I am trying to concatenate the latitude and longitude fields into a single
unit using the following (the processor snippet was stripped by the mailing
list archive):
 i added it to solrconfig.xml.

 Some of my doubts are:
 - Should we define the destination field (geo_location) in schema.xml?

 - I want to make this combined field (geo_location) a facet field, so what do
I have to add, and where? [the specifics were stripped by the archive]

 - Is there any specific tag in which I should add the above processor script to
make it work? (see the sketch below)
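
For reference, the approach in the linked thread is usually implemented as an
update request processor chain in solrconfig.xml; a sketch using the built-in
clone and concat processors (the chain name and delimiter are illustrative,
the field names match the question):

  <updateRequestProcessorChain name="concat-latlon">
    <!-- copy both source fields into geo_location (multi-valued at this point) -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">latitude</str>
      <str name="source">longitude</str>
      <str name="dest">geo_location</str>
    </processor>
    <!-- join the collected values into one comma-separated string -->
    <processor class="solr.ConcatFieldUpdateProcessorFactory">
      <str name="fieldName">geo_location</str>
      <str name="delimiter">,</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The destination field geo_location does still need to be declared in
schema.xml, and the chain only runs when selected at update time (e.g.
update.chain=concat-latlon) or made the default for the update handler.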






Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-21 Thread danny teichthal
Hi Li,
If you could supply some more info from your logs, it would help.
We also had some similar issues. There were some bugs related to SolrCloud
that were solved in Solr 4.10.4 and further in Solr 5.x.
I would suggest you compare your logs with the defects in the 4.10.4 release notes
to see if they are the same.
Also, send relevant solr/zookeeper parts of logs to the mailing list.


On Thu, Apr 21, 2016 at 1:50 AM, Li Ding  wrote:

> Hi All,
>
> We are using SolrCloud 4.6.1.  We have observed following behaviors
> recently.  A Solr node in a Solrcloud cluster is up but some of the cores
> on the nodes are marked as down in Zookeeper.  If the cores are parts of a
> multi-sharded collection with one replica,  the queries to that collection
> will fail.  However, when this happened, if we issue queries to the core
> directly, it returns 200 and correct info.  But once Solr got into the
> state, the core will be marked down forever unless we do a restart on Solr.
>
> Has anyone seen this behavior before?  Is there any way to get out of the state
> on its own?
>
> Thanks,
>
> Li
>


Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-21 Thread jimi.hullegard
Yes, it definitely seems to be the main problem for us. I did some simple tests
of the encoding and decoding calculations in DefaultSimilarity, and my findings 
are:

* For input between 1.0 and 0.5, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.125 depending if it is an edge case or not
* For input between 0.5 and 0.25, a difference of 0.01 in the input causes the 
output to change by a value of 0 or 0.0625
* For input between 0.25 and 0.125, a difference of 0.01 in the input causes 
the output to change by a value of 0 or 0.015625
* And so on, with smaller and smaller differences in the output value for edge 
cases

I would say that the main problem is for input values between 1.0 and 0.5. So 
if one could tweak the SweetSpotSimilarity to start its "raw" (i.e., not encoded)
lengthNorm values at 0.5 instead of 1.0, it would solve my problem for the 
title field. This would of course worsen the precision for longer text values, 
but since this is a title field that is not a problem.

So, is there a way to configure SweetSpotSimilarity to use 0.5 as its highest
lengthNorm value, instead of 1.0?

/Jimi
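
For reference, SweetSpotSimilarity is configured through
SweetSpotSimilarityFactory on the field type in the schema; a sketch with
illustrative values is below. Note that the factory exposes the plateau's
position and steepness, not its height, so it does not directly provide the
0.5 ceiling asked about above:

  <!-- inside the <fieldType> used by the title field -->
  <similarity class="solr.SweetSpotSimilarityFactory">
    <!-- lengthNorm is flat for fields of 1 to 5 terms, then falls off -->
    <int name="lengthNormMin">1</int>
    <int name="lengthNormMax">5</int>
    <float name="lengthNormSteepness">0.5</float>
  </similarity>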


From: Ahmet Arslan 
Sent: Thursday, April 21, 2016 2:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

Hi Jim,

fieldNorm encode/decode thing cause some precision loss.
This may be a problem when dealing with very short documents.
You can find many discussions on this topic.

ahmet



On Thursday, April 21, 2016 3:10 AM, "jimi.hulleg...@svensktnaringsliv.se" 
 wrote:
Ok sure, I can try and give some examples :)

Let's say that we have the following documents:

Id: 1
Title: John Doe

Id: 2
Title: John Doe Jr.

Id: 3
Title: John Lennon: The Life

Id: 4
Title: John Thompson's Modern Course for the Piano: First Grade Book

Id: 5
Title: I Rode With Stonewall: Being Chiefly The War Experiences of the Youngest 
Member of Jackson's Staff from John Brown's Raid to the Hanging of Mrs. Surratt


And in general, when a search word matches the title, I would like to have the 
length of the title field influence the score, so that matching documents with 
shorter title get a higher score than documents with longer title, all else 
considered equal.

So, when a user searches for "John", I would like the results to be pretty much 
in the order presented above. Though, it is not crucial that for example 
document 1 comes before document 2. But I would surely want document 1-3 to 
come before document 4 and 5.

In my mind, the fieldNorm is a perfect solution for this. At least in theory. 
In practice, the encoding of the fieldNorm seems to make this function much 
less useful for this use case. Unless I have missed something.

Is there another way to achieve something like this? Note that I don't want a
general boost on documents with short titles, I only want to boost them if the 
title field actually matched the query.

/Jimi



From: Jack Krupansky 
Sent: Thursday, April 21, 2016 1:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to configure a minimum field length for the 
fieldNorm value?

I'm not sure I fully follow what distinction you're trying to focus on. I
mean, traditionally length normalization has simply tried to distinguish a
title field (rarely more than a dozen words) from a full body of text, or
maybe an abstract, not things like exactly how many words were in a title.
Or, as another example, a short newswire article of a few paragraphs vs. a
feature-length article, paper, or even book. IOW, traditionally it was more
of a boolean than a broad range of values. Sure, yes, you absolutely can
define a custom similarity with a custom norm that supports a wide range of
lengths, but you'll have to decide what you really want  to achieve to tune
it.

Maybe you could give a couple examples of field values that you feel should
be scored differently based on length.

-- Jack Krupansky

On Wed, Apr 20, 2016 at 7:17 PM, 
wrote:

> I am talking about the title field. And for the title field, a sweetspot
> interval of 1 to 50 makes very little sense. I want to have a fieldNorm
> value that differentiates between for example 2, 3, 4 and 5 terms in the
> title, but only very little.
>
> The 20% number I got by simply calculating the difference in the title
> fieldNorm of two documents, where one title was one word longer than the
> other title. And one fieldNorm value was 20% larger than the other as a
> result of that. And since we use multiplicative scoring calculation, a 20%
> increase in the fieldNorm results in a 20% increase in the final score.
>
> I'm not talking about "scores as percentages". I'm simply noting that this
> minor change in the text data (adding or removing one