Re: Data from 4.10 to 6.5.1

2017-05-28 Thread mganeshs
Thanks for the reply. Sure will pay attention. 

Indeed our approach was also to use the latest managed schema and configs
only and add our custom schema from the old version. Luckily we have only
one shard of data and the others are replicas only, and we are not using any
field types (pint, plong, etc.) that are deprecated in the new version. So
I guess we are on the safe side. Will keep you posted on the results.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Data-from-4-10-to-6-5-1-tp4337410p4337852.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: The unified highlighter html escaping. Seems rather extreme...

2017-05-28 Thread Zheng Lin Edwin Yeo
Hi,

I'm not so sure about the escaping, but to control how much text is
returned as context around the highlighted frag, you can set the following
in solrconfig.xml.

<int name="hl.fragsize">200</int>

This will limit the fragments considered for highlighting to around 200
characters, so it will not return the whole chunk of data.


Regards,
Edwin


On 26 May 2017 at 23:26, Michael Joyner  wrote:

> Isn't the unified html escaper rather extreme in its escaping?
>
> It makes it hard to deal with for simple post-processing.
>
> The original html escaper seems to do minimal escaping, rather than escaping
> every non-alphabetical character it can find.
>
> Also, is there a way to control how much text is returned as context
> around the highlighted frag?
>
> Compare:
>
>
> Unified Snippet: [HepatoblastomaPRETE
> XTStage1HepatoblastomaPRETEXTStage&
> #32;2HepatoblastomaPRETEXTStage3He
> patoblastomaPRETEXTStage4Hepatoblastoma
> ChildrensOncologyGroupCCTO

Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Nawab Zada Asad Iqbal
After reading some more code, it seems that if we are sure there are no
deletes in this segment/index, then setting applyAllDeletes and
writeAllDeletes both to false will achieve something similar to what I was
getting in 4.5.0.

However, after reading the comment on IndexWriter's getReader(boolean
applyAllDeletes, boolean writeAllDeletes), it seems that this method is
specific to NRT.  Since we are not using soft commits, can this change
actually improve our performance during a full reindex?


Thanks
Nawab

On Sun, May 28, 2017 at 2:16 PM, Nawab Zada Asad Iqbal 
wrote:

> Thanks Michael and Shawn for the detailed response. I was later able to
> pull the full history using gitk, and found the commits behind this patch.
>
> Mike:
>
> So, in Solr 4.5.0, an earlier developer added code and config to
> set applyAllDeletes to false when we reindex all the data.  At the moment,
> I am not sure about the performance gain from this.
>
> 
>
>
> I am investigating whether this change is still needed in 6.5.1,
> or whether this can be achieved through some other configuration.
>
> For now, we are not planning to use NRT and solrCloud.
>
>
> Thanks
> Nawab
>
> On Sun, May 28, 2017 at 9:26 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Sorry, yes, that commit was one of many on a feature branch I used to
>> work on LUCENE-5438, which added near-real-time index replication to
>> Lucene.  Before this change, Lucene's replication module required a commit
>> in order to replicate, which is a heavy operation.
>>
>> The writeAllDeletes boolean option asks Lucene to move all recent deletes
>> (tombstone bitsets) to disk while opening the NRT (near-real-time) reader.
>>
>> Normally Lucene won't always do that, and will instead carry the bitsets
>> in memory from writer to reader, for reduced refresh latency.
>>
>> What sort of custom changes do you have in this part of Lucene?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sat, May 27, 2017 at 10:35 PM, Nawab Zada Asad Iqbal wrote:
>>> Hi all
>>>
>>> I am looking at the following change in lucene-solr which doesn't mention
>>> any JIRA. How can I find out more about it?
>>>
>>> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"
>>>
>>> Specifically, I am interested in what 'writeAllDeletes'  does in the
>>> following method. Let me know if this is a very stupid question and I should
>>> have done something else before emailing here.
>>>
>>> static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
>>> boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>>>
>>> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
>>> some custom code in this area, which we need to merge.
>>>
>>>
>>> Thanks
>>>
>>> Nawab
>>>
>>
>>
>


TLog for non-Solrcloud scenario

2017-05-28 Thread Nawab Zada Asad Iqbal
Hi,

The SolrCloud documentation mentions:

"The sync can be tunable e.g. flush vs fsync by default can protect against
JVM crashes but not against power failure and can be much faster "

Does it mean that flush protects against a JVM crash but not a power failure,
while fsync protects against both scenarios?
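As a rough illustration of the distinction (a toy sketch, not Solr code: flush() hands data to the OS page cache, which survives a process/JVM crash, while fsync() asks the OS to push it to stable storage, which also survives power loss):

```python
import os
import tempfile

def append_record(path, data, durable):
    # flush() moves the bytes from the process buffer to the OS page
    # cache: safe against a process/JVM crash, lost on power failure.
    # fsync() additionally forces the page cache to stable storage.
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        if durable:
            os.fsync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), "tlog")
append_record(path, b"doc-1\n", durable=False)  # fast, crash-safe only
append_record(path, b"doc-2\n", durable=True)   # slower, power-loss-safe
with open(path, "rb") as f:
    print(f.read())  # b'doc-1\ndoc-2\n'
```

This is why fsync-per-record is much slower: each call blocks on the storage device rather than on memory.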


Also, the NRT documentation explains a soft commit as:
"A *soft commit* is much faster since it only makes index changes visible
and does not fsync index files or write a new index descriptor. If the JVM
crashes or there is a loss of power, changes that occurred after the last *hard
commit* will be lost."

This is a little confusing, as a soft commit will only happen after a tlog
entry is flushed, won't it? Or does the tlog work differently in SolrCloud
and non-SolrCloud configurations?
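The visibility-vs-durability distinction the quoted text describes can be modeled roughly as follows (an illustration of the concept only; class and method names are invented and this is not how Solr is implemented):

```python
import json
import os
import tempfile

class ToyIndex:
    """Toy model of commit semantics: a soft commit makes buffered docs
    searchable without touching disk; a hard commit also persists them."""
    def __init__(self, path):
        self.path = path
        self.buffered = []   # received but not yet searchable
        self.visible = []    # searchable (soft-committed)

    def add(self, doc):
        self.buffered.append(doc)

    def soft_commit(self):
        # Fast: only flips visibility; nothing is written to disk.
        self.visible += self.buffered
        self.buffered = []

    def hard_commit(self):
        # Durable: fsync the searchable state to disk.
        self.soft_commit()
        with open(self.path, "w") as f:
            json.dump(self.visible, f)
            f.flush()
            os.fsync(f.fileno())

idx = ToyIndex(os.path.join(tempfile.mkdtemp(), "index.json"))
idx.add({"id": 1})
idx.soft_commit()
print(len(idx.visible))          # 1 -> doc is searchable
print(os.path.exists(idx.path))  # False -> lost on power failure
idx.hard_commit()
print(os.path.exists(idx.path))  # True -> survives power failure
```

In real Solr the tlog adds a separate durability path for uncommitted updates, which is exactly why the interaction asked about above is subtle.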


Thanks
Nawab


Re: Solr uppercase inside phrase query

2017-05-28 Thread Chien Nguyen
Many thanks. I will try it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-uppercase-inside-phrase-query-tp4337403p4337787.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr uppercase inside phrase query

2017-05-28 Thread Chien Nguyen
Many thanks. I will try it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-uppercase-inside-phrase-query-tp4337403p4337786.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Nawab Zada Asad Iqbal
Thanks Michael and Shawn for the detailed response. I was later able to
pull the full history using gitk, and found the commits behind this patch.

Mike:

So, in Solr 4.5.0, an earlier developer added code and config to set
applyAllDeletes to false when we reindex all the data.  At the moment, I am
not sure about the performance gain from this.

I am investigating whether this change is still needed in 6.5.1, or
whether this can be achieved through some other configuration.

For now, we are not planning to use NRT and solrCloud.


Thanks
Nawab

On Sun, May 28, 2017 at 9:26 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Sorry, yes, that commit was one of many on a feature branch I used to work
> on LUCENE-5438, which added near-real-time index replication to Lucene.
> Before this change, Lucene's replication module required a commit in order
> to replicate, which is a heavy operation.
>
> The writeAllDeletes boolean option asks Lucene to move all recent deletes
> (tombstone bitsets) to disk while opening the NRT (near-real-time) reader.
>
> Normally Lucene won't always do that, and will instead carry the bitsets
> in memory from writer to reader, for reduced refresh latency.
>
> What sort of custom changes do you have in this part of Lucene?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, May 27, 2017 at 10:35 PM, Nawab Zada Asad Iqbal 
> wrote:
>
>> Hi all
>>
>> I am looking at the following change in lucene-solr which doesn't mention
>> any JIRA. How can I find out more about it?
>>
>> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"
>>
>> Specifically, I am interested in what 'writeAllDeletes'  does in the
>> following method. Let me know if this is a very stupid question and I should
>> have done something else before emailing here.
>>
>> static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
>> boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>>
>> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
>> some custom code in this area, which we need to merge.
>>
>>
>> Thanks
>>
>> Nawab
>>
>
>


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Michael McCandless
Sorry, yes, that commit was one of many on a feature branch I used to work
on LUCENE-5438, which added near-real-time index replication to Lucene.
Before this change, Lucene's replication module required a commit in order
to replicate, which is a heavy operation.

The writeAllDeletes boolean option asks Lucene to move all recent deletes
(tombstone bitsets) to disk while opening the NRT (near-real-time) reader.

Normally Lucene won't always do that, and will instead carry the bitsets in
memory from writer to reader, for reduced refresh latency.
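The in-memory vs. on-disk tombstone behavior described above can be sketched as a toy model (names are invented for illustration; this is not Lucene's actual implementation):

```python
import os
import pickle
import tempfile

class ToyWriter:
    """Toy model of NRT tombstones: deletes are tracked as a bitset.
    A reader can either share the writer's in-memory bitset (fast reopen)
    or, with write_all_deletes=True, load a copy persisted to disk."""
    def __init__(self, path, num_docs):
        self.path = path
        self.deleted = [False] * num_docs   # in-memory tombstone bitset

    def delete(self, doc_id):
        self.deleted[doc_id] = True

    def get_reader(self, write_all_deletes=False):
        if write_all_deletes:
            # Persist the tombstones, then have the reader load them
            # from disk (slower reopen, but the deletes are on disk).
            with open(self.path, "wb") as f:
                pickle.dump(self.deleted, f)
            with open(self.path, "rb") as f:
                return pickle.load(f)
        # Share the in-memory bitset for lower refresh latency.
        return self.deleted

w = ToyWriter(os.path.join(tempfile.mkdtemp(), "deletes.bin"), num_docs=4)
w.delete(2)
live = [i for i, dead in enumerate(w.get_reader(write_all_deletes=True))
        if not dead]
print(live)   # [0, 1, 3]
```

This mirrors the trade-off Mike describes: carrying the bitsets in memory keeps NRT reopen latency low, while writing them out is what a replication scheme needs so a follower can copy consistent files.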

What sort of custom changes do you have in this part of Lucene?

Mike McCandless

http://blog.mikemccandless.com

On Sat, May 27, 2017 at 10:35 PM, Nawab Zada Asad Iqbal 
wrote:

> Hi all
>
> I am looking at the following change in lucene-solr which doesn't mention
> any JIRA. How can I find out more about it?
>
> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"
>
> Specifically, I am interested in what 'writeAllDeletes'  does in the
> following method. Let me know if this is a very stupid question and I should
> have done something else before emailing here.
>
> static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
> boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>
> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
> some custom code in this area, which we need to merge.
>
>
> Thanks
>
> Nawab
>


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Shawn Heisey
On 5/27/2017 8:35 PM, Nawab Zada Asad Iqbal wrote:
> I am looking at the following change in lucene-solr which doesn't mention
> any JIRA. How can I find out more about it?
>
> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"

The reason that there's no Jira issue mentioned is that the commit was
first made on an entirely separate git branch -- the one for
LUCENE-5438.  I think I would argue that even commits on special
branches should have the Jira issue in the commit text, because
eventually those commits get merged back to master, where information
about the source branch seems to disappear.

https://issues.apache.org/jira/browse/LUCENE-5438

LUCENE-5438 added an HTTP-based replication capability that Lucene users
could leverage, similar to what Solr already had.

> Specifically, I am interested in what 'writeAllDeletes'  does in the
> following method. Let me know if this is a very stupid question and I should
> have done something else before emailing here.
>
> static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
> boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>
> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
> some custom code in this area, which we need to merge.

I found this commit in the archive for the commits mailing list.  The
full commit hash is 1ae7291429bad742715344f86cfa5200229b3698.

https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=1ae72914

This was a change in Lucene code, Solr code wasn't touched.  The author
of that change, Mike McCandless, is a participant on this list.  There
are posts from him as recently as April 2017.  Because of that, I
hesitate to have you ask your question on the dev list, but if you don't
get a useful reply from somebody soon, you may want to do that.

I wish I could offer you some advice myself, but I'm not familiar with
the low-level Lucene code.

Thanks,
Shawn