Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-17 Thread Atri Sharma
SUCCESS! [1:03:43.432934]

+1 Binding

On Thu, Feb 18, 2021 at 2:10 AM Houston Putman  wrote:
>
> SUCCESS! [1:01:43.630010]
>
> +1 (binding)
>
> On Wed, Feb 17, 2021 at 3:05 PM Tomás Fernández Löbbe  
> wrote:
>>
>> SUCCESS! [1:07:31.079810]
>>
>> Tested upgrading from 8.7 and saw no problems
>>
>> +1 (binding)
>>
>> On Wed, Feb 17, 2021 at 2:58 AM Noble Paul  wrote:
>>>
>>> SUCCESS! [1:04:46.520370]
>>>
>>> +1 Binding
>>>
>>> On Wed, Feb 17, 2021 at 1:44 PM Timothy Potter  
>>> wrote:
>>> >
>>> > And I continue to struggle with the python3 command:
>>> >
>>> > python3 -u dev-tools/scripts/smokeTestRelease.py \
>>> > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>> >
>>> > On Tue, Feb 16, 2021 at 7:41 PM Timothy Potter  
>>> > wrote:
>>> > >
>>> > > Please vote for release candidate 2 for Lucene/Solr 8.8.1
>>> > >
>>> > > The artifacts can be downloaded from:
>>> > > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>> > >
>>> > > You can run the smoke tester directly with this command:
>>> > > python3 -u dev-tools/scripts/smokeTestRelease.py
>>> > > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>>> > >
>>> > > The vote will be open for at least 72 hours, i.e. until 2021-02-20 03:00 UTC.
>>> > >
>>> > > [ ] +1  approve
>>> > > [ ] +0  no opinion
>>> > > [ ] -1  disapprove (and reason why)
>>> > >
>>> > > Here is my +1 SUCCESS! [0:50:07.947952]
>>> > >
>>> > > Also, as with RC1, in addition to the smoke test, I built a Docker
>>> > > image from the RC locally and verified:
>>> > >
>>> > > a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
>>> > > completes successfully w/o any NPEs or weirdness with leader election
>>> > > / recoveries.
>>> > > b. The base_url property is stored in replica state after the upgrade
>>> > > c. A basic client application built with SolrJ 8.7.0 can load cluster
>>> > > state info directly from ZK and query the 8.8.1 RC2 servers.
>>> > > d. Same client app built with SolrJ 8.8.0 works as well.
>>> > >
>>> > > As this bug-fix release is primarily needed to address a SolrJ
>>> > > back-compat break (SOLR-15145) and unfortunately our smoke tester
>>> > > framework does not test for backcompat of older SolrJ against the RC,
>>> > > I ask others to please test rolling upgrades of servers (ideally
>>> > > multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
>>> > > please try client applications that are using an older SolrJ, esp.
>>> > > those that load cluster state directly from ZK.
>>> > >
>>> > > Best regards,
>>> > > Tim
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >
>>>
>>>
>>> --
>>> -
>>> Noble Paul
>>>
>>>


-- 
Regards,

Atri
Apache Concerted




Random disabling of asserts in tests is not working

2021-02-17 Thread Gautam Worah
Hi Folks,

I was working on the PR for LUCENE-9476 when we found cases where
`assert` statements in test cases were always enabled. To verify whether
this was true, I wrote a simple test case that called `assert` 1000 times,
modified a variable, and then checked its value. This test case always
passed because `assert` was always enabled.

Mike McCandless mentioned in the PR that Lucene earlier had the capability
to randomly disable `asserts` so that there were no accidental cases of
developers relying on `asserts` always being enabled. We may have lost this
feature when the project transitioned to Gradle.
When I change the default value of `tests.asserts` in randomization.gradle
to false, the test fails promptly.
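
For anyone who wants to reproduce the check, a minimal sketch along these lines shows whether the JVM actually evaluates `assert` statements. The class and method names (`AssertProbe`, `assertsEnabled`, `countEvaluatedAsserts`) are hypothetical and not taken from the PR; the sketch relies only on the standard Java behaviour that an `assert` expression is skipped unless assertions are enabled (for example with `-ea`).

```java
/** Hypothetical probe class for illustration; not part of Lucene or the PR. */
final class AssertProbe {
    private AssertProbe() {}

    /** Returns true only when this class runs with assertions enabled. */
    static boolean assertsEnabled() {
        boolean enabled = false;
        // The assignment is a side effect that executes only when asserts are on.
        assert enabled = true;
        return enabled;
    }

    /** Counts how many of the attempted assert statements were actually evaluated. */
    static int countEvaluatedAsserts(int attempts) {
        int counter = 0;
        for (int i = 0; i < attempts; i++) {
            // The increment happens only if the assert expression is evaluated.
            assert ++counter > 0;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("asserts enabled: " + assertsEnabled());
        System.out.println("asserts evaluated: " + countEvaluatedAsserts(1000) + " of 1000");
    }
}
```

Compiling this and running it once with `java -ea AssertProbe` and once with `java -da AssertProbe` should give different results. A test that only passes in the first case depends on asserts being enabled.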

Has anyone else noticed this, or does anyone know more about it?

Thanks,
Gautam Worah.


[jira] [Commented] (PYLUCENE-56) Can't build JCC on Mac

2021-02-17 Thread Andreas Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286235#comment-17286235
 ] 

Andreas Vajda commented on PYLUCENE-56:
---

That link again (sorry for the noise):
https://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/202102.mbox/%3calpine.OSX.2.23.453.2102101246410.19085@yuzu.local%3e

> Can't build JCC on Mac
> --
>
> Key: PYLUCENE-56
> URL: https://issues.apache.org/jira/browse/PYLUCENE-56
> Project: PyLucene
>  Issue Type: Bug
> Environment: MacOSX 10.15.7, Intel Core i7, Python 3.8.2, gcc 
> (Homebrew GCC 10.2.0_3) 10.2.0
> following the instructions here:
> http://lucene.apache.org/pylucene/jcc/install.html
> svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc
> Checked out revision 1886645.
>Reporter: Clem Wang
>Priority: Major
>  Labels: build
>
> This is puzzling to me, as I can't figure out how the failing gcc command 
> line gets most of its arguments.  I found one problem in setup.py, but I 
> can't find the error-causing strings anywhere in the files in the jcc 
> directory or its subdirectories.
>  
> Steps:
>  
> {code:java}
> svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc
> Checked out revision 1886645.
> cd jcc
> python setup.py build
>  
> {code}
> ...
> /opt/local/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code 
> -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
> {color:#ff}*-iwithsysroot/*{color}System/Library/Frameworks/System.framework/PrivateHeaders
>  
> {color:#ff}*-iwithsysroot/*{color}Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
>  {color:#ff}*-arch arm64*{color} -arch x86_64 
> -I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp -dynamiclib 
> -D_jcc_lib -DJCC_VER="3.8" 
> -I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
> -I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
>  -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
> -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
>  -c jcc3/sources/jcc.cpp -o 
> build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
> -fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11  
> {color:#ff}*-stdlib=libc++*{color}
>  
> which generates 4 errors due to the parts marked above in red bold:
>  
> *gcc:* *error:* this compiler does not support arm64
> *gcc:* *error:* unrecognized command-line option 
> '*-iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders*'
> *gcc:* *error:* unrecognized command-line option 
> '*-iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers*'
> *gcc:* *error:* unrecognized command-line option '*-stdlib=libc++*'
> error: command '/opt/local/bin/gcc' failed with exit status 1
>  
> Obviously, the contradictory 
> *-arch arm64*
> needs to be removed but I can't find arm64 anywhere.
>  
> The unnecessary
> *-stdlib=libc++*
>  
> can be removed from setup.py:
> {code:java}
> CFLAGS = {
>  'darwin': ['-fno-strict-aliasing', '-Wno-write-strings',
>  '-mmacosx-version-min=10.9', '-std=c++11', '-stdlib=libc++'],{code}
>  
> After poking around, I figured out that gcc uses
> {code:java}
> -I {code}
> not
> {code:java}
> -i {code}
> for includes.
>  
> Making these modifications (and adding
> {code:java}
> -Wno-attributes{code}
> to remove warnings)
>  
> I came up with this line that does successfully compile without errors:
> /opt/local/bin/gcc -Wno-attributes -Wno-unused-result -Wsign-compare 
> -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
> -I/System/Library/Frameworks/System.framework/PrivateHeaders 
> -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
>  -arch x86_64 -I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp 
> -dynamiclib -D_jcc_lib -DJCC_VER="3.8" 
> -I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
> -I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
>  -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
> -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
>  -c jcc3/sources/jcc.cpp -o 
> build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
> -fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11
>  
> But other than removing  *'-stdlib=libc++'*  from the setup.py file, I have no 
> idea how to fix the compile errors caused by the command line that setup.py 
> somehow generates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Julie Tibshirani
Congratulations Mike!!

On Wed, Feb 17, 2021 at 3:12 PM Gus Heck  wrote:

> Congratulations :)
>
> On Wed, Feb 17, 2021 at 5:42 PM Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
>> Congratulations Mike!
>>
>> On Wed, Feb 17, 2021 at 2:42 PM Steve Rowe  wrote:
>>
>>> Congrats Mike!
>>>
>>> --
>>> Steve
>>>
>>> > On Feb 17, 2021, at 4:31 PM, Anshum Gupta 
>>> wrote:
>>> >
>>> > Every year, the Lucene PMC rotates the Lucene PMC chair and Apache
>>> Vice President position.
>>> >
>>> > This year we nominated and elected Michael Sokolov as the Chair, a
>>> decision that the board approved in its February 2021 meeting.
>>> >
>>> > Congratulations, Mike!
>>> >
>>> > --
>>> > Anshum Gupta
>>>
>>>
>>>
>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


[jira] [Commented] (PYLUCENE-56) Can't build JCC on Mac

2021-02-17 Thread Andreas Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286231#comment-17286231
 ] 

Andreas Vajda commented on PYLUCENE-56:
---

Did you follow the instructions I posted on pylucene-dev@lucene.apache.org 
following your earlier bug report? If so, you still seem to be using the 
Homebrew version of gcc. Why?
If not, please subscribe to the dev list and follow the instructions.
For reference, I'm talking about this post:
https://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/202102.mbox/ajax/%3Calpine.OSX.2.23.453.2102101246410.19085%40yuzu.local%3E


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Robert Muir
The current allocation is lazy/on-demand (see the code), so I wouldn't
worry about it. If skipSlowly is not called explicitly, nothing will
allocate byte[]s.
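
As an illustration of the lazy pattern described above, here is a minimal sketch. It is not a copy of Lucene's `DataInput`: the class names (`LazySkipInput`, `ArrayBackedInput`), the `hasSkipBuffer()` helper and the 1024-byte buffer size are assumptions made for the example. The point is only that the scratch buffer is created the first time the slow path runs, so instances that never skip this way never allocate it.

```java
/** Hypothetical sequential input used only for illustration; not Lucene's DataInput. */
abstract class LazySkipInput {
    private static final int SKIP_BUFFER_SIZE = 1024; // assumed size

    // Stays null until skipSlowly() is called for the first time.
    private byte[] skipBuffer;

    /** Copies the next len bytes into b, starting at offset. */
    abstract void readBytes(byte[] b, int offset, int len);

    /** Reads and discards numBytes, allocating the scratch buffer on first use only. */
    void skipSlowly(long numBytes) {
        if (numBytes < 0) {
            throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
        }
        if (skipBuffer == null) {
            skipBuffer = new byte[SKIP_BUFFER_SIZE];
        }
        long skipped = 0;
        while (skipped < numBytes) {
            int step = (int) Math.min(SKIP_BUFFER_SIZE, numBytes - skipped);
            readBytes(skipBuffer, 0, step);
            skipped += step;
        }
    }

    /** Exposed only so the sketch can show when the allocation happens. */
    boolean hasSkipBuffer() {
        return skipBuffer != null;
    }
}

/** Hypothetical in-memory implementation, so the sketch can be run on its own. */
final class ArrayBackedInput extends LazySkipInput {
    private final byte[] data;
    private int pos;

    ArrayBackedInput(byte[] data) {
        this.data = data;
    }

    @Override
    void readBytes(byte[] b, int offset, int len) {
        if (len > data.length - pos) {
            throw new IllegalStateException("read past end of input");
        }
        System.arraycopy(data, pos, b, offset, len);
        pos += len;
    }

    int position() {
        return pos;
    }
}
```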

On Wed, Feb 17, 2021 at 8:56 PM Greg Miller  wrote:

> Sounds good; thanks again. I'll see if I can cook something up this week.
> As I thought about this a little further, I think I'll need to avoid a
> default "skipSlowly" in DataInput in order to see the same benefits I
> observed with my local change, since it requires doing away with allocating
> a byte[] for skipping in each DataInput. So if DataInput still has a
> default "slow skip" implementation, that doesn't quite solve the
> accumulation of byte array garbage (although there would be other benefits
> of course). After looking at the code a little more though, it doesn't seem
> too difficult to create a "proper" skipBytes implementation in all the
> necessary places.
>
> Thanks again!
>
> Cheers,
> -Greg
>
> On Wed, Feb 17, 2021 at 5:15 PM Robert Muir  wrote:
>
>> Sure, we can always create followups, but we can start with trying to
>> make the implementations efficient.
>>
>> We'd probably want to create more followup issues later anyway, e.g. to
>> address any TODOs, and ultimately to fix the API: it is silly to have
>> seek(long) and skipBytes(long) that do exactly the same thing.
>>
>> It is just especially egregious that one of these has a default
>> implementation that will allocate buffers and sequentially read+throw away
>> up to 2^63-1 bytes, so let's fix that first.
>>
>> On Wed, Feb 17, 2021 at 7:58 PM Greg Miller  wrote:
>>
>>> Fair points. I'm going to see if I can carve out some time to take up
>>> your suggested approach. Would you suggest using LUCENE-9480 to report back on
>>> any progress made? There are a few different ideas captured there, and it
>>> sounds like the leading suggestion is to collapse the ideas of DataInput
>>> and IndexInput, so I'm not sure if that's the best place to track the
>>> suggested approach we've discussed here or not.
>>>
>>> Thanks again for the discussion!
>>>
>>> Cheers,
>>> -Greg
>>>
>>> On Wed, Feb 17, 2021 at 4:09 PM Robert Muir  wrote:
>>>


>>>> On Wed, Feb 17, 2021 at 6:53 PM Greg Miller  wrote:
>>>>
>>>>>
>>>>> Right, I am looking at the code but maybe we're talking about two
>>>>> different things, so let me clarify. I agree that there is no concurrency
>>>>> issue with the current code and I apologise if my point was confusing. The
>>>>> reason skipBytes was made an instance variable as opposed to a static one
>>>>> was to *avoid* creating a concurrency issue (which certainly would
>>>>> exist if it had been made static). Making it an instance variable is
>>>>> wasteful for GC though, no? My suggestion of moving to a threadlocal hits a
>>>>> "happy medium" where we're not allocating these silly buffers for each
>>>>> DataInput instance but making sure each thread has a separate one. Does
>>>>> this make more sense now?
>>>>>
>>>>>
>>>> Adding a threadlocal isn't a happy medium here. The first thing I see
>>>> is thread churn issues (apps that use non-fixed threadpools). See javadocs
>>>> for CloseableThreadLocal for more information. We can't even use that
>>>> CloseableThreadLocal hack here, because nobody calls close() on clones of
>>>> IndexInputs. So threadlocal would seriously make matters worse for some
>>>> apps using the library, all for something that should be a "+=" :)
>>>>
>>>> I'm not trying to suggest perfection, just start by deprecating the
>>>> slow impl, make it abstract and "hoist" the responsibility of
>>>> implementation upwards to subclasses that are better prepared to handle
>>>> them. For the majority use case (e.g. IndexInput), your buffer trivially
>>>> goes away since you implemented it with seek(). The ones that are left can
>>>> remain slow, that's fine, you speed up 90% easily, and we can see what is
>>>> needed to fix the leftovers. But I really don't think these will be
>>>> difficult either, e.g. for ChecksumIndexInput we can probably leave it
>>>> abstract and implement the skipping directly on BufferedChecksumIndexInput
>>>> (it already has a buffer, so it should be able to use that one, and avoid 2
>>>> buffers like today).

>>>


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Greg Miller
Sounds good; thanks again. I'll see if I can cook something up this week.
As I thought about this a little further, I think I'll need to avoid a
default "skipSlowly" in DataInput in order to see the same benefits I
observed with my local change, since it requires doing away with allocating
a byte[] for skipping in each DataInput. So if DataInput still has a
default "slow skip" implementation, that doesn't quite solve the
accumulation of byte array garbage (although there would be other benefits
of course). After looking at the code a little more though, it doesn't seem
too difficult to create a "proper" skipBytes implementation in all the
necessary places.
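
For inputs that cannot simply seek, such as one that maintains a checksum, a "proper" skipBytes could reuse a buffer the class already owns instead of allocating a second one, as suggested in the quoted reply below. The following is only a sketch under stated assumptions: `SketchChecksumInput` is a hypothetical class, not Lucene's `BufferedChecksumIndexInput`, the 256-byte buffer size is arbitrary, and a `byte[]` stands in for the wrapped input.

```java
import java.util.zip.CRC32;

/** Hypothetical checksumming reader over a byte[]; for illustration only. */
final class SketchChecksumInput {
    private final byte[] source;
    // The single buffer this class already owns (size is an assumption).
    private final byte[] buffer = new byte[256];
    private final CRC32 digest = new CRC32();
    private int pos;

    SketchChecksumInput(byte[] source) {
        this.source = source;
    }

    /** Skips numBytes while still feeding them to the checksum, reusing the existing buffer. */
    void skipBytes(long numBytes) {
        if (numBytes < 0 || numBytes > source.length - pos) {
            throw new IllegalArgumentException("cannot skip " + numBytes + " bytes");
        }
        long remaining = numBytes;
        while (remaining > 0) {
            int step = (int) Math.min(buffer.length, remaining);
            // Stands in for a read from the wrapped input into the shared buffer.
            System.arraycopy(source, pos, buffer, 0, step);
            digest.update(buffer, 0, step);
            pos += step;
            remaining -= step;
        }
    }

    long getChecksum() {
        return digest.getValue();
    }

    int position() {
        return pos;
    }
}
```

The skipped bytes still have to be read here, because the checksum covers them, but no per-skip or per-instance scratch array is needed beyond the buffer that already exists.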

Thanks again!

Cheers,
-Greg

On Wed, Feb 17, 2021 at 5:15 PM Robert Muir  wrote:

> Sure, we can always create followups, but we can start with trying to make
> the implementations efficient.
>
> We'd probably want to create more followup issues later anyway, e.g. to
> address any TODOs, and ultimately to fix the API: it is silly to have
> seek(long) and skipBytes(long) that do exactly the same thing.
>
> It is just especially egregious that one of these has a default
> implementation that will allocate buffers and sequentially read+throw away
> up to 2^63-1 bytes, so let's fix that first.
>
> On Wed, Feb 17, 2021 at 7:58 PM Greg Miller  wrote:
>
>> Fair points. I'm going to see if I can carve out some time to take up
>> your suggested approach. Would you suggest using LUCENE-9480 to report back on
>> any progress made? There are a few different ideas captured there, and it
>> sounds like the leading suggestion is to collapse the ideas of DataInput
>> and IndexInput, so I'm not sure if that's the best place to track the
>> suggested approach we've discussed here or not.
>>
>> Thanks again for the discussion!
>>
>> Cheers,
>> -Greg
>>
>> On Wed, Feb 17, 2021 at 4:09 PM Robert Muir  wrote:
>>
>>>
>>>
>>> On Wed, Feb 17, 2021 at 6:53 PM Greg Miller  wrote:
>>>

>>>> Right, I am looking at the code but maybe we're talking about two
>>>> different things, so let me clarify. I agree that there is no concurrency
>>>> issue with the current code and I apologise if my point was confusing. The
>>>> reason skipBytes was made an instance variable as opposed to a static one
>>>> was to *avoid* creating a concurrency issue (which certainly would
>>>> exist if it had been made static). Making it an instance variable is
>>>> wasteful for GC though, no? My suggestion of moving to a threadlocal hits a
>>>> "happy medium" where we're not allocating these silly buffers for each
>>>> DataInput instance but making sure each thread has a separate one. Does
>>>> this make more sense now?


>>> Adding a threadlocal isn't a happy medium here. The first thing I see is
>>> thread churn issues (apps that use non-fixed threadpools). See javadocs for
>>> CloseableThreadLocal for more information. We can't even use that
>>> CloseableThreadLocal hack here, because nobody calls close() on clones of
>>> IndexInputs. So threadlocal would seriously make matters worse for some
>>> apps using the library, all for something that should be a "+=" :)
>>>
>>> I'm not trying to suggest perfection, just start by deprecating the slow
>>> impl, make it abstract and "hoist" the responsibility of implementation
>>> upwards to subclasses that are better prepared to handle them. For the
>>> majority use case (e.g. IndexInput), your buffer trivially goes away since
>>> you implemented it with seek(). The ones that are left can remain slow,
>>> that's fine, you speed up 90% easily, and we can see what is needed to fix
>>> the leftovers. But I really don't think these will be difficult either,
>>> e.g. for ChecksumIndexInput we can probably leave it abstract and implement
>>> the skipping directly on BufferedChecksumIndexInput (it already has a
>>> buffer, so it should be able to use that one, and avoid 2 buffers like
>>> today).
>>>
>>


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Robert Muir
Sure, we can always create followups, but we can start with trying to make
the implementations efficient.

We'd probably want to create more followup issues later anyway, e.g. to
address any TODOs, and ultimately to fix the API: it is silly to have
seek(long) and skipBytes(long) that do exactly the same thing.

It is just especially egregious that one of these has a default
implementation that will allocate buffers and sequentially read+throw away
up to 2^63-1 bytes, so let's fix that first.
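
To make the contrast concrete: when an input can seek, skipping needs no buffer at all, only a position update. The sketch below uses hypothetical classes (`SketchInput`, `SketchSeekableInput`) over an in-memory `byte[]`; it is not Lucene's `DataInput` or `IndexInput`, and it only illustrates the idea of leaving `skipBytes` abstract in the base class and implementing it with `seek` where that is possible.

```java
/** Hypothetical sequential input: skipBytes is abstract, so each subclass picks a strategy. */
abstract class SketchInput {
    abstract byte readByte();

    abstract void skipBytes(long numBytes);
}

/** Hypothetical random-access input: skipping is position arithmetic, with no scratch buffer. */
final class SketchSeekableInput extends SketchInput {
    private final byte[] data;
    private long pos;

    SketchSeekableInput(byte[] data) {
        this.data = data;
    }

    long getFilePointer() {
        return pos;
    }

    void seek(long target) {
        if (target < 0 || target > data.length) {
            throw new IllegalArgumentException("seek target out of range: " + target);
        }
        pos = target;
    }

    @Override
    byte readByte() {
        if (pos >= data.length) {
            throw new IllegalStateException("read past end of input");
        }
        return data[(int) pos++];
    }

    @Override
    void skipBytes(long numBytes) {
        if (numBytes < 0) {
            throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
        }
        // Nothing is read or allocated; this is effectively pos += numBytes.
        seek(getFilePointer() + numBytes);
    }
}
```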

On Wed, Feb 17, 2021 at 7:58 PM Greg Miller  wrote:

> Fair points. I'm going to see if I can carve out some time to take up your
> suggested approach. Would you suggest using LUCENE-9480 to report back on any
> progress made? There are a few different ideas captured there, and it
> sounds like the leading suggestion is to collapse the ideas of DataInput
> and IndexInput, so I'm not sure if that's the best place to track the
> suggested approach we've discussed here or not.
>
> Thanks again for the discussion!
>
> Cheers,
> -Greg
>
> On Wed, Feb 17, 2021 at 4:09 PM Robert Muir  wrote:
>
>>
>>
>> On Wed, Feb 17, 2021 at 6:53 PM Greg Miller  wrote:
>>
>>>
>>> Right, I am looking at the code but maybe we're talking about two
>>> different things, so let me clarify. I agree that there is no concurrency
>>> issue with the current code and I apologise if my point was confusing. The
>>> reason skipBytes was made an instance variable as opposed to a static one
>>> was to *avoid* creating a concurrency issue (which certainly would
>>> exist if it had been made static). Making it an instance variable is
>>> wasteful for GC though, no? My suggestion of moving to a threadlocal hits a
>>> "happy medium" where we're not allocating these silly buffers for each
>>> DataInput instance but making sure each thread has a separate one. Does
>>> this make more sense now?
>>>
>>>
>> Adding a threadlocal isn't a happy medium here. The first thing I see is
>> thread churn issues (apps that use non-fixed threadpools). See javadocs for
>> CloseableThreadLocal for more information. We can't even use that
>> CloseableThreadLocal hack here, because nobody calls close() on clones of
>> IndexInputs. So threadlocal would seriously make matters worse for some
>> apps using the library, all for something that should be a "+=" :)
>>
>> I'm not trying to suggest perfection, just start by deprecating the slow
>> impl, make it abstract and "hoist" the responsibility of implementation
>> upwards to subclasses that are better prepared to handle them. For the
>> majority use case (e.g. IndexInput), your buffer trivially goes away since
>> you implemented it with seek(). The ones that are left can remain slow,
>> that's fine, you speed up 90% easily, and we can see what is needed to fix
>> the leftovers. But I really don't think these will be difficult either,
>> e.g. for ChecksumIndexInput we can probably leave it abstract and implement
>> the skipping directly on BufferedChecksumIndexInput (it already has a
>> buffer, so it should be able to use that one, and avoid 2 buffers like
>> today).
>>
>


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Greg Miller
Fair points. I'm going to see if I can carve out some time to take up your
suggested approach. Would you suggest using LUCENE-9480 to report back on any
progress made? There are a few different ideas captured there, and it
sounds like the leading suggestion is to collapse the ideas of DataInput
and IndexInput, so I'm not sure if that's the best place to track the
suggested approach we've discussed here or not.

Thanks again for the discussion!

Cheers,
-Greg

On Wed, Feb 17, 2021 at 4:09 PM Robert Muir  wrote:

>
>
> On Wed, Feb 17, 2021 at 6:53 PM Greg Miller  wrote:
>
>>
>> Right, I am looking at the code but maybe we're talking about two
>> different things, so let me clarify. I agree that there is no concurrency
>> issue with the current code and I apologise if my point was confusing. The
>> reason skipBytes was made an instance variable as opposed to a static one
>> was to *avoid* creating a concurrency issue (which certainly would exist
>> if it had been made static). Making it an instance variable is wasteful for
>> GC though, no? My suggestion of moving to a threadlocal hits a "happy
>> medium" where we're not allocating these silly buffers for each DataInput
>> instance but making sure each thread has a separate one. Does this make
>> more sense now?
>>
>>
> Adding a threadlocal isn't a happy medium here. The first thing I see is
> thread churn issues (apps that use non-fixed threadpools). See javadocs for
> CloseableThreadLocal for more information. We can't even use that
> CloseableThreadLocal hack here, because nobody calls close() on clones of
> IndexInputs. So threadlocal would seriously make matters worse for some
> apps using the library, all for something that should be a "+=" :)
>
> I'm not trying to suggest perfection, just start by deprecating the slow
> impl, make it abstract and "hoist" the responsibility of implementation
> upwards to subclasses that are better prepared to handle them. For the
> majority use case (e.g. IndexInput), your buffer trivially goes away since
> you implemented it with seek(). The ones that are left can remain slow,
> that's fine, you speed up 90% easily, and we can see what is needed to fix
> the leftovers. But I really don't think these will be difficult either,
> e.g. for ChecksumIndexInput we can probably leave it abstract and implement
> the skipping directly on BufferedChecksumIndexInput (it already has a
> buffer, so it should be able to use that one, and avoid 2 buffers like
> today).
>


[jira] [Updated] (PYLUCENE-56) Can't build JCC on Mac

2021-02-17 Thread Clem Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/PYLUCENE-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clem Wang updated PYLUCENE-56:
--
Description: 
This is puzzling to me, as I can't figure out how the failing gcc command line 
gets most of its arguments.  I found one problem in setup.py, but I can't 
find the error-causing strings anywhere in the files in the jcc directory or 
its subdirectories.

 

Steps:

 
{code:java}
svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc
Checked out revision 1886645.
cd jcc
python setup.py build
 
{code}
...

/opt/local/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code 
-fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
{color:#ff}*-iwithsysroot/*{color}System/Library/Frameworks/System.framework/PrivateHeaders
 
{color:#ff}*-iwithsysroot/*{color}Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
 {color:#ff}*-arch arm64*{color} -arch x86_64 
-I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp -dynamiclib -D_jcc_lib 
-DJCC_VER="3.8" 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
 -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
 -c jcc3/sources/jcc.cpp -o 
build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
-fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11  
{color:#ff}*-stdlib=libc++*{color}

 

which generates 4 errors due to the parts marked above in red bold:

 

*gcc:* *error:* this compiler does not support arm64

*gcc:* *error:* unrecognized command-line option 
'*-iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders*'

*gcc:* *error:* unrecognized command-line option 
'*-iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers*'

*gcc:* *error:* unrecognized command-line option '*-stdlib=libc++*'

error: command '/opt/local/bin/gcc' failed with exit status 1

 

Obviously, the contradictory 

*-arch arm64*

needs to be removed but I can't find arm64 anywhere.

 

The unnecessary

*-stdlib=libc++*

 

can be removed from setup.py:
{code:java}
CFLAGS = {
 'darwin': ['-fno-strict-aliasing', '-Wno-write-strings',
 '-mmacosx-version-min=10.9', '-std=c++11', '-stdlib=libc++'],{code}
 

After poking around, I figured out that gcc uses
{code:java}
-I {code}
not
{code:java}
-i {code}
for includes.

 

Making these modifications (and adding
{code:java}
-Wno-attributes{code}
to remove warnings)

 

I came up with this line that does successfully compile without errors:

/opt/local/bin/gcc -Wno-attributes -Wno-unused-result -Wsign-compare 
-Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
-I/System/Library/Frameworks/System.framework/PrivateHeaders 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
 -arch x86_64 -I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp 
-dynamiclib -D_jcc_lib -DJCC_VER="3.8" 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
 -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
 -c jcc3/sources/jcc.cpp -o 
build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
-fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11

 

But other than removing  *'-stdlib=libc++'*  from the setup.py file, I have no 
idea how to fix the remaining compile errors, which come from the command line 
that setup.py somehow generates.

  was:
This is puzzling to me, as I can't figure out where the failing gcc command line 
gets (most of) its arguments.  I found one problem in setup.py, but I can't 
find the error-causing strings anywhere in the files in the jcc directory or 
its subdirectories.

 

Steps:

 
{code:java}
svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc
Checked out revision 1886645.
cd jcc
python setup.py build
 
{code}
...

/opt/local/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code 
-fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
{color:#FF}*-iwithsysroot/*{color}System/Library/Frameworks/System.framework/PrivateHeaders
 
{color:#FF}*-iwithsysroot/*{color}Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
 {color:#FF}*-arch arm64*{color} -arch x86_64 
-I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp -dynamiclib -D_jcc_lib 
-DJCC_VER="3.8" 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 

[jira] [Created] (PYLUCENE-56) Can't build JCC on Mac

2021-02-17 Thread Clem Wang (Jira)
Clem Wang created PYLUCENE-56:
-

 Summary: Can't build JCC on Mac
 Key: PYLUCENE-56
 URL: https://issues.apache.org/jira/browse/PYLUCENE-56
 Project: PyLucene
  Issue Type: Bug
 Environment: MacOSX 10.15.7, Intel Core i7, Python 3.8.2, gcc 
(Homebrew GCC 10.2.0_3) 10.2.0

following the instructions here:
http://lucene.apache.org/pylucene/jcc/install.html

svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc

Checked out revision 1886645.

Reporter: Clem Wang


This is puzzling to me, as I can't figure out where the failing gcc command line 
gets (most of) its arguments.  I found one problem in setup.py, but I can't 
find the error-causing strings anywhere in the files in the jcc directory or 
its subdirectories.

 

Steps:

 
{code:java}
svn co https://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc jcc
Checked out revision 1886645.
cd jcc
python setup.py build
 
{code}
...

/opt/local/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code 
-fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
{color:#FF}*-iwithsysroot/*{color}System/Library/Frameworks/System.framework/PrivateHeaders
 
{color:#FF}*-iwithsysroot/*{color}Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
 {color:#FF}*-arch arm64*{color} -arch x86_64 
-I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp -dynamiclib -D_jcc_lib 
-DJCC_VER="3.8" 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
 -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
 -c jcc3/sources/jcc.cpp -o 
build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
-fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11  
{color:#FF}*-stdlib=libc++*{color}

 

which generates 4 errors due to the parts marked above in red bold:

 

*gcc:* *error:* this compiler does not support arm64

*gcc:* *error:* unrecognized command-line option 
'*-iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders*'

*gcc:* *error:* unrecognized command-line option 
'*-iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers*'

*gcc:* *error:* unrecognized command-line option '*-stdlib=libc++*'

error: command '/opt/local/bin/gcc' failed with exit status 1

 

Obviously, the contradictory

*-arch arm64*

needs to be removed but I can't find arm64 anywhere.

 

 

The unnecessary

*-stdlib=libc++*

 

can be removed from setup.py:

CFLAGS = {
 'darwin': ['-fno-strict-aliasing', '-Wno-write-strings',
 '-mmacosx-version-min=10.9', '-std=c++11', 
{color:#FF}*'-stdlib=libc++'*{color}],

 

After poking around, I figured out that gcc uses
{code:java}
-I {code}
not
{code:java}
-i {code}
for includes.

 

Making these modifications (and adding
{code:java}
-Wno-attributes{code}
to remove warnings)

 

I came up with this line that does successfully compile without errors:

/opt/local/bin/gcc -Wno-attributes -Wno-unused-result -Wsign-compare 
-Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall 
-I/System/Library/Frameworks/System.framework/PrivateHeaders 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers
 -arch x86_64 -I/usr/local/opt/libomp/include -Xpreprocessor -fopenmp 
-dynamiclib -D_jcc_lib -DJCC_VER="3.8" 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include 
-I/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/include/darwin
 -I_jcc3 -Ijcc3/sources -I/Users/cwang/3.7/include 
-I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8
 -c jcc3/sources/jcc.cpp -o 
build/temp.macosx-10.14.6-x86_64-3.8/jcc3/sources/jcc.o -DPYTHON 
-fno-strict-aliasing -Wno-write-strings -mmacosx-version-min=10.9 -std=c++11

 

But other than removing  *'-stdlib=libc++'*  from the setup.py file, I have no 
idea how to fix the remaining compile errors, which come from the command line 
that setup.py somehow generates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Robert Muir
On Wed, Feb 17, 2021 at 6:53 PM Greg Miller  wrote:

>
> Right, I am looking at the code but maybe we're talking about two
> different things, so let me clarify. I agree that there is no concurrency
> issue with the current code and I apologise if my point was confusing. The
> reason skipBytes was made an instance variable as opposed to a static one
> was to *avoid* creating a concurrency issue (which certainly would exist
> if it had been made static). Making it an instance variable is wasteful for
> GC though, no? My suggestion of moving to a threadlocal hits a "happy
> medium" where we're not allocating these silly buffers for each DataInput
> instance but making sure each thread has a separate one. Does this make
> more sense now?
>
>
Adding a threadlocal isn't a happy medium here. The first thing I see is
thread churn issues (apps that use non-fixed threadpools). See javadocs for
CloseableThreadLocal for more information. We can't even use that
CloseableThreadLocal hack here, because nobody calls close() on clones of
IndexInputs. So threadlocal would seriously make matters worse for some
apps using the library, all for something that should be a "+=" :)

I'm not trying to suggest perfection, just start by deprecating the slow
impl, make it abstract and "hoist" the responsibility of implementation
upwards to subclasses that are better prepared to handle them. For the
majority use case (e.g. IndexInput), your buffer trivially goes away since
you implemented it with seek(). The ones that are left can remain slow,
that's fine, you speed up 90% easily, and we can see what is needed to fix
the leftovers. But I really don't think these will be difficult either,
e.g. for ChecksumIndexInput we can probably leave it abstract and implement
the skipping directly on BufferedChecksumIndexInput (it already has a
buffer, so it should be able to use that one, and avoid 2 buffers like
today).
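
A rough sketch of the "reuse the buffer it already has" idea for a checksumming input, under stated assumptions: `SketchBufferedChecksumInput` is a hypothetical class written for this illustration (it is not Lucene's BufferedChecksumIndexInput), it wraps a plain `InputStream`, and the 256-byte size is arbitrary. Skipped bytes are read straight into the same pending buffer that batches checksum updates, so no second scratch buffer is needed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical illustration only; NOT Lucene's BufferedChecksumIndexInput.
final class SketchBufferedChecksumInput {
  private final InputStream in;
  private final CRC32 crc = new CRC32();
  // Bytes that were consumed but not yet fed to the checksum.
  private final byte[] pending = new byte[256]; // arbitrary size for the sketch
  private int upto;

  SketchBufferedChecksumInput(InputStream in) {
    this.in = in;
  }

  byte readByte() throws IOException {
    int b = in.read();
    if (b < 0) {
      throw new EOFException("read past EOF");
    }
    if (upto == pending.length) {
      flush();
    }
    pending[upto++] = (byte) b;
    return (byte) b;
  }

  // Skipped bytes still count towards the checksum, so they are read
  // directly into the free tail of the pending buffer.
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    long remaining = numBytes;
    while (remaining > 0) {
      if (upto == pending.length) {
        flush();
      }
      int step = (int) Math.min(pending.length - upto, remaining);
      int n = in.read(pending, upto, step);
      if (n < 0) {
        throw new EOFException("skip past EOF");
      }
      upto += n;
      remaining -= n;
    }
  }

  long getChecksum() {
    flush();
    return crc.getValue();
  }

  private void flush() {
    crc.update(pending, 0, upto);
    upto = 0;
  }
}
```

With this shape, reading one byte and then skipping the rest of a stream yields the same CRC32 as checksumming the whole stream in one go.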


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Greg Miller
Thanks Robert for the detailed response. Let me try to address a few points
inline if I may...

> There is zero concurrency issue. I think you are reading discussion about
> old patches that didn't get committed, ignore that and look at the code.
> Look at DataInput javadoc: "DataInput may only be used from one thread,
> because it is not thread safe (it keeps internal state like file position)."
> Look at IndexInput javadoc: "IndexInput may only be used from one thread,
> because it is not thread safe (it keeps internal state like file position)."
> This applies to ChecksumIndexInput, too, it is a subclass.


Right, I am looking at the code but maybe we're talking about two
different things, so let me clarify. I agree that there is no concurrency
issue with the current code and I apologise if my point was confusing. The
reason skipBytes was made an instance variable as opposed to a static one
was to *avoid* creating a concurrency issue (which certainly would exist if
it had been made static). Making it an instance variable is wasteful for GC
though, no? My suggestion of moving to a threadlocal hits a "happy medium"
where we're not allocating these silly buffers for each DataInput instance
but making sure each thread has a separate one. Does this make more sense
now?

> Let's try to avoid adding more complex logic to the situation. Base classes
> like DataInput shouldn't have "slow" methods that do things like read bytes
> and copy them around just to increment a position pointer... that is not
> good. It is supposed to just have some simple decoding helpers (like vint
> decode). So it is at the wrong "level" to have such a method, as it can't
> do it in an efficient way.


Sure, agree. But the more I think about the Jira you pointed me to, the
more I think we're talking about two separate problems (that could
certainly have a common solution). I agree that the current approach of
skipping bytes in general is wasteful and I don't disagree that it
shouldn't be in DataInput (or that DataInput and IndexInput should be
collapsed). Seems right. But... there's a potential needle-moving change by
just reducing garbage creation here. I just want to make sure we don't let
perfection block progress here.

> A good start would be to rename DataInput.skipBytes() to a deprecated
> DataInput.skipBytesSlowly() and add a new abstract DataInput.skipBytes().
> Now you just have to implement skipBytes() with all the subclasses, but you
> can always start with skipBytesSlowly() // TODO: fix this, so it allows
> incremental progress. For IndexInput, you can make skipBytes() just call
> seek(), that is an easy win. ByteArrayDataInput is already good to go,
> perhaps it is the only one with a correct implementation :)

I like this thought. Let me see if I can run with your suggestion and come
up with an incremental approach to reap some short-term GC gains while
moving in a better direction overall. Just need to noodle on it a bit...

Thanks again for the discussion!

Cheers,
-Greg

On Wed, Feb 17, 2021 at 3:34 PM Robert Muir  wrote:

> There is zero concurrency issue. I think you are reading discussion about
> old patches that didn't get committed, ignore that and look at the code.
> Look at DataInput javadoc: "DataInput may only be used from one thread,
> because it is not thread safe (it keeps internal state like file position)."
> Look at IndexInput javadoc: "IndexInput may only be used from one thread,
> because it is not thread safe (it keeps internal state like file position)."
> This applies to ChecksumIndexInput, too, it is a subclass.
>
> Let's try to avoid adding more complex logic to the situation. Base
> classes like DataInput shouldn't have "slow" methods that do things like
> read bytes and copy them around just to increment a position pointer...
> that is not good. It is supposed to just have some simple decoding helpers
> (like vint decode). So it is at the wrong "level" to have such a method, as
> it can't do it in an efficient way.
>
> A good start would be to rename DataInput.skipBytes() to a deprecated
> DataInput.skipBytesSlowly() and add a new abstract DataInput.skipBytes().
> Now you just have to implement skipBytes() with all the subclasses, but you
> can always start with skipBytesSlowly() // TODO: fix this, so it allows
> incremental progress. For IndexInput, you can make skipBytes() just call
> seek(), that is an easy win. ByteArrayDataInput is already good to go,
> perhaps it is the only one with a correct implementation :)
>
>
> On Wed, Feb 17, 2021 at 5:10 PM Greg Miller  wrote:
>
>> Thanks for pointing me to the existing issue! I think I generally agree
>> that no threadlocal would be needed if the functionality of skipBytes() and
>> seek() were collapsed as per the issue you pointed me to. In the current
>> state though, concurrency causes problems for delegating implementations of
>> DataInput, specifically ChecksumIndexInput (as detailed in LUCENE-5583).
>>
>> I'll spend a 

Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Robert Muir
There is zero concurrency issue. I think you are reading discussion about
old patches that didn't get committed, ignore that and look at the code.
Look at DataInput javadoc: "DataInput may only be used from one thread,
because it is not thread safe (it keeps internal state like file position)."
Look at IndexInput javadoc: "IndexInput may only be used from one thread,
because it is not thread safe (it keeps internal state like file position)."
This applies to ChecksumIndexInput, too, it is a subclass.

Let's try to avoid adding more complex logic to the situation. Base classes
like DataInput shouldn't have "slow" methods that do things like read bytes
and copy them around just to increment a position pointer... that is not
good. It is supposed to just have some simple decoding helpers (like vint
decode). So it is at the wrong "level" to have such a method, as it can't
do it in an efficient way.

A good start would be to rename DataInput.skipBytes() to a deprecated
DataInput.skipBytesSlowly() and add a new abstract DataInput.skipBytes().
Now you just have to implement skipBytes() with all the subclasses, but you
can always start with skipBytesSlowly() // TODO: fix this, so it allows
incremental progress. For IndexInput, you can make skipBytes() just call
seek(), that is an easy win. ByteArrayDataInput is already good to go,
perhaps it is the only one with a correct implementation :)
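
The incremental plan above might look roughly like the following sketch. Everything here is an assumption made for illustration: `SketchDataInput`, `SketchIndexInput` and `SketchByteArrayInput` are hypothetical stand-ins rather than Lucene's actual classes, and the 1024-byte scratch size is arbitrary.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
abstract class SketchDataInput {
  private static final int SKIP_BUFFER_SIZE = 1024; // arbitrary size for the sketch
  private byte[] skipBuffer; // allocated lazily, only if the slow path is used

  abstract void readBytes(byte[] b, int offset, int len) throws IOException;

  // New abstract method: every subclass must decide how it skips.
  abstract void skipBytes(long numBytes) throws IOException;

  // Old read-and-discard behaviour, kept under a deprecated name so that
  // subclasses can delegate to it until they get a proper implementation.
  @Deprecated
  void skipBytesSlowly(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    if (skipBuffer == null) {
      skipBuffer = new byte[SKIP_BUFFER_SIZE];
    }
    long skipped = 0;
    while (skipped < numBytes) {
      int step = (int) Math.min(SKIP_BUFFER_SIZE, numBytes - skipped);
      readBytes(skipBuffer, 0, step);
      skipped += step;
    }
  }
}

// Seekable inputs can skip by moving the file pointer; no buffer is involved.
abstract class SketchIndexInput extends SketchDataInput {
  abstract long getFilePointer();

  abstract void seek(long pos) throws IOException;

  @Override
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    seek(getFilePointer() + numBytes);
  }
}

// A tiny in-memory input so the sketch can be exercised.
final class SketchByteArrayInput extends SketchIndexInput {
  private final byte[] bytes;
  private int pos;

  SketchByteArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  long getFilePointer() {
    return pos;
  }

  @Override
  void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > bytes.length) {
      throw new EOFException("seek past EOF: " + newPos);
    }
    pos = (int) newPos;
  }

  @Override
  void readBytes(byte[] b, int offset, int len) throws IOException {
    if (len > bytes.length - pos) {
      throw new EOFException("read past EOF");
    }
    System.arraycopy(bytes, pos, b, offset, len);
    pos += len;
  }
}
```

A subclass that cannot seek could initially implement `skipBytes()` by calling `skipBytesSlowly()` with a TODO, which keeps the change incremental.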


On Wed, Feb 17, 2021 at 5:10 PM Greg Miller  wrote:

> Thanks for pointing me to the existing issue! I think I generally agree
> that no threadlocal would be needed if the functionality of skipBytes() and
> seek() were collapsed as per the issue you pointed me to. In the current
> state though, concurrency causes problems for delegating implementations of
> DataInput, specifically ChecksumIndexInput (as detailed in LUCENE-5583).
>
> I'll spend a little more time thinking about LUCENE-9480 and comment
> there. Seems like a better approach to eliminate the need for reading in
> bytes just to skip, assuming it can be made to play nice with checksums,
> etc. Thanks again!
>
> Cheers,
> -Greg
>
> On Wed, Feb 17, 2021 at 1:47 PM Robert Muir  wrote:
>
>> See this already-open issue:
>> https://issues.apache.org/jira/browse/LUCENE-9480
>>
>> No threadlocals are necessary. DataInput/IndexInput are already intended
>> for use by one thread.
>>
>> For e.g. mmapdirectory this should be "+=". no buffer is required.
>>
>> On Wed, Feb 17, 2021 at 4:15 PM Greg Miller  wrote:
>>
>>> Hi folks-
>>>
>>> I work on a Lucene-based search system and we recently added Java Flight
>>> Recorder to our benchmark tooling. When looking through results, we found
>>> DataInput#skipBytes() to be a top contributor to garbage creation. We're
>>> using Lucene84SkipReader and always skipping over Impacts in our use-case.
>>> At first glance, it appeared pretty obvious that creating new instances of
>>> the skipBuffer byte[] for each instance of DataInput was the culprit.
>>>
>>> It looks like alternatives were discussed originally in LUCENE-5583, one
>>> of which being a thread-local implementation of the skip buffer (since it
>>> can't be a static field without breaking delegating subclasses, like
>>> ChecksumIndexInput). At the time, a thread-local was advised against by
>>> Uwe due to GC expense, but in our benchmarks, bringing in a thread-local
>>> implementation reduced overall GC time by ~7%.
>>>
>>> I'd like to revisit this implementation decision and discuss ways in
>>> which we can reduce this unnecessary garbage creation. It seems like moving
>>> to a thread-local implementation is a win here, but I'd love to hear more
>>> thoughts or alternative suggestions from the group. I'm new to this
>>> community, so I'm not sure the best way to proceed. Should I open a Jira
>>> issue as a next step? Thanks in advance!
>>>
>>> Cheers,
>>> -Greg
>>>
>>


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Gus Heck
Congratulations :)

On Wed, Feb 17, 2021 at 5:42 PM Tomás Fernández Löbbe 
wrote:

> Congratulations Mike!
>
> On Wed, Feb 17, 2021 at 2:42 PM Steve Rowe  wrote:
>
>> Congrats Mike!
>>
>> --
>> Steve
>>
>> > On Feb 17, 2021, at 4:31 PM, Anshum Gupta 
>> wrote:
>> >
>> > Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
>> President position.
>> >
>> > This year we nominated and elected Michael Sokolov as the Chair, a
>> decision that the board approved in its February 2021 meeting.
>> >
>> > Congratulations, Mike!
>> >
>> > --
>> > Anshum Gupta
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Tomás Fernández Löbbe
Congratulations Mike!

On Wed, Feb 17, 2021 at 2:42 PM Steve Rowe  wrote:

> Congrats Mike!
>
> --
> Steve
>
> > On Feb 17, 2021, at 4:31 PM, Anshum Gupta 
> wrote:
> >
> > Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
> President position.
> >
> > This year we nominated and elected Michael Sokolov as the Chair, a
> decision that the board approved in its February 2021 meeting.
> >
> > Congratulations, Mike!
> >
> > --
> > Anshum Gupta
>


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Steve Rowe
Congrats Mike!

--
Steve

> On Feb 17, 2021, at 4:31 PM, Anshum Gupta  wrote:
> 
> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice 
> President position.
> 
> This year we nominated and elected Michael Sokolov as the Chair, a decision 
> that the board approved in its February 2021 meeting.
> 
> Congratulations, Mike!
> 
> -- 
> Anshum Gupta





Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread David Smiley
Congratulations Mike!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Feb 17, 2021 at 4:32 PM Anshum Gupta  wrote:

> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
> President position.
>
> This year we nominated and elected Michael Sokolov as the Chair, a
> decision that the board approved in its February 2021 meeting.
>
> Congratulations, Mike!
>
> --
> Anshum Gupta
>


ZkTestServer Watch limit violations

2021-02-17 Thread David Smiley
I've noticed that it's quite common for a SolrCloud based test to conclude
with warnings about "Watch limit violations".  I don't know how to
interpret these violations; it's normal to get them. Can someone offer
insights as to what this matter is about and what we ought to do about it?

63605 WARN  (ZkTestServer Run Thread) [ ] o.a.s.c.ZkTestServer Watch limit violations:
Maximum concurrent create/delete watches above limit:

4 /solr/aliases.json
4 /solr/clusterprops.json
3 /solr/packages.json
3 /solr/security.json
2 /solr/collections/ping_test/terms/shard2
2 /solr/collections/ping_test/terms/shard1
2 /solr/configs/conf

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Greg Miller
Thanks for pointing me to the existing issue! I think I generally agree
that no threadlocal would be needed if the functionality of skipBytes() and
seek() were collapsed as per the issue you pointed me to. In the current
state though, concurrency causes problems for delegating implementations of
DataInput, specifically ChecksumIndexInput (as detailed in LUCENE-5583).

I'll spend a little more time thinking about LUCENE-9480 and comment there.
Seems like a better approach to eliminate the need for reading in bytes
just to skip, assuming it can be made to play nice with checksums, etc.
Thanks again!

Cheers,
-Greg

On Wed, Feb 17, 2021 at 1:47 PM Robert Muir  wrote:

> See this already-open issue:
> https://issues.apache.org/jira/browse/LUCENE-9480
>
> No threadlocals are necessary. DataInput/IndexInput are already intended
> for use by one thread.
>
> For e.g. mmapdirectory this should be "+=". no buffer is required.
>
> On Wed, Feb 17, 2021 at 4:15 PM Greg Miller  wrote:
>
>> Hi folks-
>>
>> I work on a Lucene-based search system and we recently added Java Flight
>> Recorder to our benchmark tooling. When looking through results, we found
>> DataInput#skipBytes() to be a top contributor to garbage creation. We're
>> using Lucene84SkipReader and always skipping over Impacts in our use-case.
>> At first glance, it appeared pretty obvious that creating new instances of
>> the skipBuffer byte[] for each instance of DataInput was the culprit.
>>
>> It looks like alternatives were discussed originally in LUCENE-5583, one
>> of which being a thread-local implementation of the skip buffer (since it
>> can't be a static field without breaking delegating subclasses, like
>> ChecksumIndexInput). At the time, a thread-local was advised against by
>> Uwe due to GC expense, but in our benchmarks, bringing in a thread-local
>> implementation reduced overall GC time by ~7%.
>>
>> I'd like to revisit this implementation decision and discuss ways in
>> which we can reduce this unnecessary garbage creation. It seems like moving
>> to a thread-local implementation is a win here, but I'd love to hear more
>> thoughts or alternative suggestions from the group. I'm new to this
>> community, so I'm not sure the best way to proceed. Should I open a Jira
>> issue as a next step? Thanks in advance!
>>
>> Cheers,
>> -Greg
>>
>


Re: GC improvement when skipping bytes in DataInput

2021-02-17 Thread Robert Muir
See this already-open issue:
https://issues.apache.org/jira/browse/LUCENE-9480

No threadlocals are necessary. DataInput/IndexInput are already intended
for use by one thread.

For e.g. mmapdirectory this should be "+=". no buffer is required.
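
As a minimal sketch of the "+=" point, assuming an input backed by a `java.nio.ByteBuffer`: `SketchByteBufferInput` is a hypothetical class for illustration, not the actual MMapDirectory code. Skipping only advances the buffer position after a bounds check.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for a memory-mapped input; NOT Lucene's MMapDirectory code.
final class SketchByteBufferInput {
  private final ByteBuffer buffer;

  SketchByteBufferInput(ByteBuffer source) {
    // duplicate() shares the bytes but gives this input its own position.
    this.buffer = source.duplicate();
  }

  byte readByte() throws IOException {
    if (!buffer.hasRemaining()) {
      throw new EOFException("read past EOF");
    }
    return buffer.get();
  }

  long getFilePointer() {
    return buffer.position();
  }

  // Skipping is just "position += numBytes" plus bounds checks; no bytes are copied.
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    if (numBytes > buffer.remaining()) {
      throw new EOFException("skip past EOF");
    }
    buffer.position(buffer.position() + (int) numBytes);
  }
}
```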

On Wed, Feb 17, 2021 at 4:15 PM Greg Miller  wrote:

> Hi folks-
>
> I work on a Lucene-based search system and we recently added Java Flight
> Recorder to our benchmark tooling. When looking through results, we found
> DataInput#skipBytes() to be a top contributor to garbage creation. We're
> using Lucene84SkipReader and always skipping over Impacts in our use-case.
> At first glance, it appeared pretty obvious that creating new instances of
> the skipBuffer byte[] for each instance of DataInput was the culprit.
>
> It looks like alternatives were discussed originally in LUCENE-5583, one
> of which being a thread-local implementation of the skip buffer (since it
> can't be a static field without breaking delegating subclasses, like
> ChecksumIndexInput). At the time, a thread-local was advised against by
> Uwe due to GC expense, but in our benchmarks, bringing in a thread-local
> implementation reduced overall GC time by ~7%.
>
> I'd like to revisit this implementation decision and discuss ways in which
> we can reduce this unnecessary garbage creation. It seems like moving to a
> thread-local implementation is a win here, but I'd love to hear more
> thoughts or alternative suggestions from the group. I'm new to this
> community, so I'm not sure the best way to proceed. Should I open a Jira
> issue as a next step? Thanks in advance!
>
> Cheers,
> -Greg
>


Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Anshum Gupta
Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
President position.

This year we nominated and elected Michael Sokolov as the Chair, a decision
that the board approved in its February 2021 meeting.

Congratulations, Mike!

-- 
Anshum Gupta


GC improvement when skipping bytes in DataInput

2021-02-17 Thread Greg Miller
Hi folks-

I work on a Lucene-based search system and we recently added Java Flight
Recorder to our benchmark tooling. When looking through results, we found
DataInput#skipBytes() to be a top contributor to garbage creation. We're
using Lucene84SkipReader and always skipping over Impacts in our use-case.
At first glance, it appeared pretty obvious that creating new instances of
the skipBuffer byte[] for each instance of DataInput was the culprit.

It looks like alternatives were discussed originally in LUCENE-5583, one of
which being a thread-local implementation of the skip buffer (since it can't
be a static field without breaking delegating subclasses, like
ChecksumIndexInput). At the time, a thread-local was advised against by Uwe
due to GC expense, but in our benchmarks, bringing in a thread-local
implementation reduced overall GC time by ~7%.

I'd like to revisit this implementation decision and discuss ways in which
we can reduce this unnecessary garbage creation. It seems like moving to a
thread-local implementation is a win here, but I'd love to hear more
thoughts or alternative suggestions from the group. I'm new to this
community, so I'm not sure the best way to proceed. Should I open a Jira
issue as a next step? Thanks in advance!

Cheers,
-Greg
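
For readers who want to picture the thread-local variant being proposed, here is a minimal sketch under stated assumptions. `SketchThreadLocalSkipper` is a hypothetical class wrapping a plain `InputStream`; it is not the patch that was benchmarked, and the 1024-byte size is arbitrary. The point is only that the scratch buffer is allocated once per thread instead of once per input instance.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; NOT Lucene code and not the benchmarked patch.
final class SketchThreadLocalSkipper {
  private static final int SKIP_BUFFER_SIZE = 1024; // arbitrary size for the sketch

  // One scratch buffer per thread, shared by every instance used on that thread.
  private static final ThreadLocal<byte[]> SKIP_BUFFER =
      ThreadLocal.withInitial(() -> new byte[SKIP_BUFFER_SIZE]);

  private final InputStream in;

  SketchThreadLocalSkipper(InputStream in) {
    this.in = in;
  }

  byte readByte() throws IOException {
    int b = in.read();
    if (b < 0) {
      throw new EOFException("read past EOF");
    }
    return (byte) b;
  }

  // Read-and-discard skipping, but without a per-instance byte[] allocation.
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
    }
    byte[] scratch = SKIP_BUFFER.get();
    long remaining = numBytes;
    while (remaining > 0) {
      int step = (int) Math.min(scratch.length, remaining);
      int n = in.read(scratch, 0, step);
      if (n < 0) {
        throw new EOFException("skip past EOF");
      }
      remaining -= n;
    }
  }

  // Exposed only so the sharing behaviour can be observed in a test.
  static byte[] scratchForCurrentThread() {
    return SKIP_BUFFER.get();
  }
}
```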


Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-17 Thread Houston Putman
SUCCESS! [1:01:43.630010]

+1 (binding)

On Wed, Feb 17, 2021 at 3:05 PM Tomás Fernández Löbbe 
wrote:

> SUCCESS! [1:07:31.079810]
>
> Tested upgrading from 8.7 and saw no problems
>
> +1 (binding)
>
> On Wed, Feb 17, 2021 at 2:58 AM Noble Paul  wrote:
>
>> SUCCESS! [1:04:46.520370]
>>
>> +1 Binding
>>
>> On Wed, Feb 17, 2021 at 1:44 PM Timothy Potter 
>> wrote:
>> >
>> > And I continue to struggle with the python3 command:
>> >
>> > python3 -u dev-tools/scripts/smokeTestRelease.py \
>> >
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>> >
>> > On Tue, Feb 16, 2021 at 7:41 PM Timothy Potter 
>> wrote:
>> > >
>> > > Please vote for release candidate 2 for Lucene/Solr 8.8.1
>> > >
>> > > The artifacts can be downloaded from:
>> > >
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>> > >
>> > > You can run the smoke tester directly with this command:
>> > > python3 -u dev-tools/scripts/smokeTestRelease.py
>> > >
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>> > >
>> > > The vote will be open for at least 72 hours i.e. until 2021-02-20
>> 03:00 UTC.
>> > >
>> > > [ ] +1  approve
>> > > [ ] +0  no opinion
>> > > [ ] -1  disapprove (and reason why)
>> > >
>> > > Here is my +1 SUCCESS! [0:50:07.947952]
>> > >
>> > > Also, as with RC1, in addition to the smoke test, I built a Docker
>> > > image from the RC locally and verified:
>> > >
>> > > a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
>> > > completes successfully w/o any NPEs or weirdness with leader election
>> > > / recoveries.
>> > > b. The base_url property is stored in replica state after the upgrade
>> > > c. A basic client application built with SolrJ 8.7.0 can load cluster
>> > > state info directly from ZK and query the 8.8.1 RC2 servers.
>> > > d. Same client app built with SolrJ 8.8.0 works as well.
>> > >
>> > > As this bug-fix release is primarily needed to address a SolrJ
>> > > back-compat break (SOLR-15145) and unfortunately our smoke tester
>> > > framework does not test for backcompat of older SolrJ against the RC,
>> > > I ask others to please test rolling upgrades of servers (ideally
>> > > multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
>> > > please try client applications that are using an older SolrJ, esp.
>> > > those that load cluster state directly from ZK.
>> > >
>> > > Best regards,
>> > > Tim
>> >
>>
>>
>> --
>> -
>> Noble Paul
>>


Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-17 Thread Tomás Fernández Löbbe
SUCCESS! [1:07:31.079810]

Tested upgrading from 8.7 and saw no problems

+1 (binding)

On Wed, Feb 17, 2021 at 2:58 AM Noble Paul  wrote:

> SUCCESS! [1:04:46.520370]
>
> +1 Binding
>
> On Wed, Feb 17, 2021 at 1:44 PM Timothy Potter 
> wrote:
> >
> > And I continue to struggle with the python3 command:
> >
> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
> >
> > On Tue, Feb 16, 2021 at 7:41 PM Timothy Potter 
> wrote:
> > >
> > > Please vote for release candidate 2 for Lucene/Solr 8.8.1
> > >
> > > The artifacts can be downloaded from:
> > >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
> > >
> > > You can run the smoke tester directly with this command:
> > > python3 -u dev-tools/scripts/smokeTestRelease.py
> > >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
> > >
> > > The vote will be open for at least 72 hours i.e. until 2021-02-20
> 03:00 UTC.
> > >
> > > [ ] +1  approve
> > > [ ] +0  no opinion
> > > [ ] -1  disapprove (and reason why)
> > >
> > > Here is my +1 SUCCESS! [0:50:07.947952]
> > >
> > > Also, as with RC1, in addition to the smoke test, I built a Docker
> > > image from the RC locally and verified:
> > >
> > > a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
> > > completes successfully w/o any NPEs or weirdness with leader election
> > > / recoveries.
> > > b. The base_url property is stored in replica state after the upgrade
> > > c. A basic client application built with SolrJ 8.7.0 can load cluster
> > > state info directly from ZK and query the 8.8.1 RC2 servers.
> > > d. Same client app built with SolrJ 8.8.0 works as well.
> > >
> > > As this bug-fix release is primarily needed to address a SolrJ
> > > back-compat break (SOLR-15145) and unfortunately our smoke tester
> > > framework does not test for backcompat of older SolrJ against the RC,
> > > I ask others to please test rolling upgrades of servers (ideally
> > > multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
> > > please try client applications that are using an older SolrJ, esp.
> > > those that load cluster state directly from ZK.
> > >
> > > Best regards,
> > > Tim
> >
>
>
> --
> -
> Noble Paul
>
>
>


Re: [VOTE] Release Lucene/Solr 8.8.1 RC2

2021-02-17 Thread Noble Paul
SUCCESS! [1:04:46.520370]

+1 Binding

On Wed, Feb 17, 2021 at 1:44 PM Timothy Potter  wrote:
>
> And I continue to struggle with the python3 command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
>
> On Tue, Feb 16, 2021 at 7:41 PM Timothy Potter  wrote:
> >
> > Please vote for release candidate 2 for Lucene/Solr 8.8.1
> >
> > The artifacts can be downloaded from:
> > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
> >
> > You can run the smoke tester directly with this command:
> > python3 -u dev-tools/scripts/smokeTestRelease.py
> > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC2-rev64f3b496bfee762a9d2dbff40700f457f4464dfe
> >
> > The vote will be open for at least 72 hours i.e. until 2021-02-20 03:00 UTC.
> >
> > [ ] +1  approve
> > [ ] +0  no opinion
> > [ ] -1  disapprove (and reason why)
> >
> > Here is my +1 SUCCESS! [0:50:07.947952]
> >
> > Also, as with RC1, in addition to the smoke test, I built a Docker
> > image from the RC locally and verified:
> >
> > a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC
> > completes successfully w/o any NPEs or weirdness with leader election
> > / recoveries.
> > b. The base_url property is stored in replica state after the upgrade
> > c. A basic client application built with SolrJ 8.7.0 can load cluster
> > state info directly from ZK and query the 8.8.1 RC2 servers.
> > d. Same client app built with SolrJ 8.8.0 works as well.
> >
> > As this bug-fix release is primarily needed to address a SolrJ
> > back-compat break (SOLR-15145) and unfortunately our smoke tester
> > framework does not test for backcompat of older SolrJ against the RC,
> > I ask others to please test rolling upgrades of servers (ideally
> > multi-node clusters) running pre-8.8.0 to this RC if possible. Also,
> > please try client applications that are using an older SolrJ, esp.
> > those that load cluster state directly from ZK.
> >
> > Best regards,
> > Tim
>


-- 
-
Noble Paul
