[jira] [Commented] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475643#comment-13475643
 ] 

Gil Tene commented on LUCENE-4482:
--

We're looking into this bug report. Will hopefully report back / resolve it 
soon. [But Michael, please go ahead and report it on our bugzilla as well per 
the above].

[Uwe Schindler wrote:]
 I would run Zing tests, too, but before doing that they should:
 Not rely on strange binary kernel modules that are outdated on
 Ubuntu 12.04.1 LTS. The Jenkins server is running in DMZ so I
 will never ever run it with outdated kernels. They should (if
 they really need a kernel module, which is in my opinion a no-go,
 too) use DKMS and make the kernel module open source, so my kernel
 is also not tainted. Without that I will not support Zing, sorry.
 But I doubt if the kernel module is really needed! Without a
 clear explanation why this is needed on their homepage I don't agree.

This has two parts: one questioning why our loadable module is needed at all, 
and the other relating to its availability for various kernels and Linux 
distros.

1. Why is the ZST (which includes a loadable module) needed for Zing to operate?

One of the Zing JVM's main distinctions is that its C4 garbage collector (aka 
GPGC internally) eliminates garbage collection as a response-time concern for 
enterprise applications. Among other things, C4 relies on rapid manipulation of 
virtual memory and physical memory mappings to maintain continuous operation. 
While the semantics of the manipulations we do are possible using the vanilla 
mmap/mremap/munmap/madvise APIs, the rate at which those operations are 
supported in Linux (and most other OSs) is extremely low, due mostly to the 
historically very conservative approach to in-process TLB invalidation, and due 
partly to issues with multiple page-size manipulations. We're not talking small 
change here: more like 4-6 orders of magnitude for our common operations, 
which is, right now, the difference between a practical and an impractical 
implementation of C4.
You can find a detailed discussion of the difference in metrics for these 
operations at http://tinyurl.com/34ytcvc, and a detailed discussion of C4 in 
our ISMM paper (http://tinyurl.com/94c9btb at the ACM site, or at the Azul site 
http://tinyurl.com/7rydpvo).
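
To make the vanilla-API path concrete, below is a minimal sketch in C (an 
illustration only, not Zing code) of the kind of mmap/mremap/madvise page 
shuffling a C4-style collector would have to issue at a very high rate if it 
relied solely on the stock kernel interfaces; each call can force cross-CPU TLB 
invalidation, which is where those 4-6 orders of magnitude go. The 2MB region 
size and the overall flow are illustrative assumptions.

{noformat}
/* Hedged illustration only -- not Azul/Zing source. Shows the stock
 * mmap/mremap/madvise sequence a concurrent compacting collector would
 * otherwise depend on; each step can trigger a cross-CPU TLB shootdown. */
#define _GNU_SOURCE                  /* for mremap() / MREMAP_MAYMOVE on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION (2UL * 1024 * 1024)   /* illustrative 2MB chunk */

int main(void) {
    /* Reserve and populate a region standing in for a GC "from" page batch. */
    void *from = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (from == MAP_FAILED) { perror("mmap"); return 1; }
    memset(from, 0xAB, REGION);

    /* Move the mapping to a new virtual address, as relocation/remapping
     * would; the kernel must invalidate stale TLB entries on all CPUs. */
    void *to = mremap(from, REGION, REGION, MREMAP_MAYMOVE);
    if (to == MAP_FAILED) { perror("mremap"); return 1; }
    printf("remapped %p -> %p\n", from, to);

    /* Release the physical pages behind the region -- another TLB-heavy step. */
    if (madvise(to, REGION, MADV_DONTNEED) != 0) perror("madvise");
    munmap(to, REGION);
    return 0;
}
{noformat}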
   
2. Loadable Module availability and compatibility

To be clear, our loadable module is open source, under GPLv2, and you can have 
the sources for it if you wish. The reason for the current choice of packaging 
is that a wide range of current end customers' Linux systems do not have (or 
wish to install) the tooling needed to build or rebuild the module; what they 
need operationally is an RPM that opens and installs without requiring kernel 
headers and the like. In addition, we tend to test and examine the kernel 
module intensively against specific distros and kernels to verify compatibility 
and stability, and we declare official support for these well-tested 
combinations.

On other Linux distros (RHEL, CentOS, SLES), the kernel revision velocity is 
fairly slow, and the kernel API signatures tend to remain the same unless 
semantics are actually modified. As a result, we use a single module RPM across 
RHEL 5 and CentOS 5 versions, and have needed only a single rev of the module 
packaging during the evolution of RHEL 6/CentOS 6 and SLES 11 thus far.

As we added Zing support for Ubuntu, primarily due to its popularity with 
developers, we found that kernel API signatures there change with practically 
every patch, even with no semantic change. This creates some serious friction 
with our current loadable module packaging and distribution choice for Ubuntu. 
We are working to resolve this, either by using DKMS or some other alternative, 
such that modules can continue to work or be properly updated as kernels rev up 
in Ubuntu-style distros.

So we're working on it, and it will get better...


 Likely Zing JVM bug causes failures in TestPayloadNearQuery
 ---

 Key: LUCENE-4482
 URL: https://issues.apache.org/jira/browse/LUCENE-4482
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Lucene trunk, rev 1397735
 Zing:
 {noformat}
   java version 1.6.0_31
   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
   Java HotSpot(TM) 64-Bit Tiered VM (build 
 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
 {noformat}
 Ubuntu 12.04 LTS 3.2.0-23-generic kernel
Reporter: Michael McCandless
 Attachments: LUCENE-4482.patch


 I dug into one of the Lucene test failures when running with Zing JVM
 (available free for open source devs...).  At least one other test
 sometimes fails but I haven't dug into that yet.
 I managed to get the failure easily reproduced: with the attached
 

[jira] [Comment Edited] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475643#comment-13475643
 ] 

Gil Tene edited comment on LUCENE-4482 at 10/13/12 4:09 PM:


We're looking into this bug report. Will hopefully report back / resolve it 
soon. [But Michael, please go ahead and report it on our bugzilla as well per 
the above].

[Uwe Schindler wrote:]
 I would run Zing tests, too, but before doing that they should:
 Not rely on strange binary kernel modules that are outdated on
 Ubuntu 12.04.1 LTS. The Jenkins server is running in DMZ so I
 will never ever run it with outdated kernels. They should (if
 they really need a kernel module, which is in my opinion a no-go,
 too) use DKMS and make the kernel module open source, so my kernel
 is also not tainted. Without that I will not support Zing, sorry.
 But I doubt if the kernel module is really needed! Without a
 clear explanation why this is needed on their homepage I don't agree.

This has two parts: one questioning why our loadable module is needed at all, 
and the other relating to its availability for various kernels and Linux 
distros.

1. Why is the ZST (which includes a loadable module) needed for Zing to operate?

One of the Zing JVM's main distinctions is that its C4 garbage collector (aka 
GPGC internally) eliminates garbage collection as a response-time concern for 
enterprise applications. Among other things, C4 relies on rapid manipulation of 
virtual memory and physical memory mappings to maintain continuous operation. 
While the semantics of the manipulations we do are possible using the vanilla 
mmap/mremap/munmap/madvise APIs, the rate at which those operations are 
supported in Linux (and most other OSs) is extremely low, due mostly to the 
historically very conservative approach to in-process TLB invalidation, and due 
partly to issues with multiple page-size manipulations. We're not talking small 
change here: more like 4-6 orders of magnitude for our common operations, 
which is, right now, the difference between a practical and an impractical 
implementation of C4.
You can find a detailed discussion of the difference in metrics for these 
operations at http://tinyurl.com/34ytcvc, and a detailed discussion of C4 in 
our ISMM paper (http://tinyurl.com/94c9btb at the ACM site, or at the Azul site 
http://tinyurl.com/7rydpvo).
   
2. Loadable Module availability and compatibility

To be clear, our loadable module is open source, under GPLv2, and you can have 
the sources for it if you wish. The reason for the current choice of packaging 
is that a wide range of current end customers' Linux systems do not have (or 
wish to install) the tooling needed to build or rebuild the module; what they 
need operationally is an RPM that opens and installs without requiring kernel 
headers and the like. In addition, we tend to test and examine the kernel 
module intensively against specific distros and kernels to verify compatibility 
and stability, and we declare official support for these well-tested 
combinations.

On other Linux distros (RHEL, CentOS, SLES), the kernel revision velocity is 
fairly slow, and the kernel API signatures tend to remain the same unless 
semantics are actually modified. As a result, we use a single module RPM across 
RHEL 5 and CentOS 5 versions, and have needed only a single rev of the module 
packaging during the evolution of RHEL 6/CentOS 6 and SLES 11 thus far.

As we added Zing support for Ubuntu, primarily due to its popularity with 
developers, we found that kernel API signatures there change with practically 
every patch, even with no semantic change. This creates some serious friction 
with our current loadable module packaging and distribution choice for Ubuntu. 
We are working to resolve this, either by using DKMS or some other alternative, 
such that modules can continue to work or be properly updated as kernels rev up 
in Ubuntu-style distros.

So we're working on it, and it will get better...

-- Gil. [CTO, Azul Systems]


  was (Author: giltene):
We're looking into this bug report. Will hopefully report back / resolve it 
soon. [But Michael, please go ahead and report it on our bugzilla as well per 
the above].

[Uwe Schindler wrote:]
 I would run Zing tests, too, but before doing that they should:
 Not rely on strange binary kernel modules that are outdated on
 Ubuntu 12.04.1 LTS. The Jenkins server is running in DMZ so I
 will never ever run it with outdated kernels. They should (if
 they really need a kernel module, which is in my opinion a no-go,
 too) use DKMS and make the kernel module open source, so my kernel
 is also not tainted. Without that I will not support Zing, sorry.
 But I doubt if the kernel module is really needed! Without a
 clear explanation why this is needed on their homepage I don't agree.

This has two parts: One asking/questioning why 

[jira] [Commented] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475666#comment-13475666
 ] 

Gil Tene commented on LUCENE-4482:
--

bq.
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

bq.
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...
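
As a rough, hedged illustration of the 2MB-mapping capability in question (a 
minimal C probe, not an Azul diagnostic), the sketch below simply asks the 
kernel for an explicit MAP_HUGETLB mapping and touches it. On most hosts 
without working 2MB page support, or with no huge pages reserved via 
/proc/sys/vm/nr_hugepages, the mmap call just fails; as noted above, some 
broken paravirt setups can misbehave far more severely.

{noformat}
/* Hedged illustration only -- not an Azul tool. Probes whether this
 * kernel/hypervisor combination will grant an explicit 2MB huge-page
 * mapping, roughly the capability referred to above.
 * Typically requires reserved huge pages, e.g.:
 *   echo 8 > /proc/sys/vm/nr_hugepages
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000           /* Linux/x86 value; missing in old headers */
#endif

#define HUGE_2MB (2UL * 1024 * 1024)

int main(void) {
    void *p = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");  /* e.g. ENOMEM when no huge pages reserved */
        return 1;
    }
    memset(p, 0, HUGE_2MB);           /* fault the huge page in */
    puts("explicit 2MB huge-page mapping succeeded");
    munmap(p, HUGE_2MB);
    return 0;
}
{noformat}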


 Likely Zing JVM bug causes failures in TestPayloadNearQuery
 ---

 Key: LUCENE-4482
 URL: https://issues.apache.org/jira/browse/LUCENE-4482
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Lucene trunk, rev 1397735
 Zing:
 {noformat}
   java version 1.6.0_31
   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
   Java HotSpot(TM) 64-Bit Tiered VM (build 
 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
 {noformat}
 Ubuntu 12.04 LTS 3.2.0-23-generic kernel
Reporter: Michael McCandless
 Attachments: LUCENE-4482.patch


 I dug into one of the Lucene test failures when running with Zing JVM
 (available free for open source devs...).  At least one other test
 sometimes fails but I haven't dug into that yet.
 I managed to get the failure easily reproduced: with the attached
 patch, on rev 1397735 checkout, if you cd to lucene/core and run:
 {noformat}
   ant test -Dtests.jvms=1 -Dtests.seed=C3802435F5FB39D0 
 -Dtests.showSuccess=true
 {noformat}
 Then you'll hit several failures in TestPayloadNearQuery, eg:
 {noformat}
 Suite: org.apache.lucene.search.payloads.TestPayloadNearQuery
   1 FAILED
   2 NOTE: reproduce with: ant test  -Dtestcase=TestPayloadNearQuery 
 -Dtests.method=test -Dtests.seed=C3802435F5FB39D0 -Dtests.slow=true 
 -Dtests.locale=ga -Dtests.timezone=America/Adak -Dtests.file.encoding=US-ASCII
 ERROR   0.01s | TestPayloadNearQuery.test 
 Throwable #1: java.lang.RuntimeException: overridden idfExplain method 
 in TestPayloadNearQuery.BoostingSimilarity was not called
  at 
 __randomizedtesting.SeedInfo.seed([C3802435F5FB39D0:4BD41BEF5B075428]:0)
  at 
 org.apache.lucene.search.similarities.TFIDFSimilarity.computeWeight(TFIDFSimilarity.java:740)
  at org.apache.lucene.search.spans.SpanWeight.init(SpanWeight.java:62)
  at 
 org.apache.lucene.search.payloads.PayloadNearQuery$PayloadNearSpanWeight.init(PayloadNearQuery.java:147)
  at 
 org.apache.lucene.search.payloads.PayloadNearQuery.createWeight(PayloadNearQuery.java:75)
  at 
 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:648)
  at 
 org.apache.lucene.search.AssertingIndexSearcher.createNormalizedWeight(AssertingIndexSearcher.java:60)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:265)
  at 
 org.apache.lucene.search.payloads.TestPayloadNearQuery.test(TestPayloadNearQuery.java:146)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
  at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
  at 
 

[jira] [Comment Edited] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475666#comment-13475666
 ] 

Gil Tene edited comment on LUCENE-4482 at 10/13/12 5:20 PM:


{quote}
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?
{quote}

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

{quote}
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.
{quote}

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...


  was (Author: giltene):
bq.
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

bq.
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...

  
 Likely Zing JVM bug causes failures in TestPayloadNearQuery
 ---

 Key: LUCENE-4482
 URL: https://issues.apache.org/jira/browse/LUCENE-4482
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Lucene trunk, rev 1397735
 Zing:
 {noformat}
   java version 1.6.0_31
   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
   Java HotSpot(TM) 64-Bit Tiered VM (build 
 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
 {noformat}
 Ubuntu 12.04 LTS 3.2.0-23-generic kernel
Reporter: Michael McCandless
 Attachments: LUCENE-4482.patch


 I dug into one of the Lucene test failures when running with Zing JVM
 (available free for open source devs...).  At least one other test
 sometimes fails but I haven't dug into that yet.
 I managed to get the failure easily reproduced: with the attached
 patch, on rev 1397735 checkout, if you cd to lucene/core and run:
 {noformat}
   ant test -Dtests.jvms=1 -Dtests.seed=C3802435F5FB39D0 
 -Dtests.showSuccess=true
 {noformat}
 Then you'll hit several failures in TestPayloadNearQuery, eg:
 {noformat}
 Suite: org.apache.lucene.search.payloads.TestPayloadNearQuery
   1 

[jira] [Comment Edited] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475666#comment-13475666
 ] 

Gil Tene edited comment on LUCENE-4482 at 10/13/12 5:22 PM:


{quote}
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?
{quote}

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

{quote}
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.
{quote}

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...


  was (Author: giltene):
{quote}
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?
{quote}

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

{quote}
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.
{quote}

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...

  
 Likely Zing JVM bug causes failures in TestPayloadNearQuery
 ---

 Key: LUCENE-4482
 URL: https://issues.apache.org/jira/browse/LUCENE-4482
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Lucene trunk, rev 1397735
 Zing:
 {noformat}
   java version 1.6.0_31
   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
   Java HotSpot(TM) 64-Bit Tiered VM (build 
 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
 {noformat}
 Ubuntu 12.04 LTS 3.2.0-23-generic kernel
Reporter: Michael McCandless
 Attachments: LUCENE-4482.patch


 I dug into one of the Lucene test failures when running with Zing JVM
 (available free for open source devs...).  At least one other test
 sometimes fails but I haven't dug into that yet.
 I managed to get the failure easily reproduced: with the attached
 patch, on rev 1397735 checkout, if you cd to lucene/core and run:
 {noformat}
   ant test -Dtests.jvms=1 -Dtests.seed=C3802435F5FB39D0 
 -Dtests.showSuccess=true
 {noformat}
 Then you'll hit several failures in TestPayloadNearQuery, eg:
 {noformat}
 Suite: 

[jira] [Comment Edited] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery

2012-10-13 Thread Gil Tene (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475666#comment-13475666
 ] 

Gil Tene edited comment on LUCENE-4482 at 10/13/12 5:23 PM:


{quote}
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?
{quote}

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work, or from not having to leave the process to look something up.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

{quote}
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.
{quote}

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...


  was (Author: giltene):
{quote}
Maybe it would be a good idea to provide both C4 - Memory Management layers, 
so also for plain kernels (as configuration option to the JVM like huge pages 
in Oracle's). Or is your VM then only as fast as Oracle's?
{quote}

It's not so much a matter of speed as it is a matter of pause time. Zing is not 
faster than Oracle's JVM; it's just as fast, but without those pesky pauses. 
It's those pauses that keep people from using anything more than a tiny amount 
of memory in Java these days (to me, tiny means a small fraction of a commodity 
$4K server). With the ability to practically (i.e. without completely stopping 
for many seconds at a time once in a while) use the nice, cheap memory we now 
have in servers comes another form of speed - the kind that comes from not 
repeating work.

A 4-to-6-order-of-magnitude difference in pause time and in sustainable 
allocation rate is so big that a C4 that uses the vanilla memory management 
APIs would be unusable at this point. Think of the difference between a 20 usec 
phase shift and a 20 second pause...

{quote}
...As VirtualBOX's module has similar use-cases like yours for virtualization, 
I hope yours does not conflict with that one.
{quote}

We don't test on VirtualBox, so I don't know for sure. In general, Zing works 
fine when run on top of hypervisors that fully support things like 2MB page 
mappings (the same sort of support needed for the hugetlb feature to work). 
Unfortunately, there are some hypervisors out there (e.g. some versions of Xen 
for paravirt guests) that don't support that and will crash a vanilla Linux 
kernel trying to use hugetlb. Zing won't work in such cases either, and for the 
same reasons...

  
 Likely Zing JVM bug causes failures in TestPayloadNearQuery
 ---

 Key: LUCENE-4482
 URL: https://issues.apache.org/jira/browse/LUCENE-4482
 Project: Lucene - Core
  Issue Type: Bug
 Environment: Lucene trunk, rev 1397735
 Zing:
 {noformat}
   java version 1.6.0_31
   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
   Java HotSpot(TM) 64-Bit Tiered VM (build 
 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
 {noformat}
 Ubuntu 12.04 LTS 3.2.0-23-generic kernel
Reporter: Michael McCandless
 Attachments: LUCENE-4482.patch


 I dug into one of the Lucene test failures when running with Zing JVM
 (available free for open source devs...).  At least one other test
 sometimes fails but I haven't dug into that yet.
 I managed to get the failure easily reproduced: with the attached
 patch, on rev 1397735 checkout, if you cd to lucene/core and run:
 {noformat}
   ant test -Dtests.jvms=1 -Dtests.seed=C3802435F5FB39D0 
 -Dtests.showSuccess=true
 {noformat}
 Then you'll hit several failures in