[GitHub] lucene-solr issue #510: LUCENE-8573: Use FutureArrays#mismatch in BKDWriter
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/510 Perfect. So it's even more performant with Java 9+. Thanks, Christian --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
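For readers unfamiliar with the call being discussed, here is a minimal sketch, assuming that `FutureArrays#mismatch` mirrors the contract of `java.util.Arrays#mismatch` from Java 9+. It compares a hand-written byte loop with the JDK call for finding the common prefix length of two arrays. The class and method names are hypothetical and are not taken from BKDWriter.

```java
import java.util.Arrays;

final class MismatchSketch {
  /** Baseline: byte-by-byte loop returning the length of the common prefix. */
  static int commonPrefixLoop(byte[] a, byte[] b) {
    int limit = Math.min(a.length, b.length);
    for (int i = 0; i < limit; i++) {
      if (a[i] != b[i]) {
        return i;
      }
    }
    return limit;
  }

  /** Same result via Arrays.mismatch (Java 9+), which returns -1 when the arrays are equal. */
  static int commonPrefixMismatch(byte[] a, byte[] b) {
    int i = Arrays.mismatch(a, b);
    return i == -1 ? a.length : i;
  }

  public static void main(String[] args) {
    byte[] left = {1, 2, 3, 4};
    byte[] right = {1, 2, 9, 4};
    System.out.println(commonPrefixLoop(left, right));
    System.out.println(commonPrefixMismatch(left, right));
  }
}
```

Both methods should agree on every input; the JDK variant simply gives the runtime a chance to use an optimized implementation.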
[GitHub] lucene-solr issue #500: LUCENE-8517: do not wrap FixedShingleFilter with con...
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/500 Hi, Currently a Lucene release should be done with Java 8. Therefore all checks like precommit have to be done with Java 8, otherwise we can't be sure that it really works with Java 8. Lucene can for sure be built with Java 9 or later, but some developer-centric tasks cannot be executed for various reasons (these tasks are not necessary for building from source). The problem here is with the ECJ tool that is used for precommit checks. It's the Eclipse compiler, and we would need a version that supports the module system. Unfortunately there were some problems with updating it, as it broke some parts of the checks (I tried it a while back, maybe it's better now). Another tool that won't work is the Javadocs linter, because the format of Javadocs differs in Java 9+ and maintaining 2 different HTML parsers is too hard (the Javadocs in the release JAR/ZIP/TGZ are Java 8 anyway). As a contributor/committer of Lucene you have to use Java 8 at least for the quality checks so we can be sure that all is fine with your code! As an end user, you can build Lucene with any later version; this is also tested with Jenkins. Uwe
[GitHub] lucene-solr issue #500: LUCENE-8517: do not wrap FixedShingleFilter with con...
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/500 It's only supported with Java 8, because that's our intended release version.
[GitHub] lucene-solr issue #480: LUCENE-8535: Drop out of the box Block-Join highligh...
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/480 Thanks for removing the dependency hell!
[GitHub] lucene-solr issue #328: SOLR-12034
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/328 CustomAnalyzer was just added to have an easy-to-use builder-like API for Analyzers. It was not meant to replace Solr's (although that would be nice, it's impossible, as I also figured out at that time). Solr is based on modifiable classes and XML, not builders...
[GitHub] lucene-solr issue #328: SOLR-12034
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/328 If you can't make Solr work with the current public CustomAnalyzer API, please keep it as is and use your own TokenizerChain in Solr. Please don't make CustomAnalyzer modifiable or add access to internal fields! In fact, there is only a minimal amount of code behind the 3 lists of factories that build up the Analyzer, and that does not justify cluttering Lucene's API (like the horrible MultiTermAwareComponent added by Solr).
[GitHub] lucene-solr pull request #432: LUCENE-8438: RAMDirectory speed improvements ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/432#discussion_r209170816
--- Diff: lucene/core/src/java/org/apache/lucene/store/ByteBuffersDirectory.java ---
@@ -0,0 +1,237 @@
+package org.apache.lucene.store;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.file.AccessDeniedException;
+import java.nio.file.FileAlreadyExistsException;
+import java.nio.file.NoSuchFileException;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Locale;
+import java.util.Objects;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.BiFunction;
+import java.util.function.Function;
+import java.util.zip.CRC32;
+
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.util.BitUtil;
+
+public final class ByteBuffersDirectory extends BaseDirectory {
+  public static final BiFunction OUTPUT_AS_MANY_BUFFERS =
+      (fileName, output) -> {
+        ByteBuffersDataInput dataInput = output.toDataInput();
+        String inputName = String.format(Locale.ROOT, "%s (file=%s, buffers=%s)",
+            ByteBuffersIndexInput.class.getSimpleName(),
+            fileName,
+            dataInput.toString());
+        return new ByteBuffersIndexInput(dataInput, inputName);
+      };
+
+  public static final BiFunction OUTPUT_AS_ONE_BUFFER =
+      (fileName, output) -> {
+        ByteBuffersDataInput dataInput = new ByteBuffersDataInput(Arrays.asList(ByteBuffer.wrap(output.copyToArray())));
+        String inputName = String.format(Locale.ROOT, "%s (file=%s, buffers=%s)",
+            ByteBuffersIndexInput.class.getSimpleName(),
+            fileName,
+            dataInput.toString());
+        return new ByteBuffersIndexInput(dataInput, inputName);
+      };
+
+  public static final BiFunction OUTPUT_AS_BYTE_ARRAY =
+      (fileName, output) -> {
+        byte[] array = output.copyToArray();
+        String inputName = String.format(Locale.ROOT, "%s (file=%s, length=%s)",
+            ByteArrayIndexInput.class.getSimpleName(),
+            fileName,
+            array.length);
+        return new ByteArrayIndexInput(inputName, array, 0, array.length);
+      };
+
+  public static final BiFunction OUTPUT_AS_MANY_BUFFERS_LUCENE =
+      (fileName, output) -> {
+        List bufferList = output.toBufferList();
+        int chunkSizePower;
+        bufferList.add(ByteBuffer.allocate(0));
--- End diff --
What exception did you get and where?
[GitHub] lucene-solr pull request #432: LUCENE-8438: RAMDirectory speed improvements ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/432#discussion_r209170130
--- Diff: lucene/core/src/java/org/apache/lucene/store/ByteBuffersDirectory.java ---
@@ -0,0 +1,237 @@
[...]
+  public static final BiFunction OUTPUT_AS_MANY_BUFFERS_LUCENE =
+      (fileName, output) -> {
+        List bufferList = output.toBufferList();
+        int chunkSizePower;
+        bufferList.add(ByteBuffer.allocate(0));
--- End diff --
I think the 0-byte buffer is only needed for buffers with exactly the chunkSize, see the comment:
```java
// we always allocate one more buffer, the last one may be a 0 byte one
final int nrBuffers = (int) (length >>> chunkSizePower) + 1;
```
So the last one is a 0-byte buffer if the total size is actually exactly 2^x. Maybe we can fix this, but it made the calculations easier.
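The arithmetic in the quoted comment can be checked with a small sketch. The `nrBuffers` formula is the one from the comment; the `lastBufferSize` helper, the class name, and the tiny `chunkSizePower` of 4 are assumptions made only for illustration. It suggests that the extra buffer is empty whenever the length is an exact multiple of the chunk size.

```java
final class ChunkCountSketch {
  /** Formula quoted in the review: one buffer more than the number of full chunks. */
  static int nrBuffers(long length, int chunkSizePower) {
    return (int) (length >>> chunkSizePower) + 1;
  }

  /** Hypothetical helper: bytes held by the last buffer (length modulo chunk size). */
  static int lastBufferSize(long length, int chunkSizePower) {
    long chunkSizeMask = (1L << chunkSizePower) - 1L;
    return (int) (length & chunkSizeMask);
  }

  public static void main(String[] args) {
    int chunkSizePower = 4; // 16-byte chunks, deliberately tiny for illustration
    for (long length : new long[] {0, 15, 16, 17, 32}) {
      System.out.println(length + " bytes -> " + nrBuffers(length, chunkSizePower)
          + " buffers, last one holds " + lastBufferSize(length, chunkSizePower) + " bytes");
    }
  }
}
```

With 16-byte chunks, a 16-byte file yields two buffers under this formula, the second of which is empty, which matches the 0-byte buffer discussed in the review.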
[GitHub] lucene-solr pull request #432: LUCENE-8438: RAMDirectory speed improvements ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/432#discussion_r208520676
--- Diff: lucene/core/src/java/org/apache/lucene/store/ByteBuffersDirectory.java ---
@@ -0,0 +1,237 @@
[...]
+  public static final BiFunction OUTPUT_AS_MANY_BUFFERS_LUCENE =
+      (fileName, output) -> {
+        List bufferList = output.toBufferList();
+        int chunkSizePower;
+        bufferList.add(ByteBuffer.allocate(0));
--- End diff --
Why do we need the empty buffer?
[GitHub] lucene-solr pull request #432: LUCENE-8438: RAMDirectory speed improvements ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/432#discussion_r208518497
--- Diff: lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java ---
@@ -0,0 +1,488 @@
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.function.Consumer;
+import java.util.function.IntConsumer;
+import java.util.function.IntFunction;
+
+import org.apache.lucene.util.Accountable;
+import org.apache.lucene.util.BitUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.RamUsageEstimator;
+import org.apache.lucene.util.UnicodeUtil;
+
+public final class ByteBuffersDataOutput extends DataOutput implements Accountable {
+  private final static ByteBuffer EMPTY = ByteBuffer.allocate(0);
+  private final static List EMPTY_LIST = Arrays.asList(EMPTY);
+  private final static byte [] EMPTY_BYTE_ARRAY = {};
+
+  public final static IntFunction ALLOCATE_BB_ON_HEAP = (size) -> {
--- End diff --
There are other places with unneeded lambdas, but that's just a preference. I am fine with both.
[GitHub] lucene-solr pull request #432: LUCENE-8438: RAMDirectory speed improvements ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/432#discussion_r208515533
--- Diff: lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java ---
@@ -0,0 +1,488 @@
[...]
+  public final static IntFunction ALLOCATE_BB_ON_HEAP = (size) -> {
--- End diff --
Just write `public final static IntFunction ALLOCATE_BB_ON_HEAP = ByteBuffer::allocate;`, no lambda needed. The same applies to other places in the PR, but this one looks really strange.
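A minimal sketch of the suggestion, assuming the stripped type parameter is `IntFunction<ByteBuffer>` and that the truncated lambda body simply calls `ByteBuffer.allocate(size)` (neither is visible in the archived diff). The class and field names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

final class AllocatorSketch {
  // Block lambda, in the style the review comments on (body is an assumption).
  static final IntFunction<ByteBuffer> WITH_LAMBDA = (size) -> {
    return ByteBuffer.allocate(size);
  };

  // Equivalent method reference, as proposed in the review.
  static final IntFunction<ByteBuffer> WITH_METHOD_REF = ByteBuffer::allocate;

  public static void main(String[] args) {
    System.out.println(WITH_LAMBDA.apply(8).capacity());
    System.out.println(WITH_METHOD_REF.apply(8).capacity());
  }
}
```

Both constants allocate a heap buffer of the requested capacity; the method reference just avoids the wrapper body.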
[GitHub] lucene-solr pull request #392: LUCENE-8345 - add wrapper class constructors ...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/392#discussion_r193109013
--- Diff: lucene/tools/forbiddenApis/base.txt ---
@@ -40,3 +40,21 @@ java.util.Collections#shuffle(java.util.List) @ Use shuffle(List, Random) instea
 java.util.Locale#forLanguageTag(java.lang.String) @ use new Locale.Builder().setLanguageTag(...).build() which has error handling
 java.util.Locale#toString() @ use Locale#toLanguageTag() for a standardized BCP47 locale name
+
+@defaultMessage Constructors for wrapper classes of Java primitives should be avoided in favor of the public static methods available or autoboxingT
--- End diff --
There is a typo at the end!
[GitHub] lucene-solr pull request #287: SOLR-11331: Ability to Debug Solr With Eclips...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/287#discussion_r174281383 --- Diff: dev-tools/eclipse/dot.classpath.xsl --- @@ -54,7 +55,23 @@ - + --- End diff -- sure!
[GitHub] lucene-solr pull request #287: SOLR-11331: Ability to Debug Solr With Eclips...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/287#discussion_r174147473 --- Diff: dev-tools/eclipse/dot.classpath.xsl --- @@ -54,7 +55,23 @@ - + --- End diff -- Could you fix the indent here? In Lucene we only allow spaces for indentation; tabs are forbidden everywhere. This looks like it's mixed here (unfortunately I can't see it here).
[GitHub] lucene-solr pull request #:
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/commit/3c0d111d07184e96a73ca6dc05c6227d839724e2#commitcomment-25480968 Could you please post your question on the Lucene development mailing list? See here how to subscribe: https://lucene.apache.org/core/discussion.html There is no problem with the Git repository here; if you click on the commit above (in your message), it works. So this may be a problem with your client.
[GitHub] lucene-solr issue #245: SOLR-11331: Ability to Debug Solr With Eclipse IDE
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/245 Can we do this in a way that places all files inside the `/eclipse-build` folder and does not add additional gitignores? I would also not copy the Solr webapp, as this makes the whole setup more confusing!
[GitHub] lucene-solr issue #197: [SOLR-10506] Fixes a memory leak in zk schema watchi...
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/197 Should we not have a non-empty message when logging the InterruptedException? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] lucene-solr pull request #74: [SOLR-9444] Fix path usage for cloud backup/re...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/74#discussion_r77301896
--- Diff: solr/core/src/test/org/apache/solr/cloud/TestLocalFSCloudBackupRestore.java ---
@@ -24,12 +24,20 @@
  * such file-system would be exposed via local file-system API.
  */
 public class TestLocalFSCloudBackupRestore extends AbstractCloudBackupRestoreTestCase {
+  private static String backupLocation;
 
   @BeforeClass
   public static void setupClass() throws Exception {
     configureCluster(NUM_SHARDS)// nodes
         .addConfig("conf1", TEST_PATH().resolve("configsets").resolve("cloud-minimal").resolve("conf"))
         .configure();
+
+    boolean whitespacesInPath = random().nextBoolean();
+    if (whitespacesInPath) {
+      backupLocation = createTempDir("my backup").toFile().getAbsolutePath();
--- End diff --
I'd use `backupLocation = createTempDir(...).toAbsolutePath().toString();` to get rid of legacy `java.io.File`
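The two variants can be compared in a standalone sketch. It uses `Files.createTempDirectory` from the JDK as a stand-in for the test framework's `createTempDir`, which is an assumption; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempDirPathSketch {
  /** Stand-in for the test helper: creates a temp directory with the given prefix. */
  static Path newTempDir(String prefix) {
    try {
      return Files.createTempDirectory(prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Variant proposed in the review: stays within java.nio.file. */
  static String viaPath(Path dir) {
    return dir.toAbsolutePath().toString();
  }

  /** Variant from the diff: detours through legacy java.io.File. */
  static String viaLegacyFile(Path dir) {
    return dir.toFile().getAbsolutePath();
  }

  public static void main(String[] args) {
    Path dir = newTempDir("my backup");
    System.out.println(viaPath(dir));
    System.out.println(viaLegacyFile(dir));
  }
}
```

Either call produces a usable location string; the point of the review comment is only to avoid the legacy class.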
[GitHub] lucene-solr pull request #74: [SOLR-9444] Fix path usage for cloud backup/re...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/74#discussion_r76740913
--- Diff: solr/core/src/java/org/apache/solr/core/backup/repository/LocalFileSystemRepository.java ---
@@ -58,21 +59,28 @@ public void init(NamedList args) {
   }
 
   @Override
-  public URI createURI(String... pathComponents) {
-    Preconditions.checkArgument(pathComponents.length > 0);
-
-    String basePath = Preconditions.checkNotNull(pathComponents[0]);
-    // Note the URI.getPath() invocation on Windows platform generates an invalid URI.
-    // Refer to http://stackoverflow.com/questions/9834776/java-nio-file-path-issue
-    // Since the caller may have used this method to generate the string representation
-    // for the pathComponents, we implement a work-around specifically for Windows platform
-    // to remove the leading '/' character.
-    if (Constants.WINDOWS) {
-      basePath = basePath.replaceFirst("^/(.:/)", "$1");
+  public URI createURI(String location) {
+    Preconditions.checkNotNull(location);
+
+    URI result = null;
+    try {
--- End diff --
Nice. This is exactly as I proposed. So people can use either URIs with a file: scheme or just a plain path. URI.isAbsolute() returns false if the scheme ("file:") is missing: <https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#isAbsolute()> "A URI is absolute if, and only if, it has a scheme component."
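The `URI.isAbsolute()` behaviour can be tried out with a short sketch. This is not the code from the PR (the diff is cut off after `try {`); it is an illustration under the assumption that a location is treated as a URI when it has a scheme and as a plain file-system path otherwise. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

final class LocationToUriSketch {
  /** True if the string parses as a URI that has a scheme component. */
  static boolean hasScheme(String location) {
    try {
      return new URI(location).isAbsolute();
    } catch (URISyntaxException e) {
      // For example, a plain path containing spaces is not a valid URI string.
      return false;
    }
  }

  /** Keeps URIs with a scheme as they are; converts plain paths to file: URIs. */
  static URI toUri(String location) {
    // Caveat: a Windows path such as "C:/data" would parse with scheme "C";
    // this sketch does not handle that case.
    return hasScheme(location) ? URI.create(location) : Paths.get(location).toUri();
  }

  public static void main(String[] args) {
    System.out.println(hasScheme("file:///tmp/backup"));
    System.out.println(hasScheme("/tmp/backup"));
    System.out.println(toUri("/tmp/my backup"));
  }
}
```

The fallback through `Paths.get(...).toUri()` also takes care of escaping characters such as spaces that are not allowed in a raw URI string.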
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Ah OK, so no problem on my side. I'll wait a bit.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Hi, I have applied some other fixes and will push soon. Currently ASF has some problems with pushing:
git.exe push --progress "origin" master:master
Counting objects: 121, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (66/66), done.
Writing objects: 100% (121/121), 8.90 KiB | 0 bytes/s, done.
Total 121 (delta 55), reused 17 (delta 2)
remote: You are not authorized to edit this repository.
remote:
To https://git-wip-us.apache.org/repos/asf/lucene-solr.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/lucene-solr.git'
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 OK, the tests pass for me successfully. Should I remove the jackcess-encrypt package from your PR after merging (you said you will be away this weekend)?
[GitHub] lucene-solr pull request #44: SOLR-8981
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/44#discussion_r67575579
--- Diff: solr/contrib/morphlines-cell/src/test/org/apache/solr/morphlines/cell/SolrCellMorphlineTest.java ---
@@ -42,8 +42,6 @@
   @BeforeClass
   public static void beforeClass2() {
     assumeFalse("FIXME: Morphlines currently has issues with Windows paths", Constants.WINDOWS);
-    assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155, https://issues.apache.org/jira/browse/SOLR-8876)",
--- End diff --
This should stay, because Hadoop-related stuff also fails with Java 9. Maybe only remove the PDFBOX issue number.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Let's pick option 2 for now. Maybe update the rest of Solr after some review.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 I also only have Windows :) I would leave out the image formats, but MS Access looks fine. Could we leave out updating bouncycastle then?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Did you check with Java 9 or should I do it? I am not sure about the last assume that was removed, because there is another SOLR issue in the assume message, not just the PDFBOX one.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 What file formats are these? Documents? Otherwise please leave them out.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 LOL. So is this a bug in Solr or in TIKA? Because it did not happen previously.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Were you able to fix the test or should I look into it?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 > will take a look. The test passed if you assumed that the html had two bodies, but that's crazy... I hope this test does not download the internet? It should all run locally! I have not looked into it.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Grep for that one and remove them. Tests should pass then with latest Java 9: `assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155)", Constants.JRE_IS_MINIMUM_JAVA9);`
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 OK, I will merge again later. So I will revert my checkout once you have fixed that. Otherwise all looks fine. BTW: Can you remove the assumeFalse on Java 9, because PDFBox is fixed? This was because on Java 9 PDFBOX failed in clinit (version number parsing failure).
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 For me it still happens. I just merged the PR.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 I merged everything successfully, but I get one test failure in solr/contrib/extraction: [junit4] FAILURE 0.05s J0 | ExtractingRequestHandlerTest.testXPath <<< [junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<[News]> but was:<[]> [junit4]>at __randomizedtesting.SeedInfo.seed([404BA07016F1FB57:3E1A6EE30E469911]:0) I have the feeling I have seen this before. Weren't you running the extraction tests?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 > I think this should work... ant precommit worked in Linux with these modifications. I kept getting hangs with ant jar-checksums in Windows. If you check out with git on Windows using auto-eol, it fails. The reason is that git treats the sha1 files as text and converts their line endings.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Hello, please also update the SHA1 hashes of all files. Please run "ant precommit" from the root folder of Lucene/Solr. This will report everything that is missing.
[GitHub] lucene-solr pull request: SOLR-8716 Upgrade to Apache Tika 1.12
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/31#discussion_r60798688 --- Diff: solr/NOTICE.txt --- @@ -396,6 +394,33 @@ https://github.com/rjohnsondev/java-libpst JMatIO is a JAVA library to read/write/manipulate with Matlab binary MAT-files. http://www.sourceforge.net/projects/jmatio +metadata-extractor is a straightforward Java library for reading metadata +from image files. +https://github.com/drewnoakes/metadata-extractor + +Java MP4 Parser; A Java API to read, write and create MP4 container +https://github.com/sannies/mp4parser + +Jackcess; is a pure Java library for reading from and writing to MS Access +databases +http://jackcess.sourceforge.net/ + +Jackcess Encrypt; an extension library for the Jackcess project which +implements support for some forms of Microsoft Access and Microsoft +Money encryption +http://jackcessencrypt.sourceforge.net/ + +ROME; is a Java framework for RSS and Atom feeds +(https://github.com/rometools/rome) + +VorbisJava; Ogg and Vorbis Tools for Java +Copyright 2012 Nick Burch +https://github.com/Gagravarr/VorbisJava + +SQLite JSDC Driver; is a library for accessing and creating SQLite --- End diff -- JSDC -> JDBC
[GitHub] lucene-solr pull request: SOLR-8716 Upgrade to Apache Tika 1.12
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/31#discussion_r60794497 --- Diff: solr/NOTICE.txt --- @@ -396,6 +394,33 @@ https://github.com/rjohnsondev/java-libpst JMatIO is a JAVA library to read/write/manipulate with Matlab binary MAT-files. http://www.sourceforge.net/projects/jmatio +metadata-extractor is a straightforward Java library for reading metadata +from image files. +https://github.com/drewnoakes/metadata-extractor + +Java MP4 Parser; A Java API to read, write and create MP4 container +https://github.com/sannies/mp4parser + +Jackcess; is a pure Java library for reading from and writing to MS Access +databases +http://jackcess.sourceforge.net/ + +Jackcess Encrypt; an extension library for the Jackcess project which +implements support for some forms of Microsoft Access and Microsoft +Money encryption +http://jackcessencrypt.sourceforge.net/ + +ROME; is a Java framework for RSS and Atom feeds +(https://github.com/rometools/rome) + +VorbisJava; Ogg and Vorbis Tools for Java +Copyright 2012 Nick Burch +https://github.com/Gagravarr/VorbisJava + +SQLite JSDC Driver; is a library for accessing and creating SQLite --- End diff -- This is a typo, I think.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/pull/206#issuecomment-153850780 Hi, this was merged into SVN. Can you close the pull request? The automatic close did not work...
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/pull/206#issuecomment-153793014 Hey, yes, exactly like that :-) I will review it later. Give me a day or two; I am looking into merging it.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/pull/206#issuecomment-150497999 Hi, a useful alternative to commons-beanutils is the set of bean classes built into the JDK. See https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/admin/SystemInfoHandler.java#L166-L232 for an example, where SystemInfoHandler reads the properties of MXBeans. You can get a BeanInfo from the class and then use its property descriptors to get and set properties, which is effectively what you are doing. Because the JDK code is partly buggy for historical reasons, make sure to use the correct flags added with JDK 7 when inspecting the class and getting the property descriptors (they disable the caches, which are broken).
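The BeanInfo approach described above can be sketched as follows. This is a minimal illustration, not the code from SystemInfoHandler: the `DemoConfig` bean and the `setProperty` helper are hypothetical, and using the three-argument `Introspector.getBeanInfo(Class, Class, int)` overload with `Introspector.IGNORE_ALL_BEANINFO` is an assumption about which "flags added with JDK 7" the comment refers to.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

public class BeanPropertySketch {

  /** Hypothetical bean, used only to have something to configure. */
  public static class DemoConfig {
    private boolean extractInlineImages;

    public boolean isExtractInlineImages() {
      return extractInlineImages;
    }

    public void setExtractInlineImages(boolean value) {
      this.extractInlineImages = value;
    }
  }

  /** Sets a bean property by name through its PropertyDescriptor. */
  public static void setProperty(Object bean, String name, Object value) throws Exception {
    // Assumption: stop at Object.class and ignore any explicit BeanInfo classes,
    // so only the bean's own getters and setters are analyzed.
    BeanInfo info = Introspector.getBeanInfo(
        bean.getClass(), Object.class, Introspector.IGNORE_ALL_BEANINFO);
    for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
      if (pd.getName().equals(name)) {
        Method setter = pd.getWriteMethod();
        if (setter == null) {
          throw new IllegalArgumentException("Property is read-only: " + name);
        }
        setter.invoke(bean, value);
        return;
      }
    }
    throw new IllegalArgumentException("No such property: " + name);
  }
}
```

Because the setter is looked up through the property descriptor, no setter name has to be assembled from strings, which also sidesteps any locale-sensitive case conversion.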
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311681 --- Diff: solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java --- @@ -79,6 +81,20 @@ public void inform(SolrCore core) { throw new SolrException(ErrorCode.SERVER_ERROR, e); } } + + String parseContextConfigLoc = (String) initArgs.get(PARSE_CONTEXT_CONFIG); + if (parseContextConfigLoc != null) { +File parseContextConfigFile = new File(parseContextConfigLoc); +if (parseContextConfigFile.isAbsolute() == false) { + parseContextConfigFile = new File(core.getResourceLoader().getConfigDir(), parseContextConfigFile.getPath()); --- End diff -- Please don't create a File instance here, because with ZooKeeper there is no config directory available. Use the ResourceLoader methods to load the config file as an InputStream (using RL#getResource()) and pass this input stream to the XML parser.
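The suggestion above, parsing the config from a stream rather than from a File, can be sketched like this. The `ResourceOpener` interface is a hypothetical stand-in for whatever the resource loader provides; it is not Solr's actual API, and the resource name is only an example.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StreamConfigSketch {

  /** Hypothetical stand-in for a resource loader; not Solr's real interface. */
  public interface ResourceOpener {
    InputStream open(String resourceName) throws IOException;
  }

  /** Parses an XML config obtained as a stream, without touching java.io.File. */
  public static Document loadConfig(ResourceOpener loader, String resourceName) throws Exception {
    // try-with-resources closes the stream even if parsing fails.
    try (InputStream in = loader.open(resourceName)) {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }
  }
}
```

Since the caller only hands over a name, the loader is free to resolve it from the file system, the classpath, or a remote config store.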
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311704 --- Diff: solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ParseContextConfig.java --- @@ -0,0 +1,113 @@ +package org.apache.solr.handler.extraction; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import java.io.File; +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.HashMap; +import java.util.Map; + +import org.apache.tika.parser.ParseContext; +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.w3c.dom.NamedNodeMap; +import org.w3c.dom.Node; +import org.w3c.dom.NodeList; + +public class ParseContextConfig { + private Map<Class, Object> entries = new HashMap<>(); + + public ParseContextConfig() { --- End diff -- Please remove unused constructors.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311693 --- Diff: solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java --- @@ -79,6 +81,20 @@ public void inform(SolrCore core) { throw new SolrException(ErrorCode.SERVER_ERROR, e); } } + + String parseContextConfigLoc = (String) initArgs.get(PARSE_CONTEXT_CONFIG); + if (parseContextConfigLoc != null) { +File parseContextConfigFile = new File(parseContextConfigLoc); +if (parseContextConfigFile.isAbsolute() == false) { + parseContextConfigFile = new File(core.getResourceLoader().getConfigDir(), parseContextConfigFile.getPath()); +} +try { + parseContextConfig = new ParseContextConfig(parseContextConfigFile, core.getResourceLoader().getClassLoader()); --- End diff -- I would pass the ResourceLoader directly instead of the ClassLoader. The ResourceLoader has easy-to-use methods to load classes, too.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311723 --- Diff: solr/contrib/extraction/src/test-files/log4j.properties --- @@ -0,0 +1,31 @@ +# Logging level --- End diff -- Please revert changes to this file. They are unrelated.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311743 --- Diff: solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ParseContextConfigTest.java --- @@ -0,0 +1,54 @@ +package org.apache.solr.handler.extraction; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.tika.parser.ParseContext; +import org.apache.tika.parser.pdf.PDFParserConfig; +import org.apache.xerces.dom.DocumentImpl; +import org.junit.Test; +import org.w3c.dom.Element; + +import static org.junit.Assert.*; + +public class ParseContextConfigTest { --- End diff -- Please subclass SolrTestCaseJ4; don't create plain tests, because those don't use the test framework, which does additional checks.
[GitHub] lucene-solr pull request: SOLR-8166 provide config for tika's Pars...
Github user uschindler commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/206#discussion_r42311748 --- Diff: solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ParseContextConfig.java --- @@ -0,0 +1,113 @@ +package org.apache.solr.handler.extraction; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import java.io.File; +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.HashMap; +import java.util.Map; + +import org.apache.tika.parser.ParseContext; +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.w3c.dom.NamedNodeMap; +import org.w3c.dom.Node; +import org.w3c.dom.NodeList; + +public class ParseContextConfig { + private Map<Class, Object> entries = new HashMap<>(); + + public ParseContextConfig() { + } + + public ParseContextConfig(Element element, ClassLoader loader) throws Exception { +extract(element, loader); + } + + public ParseContextConfig(Document document, ClassLoader loader) throws Exception { +this(document.getDocumentElement(), loader); + } + + public ParseContextConfig(String fileName, ClassLoader loader) throws Exception { +this(getBuilder().parse(fileName), loader); + } + + public ParseContextConfig(File file, ClassLoader loader) throws Exception { +this(getBuilder().parse(file), loader); + } + + private static DocumentBuilder getBuilder() throws Exception { + return DocumentBuilderFactory.newInstance().newDocumentBuilder(); + } + + private void extract(Element element, ClassLoader loader) throws Exception { +final NodeList xmlEntries = element.getElementsByTagName("entry"); +for (int i = 0; i < xmlEntries.getLength(); i++) { + final NamedNodeMap xmlEntryAttributes = xmlEntries.item(i).getAttributes(); + final String className = xmlEntryAttributes.getNamedItem("class").getNodeValue(); + final String implementationName = xmlEntryAttributes.getNamedItem("value").getNodeValue(); + + final NodeList xmlProperties = ((Element)xmlEntries.item(i)).getElementsByTagName("property"); + +final Class interfaceClass = loader.loadClass(className); +final Class implementationClass = loader.loadClass(implementationName); +final Object instance = implementationClass.newInstance(); + +for (int j = 0; j < 
xmlProperties.getLength(); j++) { + final Node xmlProperty = xmlProperties.item(j); + final NamedNodeMap xmlPropertyAttributes = xmlProperty.getAttributes(); + + final String propertyName = xmlPropertyAttributes.getNamedItem("name").getNodeValue(); + final String propertyValue = xmlPropertyAttributes.getNamedItem("value").getNodeValue(); + + final Field declaredField = interfaceClass.getDeclaredField(propertyName); + final Class type = declaredField.getType(); + final Method declaredMethod = interfaceClass.getDeclaredMethod("set" + propertyName.substring(0, 1).toUpperCase() + propertyName.substring(1, propertyName.length()), type); --- End diff -- This does not work in Turkey! :-) Don't use String#toUpper/LowerCase() without giving a Locale (Locale.ROOT is needed here)
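The locale problem noted above can be sketched as follows. The `setterName` helper is hypothetical and only mirrors the string concatenation in the diff, with `Locale.ROOT` passed explicitly so the result no longer depends on the JVM's default locale.

```java
import java.util.Locale;

public class SetterNameSketch {

  /**
   * Builds a setter name such as "setInlineImages" from "inlineImages".
   * Assumes propertyName is non-empty; an empty string would make
   * substring(0, 1) throw.
   */
  public static String setterName(String propertyName) {
    // Locale.ROOT gives locale-neutral case mapping, so 'i' always becomes 'I'.
    return "set"
        + propertyName.substring(0, 1).toUpperCase(Locale.ROOT)
        + propertyName.substring(1);
  }
}
```

With a Turkish default locale, the no-argument `toUpperCase()` maps a lowercase "i" to the dotted capital "İ" (U+0130), so a property such as "inlineImages" would produce a setter name that does not exist on the class.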
[GitHub] lucene-solr pull request: Explicitly stop beast from running on to...
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/pull/96#issuecomment-56595072 Hi, see the comments on https://issues.apache.org/jira/browse/LUCENE-5968.
[GitHub] lucene-solr pull request: Add PrefillTokenStream in analysis commo...
Github user uschindler commented on the pull request: https://github.com/apache/lucene-solr/pull/53#issuecomment-43470477 If incrementToken() is final, that is enough to satisfy the contract, so an additional class would not be needed.