[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Ah OK, so no problem on my side. I'll wait a bit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #44: SOLR-8981
Github user lewismc commented on the issue: https://github.com/apache/lucene-solr/pull/44 Yes, the server is buggered. Good work, folks.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Hi, I have applied some other fixes and will push soon. Currently, ASF has some problems with pushing:
git.exe push --progress "origin" master:master
Counting objects: 121, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (66/66), done.
Writing objects: 100% (121/121), 8.90 KiB | 0 bytes/s, done.
Total 121 (delta 55), reused 17 (delta 2)
remote: You are not authorized to edit this repository.
remote:
To https://git-wip-us.apache.org/repos/asf/lucene-solr.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/lucene-solr.git'
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 OK, the tests pass for me. Should I remove the jackcess-encrypt package from your PR after merging (you said you would be away this weekend)?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Let's pick option 2 for now. Maybe we can update the rest of Solr after some review.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44
> I also only have Windows :)

How can you live with the failed builds?!? I wanted to help with [morphlines](https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201606.mbox/%3CCY1PR09MB1115F9A08E97879D959D3CDCC7570%40CY1PR09MB1115.namprd09.prod.outlook.com%3E), but I can't easily do much...
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 If we leave out updating bouncycastle, I'm fairly confident that users will run into problems at run time if they try to decrypt MSAccess files, and probably PDF and DOC files too. We had a binary incompatibility between 1.52 and 1.54 with Jackcess: https://sourceforge.net/p/jackcessencrypt/feature-requests/2/ IIRC, the exception was thrown on any encrypted MSAccess file, not just those for which the user had a password. I see two options:
1) upgrade bouncycastle and hope we don't break other parts of Solr
2) announce that decryption with Jackcess/POI/PDFBox is unsupported
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 I also only have Windows :) I would leave out the image format, but MS Access looks fine. Could we leave out updating bouncycastle then?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 There will likely be some conflicts with Bouncy Castle.
Tika 1.13:
bcmail-jdk15on 1.54
bcprov-jdk15on 1.54
vs. Solr:
org.bouncycastle.version = 1.45
/org.bouncycastle/bcmail-jdk15 = ${org.bouncycastle.version}
/org.bouncycastle/bcprov-jdk15 = ${org.bouncycastle.version}
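The conflict above comes down to Solr pinning an older Bouncy Castle (1.45) than the one Tika 1.13 was built against (1.54). A minimal sketch of the kind of check one might run, assuming plain numeric dotted versions; it ignores qualifiers and the fact that `bcprov-jdk15` and `bcprov-jdk15on` are different artifact names. `VersionCheck` and its methods are hypothetical, not part of Solr's or Ivy's tooling.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not Solr/Ivy tooling): flags a pinned dependency
 * version that is older than the version a library expects.
 * Assumes purely numeric dotted versions such as "1.45" or "1.54.0".
 */
public class VersionCheck {

    /** Numeric, component-wise comparison; negative/zero/positive like Comparator#compare. */
    public static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "1.54" equals "1.54.0".
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** True if the pinned version is older than the required one. */
    public static boolean isConflict(String pinned, String required) {
        return compareVersions(pinned, required) < 0;
    }

    public static void main(String[] args) {
        // Values from the comment above: Solr pins 1.45, Tika 1.13 uses 1.54.
        System.out.println(isConflict("1.45", "1.54")); // true
    }
}
```

Comparing numerically rather than as strings matters for versions like "1.9" vs "1.45", where a plain string comparison would get the order wrong.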
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 WebP is an image format. Jackcess Encrypt is the library that allows users to decrypt MSAccess files. Please give it a go with Java 9. I can't easily test the morphlines stuff on my main dev box (Windows... :( ).
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Did you check with Java 9, or should I do it? I am not sure about removing the last assume, because its assume message references another SOLR issue, not just the PDFBOX one.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 What file formats are these? Documents? If not, please leave them out.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 It's our bug, introduced in TIKA-995.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 Not willing to point fingers... :) I'd like to track down the change in our history between 1.7 and 1.13 so that I actually understand what happened.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 LOL. So is this a bug in Solr or in TIKA? It did not happen previously.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 The XHTMLContentHandler adds `<body>` and `</body>`. In out-of-the-box Tika with the DefaultHtmlMapper, "body" tags are not in the list of "SAFE_ELEMENTS", which means that the html's own "body" tag is never passed through, so we don't see the doubling in Tika. The solution is to suppress the body tag in Solr's MostlyPassthroughHtmlMapper.
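The fix described above can be illustrated with a small standalone model. This is a sketch, not Solr's `MostlyPassthroughHtmlMapper`: it only mimics the contract of Tika's `HtmlMapper#mapSafeElement`, where returning `null` drops the tag but keeps its content. `BodySuppressingMapper` and `emittedTags` are hypothetical; `emittedTags` stands in for a content handler that already opens its own body element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Standalone sketch (no Tika dependency). mapSafeElement returns a
 * lower-cased element name to keep a tag, or null to drop the tag.
 */
public class BodySuppressingMapper {

    /** Passthrough mapping that drops "body", as the comment above proposes. */
    public static String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.equals("body")) {
            return null; // the content handler already emits its own body
        }
        return lower;
    }

    /**
     * Hypothetical model of the output start tags: the handler opens its own
     * body first, then each element of the parsed page is mapped and appended.
     */
    public static List<String> emittedTags(List<String> pageElements, boolean suppressBody) {
        List<String> out = new ArrayList<>();
        out.add("body"); // wrapper added by the content handler itself
        for (String element : pageElements) {
            String mapped = suppressBody
                    ? mapSafeElement(element)
                    : element.toLowerCase(Locale.ROOT); // plain passthrough
            if (mapped != null) {
                out.add(mapped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("BODY", "a");
        System.out.println(emittedTags(page, false)); // [body, body, a]
        System.out.println(emittedTags(page, true));  // [body, a]
    }
}
```

Dropping only the tag (rather than discarding the element) matters here: the page's content still reaches the index, just without a second body wrapper.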
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 Just found it. Confirming that the fix doesn't break anything else.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Were you able to fix the test or should I look into it?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 No, it is a self-contained test with a test file. +1 on local and _only_ local.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44
> will take a look. The test passed if you assumed that the html had two bodies, but that's crazy...

I hope this test does not download the internet? It should all run locally! I have not looked into it.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Grep for this line and remove all occurrences. Tests should then pass with the latest Java 9: `assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155)", Constants.JRE_IS_MINIMUM_JAVA9);`
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 OK, I will revert my checkout and merge again later, once you have fixed that. Otherwise all looks fine. BTW: Can you remove the assumeFalse on Java 9, now that PDFBox is fixed? It was there because on Java 9 PDFBox failed in its static initializer (clinit) with a version-number parsing failure.
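The clinit failure mentioned above is a classic pitfall of the Java 9 version-string change ("1.8" became "9"). A hypothetical illustration only, not PDFBox's actual code: a parser that assumes the old "1.x" scheme breaks on "9", and when such code runs in a static initializer the unchecked exception surfaces as an `ExceptionInInitializerError`. `JavaVersionParsing` and both methods are made up for this sketch.

```java
/**
 * Hypothetical illustration of a Java-version parsing failure on Java 9.
 * This is NOT PDFBox's code; it only shows the general pitfall.
 */
public class JavaVersionParsing {

    /** Naive: assumes the pre-Java-9 "1.x" scheme and takes the second field. */
    public static int naiveMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        return Integer.parseInt(parts[1]); // "9" has no second field -> exception
    }

    /** Tolerant: handles both "1.8" (old scheme) and "9" / "9.0.1" (new scheme). */
    public static int robustMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        System.out.println(robustMajor("1.8")); // 8
        System.out.println(robustMajor("9"));   // 9
        try {
            naiveMajor("9");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive parser fails on \"9\"");
        }
        // Real code would read System.getProperty("java.specification.version").
    }
}
```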
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 argh... will take a look. The test passed if you assumed that the html had two bodies, but that's crazy...
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 For me it still happens. I just merged the PR.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user lewismc commented on the issue: https://github.com/apache/lucene-solr/pull/44 @uschindler yep, we've seen this before. I have no idea what is going on here. I'll look into it again today. Can someone point out the exact code which does the XPath magic?
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 Yes, I did run the extraction tests. That was the error we were getting initially, but it disappeared (without explanation) on my most recent integration attempt.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 I merged everything successfully, but I get one test failure in solr/contrib/extraction:
[junit4] FAILURE 0.05s J0 | ExtractingRequestHandlerTest.testXPath <<<
[junit4]    > Throwable #1: org.junit.ComparisonFailure: expected:<[News]> but was:<[]>
[junit4]    >    at __randomizedtesting.SeedInfo.seed([404BA07016F1FB57:3E1A6EE30E469911]:0)
I have the feeling I have seen this before. Weren't you running the extraction tests?
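The `expected:<[News]> but was:<[]>` failure fits the doubled-body explanation given elsewhere in this thread. A minimal JDK-only sketch, assuming simplified markup without the XHTML namespace and a hypothetical expression `/html/body/a` (the real test's XPath may differ): an absolute path that expects a single body matches nothing once the content sits inside a second body, and XPath then returns an empty string rather than an error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Sketch: how a doubled body element makes an absolute XPath come back empty. */
public class DoubleBodyXPath {

    /** Parses the XML (namespace-unaware) and evaluates the XPath as a string. */
    public static String eval(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Returns the string value of the first match, or "" if nothing matches.
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String single = "<html><body><a>News</a></body></html>";
        String doubled = "<html><body><body><a>News</a></body></body></html>";
        String expr = "/html/body/a"; // hypothetical expression
        System.out.println("[" + eval(single, expr) + "]");  // [News]
        System.out.println("[" + eval(doubled, expr) + "]"); // []
    }
}
```

This is also why the test "passed if you assumed that the html had two bodies": an expression written as `/html/body/body/a` would match the doubled markup instead.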
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 Git (well, it was my fault, don't get me wrong) added the \r\n somehow. I had turned off autocrlf earlier.
> C:\...>git config --get core.autocrlf
> input

I realized I forgot to update the isoparser, and I cleaned up the Jackcess notice. Let me know how this looks now.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44
> I think this should work... ant precommit worked in Linux with these modifications. I kept getting hangs with ant jar-checksums in Windows.

If you check out with Git on Windows using auto-EOL, it fails. The reason is that Git treats the sha1 files as text and converts their line endings.
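The line-ending problem described above can be sketched in a few lines. This assumes, without confirmation from the thread, that a `.sha1` file holds the hex digest followed by `\n`, and that the checker compares raw contents; `ChecksumFileCompare` and its methods are hypothetical and not the Lucene build's code. A strict comparison fails after an LF-to-CRLF conversion, while one that trims whitespace does not.

```java
/**
 * Hypothetical sketch: why an LF -> CRLF conversion on checkout breaks a
 * strict checksum-file comparison. The "digest + \n" layout is an assumption.
 */
public class ChecksumFileCompare {

    /** Strict: the file must be exactly the digest followed by "\n". */
    public static boolean strictMatch(String fileContent, String expectedHex) {
        return fileContent.equals(expectedHex + "\n");
    }

    /** Tolerant: ignore surrounding whitespace (so "\r\n" vs "\n" is irrelevant) and hex case. */
    public static boolean tolerantMatch(String fileContent, String expectedHex) {
        return fileContent.trim().equalsIgnoreCase(expectedHex);
    }

    /** Simplified model of Git's LF -> CRLF checkout conversion for text files. */
    public static String toCrlf(String text) {
        return text.replace("\n", "\r\n"); // assumes the input contains only LF endings
    }

    public static void main(String[] args) {
        String digest = "a9993e364706816aba3e25717850c26c9cd0d89d";
        String checkedOut = toCrlf(digest + "\n");
        System.out.println(strictMatch(checkedOut, digest));   // false
        System.out.println(tolerantMatch(checkedOut, digest)); // true
    }
}
```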
[GitHub] lucene-solr issue #44: SOLR-8981
Github user tballison commented on the issue: https://github.com/apache/lucene-solr/pull/44 I think I got it... ant precommit worked on Linux with these modifications. I kept getting hangs with ant jar-checksums on Windows.
[GitHub] lucene-solr issue #44: SOLR-8981
Github user uschindler commented on the issue: https://github.com/apache/lucene-solr/pull/44 Hello, please also update the SHA1 hashes of all files. Please run "ant precommit" from the root folder of Lucene/Solr. This will report everything that is missing.
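Regenerating the hashes should go through the build itself (the thread mentions `ant jar-checksums`). For spot-checking a single jar, here is a hypothetical JDK-only sketch that prints the lowercase hex SHA-1 of each file given on the command line. That this matches what each `.sha1` file records is an assumption, not something the thread confirms; `Sha1Hex` is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: lowercase hex SHA-1 of a file, for manual spot checks. */
public class Sha1Hex {

    public static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Known test vector: SHA-1("abc").
            System.out.println(sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)));
            // prints a9993e364706816aba3e25717850c26c9cd0d89d
            return;
        }
        for (String arg : args) {
            Path jar = Paths.get(arg);
            System.out.println(sha1Hex(Files.readAllBytes(jar)) + "  " + jar.getFileName());
        }
    }
}
```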