[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-18 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Ah OK, so no problem on my side. I'll wait a bit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-18 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Yes the server is buggered. Good work folks.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-18 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Hi, I have applied some other fixes and will push soon. The ASF currently has 
some problems with pushing:

git.exe push --progress "origin" master:master

Counting objects: 121, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (66/66), done.
Writing objects: 100% (121/121), 8.90 KiB | 0 bytes/s, done.
Total 121 (delta 55), reused 17 (delta 2)
remote: You are not authorized to edit this repository.
remote:
To https://git-wip-us.apache.org/repos/asf/lucene-solr.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/lucene-solr.git'






[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
OK, the tests pass for me successfully. Should I remove the 
jackcess-encrypt package from your PR after merging (you said you will be away 
this weekend)?





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Let's pick option 2 for now. Maybe update the rest of Solr after some 
review.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
> I also only have Windows :)

How can you live with the failed builds?!?  I wanted to help with 
[morphlines](https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201606.mbox/%3CCY1PR09MB1115F9A08E97879D959D3CDCC7570%40CY1PR09MB1115.namprd09.prod.outlook.com%3E),
 but I can't easily do much...





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
If we leave out updating bouncycastle, I'm fairly confident that users will 
run into problems at run time if they try to decrypt MSAccess and probably PDF 
and doc.

We had a binary incompatibility between 1.52 and 1.54 with Jackcess: 
https://sourceforge.net/p/jackcessencrypt/feature-requests/2/

IIRC, the exception was thrown on any encrypted MSAccess file, not just 
those for which the user had a password.

I see two options: 

1) upgrade bouncycastle and hope we don't break other parts of Solr
2) announce decryption of Jackcess/POI/PDFBox as unsupported







[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
I also only have Windows :)

I would leave out the image format, but MS Access looks fine. Could we leave 
out updating bouncycastle then?





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
There will likely be some conflicts with bouncy castle.  

Tika 1.13:
bcmail-jdk15on  1.54
bcprov-jdk15on  1.54

vs. Solr:
org.bouncycastle.version = 1.45
/org.bouncycastle/bcmail-jdk15 = ${org.bouncycastle.version}
/org.bouncycastle/bcprov-jdk15 = ${org.bouncycastle.version}






[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
WebP is an image format.
Jackcess encrypt is the library that allows users to decrypt MSAccess files.

Please give it a go with Java 9.  I can't easily test the morphlines stuff 
on my main dev box (Windows ... :( ).





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Did you check with Java 9 or should I do it? I am not sure about removing the 
last assume, because its message references another SOLR issue, not just the 
PDFBOX one.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
What file formats are these? Documents? Otherwise please leave them out.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Our bug, introduced in TIKA-995.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Not willing to point fingers... :)

I'd like to track down the change in our history between 1.7 and 1.13 so 
that I actually understand what happened.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
LOL. So is this a bug in Solr or in TIKA? Because it did not happen 
previously.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
The XHTMLContentHandler adds  and .  In out-of-the-box Tika 
with the DefaultHtmlMapper, "body" tags are not in the list of "SAFE_ELEMENTS", 
which means that the html's "body" tag is never passed through...so we don't 
see the doubling in Tika.

The solution is to suppress the body tag in Solr's 
MostlyPassthroughHtmlMapper.
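
A minimal sketch of that idea, assuming a mapper whose element-mapping method returns `null` to suppress an element. The class `BodyTagFilter` and its method are hypothetical stand-ins written for illustration; they are not Solr's MostlyPassthroughHtmlMapper or Tika's actual API.

```java
import java.util.Locale;

// Hypothetical stand-in for a "mostly passthrough" mapper (not Solr's real class).
final class BodyTagFilter {

    // Returns the lower-cased element name to emit, or null to suppress it.
    // "body" is suppressed because, per the discussion above, the content
    // handler already emits its own body element.
    static String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return "body".equals(lower) ? null : lower;
    }

    public static void main(String[] args) {
        // Ordinary elements pass through; the body element is dropped.
        System.out.println(mapSafeElement("DIV"));
        System.out.println(mapSafeElement("body"));
    }
}
```

With a mapper like this, only the handler's own body element reaches the output, so an XPath that expects a single body would match again.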





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Just found it.  Confirming that fix doesn't break anything else.







[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Were you able to fix the test or should I look into it?





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
No, it is a self-contained test with a test file. +1 on local and _only_ 
local.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
> will take a look. The test passed if you assumed that the html had two 
bodies, but that's crazy...

I hope this test does not download the internet? It should all run local! I 
have not looked into it.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Grep for this one and remove every occurrence. Tests should then pass with 
the latest Java 9:
`assumeFalse("This test fails with Java 9 (https://issues.apache.org/jira/browse/PDFBOX-3155)", Constants.JRE_IS_MINIMUM_JAVA9);`






[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
OK, I will merge again later. So I will revert my checkout once you have 
fixed that. Otherwise all looks fine.

BTW: Can you remove the assumeFalse on Java 9, because PDFBox is fixed? 
This was because on Java 9 PDFBOX failed in clinit (version number parsing 
failure).
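
To illustrate the kind of version-number parsing failure meant here, the following is a sketch only. It is not PDFBox's code; `LegacyVersionParser` and both methods are hypothetical. It assumes a parser written for the old `1.<major>` version scheme being fed a Java 9 style version string.

```java
// Hypothetical example of a version parser that breaks on Java 9 style strings.
final class LegacyVersionParser {

    // Assumes the pre-Java-9 scheme "1.<major>.<minor>_<update>".
    // Throws ArrayIndexOutOfBoundsException for strings such as "9" or "9-ea",
    // which have no second dot-separated component.
    static int majorFromLegacyFormat(String javaVersion) {
        String[] parts = javaVersion.split("\\.");
        return Integer.parseInt(parts[1]);
    }

    // Tolerant variant: accepts both "1.8.0_91" and "9-ea" style strings.
    static int major(String javaVersion) {
        String[] parts = javaVersion.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        System.out.println(major("1.8.0_91"));
        System.out.println(major("9-ea"));
        try {
            majorFromLegacyFormat("9-ea");
        } catch (ArrayIndexOutOfBoundsException e) {
            // In a static initializer, a failure like this would prevent the class from loading.
            System.out.println("legacy parser failed: " + e);
        }
    }
}
```

If parsing like this ran in a static initializer, the exception would surface as a class-initialization error, which fits the "failed in clinit" description.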





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
argh...

will take a look.  The test passed if you assumed that the html had two 
bodies, but that's crazy...





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
For me it still happens. I just merged the PR.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
@uschindler yep, we've seen this before. I have no idea what is going on 
here. I'll look into it again today. Can someone point out the exact code 
that does the XPath magic?





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Yes, I did run the extraction tests.  That was the error we were getting 
initially, but it disappeared (without explanation) on my most recent 
integration attempt.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
I merged everything successfully, but I get one test failure in 
solr/contrib/extraction:

[junit4] FAILURE 0.05s J0 | ExtractingRequestHandlerTest.testXPath <<<
[junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<[News]> but was:<[]>
[junit4]>at __randomizedtesting.SeedInfo.seed([404BA07016F1FB57:3E1A6EE30E469911]:0)

I have the feeling I have seen this before. Weren't you running the 
extraction tests?





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-17 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Git (well, it was my fault, don't get me wrong) added the \r\n somehow.  I 
had turned off autocrlf earlier.

> C:\...>git config --get core.autocrlf
input

I realized I forgot to update the isoparser, and I cleaned up the Jackcess 
notice.

Let me know how this looks now.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-16 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
> I think this should work... ant precommit worked in Linux with these 
modifications. I kept getting hangs with ant jar-checksums in Windows.

If you check out with git on Windows using auto-eol, it fails. The reason is 
that git treats sha1 files as text and converts their line endings.
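
A small sketch of why a line-ending conversion matters for checksum files. This is an illustration under assumptions, not the actual jar-checksums logic; `ChecksumLineEndings` and its methods are hypothetical. It shows that the same digest written with LF and with CRLF differs at the byte level, so a byte-for-byte comparison fails while a whitespace-tolerant one passes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the build's real checksum verification.
final class ChecksumLineEndings {

    // Byte-for-byte comparison: any extra "\r" or "\n" makes it fail.
    static boolean strictMatch(String expectedHex, byte[] fileContents) {
        return Arrays.equals(expectedHex.getBytes(StandardCharsets.US_ASCII), fileContents);
    }

    // Comparison that ignores surrounding whitespace, including line endings.
    static boolean tolerantMatch(String expectedHex, byte[] fileContents) {
        return new String(fileContents, StandardCharsets.US_ASCII).trim().equals(expectedHex);
    }

    public static void main(String[] args) {
        // Sample digest only (the SHA-1 of empty input).
        String hex = "da39a3ee5e6b4b0d3255bfef95601890afd80709";
        byte[] lf = (hex + "\n").getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = (hex + "\r\n").getBytes(StandardCharsets.US_ASCII);

        System.out.println("same bytes: " + Arrays.equals(lf, crlf));
        System.out.println("strict match on CRLF copy: " + strictMatch(hex, crlf));
        System.out.println("tolerant match on CRLF copy: " + tolerantMatch(hex, crlf));
    }
}
```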





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-16 Thread tballison
Github user tballison commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
I think I got it...  ant precommit worked in Linux with these 
modifications.  I kept getting hangs with ant jar-checksums in Windows.





[GitHub] lucene-solr issue #44: SOLR-8981

2016-06-16 Thread uschindler
Github user uschindler commented on the issue:

https://github.com/apache/lucene-solr/pull/44
  
Hello,
please also update all SHA1 hashes of files. Please run "ant precommit" 
from the root folder of Lucene/Solr. This will report everything that is missing.

