Re: [VOTE] Release Apache Tika 2.0.0-ALPHA Candidate #1

2021-01-15 Thread Peter Lee
Here's my +1 On 1 15 2021, at 2:44, Tilman Hausherr wrote: > +1 > > Tilman > Am 14.01.2021 um 02:19 schrieb Tim Allison: > > All, > > > > A candidate for the Tika 2.0.0-ALPHA release is available at: > > https://dist.apache.org/repos/dist/dev/tika/ > > > > The release candidate is a zip archive

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-20 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252556#comment-17252556 ] Peter Lee commented on TIKA-3180: - It works now. :) > Tika 2.0.0 -- Modularize tika-ser

[jira] [Commented] (TIKA-3180) Tika 2.0.0 -- Modularize tika-server

2020-12-18 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251612#comment-17251612 ] Peter Lee commented on TIKA-3180: - It seems some tests are failing, see [https://ci-builds.apache.org/job

Re: accidental merge

2020-12-14 Thread Peter Lee
> That one (0810700) I wanted to commit. > I see. Everything looks good now. :) Lee On 12 14 2020, at 4:35, Tilman Hausherr wrote: > Am 14.12.2020 um 08:48 schrieb Peter Lee: > > Seems the latest commit 7f65d61 is exactly the same as dd85c73: > > https://github.com/apache

Re: accidental merge

2020-12-13 Thread Peter Lee
meone please verify this: > the last good commit is from Peter Lee "Simplify init code of some Set > and List". > then I made a small commit "TIKA-3248: avoid ClassCastException" of > about 10 lines. > > then "bad" things happened. > Ideally

[jira] [Resolved] (TIKA-3218) Wrong comment for method sortLoadedClasses in ServiceLoaderUtils

2020-12-04 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee resolved TIKA-3218. - Fix Version/s: 2.0.0 Resolution: Fixed > Wrong comment for method sortLoadedClas

[jira] [Commented] (TIKA-3218) Wrong comment for method sortLoadedClasses in ServiceLoaderUtils

2020-12-04 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244377#comment-17244377 ] Peter Lee commented on TIKA-3218: - Thank you for fixing this (y) > Wrong comment for met

Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-25 Thread Peter Lee
Many thanks to you, Tim. :) Hi all, I'm Peter Lee and I was an Apache Commons committer. I'm familiar with many archivers and compressors. Feel free to ask me if you have any problems with compression. I'm honored to be part of Tika. Tika is great and it has helped me a lot. Besides, Tika is a great

Re: branch_1x tika-bundle issues

2020-11-17 Thread Peter Lee
Got the same problem. After some investigation I believe it's caused by the version of maven-bundle-plugin: I can successfully build branch_1x with version 4.1.0, but the build fails with versions 4.2.0, 4.2.1 and 5.1.1. Still working on finding out what's wrong here. Hope this helps. cheers, Lee On 11

[jira] [Commented] (TIKA-3218) Wrong comment for method sortLoadedClasses in ServiceLoaderUtils

2020-11-05 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227108#comment-17227108 ] Peter Lee commented on TIKA-3218: - _so that user-provided ones would come first and would be able

[jira] [Created] (TIKA-3218) Wrong comment for method sortLoadedClasses in ServiceLoaderUtils

2020-10-30 Thread Peter Lee (Jira)
Peter Lee created TIKA-3218: --- Summary: Wrong comment for method sortLoadedClasses in ServiceLoaderUtils Key: TIKA-3218 URL: https://issues.apache.org/jira/browse/TIKA-3218 Project: Tika Issue

[jira] [Commented] (TIKA-3213) Consider migrating universalcharsetdetector to a live fork

2020-10-24 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220030#comment-17220030 ] Peter Lee commented on TIKA-3213: - This fork repository doesn't support Chinese charset detection since version

[jira] [Commented] (TIKA-3209) Different between PictureRunMapper in POI and PicturesSource in Tika

2020-10-19 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217212#comment-17217212 ] Peter Lee commented on TIKA-3209: - Hi [~nick] Just replace PicturesSource in Tika with PictureRunMapper

[jira] [Commented] (TIKA-3209) Different between PictureRunMapper in POI and PicturesSource in Tika

2020-10-18 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216391#comment-17216391 ] Peter Lee commented on TIKA-3209: - [~nick] Could you give some advice? Can we remove that line in POI

[jira] [Created] (TIKA-3209) Different between PictureRunMapper in POI and PicturesSource in Tika

2020-10-13 Thread Peter Lee (Jira)
Peter Lee created TIKA-3209: --- Summary: Different between PictureRunMapper in POI and PicturesSource in Tika Key: TIKA-3209 URL: https://issues.apache.org/jira/browse/TIKA-3209 Project: Tika Issue

[jira] [Comment Edited] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448 ] Peter Lee edited comment on TIKA-3196 at 9/23/20, 2:13 AM: --- Hi [~tallison] I

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448 ] Peter Lee commented on TIKA-3196: - Hi [~tallison] I wrote a test here : [https://github.com/apache/tika

[jira] [Resolved] (TIKA-3197) TikaInputStream may not be closed

2020-09-14 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Lee resolved TIKA-3197. - Resolution: Not A Problem > TikaInputStream may not be clo

[jira] [Created] (TIKA-3197) TikaInputStream may not be closed

2020-09-14 Thread Peter Lee (Jira)
Peter Lee created TIKA-3197: --- Summary: TikaInputStream may not be closed Key: TIKA-3197 URL: https://issues.apache.org/jira/browse/TIKA-3197 Project: Tika Issue Type: Bug Components

Re: release planning?

2020-09-10 Thread Peter Lee
Anything else? > > On Tue, Sep 8, 2020 at 9:56 PM Peter Lee wrote: > > Hi Tim, > > > > I pushed some bugfix PRs in github and maybe we could have a look if they > > should be merged into branch_1x : > > #330 : URLs update > > #340 : some minor fix T

Re: release planning?

2020-09-08 Thread Peter Lee
Hi Tim, I pushed some bugfix PRs on GitHub and maybe we could check whether they should be merged into branch_1x: #330 : URLs update #340 : some minor fixes for TikaCLI #347 : minor fix for BatchProcessBuilder #353 : fix for test failures for developers whose default language is not English

Re: Tests failed in windows but not in linux

2020-08-24 Thread Peter Lee
th GeoParser and SentimentAnalysisParser on > the main branch. Removing the Logger fixes both and it builds cleanly. Still > not sure what the exact issue is but I can recreate the issue and your > solution. > - Bob > On 8/24/2020 4:02 AM, Peter Lee wrote: > > > > Update : >

Re: Tests failed in windows but not in linux

2020-08-24 Thread Peter Lee
Update : It works after I removed the loggers in GeoParser and GeoParserConfig. But I'm still not clear what exactly the problem is. :( Lee On 8 24 2020, at 3:27 , Peter Lee wrote: > Hi all, > > The tests are failing on my windows : the GeoParserTest are failing cause the

Tests failed in windows but not in linux

2020-08-24 Thread Peter Lee
Hi all, The tests are failing on my Windows machine: GeoParserTest is failing because the class org.apache.tika.parser.geo.GeoParser could not be found. But everything works fine on my Ubuntu machine. The error is weird. I did some googling but couldn't figure out what the problem is. Anyone who got same

Re: Windows build errors

2020-08-19 Thread Peter Lee
Hi Tilman, > expected: but was: charset=[windows-1252]> I think this problem is caused by the charset detection strategy being based on the line separator (CRLF or LF) and the git autocrlf config. I also ran into this problem and solved it like this: Set autocrlf false by git config --global core.autocrlf

[jira] [Commented] (TIKA-1770) AutoDetectParser wrongly detects plain text as images/audio

2020-08-15 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178184#comment-17178184 ] Peter Lee commented on TIKA-1770: - Tested the 3 given files in tika-1.24.1. Here is Tika's content-type detection

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-12 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176206#comment-17176206 ] Peter Lee commented on TIKA-3155: - According to my understanding, here is how Tika handles CSV files: 1

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175290#comment-17175290 ] Peter Lee commented on TIKA-3155: - We can do it in _TextAndCSVParser_ like this {code:java} CSVFormat

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175286#comment-17175286 ] Peter Lee commented on TIKA-3155: - Hey. I think it's caused by the Quote Mode of Apache Commons CSV. We

Should we add Apache Commons Lang to tika-core as a dependency?

2020-08-03 Thread Peter Lee
Hi all, I've been working on TIKA-3141 recently and pushed a PR on GitHub. As Keith suggested in the PR, maybe we should add Commons Lang to tika-core, as it seems Commons Lang is used elsewhere in Tika but not in tika-core. Ideas? cheers, Lee
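
For context, a minimal sketch of the kind of check TIKA-3141 is about: treating an empty TIKA_CONFIG value as "not set" instead of failing. This is not Tika's code. The class ConfigPathResolver and the method resolve are hypothetical names, and the assumption that the Commons Lang question comes down to a blank-string check is mine, not something stated in the thread. The sketch uses only the JDK, which is one way to avoid a new dependency in tika-core.

```java
import java.util.Optional;

// Hypothetical helper, not part of Tika: shows a blank check done with the JDK only.
public class ConfigPathResolver {

    /**
     * Returns the trimmed config path, or an empty Optional when the raw value
     * is null, empty, or whitespace-only.
     */
    static Optional<String> resolve(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim());
    }

    public static void main(String[] args) {
        // An unset or empty TIKA_CONFIG falls back to the default instead of throwing.
        Optional<String> path = resolve(System.getenv("TIKA_CONFIG"));
        System.out.println(path.map(p -> "custom config: " + p).orElse("default config"));
    }
}
```

Whether a few lines like these are preferable to pulling Commons Lang into tika-core is exactly the trade-off raised above.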

PRs on github need reviews

2020-07-30 Thread Peter Lee
Hi all, I've been using Tika recently and found it fascinating! I pushed some PRs on GitHub but it seems no one is reviewing them (the same goes for some other PRs on GitHub). Maybe somebody could give me a hand? Here are the PRs: https://github.com/apache/tika/pull/334

[jira] [Commented] (TIKA-3141) LINUX - Tika shouldn't throw an exception for an empty TIKA_CONFIG environment variable value

2020-07-30 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167845#comment-17167845 ] Peter Lee commented on TIKA-3141: - Hi [~nick], I've been working on Tika recently and I'm interested