[PR] Bump aws.version from 1.12.689 to 1.12.690 [tika]
dependabot[bot] opened a new pull request, #1700:
URL: https://github.com/apache/tika/pull/1700

Bumps `aws.version` from 1.12.689 to 1.12.690.

Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.689 to 1.12.690

Changelog
Sourced from com.amazonaws:aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

1.12.690 2024-03-28

AWS Compute Optimizer
  Features
  - This release enables AWS Compute Optimizer to analyze and generate recommendations with a new customization preference, Memory Utilization.

Amazon CodeCatalyst
  Features
  - This release adds support for understanding pending changes to subscriptions by including two new response parameters for the GetSubscription API for Amazon CodeCatalyst.

Amazon Elastic Compute Cloud
  Features
  - Amazon EC2 C7gd, M7gd and R7gd metal instances with up to 3.8 TB of local NVMe-based SSD block-level storage have up to 45% improved real-time NVMe storage performance than comparable Graviton2-based instances.

Amazon Elastic Kubernetes Service
  Features
  - Add multiple customer error code to handle customer caused failure when managing EKS node groups

Amazon GuardDuty
  Features
  - Add EC2 support for GuardDuty Runtime Monitoring auto management.

Amazon Neptune Graph
  Features
  - Update ImportTaskCancelled waiter to evaluate task state correctly and minor documentation changes.

Amazon QuickSight
  Features
  - Amazon QuickSight: Adds support for setting up VPC Endpoint restrictions for accessing QuickSight Website.

CloudWatch Observability Access Manager
  Features
  - This release adds support for sharing AWS::InternetMonitor::Monitor resources.
Commits
- 8158c91 AWS SDK for Java 1.12.690 (https://github.com/aws/aws-sdk-java/commit/8158c919c956717dabf2a6ae7cc1d26b592488ac)
- 1b3444a Update GitHub version number to 1.12.690-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/1b3444aa78f4579c4083bd4b3858322bc343a906)
- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.689...1.12.690)

Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.689 to 1.12.690

Changelog
Sourced from com.amazonaws:aws-java-sdk-transcribe's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md). The 1.12.690 entry (2024-03-28) is identical to the one listed for `com.amazonaws:aws-java-sdk-s3`.
Commits
- 8158c91 AWS SDK for Java 1.12.690 (https://github.com/aws/aws-sdk-java/commit/8158c919c956717dabf2a6ae7cc1d26b592488ac)
- 1b3444a Update GitHub version number to 1.12.690-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/1b3444aa78f4579c4083bd4b3858322bc343a906)
- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.689...1.12.690)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
[PR] Bump commons-io:commons-io from 2.15.1 to 2.16.0 [tika]
dependabot[bot] opened a new pull request, #1701:
URL: https://github.com/apache/tika/pull/1701

Bumps commons-io:commons-io from 2.15.1 to 2.16.0.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump aws.version from 1.12.689 to 1.12.690 [tika]
THausherr merged PR #1700:
URL: https://github.com/apache/tika/pull/1700
[jira] [Commented] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files
[ https://issues.apache.org/jira/browse/TIKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831840#comment-17831840 ]

Hudson commented on TIKA-4207:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1580 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1580/])
TIKA-4207: Add handling of embedded bytes to tika-pipes (#1699) (github: [https://github.com/apache/tika/commit/4fe7312330c430f357012f8d0ff886a0fb344783])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WMFParser.java
* (add) tika-pipes/tika-async-cli/src/test/resources/configs/TIKA-4207-emitter.xml
* (edit) tika-core/src/main/java/org/apache/tika/parser/RecursiveParserWrapper.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentByteStoreExtractorFactory.java
* (edit) tika-core/src/main/java/org/apache/tika/extractor/ParsingEmbeddedDocumentExtractor.java
* (edit) tika-core/src/test/java/org/apache/tika/pipes/PipesServerTest.java
* (add) tika-app/src/test/java/org/apache/tika/cli/TikaCLIAsyncTest.java
* (edit) tika-pipes/tika-pipes-iterators/pom.xml
* (edit) tika-pipes/tika-async-cli/pom.xml
* (add) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-json/pom.xml
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (add) tika-core/src/test/resources/org/apache/tika/pipes/TIKA-4207.xml
* (add) tika-core/src/main/java/org/apache/tika/pipes/extractor/EmbeddedDocumentBytesConfig.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-no-names.xml
* (delete) tika-core/src/test/java/org/apache/tika/pipes/async/AsyncProcessorTest.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentBytesHandler.java
* (add) tika-core/src/test/java/org/apache/tika/pipes/async/AsyncChaosMonkeyTest.java
* (add) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-json/src/test/resources/test-documents/test.json
* (edit) tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/AsyncResource.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-with-names.xml
* (add) tika-core/src/main/java/org/apache/tika/pipes/extractor/EmittingEmbeddedDocumentBytesHandler.java
* (delete) tika-pipes/tika-async-cli/src/test/resources/tika-config-broken.xml
* (add) tika-pipes/tika-async-cli/src/test/resources/configs/tika-config-broken.xml
* (add) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-json/src/test/java/org/apache/tika/pipes/pipesiterator/json/TestJsonPipesIterator.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/BasicEmbeddedDocumentBytesHandler.java
* (add) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-json/src/main/java/org/apache/tika/pipes/pipesiterator/json/JsonPipesIterator.java
* (edit) tika-serialization/src/main/java/org/apache/tika/metadata/serialization/JsonFetchEmitTuple.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java
* (edit) tika-pipes/tika-async-cli/src/test/java/org/apache/tika/async/cli/TikaAsyncCLITest.java
* (edit) tika-serialization/src/test/java/org/apache/tika/metadata/serialization/JsonFetchEmitTupleTest.java
* (add) tika-pipes/tika-pipes-iterators/tika-pipes-iterator-json/src/test/resources/test-documents/test-with-embedded-bytes.json
* (add) tika-core/src/main/java/org/apache/tika/extractor/RUnpackExtractor.java
* (add) tika-core/src/test/java/org/apache/tika/parser/AutoDetectParserConfigTest.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/FetchEmitTuple.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-server/tika-server-standard/src/test/java/org/apache/tika/server/standard/TikaPipesTest.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/RUnpackExtractorFactory.java
* (add) tika-core/src/test/resources/org/apache/tika/pipes/TIKA-4207-limit-bytes.xml
* (add) tika-core/src/main/java/org/apache/tika/extractor/BasicEmbeddedBytesSelector.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/AbstractEmbeddedDocumentBytesHandler.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedBytesSelector.java
* (edit) tika-core/src/main/java/org/apache/tika/io/BoundedInputStream.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParserConfig.java
* (edit) tika-core/src/test/java/org/apache/tika/parser/mock/MockParser.java
* (edit) tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* (add) tika-pipes/tika-async-cli/src/test/java/org/apache/tika/async/cli/AsyncProcessorTest.java
* (add) tika-pipes/tika-async-cli/src/test/resources/test-documents/basic_embedded.xml
* (add)
[jira] [Commented] (TIKA-4228) Tika parser crashes JVM when it gets metadata and embedded objects from pdf
[ https://issues.apache.org/jira/browse/TIKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831835#comment-17831835 ]

Xiaohong Yang commented on TIKA-4228:
-------------------------------------

It is not multithreaded. I will try to get the exit value of the process (if possible). I will also check if there is a core dump on the machine.

> Tika parser crashes JVM when it gets metadata and embedded objects from pdf
> ----------------------------------------------------------------------------
>
> Key: TIKA-4228
> URL: https://issues.apache.org/jira/browse/TIKA-4228
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: Xiaohong Yang
> Priority: Major
> Attachments: tika-config-and-sample-file.zip
>
> [^tika-config-and-sample-file.zip]
>
> We use org.apache.tika.parser.AutoDetectParser to get metadata and embedded
> objects from pdf documents. And we found out that it crashes the program (or
> the JVM) when it gets metadata and embedded files from the sample pdf file.
>
> Following is the sample code and attached is the tika-config.xml and the
> sample pdf file. Note that the sample file crashes the JVM in 1 out of 10
> runs in our production environment. Sometimes it happens when it gets
> metadata and sometimes it happens when it extracts embedded files (the
> chances are about 50/50).
>
> The operating system is Ubuntu 20.04. Java version is 21. Tika version is
> 2.9.0 and POI version is 5.2.3.
>
> import org.apache.pdfbox.io.IOUtils;
> import org.apache.poi.poifs.filesystem.DirectoryEntry;
> import org.apache.poi.poifs.filesystem.DocumentEntry;
> import org.apache.poi.poifs.filesystem.DocumentInputStream;
> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.extractor.EmbeddedDocumentExtractor;
> import org.apache.tika.io.FilenameUtils;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.metadata.TikaCoreProperties;
> import org.apache.tika.mime.MediaType;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
> import org.xml.sax.helpers.DefaultHandler;
>
> import java.io.*;
> import java.net.URL;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> public class ProcessPdf {
>     private final Path inputFile = new File("/home/ubuntu/testdirs/testdir_pdf/sample.pdf").toPath();
>     private final Path outputDir = new File("/home/ubuntu/testdirs/testdir_pdf/tika_output/").toPath();
>
>     private Parser parser;
>     private ParseContext context;
>
>     public static void main(String args[]) {
>         try {
>             System.out.println("Start");
>             ProcessPdf processPdf = new ProcessPdf();
>             System.out.println("Get metadata");
>             processPdf.getMataData();
>             System.out.println("Extract embedded files");
>             processPdf.extract();
>             System.out.println("End");
>         } catch(Exception ex) {
>             ex.printStackTrace();
>         }
>     }
>
>     public ProcessPdf() {
>     }
>
>     public void getMataData() throws Exception {
>         BodyContentHandler handler = new BodyContentHandler(-1);
>
>         Metadata metadata = new Metadata();
>         try (FileInputStream inputData = new FileInputStream(inputFile.toString())) {
>             TikaConfig config = new TikaConfig("/home/ubuntu/testdirs/testdir_pdf/tika-config.xml");
>             Parser autoDetectParser = new AutoDetectParser(config);
>             ParseContext context = new ParseContext();
>             context.set(TikaConfig.class, config);
>             autoDetectParser.parse(inputData, handler, metadata, context);
>         }
>
>         String content = handler.toString();
>     }
>
>     public void extract() throws Exception {
>         TikaConfig config = new TikaConfig("/home/ubuntu/testdirs/testdir_pdf/tika-config.xml");
>         ProcessPdf.FileEmbeddedDocumentExtractor fileEmbeddedDocumentExtractor = new ProcessPdf.FileEmbeddedDocumentExtractor();
>
>         parser = new AutoDetectParser(config);
>         context = new ParseContext();
>         context.set(Parser.class, parser);
>         context.set(TikaConfig.class, config);
>         context.set(EmbeddedDocumentExtractor.class, fileEmbeddedDocumentExtractor);
>
>         URL url = inputFile.toUri().toURL();
>         Metadata metadata = new Metadata();
>         try (InputStream input = TikaInputStream.get(url, metadata)) {
>             ContentHandler
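The comment on TIKA-4228 mentions capturing the exit value of the crashing process. One possible way to do that, sketched here with standard-library calls only, is to launch the reproduction class in a child JVM and read the exit value from the parent. `ChildJvmRunner` and its method names are hypothetical and not part of Tika; the reading of exit values above 128 as "128 + signal number" is an assumption that commonly holds on Linux, and `-XX:ErrorFile` is the HotSpot option for the location of the fatal error log.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch only: ChildJvmRunner and its method names are hypothetical, not Tika API.
final class ChildJvmRunner {

    /** Builds a command line that starts mainClass in a separate JVM. */
    static List<String> buildCommand(String classpath, String mainClass, Path errorFile) {
        List<String> command = new ArrayList<>();
        command.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        // HotSpot option: where to write the fatal error log if the JVM crashes.
        command.add("-XX:ErrorFile=" + errorFile);
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        return command;
    }

    /** Runs the command, forwarding its output to this process, and returns the exit value. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    /** Rough reading of an exit value; the 128+N signal convention is a Linux assumption. */
    static String describe(int exitValue) {
        if (exitValue == 0) {
            return "normal exit";
        }
        if (exitValue > 128) {
            return "terminated by signal " + (exitValue - 128);
        }
        return "exit code " + exitValue;
    }

    public static void main(String[] args) throws Exception {
        // Assumed invocation: java ChildJvmRunner <classpath> <mainClass>
        List<String> command = buildCommand(args[0], args[1], Paths.get("hs_err_child.log"));
        int exitValue = run(command);
        System.out.println("Child JVM finished: " + describe(exitValue));
    }
}
```

Running the parse in a child JVM also keeps the parent alive when the child crashes, so the exit value and any error log can be recorded for each of the repeated runs.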
[jira] [Resolved] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files
[ https://issues.apache.org/jira/browse/TIKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-4207.
------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed

> PipesParser should have option to extract raw bytes of embedded files
> ---------------------------------------------------------------------
>
> Key: TIKA-4207
> URL: https://issues.apache.org/jira/browse/TIKA-4207
> Project: Tika
> Issue Type: New Feature
> Reporter: Tim Allison
> Priority: Major
> Fix For: 3.0.0
>
> There are many use cases, where text+metadata are important, but users also
> need the raw bytes from embedded files.
> Let's make it possible to extract the usual rmeta content in _and_ the raw
> bytes. This is a preliminary step that will offer more customization options
> than the proposal in TIKA-3703.
> This is targeted to 3.x.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
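To make the idea in TIKA-4207 concrete, here is a minimal, standard-library-only sketch of what "also keep the raw bytes of each embedded file" could look like next to the usual text and metadata output. Every name in it (`EmbeddedBytesSketch`, `BytesHandler`, `DirectoryBytesHandler`, `sanitize`) is hypothetical; it does not reproduce the signatures of the classes added by the TIKA-4207 commit, which are not shown in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: these names are hypothetical and are not the classes added in TIKA-4207.
final class EmbeddedBytesSketch {

    /** Hypothetical callback that receives the raw bytes of one embedded file. */
    interface BytesHandler {
        Path add(int embeddedId, String suggestedName, InputStream bytes) throws IOException;
    }

    /** Writes each embedded file into one output directory, prefixed by its id. */
    static final class DirectoryBytesHandler implements BytesHandler {
        private final Path outputDir;

        DirectoryBytesHandler(Path outputDir) {
            this.outputDir = outputDir;
        }

        @Override
        public Path add(int embeddedId, String suggestedName, InputStream bytes) throws IOException {
            Files.createDirectories(outputDir);
            Path target = outputDir.resolve(embeddedId + "-" + sanitize(suggestedName));
            Files.copy(bytes, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    /** Replaces path separators and other unusual characters so the file stays inside outputDir. */
    static String sanitize(String name) {
        if (name == null || name.isEmpty()) {
            return "embedded.bin";
        }
        return name.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Prefixing the file name with the embedded id keeps two attachments with the same name from overwriting each other, and sanitizing the suggested name avoids writing outside the output directory when an embedded file carries a path-like name.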
[jira] [Commented] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files
[ https://issues.apache.org/jira/browse/TIKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831815#comment-17831815 ]

Tim Allison commented on TIKA-4207:
-----------------------------------

There are some areas for simplification, but I think this is good enough to go for now.
[jira] [Commented] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files
[ https://issues.apache.org/jira/browse/TIKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831785#comment-17831785 ]

ASF GitHub Bot commented on TIKA-4207:
--------------------------------------

tballison merged PR #1699:
URL: https://github.com/apache/tika/pull/1699
Re: [PR] TIKA-4207: Add handling of embedded bytes to tika-pipes [tika]
tballison merged PR #1699:
URL: https://github.com/apache/tika/pull/1699
[jira] [Updated] (TIKA-4230) Optimized code ComparableVersion
[ https://issues.apache.org/jira/browse/TIKA-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhao tao updated TIKA-4230:
--------------------------
Issue Type: Improvement (was: Bug)

> Optimized code ComparableVersion
> --------------------------------
>
> Key: TIKA-4230
> URL: https://issues.apache.org/jira/browse/TIKA-4230
> Project: Tika
> Issue Type: Improvement
> Reporter: zhao tao
> Priority: Minor
[jira] [Created] (TIKA-4230) Optimized code ComparableVersion
zhao tao created TIKA-4230:
---------------------------

Summary: Optimized code ComparableVersion
Key: TIKA-4230
URL: https://issues.apache.org/jira/browse/TIKA-4230
Project: Tika
Issue Type: Bug
Reporter: zhao tao
[jira] [Commented] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files
[ https://issues.apache.org/jira/browse/TIKA-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831732#comment-17831732 ]

ASF GitHub Bot commented on TIKA-4207:
--------------------------------------

tballison opened a new pull request, #1699:
URL: https://github.com/apache/tika/pull/1699

Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! Your help is appreciated!

Before opening the pull request, please verify that
* there is an open issue on the [Tika issue tracker](https://issues.apache.org/jira/projects/TIKA) which describes the problem or the improvement. We cannot accept pull requests without an issue because the change wouldn't be listed in the release notes.
* the issue ID (`TIKA-`)
  - is referenced in the title of the pull request
  - and placed in front of your commit messages surrounded by square brackets (`[TIKA-] Issue or pull request title`)
* commits are squashed into a single one (or few commits for larger changes)
* Tika is successfully built and unit tests pass by running `mvn clean test`
* there should be no conflicts when merging the pull request branch into the *recent* `main` branch. If there are conflicts, please try to rebase the pull request branch on top of a freshly pulled `main` branch
* if you add new module that downstream users will depend upon add it to relevant group in `tika-bom/pom.xml`.

We will be able to faster integrate your pull request if these conditions are met. If you have any questions how to fix your problem or about using Tika in general, please sign up for the [Tika mailing list](http://tika.apache.org/mail-lists.html). Thanks!
[PR] TIKA-4207: Add handling of embedded bytes to tika-pipes [tika]
tballison opened a new pull request, #1699:
URL: https://github.com/apache/tika/pull/1699
[jira] [Commented] (TIKA-4229) add microsoft graph fetcher
[ https://issues.apache.org/jira/browse/TIKA-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831680#comment-17831680 ]

ASF GitHub Bot commented on TIKA-4229:
--------------------------------------

nddipiazza opened a new pull request, #1698:
URL: https://github.com/apache/tika/pull/1698

initial attempt to add microsoft graph fetcher

> add microsoft graph fetcher
> ---------------------------
>
> Key: TIKA-4229
> URL: https://issues.apache.org/jira/browse/TIKA-4229
> Project: Tika
> Issue Type: New Feature
> Components: tika-pipes
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> add a tika pipes fetcher capable of fetching files from MS graph api
[PR] TIKA-4229 [tika]
nddipiazza opened a new pull request, #1698:
URL: https://github.com/apache/tika/pull/1698

initial attempt to add microsoft graph fetcher
[jira] [Created] (TIKA-4229) add microsoft graph fetcher
Nicholas DiPiazza created TIKA-4229:
------------------------------------

Summary: add microsoft graph fetcher
Key: TIKA-4229
URL: https://issues.apache.org/jira/browse/TIKA-4229
Project: Tika
Issue Type: New Feature
Components: tika-pipes
Reporter: Nicholas DiPiazza

add a tika pipes fetcher capable of fetching files from MS graph api
Re: [PR] Bump com.github.luben:zstd-jni from 1.5.5-11 to 1.5.6-1 [tika]
THausherr merged PR #1697:
URL: https://github.com/apache/tika/pull/1697