Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
nddipiazza commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2596585852
I am porting this into https://github.com/nddipiazza/tika-pipes starting now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2530004259
@THausherr Thanks, I sorted that out. Looks like my paths are based on tika-grpc-3x-features branch paths.
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528758818
I didn't touch tika-grpc/pom.xml at all. Your script has "tika-pipes/tika-grpc"; however, "tika-grpc" is at the top level.
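One way a build script could cope with both layouts mentioned in this thread (tika-grpc at the top level versus under tika-pipes) is to probe for the pom.xml before calling Maven. The following is only a minimal sketch: `resolve_module_dir` and `GRPC_DIR` are hypothetical names that do not appear in the PR, and the candidate list is an assumption based on the two paths quoted here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the first candidate
# directory under the source root that contains a pom.xml.
# Returns non-zero if none of the candidates has one.
resolve_module_dir() {
  local src_root=$1
  shift
  local candidate
  for candidate in "$@"; do
    if [ -f "${src_root}/${candidate}/pom.xml" ]; then
      printf '%s\n' "${src_root}/${candidate}"
      return 0
    fi
  done
  echo "No pom.xml found under ${src_root} for any of: $*" >&2
  return 1
}

# Possible usage inside the build script (assumed variable names):
#   GRPC_DIR=$(resolve_module_dir "${TIKA_SRC_PATH}" tika-grpc tika-pipes/tika-grpc) || exit 1
#   mvn dependency:copy-dependencies -f "${GRPC_DIR}" || exit
```

Failing early with a clear message would avoid running the full `mvn clean install` only to hit "Non-readable POM" on the next step.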
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528691069
@THausherr Great. Btw, since these changes, I am unable to build tika-pipes
(which is what I am building, not the whole project). It looks like the
pom.xml that was previously expected no longer is applicable. Are you able to
help?
Here's the error:
```
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 02:33 min
[INFO] Finished at: 2024-12-09T12:47:12-04:00
[INFO]
+ mvn dependency:copy-dependencies -f /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc
[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-readable POM /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc/pom.xml: /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc/pom.xml (No such file or directory) @
```
And here's my build script:
```
set -x
TAG_NAME=$1
if [ -z "${TAG_NAME}" ]; then
  echo "Single command line argument is required which will be used as the -t parameter of the docker build command"
  exit 1
fi
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
TIKA_SRC_PATH=${SCRIPT_DIR}/../../..
OUT_DIR=${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/tika-docker
mvn clean install -Dossindex.skip -DskipTests=true -Denforcer.skip=true -Dossindex.skip=true -f "${TIKA_SRC_PATH}" || exit
mvn dependency:copy-dependencies -f "${TIKA_SRC_PATH}/tika-pipes/tika-grpc" || exit
rm -rf "${OUT_DIR}"
mkdir -p "${OUT_DIR}"
project_version=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f "${TIKA_SRC_PATH}")
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/dependency" "${OUT_DIR}/libs"
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-gcs/target/tika-fetcher-gcs-${project_version}.jar" "${OUT_DIR}/libs"
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-az-blob/target/tika-fetcher-az-blob-${project_version}.jar" "${OUT_DIR}/libs"
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-http/target/tika-fetcher-http-${project_version}.jar" "${OUT_DIR}/libs"
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-microsoft-graph/target/tika-fetcher-microsoft-graph-${project_version}.jar" "${OUT_DIR}/libs"
cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-s3/target/tika-fetcher-s3-${project_version}.jar" "${OUT_DIR}/libs"
cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/tika-grpc-${project_version}.jar" "${OUT_DIR}/libs"
cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/src/test/resources/log4j2.xml" "${OUT_DIR}"
cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/src/test/resources/tika-pipes-test-config.xml" "${OUT_DIR}/tika-config.xml"
cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/example-dockerfile/Dockerfile" "${OUT_DIR}/Dockerfile"
cd "${OUT_DIR}" || exit
# build single arch
#docker build "${OUT_DIR}" -t "${TAG_NAME}"
# Or we can build multi-arch - https://www.docker.com/blog/multi-arch-images/
docker buildx create --name tikabuilder
# see https://askubuntu.com/questions/1339558/cant-build-dockerfile-for-arm64-due-to-libc-bin-segmentation-fault/1398147#1398147
docker run --rm --privileged tonistiigi/binfmt --install amd64
docker run --rm --privileged tonistiigi/binfmt --install arm64
docker buildx build --builder=tikabuilder "${OUT_DIR}" -t "${TAG_NAME}" --platform linux/amd64,linux/arm64 --push
docker buildx stop tikabuilder
```
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528685532
Yes!
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528651600
@THausherr Looks like you got this building. Are you happy with your changes? If so I will squash them into a single commit.
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528469750
I managed to do a complete build locally, mostly by moving the dependencyManagement stuff I introduced to the parent. I'll do another test locally and then add this here.
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528299161
So I was able to fix the Google Drive fetcher pom.xml, but now the pipes gRPC server is failing with dependency convergence errors 😬
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2527988933
> I got these by looking at the source code. This is just me, I like smaller pom.xml files that are easier to understand and maintain.
Thanks for the commits and notes. I'm not too familiar with the project and am entering through tika-pipes and its fetcher requirements, so I appreciate your patience.
> Is it possible for you to create some sort of unit test, or is this impossible because one would need some google drive access?
I imagine we could mock the response from Google Drive, so at least we test happy/sad paths. Let me have a try at it.
Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]
THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2527890742
I got these by looking at the source code. This is just me, I like smaller pom.xml files that are easier to understand and maintain.
Is it possible for you to create some sort of unit test, or is this impossible because one would need some google drive access?
