Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2025-01-16 Thread via GitHub


nddipiazza commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2596585852

   I am porting this into https://github.com/nddipiazza/tika-pipes
   starting now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2530004259

   @THausherr Thanks, I sorted that out. It looks like my paths are based on 
the tika-grpc-3x-features branch layout.





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528758818

   I didn't touch tika-grpc/pom.xml at all.
   
   Your script uses "tika-pipes/tika-grpc"; however, "tika-grpc" is at the top 
level.
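   
   A minimal sketch of one way a build script could tolerate either layout, 
   assuming the module lives at "tika-grpc" on some branches and at 
   "tika-pipes/tika-grpc" on others. The helper name find_module_dir and the 
   GRPC_DIR variable are hypothetical and not part of the Tika repository:
   
   ```shell
   # Hypothetical helper: print the first candidate directory under the
   # given source root that contains a pom.xml; return 1 if none does.
   find_module_dir() {
     src_root=$1
     shift
     for candidate in "$@"; do
       if [ -f "${src_root}/${candidate}/pom.xml" ]; then
         printf '%s\n' "${src_root}/${candidate}"
         return 0
       fi
     done
     echo "No pom.xml found under ${src_root} for: $*" >&2
     return 1
   }
   
   # Usage sketch (assumed variable names): prefer the top-level layout,
   # fall back to the nested one.
   # GRPC_DIR=$(find_module_dir "${TIKA_SRC_PATH}" tika-grpc tika-pipes/tika-grpc) || exit 1
   # mvn dependency:copy-dependencies -f "${GRPC_DIR}" || exit
   ```
   
   Checking for the pom.xml up front makes the script stop with a clear 
   message instead of failing later inside Maven with "Non-readable POM".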





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528691069

   @THausherr Great. Btw, since these changes, I am unable to build tika-pipes 
(which is what I am building, not the whole project). It looks like the 
pom.xml path that was previously expected is no longer applicable. Are you able 
to help?
   
   Here's the error:
   
   ```
   [INFO] BUILD SUCCESS
   [INFO] 
   [INFO] Total time:  02:33 min
   [INFO] Finished at: 2024-12-09T12:47:12-04:00
   [INFO] 
   + mvn dependency:copy-dependencies -f /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc
   [INFO] Scanning for projects...
   [ERROR] [ERROR] Some problems were encountered while processing the POMs:
   [FATAL] Non-readable POM /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc/pom.xml: /Users/bartek/workspace/tika/tika-pipes/tika-grpc/example-dockerfile/../../../tika-pipes/tika-grpc/pom.xml (No such file or directory) @ 
   ```
   
   And here's my build script:
   
   ```
   set -x
   
   TAG_NAME=$1
   
   if [ -z "${TAG_NAME}" ]; then
     echo "Single command line argument is required which will be used as the -t parameter of the docker build command"
     exit 1
   fi
   
   SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
   TIKA_SRC_PATH=${SCRIPT_DIR}/../../..
   OUT_DIR=${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/tika-docker
   
   mvn clean install -Dossindex.skip -DskipTests=true -Denforcer.skip=true -Dossindex.skip=true -f "${TIKA_SRC_PATH}" || exit
   mvn dependency:copy-dependencies -f "${TIKA_SRC_PATH}/tika-pipes/tika-grpc" || exit
   rm -rf "${OUT_DIR}"
   mkdir -p "${OUT_DIR}"
   
   project_version=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f "${TIKA_SRC_PATH}")
   
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/dependency" "${OUT_DIR}/libs"
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-gcs/target/tika-fetcher-gcs-${project_version}.jar" "${OUT_DIR}/libs"
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-az-blob/target/tika-fetcher-az-blob-${project_version}.jar" "${OUT_DIR}/libs"
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-http/target/tika-fetcher-http-${project_version}.jar" "${OUT_DIR}/libs"
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-microsoft-graph/target/tika-fetcher-microsoft-graph-${project_version}.jar" "${OUT_DIR}/libs"
   cp -r "${TIKA_SRC_PATH}/tika-pipes/tika-fetchers/tika-fetcher-s3/target/tika-fetcher-s3-${project_version}.jar" "${OUT_DIR}/libs"
   
   cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/target/tika-grpc-${project_version}.jar" "${OUT_DIR}/libs"
   cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/src/test/resources/log4j2.xml" "${OUT_DIR}"
   cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/src/test/resources/tika-pipes-test-config.xml" "${OUT_DIR}/tika-config.xml"
   cp "${TIKA_SRC_PATH}/tika-pipes/tika-grpc/example-dockerfile/Dockerfile" "${OUT_DIR}/Dockerfile"
   
   cd "${OUT_DIR}" || exit
   
   # build single arch
   #docker build "${OUT_DIR}" -t "${TAG_NAME}"
   
   # Or we can build multi-arch - https://www.docker.com/blog/multi-arch-images/
   docker buildx create --name tikabuilder
   # see https://askubuntu.com/questions/1339558/cant-build-dockerfile-for-arm64-due-to-libc-bin-segmentation-fault/1398147#1398147
   docker run --rm --privileged tonistiigi/binfmt --install amd64
   docker run --rm --privileged tonistiigi/binfmt --install arm64
   docker buildx build --builder=tikabuilder "${OUT_DIR}" -t "${TAG_NAME}" --platform linux/amd64,linux/arm64 --push
   docker buildx stop tikabuilder
   ```





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528685532

   Yes!





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528651600

   @THausherr Looks like you got this building. Are you happy with your 
changes? If so I will squash them into a single commit.





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528469750

   I managed to do a complete build locally, mostly by moving the 
dependencyManagement stuff I introduced to the parent. I'll do another test 
locally and then add this here.





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2528299161

   So I was able to fix the Google Drive fetcher pom.xml, but now the pipes 
gRPC server is failing with dependency convergence errors 😬





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


bartek commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2527988933

   > I got these by looking at the source code. This is just me, I like smaller 
pom.xml files that are easier to understand and maintain.
   
   Thanks for the commits and notes. I'm not too familiar with the project and 
am entering through tika-pipes and its fetcher requirements, so I appreciate 
your patience.
   
   > Is it possible for you to create some sort of unit test, or is this 
impossible because one would need some google drive access?
   
   I imagine we could mock the response from Google Drive, so at least we test 
happy/sad paths. Let me have a try at it.





Re: [PR] Introduce GoogleDrive Fetcher for tika-pipes [tika]

2024-12-09 Thread via GitHub


THausherr commented on PR #2077:
URL: https://github.com/apache/tika/pull/2077#issuecomment-2527890742

   I got these by looking at the source code. This is just me, I like smaller 
pom.xml files that are easier to understand and maintain.
   
   Is it possible for you to create some sort of unit test, or is this 
impossible because one would need some google drive access?

