Aruna,

In Elasticsearch, the index and type are two levels of partitioning: they help users organize data, and they also help with indexing and searching it. A type is not always required, but an index is. Imagine you are storing a stream of tweets from a Twitter feed (or firehose) in Elasticsearch. You could name the index "twitter" and use the type "tweet" for each tweet you store in it. Now say you also want to put Twitter user information into that index. You can reuse "twitter" as the index but specify "user" as the type. You can then search the entire index across both tweets and user data, or restrict a search to one type, for example only the documents of type "user".
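To make the index/type split a bit more concrete, here is a minimal sketch of how the two values end up in Elasticsearch REST paths. It assumes an Elasticsearch version that still accepts a type in the URL path; the helper names doc_path and search_path are hypothetical and only for illustration, and the code just builds path strings rather than calling a cluster.

```python
from typing import Optional


def doc_path(index: str, doc_type: str, doc_id: str) -> str:
    """Path for a single document: /<index>/<type>/<id>."""
    return f"/{index}/{doc_type}/{doc_id}"


def search_path(index: str, doc_type: Optional[str] = None) -> str:
    """Path for a search over a whole index, or over one type within it."""
    if doc_type is None:
        # No type given: the search covers every type in the index.
        return f"/{index}/_search"
    # Type given: the search is limited to documents of that type.
    return f"/{index}/{doc_type}/_search"


# Illustrative values taken from the Twitter example above.
print("GET", doc_path("twitter", "tweet", "1"))
print("GET", doc_path("twitter", "user", "2"))
print("GET", search_path("twitter"))
print("GET", search_path("twitter", "user"))
```

Leaving the type out of search_path corresponds to searching the entire index; passing "user" narrows the search to documents of that type.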
In the REST API, the index and type are specified in the request path, for example GET /twitter/tweet/1 or GET /twitter/user/2. The Elasticsearch processors use the index and type information to determine the right call to make to Elasticsearch. You can certainly choose "pdf" as the type if you like, although depending on the sort of queries you'll be running, you may want to pick an index that covers all the data you'll be keeping together, and a type that is more domain-specific (such as "customer" if it is a PDF full of customer data).

Please let me know if that answers your question; I can provide more information if need be.

Regards,
Matt

On Mon, Dec 11, 2017 at 4:18 PM, Joe Witt <[email protected]> wrote:

> For that we'll need someone familiar with that processor/Elastic to chime
> in :)
>
> Thanks
>
> On Mon, Dec 11, 2017 at 4:16 PM, Aruna Sankaralingam <
> [email protected]> wrote:
>
>> Oops I overlooked the question on version that you asked. My apologies. I
>> am using Nifi v1.4.
>>
>> I moved the pdf file to another folder in the same S3 bucket and Nifi was
>> able to pick up.
>>
>> Initially it was in
>>
>> S3 > part-d-prescription-drug/unstructured
>>
>> I moved to
>>
>> S3 > Nifi-Pecos-files
>>
>> I still don’t know what was wrong with the old location. But for now, I
>> am using the one that works.
>>
>> I am trying to put this pdf file in elasticsearch.
>>
>> I am not sure what I should give for “Index” and “Type”. Should the type
>> be “PDF” ?
>>
>> Thanks
>>
>> Aruna
>>
>> *From:* Joe Witt [mailto:[email protected]]
>> *Sent:* Monday, December 11, 2017 3:32 PM
>> *To:* [email protected]
>> *Subject:* Re: ListS3 Processor Error
>>
>> Aruna,
>>
>> We'll need to know more about your config/env to help I think. I am not
>> aware of any normal usage situation that should result in truncated
>> responses. It is possible it is a coding bug we can resolve but I think
>> we'll need more details. Did you see the questions in my last reply?
>>
>> Thanks
>>
>> On Mon, Dec 11, 2017 at 2:50 PM, Aruna Sankaralingam <
>> [email protected]> wrote:
>>
>> Could someone please let me know what is wrong with the configuration
>> that it is failing?
>>
>> *From:* Aruna Sankaralingam [mailto:[email protected]]
>> *Sent:* Monday, December 11, 2017 1:07 PM
>> *To:* [email protected]
>> *Subject:* RE: ListS3 Processor Error
>>
>> Attached my nifi-app.log. Could you please let me know what went wrong?
>>
>> *From:* Joe Witt [mailto:[email protected] <[email protected]>]
>> *Sent:* Friday, December 08, 2017 4:04 PM
>> *To:* [email protected]
>> *Subject:* Re: ListS3 Processor Error
>>
>> Here is an example I found for another processor
>>
>> https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4FYyCQAbbuw@mail.gmail.com%3E
>>
>> Thanks
>>
>> On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <
>> [email protected]> wrote:
>>
>> Joe,
>>
>> Could you please let me know how to turn on the debug logging?
>>
>> *From:* Joe Witt [mailto:[email protected]]
>> *Sent:* Friday, December 08, 2017 3:59 PM
>> *To:* [email protected]
>> *Subject:* Re: ListS3 Processor Error
>>
>> What version of NiFi?
>>
>> Looks like either a classpath/classloader issue OR the amazon client
>> library cannot parse the response it is getting back...
>>
>> The logs/nifi-app.log should have the full stack trace. If not you can
>> turn on debug logging for that processor and perhaps then it will.
>>
>> Thanks
>>
>> On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <
>> [email protected]> wrote:
>>
>> I am trying to get a pdf file from S3 and load to Elastic Search. The
>> ListS3 processor is giving me this error. Could someone please let me know
>> where I am going wrong?
>>
>> *20:52:25 UTC*
>> *ERROR*
>> *37d7226e-0160-1000-6049-d4c489cd32f3*
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process
>> session due to com.amazonaws.SdkClientException: Failed to parse XML
>> document with handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
>> Failed to parse XML document with handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>>
>> *20:52:25 UTC*
>> *WARNING*
>> *37d7226e-0160-1000-6049-d4c489cd32f3*
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor
>> Administratively Yielded for 1 sec due to processing failure
>>
>> *20:52:26 UTC*
>> *ERROR*
>> *37d7226e-0160-1000-6049-d4c489cd32f3*
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to
>> com.amazonaws.SdkClientException: Failed to parse XML document with
>> handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler;
>> rolling back session: Failed to parse XML document with handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>>
>> *20:52:26 UTC*
>> *ERROR*
>> *37d7226e-0160-1000-6049-d4c489cd32f3*
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process
>> session due to com.amazonaws.SdkClientException: Failed to parse XML
>> document with handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
>> Failed to parse XML document with handler class
>> com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
>>
>> *20:52:26 UTC*
>> *WARNING*
>> *37d7226e-0160-1000-6049-d4c489cd32f3*
>> ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor
>> Administratively Yielded for 1 sec due to processing failure
